Unit I-V
Abstract Data Types (ADTs) – List ADT – array-based implementation – linked list implementation –
singly linked lists – circularly linked lists – doubly linked lists – applications of lists – Polynomial
Manipulation – All operations (Insertion, Deletion, Merge, Traversal).
___________________________________________________________________________________
ABSTRACT DATA TYPES (ADTS)
Abstract Data Type (ADT) is a type (or class) for objects whose behaviour is defined by a set of
values and a set of operations.
The definition of ADT only mentions what operations are to be performed but not how these
operations will be implemented. It does not specify how data will be organized in memory and what
algorithms will be used for implementing the operations. It is called “abstract” because it gives an
implementation independent view. The process of providing only the essentials and hiding the details is
known as abstraction.
* An abstract data type is a type with associated operations, but whose representation is hidden.
* Objects such as lists, sets, and graphs, along with their operations, can be viewed as abstract data
types.
* The basic idea is that the implementation of these operations is written once in the program, and
any other part of the program that needs to perform an operation on the ADT can do so by
calling the appropriate function.
* If for some reason implementation details need to change, it should be easy to do so by merely
changing the routines that perform the ADT operations. This change, in a perfect world, would be
completely transparent to the rest of the program.
The user of a data type need not know how that data type is implemented. For example, we have been
using the int, float, and char data types knowing only the values they can take and the operations that
can be performed on them, without any idea of how these types are implemented. So a user only needs
to know what a data type can do, not how it does it. We can think of an ADT as a black box which hides
the inner structure and design of the data type. We will now define three ADTs, namely the List ADT,
Stack ADT, and Queue ADT.
Some common ADTs, which have proved useful in a great variety of applications, are
Linear data structures, in which insertion and deletion are possible in a linear fashion. Examples:
arrays, linked lists, stacks, queues.
Non-linear data structures, in which this is not possible. Examples: trees, graphs.
LIST ADT
A list or sequence is an abstract data type that represents a countable number of ordered values,
where the same value may occur more than once. An instance of a list is a computer representation of
the mathematical concept of a finite sequence; the (potentially) infinite analog of a list is a stream.
Lists are a basic example of containers, as they contain other values. If the same value occurs multiple
times, each occurrence is considered a distinct item.
Operations
A list contains elements of the same type arranged in sequential order, and the following operations
can be performed on the list.
get() – Return an element from the list at any given position.
insert() – Insert an element at any position of the list.
remove() – Remove the first occurrence of any element from a non-empty list.
removeAt() – Remove the element at a specified location from a non-empty list.
replace() – Replace an element at any position by another element.
size() – Return the number of elements in the list.
isEmpty() – Return true if the list is empty, otherwise return false.
isFull() – Return true if the list is full, otherwise return false.
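Collected together, these operations might be sketched as a C interface over a fixed-capacity array. This is a minimal sketch, not a standard library: the names, the MAX_SIZE capacity, and the struct layout below are all illustrative.

```c
#include <stdbool.h>

#define MAX_SIZE 100   /* illustrative capacity for an array-backed list */

/* A sketch of the List ADT; the representation is hidden behind these functions. */
struct List {
    int items[MAX_SIZE];
    int count;
};

int  list_get(const struct List *l, int pos)  { return l->items[pos]; }
int  list_size(const struct List *l)          { return l->count; }
bool list_is_empty(const struct List *l)      { return l->count == 0; }
bool list_is_full(const struct List *l)       { return l->count == MAX_SIZE; }

void list_insert(struct List *l, int pos, int value)
{
    /* shift elements right to open a slot at pos */
    for (int i = l->count; i > pos; i--)
        l->items[i] = l->items[i - 1];
    l->items[pos] = value;
    l->count++;
}

void list_remove_at(struct List *l, int pos)
{
    /* shift elements left over the removed slot */
    for (int i = pos + 1; i < l->count; i++)
        l->items[i - 1] = l->items[i];
    l->count--;
}
```

Because callers use only these functions, the array representation could later be swapped for a linked list without changing the rest of the program, which is the point of the ADT.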
AN ARRAY-BASED IMPLEMENTATION
An array is a data structure which stores a fixed-size sequential collection of elements of the
same type.
An array is used to store a collection of data, but it is often more useful to think of an array as a
collection of variables of the same type.
Instead of declaring individual variables, such as number0, number1, ..., and number99, you
declare one array variable such as numbers and use numbers[0], numbers[1], and ...,
numbers[99] to represent individual variables.
A specific element in an array is accessed by an index.
All arrays consist of contiguous memory locations. The lowest address corresponds to the first
element and the highest address to the last element.
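For instance, the hundred-variable example above reduces to a single declaration with index-based access. This small sketch is illustrative only:

```c
/* One array variable replaces number0 .. number99. */
int numbers[100];

/* Element i is accessed by its index; the storage is contiguous,
   so numbers[0] has the lowest address and numbers[99] the highest. */
void fill(void)
{
    for (int i = 0; i < 100; i++)
        numbers[i] = i * 2;
}
```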
Operations:
IsEmpty(LIST)
If (Current Size == 0) "LIST is Empty"
else "LIST is not Empty"
IsFull(LIST)
If (Current Size == Max Size) "LIST is FULL"
else "LIST is not FULL"
Insert Element at End of the LIST
#include <stdio.h>
#include <stdlib.h>
#define MAX 20
void create();
void insert();
void deletion();
void search();
void display();
int a, b[MAX], n, p, e, f, i, pos, ava = 0;
void main()
{
//clrscr();
int ch;
char g='y';
do
{
printf("\n main Menu");
printf("\n 1.Create \n 2.Delete \n 3.Search \n 4.Insert \n 5.Display\n 6.Exit \n");
printf("\n Enter your Choice");
scanf("%d", &ch);
switch(ch)
{
case 1:
create();
break;
case 2:
deletion();
break;
case 3:
search();
break;
case 4:
insert();
break;
case 5:
display();
break;
case 6:
exit(0);
break;
default:
printf("\n Enter the correct choice:");
}
printf("\n Do u want to continue:::");
scanf("\n%c", &g);
}
while(g=='y'||g=='Y');
}
void create()
{
printf("\n Enter the number of nodes");
scanf("%d", &n);
for(i=0;i<n;i++)
{
printf("\n Enter Element %d: ", i+1);
scanf("%d", &b[i]);
}
}
void deletion()
{
printf("\n Enter the position u want to delete::");
scanf("%d", &pos);
if(pos>=n || pos<0)
{
printf("\n Invalid Location::");
}
else
{
for(i=pos+1;i<n;i++)
{
b[i-1]=b[i];
}
n--;
}
printf("\n The Elements after deletion");
for(i=0;i<n;i++)
{
printf("\t%d", b[i]);
}
}
void search()
{
printf("\n Enter the Element to be searched:");
scanf("%d", &e);
for(i=0;i<n;i++)
{
if(b[i]==e)
{
ava=1;
}
}
if(ava==1)
{
6
printf("Value %d is in the list", e);
ava=0;
}
else
printf("Value %d is not in the list", e);
}
void insert()
{
printf("\n Enter the position u need to insert::");
scanf("%d", &pos);
if(pos>n || pos<0)
{
printf("\n invalid Location::");
}
else
{
for(i=n-1;i>=pos;i--)
{
b[i+1]=b[i];
}
printf("\n Enter the element to insert::\n");
scanf("%d",&p);
b[pos]=p;
n++;
}
printf("\n The list after insertion::\n");
display();
}
void display()
{
printf("\n The Elements of The list ADT are:");
for(i=0;i<n;i++)
{
printf("\n\n%d", b[i]);
}
}
If we want to insert a new ID 1005, then to maintain the sorted order, we have to move all the
elements after 1000 (excluding 1000).
Deletion is also expensive with arrays unless some special techniques are used. For example, to
delete 1010 in id[], everything after 1010 has to be moved.
Drawbacks:
1) Random access is not allowed. We have to access elements sequentially starting from the first node.
So we cannot do binary search with linked lists.
2) Extra memory space for a pointer is required with each element of the list.
Representation in C:
A linked list is represented by a pointer to the first node of the linked list. The first node is called the
head. If the linked list is empty, then the value of head is NULL.
DOUBLY LINKED LIST
Type declaration for linked list:
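The type declarations (shown as figures in the original) can be sketched as follows; the member names are conventional rather than prescribed:

```c
/* Node of a singly linked list: data plus a link to the next node. */
struct SNode {
    int data;
    struct SNode *next;
};

/* Node of a doubly linked list: links to both the previous and next nodes. */
struct DNode {
    int data;
    struct DNode *prev;
    struct DNode *next;
};
```

In a doubly linked list the extra prev pointer allows traversal in both directions, at the cost of one more pointer per node.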
APPLICATIONS OF LIST
Lists can be used to store a list of elements. However, unlike in traditional arrays, lists can expand and
shrink, and are stored dynamically in memory.
In computing, lists are easier to implement than sets. A finite set in the mathematical sense can be
realized as a list with additional restrictions: duplicate elements are disallowed and order is
irrelevant. Sorting the list speeds up determining whether a given item is already in the set, but
maintaining that order makes adding a new entry slower. In efficient implementations, however, sets
are implemented using self-balancing binary search trees or hash tables rather than lists.
Lists also form the basis for other abstract data types including the queue, the stack, and their variations.
POLYNOMIAL MANIPULATION
What is Polynomial ?
A polynomial is a mathematical expression consisting of a sum of terms, each term including a variable
or variables raised to a power and multiplied by a coefficient. The simplest polynomials have one
variable.
Representation of a Polynomial: A polynomial is an expression consisting of one or more terms. A
term is made up of a coefficient and an exponent. An example of a polynomial is
P(x) = 4x^3 + 6x^2 + 7x + 9
A polynomial thus may be represented using arrays or linked lists. Array representation assumes that
the exponents of the given expression are arranged from 0 to the highest value (degree), which is
represented by the subscript of the array beginning with 0. The coefficients of the respective exponent
are placed at an appropriate index in the array. The array representation for the above polynomial
expression is given below:
A polynomial may also be represented using a linked list. A structure may be defined such that it
contains two parts- one is the coefficient and second is the corresponding exponent. The structure
definition may be given as shown below:
struct polynomial
{
int coefficient;
int exponent;
struct polynomial *next;
};
Thus the above polynomial may be represented using linked list as shown below:
Adding two polynomials using arrays is straightforward, since both arrays may be added
element-wise from index 0 to n-1, giving the sum of the two polynomials. Addition of two
polynomials using linked lists requires comparing the exponents: wherever the exponents are found
to be the same, the coefficients are added up. A term whose exponent appears in only one polynomial
is simply copied into the result as it is. The complete program to add two
polynomials is given in a subsequent section.
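The comparison-based addition just described can be sketched as follows. This is an illustrative sketch, assuming terms are stored in decreasing order of exponent; the helper term() is hypothetical, not part of the text's program.

```c
#include <stdlib.h>

struct polynomial {
    int coefficient;
    int exponent;
    struct polynomial *next;
};

/* Illustrative helper: allocate one term and link it to the rest of a list. */
static struct polynomial *term(int coeff, int exp, struct polynomial *next)
{
    struct polynomial *t = malloc(sizeof *t);
    t->coefficient = coeff;
    t->exponent = exp;
    t->next = next;
    return t;
}

/* Add two polynomials whose terms are in decreasing order of exponent. */
struct polynomial *poly_add(struct polynomial *a, struct polynomial *b)
{
    if (a == NULL && b == NULL)
        return NULL;
    if (a == NULL)                       /* copy the rest of b */
        return term(b->coefficient, b->exponent, poly_add(NULL, b->next));
    if (b == NULL)                       /* copy the rest of a */
        return term(a->coefficient, a->exponent, poly_add(a->next, NULL));
    if (a->exponent == b->exponent)      /* same exponent: add coefficients */
        return term(a->coefficient + b->coefficient, a->exponent,
                    poly_add(a->next, b->next));
    if (a->exponent > b->exponent)       /* unmatched term copied as-is */
        return term(a->coefficient, a->exponent, poly_add(a->next, b));
    return term(b->coefficient, b->exponent, poly_add(a, b->next));
}
```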
Multiplication of two polynomials however requires manipulation of each node such that the exponents
are added up and the coefficients are multiplied. After each term of first polynomial is operated upon
with each term of the second polynomial, then the result has to be added up by comparing the exponents
and adding the coefficients for similar exponents and including terms as such with dissimilar exponents
in the result.
Deletion
Delete Element from nth Position of the LIST
3. If (nth Position < Current Size)
1. Move all the Elements one position backward, i.e. move only the Elements from
position n+1 to the Current Size position: New Position = Current Position - 1.
2. After the previous step, the nth element will be deleted automatically.
3. Decrease the Current Size by 1, i.e. Current Size = Current Size - 1.
Input Format
You have to complete the Node* MergeLists(Node* headA, Node* headB) method which takes two
arguments - the heads of the two sorted linked lists to merge. You should NOT read any input from
stdin/console.
Output Format
Change the next pointer of individual nodes so that nodes from both lists are merged into a single list.
Then return the head of this merged list. Do NOT print anything to stdout/console.
Sample Input
15 -> NULL
12 -> NULL
NULL
1 -> 2 -> NULL
Sample Output
12 -> 15 -> NULL
1 -> 2 -> NULL
Procedure:
Node* MergeLists(Node* headA, Node* headB)
{
if (headA == NULL) return headB;
if (headB == NULL) return headA;
if (headA->data <= headB->data) {
headA->next = MergeLists(headA->next, headB);
}
else {
Node* temp = headB;
headB = headB->next;
temp->next = headA;
headA = temp;
headA->next = MergeLists(headA->next, headB);
}
return headA;
}
Traversal
Assume that we have a list with some nodes. Traversal is a very basic operation that appears as a
part of almost every operation on a singly linked list. For instance, an algorithm may traverse a
singly linked list to find a value, to find a position for insertion, and so on. For a singly linked list,
only forward traversal is possible.
Traversal algorithm
Example
A simple C program for traversal of a linked list
#include <stdio.h>
#include <stdlib.h>
struct Node
{
int data;
struct Node *next;
};
/* Traverse the list from the given node to the end, printing each value. */
void printList(struct Node *n)
{
while (n != NULL)
{
printf("%d ", n->data);
n = n->next;
}
}
int main()
{
struct Node* head = (struct Node*)malloc(sizeof(struct Node));
struct Node* second = (struct Node*)malloc(sizeof(struct Node));
struct Node* third = (struct Node*)malloc(sizeof(struct Node));
/* link three nodes with sample data 1 -> 2 -> 3 */
head->data = 1; head->next = second;
second->data = 2; second->next = third;
third->data = 3; third->next = NULL;
printList(head);
return 0;
}
UNIT II
LINEAR DATA STRUCTURES – STACKS, QUEUES
In a stack, the insertion operation is performed using a function called "push" and deletion operation is
performed using a function called "pop".
In the figure, PUSH and POP operations are performed at top position in the stack. That means, both the
insertion and deletion operations are performed at one end (i.e., at Top)
Example
If we want to create a stack by inserting 10,45,12,16,35 and 50. Then 10 becomes the bottom most
element and 50 is the top most element. Top is at 50 as shown in the image below...
OPERATIONS
Stack data structure can be implemented in two ways. They are as follows...
1. Using Array
2. Using Linked List
When a stack is implemented using an array, it can organize only a limited number of elements. When
a stack is implemented using a linked list, it can organize an unlimited number of elements.
Before implementing actual operations, first follow the below steps to create an empty stack.
Step 1: Include all the header files which are used in the program and define a constant 'SIZE'
with specific value.
Step 2: Declare all the functions used in stack implementation.
Step 3: Create a one dimensional array with fixed size (int stack[SIZE])
Step 4: Define an integer variable 'top' and initialize it with '-1'. (int top = -1)
Step 5: In main method display menu with list of operations and make suitable function calls to
perform operation selected by the user on the stack.
In a stack, push() is a function used to insert an element into the stack. In a stack, the new element is
always inserted at top position. Push function takes one integer value as parameter and inserts that value
into the stack. We can use the following steps to push an element on to the stack...
In a stack, pop() is a function used to delete an element from the stack. In a stack, the element is always
deleted from top position. Pop function does not take any value as parameter. We can use the following
steps to pop an element from the stack...
#include <stdio.h>
int stack[100], top = -1, n = 100, x, i;
void push(); void pop(); void display();
int main()
{
int choice;
do
{
printf("\n 1.Push \n 2.Pop \n 3.Display \n 4.Exit \n Enter your choice: ");
scanf("%d", &choice);
switch(choice)
{
case 1: push(); break;
case 2: pop(); break;
case 3: display(); break;
}
}
while(choice!=4);
return 0;
}
void push()
{
if(top>=n-1)
{
printf("\n\tSTACK is over flow");
}
else
{
printf(" Enter a value to be pushed:");
scanf("%d",&x);
top++;
stack[top]=x;
}
}
void pop()
{
if(top<=-1)
{
printf("\n\t Stack is under flow");
}
else
{
printf("\n\t The popped elements is %d",stack[top]);
top--;
}
}
void display()
{
if(top>=0)
{
printf("\n The elements in STACK \n");
for(i=top; i>=0; i--)
printf("\n%d",stack[i]);
printf("\n Press Next Choice");
}
else
{
printf("\n The STACK is empty");
}
}
The major problem with a stack implemented using an array is that it works only for a fixed number of
data values. That means the amount of data must be specified at the beginning of the implementation
itself. A stack implemented using an array is not suitable when we don't know the size of the data we
are going to use. A stack data structure can instead be implemented using a linked list. The stack
implemented using a linked list can work for an unlimited number of values and for variable-sized
data, so there is no need to fix the size at the beginning of the implementation. The stack implemented
using a linked list can organize as many data values as we want.
In the linked list implementation of a stack, every new element is inserted as the 'top' element. That
means every newly inserted element is pointed to by 'top'. Whenever we want to remove an element
from the stack, we simply remove the node pointed to by 'top' and move 'top' to the next node in the
list. The next field of the last node (the first element inserted) must always be NULL.
Example
In the above example, the last inserted node is 99 and the first inserted node is 25. The order of
elements inserted is 25, 32, 50 and 99.
Operations
To implement stack using linked list, we need to set the following things before implementing actual
operations.
Step 1: Include all the header files which are used in the program. And declare all the user
defined functions.
Step 2: Define a 'Node' structure with two members data and next.
Step 3: Define a Node pointer 'top' and set it to NULL.
Step 4: Implement the main method by displaying Menu with list of operations and make
suitable function calls in the main method.
We can use the following steps to insert a new node into the stack...
We can use the following steps to delete a node from the stack...
We can use the following steps to display the elements (nodes) of a stack...
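The insertion, deletion, and display steps referred to above (the step lists appear as figures in the original) can be sketched as follows; the global 'top' pointer follows the description, and the exact function names are illustrative:

```c
#include <stdio.h>
#include <stdlib.h>

struct Node {
    int data;
    struct Node *next;
};

struct Node *top = NULL;   /* stack is empty when top == NULL */

/* Insert a new node as the new top of the stack. */
void push(int value)
{
    struct Node *newNode = malloc(sizeof *newNode);
    newNode->data = value;
    newNode->next = top;
    top = newNode;
}

int isEmpty(void) { return top == NULL; }

/* Remove the top node and return its value; caller must check isEmpty first. */
int pop(void)
{
    struct Node *temp = top;
    int value = temp->data;
    top = top->next;
    free(temp);
    return value;
}

/* Display the nodes from top to bottom. */
void display(void)
{
    for (struct Node *p = top; p != NULL; p = p->next)
        printf("%d ", p->data);
    printf("\n");
}
```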
The simplest application of a stack is to reverse a word. You push a given word to stack - letter
by letter - and then pop letters from the stack.
Another application is an "undo" mechanism in text editors; this operation is accomplished by
keeping all text changes in a stack.
Backtracking. This is a process when you need to access the most recent data element in a
series of elements. Think of a labyrinth or maze - how do you find a way from an entrance to an
exit?
Once you reach a dead end, you must backtrack. But backtrack to where? To the previous choice
point. Therefore, at each choice point you store on a stack all possible choices. Backtracking then
simply means popping the next choice from the stack.
In depth-first search we go down a path until we get to a dead end; then we backtrack or back up (by
popping a stack) to get an alternative path.
o Create a stack
o Create a new choice point
o Push the choice point onto the stack
o while (not found and stack is not empty)
Pop the stack
Find all possible choices after the last one tried
Push these choices onto the stack
o Return
Expression evaluation
Evaluate an expression represented by a string. The expression can contain parentheses; you can
assume the parentheses are well-matched. For simplicity, assume the only binary operations
allowed are +, -, *, and /. Arithmetic expressions can be written in one of three forms:
Infix Notation: Operators are written between the operands they operate on, e.g. 3 + 4.
This process uses a stack as well. We have to hold information that is expressed inside parentheses
while scanning to find the closing ')'. We also have to hold lower-precedence operations on the
stack. The algorithm is:
This algorithm doesn't handle errors in the input, although careful analysis of parentheses, or the
lack of them, could support such error detection.
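One common way to realize this process is with two stacks, one for operands and one for operators. The sketch below is illustrative, assuming well-formed input with integer operands; the fixed stack sizes and the function name evaluate are assumptions, not part of the text.

```c
#include <ctype.h>

static int prec(char op) { return (op == '+' || op == '-') ? 1 : 2; }

static int apply(int a, int b, char op)
{
    switch (op) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    default:  return a / b;
    }
}

/* Evaluate a well-formed infix expression with + - * / and parentheses. */
int evaluate(const char *s)
{
    int vals[100], vtop = -1;      /* operand stack  */
    char ops[100]; int otop = -1;  /* operator stack */

    for (int i = 0; s[i]; i++) {
        if (isspace((unsigned char)s[i]))
            continue;
        if (isdigit((unsigned char)s[i])) {
            int num = 0;
            while (isdigit((unsigned char)s[i]))
                num = num * 10 + (s[i++] - '0');
            i--;
            vals[++vtop] = num;
        } else if (s[i] == '(') {
            ops[++otop] = '(';
        } else if (s[i] == ')') {
            while (ops[otop] != '(') {   /* unwind until the matching '(' */
                int b = vals[vtop--], a = vals[vtop--];
                vals[++vtop] = apply(a, b, ops[otop--]);
            }
            otop--;                      /* discard the '(' */
        } else {                         /* an operator */
            while (otop >= 0 && ops[otop] != '(' &&
                   prec(ops[otop]) >= prec(s[i])) {
                int b = vals[vtop--], a = vals[vtop--];
                vals[++vtop] = apply(a, b, ops[otop--]);
            }
            ops[++otop] = s[i];
        }
    }
    while (otop >= 0) {                  /* apply any remaining operators */
        int b = vals[vtop--], a = vals[vtop--];
        vals[++vtop] = apply(a, b, ops[otop--]);
    }
    return vals[vtop];
}
```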
QUEUE ADT
Queue is a linear data structure in which the insertion and deletion operations are performed at two
different ends. In a queue data structure, adding and removing elements are performed at two
different positions: the insertion is performed at one end and the deletion at the other end. The
insertion operation is performed at a position known as 'rear' and the deletion operation at a position
known as 'front'. In a queue data structure, the insertion and deletion operations are performed based
on the FIFO (First In First Out) principle.
In a queue data structure, the insertion operation is performed using a function called "enQueue()" and
deletion operation is performed using a function called "deQueue()".
Example
OPERATIONS
Queue data structure can be implemented in two ways. They are as follows...
1. Using Array
2. Using Linked List
When a queue is implemented using an array, it can organize only a limited number of elements.
When a queue is implemented using a linked list, it can organize an unlimited number of elements.
In a queue data structure, deQueue() is a function used to delete an element from the queue. In a queue,
the element is always deleted from front position. The deQueue() function does not take any value as
parameter. We can use the following steps to delete an element from the queue...
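The insertion and deletion operations for the array implementation can be sketched as follows (the step lists appear as figures in the original); SIZE and the reset-to-(-1) convention below are illustrative assumptions:

```c
#define SIZE 10

int queue[SIZE];
int front = -1, rear = -1;   /* both -1 while the queue is empty */

int isEmpty(void) { return front == -1; }

/* enQueue: insert at rear; the queue is full once rear reaches SIZE - 1. */
int enQueue(int value)
{
    if (rear == SIZE - 1)
        return 0;            /* full: insertion is not possible */
    if (front == -1)
        front = 0;           /* first element: front starts at 0 */
    queue[++rear] = value;
    return 1;
}

/* deQueue: delete from front; assumes the queue is not empty. */
int deQueue(void)
{
    int value = queue[front++];
    if (front > rear)        /* queue became empty: reset both ends */
        front = rear = -1;
    return value;
}
```

Note that once rear reaches SIZE - 1 this simple version reports full even if earlier slots have been freed by deQueue; that is exactly the problem the circular queue below solves.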
In linked list implementation of a queue, the last inserted node is always pointed by 'rear' and the first
node is always pointed by 'front'.
Example
In above example, the last inserted node is 50 and it is pointed by 'rear' and the first inserted node is 10
and it is pointed by 'front'. The order of elements inserted is 10, 15, 22 and 50.
Operations
To implement queue using linked list, we need to set the following things before implementing actual
operations.
Step 1: Include all the header files which are used in the program. And declare all the user
defined functions.
Step 2: Define a 'Node' structure with two members data and next.
Step 3: Define two Node pointers 'front' and 'rear' and set both to NULL.
Step 4: Implement the main method by displaying Menu of list of operations and make suitable
function calls in the main method to perform user selected operation.
We can use the following steps to insert a new node into the queue...
Step 1: Create a newNode with given value and set 'newNode → next' to NULL.
Step 2: Check whether queue is Empty (rear == NULL)
Step 3: If it is Empty then, set front = newNode and rear = newNode.
Step 4: If it is Not Empty then, set rear → next = newNode and rear = newNode.
We can use the following steps to delete a node from the queue...
We can use the following steps to display the elements (nodes) of a queue...
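The enQueue steps above, together with the deletion steps (shown as a figure in the original), can be sketched as follows on the stated front/rear convention; the function names are illustrative:

```c
#include <stdlib.h>

struct Node {
    int data;
    struct Node *next;
};

struct Node *front = NULL, *rear = NULL;

/* Steps 1-4 above: create the node, then link it at the rear. */
void enQueue(int value)
{
    struct Node *newNode = malloc(sizeof *newNode);
    newNode->data = value;
    newNode->next = NULL;
    if (rear == NULL)            /* empty queue */
        front = rear = newNode;
    else {
        rear->next = newNode;
        rear = newNode;
    }
}

/* Delete the front node and return its value; assumes the queue is not empty. */
int deQueue(void)
{
    struct Node *temp = front;
    int value = temp->data;
    front = front->next;
    if (front == NULL)           /* queue became empty */
        rear = NULL;
    free(temp);
    return value;
}
```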
CIRCULAR QUEUE
In a normal Queue Data Structure, we can insert elements until queue becomes full. But once if
queue becomes full, we cannot insert the next element until all the elements are deleted from the queue.
For example consider the queue below...
Now consider the following situation after deleting three elements from the queue...
This situation also says that the queue is full and we cannot insert a new element, because 'rear' is
still at the last position. In the above situation, even though we have empty positions in the queue, we
cannot make use of them to insert a new element. This is the major problem in the normal queue data
structure. To overcome this problem we use the circular queue data structure.
enQueue(value) - Inserting value into the Circular Queue
In a circular queue, enQueue() is a function which is used to insert an element into the circular queue. In
a circular queue, the new element is always inserted at rear position. The enQueue() function takes one
integer value as parameter and inserts that value into the circular queue. We can use the following steps
to insert an element into the circular queue...
Step 1: Check whether queue is FULL. ((rear == SIZE-1 && front == 0) || (front ==
rear+1))
Step 2: If it is FULL, then display "Queue is FULL!!! Insertion is not possible!!!" and
terminate the function.
Step 3: If it is NOT FULL, then check rear == SIZE - 1 && front != 0 if it is TRUE, then set
rear = -1.
Step 4: Increment rear value by one (rear++), set queue[rear] = value and check 'front == -1'
if it is TRUE, then set front = 0.
In a circular queue, deQueue() is a function used to delete an element from the circular queue. In a
circular queue, the element is always deleted from front position. The deQueue() function doesn't take
any value as parameter. We can use the following steps to delete an element from the circular queue...
We can use the following steps to display the elements of a circular queue...
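The enQueue steps above and the corresponding deletion steps can be sketched as follows; SIZE is illustrative, and error handling is reduced to return codes:

```c
#define SIZE 5

int cqueue[SIZE];
int front = -1, rear = -1;

/* Steps 1-4 above: wrap rear back to index 0 when it reaches SIZE - 1. */
int cq_enQueue(int value)
{
    if ((rear == SIZE - 1 && front == 0) || (front == rear + 1))
        return 0;                      /* queue is FULL */
    if (rear == SIZE - 1 && front != 0)
        rear = -1;                     /* wrap rear around */
    cqueue[++rear] = value;
    if (front == -1)
        front = 0;
    return 1;
}

/* Delete from front, wrapping it around at the end; assumes not empty. */
int cq_deQueue(void)
{
    int value = cqueue[front];
    if (front == rear)                 /* last element: queue becomes empty */
        front = rear = -1;
    else if (front == SIZE - 1)        /* wrap front around */
        front = 0;
    else
        front++;
    return value;
}
```

The test of fullness compares front and rear rather than rear alone, which is what lets the freed positions at the start of the array be reused.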
PRIORITY QUEUE
In normal queue data structure, insertion is performed at the end of the queue and deletion is
performed based on the FIFO principle. This queue implementation may not be suitable for all
situations.
Consider a networking application where server has to respond for requests from multiple clients using
queue data structure. Assume four requests arrived to the queue in the order of R1 requires 20 units of
time, R2 requires 2 units of time, R3 requires 10 units of time and R4 requires 5 units of time. Queue is
as follows...
1. R1 : 20 units of time
2. R2 : 22 units of time (R2 must wait till R1 completes - 20 units - and R2 itself requires 2
units. Total 22 units)
3. R3 : 32 units of time (R3 must wait till R2 completes - 22 units - and R3 itself requires 10
units. Total 32 units)
4. R4 : 37 units of time (R4 must wait till R3 completes - 32 units - and R4 itself requires 5
units. Total 37 units)
Here, the average waiting time for all requests (R1, R2, R3 and R4) is (20+22+32+37)/4 ≈ 27 units of
time.
That means, if we use a normal queue data structure to serve these requests the average waiting time for
each request is 27 units of time.
Now, consider another way of serving these requests: serve them according to their required amount
of time. That means we first serve R2, which has the minimum time required (2), then R4, which has
the second minimum (5), then R3, which has the third minimum (10), and finally R1, which has the
maximum time required (20).
1. R2 : 2 units of time
2. R4 : 7 units of time (R4 must wait till R2 completes 2 units and R4 itself requires 5 units.
Total 7 units)
3. R3 : 17 units of time (R3 must wait till R4 completes 7 units and R3 itself requires 10 units.
Total 17 units)
4. R1 : 37 units of time (R1 must wait till R3 completes 17 units and R1 itself requires 20
units. Total 37 units)
Here, the average waiting time for all requests (R1, R2, R3 and R4) is (2+7+17+37)/4 ≈ 15 units of
time.
From the above two situations, it is very clear that by using the second method the server can
complete all four requests in much less time compared to the first method. This is exactly what a
priority queue does.
There are two types of priority queues they are as follows...
1. Max Priority Queue
2. Min Priority Queue
In a max priority queue, elements are inserted in the order in which they arrive at the queue, and the
maximum value is always removed first from the queue. For example, assume that we insert in the
order 8, 3, 2, 5 and they are removed in the order 8, 5, 3, 2.
#1. Using an Unordered Array (Dynamic Array)
In this representation elements are inserted according to their arrival order and maximum element is
deleted first from max priority queue.
For example, assume that elements are inserted in the order of 8, 2, 3 and 5. And they are removed in
the order 8, 5, 3 and 2.
isEmpty() - If 'front == -1' queue is Empty. This operation requires O(1) time complexity that means
constant time.
insert() - New element is added at the end of the queue. This operation requires O(1) time complexity
that means constant time.
findMax() - To find maximum element in the queue, we need to compare with all the elements in the
queue. This operation requires O(n) time complexity.
remove() - To remove an element from the queue first we need to perform findMax() which requires
O(n) and removal of particular element requires constant time O(1). This operation requires O(n) time
complexity.
#2. Using an Unordered Array (Dynamic Array) with the index of the maximum value
In this representation elements are inserted according to their arrival order and maximum element is
deleted first from max priority queue.
For example, assume that elements are inserted in the order of 8, 2, 3 and 5. And they are removed in
the order 8, 5, 3 and 2.
isEmpty() - If 'front == -1' queue is Empty. This operation requires O(1) time complexity that means
constant time.
insert() - New element is added at the end of the queue with O(1) and for each insertion we need to
update maxIndex with O(1). This operation requires O(1) time complexity that means constant time.
findMax() - To find maximum element in the queue is very simple as maxIndex has maximum element
index. This operation requires O(1) time complexity.
remove() - To remove an element from the queue first we need to perform findMax() which requires
O(1) , removal of particular element requires constant time O(1) and update maxIndex value which
requires O(n). This operation requires O(n) time complexity.
#3. Using an Array (Dynamic Array) in Decreasing Order
In this representation elements are inserted according to their value in decreasing order and maximum
element is deleted first from max priority queue.
For example, assume that elements are inserted in the order of 8, 5, 3 and 2. And they are removed in
the order 8, 5, 3 and 2.
isEmpty() - If 'front == -1' queue is Empty. This operation requires O(1) time complexity that means
constant time.
insert() - A new element is added at a particular position in the decreasing order into the queue with
O(n), because we need to shift existing elements in order to insert the new element in decreasing
order. This operation requires O(n) time complexity.
findMax() - To find maximum element in the queue is very simple as maximum element is at the
beginning of the queue. This operation requires O(1) time complexity.
remove() - To remove an element from the queue first we need to perform findMax() which requires
O(1), removal of particular element requires constant time O(1) and rearrange remaining elements
which requires O(n). This operation requires O(n) time complexity.
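The decreasing-order array representation just described can be sketched as follows; the array capacity and function names are illustrative, and error checks are omitted for brevity:

```c
#define CAPACITY 100

int pq[CAPACITY];
int count = 0;

int pq_isEmpty(void) { return count == 0; }

/* O(n) insert: shift smaller elements right so the order stays decreasing. */
void pq_insert(int value)
{
    int i = count - 1;
    while (i >= 0 && pq[i] < value) {
        pq[i + 1] = pq[i];
        i--;
    }
    pq[i + 1] = value;
    count++;
}

/* O(1) findMax: the maximum sits at the beginning of the array. */
int pq_findMax(void) { return pq[0]; }

/* O(n) remove: delete pq[0] and shift the remaining elements left. */
int pq_remove(void)
{
    int max = pq[0];
    for (int i = 1; i < count; i++)
        pq[i - 1] = pq[i];
    count--;
    return max;
}
```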
#4. Using an Array (Dynamic Array) in Increasing Order
In this representation elements are inserted according to their value in increasing order and maximum
element is deleted first from max priority queue.
For example, assume that elements are inserted in the order of 2, 3, 5 and 8. And they are removed in
the order 8, 5, 3 and 2.
isEmpty() - If 'front == -1' queue is Empty. This operation requires O(1) time complexity that means
constant time.
insert() - A new element is added at a particular position in the increasing order into the queue with
O(n), because we need to shift existing elements in order to insert the new element in increasing
order. This operation requires O(n) time complexity.
findMax() - To find maximum element in the queue is very simple as maximum element is at the end of
the queue. This operation requires O(1) time complexity.
remove() - To remove an element from the queue first we need to perform findMax() which requires
O(1), removal of particular element requires constant time O(1) and rearrange remaining elements
which requires O(n). This operation requires O(n) time complexity.
#5. Using an Ordered Linked List
In this representation, we use a single linked list to represent max priority queue. In this representation
elements are inserted according to their value in increasing order and node with maximum value is
deleted first from max priority queue.
For example, assume that elements are inserted in the order of 2, 3, 5 and 8. And they are removed in
the order 8, 5, 3 and 2.
isEmpty() - If 'head == NULL' queue is Empty. This operation requires O(1) time complexity that
means constant time.
insert() - A new element is added at a particular position in the increasing order into the queue with
O(n), because we need to find the position where the new element has to be inserted. This operation
requires O(n) time complexity.
findMax() - To find maximum element in the queue is very simple as maximum element is at the end of
the queue. This operation requires O(1) time complexity.
remove() - To remove an element from the queue is simply removing the last node in the queue which
requires O(1). This operation requires O(1) time complexity.
#6. Using Unordered Linked List with reference to node with the maximum value
In this representation, we use a single linked list to represent max priority queue. We always maintain
a reference (maxValue) to the node with the maximum value. In this representation elements are
inserted according to their arrival order, and the node with the maximum value is deleted first from
the max priority queue.
For example, assume that elements are inserted in the order of 2, 8, 3 and 5. And they are removed in
the order 8, 5, 3 and 2.
isEmpty() - If 'head == NULL' queue is Empty. This operation requires O(1) time complexity that
means constant time.
insert() - A new element is added at the end of the queue with O(1), and the maxValue reference is
updated with O(1). This operation requires O(1) time complexity.
findMax() - To find maximum element in the queue is very simple as maxValue is referenced to the
node with maximum value in the queue. This operation requires O(1) time complexity.
remove() - To remove an element from the queue is deleting the node which referenced by maxValue
which requires O(1) and update maxValue reference to new node with maximum value in the queue
which requires O(n) time complexity. This operation requires O(n) time complexity.
A min priority queue is similar to a max priority queue, except that instead of removing the maximum
element first, we remove the minimum element first.
The following operations are performed in Min Priority Queue...
1. isEmpty() - Check whether queue is Empty.
2. insert() - Inserts a new value into the queue.
3. findMin() - Find minimum value in the queue.
4. remove() - Delete minimum value from the queue.
A min priority queue has the same representations as a max priority queue, with minimum-value removal instead.
Double Ended Queue can be represented in TWO ways, those are as follows...
1. Input Restricted Double Ended Queue
2. Output Restricted Double Ended Queue
Input Restricted Double Ended Queue
In input restricted double ended queue, the insertion operation is performed at only one end and deletion
operation is performed at both the ends.
Output Restricted Double Ended Queue
In output restricted double ended queue, the deletion operation is performed at only one end and
insertion operation is performed at both the ends.
APPLICATIONS OF QUEUES
A real-world example of a queue is a single-lane one-way road, where the vehicle that enters first exits
first. More real-world examples are the queues at ticket windows and bus stops.
Vehicle on Road
Ticket Counter : The first person to get a ticket is the first to leave.
In breadth-first search we explore all the nearest possibilities by finding all possible successors and
enqueue them to a queue.
1. Create a queue
2. Create a new choice point
3. Enqueue the choice point onto the queue
4. While (not found and queue is not empty):
o Dequeue the queue
o Find all possible choices after the last one tried
o Enqueue these choices onto the queue
5. Return
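The steps above can be sketched in Python using collections.deque as the queue; the small adjacency-list graph below is a made-up example, not from the text:

```python
from collections import deque

def bfs(graph, start):
    """Breadth-first search: explore all nearest possibilities first via a queue."""
    visited = [start]
    queue = deque([start])                # enqueue the starting choice point
    while queue:                          # while queue is not empty
        vertex = queue.popleft()          # dequeue the queue
        for neighbor in graph[vertex]:    # find all possible choices
            if neighbor not in visited:
                visited.append(neighbor)
                queue.append(neighbor)    # enqueue these choices
    return visited

# hypothetical graph: each vertex maps to its list of successors
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
```

Starting from "A", the vertices are visited level by level: A, then B and C, then D.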
UNIT III
NON LINEAR DATA STRUCTURES – TREES
Tree ADT – tree traversals - Binary Tree ADT – expression trees – applications of trees – binary search
tree ADT –Threaded Binary Trees- AVL Trees – B-Tree - B+ Tree - Heap – Applications of heap.
___________________________________________________________________________________
TREE ADT
A tree is a widely used abstract data type (ADT)—or data structure implementing this ADT—that
simulates a hierarchical tree structure, with a root value and subtrees of children with a parent node,
represented as a set of linked nodes.
A tree data structure can be defined recursively (locally) as a collection of nodes (starting at a root
node), where each node is a data structure consisting of a value, together with a list of references to
nodes (the "children"), with the constraints that no reference is duplicated, and none points to the root.
In a linear data structure, data is organized in sequential order, whereas in a non-linear data structure, data is
organized in random order. Tree is a very popular data structure used in a wide range of applications. A
tree data structure can be defined as follows...
Tree is a non-linear data structure which organizes data (Nodes) in a hierarchical structure, and this is a
recursive definition.
In a tree data structure, every individual element is called a Node. A Node stores the
actual data of that particular element and links to the next elements in the hierarchical structure.
In a tree data structure, if we have N nodes then we can have a maximum of N-1
links.
Example
Terminology
1. Root
In a tree data structure, the first node is called as Root Node. Every tree must have root node. We can
say that root node is the origin of tree data structure. In any tree, there must be only one root node. We
never have multiple root nodes in a tree.
2. Edge
In a tree data structure, the connecting link between any two nodes is called as EDGE. In a tree with 'N'
number of nodes there will be a maximum of 'N-1' number of edges.
3. Parent
In a tree data structure, the node which is predecessor of any node is called as PARENT NODE. In
simple words, the node which has branch from it to any other node is called as parent node. Parent node
can also be defined as "The node which has child / children".
4. Child
In a tree data structure, the node which is descendant of any node is called as CHILD Node. In simple
words, the node which has a link from its parent node is called as child node. In a tree, any parent node
can have any number of child nodes. In a tree, all the nodes except root are child nodes.
5. Siblings
In a tree data structure, nodes which belong to same Parent are called as SIBLINGS. In simple words,
the nodes with same parent are called as Sibling nodes.
6. Leaf
In a tree data structure, the node which does not have a child is called as LEAF Node. In simple words,
a leaf is a node with no child.
In a tree data structure, the leaf nodes are also called as External Nodes. External node is also a node
with no child. In a tree, leaf node is also called as 'Terminal' node.
7. Internal Nodes
In a tree data structure, the node which has at least one child is called as INTERNAL Node. In simple
words, an internal node is a node with at least one child.
In a tree data structure, nodes other than leaf nodes are called as Internal Nodes. The root node is also
said to be an Internal Node if the tree has more than one node. Internal nodes are also called as 'Non-
Terminal' nodes.
8. Degree
In a tree data structure, the total number of children of a node is called as DEGREE of that Node. In
simple words, the Degree of a node is total number of children it has. The highest degree of a node
among all the nodes in a tree is called as 'Degree of Tree'
9. Level
In a tree data structure, the root node is said to be at Level 0 and the children of root node are at Level 1
and the children of the nodes which are at Level 1 will be at Level 2 and so on... In simple words, in a
tree each step from top to bottom is called as a Level and the Level count starts with '0' and incremented
by one at each level (Step).
10. Height
In a tree data structure, the total number of edges from a leaf node to a particular node in the longest path
is called as HEIGHT of that Node. In a tree, the height of the root node is said to be the height of the tree. In a
tree, the height of all leaf nodes is '0'.
11. Depth
In a tree data structure, the total number of edges from the root node to a particular node is called as
DEPTH of that Node. In a tree, the total number of edges from the root node to a leaf node in the longest
path is said to be the Depth of the tree. In simple words, the highest depth of any leaf node in a tree is said
to be the depth of that tree. In a tree, the depth of the root node is '0'.
12. Path
In a tree data structure, the sequence of Nodes and Edges from one node to another node is called as
the PATH between those two Nodes. Length of a Path is the total number of nodes in that path. In the below
example the path A - B - E - J has length 4.
13. Sub Tree
In a tree data structure, each child from a node forms a subtree recursively. Every child node will form a
subtree on its parent node.
Tree Representations
A tree data structure can be represented in two methods. Those methods are as follows...
1. List Representation
2. Left Child - Right Sibling Representation
Consider the following tree...
1. List Representation
In this representation, we use two types of nodes: one for representing a node with data and
another for representing only references. We start with a data node for the root node in the tree. Then
it is linked to an internal node through a reference node, and any other node is linked directly. This
process repeats for all the nodes in the tree.
The above tree example can be represented using List representation as follows...
2. Left Child - Right Sibling Representation
In this representation, we use list with one type of node which consists of three fields namely Data field,
Left child reference field and Right sibling reference field. Data field stores the actual value of a node,
left reference field stores the address of the left child and right reference field stores the address of the
right sibling node. Graphical representation of that node is as follows...
In this representation, every node's data field stores the actual value of that node. If that node has left
child, then left reference field stores the address of that left child node otherwise that field stores NULL.
If that node has right sibling then right reference field stores the address of right sibling node otherwise
that field stores NULL.
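The three-field node described above can be sketched in Python; the names LCRSNode and add_child are illustrative:

```python
class LCRSNode:
    """Left Child - Right Sibling node: data, left child, right sibling."""
    def __init__(self, data):
        self.data = data
        self.left_child = None        # first (left-most) child, or None (NULL)
        self.right_sibling = None     # next sibling under the same parent, or None

def add_child(parent, data):
    """Attach a new child; if children already exist, walk the sibling chain."""
    node = LCRSNode(data)
    if parent.left_child is None:
        parent.left_child = node
    else:
        sibling = parent.left_child
        while sibling.right_sibling is not None:
            sibling = sibling.right_sibling
        sibling.right_sibling = node
    return node

root = LCRSNode("A")
add_child(root, "B")
add_child(root, "C")
add_child(root, "D")
```

Here A's three children B, C and D are reached as left_child, then right_sibling, then right_sibling again, so every node needs only two references regardless of how many children it has.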
The above tree example can be represented using Left Child - Right Sibling representation as follows...
TREE TRAVERSALS
1. In - Order Traversal ( leftChild - root - rightChild )
In In-Order traversal, the root node is visited between left child and right child. In this traversal, the left
child node is visited first, then the root node is visited and later we go for visiting right child node. This
in-order traversal is applicable for every root node of all subtrees in the tree. This is performed
recursively for all nodes in the tree.
In the above example of binary tree, first we try to visit left child of root node 'A', but A's left child is a
root node for left subtree. so we try to visit its (B's) left child 'D' and again D is a root for subtree with
nodes D, I and J. So we try to visit its left child 'I' and it is the left most child. So first we visit 'I' then go
for its root node 'D' and later we visit D's right child 'J'. With this we have completed the left part of
node B. Then visit 'B' and next B's right child 'F' is visited. With this we have completed left part of
node A. Then visit root node 'A'. With this we have completed left and root parts of node A. Then we
go for right part of the node A. In right of A again there is a subtree with root C. So go for left child of C
and again it is a subtree with root G. But G does not have left part so we visit 'G' and then visit G's right
child K. With this we have completed the left part of node C. Then visit root node 'C' and next visit C's
right child 'H' which is the right most child in the tree so we stop the process.
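The in-order traversal described above can be sketched in Python; the tree built below is the example tree from the text (visited as I-D-J-B-F-A-G-K-C-H):

```python
class BTNode:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def in_order(root, out=None):
    """leftChild - root - rightChild, applied recursively to every subtree."""
    if out is None:
        out = []
    if root is not None:
        in_order(root.left, out)      # first visit the whole left subtree
        out.append(root.data)         # then the root
        in_order(root.right, out)     # then the whole right subtree
    return out

# the example binary tree from the text
root = BTNode("A",
              BTNode("B", BTNode("D", BTNode("I"), BTNode("J")), BTNode("F")),
              BTNode("C", BTNode("G", None, BTNode("K")), BTNode("H")))
```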
2. Pre - Order Traversal ( root - leftChild - rightChild )
In Pre-Order traversal, the root node is visited before the left child and right child nodes. In this traversal,
the root node is visited first, then its left child and later its right child. This pre-order traversal is
applicable for every root node of all subtrees in the tree.
In the above example of binary tree, first we visit root node 'A' then visit its left child 'B' which is a root
for D and F. So we visit B's left child 'D' and again D is a root for I and J. So we visit D's left child 'I'
which is the left most child. So next we go for visiting D's right child 'J'. With this we have completed
root, left and right parts of node D and root, left parts of node B. Next visit B's right child 'F'. With this
we have completed root and left parts of node A. So we go for A's right child 'C' which is a root node
for G and H. After visiting C, we go for its left child 'G' which is a root for node K. So next we visit left
of G, but it does not have left child so we go for G's right child 'K'. With this we have completed node
C's root and left parts. Next visit C's right child 'H' which is the right most child in the tree. So we stop
the process.
That means here we have visited in the order of A-B-D-I-J-F-C-G-K-H using Pre-Order Traversal.
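A matching Python sketch for pre-order traversal on the same example tree:

```python
class BTNode:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def pre_order(root, out=None):
    """root - leftChild - rightChild, applied recursively to every subtree."""
    if out is None:
        out = []
    if root is not None:
        out.append(root.data)         # visit the root first
        pre_order(root.left, out)     # then the whole left subtree
        pre_order(root.right, out)    # then the whole right subtree
    return out

# the example binary tree from the text
root = BTNode("A",
              BTNode("B", BTNode("D", BTNode("I"), BTNode("J")), BTNode("F")),
              BTNode("C", BTNode("G", None, BTNode("K")), BTNode("H")))
```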
Example
In a binary tree, every node can have a maximum of two children. But in a strictly binary tree, every node
should have exactly two children or none, and in a complete binary tree all the nodes must have exactly
two children and at every level of a complete binary tree there must be 2^level number of nodes. For example,
at level 2 there must be 2^2 = 4 nodes and at level 3 there must be 2^3 = 8 nodes.
A binary tree in which every internal node has exactly two children and all leaf nodes are at the same level
is called a Complete Binary Tree. A complete binary tree is also called a Perfect Binary Tree.
In the above figure, a normal binary tree is converted into a full binary tree by adding dummy nodes (in pink
colour).
EXPRESSION TREES
Expression tree is a binary tree in which each internal node corresponds to an operator and each leaf node
corresponds to an operand. For example, the expression tree for 3 + ((5+9)*2) would be:
In-order traversal of an expression tree produces the infix version of the given postfix expression (similarly,
pre-order traversal gives the prefix expression).
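A small Python sketch that builds the expression tree for 3 + ((5+9)*2) from its postfix form and recovers the infix version by in-order traversal (the function names are illustrative):

```python
class ExprNode:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def build_expression_tree(postfix):
    """Operands are pushed on a stack; an operator pops two subtrees
    as its right and left children and pushes the combined tree."""
    stack = []
    for token in postfix:
        if token in "+-*/":
            right, left = stack.pop(), stack.pop()
            stack.append(ExprNode(token, left, right))
        else:
            stack.append(ExprNode(token))
    return stack.pop()

def to_infix(node):
    """In-order traversal yields the (fully parenthesized) infix form."""
    if node.left is None:             # a leaf is an operand
        return node.value
    return "(" + to_infix(node.left) + node.value + to_infix(node.right) + ")"

# postfix form of 3 + ((5+9)*2)
tree = build_expression_tree(["3", "5", "9", "+", "2", "*", "+"])
```

In-order traversal of this tree yields (3+((5+9)*2)), the infix version of the postfix input.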
APPLICATIONS OF TREES
Unlike Array and Linked List, which are linear data structures, tree is hierarchical (or non-linear) data
structure.
1) One reason to use trees might be because you want to store information that naturally forms a
hierarchy. For example, the file system on a computer:
file system
———–
/ <-- root
/ \
... home
/ \
ugrad course
/ / | \
... cs101 cs112 cs113
2) If we organize keys in the form of a tree (with some ordering, e.g., BST), we can search for a given key in
moderate time (quicker than Linked List and slower than arrays). Self-balancing search trees like AVL
and Red-Black trees guarantee an upper bound of O(log n) for search.
3) We can insert/delete keys in moderate time (quicker than Arrays and slower than Unordered Linked
Lists). Self-balancing search trees like AVL and Red-Black trees guarantee an upper bound of O(log n)
for insertion/deletion.
4) Like Linked Lists and unlike Arrays, the pointer implementation of trees doesn't have an upper limit on the
number of nodes, as nodes are linked using pointers.
BINARY SEARCH TREE ADT
In a binary tree, every node can have a maximum of two children, but there is no ordering of nodes based on
their values. In a binary tree, the elements are arranged as they arrive to the tree, from top to bottom and
left to right.
To enhance the performance of a binary tree, we use a special type of binary tree known as a Binary Search
Tree. A binary search tree mainly focuses on the search operation in a binary tree. Binary search tree can be
defined as follows...
Binary Search Tree is a binary tree in which every node contains only smaller values in its left subtree
and only larger values in its right subtree.
In a binary search tree, all the nodes in the left subtree of any node contain smaller values and all the nodes
in its right subtree contain larger values, as shown in the following figure...
Example
The following tree is a Binary Search Tree. In this tree, left subtree of every node contains nodes with
smaller values and right subtree of every node contains larger values.
Every Binary Search Tree is a binary tree, but every binary tree need not be a binary search
tree.
The following operations are performed on a binary search tree...
1. Search
2. Insertion
3. Deletion
In a binary search tree, the search operation is performed with O(log n) time complexity. The search
operation is performed as follows...
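In outline, the search compares the given key with the current node and moves to the left child if the key is smaller or to the right child if it is larger, until the key is found or NULL is reached. A minimal Python sketch (the example tree below is a made-up BST):

```python
class BSTNode:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def bst_search(root, key):
    """Compare with the current node; go left if smaller, right if larger."""
    node = root
    while node is not None:
        if key == node.data:
            return node               # element found
        node = node.left if key < node.data else node.right
    return None                       # reached NULL: element is not found

# a small example BST (every left value smaller, every right value larger)
root = BSTNode(10, BSTNode(5, BSTNode(4), BSTNode(8)), BSTNode(12))
```

Each comparison discards one subtree, which is where the O(log n) bound comes from on a balanced tree.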
In a binary search tree, the insertion operation is performed with O(log n) time complexity. In binary
search tree, new node is always inserted as a leaf node. The insertion operation is performed as
follows...
Step 1: Create a newNode with the given value and set its left and right to NULL.
Step 2: Check whether the tree is Empty.
Step 3: If the tree is Empty, then set root to newNode.
Step 4: If the tree is Not Empty, then check whether the value of newNode is smaller or larger
than the node (here it is the root node).
Step 5: If newNode is smaller than or equal to the node, then move to its left child. If newNode
is larger than the node, then move to its right child.
Step 6: Repeat the above step until we reach a leaf node (i.e., reach NULL).
Step 7: After reaching a leaf node, insert the newNode as its left child if newNode is smaller than
or equal to that leaf, else insert it as its right child.
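The steps above can be sketched in Python (the name bst_insert is illustrative; the insertion order below is the example sequence 10, 12, 5, 4, 20, 8, 7, 15, 13):

```python
class BSTNode:
    def __init__(self, data):
        self.data, self.left, self.right = data, None, None

def bst_insert(root, value):
    """Steps 1-7 above: walk down to a leaf, then attach the new node there."""
    new_node = BSTNode(value)                     # Step 1
    if root is None:                              # Steps 2-3: empty tree
        return new_node
    node = root
    while True:                                   # Steps 4-6
        if value <= node.data:
            if node.left is None:
                node.left = new_node              # Step 7: attach as left child
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = new_node             # Step 7: attach as right child
                return root
            node = node.right

root = None
for v in [10, 12, 5, 4, 20, 8, 7, 15, 13]:
    root = bst_insert(root, v)
```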
In a binary search tree, the deletion operation is performed with O(log n) time complexity. Deleting a
node from a binary search tree has the following three cases...
Case 1: Deleting a leaf node
Case 2: Deleting a node with one child
We use the following steps to delete a node with one child from BST...
Case 3: Deleting a node with two children
We use the following steps to delete a node with two children from BST...
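A common way to handle this case is to replace the node's value with its in-order successor (the minimum of its right subtree) and then delete that successor, which has at most one child. A Python sketch under that assumption; the tree below is the BST built from the example sequence 10, 12, 5, 4, 20, 8, 7, 15 and 13:

```python
class BSTNode:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def bst_delete(node, key):
    """Delete key from the subtree rooted at node; returns the new subtree root."""
    if node is None:
        return None
    if key < node.data:
        node.left = bst_delete(node.left, key)
    elif key > node.data:
        node.right = bst_delete(node.right, key)
    else:
        if node.left is None:                 # Cases 1 and 2: at most one child
            return node.right
        if node.right is None:
            return node.left
        succ = node.right                     # Case 3: find in-order successor
        while succ.left is not None:
            succ = succ.left
        node.data = succ.data                 # copy successor's value up
        node.right = bst_delete(node.right, succ.data)   # delete the successor
    return node

root = BSTNode(10,
               BSTNode(5, BSTNode(4), BSTNode(8, BSTNode(7))),
               BSTNode(12, None, BSTNode(20, BSTNode(15, BSTNode(13)))))
root = bst_delete(root, 10)                   # delete the root (two children)
```

Deleting 10 copies its in-order successor 12 into the root and removes the old 12 node, leaving a valid BST.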
Example
10,12,5,4,20,8,7,15 and 13
THREADED BINARY TREES
A. J. Perlis and C. Thornton proposed a new binary tree called the "Threaded Binary Tree",
which makes use of NULL pointers to improve its traversal process. In a threaded binary tree, NULL
pointers are replaced by references to other nodes in the tree, called threads.
A Threaded Binary Tree is a binary tree in which all left child pointers that are NULL (in the
linked list representation) point to the node's in-order predecessor, and all right child pointers that are NULL
(in the linked list representation) point to the node's in-order successor. If there is no in-order predecessor or in-
order successor, then the pointer points to the root node.
To convert above binary tree into threaded binary tree, first find the in-order traversal of that tree...
In-order traversal of above binary tree...
H-D-I-B-E-A-F-J-C-G
When we represent the above binary tree using the linked list representation, the left child pointers of nodes
H, I, E, F, J and G are NULL. Each NULL is replaced by the address of the node's in-order predecessor (I
to D, E to B, F to A, J to F and G to C), but the node H does not have an in-order predecessor, so it
points to the root node A. And the right child pointers of nodes H, I, E, J and G are NULL. These NULL
pointers are replaced by the address of the node's in-order successor (H to D, I to B, E to A, and J to
C), but the node G does not have an in-order successor, so it points to the root node A.
Above example binary tree become as follows after converting into threaded binary tree.
AVL TREES
What if the input to binary search tree comes in a sorted (ascending or descending) manner? It will then
look like this −
It is observed that BST's worst-case performance is closest to linear search algorithms, that is O(n). In
real-time data, we cannot predict the data pattern and frequencies. So, a need arises to balance out the
existing BST.
Named after their inventors Adelson-Velsky and Landis, AVL trees are height-balancing binary search
trees. An AVL tree checks the height of the left and the right sub-trees and assures that the difference is not
more than 1. This difference is called the Balance Factor.
Here we see that the first tree is balanced and the next two trees are not balanced −
In the second tree, the left subtree of C has height 2 and the right subtree has height 0, so the difference
is 2. In the third tree, the right subtree of A has height 2 and the left is missing, so it is 0, and the
difference is 2 again. AVL tree permits difference (balance factor) to be only 1.
If the difference in the height of left and right sub-trees is more than 1, the tree is balanced using some
rotation techniques.
AVL Rotations
To balance itself, an AVL tree may perform the following four kinds of rotations −
Left rotation
Right rotation
Left-Right rotation
Right-Left rotation
The first two rotations are single rotations and the next two rotations are double rotations. To have an
unbalanced tree, we at least need a tree of height 2. With this simple tree, let's understand them one by
one.
Left Rotation
If a tree becomes unbalanced when a node is inserted into the right subtree of the right subtree, then we
perform a single left rotation −
In our example, node A has become unbalanced as a node is inserted in the right subtree of A's right
subtree. We perform the left rotation by making A the left-subtree of B.
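The single left rotation can be sketched in Python (rotate_left is an illustrative name); inserting A, B, C in ascending order produces the unbalanced shape, and the rotation makes B the new subtree root:

```python
class AVLNode:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def rotate_left(a):
    """Single left rotation: A's right child B becomes the subtree root,
    and A becomes B's left subtree; B's old left subtree moves under A."""
    b = a.right
    a.right = b.left          # B's old left subtree is re-attached under A
    b.left = a                # A becomes the left subtree of B
    return b                  # B is the new root of this subtree

# A -> B -> C inserted in ascending order makes A unbalanced
root = rotate_left(AVLNode("A", None, AVLNode("B", None, AVLNode("C"))))
```

A right rotation is the mirror image: swap every left for right in the function above.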
Right Rotation
AVL tree may become unbalanced, if a node is inserted in the left subtree of the left subtree. The tree
then needs a right rotation.
As depicted, the unbalanced node becomes the right child of its left child by performing a right rotation.
Left-Right Rotation
Double rotations are slightly more complex versions of the rotations already explained. To understand
them better, we should take note of each action performed during rotation.
Let's first check how to perform Left-Right rotation. A left-right rotation is a combination of left rotation
followed by right rotation.
State Action
A node has been inserted into the right subtree of the left subtree. This
makes C an unbalanced node. These scenarios cause AVL tree to
perform left-right rotation.
We first perform the left rotation on the left subtree of C. This makes A,
the left subtree of B.
We shall now right-rotate the tree, making B the new root node of this
subtree. C now becomes the right subtree of its own left subtree.
The tree is now balanced.
Right-Left Rotation
The second type of double rotation is Right-Left Rotation. It is a combination of right rotation followed
by left rotation.
State Action
A node has been inserted into the left subtree of the right subtree. This
makes A, an unbalanced node with balance factor 2.
First, we perform the right rotation along C node, making C the right
subtree of its own left subtree B. Now, B becomes the right subtree of A.
A left rotation is performed by making B the new root node of the
subtree. A becomes the left subtree of its right subtree B.
In an AVL tree, the search operation is performed with O(log n) time complexity. The search operation
is performed similar to Binary search tree search operation. We use the following steps to search an
element in AVL tree...
In an AVL tree, the insertion operation is performed with O(log n) time complexity. In AVL Tree, new
node is always inserted as a leaf node. The insertion operation is performed as follows...
Step 1: Insert the new element into the tree using Binary Search Tree insertion logic.
Step 2: After insertion, check the Balance Factor of every node.
Step 3: If the Balance Factor of every node is 0 or 1 or -1 then go for next operation.
Step 4: If the Balance Factor of any node is other than 0 or 1 or -1 then tree is said to be
imbalanced. Then perform the suitable Rotation to make it balanced. And go for next operation.
Example: Construct an AVL Tree by inserting numbers from 1 to 8.
B-TREE
In a binary search tree, AVL Tree, Red-Black tree etc., every node can have only one value (key)
and maximum of two children but there is another type of search tree called B-Tree in which a node can
store more than one value (key) and it can have more than two children. B-Tree was developed in the
year of 1972 by Bayer and McCreight with the name Height Balanced m-way Search Tree. Later it
was named as B-Tree.
B-Tree is a self-balanced search tree with multiple keys in every node and more than two children for
every node.
Here, the number of keys in a node and the number of children of a node depend on the order of the B-Tree.
Every B-Tree has an order.
For example, a B-Tree of Order 4 contains a maximum of 3 key values in a node and a maximum of 4 children for
a node.
Example
Operations on a B-Tree
1. Search
2. Insertion
3. Deletion
In a B-Tree, the search operation is similar to that of a Binary Search Tree. In a binary search tree, the
search process starts from the root node and every time we make a 2-way decision (we go to either the left
subtree or the right subtree). In a B-Tree also the search process starts from the root node, but every time we make
an n-way decision, where n is the total number of children that node has. In a B-Tree, the search operation
is performed with O(log n) time complexity. The search operation is performed as follows...
Step 7: If we have compared the last key value in a leaf node, then display "Element is not found"
and terminate the function.
In a B-Tree, the new element must be added only at leaf node. That means, always the new keyValue is
attached to leaf node only. The insertion operation is performed as follows...
Example
B+ TREE
A B+ tree is an N-ary tree with a variable but often large number of children per node. A B+ tree
consists of a root, internal nodes and leaves. The root may be either a leaf or a node with two or more
children.
A B+ tree can be viewed as a B-tree in which each node contains only keys (not key–value pairs), and to
which an additional level is added at the bottom with linked leaves.
The primary value of a B+ tree is in storing data for efficient retrieval in a block-oriented storage
context — in particular, filesystems. This is primarily because unlike binary search trees, B+ trees have
very high fanout (number of pointers to child nodes in a node, typically on the order of 100 or more),
which reduces the number of I/O operations required to find an element in the tree.
A simple B+ tree example linking the keys 1-7 to data values d1-d7. The linked list (red) allows rapid in-
order traversal. This particular tree's branching factor is b = 4.
Insertion algorithm
1. If the node has an empty space, insert the key/reference pair into the node.
2. If the node is already full, split it into two nodes, distributing the keys evenly between the two
nodes. If the node is a leaf, take a copy of the minimum value in the second of these two nodes
and repeat this insertion algorithm to insert it into the parent node. If the node is a non-leaf,
exclude the middle value during the split and repeat this insertion algorithm to insert this
excluded value into the parent node.
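Step 2's leaf split can be sketched in Python for a leaf that overflows (split_leaf is an illustrative name): the keys are distributed evenly, and the minimum key of the right node is the copy handed to the parent.

```python
def split_leaf(keys, key):
    """Insert key into a full B+ tree leaf, split the leaf evenly, and
    return (left leaf, right leaf, separator copied into the parent)."""
    keys = sorted(keys + [key])       # keys in a leaf stay in sorted order
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]      # right[0] is COPIED up, not removed
```

For example, inserting 12 into a full order-4 leaf [10, 11, 13, 15] yields the leaves [10, 11] and [12, 13, 15], with 12 copied into the parent, matching the behaviour shown in the Insert 12 step.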
Initial:
Insert 20:
Insert 13:
Insert 15:
Insert 10:
Insert 11:
Insert 12:
Deletion algorithm
1. Remove the required key and associated reference from the node.
2. If the node still has enough keys and references to satisfy the invariants, stop.
3. If the node has too few keys to satisfy the invariants, but its next oldest or next youngest sibling
at the same level has more than necessary, distribute the keys between this node and the
neighbor. Repair the keys in the level above to represent that these nodes now have a different
“split point” between them; this involves simply changing a key in the levels above, without
deletion or insertion.
4. If the node has too few keys to satisfy the invariant, and the next oldest or next youngest sibling
is at the minimum for the invariant, then merge the node with its sibling; if the node is a non-
leaf, we will need to incorporate the “split key” from the parent into our merging. In either case,
we will need to repeat the removal algorithm on the parent node to remove the “split key” that
previously separated these merged nodes — unless the parent is the root and we are removing
the final key from the root, in which case the merged node becomes the new root (and the tree
has become one level shorter than before).
Initial:
Delete 13:
Delete 15:
Delete 1:
HEAP
Heap is a special case of balanced binary tree data structure where the root-node key is compared with
its children and arranged accordingly. If α has child node β then −
key(α) ≥ key(β)
As the value of the parent is greater than that of the child, this property generates a Max Heap. Based on this
criteria, a heap can be of two types −
Min-Heap − Where the value of the root node is less than or equal to either of its children.
Max-Heap − Where the value of the root node is greater than or equal to either of its children.
For Input → 35 33 42 10 14 19 27 44 26 31
Both trees are constructed using the same input and order of arrival.
We shall use the same example to demonstrate how a Max Heap is created. The procedure to create Min
Heap is similar but we go for min values instead of max values.
We are going to derive an algorithm for max heap by inserting one element at a time. At any point of
time, the heap must maintain its property. During insertion, we also assume that we are inserting a node into an
already heapified tree.
Let us derive an algorithm to delete from max heap. Deletion in Max (or Min) Heap always happens at
the root to remove the Maximum (or minimum) value.
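Both operations can be sketched in Python over an array-based max heap (the function names are illustrative; the input is the sample input above). Insertion appends at the end and sifts the new value up; deletion removes the root, moves the last leaf to the root, and sifts it down:

```python
def heap_insert(heap, value):
    """Append at the end, then swap upward while the parent is smaller."""
    heap.append(value)
    i = len(heap) - 1
    while i > 0 and heap[(i - 1) // 2] < heap[i]:
        heap[(i - 1) // 2], heap[i] = heap[i], heap[(i - 1) // 2]
        i = (i - 1) // 2

def heap_delete_max(heap):
    """Remove the root; move the last leaf to the root and sift it down."""
    top = heap[0]
    last = heap.pop()
    if heap:
        heap[0] = last
        i = 0
        while True:
            largest = i
            for c in (2 * i + 1, 2 * i + 2):      # left and right children
                if c < len(heap) and heap[c] > heap[largest]:
                    largest = c
            if largest == i:                      # heap property restored
                break
            heap[i], heap[largest] = heap[largest], heap[i]
            i = largest
    return top

heap = []
for v in [35, 33, 42, 10, 14, 19, 27, 44, 26, 31]:   # the sample input above
    heap_insert(heap, v)
```

After building the heap, repeated heap_delete_max calls return the values in descending order: 44, then 42, and so on.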
APPLICATIONS OF HEAP
Heapsort: One of the best sorting methods, being in-place and with no quadratic worst-case
scenarios.
Selection algorithms: A heap allows access to the min or max element in constant time, and
other selections (such as median or kth-element) can be done in sub-linear time on data that is in
a heap.
Graph algorithms: By using heaps as internal traversal data structures, run time will be reduced
by polynomial order. Examples of such problems are Prim's minimal-spanning-tree algorithm
and Dijkstra's shortest-path algorithm.
Priority Queue: A priority queue is an abstract concept like "a list" or "a map"; just as a list can
be implemented with a linked list or an array, a priority queue can be implemented with a heap
or a variety of other methods.
Order statistics: The Heap data structure can be used to efficiently find the kth smallest (or
largest) element in an array.
UNIT IV
NON LINEAR DATA STRUCTURES - GRAPHS
Definition – Representation of Graph – Types of graph - Breadth-first traversal - Depth-first traversal –
Topological Sort – Bi-connectivity – Cut vertex – Euler circuits – Applications of graphs.
___________________________________________________________________________________
DEFINITION
Graph is a non-linear data structure; it contains a set of points known as nodes (or vertices) and a set of
links known as edges (or arcs) which connect the vertices. A graph is defined as follows...
Graph is a collection of vertices and arcs which connect the vertices in the graph. In other words, a graph is a
collection of nodes and edges which connect the nodes in the graph.
Example
The following is a graph with 5 vertices and 7 edges.
This graph G can be defined as G = ( V , E )
Where V = {A,B,C,D,E} and E = {(A,B),(A,C),(A,D),(B,D),(C,D),(B,E),(E,D)}.
Graph Terminology
We use the following terms in graph data structure...
Vertex
An individual data element of a graph is called a Vertex. Vertex is also known as node. In the above
example graph, A, B, C, D & E are known as vertices.
Edge
An edge is a connecting link between two vertices. Edge is also known as Arc. An edge is represented
as (startingVertex, endingVertex). For example, in above graph, the link between vertices A and B is
represented as (A,B). In above example graph, there are 7 edges (i.e., (A,B), (A,C), (A,D), (B,D), (B,E),
(C,D), (D,E)).
1. Undirected Edge - An undirected edge is a bidirectional edge. If there is an undirected edge between
vertices A and B then edge (A , B) is equal to edge (B , A).
2. Directed Edge - A directed edge is a unidirectional edge. If there is a directed edge between vertices A
and B then edge (A , B) is not equal to edge (B , A).
3. Weighted Edge - A weighted edge is an edge with a cost on it.
Undirected Graph
A graph with only undirected edges is said to be undirected graph.
Directed Graph
A graph with only directed edges is said to be directed graph.
Mixed Graph
A graph with undirected and directed edges is said to be mixed graph.
Origin
If an edge is directed, its first endpoint is said to be the origin of it.
Destination
If an edge is directed, the endpoint other than the origin is said to be the
destination of the edge.
Adjacent
If there is an edge between vertices A and B then both A and B are said to be adjacent. In other words,
Two vertices A and B are said to be adjacent if there is an edge whose end vertices are A and B.
Incident
An edge is said to be incident on a vertex if the vertex is one of the endpoints of that edge.
Outgoing Edge
A directed edge is said to be an outgoing edge of its origin vertex.
Incoming Edge
A directed edge is said to be incoming edge on its destination vertex.
Degree
Total number of edges connected to a vertex is said to be degree of that vertex.
Indegree
Total number of incoming edges connected to a vertex is said to be indegree of that vertex.
Outdegree
Total number of outgoing edges connected to a vertex is said to be outdegree of that vertex.
Self-loop
An edge (undirected or directed) is a self-loop if its two endpoints coincide.
Simple Graph
A graph is said to be simple if there are no parallel and self-loop edges.
Path
A path is a sequence of alternating vertices and edges that starts at a vertex and ends at a vertex such
that each edge is incident to its predecessor and successor vertex.
REPRESENTATION OF GRAPH
1. Adjacency Matrix
2. Incidence Matrix
3. Adjacency List
Adjacency Matrix
In this representation, a graph is represented using a matrix of size (total number of vertices) by (total
number of vertices). That means a graph with 4 vertices can be represented using a matrix of size
4x4. In this matrix, both rows and columns represent vertices. This matrix is filled with either 1 or 0.
Here, 1 represents that there is an edge from the row vertex to the column vertex, and 0 represents that there is no edge
from the row vertex to the column vertex.
Incidence Matrix
In this representation, a graph is represented using a matrix of size (total number of vertices) × (total
number of edges). That means a graph with 4 vertices and 6 edges can be represented using a 4×6
matrix. In this matrix, rows represent vertices and columns represent edges. The matrix is filled with 0,
1 or -1, where 0 means the column edge is not incident on the row vertex, 1 means the column edge
leaves the row vertex as an outgoing edge, and -1 means the column edge enters the row vertex as an
incoming edge.
Adjacency List
In this representation, every vertex of graph contains list of its adjacent vertices.
For example, consider the following directed graph representation implemented using linked list...
TYPES OF GRAPH
Various flavours of graphs have the following specializations and particulars about how they are usually drawn.
Undirected Graphs
In an undirected graph, the order of the vertices in the pairs in the Edge set doesn't matter. Thus, if we
view the sample graph above we could have written the Edge set as {(4,6),(4,5),(3,4),(3,2),(2,5),(1,2),
(1,5)}. Undirected graphs usually are drawn with straight lines between the vertices.
The adjacency relation is symmetric in an undirected graph, so if u ~ v then it is also the case that
v ~ u.
Directed Graphs
In a directed graph the order of the vertices in the pairs in the edge set matters. Thus u is adjacent to v
only if the pair (u,v) is in the Edge set. For directed graphs we usually use arrows for the arcs between
vertices. An arrow from u to v is drawn only if (u,v) is in the Edge set. In the directed graph below,
note that both (B,D) and (D,B) are in the Edge set, so the arc between B and D is an arrow in
both directions.
Vertex Labeled Graphs
In a labeled graph, each vertex is labeled with some data in addition to the data that identifies the
vertex. Only the identifying data is present in the pair in the Edge set. This is similar to the (key,satellite)
data distinction for sorting.
Here we have the following parts.
o The underlying set for the keys of the Vertices set is the integers.
o The underlying set for the satellite data is Color.
o The Vertices set = {(2,Blue),(4,Blue),(5,Red),(7,Green),(6,Red),(3,Yellow)}
o The Edge set = {(2,4),(4,5),(5,7),(7,6),(6,2),(4,3),(3,7)}
Cyclic Graphs
A cyclic graph is a directed graph with at least one cycle. A cycle is a path along the directed edges from
a vertex to itself. The vertex labeled graph above has several cycles. One of them is 2 » 4 » 5 » 7 » 6 » 2.
Edge Labeled Graphs
An edge labeled graph is a graph where the edges are associated with labels. One can indicate this by
making the Edge set be a set of triples. Thus if (u,v,X) is in the edge set, then there is an edge from u to v
with label X.
Edge labeled graphs are usually drawn with the labels drawn adjacent to the arcs specifying the
edges.
Weighted Graphs
A weighted graph is an edge labeled graph where the labels can be operated on by the usual arithmetic
operators, including comparisons like less than and greater than. In Haskell we'd say the edge
labels are in the Num class. Usually they are integers or floats. The idea is that some edges may be more
(or less) expensive, and this cost is represented by the edge labels or weight. In the graph below, which
is an undirected graph, the weights are drawn adjacent to the edges and appear in dark purple.
Directed Acyclic Graphs (Dags)
A Dag is a directed graph without cycles. They appear as special cases in CS applications all the time.
Vertices in a graph do not need to be connected to other vertices. It is legal for a graph to have
disconnected components, and even lone vertices without a single connection.
Vertices (like 5, 7, and 8) with only in-arrows are called sinks. Vertices with only out-arrows
(like 3 and 4) are called sources.
BREADTH-FIRST TRAVERSAL (BFS)
Breadth First Search (BFS) algorithm traverses a graph in a breadthward motion and uses a queue to
remember the next vertex to start a search from when a dead end occurs in any iteration.
As in the example given above, BFS algorithm traverses from A to B to E to F first then to C and G
lastly to D. It employs the following rules.
Rule 1 − Visit the adjacent unvisited vertex. Mark it as visited. Display it. Insert it in a queue.
Rule 2 − If no adjacent vertex is found, remove the first vertex from the queue.
Rule 3 − Repeat Rule 1 and Rule 2 until the queue is empty.
We then see an unvisited adjacent node from S. In this example, we have three nodes but alphabetically
we choose A, mark it as visited and enqueue it.
At this stage, we are left with no unmarked (unvisited) nodes. But as per the algorithm we keep on
dequeuing in order to get all unvisited nodes. When the queue gets emptied, the program is over.
DEPTH-FIRST TRAVERSAL (DFS)
Depth First Search (DFS) algorithm traverses a graph in a depthward motion and uses a stack to
remember the next vertex to start a search from when a dead end occurs in any iteration.
As in the example given above, DFS algorithm traverses from S to A to D to G to E to B first, then to F
and lastly to C. It employs the following rules.
Rule 1 − Visit the adjacent unvisited vertex. Mark it as visited. Display it. Push it in a stack.
Rule 2 − If no adjacent vertex is found, pop up a vertex from the stack. (It will pop up all the
vertices from the stack, which do not have adjacent vertices.)
Rule 3 − Repeat Rule 1 and Rule 2 until the stack is empty.
Mark A as visited and put it onto the stack. Explore any unvisited adjacent node from A. Both S and D
are adjacent to A but we are concerned for unvisited nodes only.
As C does not have any unvisited adjacent node so we keep popping the stack until we find a node that
has an unvisited adjacent node. In this case, there's none and we keep popping until the stack is empty.
TOPOLOGICAL SORT
Topological sorting for Directed Acyclic Graph (DAG) is a linear ordering of vertices such that for
every directed edge uv, vertex u comes before v in the ordering. Topological Sorting for a graph is not
possible if the graph is not a DAG.
For example, a topological sorting of the following graph is “5 4 2 3 1 0”. There can be more than one
topological sorting for a graph. For example, another topological sorting of the following graph is “4 5 2
3 1 0”. The first vertex in topological sorting is always a vertex with in-degree as 0 (a vertex with no in-
coming edges).
Let's take a graph and see the algorithm in action. Consider the graph given below:
So, we delete the vertex with in-degree 0 from the graph and append it to the ordering. For each vertex
directly connected to it, we decrease its in-degree by 1; any vertex whose in-degree now becomes 0 is
pushed into the queue.
Next we delete the new front vertex from the queue and append it to the ordering, again decreasing the
in-degree of each of its neighbours and pushing into the queue any neighbour whose in-degree becomes 0.
We continue doing this until the queue is empty; the order in which vertices were deleted is a
topological ordering.
BI-CONNECTIVITY
A graph is said to be Biconnected if:
1. It is connected, i.e. it is possible to reach every vertex from every other vertex, by a simple path.
2. Even after removing any vertex the graph remains connected.
The given graph is clearly connected. Now try removing the vertices one by one and observe. Removing
any of the vertices does not increase the number of connected components. So the given graph is
Biconnected.
Now consider the following graph which is a slight modification in the previous graph.
In the above graph if the vertex 2 is removed, then here's how it will look:
Clearly the number of connected components has increased. Similarly, if vertex 3 is removed there will
be no path left to reach vertex 0 from any of the vertices 1, 2, 4 or 5. The same goes for vertices 4 and 1:
removing vertex 4 will disconnect 1 from all the other vertices 0, 2, 3 and 5. So the graph is not
Biconnected.
Now, what should we look for in a graph to check whether it is Biconnected? From the above, a graph is
Biconnected if it has no vertex whose removal increases the number of connected components in the
graph; if such a vertex exists, the graph is not Biconnected. A vertex whose removal increases the
number of connected components is called an Articulation Point.
A vertex in an undirected connected graph is an articulation point (or cut vertex) iff removing it (and
edges through it) disconnects the graph. Articulation points represent vulnerabilities in a connected
network – single points whose failure would split the network into 2 or more disconnected components.
They are useful for designing reliable networks.
For a disconnected undirected graph, an articulation point is a vertex whose removal increases the
number of connected components.
Following are some example graphs with articulation points encircled with red color.
EULER CIRCUITS
An Eulerian Path is a path in a graph that visits every edge exactly once. An Eulerian Circuit is an
Eulerian Path which starts and ends on the same vertex.
How to find whether a given graph is Eulerian or not?
The problem is the same as the following question: “Is it possible to draw a given graph without lifting
the pencil from the paper and without tracing any of the edges more than once?”
A graph is called Eulerian if it has an Eulerian Cycle and called Semi-Eulerian if it has an Eulerian Path.
The problem seems similar to the Hamiltonian Path problem, which is NP-complete for a general graph.
Fortunately, we can find whether a given graph has an Eulerian Path or not in polynomial time. In fact,
we can find it in O(V+E) time.
Following are some interesting properties of undirected graphs with an Eulerian path and cycle. We can
use these properties to find whether a graph is Eulerian or not.
Eulerian Cycle
An undirected graph has an Eulerian cycle if the following two conditions are true:
a) All vertices with non-zero degree are connected. We don't care about vertices with zero degree
because they don't belong to an Eulerian Cycle or Path (we only consider all edges).
b) All vertices have even degree.
Eulerian Path
An undirected graph has an Eulerian Path if the following two conditions are true:
a) Same as condition (a) for an Eulerian Cycle.
b) Zero or two vertices have odd degree and all other vertices have even degree. Note that only one
vertex with odd degree is not possible in an undirected graph (the sum of all degrees is always even in
an undirected graph).
Note that a graph with no edges is considered Eulerian because there are no edges to traverse.
APPLICATIONS OF GRAPHS
Graphs are nothing but connected nodes (vertices), so any real-life application involving networks,
routing, finding relations, paths, etc. uses graphs.
Connecting with friends on social media, where each user is a vertex, and when users connect
they create an edge.
Using GPS/Google Maps/Yahoo Maps, to find the shortest route between two locations.
Google, to search for webpages, where pages on the internet are linked to each other by
hyperlinks; each page is a vertex and the link between two pages is an edge.
On eCommerce websites relationship graphs are used to show recommendations.
UNIT V
SEARCHING, SORTING AND HASHING TECHNIQUES
Searching- Linear Search - Binary Search. Sorting - Bubble sort - Selection sort - Insertion sort - Shell
sort – Radix sort. Hashing- Hash Functions – Separate Chaining – Open Addressing – Rehashing –
Extendible Hashing.
___________________________________________________________________________________
SEARCHING
Searching is an operation or a technique that helps find the place of a given element or value in a
list. Any search is said to be successful or unsuccessful depending upon whether the element that is
being searched for is found or not. Some of the standard searching techniques followed in data
structures are listed below.
LINEAR SEARCH
Linear search (sequential search) compares the search element with each element of the list one by one,
from the first element onwards, until a match is found or the end of the list is reached.
#include <stdio.h>

int main() {
   int list[20], size, i, sElement;

   printf("Enter size of the list: ");
   scanf("%d", &size);
   printf("Enter any %d integer values: ", size);
   for(i = 0; i < size; i++)
      scanf("%d", &list[i]);
   printf("Enter the element to be searched: ");
   scanf("%d", &sElement);

   // Linear Search Logic
   for(i = 0; i < size; i++) {
      if(sElement == list[i]) {
         printf("Element is found at %d index", i);
         break;
      }
   }
   if(i == size)
      printf("Given element is not found in the list!!!");
   return 0;
}
BINARY SEARCH
Binary search algorithm finds a given element in a list of elements with O(log n) time complexity,
where n is the total number of elements in the list. The binary search algorithm can be used only with a
sorted list of elements. That means binary search can be used only with a list of elements which are
already arranged in an order; it cannot be used for a list of elements which are in random order. This
search process starts by comparing the search element with the middle element in the list. If both
match, then the result is "element found". Otherwise, we check whether the search element is smaller
or larger than the middle element in the list. If the search element is smaller, then we repeat the same
process for the left sublist of the middle element. If the search element is larger, then we repeat the same
process for the right sublist of the middle element. We repeat this process until we find the search element
in the list or until we are left with a sublist of only one element. If that element also doesn't match the
search element, then the result is "Element not found in the list".
Binary search is implemented using following steps...
SORTING
Sorting refers to arranging data in a particular format. Sorting algorithm specifies the way to
arrange data in a particular order. Most common orders are in numerical or lexicographical order.
The importance of sorting lies in the fact that data searching can be optimized to a very high level, if
data is stored in a sorted manner. Sorting is also used to represent data in more readable formats.
Following are some of the examples of sorting in real-life scenarios −
Telephone Directory − The telephone directory stores the telephone numbers of people sorted
by their names, so that the names can be searched easily.
Dictionary − The dictionary stores words in an alphabetical order so that searching of any word
becomes easy.
Categories of Sorting
The techniques of sorting can be divided into two categories. These are:
Internal Sorting
External Sorting
Internal Sorting: If all the data that is to be sorted can be accommodated at a time in main memory,
internal sorting is performed.
External Sorting: When the data that is to be sorted cannot be accommodated in the memory at the
same time and some has to be kept in auxiliary memory such as hard disk, floppy disk, magnetic tapes
etc, then external sorting methods are performed.
The complexity of a sorting algorithm measures the running time as a function of the number 'n' of
items to be sorted. The choice of which sorting method is suitable for a problem depends on several
considerations, which differ from problem to problem. The most noteworthy of these considerations are:
The length of time spent by the programmer in programming a specific sorting program
Amount of machine time necessary for running the program
The amount of memory necessary for running the program
To get the amount of time required to sort an array of 'n' elements by a particular method, the normal
approach is to analyze the method to find the number of comparisons (or exchanges) required by it.
Most of the sorting techniques are data sensitive, and so their metrics depend on the order in
which the input data appear in the array.
Various sorting techniques are analyzed in various cases and named these cases as follows:
Best case
Worst case
Average case
Hence, the result of these cases is often a formula giving the average time required for a particular sort
of size 'n'. Most of the sort methods have time requirements that range from O(n log n) to O(n²).
Types of Sorting Techniques
Bubble Sort
Selection Sort
Merge Sort
Insertion Sort
Quick Sort
Heap Sort
BUBBLE SORT
Bubble Sort Algorithm is used to arrange N elements in ascending order; for that, you have
to begin with the 0th element and compare it with the 1st element. If the 0th element is found greater
than the 1st element, then the swapping operation will be performed, i.e. the two values will get
interchanged. In this way all the elements of the array get compared.
Below given figure shows how Bubble Sort works:
Implementation:
#include <stdio.h>
#include <stdbool.h>

#define MAX 10

int list[MAX] = {1,8,4,6,0,3,5,2,7,9};

void display() {
   int i;
   printf("[");
   // navigate through all items
   for(i = 0; i < MAX; i++) {
      printf("%d ", list[i]);
   }
   printf("]\n");
}

void bubbleSort() {
   int temp;
   int i, j;
   bool swapped = false;
   // loop through all numbers
   for(i = 0; i < MAX-1; i++) {
      swapped = false;
      // loop through numbers falling ahead
      for(j = 0; j < MAX-1-i; j++) {
         printf("Items compared: [ %d, %d ] ", list[j], list[j+1]);
         // check if next number is lesser than current number;
         // if so, swap the numbers (bubble up the highest number)
         if(list[j] > list[j+1]) {
            temp = list[j];
            list[j] = list[j+1];
            list[j+1] = temp;
            swapped = true;
            printf(" => swapped [%d, %d]\n", list[j], list[j+1]);
         } else {
            printf(" => not swapped\n");
         }
      }
      // if no number was swapped, the array is sorted; break the loop
      if(!swapped) {
         break;
      }
      printf("Iteration %d#: ", (i+1));
      display();
   }
}

int main() {
   printf("Input Array: ");
   display();
   printf("\n");
   bubbleSort();
   printf("\nOutput Array: ");
   display();
   return 0;
}
SELECTION SORT
Selection Sort algorithm is used to arrange a list of elements in a particular order (ascending or
descending). In selection sort, the first element in the list is selected and it is compared repeatedly with
all the remaining elements in the list. If any element is smaller than the selected element (for ascending
order), then both are swapped. Then the element at the second position in the list is selected and
compared with all the remaining elements in the list. If any element is smaller than the selected element,
then both are swapped. This procedure is repeated till the entire list is sorted.
Step 1: Select the first element of the list (i.e., Element at first position in the list).
Step 2: Compare the selected element with all other elements in the list.
Step 3: For every comparison, if any element is smaller than the selected element (for ascending
order), then these two are swapped.
Step 4: Repeat the same procedure with next position in the list till the entire list is sorted.
INSERTION SORT
Insertion sort builds the sorted list one element at a time, by taking each element from the unsorted
portion and inserting it at its correct position in the sorted portion.
Step 1: Assume that the first element in the list is in the sorted portion of the list and all the
remaining elements are in the unsorted portion.
Step 2: Consider the first element from the unsorted list and insert that element into the sorted list
in the order specified.
Step 3: Repeat the above process until all the elements from the unsorted list are moved into the
sorted list.
Below given figure shows how Selection Sort works:
To sort an unsorted list with 'n' elements, we need to make (1+2+3+......+(n-1)) = n(n-1)/2
comparisons in the worst case. If the list is already sorted, then it requires only 'n' comparisons.
Worst Case : O(n2)
Best Case : Ω(n)
Average Case : Θ(n2)
SHELL SORT
Shell sort is a highly efficient sorting algorithm and is based on the insertion sort algorithm. This
algorithm avoids large shifts, as happen in insertion sort when a smaller value is at the far right and has
to be moved to the far left.
This algorithm uses insertion sort on widely spread elements first to sort them, and then sorts the
less widely spaced elements. This spacing is termed the interval. The interval is calculated based on
Knuth's formula as −
h = h * 3 + 1
where −
h is the interval with initial value 1
This algorithm is quite efficient for medium-sized data sets; with Knuth's gap sequence its worst-case
complexity is about O(n^(3/2)), where n is the number of items.
How Shell Sort Works?
Let us consider the following example to have an idea of how shell sort works. We take the same array
we have used in our previous examples. For our example and ease of understanding, we take the interval
of 4. Make a virtual sub-list of all values located at the interval of 4 positions. Here these values are {35,
14}, {33, 19}, {42, 27} and {10, 44}
We compare values in each sub-list and swap them (if necessary) in the original array. After this step,
the new array should look like this −
Then, we take interval of 2 and this gap generates two sub-lists - {14, 27, 35, 42}, {19, 10, 33, 44}
We compare and swap the values, if required, in the original array. After this step, the array should look
like this −
Finally, we sort the rest of the array using interval of value 1. Shell sort uses insertion sort to sort the
array.
We see that it required only four swaps to sort the rest of the array.
Implementation:
#include <stdio.h>
#include <stdbool.h>

#define MAX 7

int intArray[MAX] = {4,6,3,2,1,9,7};

void printline(int count) {
   int i;
   for(i = 0; i < count-1; i++) {
      printf("=");
   }
   printf("=\n");
}

void display() {
   int i;
   printf("[");
   // navigate through all items
   for(i = 0; i < MAX; i++) {
      printf("%d ", intArray[i]);
   }
   printf("]\n");
}

void shellSort() {
   int inner, outer;
   int valueToInsert;
   int interval = 1;
   int elements = MAX;
   int i = 0;

   // compute the largest Knuth interval not exceeding elements/3
   while(interval <= elements/3) {
      interval = interval*3 + 1;
   }

   while(interval > 0) {
      printf("iteration %d#:", i);
      display();
      // insertion sort over elements that are 'interval' apart
      for(outer = interval; outer < elements; outer++) {
         valueToInsert = intArray[outer];
         inner = outer;
         while(inner > interval - 1 && intArray[inner - interval] >= valueToInsert) {
            intArray[inner] = intArray[inner - interval];
            inner -= interval;
            printf(" item moved :%d\n", intArray[inner]);
         }
         intArray[inner] = valueToInsert;
         printf(" item inserted :%d, at position :%d\n", valueToInsert, inner);
      }
      interval = (interval - 1) / 3;   // shrink the interval for the next pass
      i++;
   }
}

int main() {
   printf("Input Array: ");
   display();
   printline(50);
   shellSort();
   printf("Output Array: ");
   display();
   printline(50);
   return 0;
}
RADIX SORT
Radix sort is a small method that many people intuitively use when alphabetizing a large list of names.
Specifically, the list of names is first sorted according to the first letter of each name, that is, the names
are arranged in 26 classes.
Intuitively, one might want to sort numbers on their most significant digit. However, Radix sort
works counter-intuitively by sorting on the least significant digits first. On the first pass, all the numbers
are sorted on the least significant digit and combined in an array. Then on the second pass, the entire
numbers are sorted again on the second least significant digit and combined in an array, and so on.
Example
Following example shows how Radix sort operates on seven 3-digits number.
In the above example, the first column is the input. The remaining columns show the list after
successive sorts on increasingly significant digits position. The code for Radix sort assumes that each
element in an array A of n elements has d digits, where digit 1 is the lowest-order digit and d is the
highest-order digit.
Implementation:
#define RANGE 10   // range for decimal digits is 10, as digits run from 0-9

void countsort(int arr[], int n, int place)
{
   int i, freq[RANGE] = {0};
   int output[n];
   // count how many keys have each digit value at the current place
   for(i = 0; i < n; i++)
      freq[(arr[i]/place) % RANGE]++;
   // running totals give the final position of each digit class
   for(i = 1; i < RANGE; i++)
      freq[i] += freq[i-1];
   // build the output array, scanning backwards to keep the sort stable
   for(i = n-1; i >= 0; i--)
   {
      output[freq[(arr[i]/place) % RANGE] - 1] = arr[i];
      freq[(arr[i]/place) % RANGE]--;
   }
   for(i = 0; i < n; i++)
      arr[i] = output[i];
}
void radixsort(int arr[], int n, int maxx)   // maxx is the maximum element in the array
{
   int mul = 1;
   while(maxx)
   {
      countsort(arr, n, mul);
      mul *= 10;
      maxx /= 10;
   }
}
Analysis
Each key is looked at once for each digit (or letter if the keys are alphabetic) of the longest key.
Hence, if the longest key has m digits and there are n keys, radix sort has order O(m·n). However, if we
look at these two values, the size of the keys will be relatively small when compared to the number of
keys. For example, if we have six-digit keys, we could have a million different records. Here, we see
that the size of the keys is not significant, and this algorithm is of linear complexity O(n).
HASHING
Hashing is a technique to convert a range of key values into a range of indexes of an array. We're going
to use the modulo operator to get a range of key values. Consider an example of a hash table of size 20,
where the following items are to be stored. Items are in the (key,value) format.
Following are the basic primary operations of a hash table.
Search − Searches an element in a hash table.
Insert − inserts an element in a hash table.
delete − Deletes an element from a hash table.
HASH FUNCTIONS
A hash function is a function that converts a given big input value (say, a phone number) to a small
practical integer value. The mapped integer value is used as an index in the hash table. In simple terms,
a hash function maps a big number or string to a small integer that can be used as an index in a hash
table.
A good hash function should have following properties
1) Efficiently computable.
2) Should uniformly distribute the keys (Each table position equally likely for each key)
For example, for phone numbers, a bad hash function would take the first three digits, while a better
function would consider the last three digits. Please note that this may not be the best hash function;
there may be better ways.
Hash Table: An array that stores pointers to records corresponding to a given phone number. An entry
in hash table is NIL if no existing phone number has hash function value equal to the index for the entry.
Collision Handling: Since a hash function gets us a small number for a big key, there is possibility that
two keys result in same value. The situation where a newly inserted key maps to an already occupied
slot in hash table is called collision and must be handled using some collision handling technique.
Following are the ways to handle collisions:
Separate Chaining :The idea is to make each cell of hash table point to a linked list of records
that have same hash function value. Chaining is simple, but requires additional memory outside
the table.
Open Addressing: In open addressing, all elements are stored in the hash table itself. Each table
entry contains either a record or NIL. When searching for an element, we one by one examine
table slots until the desired element is found or it is clear that the element is not in the table.
SEPARATE CHAINING
The idea is to make each cell of hash table point to a linked list of records that have same hash function
value.
Advantages:
1) Simple to implement.
2) Hash table never fills up, we can always add more elements to chain.
3) Less sensitive to the hash function or load factors.
4) It is mostly used when it is unknown how many and how frequently keys may be inserted or deleted.
Let us consider a simple hash function as “key mod 7” and sequence of keys as 50, 700, 76, 85, 92, 73,
101.
Disadvantages:
1) Cache performance of chaining is not good as keys are stored using linked list. Open addressing
provides better cache performance as everything is stored in same table.
2) Wastage of Space (Some Parts of hash table are never used)
3) If the chain becomes long, then search time can become O(n) in the worst case.
4) Uses extra space for links.
OPEN ADDRESSING
Like separate chaining, open addressing is a method for handling collisions. In Open Addressing,
all elements are stored in the hash table itself. So at any point, size of table must be greater than or equal
to total number of keys (Note that we can increase table size by copying old data if needed).
Insert(k): Keep probing until an empty slot is found. Once an empty slot is found, insert k.
Search(k): Keep probing until slot’s key doesn’t become equal to k or an empty slot is reached.
Delete(k): Delete operation is interesting. If we simply delete a key, then search may fail. So slots of
deleted keys are marked specially as “deleted”.
Insert can insert an item in a deleted slot, but search doesn’t stop at a deleted slot.
a) Linear Probing: In linear probing, we linearly probe for the next slot. For example, the typical gap
between two probes is 1, as in the example below.
Let hash(x) be the slot index computed using the hash function and S be the table size.
Let us consider a simple hash function as “key mod 7” and sequence of keys as 50, 700, 76, 85, 92, 73,
101.
Clustering: The main problem with linear probing is clustering, many consecutive elements form
groups and it starts taking time to find a free slot or to search an element.
b) Quadratic Probing: We look for the i²'th slot in the i'th iteration.
c) Double Hashing: We use another hash function hash2(x) and look for the i*hash2(x) slot in the i'th
iteration.
Quadratic probing lies between the two in terms of cache performance and clustering.
Double hashing has poor cache performance but no clustering. Double hashing requires more
computation time as two hash functions need to be computed.
REHASHING
For example, using open addressing (linear probing) on a table of integers with hash(k)=k (assume the
table does an internal % hSize):
We know that performance degrades when the load factor λ > 0.5.
Solution: rehash when the table is more than half full.
So we expand the table, and use the hash function to relocate the elements within the larger table…
In this case, I've shown the hash table size doubling, because that's easy to do, despite the fact that it
doesn't lead to prime-number sized tables. If we were going to use quadratic probing, we would
probably keep a table of prime numbers on hand for expansion sizes, and we would probably choose a
set of primes such that each successive prime number was about twice the prior one.
2. Saving the Hash Values
The rehashing operation can be quite lengthy. Luckily, it doesn't need to be done very often.
We can speed things up somewhat by storing the hash values in the table elements along with the data
so that we don't need to recompute the hash values. Also, if we structure the table as a vector of pointers
to the hash elements, then during the rehashing we will only be copying pointers, not the entire
(potentially large) data elements.
EXTENDIBLE HASHING
A hash table in which the hash function is the last few bits of the key and the table refers to
buckets. Table entries with the same final bits may use the same bucket. If a bucket overflows, it splits,
and if only one entry referred to it, the table doubles in size. If a bucket is emptied by deletion, entries
using it are changed to refer to an adjoining bucket, and the table may be halved.
Generalization
A hash table that grows to handle more items. The associated hash function must change as the
table grows. Some schemes may shrink the table to save space when items are deleted.
Extendible hashing is a type of hash system which treats a hash as a bit string, and uses a trie for
bucket lookup. Because of the hierarchical nature of the system, re-hashing is an incremental operation
(done one bucket at a time, as needed). This means that time-sensitive applications are less affected by
table growth than by standard full-table rehashes.
Like Linear Hashing, Extendible Hashing is also a dynamic hashing scheme. First, let's talk a little bit
about static and dynamic hashing.
Static Hashing uses a single hash function, and this hash function is fixed and computes destination
bucket for a given key using the fixed number of locations/buckets in the hash table. This does not mean
that the hash table that uses static hashing can’t be reorganized. It can still be reorganized by adding
more number of buckets to the hash table. This would require:
A new hash function that embraces the new size of hash table.
Redistribution of ALL records stored in the hash table. Each record has to be touched and passed
to the hash function to determine the new location/bucket. It is still possible that a record
remains in the same bucket as it was before reorganization. But, hash function computation is
definitely required for all the items in the hash table.
Dynamic hashing, on the other hand, has two advantages:
It gives the ability to design a hash function that is automatically changed underneath when the
hash table is resized.
Secondly, there is no need to recalculate the new bucket address for all the records in the hash
table. For example, as explained under Linear Hashing, we split an existing bucket B, create a new
bucket B*, and redistribute B's contents between B and B*.
This implies that rehash or redistribution is limited only to the particular bucket that is being
split. There is absolutely no need to touch items in all the other buckets in the hash table.
Readers who have read the post on Linear Hashing should already be familiar with this dynamic hashing
scheme. At this point, I would like to request the reader to first go through the post on Linear Hashing,
as I personally consider it to be a bit simpler than Extendible Hashing.
Reading the post on Linear Hashing would give a good background on dynamic hashing and prepare the
reader for the small complexities that will be discussed in the Extendible Hashing algorithm.
Moreover, I refer to the linear hashing technique in some parts of the post just to highlight the
differences and the uniqueness of both approaches. So it is better to read it first.
Now let’s talk about Extendible Hashing which is also another popular Dynamic Hashing method.
2. Extendible Hashing
Extendible Hashing is similar to Linear Hashing in some ways:
Both are dynamic hashing schemes that allow graceful reorganization of the hash table, and
automatically accommodate this fact in the underlying hash functions.
o By “graceful”, I mean the luxury of not having to recompute the new location of all the
elements in the hash table when the table is resized. Redistribution or rehash is limited to
a single bucket.
Both the schemes have a concept of bucket split. A target bucket B will be split into B and B*
(aka split image of B), and the contents of B will be rehashed between B and B*.
The hash function used in both schemes gives out a hash value (a binary string), and a certain
number of bits are used to determine the index of the destination bucket for a given key (and its
value).
The number of bits “I” used from the hash value is gradually increased as and when the hash
table is resized, and more bucket(s) are added to the hash table.
I would regard the last two points above as the fundamental principles behind these two dynamic
hashing schemes.
However, there are subtle differences between these two schemes when it comes to achieving dynamic
hashing behavior:
Linear Hashing tolerates overflow buckets (aka blocks or pages). In other words, it is fine to
have an overflow bucket chain for any given bucket in the hash table.
o Extendible Hashing does not tolerate overflow buckets.
In Linear Hashing, buckets are split in linear order starting from bucket 0 to bucket n-1, and
that completes a single round of linear hashing. The bucket that is split is not necessarily the
bucket that overflowed.
o In Extendible Hashing, the bucket that is split is always the one that is about to overflow.
This can be any random bucket in the hash table, and there is no given order on
splitting the buckets.
In Linear Hashing, there is just the hash table array comprising buckets and per-bucket
overflow chains (if any). There is no auxiliary structure or any extra level of indirection.
o In Extendible Hashing, an auxiliary data structure called the bucket directory plays a
fundamental role in establishing the overall technique and algorithm. Each entry in the
directory has a pointer to a main bucket in the hash table array. This gives an extra
level of indirection: before accessing the bucket, we first need to index the
corresponding directory entry that holds a pointer to the desired bucket.
o Hold on to learn why a directory-like structure is required in extendible hashing.
Bucket Directory
There are a couple of strong reasons for using a bucket directory structure in extendible hashing.
As mentioned earlier, there is no concept of overflow bucket chain in extendible hashing.
This implies that when a given bucket B is full, we can’t resort to creating an overflow bucket
and linking it to the chain of B as we did in linear hashing. The only thing that is possible is to
split B, create a new bucket B*, and rehash the contents of B between B and B*. After a split
happens, the obvious question is: how does the hash function correctly look up the items that were
earlier stored in B, but are now in B* as a result of the split?
o In Linear Hashing, the split was done in order. So if the split pointer S has moved ahead
of the concerned bucket given by first hash function H, we know that this bucket was
split. Thus we used the second hash function H1 to calculate the correct bucket (B or B*).
o Readers who have gone through the post on Linear Hashing should be able to understand
that the use of second hash function is equivalent to using one more bit from the hash
value to get the correct bucket index.
o We have to answer the same question for Extendible Hashing as well. The problem is
that there is no such thing as a split pointer S, and the buckets are not split in linear order.
Given any random bucket B that is about to overflow, we split it into B and B*. How do
we know that we have to use 1 more bit from the hash value to index the correct bucket?
o Can we think of an auxiliary structure that embraces the fact that a bucket was split, and
always points us to the correct bucket? This is where the bucket directory structure comes
into play.
Bucket split is very crucial in extendible hashing. In linear hashing, when an insert() detects a
full bucket, it is anyway going to complete the operation by creating and linking an overflow
bucket. Subsequently it will check whether the condition for a split is met. If yes, then the bucket
pointed to by S will be split.
o In extendible hashing when an insert() detects a full bucket, there is no way we can
complete the insert at that moment because overflow chains are _NOT_ allowed. So the
immediate action is to split, and then only the new item can be inserted somewhere.
Let’s discuss the data structure in more detail along with some examples.
The above diagram shows the data structures for extendible hashing. There is a bucket directory, and the
hash table buckets that store the records. Both structures can be imagined as arrays. Each
location in the directory array has a pointer to some bucket.
Please do not make any assumptions about the relation between the number of pointers in the directory
and the number of buckets in the hash table. The diagram shows a simple 1-1 mapping, but this may not
always be true. This will become clear as you read along.
The diagram clearly shows that any operation get(), put(), delete() on the hash table has to go through an
extra level of indirection which is indexing the directory structure first and retrieving the bucket info.
Given a key K and hash function H, H has to map the key to a directory entry. This is where the global
depth is used. The global depth I is the number of bits used from the hash value generated by H.
Let’s go with the “I” LSBs format as explained in linear hashing: the integer value formed by the “I”
LSBs of the hash value determines the index into the directory structure.
Number of directory entries = 2^I. If 2 bits are used, we have 4 directory entries — 00, 01, 10, 11 as
shown in the diagram.
On the other hand, the local depth “J” of a bucket B is the number of LSBs actually used by the keys
stored in bucket B. The following invariant always holds: J <= I (the local depth of a bucket never
exceeds the global depth).
Start with 2 directory pointers and 2 hash table buckets. This means I=1 and J=0, since the
buckets are empty at the beginning. There is actually no harm in starting with J=I. Each bucket can store
only 2 KV items.
Initial State:
put(4, V)
put(1, V)
put(7, V)
put(2, V)
Do the hash computation H(key) to get the bit string R.
Use “I” LSBs from R as the directory index D.
In our case the keys are simple integers, so H(k) = k is fine and R is just the binary
representation of the integer key K. However, if K is an alphanumeric value or anything other than
a plain integer, then H() has to do some computation to produce R, as is typical for any
hash function.
All we need is the value of the “I” LSBs of R, and this gives us D.
Go to the directory location D, follow the bucket pointer to get the desired bucket B.
Store the item in bucket B.
For any given bit string R, if we want the “I” LSBs of the binary representation of the number, the
easiest way is to compute R % (2^I). This works perfectly here because the directory structure we are
indexing always has a power-of-2 size — 2, 4, 8, 16, etc. The same trick is also used in Linear Hashing.
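As a quick illustration, this LSB extraction can be written either as a modulo or as an equivalent bit mask (a sketch; the helper name `dir_index` is my own):

```python
# Extract the I least-significant bits of a hash value R to get the directory
# index D. R % 2**I works because the directory size is always a power of two;
# R & (2**I - 1) is the equivalent bitwise form.
def dir_index(r: int, i: int) -> int:
    return r % (1 << i)          # same result as r & ((1 << i) - 1)

print(dir_index(22, 1))  # 0 -> directory 0
print(dir_index(22, 2))  # 2 -> directory 10
print(dir_index(23, 2))  # 3 -> directory 11
```

The same numbers appear in the put(22) and put(23) examples below: with I = 1 key 22 lands in directory 0, and after the directory doubles (I = 2) it lands in directory 10.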
Note that as long as the bucket has space to store the item, we do not need any sort of fancy stuff like
splitting etc.
At this time, our hash table is full. J = I = 1. J started out as 0, and was set to the value of I upon first
insert in an empty bucket.
We then do put(22, V).
H(22) = 22 = 00010110, and D = 22 % 2 = 0, which points to B0. The target bucket is already full, and we
can’t create an overflow chain. We split B0 as follows:
Split of B0 to [B0, B0*] will create an additional hash table bucket. Two things are required:
o To be able to track the new bucket through directory.
o To be able to correctly locate items after rehash between B0 and B0*.
We do not have additional space in the directory. With 1 bit we can only have 2 directory index
locations. So we need more than 1 bit which comes by incrementing global depth by 1.
This causes the directory to double, and give us the following directory locations:
o 00, 01, 10, 11 because now we are using 2 bits from hash value (I = 2).
Create bucket B0* as the split image of B0.
Increase the local depth J of B0 by 1, and set this as the local depth of B0* as well. Why is this
step required ?
o It is because the bucket is being split. We will now use 1 more bit to pick up the
destination directory and bucket for the items.
o Items stored in B0 and B0* will no longer be stored using only 1 LSB; 2 LSBs will be
used henceforth.
Store a pointer in one of the new directory locations to bucket B0*. How do we determine which
directory location D ? Because we are using 1 more bit from the hash value, D0 which was
pointing to B0 should now have a companion D* location that points to B0*. How do we get
D* ?
o It is simple! We have added 1 more bit to I, so D0 and D* should point to buckets B0 and
B0*, which have their LSB in common since earlier I was 1.
o So D* = D0 + 2^(old global depth) = 0 + 2^1 = 2 = bucket directory 10.
Note that directory pointer to bucket that is split remains unchanged.
Execute put(22, V) and do a rehash of keys 2 and 4, since they were earlier stored in B0 and may
move to B0*.
o H(22) = 22 = 00010110 ; D = 22%(2^I) = 22%4 = 2. Hence directory 10.
o H(2) = 2 = 00000010 ; D = 2%(2^I) = 2%4 = 2. Hence directory 10.
o H(4) = 4 = 00000100 ; D = 4%(2^I) = 4%4 = 0. Hence directory 00.
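The D* rule above (companion directory entry = old entry + 2^(old global depth)) can be checked with a tiny sketch (the helper name `companion` is my own):

```python
# Hypothetical helper: after the bucket at directory entry D splits, the
# companion entry D* that must point to the split image B* is
# D + 2**(old global depth), because the two entries differ only in the
# newly added bit.
def companion(d: int, old_global_depth: int) -> int:
    return d + (1 << old_global_depth)

print(companion(0, 1))  # 2 -> directory 10 points to B0*
print(companion(2, 2))  # 6 -> directory 110 points to B2* (later split)
```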
The highlighted red/violet bits in the directory locations suggest that items stored in the buckets pointed
to by these respective directories share the LSB but not both LSBs.
For example, items stored in B0 and B0* share the LSB 0. Items stored in B0 share both LSBs as
00. Items stored in B0* share both LSBs as 10.
In the diagram we see bucket B1 being pointed to by directories 01 and 11. It was already pointed to
by 01 earlier, so why is there a need to create a pointer from the new directory location 11?
The reason is that the bucket directory has now doubled in size. “I” will be used as 2 in all the
subsequent hash computations for put(), get(), delete() operations.
The items 1 and 7 stored in B1 were stored using I as 1. Now that I is 2, they may or may not
have both the LSBs in common.
B1 has to stay the same as it is not split. So all the items currently in B1 will remain in B1.
Hence local depth can’t be increased as keys 1 and 7 are stored in their destination bucket B1
using I as 1.
Since any get() will now use the 2 LSBs, there is a chance that the 2 LSBs for these keys are
“11”, and because there is no pointer from directory 11, a reader would miss the entry.
In the current case, this happens for key 7 (0111), where D = directory 11. Hence there is a
need to store the pointer. This will become more obvious with the next split example.
put(23, V): 23 = 00010111, D = 23%4 = 3 = directory 11. This points to B1, which is full. We will have
to split it.
This time we don’t need to double the size of directory as we did during earlier split.
The directory is already prepared to accommodate the split since split of B1 means using 2 LSBs
instead of 1. The directory is already using 2 LSBs and has 4 locations.
Create the split image of B1 as B1*. Items in B1 and the new item 23 will go either to B1 or B1*,
depending on whether their 2 LSBs are 01 or 11.
Global depth I remains the same since directory size is not changed. Local depth J of B1 and
B1* is incremented by 1 since keys stored in these buckets will be using 1 more bit from the
hash value.
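This rehash of B1 can be traced in a couple of lines (an illustrative sketch using H(k) = k, as in the examples):

```python
# Trace of the B1 split: keys 1, 7, and the incoming 23 all have LSB 1,
# but their 2 LSBs differ (01 vs 11), so they split between B1 and B1*.
keys = [1, 7, 23]
b1      = [k for k in keys if k % 4 == 0b01]   # 2 LSBs == 01 -> stay in B1
b1_star = [k for k in keys if k % 4 == 0b11]   # 2 LSBs == 11 -> go to B1*
print(b1, b1_star)   # [1] [7, 23]
```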
put(18, V): H(18) = 18 = 00010010, D = 18%4 = 2 => directory 10, which points to bucket B2. B2 is full,
and we will split B2.
Current global depth is 2 and local depth at B2 is also 2. So when B2 splits and we start using 3
bits from the hash value, we can’t really do that without doubling the directory size from 4 to 8
and thus incrementing I to 3.
A new split image of B2 is created as B2*. Pointers in directory locations are stored as explained
before. Note the multiple pointers to buckets B0, B1, and B3. These buckets aren’t yet split and
still at local depth of 2. But because directory has doubled and I has gone up by 1 bit, we need
these additional pointers.
The fundamental concepts behind put() are:
If the target bucket B has space, store the item there. Simple and sweet!
If the target bucket B is full:
o If the local depth J of B is less than global depth I, then split the bucket but there is no
need to double the directory as the directory is already prepared for the split (as J < I).
This is actually the case of multiple pointers to a single bucket.
o If the local depth J of B is equal to global depth I, then double the directory (equivalent to
incrementing I), and split the bucket. Because J==I, there is no way for us to create a split
and increment J without incrementing I. So the directory has to be doubled to
accommodate the split.
o In any case when the bucket B is split, its local depth J is always incremented by 1, and
this will also be the local depth of split image B*.
If buckets B0, B1, or B3 reach a split point, the directory size will still remain 8, since the local
depth at these buckets is less than the global depth.
However if buckets B2 or B4 reach a split point, then it would require incrementing I since the
local depth is already equal to global depth and the current state of directory will not be able to
manage the split. The directory will be doubled to 16 locations and I will be 4.
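The put() rules above can be condensed into a minimal, illustrative implementation (the class and method names are my own; H(k) = k and bucket capacity 2, as in the running example — a sketch, not production code):

```python
BUCKET_CAP = 2                                  # capacity from the example

class Bucket:
    def __init__(self, depth):
        self.depth = depth                      # local depth J
        self.items = {}                         # key -> value

class ExtendibleHash:
    def __init__(self):
        self.i = 1                              # global depth I
        self.dirs = [Bucket(1), Bucket(1)]      # 2**I directory pointers

    def get(self, key):
        return self.dirs[key % (1 << self.i)].items.get(key)

    def put(self, key, value):
        b = self.dirs[key % (1 << self.i)]      # index directory by I LSBs
        if key in b.items or len(b.items) < BUCKET_CAP:
            b.items[key] = value                # room available: just store
            return
        if b.depth == self.i:                   # J == I: directory must double
            self.dirs += self.dirs              # both halves reuse old pointers
            self.i += 1
        # Split b into b and its split image, using one more bit (J+1).
        b.depth += 1
        image = Bucket(b.depth)
        for d, ptr in enumerate(self.dirs):     # redirect entries whose new bit is 1
            if ptr is b and (d >> (b.depth - 1)) & 1:
                self.dirs[d] = image
        old, b.items = b.items, {}
        for k, v in old.items():                # rehash b's contents: b vs image
            self.dirs[k % (1 << self.i)].items[k] = v
        self.put(key, value)                    # retry the insert (may split again)
```

Running the example sequence put(4), put(1), put(7), put(2), put(22), put(23), put(18) on this sketch ends with I = 3 and the same bucket contents as the walkthrough: B0=[4], B1=[1], B1*=[7,23], B2=[2,18], B2*=[22].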