Data Structures Notes Unit IV
I B.TECH – II Semester
Unit-III: Trees
Introduction, Binary Trees, Binary Tree Traversals, Additional Binary Tree Operations,
Binary Search Trees, Counting Binary Trees, Optimal Binary Search Trees, AVL Trees, B-
Trees: B-Trees, B+ Trees.
***
UNIT-I
Introduction
Algorithm Specification, Performance analysis, Performance Measurement, Arrays: Arrays,
Dynamically Allocated Arrays. Structures and Unions. Sorting: Motivation, Quick sort, How
fast can we sort, Merge sort, Heap sort.
ALGORITHM
An algorithm is a finite set of instructions that, if followed, accomplishes a particular task. For example, an algorithm to add two numbers:
Step 1: START
Step 2: READ x, y
Step 3: sum ← x + y
Step 4: WRITE sum
Step 5: STOP
The study of algorithms includes four important active areas:
a) How to devise algorithms: Algorithms are designed by using design strategies like the Divide-and-Conquer strategy, the Greedy method, Dynamic programming, Branch and bound etc.
b) How to validate algorithms: Once an algorithm is devised, show that it computes the correct answer for all legal inputs.
c) How to analyze algorithms: Determine how much computing time and storage an algorithm requires.
d) How to test a program: Testing of a program consists of two phases: debugging and profiling.
Debugging is the process of executing programs on sample data sets to determine whether faulty results occur and, if so, to correct them. Profiling/Performance measurement is the process of executing a correct program on data sets and measuring the time and space it takes to compute the results.
ALGORITHM TYPES:
In general, the steps in an algorithm can be divided into three basic categories as:
a) Sequence algorithm
b) Selection algorithm
c) Iteration algorithm
a) Sequence algorithm:
A sequence algorithm is a series of steps in sequential order without any break. Here,
instructions are executed from top to bottom without any disturbances.
b) Selection algorithm:
A selection algorithm chooses between two or more alternative paths of steps based on a condition. Example: finding the maximum of three numbers:
Step 1: START
Step 2: READ x, y and z values
Step 3: IF x > y AND x > z THEN
Max ← x
ELSEIF y > z THEN
Max ← y
ELSE
Max ← z
ENDIF
Step 4: WRITE Max
Step 5: STOP
c) Iteration algorithm:
An iteration algorithm repeats a set of steps while a condition holds. Example: reversing the digits of a number:
Step 1: START
Step 2: READ n value
Step 3: rev ← 0
Step 4: Repeat WHILE n > 0
k ← n MOD 10
rev ← rev * 10 + k
n ← n / 10
EndRepeat
Step 5: WRITE rev
Step 6: STOP
RECURSIVE ALGORITHMS
A function calls itself is known as recursion and the function is called as recursive
function. The main advantage of recursion concept is to reduce length of the code.
Example: sum()
{
----
----
sum();
----
----
}
1. Direct recursion: A function calls itself directly, as in the sum() skeleton above.
2. Indirect recursion: A function calls another function, which in turn calls the initial function; this is known as indirect recursion.
While designing a recursive procedure, we must place two conditions, namely a base condition and a recursive condition. The base condition is the condition that stops further recursive calls, so the function does not fall into an infinite loop; the recursive condition is the condition that makes the recursive call. Both conditions use a return statement associated with if / if-else statements.
Examples:
Fact(N) : IF N = 1 THEN
RETURN 1
ELSE
RETURN N*Fact(N-1)
ENDIF
Fib(N) : IF N = 0 OR N = 1 THEN
RETURN N
ELSE
RETURN Fib(N-1) + Fib(N-2)
ENDIF
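These two definitions translate directly into C; a minimal sketch (function names follow the pseudocode above):
#include <stdio.h>

/* Recursive factorial: base condition N <= 1, recursive condition N * Fact(N-1) */
long Fact(int N)
{
    if (N <= 1)                     /* base condition */
        return 1;
    return N * Fact(N - 1);         /* recursive condition */
}

/* Recursive Fibonacci: base conditions N == 0 and N == 1 */
long Fib(int N)
{
    if (N == 0 || N == 1)
        return N;
    return Fib(N - 1) + Fib(N - 2);
}

int main(void)
{
    printf("Fact(5) = %ld\n", Fact(5));  /* prints 120 */
    printf("Fib(6)  = %ld\n", Fib(6));   /* prints 8 */
    return 0;
}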
Another classic recursive definition is Ackermann's function A(m, n):
A(m, n) = n + 1                    , if m = 0
A(m, n) = A(m-1, 1)                , if n = 0
A(m, n) = A(m-1, A(m, n-1))        , otherwise.
In towers of Hanoi problem, three towers A, B and C are given. Consider tower A as source,
B as intermediate and C as destination towers. A finite number of N disks are arranged on the
source tower A in decreasing order of their size from bottom to top.
The main objective of this problem is to move these disks from source tower to
destination tower without violating the condition “decreasing order of their size from
bottom to top” and move only one disk at a time.
[Diagram: towers A (Source), B (Intermediate) and C (Destination); disks 1 and 2 start on tower A and finish on tower C]
Let N = 1; then the processing operation is: A→C.
In general, if the tower consists of N disks, then the problem can be solved by performing 2^N − 1 move operations.
Algorithm TOWER(N, BEG, AUX, END): This procedure moves N disks from tower BEG to tower END, using tower AUX as the intermediate tower.
Step 1: IF N = 1 THEN
WRITE BEG → END
RETURN
ENDIF
Step 2: Call TOWER(N-1, BEG, END, AUX)
WRITE BEG → END
Call TOWER(N-1, AUX, BEG, END)
Step 3: RETURN
Trace for N = 3, i.e., Tower(3, A, B, C):
Tower(2, A, C, B):
    Tower(1, A, B, C) = A→C
    A→B
    Tower(1, C, A, B) = C→B
Tower(3, A, B, C) move: A→C
Tower(2, B, A, C):
    Tower(1, B, C, A) = B→A
    B→C
    Tower(1, A, B, C) = A→C
Move sequence: A→C, A→B, C→B, A→C, B→A, B→C, A→C (7 = 2^3 − 1 moves).
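A minimal C sketch of the TOWER procedure (tower names are passed as characters; chosen here for illustration):
#include <stdio.h>

/* Move N disks from tower BEG to tower END using AUX as the intermediate */
void Tower(int N, char BEG, char AUX, char END)
{
    if (N == 1)
    {
        printf("%c -> %c\n", BEG, END);
        return;
    }
    Tower(N - 1, BEG, END, AUX);    /* move N-1 disks to the intermediate tower */
    printf("%c -> %c\n", BEG, END); /* move the largest disk to the destination */
    Tower(N - 1, AUX, BEG, END);    /* move N-1 disks on top of the largest disk */
}

int main(void)
{
    Tower(3, 'A', 'B', 'C');        /* prints the 2^3 - 1 = 7 moves traced above */
    return 0;
}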
The time complexity of an algorithm is the amount of computer time it needs to run
for its completion. The space complexity of an algorithm is the amount of memory it needs
to run for its completion.
These complexities are calculated based on the size of the input. With this, analysis
can be divided into three cases as:
Best case analysis: In the best case, the problem statement takes the minimum number of computations for the given input parameters.
Worst case analysis: In the worst case, the problem statement takes the maximum number of computations for the given input parameters.
Average case analysis: In the average case, the number of computations lies between the best case and the worst case for the given input parameters.
SPACE COMPLEXITY
The process of estimating the amount of memory space to run for its completion is
known as space complexity.
Space complexity S(P) of any problem P is sum of fixed space requirements and
variable space requirements as:
1. Fixed space that is independent of the characteristics (Ex: number, size) of the input
and outputs. It includes the instruction space, space for simple variables and fixed-
size component variables, space for constants and so on.
2. Variable space that consists of the space needed by component variables whose size is
dependent on the particular problem instance being solved, the space needed by the
referenced variables and the recursion stack space.
When analyzing the space complexity of any problem statement, concentrate solely
on estimating the variable space requirements. First determine which instance characteristics
to use to measure the space requirements. Hence, the total space requirement S(P) of any
program can be represented as:
S(P) = C + SP(I)
Where,
C is a constant representing the fixed space requirements and I refers to the
instance characteristics.
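For instance, the kind of iterative summing function that the following analysis assumes can be sketched in C as (treating the list array as occupying n words, as the notes do):
float sum(float list[], int n)
{
    float s = 0;   /* one word for s                                   */
    int i;         /* one word for i; one word for n; n words for list */
    for (i = 0; i < n; i++)
        s = s + list[i];
    return s;
}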
Here, the instance characteristic is n. The variable terms are the list array, n, i and s: the list array is counted as n words, and n, i and s take one word each.
Therefore Ssum(n) = n + 3.
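The recursive version that the next analysis refers to can be sketched as (a minimal illustration; the name rsum is assumed here):
float rsum(float list[], int n)
{
    /* each call needs space for n, the return address, and list (three words) */
    if (n <= 0)
        return 0;
    return rsum(list, n - 1) + list[n - 1];
}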
Here, the instance characteristic is n. Each recursive call requires space for the formal parameters, the local variables and the return address: one word for n, one word for the return address and one word for the address of the list[] array. The function is invoked n + 1 times, and every call needs these three words on the recursion stack.
Therefore Srsum(n) = 3(n + 1).
TIME COMPLEXITY
The process of estimating the amount of computing time to run for its completion is
known as time complexity.
The time T(P) taken by a program P is the sum of its compile time and its run time.
Here,
Compile time is a fixed component and does not depend on the instance
characteristics. Hence,
T(P) ≥ TP(I)
Where, I refers to the instance characteristics.
In Step count method, determine the number of steps that a program or a function
needs to solve a particular instance by creating a global variable count, which has an initial
value of ‘0’. Now increment count by number of program steps required by each executable
statement. Finally add the total number of times that the count variable is incremented.
Here, the count variable is incremented twice: once for the addition operation and once for the return statement.
Therefore Tsum = 2.
Here, inside the loop the count variable is incremented two times and the loop is executed n times, giving 2n steps. Outside the loop, count is incremented by 3 steps.
Therefore Tsum = 2n + 3.
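To make the count method concrete, here is a sketch of the iterative sum function instrumented with a global count variable; the increments mirror the accounting above:
#include <stdio.h>

int count = 0;              /* global step counter */

float sum(float list[], int n)
{
    float s = 0;
    int i;
    count++;                /* step for the assignment s = 0 */
    for (i = 0; i < n; i++)
    {
        count++;            /* step for the for statement */
        s += list[i];
        count++;            /* step for the assignment inside the loop */
    }
    count++;                /* step for the last test of the for statement */
    count++;                /* step for the return statement */
    return s;
}

int main(void)
{
    float a[4] = {1, 2, 3, 4};
    sum(a, 4);
    printf("count = %d\n", count);  /* prints 2*4 + 3 = 11 */
    return 0;
}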
In the tabular method:
1. Determine the step count of each statement, known as steps/execution (s/e).
2. Note down the number of times each statement is executed, known as its frequency. The frequency of a non-executable statement is 0.
3. Multiply s/e by frequency to obtain the total steps for each statement.
4. Add these totals to obtain the step count of the entire function.
Based on the size of input requirements, complexities can be varied. Hence, exact
representation of time and space complexities is not possible. But they can be shown in some
approximate representation using mathematical notations known as asymptotic notations.
ASYMPTOTIC NOTATIONS
The function f(n) = O(g(n)) iff there exist two positive constants c and n0 such that
f(n) ≤ c * g(n) for all n, n ≥ n0.
The graphical representation, with n values on the X-axis and f(n) values on the Y-axis, is as follows:
[Graph: the curve f(n) lies below the curve c*g(n) for all n ≥ n0]
Here, the functional value f(n) always lies below the estimated functional value c*g(n). Thus, the function g(n) acts as an upper bound for the function f(n). Hence, Big 'Oh' notation is treated as an "upper bounded function".
Example: Consider f(n) = 3n + 2. Since 3n + 2 ≤ 4n for all n ≥ 2, we have
c = 4, g(n) = n and n0 = 2.
Hence, the function 3n + 2 = O(n), since there exist two positive constants 4 and 2 such that 3n + 2 ≤ 4n for all n, n ≥ 2.
Example: The function n² + n + 3 = O(n²), since there exist two positive constants 2 and 3 such that n² + n + 3 ≤ 2n² for all n, n ≥ 3.
In these complexities,
O(1) means constant
O(n) means linear
O(log n) means logarithmic
O(n²) means quadratic
O(n³) means cubic
O(2ⁿ) means exponential.
O(1) < O(log n) < O(n) < O(n log n) < O(n²) < O(n³) < · · · < O(2ⁿ).
The function f(n) = Ω(g(n)) iff there exist two positive constants c and n0 such that
f(n) ≥ c * g(n) for all n, n ≥ n0.
The graphical representation, with n values on the X-axis and f(n) values on the Y-axis, is as follows:
[Graph: the curve f(n) lies above the curve c*g(n) for all n ≥ n0, so g(n) acts as a lower bound for f(n)]
Example: Consider f(n) = 3n + 2. Since 3n + 2 ≥ 3n for all n ≥ 1, we have
c = 3, g(n) = n and n0 = 1.
Hence, the function 3n + 2 = Ω(n), since there exist two positive constants 3 and 1 such that 3n + 2 ≥ 3n for all n, n ≥ 1.
The function f(n) = Θ(g(n)) iff there exist three positive constants c1, c2 and n0 such
that c1 * g(n) ≤ f(n) ≤ c2 * g(n) for all n, n ≥ n0.
The graphical representation, with n values on the X-axis and f(n) values on the Y-axis, is as follows:
[Graph: the curve f(n) lies between c1*g(n) and c2*g(n) for all n ≥ n0, so g(n) bounds f(n) both above and below]
Example: Consider f(n) = 3n + 2. Since 3n ≤ 3n + 2 ≤ 4n for all n ≥ 2, we have
c1 = 3, c2 = 4, g(n) = n and n0 = 2.
Hence, the function 3n + 2 = Θ(n), since there exist three positive constants 3, 4 and 2 such that 3n ≤ 3n + 2 ≤ 4n for all n, n ≥ 2.
The function f(n) = o(g(n)) iff, for every positive constant c, there exists an n0 such that f(n) < c * g(n) for all n, n ≥ n0.
(OR)
f(n) = o(g(n)) iff lim n→∞ f(n) / g(n) = 0.
The function f(n) = ω(g(n)) iff, for every positive constant c, there exists an n0 such that f(n) > c * g(n) for all n, n ≥ n0.
(OR)
f(n) = ω(g(n)) iff lim n→∞ f(n) / g(n) = ∞.
PERFORMANCE MEASUREMENT
clock() function:
clock() is a library function that gives the amount of processor time used. The time is returned as a built-in type, clock_t, defined in the time.h header file. The general format of the clock() library function is:
clock_t clock(void);
To get processing time of specific events, apply the function before calling the event
as start time and after completion of the event as end time. Then subtract start time from the
end time. Here, the result is measured as internal processor time; divide it by the number of
clock ticks per second to obtain the result in seconds. In ANSI C, the ticks per second are
held in the built-in constant, CLOCKS_PER_SEC.
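A typical usage pattern, sketched for illustration (the loop stands in for the event being measured):
#include <stdio.h>
#include <time.h>

int main(void)
{
    clock_t start, end;
    double seconds;
    long i, s = 0;

    start = clock();                 /* processor time before the event */
    for (i = 0; i < 10000000L; i++)
        s += i;                      /* the event being measured */
    end = clock();                   /* processor time after the event */

    /* divide the tick difference by CLOCKS_PER_SEC to obtain seconds */
    seconds = (double)(end - start) / CLOCKS_PER_SEC;
    printf("Sum = %ld, Time = %f seconds\n", s, seconds);
    return 0;
}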
time() function:
time() is a library function that gives the current calendar time measured in seconds. The time is returned as a built-in type, time_t, defined in the time.h header file. The general format of the time() library function is:
time_t time(time_t *t);
To get processing time of specific events, apply the function before calling the event
as start time and after completion of the event as end time. Then subtract start time from the
end time.
#include<stdio.h>
#include<conio.h>
#include<time.h>
main()
{
int p[50],i,n;
double k;
time_t start,end;
clrscr();
printf("\nEnter How Many Values=");
scanf("%d",&n);
start=time(NULL);
printf("\nEnter %d Elements=",n);
for(i=1;i<=n;i++)
scanf("%d",&p[i]);
printf("\nArray Elements Are=\n");
for(i=1;i<=n;i++)
printf(" %d",p[i]);
end=time(NULL);
k=(double)(end-start);
printf("\n\nTime Duration = %lf Seconds",k);
}
ARRAYS
a) Single dimensional arrays
Let m be the size of an array; then a single dimensional array can be defined as: "A single dimensional array is a collection of m homogeneous data elements stored in m successive memory locations."
[Diagram: array K with index values 0 to 4]
b) Double dimensional arrays
Let m be the row size and n the column size; then a double dimensional array can be defined as: "A double dimensional array is a collection of m x n homogeneous data elements stored in m x n successive memory locations."
[Diagram: array K with row indexes from 0 and column indexes 0 to 3]
c) Multi-dimensional arrays
A multidimensional array uses three or more dimensions. Let m1, m2, . . . , mn be the sizes; then a multidimensional array can be defined as: "A multidimensional array is a collection of m1 x m2 x . . . x mn homogeneous data elements stored in m1 x m2 x . . . x mn successive memory locations."
For a three dimensional array, the memory allocation will be:
[Diagram: a three dimensional array K laid out plane by plane, each plane as a table of rows and columns]
Note:
In the arrays concept, specifying the array size at the time of declaration is referred to as static allocation. In most cases, static allocation does not make good use of memory. For proper utilization, it is better to allocate the required amount of memory at run time, known as dynamic memory, with the help of the dynamic memory allocation functions.
1. malloc() function: malloc() function is used to allocate memory for the variables
at run time. The general form of the malloc() function is:
ptrvariable = (casttype *) malloc (size);
Here, the malloc() function reserves a single block of memory of the specified size and returns a pointer of type void. With this, we can assign it to any type of pointer variable. By default, the allocated locations are filled with garbage values. For example, X = (int*)malloc(10); gives:
[Diagram: X points to a 10-byte block containing garbage values]
2. calloc() function: The calloc() function is used to allocate multiple blocks of memory at run time. The general form of the calloc() function is:
ptrvariable = (casttype *) calloc (n, elesize);
Where,
ptrvariable is a pointer variable of type casttype.
n represents number of blocks.
elesize represents block size.
X = (int*)calloc(5,sizeof(int));
Here, calloc() function allocates multiple blocks of storage space with each of same
size and by default all locations are initialized with ‘0’s. If there is not enough space to
allocate, then it returns a NULL pointer.
[Diagram: X points to five blocks, each initialized to 0 (10 bytes in all)]
Let m be the size of an array; then a single dimensional array can be defined as: "It is a collection of m homogeneous data elements stored in m successive memory locations." A single dimensional array of n integers can be allocated dynamically as:
K = (int*) calloc(n,sizeof(int));
/* Read and print a single dimensional array using dynamic memory allocation */
#include<stdio.h>
#include<conio.h>
main()
{
int *k,n,i;
clrscr();
printf("\nEnter How Many Elements =");
scanf("%d",&n);
k=(int*)calloc(n,sizeof(int));
printf("\nEnter %d Elements = ",n);
for(i=0;i<n;i++)
scanf("%d",k+i);
printf("\nArray Elements Are =\n");
for(i=0;i<n;i++)
printf(" %d",*(k+i));
}
Let m be the row size and n the column size; then a double dimensional array can be defined as: "It is a collection of m x n homogeneous data elements stored in m x n successive memory locations." A double dimensional array is allocated dynamically by first allocating an array of m row pointers and then allocating each row:
K = (int**)calloc(m, sizeof(int*));
for(i = 0; i < m; i++)
K[i] = (int*)calloc(n, sizeof(int));
#include<stdio.h>
#include<conio.h>
main()
{
int *k[3],n,i,j;
clrscr();
printf("\nEnter How Many Columns =");
scanf("%d",&n);
for(i=0;i<3;i++)
k[i]=(int*)calloc(n,sizeof(int));
printf("\nEnter Array Elements=");
for(i=0;i<3;i++)
{
for(j=0;j<n;j++)
scanf("%d",*(k+i)+j);
}
printf("\nArray Elements Are=\n\n");
for(i=0;i<3;i++)
{
printf("\n");
for(j=0;j<n;j++)
printf(" %d",*(*(k+i)+j));
}
}
STRUCTURES
Structure is a user defined data type and can be defined as “It is a collection of non-
homogeneous / heterogeneous / different data elements that can be grouped together under a
common name”.
Declaration of a structure does not reserve any storage space. Memory is allocated only at the time of defining a structure variable. The general format of defining a structure variable is:
struct tagname variable;
Example: struct Book B1;
[Diagram: structure variable B1 with its members BName and Price stored in successive bytes]
‘.’ Dot operator is used to access members of the structure with its structure variable.
Here dot operator is also known as member operator (or) period operator. It forms link
between structure member and structure variable. The general format of accessing a structure
member with structure variable is:
Syntax: structurevariable.member;
Example: B1.Pages;
#include<stdio.h>
#include<conio.h>
struct Book
{
char BName[50];
float Price;
};
main()
{
struct Book B1;
clrscr();
printf("\nEnter Titile of the Book:");
gets(B1.BName);
printf("\nEnter Cost of the Book:");
scanf("%f",&B1.Price);
printf("\nBOOK TITLE:%s",B1.BName);
printf("\nBOOK COST:%.2f RS",B1.Price);
}
Note:
1. A structure can be initialized with a list of values at the time of defining the structure variable, but individual member initialization inside the structure declaration is not possible. The general format of initializing a structure is:
struct tagname variable = { value1, value2, . . . };
Individual members can also be assigned at run time, as in:
B1.Price = 125.75;
2. Two variables of the same structure type can be copied in the same way as ordinary variables, using the assignment operator.
Example: Let s1 and s2 be two structure variables of the same structure type; then the details of s1 are copied into s2 as:
s2 = s1;
Here,
Each individual member of s1 is copied into the corresponding member of s2.
3. Direct comparison of one structure variable with another structure variable using relational operators is not possible.
i.e., Let s1 and s2 be two structure variables of the same structure type; then comparisons like
s1>s2, s1<s2, s1<=s2, s1>=s2, s1==s2 and s1!=s2 are invalid operations.
4. The "typedef" keyword allows the user to define a new name for an existing data type. typedef is very useful in the case of user-defined data types like structures. While using the typedef keyword with a structure declaration, the tag of the structure is optional. For example, after an illustrative definition such as
typedef struct { float real; float imag; } Complex;
a structure variable can be created simply as:
Complex k;
SELF-REFERENTIAL STRUCTURES
A structure in which, at least one member that points to same structure type is referred
to as a self-referential structure. Self-referential structures usually require dynamic storage
management functions to explicitly obtain and release memory. The general format of a self-referential structure is:
typedef struct node
{
int info;
struct node *link;
} Node;
A new node can then be created and filled as:
Node *New;
New = (Node*)malloc(sizeof(Node));
New -> info = 10;
New -> link = NULL;
UNION
Union is also a user-defined data type that is similar to structure. i.e., Union is a
collection of non-homogeneous / heterogeneous / different data type elements that can be
grouped together under a common name.
Members declared inside the union are known as union members (or) union elements.
Union declaration must be ended with a semicolon.
Memory is allocated for the union only at the time of creating a union variable. The general form of creating union variables is:
union tagname variable;
Example: union Account X;
[Diagram: the members of X share a common block of 10 bytes, the size of the largest member]
In a union, the compiler reserves memory only for the member that occupies the most memory, and all the members of the union share that common memory. This implies that, although a union may contain many members of different types, it can handle only one member at a time.
The dot operator is used to access a member of the union with the union variable. The general format of accessing union members with a union variable is:
Syntax: unionvariable.member;
Example: X.acbalance;
#include<stdio.h>
#include<conio.h>
union Account
{
int acno;
char actype[10];
float acbalance;
};
main()
{
union Account x;
clrscr();
printf("\nEnter Account Number:");
scanf("%d",&x.acno);
printf("\nAccount Number:%d",x.acno);
fflush(stdin);
printf("\n\n\nEnter Account Type:");
gets(x.actype);
printf("\nAccount Type:%s",x.actype);
printf("\n\n\nEnter Balance Amount:");
scanf("%f",&x.acbalance);
printf("\nBalance Amount:%.2f RS",x.acbalance);
}
Note:
1. The main difference between a structure and a union is in terms of their storage space. In a structure, each member has its own storage location, whereas in a union all members share a common memory. Hence, the main advantage of a union is that it saves memory compared to a structure with the same members.
2. A union can be included within another union. In addition, a union may be a member of a structure, and a structure may be a member of a union. These concepts are referred to as union of structures and structure of unions.
Example:
struct demo1        union demo1
{                   {
    char x;             char x;
    int y;              int y;
} K;                } K;
Here, the structure variable K occupies storage for both members, while the union variable K occupies only the storage of its largest member.
SEARCHING
Searching refers to the process of checking whether a given element is present in the
list of elements or not.
In this procedure, consider an array with n elements. A specific element 'key' is given to search. Now we want to find whether the key element is available in the list of n elements or not. If the search item key exists, it refers to a successful search, i.e., element found; otherwise, it refers to an unsuccessful search, i.e., element not found.
Most important techniques used for search operation are:
1. Linear search
2. Binary search
3. Fibonacci search
4. Interpolation search etc.
In the linear search technique, compare the key element with each element of K from index 1 to index n. At any position i, if K[i] = Key, then return the index i, which refers to a successful search (Element Found). If K[i] ≠ Key for the entire procedure and no elements remain for further comparison, then return -1, which refers to an unsuccessful search (Element Not Found). On average, the search examines half of the list, so the functional procedure can be placed as:
f(n) = (1 + 2 + · · · + n) / n
     = (n + 1) / 2
     = O(n)
Therefore, the worst case and average case time complexity of linear search is O (n).
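A C sketch of the linear search procedure described above, using 0-based indexing for illustration:
#include <stdio.h>

int lsearch(int K[], int n, int Key)
{
    int i;
    for (i = 0; i < n; i++)
        if (K[i] == Key)
            return i;   /* successful search: element found at index i */
    return -1;          /* unsuccessful search: element not found */
}

int main(void)
{
    int K[5] = {10, 47, 28, 32, 70};
    printf("%d\n", lsearch(K, 5, 28));  /* prints 2 */
    return 0;
}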
2. BINARY SEARCH
Binary search is another searching technique that takes less time than the linear search technique. Binary search can be applied only to array elements that are available in sorted order.
Consider K, an array consisting of n elements in sorted order such that K[1] ≤ K[2] ≤ . . . ≤ K[n]. Suppose an item of information is given to search in the variable Key. Then the binary search procedure computes:
Mid ← (Low + High) / 2
Where, Low refers to the first index and High refers to last index of the array at the
initial call. Now, the process falls into any one of the following three cases.
Case 1: If Key = K[Mid]; Then the search is successful search i.e., Element Found.
Case 2: If Key > K[Mid]; Then the Key element can appear only in the right half of
the array. So, we reset the Low value as Low = Mid+1 and begin search
again.
Case 3: If Key < K[Mid]; Then the Key element can appear only in the left half of the
array. So, we reset the High value as High = Mid-1 and begin search again.
The above procedure is repeated until we reach Low > High. When this condition occurs, it indicates that the search is unsuccessful, i.e., Element Not Found.
The complexity is measured by the number of comparisons to locate the search item
in the given array elements.
In binary search, each comparison reduces the size of the array by half, so the number of comparisons is far less than in linear search. Hence, the worst case and average case time complexity of binary search is O(log n).
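A C sketch of the three-case procedure above (0-based indexing; returns -1 when Low > High):
#include <stdio.h>

int binsearch(int K[], int n, int Key)
{
    int Low = 0, High = n - 1, Mid;
    while (Low <= High)
    {
        Mid = (Low + High) / 2;
        if (Key == K[Mid])
            return Mid;        /* Case 1: element found */
        else if (Key > K[Mid])
            Low = Mid + 1;     /* Case 2: search the right half */
        else
            High = Mid - 1;    /* Case 3: search the left half */
    }
    return -1;                 /* Low > High: element not found */
}

int main(void)
{
    int K[5] = {10, 12, 13, 16, 18};
    printf("%d\n", binsearch(K, 5, 16));  /* prints 3 */
    return 0;
}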
3. INTERPOLATION SEARCH
Interpolation search is another searching technique that can be used only when the list of elements is ordered and uniformly distributed.
The interpolation search is an improvement over binary search for instances, where
the values in a sorted array are uniformly distributed. Binary search always goes to the
middle element to check. On the other hand, interpolation search may go to different
locations according to the value of key being searched.
Let K is an array that consists of n elements from index 0 to n-1. Search element is
given in the variable Key. Then interpolation search procedure works as:
Step 1: In a loop, calculate the value of "Pos" using the formula
Pos = Low + ((Key − K[Low]) * (High − Low) / (K[High] − K[Low]))
Where, Low refers to starting index 0 and High refers to ending index n-1 at the first
call.
Step 2:Compare K[Pos] element with Key element. If it is a match, return Pos
position that refers to element found.
Step 3:If the Key element is less than K[Pos] element, calculate the next position at
the left sub-array by changing High value as High = Pos – 1.
Step 4: If the Key element is greater than K[Pos] element, calculate the next position
at the right sub-array by changing Low value as Low = Pos + 1.
Step 5:Repeat the above procedure until the search element is found or the sub-array
reduces to zero.
#include<stdio.h>
#include<conio.h>
int search(int[],int,int);
main()
{
int K[5]={10,12,13,16,18},n,Key,Pos;
clrscr();
printf("\nEnter the Search Element=");
scanf("%d",&Key);
Pos=search(K,5,Key);
if(Pos==-1)
printf("\nElement Not Found");
else
printf("\nElement Found");
}
int search(int K[5],int n,int Key)
{
int Pos;
int Low=0,High=n-1;
while(Low<=High)
{
if(Low==High)
{
if(K[Low]==Key)
return Low;
else
return -1;
}
Pos=Low+((Key-K[Low])*(High-Low)/(K[High]-K[Low]));
if(K[Pos]==Key)
return Pos;
else if(K[Pos]<Key)
Low=Pos+1;
else
High=Pos-1;
}
return -1;
}
SORTING
Sorting refers to the arrangement of data items either in the ascending (increasing)
order or in the descending (decreasing) order. Some of the most important sorting
techniques are:
a) Bubble Sort
b) Insertion Sort
c) Selection Sort
d) Quick Sort
e) Merge Sort etc.,
Sorting techniques are classified into two types as: Internal sorting techniques and
External sorting techniques.
Sorting that is performed entirely in main memory is called internal sorting, i.e., in an internal sort, all the data is held in primary memory during the sorting process. Internal sorting techniques are used to handle small amounts of data.
Examples: Bubble Sort, Insertion Sort, Selection Sort, Quick Sort etc.
Sorting that is performed with the interaction of secondary storage devices like disks or tapes is called external sorting, i.e., in an external sort, part of the data is in primary memory during the sorting process and the remaining data, which does not fit in primary memory, is in secondary storage. External sorting techniques are used to handle large volumes of data.
1) INSERTION SORT
In insertion sort, at the end of each pass one more data item is placed into its sorted position. Let K be an array that consists of n elements from index 1 to index n. Sorting refers to the process of rearranging the given elements of K in ascending order such that: K[1] ≤ K[2] ≤ . . . ≤ K[n]. For this, the insertion sort procedure works as:
Step 1: Select the second data item and compare it with the first data item. If the second data
item is less than the first data item, then insert it before the first data item. Otherwise,
proceed with the next step.
Step 2: Select the third data item and compare it with the second data item. If it is less than
the second data item then compare it with the first data item. If it is less than the first
data item then insert it before the first data item. Otherwise, insert it in between the
first data item and the second data item.
Step 3: Repeat the above steps for n-1 passes. At the end of the (n-1)th pass, all the elements of the array are available in sorted order.
The body of the procedure, with K[0] = -999 used as a sentinel (as in the trace below), is:
for (i = 2; i <= n; i++)
{
temp = K[i];
j = i - 1;
while (temp < K[j])
{
K[j+1] = K[j];
j = j - 1;
}
K[j+1] = temp;
}
return;
}
Example: Sort the following elements using insertion sort.
12 11 16 20 19
Pass 1: temp = 11
11 < 12 TRUE
-999 12 12 16 20 19
11 < -999 FALSE
-999 11 12 16 20 19
Pass 2: temp = 16
16 < 12 FALSE
-999 11 12 16 20 19
Pass 3: temp = 20
20 < 16 FALSE
-999 11 12 16 20 19
Pass 4: temp = 19
19 < 20 TRUE
-999 11 12 16 20 20
19 < 16 FALSE
-999 11 12 16 19 20
Insertion sort uses two loops. The outer loop executes n-1 times. For each outer-loop iteration, the inner loop executes between 0 and i times, depending on the relationship between the temp value and the values being compared. On average, the inner loop scans half of the sorted part of the list. Hence, the function f(n) becomes
f(n) = n((n + 1) / 2)
     = n²/2 + n/2
     = O(n²)
The worst case and average case time complexity of insertion sort is O(n²).
List insertion sort is a slight variation of the insertion sort procedure. In this variation, the elements are represented as a linked list rather than an array and are kept in sorted order. Initially the procedure starts with an empty list, and each new element is inserted into its proper place in the sorted list.
The basic operation used for inserting a new element into the sorted list is a pointer change: find the first node whose value exceeds the new value and link the new node before it.
Here, once the position is found, the new element can be linked into its appropriate place with O(1) pointer changes, using the comparison procedure of insertion sort to find the position.
[Diagram: sorted list START → (10) → (27) → (49) → (78) → NULL with END at 78; a new node New containing 36 is linked in between 27 and 49]
Binary insertion sort uses the binary search procedure to find the proper location to insert the new element in every pass. It reduces the number of comparisons compared to normal insertion sort: in the i-th pass, finding the insertion position takes only O(log i) comparisons. A program implementation of the binary insertion sort procedure is as follows:
#include<stdio.h>
#include<conio.h>
void sort(int[],int);
int search(int[],int,int,int);
main()
{
int k[3]={6,2,10},i;
clrscr();
printf("\nBefore sorting elements are=\n");
for(i=0;i<3;i++)
printf("\t%d",k[i]);
sort(k,3);
printf("\nAfter sorting elements are=\n");
for(i=0;i<3;i++)
printf("\t%d",k[i]);
}
void sort(int k[3],int n)
{
int i,j,select,loc;
for(i=1;i<n;i++)
{
j=i-1;
select=k[i];
loc=search(k,0,j,select);
while(j>=loc)
{
k[j+1]=k[j];
j=j-1;
}
k[j+1]=select;
}
}
/* returns the position where select should be inserted in k[low..high] */
int search(int k[],int low,int high,int select)
{
int mid;
if(low>high)
return low;
mid=(low+high)/2;
if(select<k[mid])
return search(k,low,mid-1,select);
else
return search(k,mid+1,high,select);
}
2) QUICK SORT
Let K be an array that consists of n elements from index 1 to index n. Sorting refers to the process of rearranging the given elements of K in ascending order such that: K[1] ≤ K[2] ≤ . . . ≤ K[n]. For this, the quick sort procedure works as:
Select the first element K[Low] as the pivot element. Set i = Low and j = High + 1. Repeatedly increment i until K[i] ≥ pivot and decrement j until K[j] ≤ pivot; whenever i < j, interchange K[i] and K[j] and continue. When i > j, interchange the pivot element K[Low] with K[j].
The above process refers to one pass. At the end of the pass, the pivot element is
positioned at its sorted position. At this stage, the elements before the pivot element are less
than or equal to pivot element and after the pivot element are greater than or equal to the
pivot element.
Now, the same procedure is repeated on the elements before the pivot element as well
as on the elements after the pivot element.
When all passes are completed, then list of array elements are available in sorted
order.
Example: Sort the elements 12, 9, 17, 16, 94 using quick sort.
Pass 1:
K[1:5] 12 9 17 16 94
pivot = 12
i=1
j=6 1<6 TRUE
i=2 9 ≥ 12 FALSE
i=3 17 ≥ 12 TRUE
j=5 94 ≤ 12 FALSE
j=4 16 ≤ 12 FALSE
j=3 17 ≤ 12 FALSE
j=2 9 ≤ 12 TRUE
Here, i > j (3 > 2) TRUE
Interchange pivot 12 & 9
K[1:5] = 9 12 17 16 94
The list now splits into K[1:1] and K[3:5].
Pass 2:
K[3:5] = 17 16 94
pivot = 17
i =3
j=6 3<6 TRUE
i=4 16 ≥ 17 FALSE
i=5 94 ≥ 17 TRUE
j=5 94 ≤ 17 FALSE
j=4
Here, i>j (5 > 4) TRUE
Interchange 16 & 17
K[3:5] = 16 17 94
K[1:5] = 9 12 16 17 94
ALGORITHM
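The algorithm body is not reproduced in these notes; the following C sketch is consistent with the partition process traced above (first element as pivot, indexes i and j scanning inwards):
#include <stdio.h>

void QSort(int K[], int Low, int High)
{
    int pivot, i, j, t;
    if (Low >= High)
        return;
    pivot = K[Low];            /* first element as the pivot */
    i = Low;
    j = High + 1;
    for (;;)
    {
        do { i++; } while (i <= High && K[i] < pivot); /* stop at K[i] >= pivot */
        do { j--; } while (K[j] > pivot);              /* stop at K[j] <= pivot */
        if (i >= j)
            break;
        t = K[i]; K[i] = K[j]; K[j] = t;  /* interchange K[i] and K[j] */
    }
    t = K[Low]; K[Low] = K[j]; K[j] = t;  /* pivot goes to its sorted position */
    QSort(K, Low, j - 1);    /* sort the elements before the pivot */
    QSort(K, j + 1, High);   /* sort the elements after the pivot */
}

int main(void)
{
    int K[5] = {12, 9, 17, 16, 94}, i;
    QSort(K, 0, 4);
    for (i = 0; i < 5; i++)
        printf("%d ", K[i]);   /* prints 9 12 16 17 94 */
    return 0;
}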
1. The worst case time complexity of quick sort is O(n²). It occurs when the list of elements is already in sorted order.
Hence, f(0) = 0
f(1) = 0.
f(n) = f(n-1) + n
f(n-1) = f(n-1-1) + n – 1 = f(n-2) + n – 1
f(n-2) = f(n-2-1) + n – 2 = f(n-3) + n – 2
.
.
.
f(1) = f(0) + 1
f(0) = 0
Therefore,
f(n) = n + (n-1) + (n-2) + · · · + 1 + 0
     = n(n+1) / 2
     = (n² + n) / 2
     = n²/2 + n/2
     = O(n²)
Hence,
The worst case time complexity of quick sort is O(n²) time.
2. The average case time complexity of quick sort is O(n log n), which is less than the worst case time complexity.
In this case, the pivot element lands at an appropriate location near the middle and splits the array into two roughly equal sub arrays. From this, the recurrence relation is given by the function
f(n) = 2 f(n/2) + c·n
which solves to O(n log n). The average case time complexity of quick sort is O(n log n) time.
EXTERNAL SORTING
In external sorting process, some part of the data is held in primary memory during
the sorting process and the remaining data is in secondary storage devices that do not fit in
the primary memory. In general, external sorting techniques are used to handle large volume
of data.
One of the most important external sorting techniques is merge sort technique.
Merging is the process of combining two sorted files data into a single sorted file.
While performing merging operation, the two sub lists must be in sorted order.
Example:
File-1: 1 3 5
File-2: 2 4 6 8
MERGE → File-3: 1 2 3 4 5 6 8
Here, File-1 and File-2 are merged into File-3. For this, compare the first value of File-1 with the first value of File-2. Here 1 < 2, so the value 1 of File-1 is written into the output file File-3. Now compare the next value of File-1 with the current File-2 value, and so on. Repeat the same procedure until File-1 and File-2 are empty. At this stage, the entire data is available in File-3 in sorted order.
For unordered file data, apply the merge sort technique, which internally uses this merge process. The different variations of merge sort are recursive (simple) merge sort and iterative merge sort.
First divide the array elements into two sub arrays based on
Mid = (Low + High) / 2
Where,
Low is the first index of the array and High is the last index of the array.
Once, the sub arrays are formed, each set is individually sorted and the resulting sub
sequences are merged to produce a single sorted sequence of data elements.
Divide-and-Conquer strategy is applicable as splitting the array into sub arrays; and
combining operation is merging the sub arrays into a single sorted array.
Merging is the process of combining two sorted lists into a single sorted list. While
performing merging operation, the two sub lists must be in sorted order.
Given K[1:5] = 12 9 17 16 94
Then, K[1:5] is split into two sub arrays: K[1:3] and K[4:5]
Consider K[1:3] = 12 9 17
Then, K[1:3] is split into two sub arrays: K[1:2] and K[3:3]
Consider K[1:2] = 12 9
Then, K[1:2] is split into two sub arrays: K[1:1] and K[2:2]
Apply the Merge operation on K[1:1] and K[2:2]; it produces a sorted list K[1:2] as
K[1:2] = 9 12
Apply the Merge operation on K[1:2] and K[3:3]; it produces a sorted list K[1:3] as
K[1:3] = 9 12 17
Consider K[4:5] = 16 94
Then, K[4:5] is split into two sub arrays: K[4:4] and K[5:5]
Apply the Merge operation on K[4:4] and K[5:5]; it produces a sorted list K[4:5] as
K[4:5] = 16 94
Apply the Merge operation on K[1:3] and K[4:5]; it produces a sorted list K[1:5] as
K[1:5] = 9 12 16 17 94
ALGORITHM
MSort(Low, High): Let K be an array that consists of n elements from index 1 to index n. Low refers to the first index 1 and High refers to the last index n at the initial call. This procedure sorts the elements of K in ascending order.
Step 1: IF Low < High THEN
Mid ← (Low + High) / 2
Call MSort(Low, Mid)
Call MSort(Mid+1, High)
Call Merge(Low, Mid, High)
ENDIF
Step 2: RETURN
Merge(Low, Mid, High): This procedure merges the two sub sorted arrays into a single
sorted array.
Step 1: h ← Low
i ← Low
j ← Mid+1
Step 2: Repeat WHILE h ≤ Mid AND j ≤ High
IF K[h] ≤ K[j] THEN
S[i]←K[h]
h ← h+1
ELSE
S[i]←K[j]
j ← j+1
ENDIF
i ← i+1
EndRepeat
Step 3: IF h > Mid THEN
Repeat FOR p ← j TO High DO STEPS BY 1
S[i]←K[p]
i ←i+1
EndRepeat
ELSE
Repeat FOR p ← h TO Mid DO STEPS BY 1
S[i]←K[p]
i ←i+1
EndRepeat
ENDIF
Step 4: Repeat FOR p ← Low TO High DO STEPS BY 1
K[p] ← S[p]
EndRepeat
Merge sort makes several passes over the input. The first pass merges segments of size 1, the second pass merges segments of size 2, and the i-th pass merges segments of size 2^(i-1). Thus, the total number of passes is ⌈log₂ n⌉.
Each pass requires O(n) time for merging. For ⌈log₂ n⌉ passes, the total computing time becomes O(n log n) time.
The worst case and average case time complexity of merge sort is O(n log n) time.
Note:
The main disadvantage of merge sort is its storage representation. In merge sort
technique, merge process required an auxiliary (temporary) array which has same size as the
original array. Hence, it requires more space compared to other sorting techniques.
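For reference, the MSort / Merge pair can be sketched in C as follows (0-based indexes; the auxiliary array S has the same size as the input, as the note above describes):
#include <stdio.h>
#define MAX 100
int S[MAX];   /* auxiliary array of the same size as the input */

void Merge(int K[], int Low, int Mid, int High)
{
    int h = Low, i = Low, j = Mid + 1, p;
    while (h <= Mid && j <= High)              /* merge the two sorted halves */
        S[i++] = (K[h] <= K[j]) ? K[h++] : K[j++];
    while (h <= Mid)  S[i++] = K[h++];         /* copy any leftover left half  */
    while (j <= High) S[i++] = K[j++];         /* copy any leftover right half */
    for (p = Low; p <= High; p++)
        K[p] = S[p];                           /* copy the merged run back */
}

void MSort(int K[], int Low, int High)
{
    int Mid;
    if (Low < High)
    {
        Mid = (Low + High) / 2;
        MSort(K, Low, Mid);        /* sort the left half  */
        MSort(K, Mid + 1, High);   /* sort the right half */
        Merge(K, Low, Mid, High);  /* merge the two sorted halves */
    }
}

int main(void)
{
    int K[5] = {12, 9, 17, 16, 94}, i;
    MSort(K, 0, 4);
    for (i = 0; i < 5; i++)
        printf("%d ", K[i]);  /* prints 9 12 16 17 94 */
    return 0;
}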
Iterative merge sort begins by interpreting the input list as comprised of n sorted sub-
lists, each of size 1. In the first merge pass, these sub-lists are merged by pairs to obtain n/2
sub lists, each of size 2. In the second merge pass, these n/2 sub-lists are merged by pairs to
obtain n/4 sub-lists. Each merge pass reduces the number of sub-lists by half. Merge passes are continued until only one sorted list remains.
Example: Sort the following elements using iterative merge sort:
26 , 5 , 77 , 1 , 61 , 11 , 59 , 15 , 48 , 19
For this,
26 5 77 1 61 11 59 15 48 19
5 26 1 77 11 61 15 59 19 48
1 5 26 77 11 15 59 61 19 48
1 5 11 15 26 59 61 77 19 48
1 5 11 15 19 26 48 59 61 77
3) HEAP TREE
Suppose H is a complete binary tree. Then it is termed a heap tree / max heap if it satisfies the property:
For each node N in H, the value at N is greater than or equal to the value of each of the children of N.
Example:
[Diagram: a max heap with root 95, children 84 and 48, and 76, 23 at the next level]
In addition to this, a min heap is possible, where the value at N is less than or equal to the value of each of the children of N.
Example:
[Diagram: a min heap in which every node's value is less than or equal to the values of its children]
Since a heap tree is a complete binary tree, it is convenient to represent it with a single dimensional array: the root is stored at index 1, and for the node at index i, its parent is at index i/2 and its children are at indexes 2i and 2i+1. In this case, there is no wastage of memory space between two non-null entries.
Example:
[Diagram: the max heap with root 95, children 84 and 48, and 76, 23 at the next level]
Array Representation
Index: 1  2  3  4  5
Value: 95 84 48 76 23
OPERATIONS ON HEAP TREE
Insertion into a heap tree: This operation is used to insert a new element into a
heap tree. Let K is an array that stores n elements of a heap tree. Assume an element of
information is given in the variable ‘key’ for insertion. Then insertion procedure works as:
First adjoin key value at the end of K so that still the tree is a complete binary tree, but
not necessarily a heap.
Then raise the key value to its appropriate position so that finally it is a heap tree.
The basic principle is: first add the data element at the end of the complete binary tree. Then compare it with the data in its parent node; if the new value is greater than the parent's value, interchange the two values. This procedure continues between every pair of nodes on the path from the newly inserted node towards the root node, until we get a parent whose value is greater than its child or we reach the root node.
[Diagram: example max heap before inserting a new key]
Algorithm InHeap(Key): Let K be an array that stores a heap tree with n elements. This procedure stores a new element Key into the heap tree.
Step 1: n ← n+1
Step 2: K[n] ← Key
Step 3: i←n
p ← i/2
Step 4: Repeat WHILE p > 0 AND K[p] < K[i]
Temp ← K[i]
K[i] ← K[p]
K[p] ← Temp
i←p
p ← p/2
End Repeat
Step 5: Return
[Diagram: the heap after the new key has been raised to its proper position]
Algorithm Deletion(): This procedure removes the root element of the heap tree and rearranges the remaining elements into heap tree format.
Step 1: IF n = 0 THEN
WRITE 'HEAP TREE EMPTY'
RETURN
ENDIF
Step 2: Item ← K[1]
Step 3: K[1] ← K[n]
n ← n - 1
Step 4: i ← 1
Repeat WHILE 2i ≤ n
j ← 2i
IF j < n AND K[j+1] > K[j] THEN
j ← j + 1
ENDIF
IF K[i] ≥ K[j] THEN
RETURN Item
ENDIF
Interchange K[i] and K[j]
i ← j
EndRepeat
Step 5: RETURN Item
Two important applications of heap trees are: Heap sort and Priority queue
implementations.
Heap Sort:
Heap sort is also known as Tree sort. The procedure of heap sort is as follows:
Step 1: Build a heap tree with the given set of data elements.
Step 2: a) Delete the root node from the heap and place it in last location.
b) Rebuild the heap with the remaining elements.
Step 3: Continue Step-2 until the heap tree is empty.
Example: Sort the elements 33, 14, 65, 2 and 99 using heap sort.
Note: The worst case time complexity of heap sort is O(n logn) time.
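A C sketch of this procedure, keeping the max heap in K[1..n] as in the array representation above (the helper name adjust is chosen for illustration):
#include <stdio.h>

/* Sift K[i] down within K[1..n] so the subtree rooted at i is a max heap */
void adjust(int K[], int i, int n)
{
    int j, t;
    while ((j = 2 * i) <= n)
    {
        if (j < n && K[j] < K[j + 1])
            j++;                         /* j indexes the larger child */
        if (K[i] >= K[j])
            break;                       /* heap property holds */
        t = K[i]; K[i] = K[j]; K[j] = t;
        i = j;
    }
}

void HeapSort(int K[], int n)
{
    int i, t;
    for (i = n / 2; i >= 1; i--)   /* Step 1: build the initial heap */
        adjust(K, i, n);
    for (i = n; i >= 2; i--)       /* Steps 2-3: move root to the end, re-heap */
    {
        t = K[1]; K[1] = K[i]; K[i] = t;
        adjust(K, 1, i - 1);
    }
}

int main(void)
{
    int K[6] = {0, 33, 14, 65, 2, 99}, i;  /* K[0] unused; heap in K[1..5] */
    HeapSort(K, 5);
    for (i = 1; i <= 5; i++)
        printf("%d ", K[i]);  /* prints 2 14 33 65 99 */
    return 0;
}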
END
UNIT-II
DATA STRUCTURES
A data structure is a way of organizing data elements and the relationships among them. Data structures can be classified into two types:
a) Linear data structures: A data structure is said to be linear if the values are
arranged in a sequence order i.e., in linear fashion. Here, there is a relationship between the
adjacent elements of the list.
Examples: Arrays, Stack, Queues, Linked list etc.
b) Non-linear data structures: A data structure is said to be non-linear if the values are
arranged without any sequence order i.e., in non-linear fashion. Here, there is no relationship
between the adjacent elements of the list.
Examples: Trees, Graphs, Sets, Tables etc.
Data structures are extensively used in different application areas such as:
Compiler design
Operating system
Numerical analysis
Graphics
Artificial intelligence
Simulation etc.
Abstract Data Type (ADT) is a specification for the type of values that a data type can
store and the operations that can be performed on those values. It is just a specification
without any details on the implementation part.
ARRAYS
ADT Array
{
Data members: Collection of the same type of data elements stored in successive memory locations.
Operations: insertion, deletion and display of elements (sketched below).
}
OPERATIONS ON ARRAYS
Consider a single dimensional array K that consists of n elements from index 0 to n-1.
Thus, these operations are:
Insertion Operation:
Insertion operation is used to insert a new element into the array at a specified
position. At this stage, first a specified number of elements are shift towards right hand side
location to obtain the specified position location as empty. Then insert the new element at the
specified position.
Algorithm Insertion(Item, Pos): This procedure inserts a new element Item into the specified position Pos of the array K, after shifting the elements from position Pos onwards one place to the right.
Example:
0 1 2 3 4 5 6 7
K 10 47 28 32 70
After Insertion with Item = 36:
0 1 2 3 4 5 6 7
K 10 47 36 28 32 70
Deletion Operation:
Deletion operation is used to delete a particular element from the specified position of
the given array elements. At this stage, first remove the element from the specified position
and then shift the elements towards left hand side locations of the array.
Algorithm Deletion (Pos): This procedure deletes a specified position Pos element from
the array K.
0 1 2 3 4 5 6 7
K 10 47 28 32 70
Deletion (3):
0 1 2 3 4 5 6 7
K 10 47 32 70
Deleted Element = 28
Display Operation:
Algorithm Display(): This procedure prints the elements of the array K from index 0 to index n-1.
Step 1: REPEAT FOR i ← 0 TO n-1 DO STEPS BY 1
WRITE K[i]
ENDREPEAT
Step 2: RETURN
0 1 2 3 4 5 6 7
K 10 47 28 32 70
Display(): prints 10 47 28 32 70
APPLICATIONS OF ARRAYS
Arrays are used in different application areas where same type of elements is
necessary to use its computations.
Examples: Student Roll Numbers, Names, Marks etc.
Arrays are useful to represent matrix form of the given homogeneous data elements.
ARRAY DISADVANTAGES
The arrays concept uses static allocation of memory, i.e., the memory size of the array must be declared in the declaration statement and cannot be expanded later, which leads to space complexity problems.
Insertion and deletion operations are quite time consuming, since in both cases elements must be shifted to other locations before performing the actual operation.
STACKS
A stack is a linear data structure in which an element may be inserted or deleted only
at one end, called the TOP of the stack. The diagrammatic representation of a stack is as
follows:
[Diagram: a stack of SIZE = 5 holding 10, 20, 30, 40 at indexes 0 to 3; TOP points to 40 at index 3, and both insertion and deletion take place at the TOP end]
Here, the order of deleted elements from the stack is reverse order in which they were
inserted into the stack. Hence, a stack is also known as a LIFO (Last-In-First-Out) list.
Since, the last inserted element is the first out coming element of the list.
REPRESENTATION OF STACK
a) Array Representation
Static representation of a stack uses array concept. In this case, assume S is an array
used to represent a stack with a maximum size as ‘N’. Initially no elements are available in
the stack S. As per C language array index starts from 0 th location. At this stage, a variable
TOP is set at -1 position. Then, the status of the stack is as follows:
[Diagram: empty stack S with locations 0 to N-1 and TOP = -1]
Example: Assume N = 4 and two elements 10 and 20 are inserted into the stack,
and then the status of stack is:
[Diagram: stack S holding 10 at index 0 and 20 at index 1; TOP = 1]
b) Linked Representation
Dynamic representation of a stack uses linked list concept. In general, single linked
list structure is sufficient to represent any stack.
In linked representation of a stack, each node consists of two parts namely DATA field
and LINK field. DATA field contains information part of the element and LINK field
contains address of the next node of the list. Then the linked representation of a stack is as
follows:
[Diagram: linked stack 59 → 38 → 64 → 26 → NULL, with START at the first node and TOP at the node containing 26]
ADT Stack
{
Data members: Linear collection of zero or more elements.
Operations: void push(Item): Process of inserting an element Item into
the stack.
int pop(): Process of deleting top element of the stack.
int peek(): Process of printing top element of the stack.
int isEmpty(): Process of checking whether the stack is empty
or not.
int isFull(): Process of checking whether the stack is full or
not.
void display(): Process of printing elements of the stack.
}
OPERATIONS ON STACKS
Basic operations on the stack are: push, pop, peek, empty, full, display etc.
Example: Let N = 3
[Diagram: empty stack with locations 0 to 2 and TOP = -1]
Push Operation:
The process of inserting an element into the stack is known as push operation.
At every push operation, first the variable TOP is incremented by 1 and then the new
element is inserted into top position of the stack.
While performing push operation, if empty locations are not available to insert the
element then the status of the stack is called as “STACK OVERFLOW” or “STACK FULL”.
Algorithm push(x): This procedure inserts a new element x into the top position of the stack S.
Step 1: IF TOP = N-1 THEN
WRITE 'STACK OVERFLOW'
RETURN
ENDIF
Step 2: TOP ← TOP + 1
S[TOP] ← x
Step 3: RETURN
Pop Operation:
The process of deleting an element from the stack is known as pop operation.
At every pop operation, first delete the element from the top position of the stack and then the variable TOP is decremented by 1.
While performing pop operation, if elements are not available to delete from the stack
then the status of the stack is called as “STACK UNDERFLOW” or “STACK EMPTY”.
Algorithm pop(): This function is used to delete the topmost element from the stack S.
Step 1: IF TOP = -1 THEN
WRITE 'STACK UNDERFLOW'
RETURN -1
ENDIF
Step 2: K ← S[TOP]
TOP ← TOP - 1
Step 3: RETURN K
Peek Operation:
The process of printing the topmost element of the stack is known as peek operation.
Algorithm peek(): This function returns the top element of the stack S if it exists; otherwise, it returns -1.
Step 1: IF TOP = -1 THEN
RETURN -1
ELSE
RETURN S[TOP]
ENDIF
Empty Operation:
Empty operation is used to check whether the stack is empty or not: it returns 1 if TOP = -1 and 0 otherwise.
Full Operation:
Full operation is used to check whether the stack is full with elements or not: it returns 1 if TOP = N-1 and 0 otherwise.
Display Operation:
Display operation prints the elements of the stack from the TOP position down to location 0.
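A compact C sketch of these stack operations over the array representation (S, N and TOP as above):
#include <stdio.h>
#define N 5
int S[N], TOP = -1;

void push(int x)
{
    if (TOP == N - 1) { printf("STACK OVERFLOW\n"); return; }
    S[++TOP] = x;            /* increment TOP, then insert at the top */
}

int pop(void)
{
    if (TOP == -1) { printf("STACK UNDERFLOW\n"); return -1; }
    return S[TOP--];         /* delete the top element, then decrement TOP */
}

int peek(void)    { return (TOP == -1) ? -1 : S[TOP]; }
int isEmpty(void) { return TOP == -1; }
int isFull(void)  { return TOP == N - 1; }

void display(void)
{
    int i;
    for (i = TOP; i >= 0; i--)
        printf("%d\n", S[i]);  /* print from the TOP down to location 0 */
}

int main(void)
{
    push(10); push(20); push(30);
    printf("peek = %d\n", peek());  /* 30 */
    printf("pop  = %d\n", pop());   /* 30 */
    display();                      /* 20 then 10 */
    return 0;
}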
APPLICATIONS OF STACK
Stack operations are utilized by the computer system to do various tasks. Some of the important application areas are the evaluation of arithmetic expressions and the implementation of recursion.
1) Evaluation of Arithmetic Expressions:
A mathematical expression consists of operands and operators.
Example: 2x + 3y = 74
An expression can be represented in three notations as:
Infix notation
Prefix notation
Postfix notation.
Infix Notation: In infix notation, operators are placed in between the operands of the mathematical expression. In general, ordinary mathematical expressions are represented as infix expressions.
Example: A+B
2-7+4/3
(14+25) / (9+2-4) etc.
Prefix Notation: In prefix notation, operators are placed before the operands of the
mathematical expression. Prefix notation is also known as polish notation.
Example: +AB
- + 2 14 9 etc.
Postfix Notation: In postfix notation, operators are placed after the operands of the
mathematical expression. Postfix notation is also known as reverse polish notation.
Example: AB+
5 6 2 + * 12 4 / - etc.
To evaluate an infix expression, the computer first converts the given infix expression into a postfix expression and then evaluates the postfix expression. In both stages, the stack is the important tool used to perform the task.
OPERATOR PRIORITY
Operator    Priority
^           3
*  /        2
+  -        1
Then the following procedure converts the given infix expression Q into its equivalent
postfix expression P.
Step 1: Push "(" onto the STACK, and add ")" to the end of Q.
Step 2: Scan Q from left to right and repeat Steps 3 to 6 for each element of Q until the STACK is empty:
Step 3: If an operand is encountered, add it to P.
Step 4: If a left parenthesis is encountered, push it onto the STACK.
Step 5: If an operator is encountered, then repeatedly pop from the STACK and add to P each operator on top of the STACK which has the same or higher precedence, and then push the scanned operator onto the STACK.
Step 6: If a right parenthesis is encountered, then repeatedly pop from the STACK and add to P each operator until a left parenthesis is encountered, and remove the left parenthesis.
Step 7: EXIT.
The above procedure transforms the infix expression Q into its equivalent postfix
expression P. It uses a stack to temporarily hold operators and left parenthesis.
The postfix expression P will be constructed from left to right using the operands
from Q and the operators which are removed from the stack. Procedure starts by pushing a
left parenthesis onto stack and adding a right parenthesis at the end of Q. The procedure is
completed when the stack is empty.
Procedure
Step 1: Add a right parenthesis ")" at the end of P.
Step 2: Scan P from left to right and repeat Steps 3 and 4 for each element of P until the right parenthesis is encountered.
Step 3: If an operand is encountered, push it onto the STACK.
Step 4: If an operator ⊗ is encountered, pop the top two elements A (top) and B (next-to-top) of the STACK, evaluate B ⊗ A, and push the result back onto the STACK.
Step 5: Set the result VALUE equal to the top element of the STACK.
Step 6: EXIT.
Example: Evaluate the postfix expression
5 6 2 + * 12 4 / -
Result Value = 37
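A sketch of this evaluation procedure in C for space-separated tokens (parsing details are simplified for illustration):
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>

int ST[50], TOP = -1;            /* operand stack */

void push(int x) { ST[++TOP] = x; }
int  pop(void)   { return ST[TOP--]; }

int evalPostfix(char *P)
{
    char *tok = strtok(P, " ");
    int A, B;
    while (tok != NULL)
    {
        if (isdigit((unsigned char)tok[0]))
            push(atoi(tok));     /* operand: push onto the stack */
        else
        {
            B = pop();           /* top element is the second operand */
            A = pop();
            switch (tok[0])
            {
                case '+': push(A + B); break;
                case '-': push(A - B); break;
                case '*': push(A * B); break;
                case '/': push(A / B); break;
            }
        }
        tok = strtok(NULL, " ");
    }
    return pop();                /* the result value is the top element */
}

int main(void)
{
    char P[] = "5 6 2 + * 12 4 / -";
    printf("Result Value = %d\n", evalPostfix(P));  /* prints 37 */
    return 0;
}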
2) Implementation of Recursion
A function that calls itself is called a recursive function, and the process is called recursion. In recursive procedures, the stack is the important tool used by the compiler for storing return addresses temporarily while moving control from one place to another.
In recursion procedure, statements are expanded in forward direction and values are
substituted in backward direction.
Here, Steps 1 to 4 are implemented in stack as push operations. Then subsequent pop
operations will evaluate up to step 9 and produces result value.
Stack Advantages:
Insertion and deletion operations can be made easily compared to array insertion and
deletion operations.
Stack operations can be performed very easily and efficiently.
Time complexity of push operation is O(1) time.
Time complexity of pop operation is O(1) time.
Stack Disadvantages:
A stack permits access only at one end.
[Diagram: stack holding 10, 20, 30 at indexes 0 to 2]
Now, if we want to delete the first inserted element 10, direct deletion is not possible. First delete the element 30, then 20, and then the actual required element 10. After deleting the required element 10, insert the previously deleted elements 20 and 30 back into the stack. It is a time consuming process.
***
QUEUES
A queue is a linear data structure in which elements can be inserted only at one end,
called REAR end and elements can be deleted from the other end, called FRONT end of the
list. The diagrammatic representation of a queue is as follows:
[Diagram: a queue of SIZE = 5 holding 10, 20, 30, 40; FRONT points to 10 and REAR points to 40; deletion takes place at the FRONT end and insertion at the REAR end]
Here, the order of deleted elements from the queue is the same order in which they
were inserted into the queue. Hence, a queue is also known as a FIFO (First-In-First-Out)
list. Since, the first inserted element is the first out coming element of the list.
REPRESENTATION OF QUEUES
a) ARRAY REPRESENTATION
Static representation of a queue uses array concept. In this case, assume Q is an array
used to represent a queue with a maximum size as ‘N’. Initially no elements are available in
the queue Q. At this stage, two variables FRONT and REAR set at -1 position. Then, the
status of the queue is as follows:
[Diagram: empty queue Q with locations 0 to N-1; FRONT = -1 and REAR = -1]
Example: Assume N = 4 and two elements 10 and 20 are inserted into the queue,
and then the status of queue is:
[Diagram: queue Q of SIZE = 4 holding 10 at index 0 and 20 at index 1; FRONT points to 10 and REAR points to 20]
b) LINKED REPRESENTATION
Dynamic representation of a queue uses linked list concept. In general, single linked
list structure is sufficient to represent any queue.
In linked representation of a queue, each node consists of two parts namely DATA
field and LINK field. DATA field contains information part of the element and LINK field
contains address of the next node of the list. Then the linked representation of a queue is as
follows:
[Diagram: linked queue 59 → 38 → 64 → 26 → NULL, with FRONT at the node containing 59 and REAR at the node containing 26]
ADT Queue
{
Data members: Linear collection of zero or more elements.
Operations: enqueue, dequeue, front element, rear element, empty, full and display (sketched below).
}
TYPES OF QUEUES
Depending on the way of performing operations on the queue data structure, queues
can be classified into four types as:
a) Linear Queue
b) Circular Queue
c) Deque
d) Priority Queue
a) LINEAR QUEUE
A linear queue is a linear data structure in which elements can be inserted only at one end,
called REAR end and elements can be deleted from the other end, called FRONT end of the
list. The diagrammatic representation of a linear queue is as follows:
[Diagram: linear queue with deletion at the FRONT end and insertion at the REAR end]
Basic operations on the queue are: enqueue, dequeue, front element, rear element,
empty, full, display etc.
Example: Let N = 3
[Diagram: empty queue Q with locations 0 to 2; F = -1 and R = -1]
Enqueue Operation:
The process of inserting an element into the queue is known as enqueue operation.
For every enqueue operation, first the variable REAR is incremented by 1 and then
the new element is inserted into REAR position of the queue.
While performing enqueue operation, if empty locations are not available to insert the
element then the status of the queue is called as “QUEUE OVERFLOW” or “QUEUE FULL”.
Algorithm enqueue(x): This procedure inserts a new element x into the REAR position of the queue Q.
Step 1: IF R = N-1 THEN
WRITE 'LINEAR QUEUE OVERFLOW'
RETURN
ENDIF
Step 2: IF F = -1 THEN
F ← 0
ENDIF
R ← R + 1
Q[R] ← x
Step 3: RETURN
Dequeue Operation:
The process of deleting an element from the queue is known as dequeue operation.
For every dequeue operation, first delete the element from the FRONT position of the queue and then the variable FRONT is incremented by 1.
While performing dequeue operation, if elements are not available to delete from the
queue then the status of the queue is called as “QUEUE UNDERFLOW” or “QUEUE
EMPTY”.
Algorithm dequeue(): Function is used to delete the FRONT position element
from the queue Q.
Step 1: IF F = -1 THEN
WRITE ‘LINEAR QUEUE UNDERFLOW’
RETURN -1
ENDIF
Step 2: K ← Q[F]
Step 3: IF F = R THEN
F←R←-1
ELSE
F←F+1
ENDIF
Step 4: RETURN K
Front Element Operation: This operation is used to print FRONT position element
of the Queue Q.
Step 1: IF F = -1 THEN
RETURN -1
ELSE
RETURN Q[F]
ENDIF
Rear Element Operation: This operation is used to print REAR position element
of the Queue Q.
Step 1: IF R = -1 THEN
RETURN -1
ELSE
RETURN Q[R]
ENDIF
Empty Operation: Empty operation is used to check whether the queue is empty
or not.
Step 1: IF F = -1 THEN
RETURN 1
ELSE
RETURN 0
ENDIF
Full Operation:
Full operation is used to check whether the queue is full with elements or not.
Step 1: IF R = N-1 THEN
RETURN 1
ELSE
RETURN 0
ENDIF
Display Operation:
Step 1: IF F = -1 THEN
WRITE ‘LINEAR QUEUE EMPTY’
ELSE
REPEAT FOR i ← F TO R DO STEPS BY 1
WRITE Q[i]
ENDREPEAT
ENDIF
Step 2: RETURN
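The same queue operations sketched in C over the array representation (Q, N, F and R as above):
#include <stdio.h>
#define N 5
int Q[N], F = -1, R = -1;

void enqueue(int x)
{
    if (R == N - 1) { printf("LINEAR QUEUE OVERFLOW\n"); return; }
    if (F == -1) F = 0;        /* first insertion into an empty queue */
    Q[++R] = x;
}

int dequeue(void)
{
    int k;
    if (F == -1) { printf("LINEAR QUEUE UNDERFLOW\n"); return -1; }
    k = Q[F];
    if (F == R) F = R = -1;    /* the queue becomes empty */
    else F++;
    return k;
}

void display(void)
{
    int i;
    if (F == -1) { printf("LINEAR QUEUE EMPTY\n"); return; }
    for (i = F; i <= R; i++)
        printf("%d ", Q[i]);
    printf("\n");
}

int main(void)
{
    enqueue(10); enqueue(20); enqueue(30);
    printf("%d\n", dequeue());  /* prints 10 */
    display();                  /* prints 20 30 */
    return 0;
}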
Example: Let N = 5
[Diagram: queue Q holding 30, 40, 50 at locations 2 to 4; F points to location 2 and R points to location 4]
At this stage, if we try to insert an element into the queue, it prints the message "Queue Overflow", indicating that no empty location is available at the REAR end, even though empty locations are still available at the FRONT end of the list.
From this: even though empty locations are available at the FRONT end, once REAR has reached the (N-1)th location no further insertion is possible in a linear queue.
b) CIRCULAR QUEUE
In circular queue, elements are arranged in circular fashion. Here, when REAR end
reaches to N-1th location and still empty locations are available at the FRONT end of the list
then the REAR variable is set back to the starting location. The logical view of a circular queue can be represented as:
[Diagram: the N locations of the circular queue arranged in a ring, with FRONT and REAR advancing clockwise]
Consider the basic operations on circular queue as Insertion, Deletion and Display
operations.
Algorithm Insertion(x): This procedure inserts a new element x into the REAR position of the circular queue CQ.
Step 1: IF (F = 0 AND R = N-1) OR F = R+1 THEN
WRITE 'CIRCULAR QUEUE OVERFLOW'
RETURN
ENDIF
Step 2: IF F = -1 THEN
F ← R ← 0
ELSEIF R = N-1 THEN
R ← 0
ELSE
R ← R + 1
ENDIF
Step 3: CQ[R] ← x
Step 4: RETURN
Step 1: IF F = -1 THEN
WRITE ‘CIRCULAR QUEUE UNDERFLOW’
RETURN -1
ENDIF
Step 2: K ← CQ[F]
Step 3: IF F = R THEN
F←R←-1
ELSEIF F = N-1 THEN
F←0
ELSE
F←F+1
ENDIF
Step 4: RETURN K
Step 1: IF F = -1 THEN
WRITE ‘CIRCULAR QUEUE EMPTY’
ELSE
IF F ≤ R THEN
REPEAT FOR i ← F TO R DO STEPS BY 1
WRITE CQ[i]
ENDREPEAT
ELSE
REPEAT FOR i ← F TO N-1 DO STEPS BY 1
WRITE CQ[i]
ENDREPEAT
REPEAT FOR i ← 0 TO R DO STEPS BY 1
WRITE CQ[i]
ENDREPEAT
ENDIF
ENDIF
Step 2: RETURN
c) DEQUE
A deque (Double Ended Queue) is a linear data structure in which elements may be inserted and deleted at either end of the list. The diagrammatic representation of a deque is as follows:
[Diagram: a deque in which insertions and deletions are possible at both the FRONT and REAR ends]
Two restricted variations are possible:
[Diagram: an input-restricted deque, in which insertion is allowed only at one end but deletion is allowed at both ends]
[Diagram: an output-restricted deque, in which deletion is allowed only at one end but insertion is allowed at both ends]
d) PRIORITY QUEUE
A priority queue is a collection of elements in which each element has been assigned a priority, and the order in which elements are deleted and processed comes from the following rules:
1. An element of higher priority is processed before any element of lower priority.
2. Two elements with the same priority are processed according to the order in which they were added to the queue.
Example: Suppose the following elements are inserted with the given priorities (1 is the highest priority):
JOB/ELEMENT  PRIORITY
10           3
20           2
30           1
40           2
Then the status of the priority queue after processing these elements is:
0 1 2 3 4
30 20 40 10
FRONT REAR
APPLICATION OF QUEUES
Queues are used in different application areas such as: simulation, multiprogramming
environments, job scheduling applications etc.
In a multiprogramming environment, a single CPU has to serve more than one program simultaneously. In this case, scheduling classifies the workload according to its characteristics and maintains separate process queues. Processes are assigned to their respective queues, and the CPU then services the processes as per the priority of the queues.
Example:
Here, high priority programs are processed before the medium and low priority queue jobs. After completing the high priority queue programs, the CPU serves the medium priority queue programs, and finally the lowest priority programs.
Round Robin (RR) algorithm is a scheduling algorithm designed for time sharing
systems. The problem statement can be stated as:
Assume there are n processes P1, P2, - - - - , Pn served by the CPU. Different
processes require different execution time. Suppose sequence of processes are arrivals
according to their subscripts as P1 comes first than P2 and P2 comes first than P3 and so on.
Consider the following table of processes and the burst time each needs to complete its execution:
PROCESS    BURST TIME
P1         7
P2         18
P3         5
Assume the time quantum is 4 units. For this, the total execution procedure with the RR algorithm can be shown as:
P1 P2 P3 P1 P2 P3 P2 P2 P2
0 4 8 12 15 19 20 24 28 30
To implement such execution, all the programs currently under execution are maintained in a circular queue. When a process finishes its execution, it is deleted from the queue, and whenever a new process arrives, it is inserted at the tail of the queue.
***
LINKED LIST
“A linked list is a linear collection of elements called NODES and the order of the
elements is given by means of LINKS / POINTERS”. The diagrammatic representation of a
linked list is as follows:
[Diagram: linked list 10 → 20 → 30 → NULL, with START at the first node and END at the last node]
Linked list can be represented in memory in two ways as: Static representation and
Dynamic representation.
Static Representation:
Static representation of a linked list uses the array concept. Here, the list maintains two arrays: one DATA array and one LINK array, where LINK[i] holds the index of the next node.
Index  DATA  LINK
1      10    2     ← START = 1
2      20    3
3      30    0
Dynamic Representation:
Dynamic representation of a linked list uses self-referential structures: nodes are allocated and released at run time with the dynamic memory allocation functions.
Example:
[Diagram: linked list 10 → 20 → 30 → NULL, with START at the first node and END at the last node]
Depending on the number of parts of a node and the way of establishing links in
between the nodes, linked list can be classified into different categories as:
In single linked list, each node is divided into two parts as INFO/DATA part and
LINK part.
[Diagram: a NODE with two fields, DATA and LINK]
Where the first part, the DATA field, holds the information part of the element and the second part, the LINK field, holds the address of the next node in the list.
Example:
[Diagram: linked list 59 → 38 → 64 → NULL, with START at the first node and END at the last node]
Here, START and END are two pointer variables that point to the beginning and ending nodes of the list. The LINK field of the last node is filled with a special pointer called the NULL pointer.
i) Creating a list
ii) Traversing the list
iii) Insertion of a node into the list
iv) Deletion of a node from the list
v) Counting number of elements
vi) Searching an element etc.,
i) Creating a list
Creating a list refers to the process of creating the nodes of the list and arranging the
links between them.
In single linked list, nodes are created using self-referential structure as:
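struct node
{
    int data;               /* DATA: information part          */
    struct node *link;      /* LINK: address of the next node  */
};
typedef struct node NODE;   /* type and field names assumed to match the algorithms below */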
Now, create new nodes with the format as: NODE *NEW;
Initially no elements are available in the list. At this stage, set two pointer variables
START and END to NULL pointer as:
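START = END = NULL;     /* initially the list is empty */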
Algorithm creation(): This procedure creates a single linked list with the
specified number of elements.
Traversing the list refers to the process of visiting every node of the list exactly once,
from the first node to the last node. To display the elements of the single linked list,
follow the procedure:
Step 1 - Check whether the list is Empty (START = NULL).
Step 2 - If it is Empty then display 'List is Empty' and terminate the function.
Step 3 - If it is Not Empty then define a Node pointer 'temp' and initialize it
with START. Keep displaying DATA(temp) and shifting temp to the next node until
temp becomes equal to the NULL pointer.
Algorithm display(): This procedure is used to display the elements of the single
linked list from the first node to the last node.
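A C sketch of this procedure, assuming the NODE structure defined above:

#include <stdio.h>

void display(NODE *start)
{
    NODE *temp = start;
    if (start == NULL) {
        printf("List is Empty\n");
        return;
    }
    while (temp != NULL) {            /* visit every node exactly once */
        printf("%d ", temp->data);
        temp = temp->link;            /* shift temp to the next node   */
    }
    printf("\n");
}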
[Figure: single linked list]  START → 59 → 38 → 64 → NULL  (END at the last node)
The process of inserting a node into the single linked list falls into three cases as:
Front insertion
Rear insertion
Any position insertion
Case 1: Front Insertion: In this case, a new node is inserted at the front position of
the single linked list. For this, follow the procedure:
Step 1 - Create a NEW node with the given value.
Step 2 - Check whether the list is Empty (START = NULL).
Step 3 - If it is Empty then set the START and END pointers to the NEW node.
Step 4 - If it is Not Empty then set LINK(NEW) to the START pointer and move the
START pointer to the NEW node.
Algorithm Finsertion(x): This procedure inserts an element x at front end of the list.
Before:  START → 59 → 38 → 64 → NULL  (END at 64)

Finsertion (26):

After:   START → 26 → 59 → 38 → 64 → NULL  (NEW holds 26; END at 64)
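A C sketch of front insertion; START and END are passed by address so the function can
update them:

#include <stdlib.h>

void Finsertion(NODE **start, NODE **end, int x)
{
    NODE *newnode = (NODE *)malloc(sizeof(NODE));
    newnode->data = x;
    newnode->link = NULL;
    if (*start == NULL)               /* empty list                   */
        *start = *end = newnode;
    else {
        newnode->link = *start;       /* LINK(NEW) <- START           */
        *start = newnode;             /* move START to the new node   */
    }
}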
Case 2: Rear Insertion: In this case, a new node is inserted at rear position of
the single linked list. For this, follow the procedure as:
Step 1 - Create a NEW node with the given value.
Step 2 - Check whether the list is Empty. If it is Empty then set the START and
END pointers to the NEW node.
Step 3 - If it is Not Empty then set LINK(END) to the NEW pointer and move the END
pointer to the NEW node.
Algorithm Rinsertion(x): This procedure inserts an element x at rear end of the list.
Step 1: Create a NEW node with DATA(NEW) ← x and LINK(NEW) ← NULL
Step 2: IF START = NULL THEN
            START ← END ← NEW
        ELSE
            LINK(END) ← NEW
            END ← NEW
        ENDIF
Step 3: RETURN
Before:  START → 59 → 38 → 64 → NULL  (END at 64)

Rinsertion (26):

After:   START → 59 → 38 → 64 → 26 → NULL  (NEW holds 26; END at 26)
Case 3: Any Position Insertion: In this case, a new node is inserted at a specified
position of the single linked list. For this, follow the procedure as:
Step 1 - Create a NEW node with the given value.
Step 2 - Set a temporary variable PTR to START and move PTR up to the specified
location of the list, keeping another temporary variable TEMP one node behind it.
When the specified location is reached, establish the links between the NEW, PTR and
TEMP variables. A C sketch follows.
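A sketch of insertion at position pos (counting from 1); maintenance of the END pointer
is omitted for brevity:

#include <stdlib.h>

void Anyinsertion(NODE **start, int x, int pos)
{
    NODE *newnode = (NODE *)malloc(sizeof(NODE));
    NODE *ptr = *start, *temp = NULL;
    int p;
    newnode->data = x;
    for (p = 1; p < pos && ptr != NULL; p++) {
        temp = ptr;                   /* TEMP trails one node behind PTR */
        ptr = ptr->link;
    }
    newnode->link = ptr;              /* link NEW in front of PTR        */
    if (temp == NULL)
        *start = newnode;             /* insertion at the front          */
    else
        temp->link = newnode;
}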
Before:  START → 59 → 38 → 64 → 26  (END at 26)

Anyinsertion (14, 3):

After:   START → 59 → 38 → 14 → 64 → 26  (the new node 14 occupies position 3)
Deleting an element from the single linked list falls into three categories as:
Front deletion
Rear deletion
Any position deletion
Case 1: Front Deletion: In this case, front node information is deleted from the
single linked list. For this, follow the procedure as:
Step 2 - If it is Empty then, display 'List is Empty, Deletion is not possible' and
terminate the function.
Step 3 - If it is Not Empty then, define a Node pointer 'temp' and initialize
with START.
Step 4 - Check whether the list has only one node. If it is TRUE then set
START and END to the NULL pointer; otherwise, change START to LINK(START) and
fill LINK(TEMP) with the NULL pointer.
Algorithm Fdeletion( ): This function deletes the front element of the list.
Before:  START → 59 → 38 → 64 → 26  (END at 26)

After Fdeletion():  START → 38 → 64 → 26  (END at 26)
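A C sketch of front deletion; the deleted value is returned, with -1 as an assumed
sentinel for an empty list:

#include <stdio.h>
#include <stdlib.h>

int Fdeletion(NODE **start, NODE **end)
{
    NODE *temp = *start;
    int k;
    if (temp == NULL) {
        printf("List is Empty, Deletion is not possible\n");
        return -1;
    }
    k = temp->data;
    if (*start == *end)               /* only one node in the list    */
        *start = *end = NULL;
    else
        *start = temp->link;
    free(temp);
    return k;
}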
Case 2: Rear Deletion: In this case, rear node information is deleted from the
single linked list.
Step 2 - If it is Empty then, display 'List is Empty, Deletion is not possible' and
terminate the function.
Step 3 - If it is Not Empty then, define a Node pointer ‘ptr’ at START and 'temp' at
END pointer.
Step 4 - Check whether list is having only one node or not. If it is TRUE then set
START and END to NULL pointer; otherwise, move ptr link to the node which is
before the temp and make the appropriate links between END and ptr pointers.
Algorithm Rdeletion( ): This function deletes the rear element of the list.
Step 1: IF START = NULL THEN
            WRITE 'LIST EMPTY'
            EXIT
        ENDIF
Step 2: IF START = END THEN
            k ← DATA(START)
            START ← END ← NULL
        ELSE
            k ← DATA(END)
            PTR ← START
            TEMP ← END
            Repeat WHILE LINK(PTR) ≠ END
                PTR ← LINK(PTR)
            EndRepeat
            END ← PTR
            LINK(END) ← NULL
            Release memory of TEMP
        ENDIF
Step 3: RETURN k
Before:  START → 59 → 38 → 64 → 26  (END at 26)

After Rdeletion():  START → 59 → 38 → 64  (END at 64)
Step 2 - If it is Empty then, display 'List is Empty, Deletion is not possible' and
terminate the function.
Step 3 - If it is Not Empty then set a temporary variable PTR to START and
move PTR up to the specified location of the list, keeping another temporary
variable TEMP one node behind it. When the specified location is reached, change
the links between the PTR and TEMP variables.
Algorithm Anydeletion(pos): This function deletes the element of the list at the
specified position pos.
Step 1: IF START = NULL THEN
            WRITE 'LIST EMPTY'
            EXIT
        ENDIF
Step 2: PTR ← START
        p ← 1
        Repeat WHILE p < pos
            TEMP ← PTR
            PTR ← LINK(PTR)
            p ← p + 1
        EndRepeat
        k ← DATA(PTR)
        LINK(TEMP) ← LINK(PTR)
        LINK(PTR) ← NULL
        Release memory of PTR
Step 3: RETURN k
Example: Assume the initial status of the list as:
Before:  START → 59 → 38 → 14 → 64 → 26  (END at 26)

After Anydeletion(3):  START → 59 → 38 → 64 → 26  (the node 14 at position 3 is removed)
Algorithm count(): This function is used to count number of elements of the list.
Example:  START → 59 → 38 → 14 → 64 → 26  (END at 26); count() returns 5
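A C sketch:

int count(NODE *start)
{
    int n = 0;
    NODE *temp;
    for (temp = start; temp != NULL; temp = temp->link)
        n++;                          /* one increment per node       */
    return n;
}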
In this case, function checks whether a key element is present in the list of elements or
not. If the search element is found it refers to successful search; otherwise, it refers to
unsuccessful search.
Algorithm search (key): This function checks whether an element ‘key’ present
in the list of elements or not. It returns 1 if the search element key is found; otherwise, it
returns 0.
[Figure: single linked list]  START → 59 → 38 → 64 → 26  (END at the last node)
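A C sketch:

int search(NODE *start, int key)
{
    NODE *temp;
    for (temp = start; temp != NULL; temp = temp->link)
        if (temp->data == key)
            return 1;                 /* successful search            */
    return 0;                         /* unsuccessful search          */
}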
Disadvantages:
In a double linked list, each node is divided into three parts: FLINK/LLINK,
INFO/DATA and RLINK.

  NODE:  | LLINK | DATA | RLINK |

Here,
The first part, the FLINK/LLINK field, contains the address of the left (previous) node.
The second part, the DATA/INFO field, contains the information part of the element.
The third part, the RLINK field, contains the address of the right (next) node.
Example:
[Figure: double linked list]  START → 10 <-> 29 <-> 32 ← END
Here, START and END are two pointer variables that point to the beginning and ending
nodes of the list. The LLINK field of the first node and the RLINK field of the last node
are filled with a special pointer called the NULL pointer.
i) Creating a list
ii) Traversing the list
iii) Insertion of a node into the list
iv) Deletion of a node from the list
v) Counting number of elements
vi) Searching an element etc.,
i) Creating a list
Creating a list refers to the process of creating the nodes of the list and arranging the
links between them.
In double linked list, nodes are created using self-referential structure as:
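struct dnode
{
    struct dnode *llink;    /* LLINK: address of the previous node  */
    int data;               /* DATA: information part               */
    struct dnode *rlink;    /* RLINK: address of the next node      */
};
typedef struct dnode DNODE; /* type and field names assumed to match the algorithms below */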
Now, create new nodes with the format as: NODE *NEW;
Initially no elements are available in the list. At this stage, set two pointer variables
START and END to NULL pointer as:
Algorithm creation(): This procedure creates a double linked list with the
specified number of elements.
Traversing the list refers to the process of visiting every node of the list exactly once
from the first node to the last node of the list.
Algorithm display(): This procedure is used to display the elements of the double
linked list from the first node to the last node.
Example:
[Figure: double linked list]  START → 10 <-> 29 <-> 32 ← END
The process of inserting a node into the double linked list falls into three cases as:
Front insertion
Rear insertion
Any position insertion
Case 1: Front Insertion: In this case, a new node is inserted at front position of
the double linked list.
Algorithm Finsertion(x): This procedure inserts an element x at front end of the list.
Before:  START → 59 <-> 24 <-> 64 ← END

Finsertion (26):

After:   START → 26 <-> 59 <-> 24 <-> 64 ← END  (NEW holds 26)
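A C sketch of front insertion on the double linked list, assuming the DNODE
structure above:

#include <stdlib.h>

void Finsertion(DNODE **start, DNODE **end, int x)
{
    DNODE *newnode = (DNODE *)malloc(sizeof(DNODE));
    newnode->data = x;
    newnode->llink = newnode->rlink = NULL;
    if (*start == NULL)
        *start = *end = newnode;      /* first node of the list       */
    else {
        newnode->rlink = *start;      /* RLINK(NEW) <- START          */
        (*start)->llink = newnode;    /* LLINK(START) <- NEW          */
        *start = newnode;
    }
}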
Case 2: Rear Insertion: In this case, a new node is inserted at rear position of
the double linked list.
Algorithm Rinsertion(x): This procedure inserts an element x at rear end of the list.
Before:  START → 59 <-> 24 <-> 64 ← END

Rinsertion (26):

After:   START → 59 <-> 24 <-> 64 <-> 26 ← END  (NEW holds 26)
Case 3: Any Position Insertion: In this case, a new node is inserted at a specified
position of the double linked list.
Before:  START → 59 <-> 24 <-> 64 ← END

Anyinsertion (26, 3):

After:   START → 59 <-> 24 <-> 26 <-> 64 ← END  (the new node 26 occupies position 3)
The process of deleting an element from the double linked list falls into three
categories as:
Front deletion
Rear deletion
Any position deletion
Case 1: Front Deletion: In this case, front node information is deleted from the
double linked list.
Algorithm Fdeletion( ): This function deletes the front element of the list.
Before:  START → 59 <-> 24 <-> 26 <-> 64 ← END

After Fdeletion():  START → 24 <-> 26 <-> 64 ← END
Case 2: Rear Deletion: In this case, rear node information is deleted from the
double linked list.
Algorithm Rdeletion( ): This function deletes the rear element of the list.
Step 1: IF START = NULL THEN
            WRITE 'LIST EMPTY'
            EXIT
        ELSE
            k ← DATA(END)
            IF START = END THEN
                START ← END ← NULL
            ELSE
                TEMP ← END
                END ← LLINK(END)
                RLINK(END) ← NULL
                Release memory of TEMP
            ENDIF
            RETURN k
        ENDIF
Note that, unlike in the single linked list, no traversal loop is needed here: the
predecessor of END is directly available as LLINK(END).
Before:  START → 59 <-> 24 <-> 26 <-> 64 ← END

After Rdeletion():  START → 59 <-> 24 <-> 26 ← END
Before:  START → 59 <-> 24 <-> 26 <-> 64 ← END

After Anydeletion(3):  START → 59 <-> 24 <-> 64 ← END  (the node 26 at position 3 is removed)
Algorithm count(): This function is used to count number of elements of the list.
Example:  START → 59 <-> 24 <-> 26 <-> 64 ← END; count() returns 4
In this case, function checks whether a key element is present in the list of elements or
not. If the search element is found it refers to successful search; otherwise, it refers to
unsuccessful search.
Algorithm search (key): This function checks whether an element ‘key’ present
in the list of elements or not. It returns 1 if the search element key is found; otherwise, it
returns 0.
[Figure: double linked list]  START → 59 <-> 24 <-> 26 <-> 64 ← END
Disadvantages:
In a doubly linked list, traversing in both the forward and backward directions is
possible, but direct access to a node at a specific location is not possible; the list must
still be traversed node by node.
In circular linked list, elements are organized in circular fashion. Circular linked list
can be classified into two types as: Circular single linked list and Circular double linked list.
In circular single linked list, the LINK part of the last node contains address of the
starting node. The diagrammatic representation of a circular single linked list is as follows:
[Figure: circular single linked list]  START → 10 → 20 → 30, with LINK(30) pointing
back to the node 10; END at 30
Operations:
Algorithm creation(): This procedure creates a circular single linked list with the
specified number of elements.
Step 1: Repeat Steps 2 and 3 for the specified number of elements
Step 2:     Create a NEW node with the given value
Step 3:     IF START = NULL THEN
                START ← END ← NEW
            ELSE
                LINK(END) ← NEW
                END ← NEW
            ENDIF
            LINK(NEW) ← START
        EndRepeat
Step 4: RETURN
Algorithm display(): This procedure is used to display the elements of the circular
single linked list from the first node to the last node.
Step 1: TEMP ← START
Step 2: IF START ≠ NULL THEN
            Repeat
                WRITE DATA(TEMP)
                TEMP ← LINK(TEMP)
            UNTIL TEMP = START
        ELSE
            WRITE 'LIST EMPTY'
        ENDIF
Step 3: RETURN
[Figure: circular single linked list]  START → 10 → 20 → 30 → back to 10; END at 30
In circular double linked list, the RLINK part of the last node contains address of the
starting node and LLINK part of the first node contains address of the last node respectively.
The diagrammatic representation of a circular double linked list is as follows:
[Figure: circular double linked list]  START → 10 <-> 20 <-> 30 ← END, with RLINK(30)
pointing back to the node 10 and LLINK(10) pointing to the node 30
Operations:
Algorithm creation(): This procedure creates a circular double linked list with the
specified number of elements.
Step 1: Repeat for each element:
            Create a NEW node with the given value
            IF START = NULL THEN
                START ← END ← NEW
            ELSE
                RLINK(END) ← NEW
                LLINK(NEW) ← END
                END ← NEW
            ENDIF
            RLINK(NEW) ← START
            LLINK(START) ← END
        EndRepeat
Step 2: RETURN
Algorithm display(): This procedure is used to display the elements of the circular
double linked list from the first node to the last node.
A header linked list is a linked list which always contains a special node called as the
“header node” at beginning of the list that holds address of the starting node.
A grounded header list is a header linked list where the link part of the last node
consists of NULL pointer.
Example:

[Figure: grounded header list]  Header node → 10 → 20 → 30 → NULL
(START points to the header node, END to the last node)
A circular header list is a header linked list where the link part of the last node
contains address of the header node i.e., it points back to the header node.
Example:

[Figure: circular header list]  Header node → 10 → 20 → 30 → back to the header node
(START points to the header node, END to the last node)
Linked lists are used in different application areas such as sparse matrix
manipulations, polynomial representations, stack implementations, queue implementations
etc.
Stack data structure can also be implemented using linked list (Dynamic Storage
Management). The diagrammatic representation of a linked stack can be shown as:
[Figure: linked stack]  START → 10 → 20 → 30 → 40  (TOP points to the last node, 40)
Algorithm push (ITEM): This procedure inserts a new element ITEM at TOP position of
the stack.
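A C sketch, following the convention of the diagrams in this section (START marks the
first node, TOP the last):

#include <stdlib.h>

void push(NODE **start, NODE **top, int item)
{
    NODE *newnode = (NODE *)malloc(sizeof(NODE));
    newnode->data = item;
    newnode->link = NULL;
    if (*start == NULL)
        *start = *top = newnode;      /* first element of the stack   */
    else {
        (*top)->link = newnode;       /* append after the current TOP */
        *top = newnode;
    }
}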
Before:  START → 59 → 38 → 64  (TOP at 64)

push (26):

After:   START → 59 → 38 → 64 → 26  (NEW holds 26; TOP at 26)
Algorithm pop (): This function deletes the topmost element from the stack.
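A C sketch of pop; -1 is an assumed sentinel for an empty stack:

#include <stdio.h>
#include <stdlib.h>

int pop(NODE **start, NODE **top)
{
    NODE *ptr = *start, *temp = *top;
    int k;
    if (temp == NULL) {
        printf("STACK EMPTY\n");
        return -1;
    }
    k = temp->data;
    if (*start == *top)
        *start = *top = NULL;
    else {
        while (ptr->link != *top)     /* find the node before TOP     */
            ptr = ptr->link;
        *top = ptr;
        ptr->link = NULL;
    }
    free(temp);
    return k;
}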
Before:  START → 59 → 38 → 64 → 26  (TOP at 26)

After pop():  START → 59 → 38 → 64  (TOP at 64)
Queue data structure can also be implemented using linked list (Dynamic Storage
Management). The diagrammatic representation of a linked queue can be shown as:
[Figure: linked queue]  FRONT → 10 → 20 → 30 → 40 ← REAR
Algorithm insertion (ITEM): This procedure inserts a new element ITEM at REAR
position of the queue.
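A C sketch, assuming the NODE structure from the single linked list section:

#include <stdlib.h>

void insertion(NODE **front, NODE **rear, int item)
{
    NODE *newnode = (NODE *)malloc(sizeof(NODE));
    newnode->data = item;
    newnode->link = NULL;
    if (*front == NULL)
        *front = *rear = newnode;     /* first element of the queue   */
    else {
        (*rear)->link = newnode;      /* LINK(REAR) <- NEW            */
        *rear = newnode;              /* move REAR to the new node    */
    }
}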
Before:  FRONT → 59 → 38 → 64 ← REAR

insertion (26):

After:   FRONT → 59 → 38 → 64 → 26 ← REAR  (NEW holds 26)
Algorithm deletion(): This function deletes the front position element of the linked
queue.
Step 1: IF FRONT = NULL THEN
            WRITE 'QUEUE EMPTY'
            EXIT
        ELSE
            k ← DATA(FRONT)
            IF FRONT = REAR THEN
                FRONT ← REAR ← NULL
            ELSE
                TEMP ← FRONT
                FRONT ← LINK(FRONT)
                LINK(TEMP) ← NULL
                Release memory of TEMP
            ENDIF
            RETURN k
        ENDIF
Before:  FRONT → 59 → 38 → 64 → 26 ← REAR

After deletion():  FRONT → 38 → 64 → 26 ← REAR
Algorithm display (): This procedure is used to display the elements of the linked
queue.
Step 1: TEMP ← FRONT
Step 2: IF FRONT ≠ NULL THEN
REPEAT WHILE TEMP ≠ NULL
WRITE DATA (TEMP)
TEMP ← LINK(TEMP)
ENDREPEAT
ELSE
WRITE ‘QUEUE EMPTY’
ENDIF
Step 3: RETURN
[Figure: linked queue]  FRONT → 59 → 38 → 64 → 26 ← REAR
Advantages of Linked list:
Linked lists provide a dynamic implementation of the elements, which saves memory
and can handle a huge number of elements.
Memory utilization is effective, since memory for the nodes is allocated at run time
based on the user's requirements.
They provide flexibility to rearrange elements simply by changing links.
Insertion and deletion operations are easier compared to the array implementation.
Linear data structures such as stacks and queues can also be implemented with the help
of linked lists.
Disadvantages of Linked list:
END
Unit-III
Trees
Introduction, Binary Trees, Binary Tree Traversals, Additional Binary Tree Operations,
Binary Search Trees, Counting Binary Trees, Optimal Binary Search Trees, AVL Trees, B-
Trees: B-Trees, B+ Trees.
TREE
Example:

[Figure: sample tree. Root A; B, C, D at level 2; E, F, G at level 3; H, I at level 4]
BASIC TERMINOLOGIES
Node: In tree structure each element is represented as a node. The concept of node is
same as used in linked list. Node of a tree stores the actual data and links to the other node.
Edge: The connecting link between any two nodes is called as an edge. If a tree
consists of ‘N’ number of nodes then the number of edges is N-1.
Path: The sequence of nodes and edges from one node to another node is called as a
path between the two nodes.
Parent Node: Parent of a node is the immediate predecessor of a node.
Child Node: In a tree data structure, the immediate successors of a node are known as its
child nodes.
Leaf Node: The node which does not have any child is called a leaf node. Leaf nodes are
also known as terminal nodes.
Non-Leaf Node: The node which has children nodes is called a non-leaf node. Non-leaf
nodes are also known as non-terminal nodes.
Level: Level refers to the rank of the hierarchy. In general, root node is at level 1, its
children are at level 2, their children are at level 3 and so on. Hence, if a node is at level ‘K’
then its children are at level ‘K+1’.
Example: Level 1 : A
Level 2 : B, C, D
Level 3 : E, F, G
Level 4 : H, I
Degree of a Node: The number of subtrees of a node is called its degree. Degree of a leaf
node is 0.
Degree of a Tree: The degree of a tree is the maximum of the degree of nodes in the tree.
Height of a Tree: Maximum number of nodes that is possible in a path starting from root
node to a leaf node is called the height of a tree. Height of a tree is also known as depth of a
tree.
Example: Height of the above tree : 4
Siblings: The nodes which have the same parent are called siblings.
REPRESENTATION OF TREES
A) List representation
B) Left child – Right sibling representation
C) Degree – Two representation
A) List Representation
In list representation of a tree, the information in the root node comes first, followed
by a list of subtrees of that node.
Example:

[Figure: list representation of the sample tree, written as the root followed by the lists
of its subtrees]
In left child – right sibling representation, the node structure format can be shown as:
DATA
Left child Right sibling
Here, every node has at most one leftmost child and at most one closest right sibling.
The left child field of each node points to its leftmost child, and the right sibling field points
to its closest right sibling. In this format, order of children in a tree is not important, any of
the children of a node could be the leftmost child, and any of its siblings could be the closest
right sibling.
Example:

[Figure: the sample tree (root A; B, C, D at level 2; E, F, G at level 3; H, I at level 4)
and its left child - right sibling representation]
Example:

[Figure: the sample tree and its degree-two (binary tree) representation, obtained by
rotating the right-sibling pointers of the left child - right sibling form]
BINARY TREES
Example:

[Figures: a sample binary tree; a full binary tree on the nodes A, B, C, D, E, F, G; and
a complete binary tree on the nodes A, B, C, D, E, F]
i. In any binary tree, the maximum number of nodes on level K is 2^(K-1), where K ≥ 1.
ii. The maximum number of nodes in a binary tree of depth K is 2^K - 1, where K ≥ 1.
iii. For any non-empty binary tree, if n0 is the number of leaf nodes and n2 is the number
of nodes of degree 2, then n0 = n2 + 1.
iv. For any non-empty binary tree, if n is the number of nodes and e is the number of
edges, then n = e + 1.
v. A full binary tree of depth k is a binary tree of depth k having 2^k - 1 nodes, k ≥ 0.
vi. The height of a complete binary tree with n nodes is ⌈log2(n+1)⌉.
a) Sequential representation
b) Linked representation
a) Sequential Representation
Sequential representation of a binary tree is static and uses array concept. In this
representation, the nodes are stored level by level, starting from the first level. Root node is
stored in the first memory allocation.
Let X be an array used to store the binary tree elements, and assume the root node is
stored at index 1 of the array. Then the remaining nodes follow the properties:
  - the parent of the node at index i is at index ⌊i/2⌋ (for i > 1),
  - the left child of the node at index i is at index 2i,
  - the right child of the node at index i is at index 2i + 1.

[Figure: binary tree with root 12, children 72 and 34, and 99, 21 as the children of 72]

  Index :  0    1    2    3    4    5    6    7
  X     :  -    12   72   34   99   21   -    -
Advantages:
Any node can be accessed from any other node by calculating its index value.
Data stored easily without help of pointers.
Early languages such as BASIC and FORTRAN do not support the dynamic memory
allocation concept. In such cases, the array representation is an efficient way to store
tree structures.
Disadvantages
It allows only static representation. It is not possible to enhance the tree structure if
the array size is limited.
Other than for full binary trees, the majority of the array entries may be empty. Hence,
the structure leads to space complexity problems.
b) Linked Representation
In linked representation, each node of the binary tree is divided into three fields:

  NODE:  | LCHILD | DATA | RCHILD |

Where, the LCHILD and RCHILD fields contain the addresses of the left and right
children, and the DATA field contains the information part of the element. A matching
C structure is sketched below; the type and field names are the ones used by the copy()
function later in this unit.
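typedef struct btree
{
    struct btree *LCHILD;   /* address of the left child   */
    int DATA;               /* information part            */
    struct btree *RCHILD;   /* address of the right child  */
} BTree;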
Traversing operation refers to the process of visiting elements of the tree exactly once.
Any tree can be traversed in three ways as:
a) Inorder Traversal
b) Preorder Traversal
c) Postorder Traversal
a) Inorder Traversal
Algorithm Inorder (ROOT): This procedure is used to visit the binary tree in
inorder recursively.
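A C sketch of the recursive procedure, assuming the BTree node defined above:

#include <stdio.h>

void Inorder(BTree *root)
{
    if (root != NULL) {
        Inorder(root->LCHILD);         /* 1. traverse the left subtree  */
        printf("%d ", root->DATA);     /* 2. visit the root             */
        Inorder(root->RCHILD);         /* 3. traverse the right subtree */
    }
}

Preorder and Postorder differ only in the position of the visit step: before both
recursive calls, and after both, respectively.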
Example:

[Figure: expression tree for A * B + C. Root +, left child * (with children A and B),
right child C]

Inorder Traversal Output:  A * B + C
b) Preorder Traversal
Algorithm Preorder (ROOT): This procedure is used to visit the binary tree in
preorder recursively.
Example:

[Figure: the same expression tree. Root +, left child * (with children A and B),
right child C]

Preorder Traversal Output:  + * A B C
c) Postorder Traversal
Example:

[Figure: the same expression tree]

Postorder Traversal Output:  A B * C +
1. Level-Order Traversal
The Inorder, Preorder and Postorder traversal operations use a stack data structure
(implicitly, through recursion), whereas level-order traversal uses a queue data structure.
In this process, first visit the root node, then root’s left child followed by the root’s
right child. Continue the same process, visiting the nodes at each new level from the leftmost
node to the rightmost node.
Example:

[Figure: the same expression tree. Root +, left child * (with children A and B),
right child C]

Level-Order Traversal Output:  + * C A B
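A sketch of level-order traversal in C, using a simple array-based queue (MAXQ is an
assumed capacity bound):

#include <stdio.h>

#define MAXQ 100

void LevelOrder(BTree *root)
{
    BTree *queue[MAXQ];
    int front = 0, rear = 0;
    if (root == NULL)
        return;
    queue[rear++] = root;             /* enqueue the root             */
    while (front < rear) {
        BTree *node = queue[front++]; /* dequeue the next node        */
        printf("%d ", node->DATA);
        if (node->LCHILD) queue[rear++] = node->LCHILD;
        if (node->RCHILD) queue[rear++] = node->RCHILD;
    }
}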
Copy operation is used to copy the existing binary tree into another binary tree.
Functional implementation is as follows:
BTree* copy(BTree *Original)
{
BTree *Temp;
if(Original)
{
Temp = (BTree*)malloc(sizeof(BTree));
Temp → LCHILD = copy(Original → LCHILD);
Temp → RCHILD = copy(Original → RCHILD);
Temp → DATA = Original → DATA;
return Temp;
}
return NULL;
}
3. Merging Operation
Suppose T1 and T2 are two binary trees. The merging operation refers to the process of
combining the entire tree T2 (or T1) as a subtree of T1 (or T2). For this, observe that in
either (or both) trees there must be at least one null subtree of the root where the other
tree can be attached.
Example:

[Figure: merging. A tree T1 on the nodes V1, V2, V3, V4 and a tree T2 on the nodes
V5, V6, V7 are combined into a single tree T containing all seven nodes]
Let T1 consists of n1 nodes and T2 consists of n2 nodes, then the resultant tree T
consists n1+n2 nodes.
i.e., T1 (n1) + T2 (n2) = T (n1 + n2)
Step 1: IF LC(ROOT1) = NULL THEN
LC(ROOT1) ← ROOT2 /* Merge T2 as left child of T1 */
ROOT ← ROOT1
ELSEIF RC(ROOT1) = NULL THEN
RC(ROOT1) ← ROOT2 /* Merge T2 as right child of T1 */
ROOT ← ROOT1
ELSEIF LC(ROOT2) = NULL THEN
LC(ROOT2) ← ROOT1 /* Merge T1 as left child of T2 */
ROOT ← ROOT2
ELSEIF RC(ROOT2) = NULL THEN
RC(ROOT2) ← ROOT1 /* Merge T1 as right child of T2 */
ROOT ← ROOT2
ELSE
WRITE ‘MERGE OPERATION NOT POSSIBLE’
EXIT
ENDIF
Step 2: STOP
4. Formation of a binary tree from its traversal techniques
A binary tree can be constructed with two traversal values in which one should be
inorder traversal and other should be either preorder or postorder traversal. Basic principle
for the formation can be stated as:
If the preorder traversal is given, then the first node is the ROOT node. If the
postorder traversal is given, then the last node is the ROOT node.
Once the ROOT node is identified, all the nodes in the left subtree and right subtree of
the ROOT node can be gathered.
Same procedure is applied repeatedly on the left and right subtrees.
Inorder Traversal : D B H E A I F J C G
Preorder Traversal : A B D E H C F I J G
Step 1: The first node of the preorder traversal, A, is the root.
        Left subtree  : In: D B H E    Pre: B D E H
        Right subtree : In: I F J C G  Pre: C F I J G

Step 2: B is the root of the left subtree, with D to its left and H E to its right
        (In: H E, Pre: E H, so E is the root and H is its left child).
        C is the root of the right subtree, with I F J to its left and G to its right
        (In: I F J, Pre: F I J, so F is the root with I and J as its children).

Step 3: The complete binary tree:

[Figure: root A; left child B with left child D and right child E (E has left child H);
right child C with left child F (children I and J) and right child G]
1) Expression tree
2) Binary search tree
3) Heap tree
4) Threaded binary tree
5) Huffman tree
6) Height balanced tree
7) Decision tree etc.
1) EXPRESSION TREE
An expression tree is a binary tree which stores an arithmetic expression. The leaves
of an expression tree are operands, such as constants/variable names and all internal nodes
are the operators.
Example: A + B * C

[Figure: expression tree. Root +, left child A, right child * with children B and C]
A binary search tree is a binary tree. It may be empty. If it is not empty then it
satisfies the following properties as:
Each node has exactly one value and the values in the tree are distinct.
The values in the left subtree are smaller than the value in the root node.
The values in the right subtree are larger than the value in the root node.
The left and right subtrees are also binary search trees.
Example:

[Figure: binary search tree. Root 75; left child 27; right child 92 with children 83 and 99]
Basic operations on a binary search tree are searching, insertion, deletion, merging,
traversing etc.
i) Search Operation:
Suppose T is a binary search tree and is represented using linked structure. Assume
search element is given in the variable Key. Then search procedure works as:
If Key value is less than the root node R, then proceed to its left child; If Key value is
greater than the root node R, then proceed to its right child.
The above process continues till the Key is found or a NULL pointer is reached.
If the search procedure reaches a NULL pointer, the Key value is not present in the tree.
Example:

[Figure: the BST with root 75, left child 27, right child 92 (children 83 and 99)]

Search (92): Key Found
Search (77): Key Not Found
Suppose T is a binary search tree. Inserting element is given in the variable Key.
Then insertion procedure works as:
To insert a node with the given value, the tree T is to be searched from the ROOT
node.
If the same Key value is found at any stage, then print a message as ‘Key already
exists – Insertion not possible’; otherwise, a new node is inserted at the dead node
where the search procedure terminates.
Example:

Before:  root 75; left child 27; right child 92 with children 83 and 99

Insertion (39):

After:   39 is inserted as the right child of 27 (the dead end where the search for 39
terminates)
Algorithm Insertion (Key): This procedure is used to insert a new node with the
data as Key into the binary search tree.
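A recursive C sketch of this procedure, assuming the BTree node used earlier:

#include <stdio.h>
#include <stdlib.h>

BTree* Insertion(BTree *root, int key)
{
    if (root == NULL) {               /* dead end: insert here         */
        BTree *newnode = (BTree *)malloc(sizeof(BTree));
        newnode->DATA = key;
        newnode->LCHILD = newnode->RCHILD = NULL;
        return newnode;
    }
    if (key < root->DATA)
        root->LCHILD = Insertion(root->LCHILD, key);
    else if (key > root->DATA)
        root->RCHILD = Insertion(root->RCHILD, key);
    else
        printf("Key already exists - Insertion not possible\n");
    return root;
}

It is called as ROOT = Insertion(ROOT, Key).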
Suppose T is a binary search tree and an element of information is given in the variable
Key to delete from T.
Let N be the node which contains the information element Key. Assume
PARENT(N) denotes the parent node of N and SUCC(N) denotes the inorder successor of
the node N.
Then the deletion of the node N depends on any one of the three following cases
based on its children nodes as:
Case 1: N is a leaf node: In this case, N is deleted from the tree T by simply setting
the pointer to N in the parent node PARENT(N) to the NULL pointer.
Example:

Before:  root 75; left child 27 with right child 39; right child 92 with children 83 and 99

Deletion (39):

After:   the leaf 39 is removed; the corresponding pointer in its parent 27 becomes NULL
Case 2: N has exactly one child: In this case, N is deleted from the tree T by replacing
the pointer to N in the parent node PARENT(N) with the address of N's child.
Example:

Before:  root 75; left child 27 with right child 39; right child 92 with children 83 and 99

Deletion (27):

After:   27 has one child (39), so 39 takes its place as the left child of 75
Case 3: N has two children: In this case, N is deleted from the tree T by first deleting
SUCC(N) from T (by using case 1 or case 2) and then replacing the data content of node N
with the data content of node SUCC(N).
Before:  root 75; left child 27 with right child 39; right child 92 with children 83
and 99, where 94 is the left child of 99

Deletion (92):

After:   the inorder successor of 92 is 94; 94 replaces 92, giving root 75 with right
child 94 (children 83 and 99)
Algorithm deletion(Key): This function is used to delete a specified key element from the
binary search tree.
Step 1: Flag ← FALSE, PTR ← ROOT, Parent ← NULL
Step 2: Repeat WHILE PTR ≠ NULL AND Flag = FALSE
            IF Key = DATA(PTR) THEN
                Flag ← TRUE
            ELSE
                Parent ← PTR
                IF Key < DATA(PTR) THEN
                    PTR ← LCHILD(PTR)
                ELSE
                    PTR ← RCHILD(PTR)
                ENDIF
            ENDIF
        EndRepeat
Step 3: IF Flag = FALSE THEN
            WRITE 'KEY DOES NOT EXIST'
            EXIT
        ENDIF
Step 4: IF LCHILD(PTR) = NULL AND RCHILD(PTR) = NULL THEN
            IF LCHILD(Parent) = PTR THEN
                LCHILD(Parent) ← NULL
            ELSE
                RCHILD(Parent) ← NULL
            ENDIF
        ENDIF
Succ(PTR): /* finds the inorder successor of the node PTR */
Step 1: P ← RCHILD(PTR)
Step 2: Repeat WHILE LCHILD(P) ≠ NULL
            P ← LCHILD(P)
        EndRepeat
Step 3: RETURN P
A binary search tree can be traversed in three ways such as: Inorder traversal,
Preorder traversal and Postorder traversal techniques.
Example:

[Figure: binary search tree. Root 65; left child 19 with children 15 and 28; 28 has
children 25 and 57; right child 74 with right child 88]
Inorder Traveral : 15 19 25 28 57 65 74 88
Preorder Traversal : 65 19 15 28 25 57 74 88
Postorder Traversal : 15 25 57 28 19 88 74 65
Note:
Inorder traversal on a binary search tree will give the sorted order of data in
ascending order.
To sort the given set of data, a binary search tree can be built and then inorder
traversal can be applied. This method of sorting is known as binary sort and such
binary search tree can be treated as binary sorted tree.
Assume there is a fixed set of keys. Different binary search trees can be created for the
same set of keys, and they lead to different performance characteristics.
[Figure: two of the different binary search trees that can be built on the keys
5, 10, 15, 20, 25, the first skewed and the second more balanced]
When we apply search operations on those keys:
o First tree takes 1, 2, 2, 3, 4 comparisons to find the keys. Thus, the average
number of comparisons = (1+2+2+3+4) / 5 = 12/5.
o Second tree takes 1, 2, 2, 3, 3 comparisons to find the keys. Thus, the average
number of comparisons = (1+2+2+3+3) / 5 = 11/5.
o Hence, the second binary search tree has a better performance than the first
binary search tree.
Suppose each of these keys 5, 10, 15, 20 and 25 is searched with a probability of 0.3,
0.3, 0.05, 0.05 and 0.3 respectively. Then the average number of comparisons for the
first tree is 1.85 and second tree is 2.05. In this case, the first binary search tree has a
better performance than the second binary search tree.
Thus, based on these factors, different binary search trees are obtained for the same
keys, and they give different performances.
Cost construction for a binary search tree is obtained by considering cost of successful
search and unsuccessful search. In evaluating binary search trees, it is useful to add a special
‘square’ node at every null link. These nodes are known as external nodes. Other nodes are
treated as internal nodes. A binary tree with external nodes is known as an extended binary
tree. If it satisfies the properties of binary search tree, then it is termed as extended binary
search tree.
Example: Extended binary search tree:

[Figure: a binary search tree on the keys 10, 15, 20, 25 with a square external node
added at every null link]
Construction of OBST:
Let a1 , a2 , - - - , a n with a1 < a2 < - - - - < an be the element keys. To design optimal
binary search tree, apply dynamic programming approach. Suppose that the probability of
searching for ai is pi and probability of unsuccessful search is qi. Then, calculate w (Weight
value), c (Cost value) and r (Root node) at each stage. For this, computation equations are
given below.
Initially:
    w(i,i) = q(i),   c(i,i) = 0,   r(i,i) = 0

For i < j:
    w(i,j) = p(j) + q(j) + w(i,j-1)
    c(i,j) = min over i < k ≤ j of [ c(i,k-1) + c(k,j) ] + w(i,j)
    r(i,j) = the value of k that achieves the minimum above

For a node position r(i,j) = k, the left child position is r(i,k-1) and the right child
position is r(k,j).
Example: Construct an optimal binary search tree with n=3, (a 1 , a2 , a3) = (10, 15,
20), (p1 , p2 , p3) = (3, 3, 1) and (q0 , q1 , q2 , q3) = (2, 3, 1, 1).
Trees with a worst case height of O (log n) are called height balanced trees.
Here, AVL tree and Red-Black trees are useful for internal memory applications
whereas B-tree is useful for external memory applications.
i) AVL Tree
AVL tree is a height balanced tree introduced in 1962 by Adelson-Velskii and Landis.
“An empty binary tree T is an AVL tree. If T is a non-empty binary tree with TL and TR
as its left and right subtrees, then T is an AVL tree if TL and TR are AVL trees and
|h(TL) - h(TR)| ≤ 1, where h(T) denotes the height of the tree T.”
In an AVL tree every node is associated with a value called the balance factor, defined
for a node x as:
    BF(x) = h(left subtree of x) - h(right subtree of x)
From the definition of an AVL tree, the allowable balance factors are 0, 1 and -1.
Example:

[Figure: an AVL tree with each node labelled by its balance factor; every balance
factor is 0, 1 or -1]
Note: If the AVL tree satisfies the properties of binary search tree, then it is referred to as an
AVL search tree.
Insertion Operation:
Inserting an element into an AVL search tree follows the same procedure as the
insertion of an element into a binary search tree. But the insertion may lead to a
situation where the balance factor of some node becomes other than -1, 0 or 1, and the
tree becomes unbalanced.
If the insertion makes an AVL search tree unbalanced, the height of the subtree must
be adjusted by the operations called rotations.
For this, consider N as the newly inserted node and A as its nearest ancestor whose
balance factor becomes 2 or -2. Then, based on the path from A towards N, the imbalance
rotations are classified into four types: LL, RR, LR and RL.
Here, the transformations done for LL and RR imbalances are often called single
rotations, while those done for LR and RL imbalances are called double rotations.
LL Rotation: In LL rotation, every node moves one position from the current position.
[Figure: LL rotation. Before: A with left child B, and N inserted in the left subtree of
B. After: B becomes the root of the subtree, with N (and B's left subtree) on its left
and A on its right]
RR Rotation: In RR rotation, every node moves one position from the current position.
[Figure: RR rotation. Before: A with right child B, and N inserted in the right subtree
of B. After: B becomes the root of the subtree, with A on its left and N (and B's right
subtree) on its right]
LR Rotation: The LR Rotation is a sequence of single left rotation followed by a single right
rotation. In LR Rotation, at first, every node moves one position to the left and one position
to right from the current position.
[Figure: LR rotation. Before: A with left child B, and N inserted in the right subtree
of B. After the left rotation at B followed by the right rotation at A, N becomes the
root of the subtree with B as its left child and A as its right child]
RL Rotation: The RL Rotation is sequence of single right rotation followed by single left
rotation. In RL Rotation, at first every node moves one position to right and one position to
left from the current position.
[Figure: RL rotation. Before: A with right child B, and N inserted in the left subtree
of B. After the right rotation at B followed by the left rotation at A, N becomes the
root of the subtree with A as its left child and B as its right child]
Deletion Operation:
Deletion of an element from an AVL search tree follows the same procedure as the
binary search tree deletion operation. Due to the deletion, the balance factors of some
or all of the nodes on the path might change and the tree may become unbalanced; to
restore the balanced format, rotations are again required.
In this case, the deleted node itself is no longer available after the deletion operation.
Hence, based on the balance factor of the sibling of the deleted node, the rotations are
classified into six types: L0, L1, L-1 and R0, R1, R-1.
R0 Rotation: Assume a node is deleted from the right subtree of a specific node C. If,
after the deletion, the sibling node B has a balance factor of 0, then the R0 rotation is
used to rebalance the tree:

[Figure: R0 rotation. Before: root C with left child B (subtrees BL and BR) and a right
subtree CR shortened by the deletion. After: B becomes the root, BL is its left subtree,
and C is its right child with subtrees BR and CR]
R1 Rotation: Assume a node is deleted from the right subtree of a specific node C. If,
after the deletion, the sibling node B has a balance factor of 1, then the R1 rotation is
used to rebalance the tree:

[Figure: R1 rotation. Before: root C with left child B (balance factor 1, subtrees BL
and BR) and a shortened right subtree CR. After: B becomes the root, BL is its left
subtree, and C is its right child with subtrees BR and CR]
R-1 Rotation: Assume a node is deleted from the right subtree of a specific node C. If,
after the deletion, the sibling node B has a balance factor of -1, then the R-1 rotation
is used to rebalance the tree:

[Figure: R-1 rotation. Before: root C with left child B (balance factor -1) whose right
child is A. After the double rotation, A becomes the root with B as its left child and C
as its right child, and A's subtrees are distributed between B and C]
L0 Rotation: Assume a node is deleted from the left subtree of a specific node B. If,
after the deletion, the sibling node C has a balance factor of 0, then the L0 rotation is
used to rebalance the tree:

[Figure: L0 rotation. Before: root B with right child C (subtrees CL and CR) and a left
subtree BL shortened by the deletion. After: C becomes the root, B is its left child with
subtrees BL and CL, and CR is C's right subtree]
L1 Rotation: Assume a node is deleted from the left subtree of a specific node B. If,
after the deletion, the sibling node C has a balance factor of 1 (with D as C's left
child), then the L1 rotation is used to rebalance the tree:

[Figure: L1 rotation. Before: root B with right child C whose left child is D. After the
double rotation, D becomes the root with B as its left child and C as its right child,
and D's subtrees are distributed between B and C]
L-1 Rotation: Assume a node is deleted from the left subtree of a specific node B. If,
after the deletion, the sibling node C has a balance factor of -1, then the L-1 rotation
is used to rebalance the tree:

[Figure: L-1 rotation. Before: root B with right child C (subtrees CL and CR) and a
shortened left subtree BL. After: C becomes the root, B is its left child with subtrees
BL and CL, and CR is C's right subtree]
[Figure: an example AVL search tree]
ii) B-TREE
An m-way search tree T may be an empty tree. If T is non-empty, then it satisfies the
following properties:
  - Each node holds at most m - 1 keys, arranged in ascending order, and has at most
    m children.
  - For a node with keys K1 < K2 < ... < Kk, all keys in the subtree to the left of Ki
    are less than Ki, and all keys in the subtree to its right are greater than Ki.
  - All subtrees are themselves m-way search trees.

Example:

[Figure: a 3-way search tree. Root (20, 40); children (10, 15), (25, 30) and (45, 50);
28 is stored in a child of the node (25, 30)]
B-TREE Definition
A B-tree of order m is an m-way search tree in which:
The root node should have a minimum of two children and a maximum of m children.
All the internal nodes except the root node should have a minimum of ⌈m/2⌉ non-empty
children and a maximum of m non-empty children.
All the external nodes are at the same level.

Example:

[Figure: a B-tree of order 3. Root (43, 75); children (6, 24), (52, 64) and (87)]
Note:
1. B-tree of order 3 is also referred to as 2-3 trees. Since, its internal nodes can have
only two or three children.
2. B-tree of order 4 is also referred to as 2-3-4 or 2-4 trees. Since, its internal nodes can
have either two, three or four children.
OPERATIONS ON B-TREE
Insertion Operation:
Inserting a new element into a B-tree of order m begins with the search operation
for the proper location in a node. When the search terminates at a particular node X, the
insertion falls into either of the following cases:
Case-1: If node X contains space for insertion, then insert the element in its proper
position and adjust the child pointers accordingly.
Before:            [40 | 82]
        [11 25 38]   [58 74]   [86 89 93 97]

Insertion (64):

After:             [40 | 82]
        [11 25 38]   [58 64 74]   [86 89 93 97]
Case-2: If node X contains full of elements, then first insert the element into its list of
elements. Then split the node into two sub nodes at the median value. The elements that are
less than the median becomes the left node and that are greater than the median becomes the
right node. Then the median element is shifted up into the parent node of X. Sometimes the
process may propagate up to root level also.
Before:            [40 | 82]
        [11 25 38]   [58 74]   [86 89 93 97]

Insertion (99): the node (86 89 93 97) overflows; it is split at the median 93, which
moves up into the root:

After:             [40 | 82 | 93]
        [11 25 38]   [58 74]   [86 89]   [97 99]
Deletion Operation:
Case-1: When the key exists in a leaf node and its deletion does not affect the B-tree
properties, simply delete the key from the node and adjust the child pointers.
Before:            [40 | 82]
        [11 25 38]   [58 74]   [86 89 93 97]

Deletion (89):

After:             [40 | 82]
        [11 25 38]   [58 74]   [86 93 97]
Case-2: When the key exists in a non-leaf node, replace the key with the largest element
of its left sub-tree or the smallest element of its right sub-tree.

Before:            [40 | 82]
        [11 25 38]   [58 74]   [86 89 93 97]

Deletion (40): 40 is replaced by 38, the largest element of its left sub-tree:

After:             [38 | 82]
        [11 25]   [58 74]   [86 89 93 97]
Case-3: If deleting an element k from a node leaves it with fewer than its minimum
number of elements, then an element can be borrowed from one of its sibling nodes. If
the left sibling can spare an element, its largest element is shifted up into the parent
node; if the right sibling can spare an element, its smallest element is shifted up into
the parent node. From the parent node, the intervening element is shifted down to fill
the vacancy created by the deleted element.
Before:            [40 | 82]
        [11 25 38]   [58 74]   [86 89 93 97]

Deletion (58): the node (74) would fall below its minimum, so 38 is shifted up into the
parent and 40 comes down:

After:             [38 | 82]
        [11 25]   [40 74]   [86 89 93 97]
Case-4: If the deletion of an element leaves the node with fewer than its minimum number
of elements and neither sibling can spare an element, then this node is merged with one
of its siblings together with the intervening element from the parent node.
Before:            [38 | 82]
        [11 25]   [40 74]   [86 89 93 97]

Deletion (25): neither sibling can spare an element, so the node is merged with its
sibling together with the intervening element 38 from the parent:

After:             [82]
        [11 38 40 74]   [86 89 93 97]
iii) B+ - TREE
B+ - Tree can be viewed as B-Tree in which it has two types of nodes – index nodes
and data nodes.
The index nodes store keys (not elements) and pointers and the data nodes store
elements.
The data nodes are linked together, in left to right order, to form a doubly linked list.
Definition:
1. All data nodes are the same level and are leaves. Data nodes contain elements only.
2. The index nodes define a B-tree of order m; each index node has keys but not
elements.
3. Let n, A0 , (K1 , A1) , (K2 , A2) , - - - , (Kn , An)
Where, Ai are pointers to subtrees and the Ki are the keys be the format of
some index node.
All elements in the subtree Ai have keys less than Ki+1 and greater than or
equal to Ki.
Example:
[Figure: B+ tree. Index root (20 | 40); second-level index nodes (10), (30) and
(70 | 80); data nodes, doubly linked from left to right:
{2,4,6} {12,16,18} {20,25} {32,36} {40,50,60} {71,72} {80,82,84}]
OPERATIONS ON B+ - TREE
Search Operation
In a B+ tree, the search procedure begins at the root node. From the definition of the
B+ tree, if the search element is less than a key value in the index node then control
moves to the left supporting subtree; otherwise it moves to the right, until a data node
is reached. The search element is then compared with the keys of the data node. If the
element is found, the search is successful; otherwise it is unsuccessful.
Insertion Operation
To implement insertion operations, first apply search operation to locate proper node
for insertion. Then it falls into different cases such as:
Case – 1: The data node is not full and has space for insertion then; simply insert the key
into the node.
Example:
Before: the B+ tree shown above, with data nodes
{2,4,6} {12,16,18} {20,25} {32,36} {40,50,60} {71,72} {80,82,84}

Insertion (27):

After: 27 joins the data node {20,25}, giving {20,25,27}; the index nodes are unchanged
Case – 2: The data node is full, and then the overall node is split into two by moving the
largest half of the elements into a new node, which is then inserted into the doubly linked list
of data nodes. The smallest key in the new node is placed as a pointer in the parent node.
Example:
Before: the same B+ tree

Insertion (14): the data node {12,16,18} is full; it is split into {12,14} and {16,18},
and the smallest key of the new node, 16, is placed in the parent index node:

After: index root (20 | 40); second-level index nodes (10 | 16), (30) and (70 | 80);
data nodes {2,4,6} {12,14} {16,18} {20,25} {32,36} {40,50,60} {71,72} {80,82,84}
Case – 3: As in Case 2, the full data node is split into two by moving the largest half of
the elements into a new node, which is inserted into the doubly linked list of data nodes,
and the smallest key of the new node is placed in the parent index node. But if the parent
index node is also full, the index node itself is split into two and its middle key is
moved up to the next level, repeating the process up to the root if necessary.
Example:
Before: index root (20 | 40); second-level index nodes (10 | 16), (30) and (70 | 80);
data nodes {2,4,6} {12,14} {16,18} {20,25} {32,36} {40,50,60} {71,72} {80,82,84}

Insertion (86): the data node {80,82,84} is split into {80,82} and {84,86}, and the key
84 must go into the index node (70 | 80), which is full; the index node is split and the
split propagates upward, creating a new root:

After: index root (40); index nodes (20) and (80); next index level (10 | 16), (30),
(70) and (84); data nodes
{2,4,6} {12,14} {16,18} {20,25} {32,36} {40,50,60} {71,72} {80,82} {84,86}
Deletion Operation
In a B+ tree, the elements are available in the leaves; hence elements are deleted from
a data node. Each data node must retain at least ⌈c/2⌉ elements, where c is the capacity
of a data node of the B+ tree.
[Figure: B+ tree. Index root (20 | 40); index nodes (10), (30) and (70 | 80); data nodes
{2,4,6} {12,16,18} {20,25} {32,36} {40,50,60} {71,72} {80,82,84}]
To delete any element, first search the node in which the key element exists. Then,
deletion process falls into different cases as:
Case – 1: If deleting the element does not affect the properties of the B+ tree, simply
delete the key element from the data node.
Deletion (40):

After: 40 is removed from the data node {40,50,60}, giving {50,60}; the index nodes are
unchanged (the key 40 may remain in the index, since index nodes store keys, not elements)
Case – 2: If deleting the element leaves the data node deficient, then check its nearest
right or left sibling and determine whether that sibling has more than the required
minimum number of ⌈c/2⌉ elements.
If the left sibling has an excess element, borrow its largest element; if the right
sibling has an excess element, borrow its smallest element; and update the in-between
key in the parent index node.
Deletion (71):

After: the data node {71,72} becomes deficient; the smallest element 80 is borrowed from
the right sibling {80,82,84}, giving data nodes {72,80} and {82,84}, and the in-between
key in the parent index node is updated from 80 to 82, giving the index node (70 | 82)
Case – 3: If deleting the element leaves the data node deficient, check the nearest right
or left sibling as before. If no sibling has more than the minimum number of ⌈c/2⌉
elements, then combine the two data nodes into a single node and update the parent index
node according to the new node values.
Before: index root (20 | 40); index nodes (10), (30) and (70 | 82); data nodes
{2,4,6} {12,16,18} {20,25} {32,36} {50,60} {72,80} {82,84}

Deletion (80):

After: the data node {72,80} becomes deficient and its sibling cannot spare an element,
so the two nodes are merged into {72,82,84}; the index node becomes (70), giving index
root (20 | 40); index nodes (10), (30) and (70); data nodes
{2,4,6} {12,16,18} {20,25} {32,36} {50,60} {72,82,84}
END
Unit-IV
The Graph Abstract Data Type, Elementary Graph Operations, Minimum Cost Spanning
Trees, Shortest Paths and Transitive Closure.
Hashing: Introduction to Hash Table, Static Hashing, Dynamic Hashing.
***
GRAPHS
A graph G=(V,E) consists of a finite non-empty set of vertices V also called as nodes
and a finite set of edges E also called as arcs.
Example:

[Figure: a graph with vertices a, b, c, d and edges e1, e2, e3, e4, e5]
GRAPH TERMINOLOGIES
Digraph: A graph in which every edge is directed is called a digraph. A digraph is also
known as a directed graph.
Example:

[Figure: a digraph with vertices a, b, c, d and directed edges e1, e2, e3, e4]
Undirected Graph: A graph in which every edge is undirected is called an undirected graph.
Example:

[Figure: a graph with vertices a, b, c and edges e1, e2, e3]

Where, V = {a,b,c} and E = {e1 , e2 , e3}
e1 = (a,b)   e2 = (a,c)   e3 = (b,c)
Mixed Graph: A graph in which some edges are directed and some edges are
undirected is called a mixed graph.
Example:

[Figure: a mixed graph with vertices a, b, c, d and edges e1, e2, e3, e4]

Where, V = {a,b,c,d} and E = {e1 , e2 , e3 , e4}
e1 = (a,b)   e2 = <a,c>   e3 = <c,d>   e4 = (a,d)
Here, round brackets denote undirected edges and angle brackets denote directed edges.
Weighted Graph: A graph is termed as a weighted graph if all the edges in it are labeled
with some weight values.
Example:

[Figure: a weighted graph on the vertices a, b, c, d with edge weights 5, 7, 3 and 9]
Adjacent Vertices: A vertex Vi is adjacent / neighbor of another vertex Vj, if there is an
edge from Vi to Vj.
Self Loop: If there is an edge whose starting and ending vertices are same is called
as a loop or self loop.
Example:

[Figure: a graph with vertices a, b, c and edges e1, e2, e3, e4, where e4 is a self loop
at vertex c]
In-degree of a Vertex: The number of edges coming into the vertex Vi is called the in-
degree of vertex Vi.
Out-degree of a Vertex: The number of edges going out from a vertex Vi is called the
out-degree of vertex Vi.
Degree of a Vertex: Sum of out-degree and in-degree of a node V is called the total
degree of the node V and is denoted by degree (V).
Example:

[Figure: a digraph in which vertex a has In-degree(a) = 1 and Out-degree(a) = 2]
Complete Graph:
Note: An n-vertex, undirected graph with exactly n(n-1) / 2 edges is said to be complete
graph.
Connected Graph: A graph G is said to be connected if, for every pair of distinct vertices
vi and vj in G, there is a path from vi to vj.
Example:

[Figure: a connected graph on the vertices V1, V2, V3, V4]
A digraph is said to be strongly connected graph if for every pair of distinct vertices
vi , vj in G, there is a directed path from vi to vj and also from vj to vi.
Example:

[Figure: a strongly connected digraph on the vertices V1, V2, V3, V4]
Acyclic Graph: If there is a path containing one or more edges which starts from a
vertex vi and terminates into the same vertex then the path is known as a cycle. If a graph
does not have any cycle then it is called as acyclic graph.
Example:

[Figure: an acyclic digraph on the vertices V1, V2, V3, V4]
Sub Graph: A sub graph of G is a graph G' such that V(G') is a subset of V(G) and E(G')
is a subset of E(G).
Example:

[Figure: a graph on the vertices 0, 1, 2 and one of its sub graphs]
GRAPH REPRESENTATIONS
A graph can be represented in many ways. Some of the important representations are:
Set representation, Adjacency matrix representation, Adjacency list representations
etc.,
Set representation:
One of the straight forward methods of representing any graph is set representation.
In this method two sets are maintained: V as the set of vertices and E as the set of edges
which is a subset of V x V.
In case of weighted graph, V as the set of vertices and E as the set of edges which is a subset
of W x V x V.
Example:

[Figure: a graph on the vertices V1, ..., V7]

V(G) = {V1, V2, V3, V4, V5, V6, V7}
E(G) = { (V1,V2), (V1,V3), (V2,V4), (V2,V5), (V3,V4), (V3,V6), (V4,V7), (V5,V7), (V6,V7) }
Example:

[Figure: a weighted digraph on the vertices A, B, C, D]

V(G) = {A, B, C, D}
E(G) = { (3,A,C), (5,B,A), (1,B,C), (7,B,D), (2,C,A), (4,C,D), (6,D,B), (8,D,C) }
Linked representation:
Node structure of a non-weighted graph:

  | Node Information | Adjacency List |

In linked representation, the number of lists depends on the number of vertices in the
graph.
Example:

[Figure: the graph on V1, ..., V7 given above]

  V1 : V2 → V3
  V2 : V1 → V4 → V5
  V3 : V1 → V4 → V6
  V4 : V2 → V3 → V7
  V5 : V2 → V7
  V6 : V3 → V7
  V7 : V4 → V5 → V6
Example:

[Figure: the weighted digraph on A, B, C, D given above]

  A : (3, C)
  B : (1, C) → (7, D) → (5, A)
  C : (2, A) → (4, D)
  D : (6, B) → (8, C)
Sequential representation:
Sequential (Matrix) representation is the most useful way for representing any graphs.
For this, different matrices are allowed as Adjacency matrix, Incidence matrix, Circuit matrix,
Cut set matrix, Path matrix etc.
Example:

[Figure: an undirected graph on the vertices V1, V2, V3, V4]

          V1  V2  V3  V4
     V1 [  0   1   1   0 ]
A =  V2 [  1   0   1   0 ]
     V3 [  1   1   0   1 ]
     V4 [  0   0   1   0 ]
Example:

[Figure: a digraph on the vertices V1, V2, V3, V4]

          V1  V2  V3  V4
     V1 [  0   1   1   1 ]
A =  V2 [  0   0   0   1 ]
     V3 [  0   1   0   1 ]
     V4 [  0   0   0   0 ]
ADT Graph
{
Data Members: A non empty set of vertices and a set of undirected edges.
Functions:
OPERATIONS ON GRAPHS
Insertion operation:
Insertion of a vertex into the graph involves the way of insertion and establishes
connectivity with other vertices in the existing graph.
In case of directed graph, if Vx is inserted and Vi be its adjacent vertex then based on
the directions necessary entries are incorporated in the adjacency list.
Example: Insertion(V5)

[Figure: a graph on V1, ..., V4 extended with the new vertex V5]

Adjacency lists after the insertion:
  V1 : V2 → V3
  V2 : V1 → V3 → V5
  V3 : V1 → V2 → V4
  V4 : V3 → V5
  V5 : V2 → V4
Deletion operation:
For deleting a vertex from a graph, identify all the vertices that are adjacent to the
vertex and break the edges.
Example:

Before:
  V1 : V2 → V3
  V2 : V1 → V3 → V4
  V3 : V1 → V2 → V4
  V4 : V2 → V3

Deletion (V4):

After:
  V1 : V2 → V3
  V2 : V1 → V3
  V3 : V1 → V2
Merging operation:
Consider two graphs G1 and G2. Merging operation refers to the process of
combining these two graphs as a single graph G. This is accomplished by establishing one or
more edges between the graphs G1 and G2.
Example:

[Figure: graphs G1 (vertices V1 to V4) and G2 (vertices W1 to W3) merged into a single
graph by adding the edges (V4,W1) and (V3,W2)]
Traversing a graph means visiting all the vertices in the graph exactly once. Graph
traversal techniques are:
a) Breadth First Search (BFS) Traversal
b) Depth First Search (DFS) Traversal
The traversal starts from a vertex u, which is marked as visited. Next, all unvisited
vertices Vi adjacent to u are visited. Then the unvisited vertices Wij adjacent to each
Vi are visited, and so on. This process continues till all the vertices of the graph are
visited.
BFS uses a queue data structure to keep track of the vertices whose adjacent vertices
are yet to be visited.
Algorithm BFS (u): This procedure visits all the vertices of the graph starting from the
vertex u.
Step 1: Initialize a queue as Q
Visited(u) ← 1
Enqueue(Q, u)
Step 2: Repeat WHILE NOT EMPTY(Q)
Dequeue(Q,u)
WRITE (u)
For all vertices V adjacent to u
IF Visited (V) = 0 THEN
Enqueue(Q,V)
Visited (V) ← 1
ENDIF
EndFor
EndRepeat
Step 3: RETURN
[Figure: an example graph on ten vertices with the BFS visiting order from the start
vertex]
The traversal starts from a vertex u, which is marked as visited. Next, the vertices Vi
adjacent to u are collected and the first unvisited adjacent vertex Vj is visited. The
vertices Wk adjacent to Vj are then collected and the first unvisited one is visited.
The traversal progresses in this manner until no more visits are possible.
DFS uses a stack data structure (implicitly, through recursion) to keep track of the
vertices whose adjacent vertices are yet to be visited.
Algorithm DFS (u): This procedure visits all the vertices of the graph starting from the
vertex u.
Step 1: Visited(u) ← 1
WRITE (u)
Step 2: For each vertex V adjacent to u
IF Visited (V) = 0 THEN
Call DFS(V)
ENDIF
EndFor
Step 3: RETURN
[Figure: an example graph with the DFS visiting order from the start vertex]
Analysis: For the adjacency list representation, the vertices adjacent to a vertex are
found by scanning its list, so all the lists together are examined in O(e) time and the
complete traversal takes O(n + e) time. For the adjacency matrix representation,
determining all vertices adjacent to a vertex requires O(n) time, therefore the total
time is O(n^2).
CONNECTED COMPONENTS
Example:

[Figure: two graphs on the vertices 1, 2, 3, 4, one connected and one not connected]
If the graph is connected undirected graph, then we can visit all the vertices of the
graph by using either breadth first search or depth first search. The subgraph which has been
obtained after traversing the graph using either BFS or DFS represents the connected
component of the graph.
Example:

[Figure: a graph on the vertices 0 to 7 and the connected components obtained by
traversing it]
void components(G, n)
{
    int i;
    for (i = 0; i < n; i++)
        Visited[i] = 0;
    for (i = 0; i < n; i++)
        if (!Visited[i]) {
            DFS(i);          /* outputs one connected component */
            printf("\n");
        }
}
Analysis:
1. If the graph G is represented by its adjacency lists, then the total time needed to
generate all the connected components is O(n+e) time.
2. If the graph G is represented by its adjacency matrix, then the total time needed to
generate all the connected components is O(n2) time.
SPANNING TREES
Example:

[Figure: a graph on the vertices A, B, C, D and three of its spanning trees]
Note:
1. If the graph contains n vertices, then spanning tree contains exactly n-1 edges to
connect the n vertices.
2. When breadth first search traversal applied on a graph, the resultant spanning tree is
known as a breadth first spanning tree.
3. When depth first search traversal applied on a graph, the resultant spanning tree is
known as a depth first spanning tree.
A Spanning Tree for G is a sub-graph of G that it is a free tree connecting all vertices
in V. The cost of a spanning tree is the sum of costs on its edges.
The most important techniques used for finding minimum cost spanning trees are:
Prim’s algorithm, Kruskal’s algorithm and Sollin’s algorithm.
Prim’s Algorithm
Prim’s algorithm finds a minimum cost spanning tree using the greedy method. In this
method, edges are selected one by one based on an optimization criterion.
Step 1: Randomly choose any vertex. The vertex connecting to the edge having least
weight is usually selected.
Step 2: Find all the edges that connect the tree to new vertices. Select the least weight
edge among those edges and include it in the existing tree. If including that edge creates a
cycle, then reject that edge and look for the next least weight edge.
Step 3: Keep repeating step-02 until all the vertices are included and Minimum
Spanning Tree (MST) is obtained.
T = { };
TV = { 0 };   /* start with vertex 0 and no edges */
while (T contains less than n-1 edges)
{
    let (u,v) be a least cost edge such that u ∈ TV and v ∉ TV;
    if (there is no such edge)
        break;
    add v to TV;
    add (u,v) to T;
}
if (T contains fewer than n-1 edges)
    printf("No Spanning Tree\n");
Example: Design minimum cost spanning tree for the following graph
Kruskal’s Algorithm
In kruskal’s algorithm the edges of the graph are considered in non decreasing order
of cost.
In this method, all the vertices initially form a forest of single-node trees. The lowest
cost edge is selected first, then the next lowest cost edge, and so on. The set T of edges
selected so far must be such that it is still possible to complete T into a tree; thus T
may not be a tree at all stages of the algorithm.
T = { };
while( T contains less than n-1 edges && E is not empty)
{
Choose a least cost edge (v,w) from E
Delete (v,w) from E
if((v,w) does not create a cycle in T)
add (v,w) to T
else
discard (v,w)
}
if(T contains fewer than n-1 edges)
printf(“ No Spanning Tree”);
Example: Design minimum cost spanning trees for the following graphs
Sollin’s Algorithm
Example: Design minimum cost spanning tree for the following graph
BICONNECTED COMPONENTS
Examples:

[Figures: example graphs showing articulation points and the biconnected components into
which each graph splits]
APPLICATION OF GRAPHS
Graphs are used to represent networks, road maps, social networks such as facebook, etc.
In addition, graphs are used in different application areas which include:
In a graph, the vertices represent cities and the edges represent sections of the
highway. Each edge has a weight representing the distance between the two cities connected
by the edge. Aim of the problem is to find a path between two vertices in such a way that the
path will satisfy optimization criteria. The starting vertex of the path will be referred to as the
source and the last vertex be the destination.
Graphs are used to represent highway structure with vertices represent cities and the
edges represent sections of the highway. Each edge has a weight representing the distance
between the two cities connected by the edge. Everyone is interested to move from one city
to another city with minimum distance.
Let G=(V,E) be a weighted graph. The aim of the problem statement is to determine
the shortest paths from vertex V0 to all remaining vertices of G.
The Greedy strategy (Dijkstra’s algorithm) generates shortest paths from vertex V0
to the remaining vertices in non-decreasing order of their path length. Let S denotes the set
of vertices including V0 to which the shortest path have already been generated.
Procedure:
Step 1: Create a set S that keeps track of vertices included in shortest path tree.
Initially the set is empty.
Step 2: Assign a distance value to all vertices in the input graph. Initialize all
distance values as INFINITE. Assign distance value as 0 for the
source vertex so that it is picked first.
Step 3: While S doesn’t include all vertices, then
i. Pick a vertex u which is not there in S and has minimum
distance value.
ii. Include u to S.
iii. Update distance value of all adjacent vertices of u. To update
the distance values, iterate through all adjacent vertices. For
every adjacent vertex v, if sum of distance value of u and
weight of edge u-v, is less than the distance value of v, then
update the distance of v.
Initialisation for the source vertex v: S[v] ← TRUE and Dist[v] ← 0; for every other
vertex i, S[i] ← FALSE and Dist[i] ← cost(v, i). A C sketch of the complete procedure
follows.
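A C sketch of this procedure under the stated initialisation; the function name and the
constants N and INF are illustrative, and cost[][] is the adjacency cost matrix with INF
marking absent edges:

#include <stdio.h>

#define N 5
#define INF 9999

void ShortestPath(int cost[N][N], int v0, int dist[N])
{
    int S[N], i, u, w;
    for (i = 0; i < N; i++) {
        S[i] = 0;
        dist[i] = cost[v0][i];        /* initial distance values        */
    }
    S[v0] = 1;
    dist[v0] = 0;
    for (i = 0; i < N - 1; i++) {
        int min = INF;
        u = -1;
        for (w = 0; w < N; w++)       /* pick the unvisited vertex with */
            if (!S[w] && dist[w] < min) { /* minimum distance value     */
                min = dist[w];
                u = w;
            }
        if (u == -1) break;           /* remaining vertices unreachable */
        S[u] = 1;                     /* include u in S                 */
        for (w = 0; w < N; w++)       /* update distances via u         */
            if (!S[w] && dist[u] + cost[u][w] < dist[w])
                dist[w] = dist[u] + cost[u][w];
    }
}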
Analysis: The time complexity of single shortest path problem is O(n2) time.
All Pairs Shortest Path
Let G=(V,E) be a directed graph with n vertices. Let Cost be an adjacency matrix of
G such that cost(i,i) = 0; cost(i,j) is the length of the edge <i,j> if <i,j> belongs to
E(G), and cost(i,j) = ∞ if <i,j> does not belong to E(G).
All pairs shortest path problem is to determine a matrix A such that A(i,j) is length of
the shortest path from i to j. The matrix A can be obtained by solving n-single source shortest
path problems.
Step 1: A^0(i,j) ← cost(i,j) for all i, j.
Step 2: Let A^k(i,j) be the length of the shortest path from node i to node j such that
every intermediate node is ≤ k. Now compute A^k for k = 1, 2, - - - -, n.
When intermediate vertex k arises, the two possible cases are:
i. Path going from i to j via k.
ii. Path not going via k, then principle of optimality holds.
If the path goes through k:   A^k(i,j) = A^(k-1)(i,k) + A^(k-1)(k,j)
Otherwise:                    A^k(i,j) = A^(k-1)(i,j)
so that A^k(i,j) = min{ A^(k-1)(i,j), A^(k-1)(i,k) + A^(k-1)(k,j) }.
Algorithm shortestpath(): This procedure is used to generate path matrices that shows
shortest path between every pair of vertices of graph G.
EndRepeat
Step 3: RETURN
[Figure: an example weighted digraph on the vertices 1, 2, 3]
Analysis: The first nested for loop takes O(n^2) time. The second, triply nested for
loop takes O(n^3) time. Therefore, the overall time complexity of the above procedure
is O(n^3).
TRANSITIVE CLOSURE
Consider a directed graph G with unweighted edges. To determine shortest path from
i to j for all values of i and j, two cases are possible. The first case requires positive path
lengths, while the second requires only nonnegative path lengths. These cases are known as
the transitive closure and reflexive transitive closure of a graph respectively.
Transitive Closure: The transitive closure matrix, denoted A+, of a graph G is a matrix
such that A+(i,j) = 1 if there is a path of length greater than 0 from i to j, and
A+(i,j) = 0 otherwise.
Reflexive Transitive Closure: The reflexive transitive closure matrix, denoted A*, of a
graph G is a matrix such that A*(i,j) = 1 if there is a path of length 0 or more from i
to j, and A*(i,j) = 0 otherwise; in particular, A*(i,i) = 1 for every i.
For example, for the digraph with edges 0→1, 1→2, 2→3, 3→4 and 4→2:

                         0 1 2 3 4
                     0 [ 0 1 0 0 0 ]
Adjacency matrix A = 1 [ 0 0 1 0 0 ]
                     2 [ 0 0 0 1 0 ]
                     3 [ 0 0 0 0 1 ]
                     4 [ 0 0 1 0 0 ]
***
TABLES
A table is a non-linear data structure that organizes the given data into rows and
columns. It can be used to store and display data in a structured format. The
diagrammatic representation of a table data structure is as follows:
Examples: Rectangular tables, Jagged tables, Inverted tables, Hash tables etc.
Rectangular Tables
Rectangular tables are also known as matrices. In matrix format data is organized in
terms of rows and columns.
Example:
* * * *
* * * *
* * * *
Jagged Tables
Jagged tables are the special kind of sparse matrices such as triangular matrices; band
matrices etc. with an additional constraint such that in a row (or in a column) if elements are
present then they are contiguous.
Example:
* * *
* * * * *
* * *
* * *
* * *
* * * *
* *
Inverted Tables
An inverted table is a kind of table that avoids duplicating sets of records while
organizing them in a structured format based on the specified constraints.
HASH TABLES
Collections of records can be stored in a particular table format, called a hash table,
by using a function known as a hash function. This procedure is known as hashing.
Example: Hash Table Size = 5, hash function f(K) = K % 5

  Bucket :  0    1    2    3    4
  Key    :  40   36   27   13   84
Here,
Assume ‘K’ is the key element. Then key K is stored in position f(K) of the
hash table. Where, f is the hash function. Each position of the hash table is a bucket and
f(K) is the home bucket for the key element K. Each key is mapped into some numbered
cell of the hash table using hash function.
Hash Table: Table used for storing a collection of records into appropriate positions
by using hash function is known as a hash table.
Hash Function: Function used for storing a collection of records into appropriate
positions of the hash table is known as a hash function.
Storing a given set of values into appropriate positions of the hash table with the
help of a hash function is known as hashing. Methods used for creating hash functions are:
1. Truncation method
2. Folding method
3. Mid square method
4. Division hash functions etc.
1. Truncation Method
In the truncation method, the hash function H of the key 'K' is obtained by ignoring a set of digits of the key; the remaining digits are used as the index position in the hash table. For example, retaining only certain digit positions of the key might give
H(K) = H(928314275) = 37
so the given key 928314275 is stored in the 37th index position of the hash table.
2. Folding Method
In the folding method, partition the key 'K' into a number of parts K1, K2, ..., Kn. These individual parts are added together, and the truncation method is applied to the sum to obtain the final index position for the key element 'K'.
In this example, the given key 928314275 is stored in the 11th index position of the hash table.
3. Mid Square Method
In the mid square method, the hash function H of the key 'K' is obtained by squaring the key value 'K' and selecting an appropriate number of digits from the middle of the square.
In this example (486² = 236196), the given key 486 is stored in the 269th index position of the hash table.
4. Division Hash Function
One of the fastest and most widely used hash function methods is the division hash function.
In this method, the hash function H of the key ‘K’ is defined by
H (K) = K%D
Where, K is the key value and D is the length/size of the hash table.
The ideal choice is to take D as a prime number.
Example: Assume the key K = 486 and the size of the hash table is 5. Then H(486) = 486 % 5 = 1, so the given key 486 is stored in the 1st index position of the hash table.
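A minimal C sketch of the division method (the table size D is an assumption matching the example):

#define D 5   /* hash table size; ideally a prime number */

/* division hash function: returns the home bucket of the key */
int hash(int key) {
    return key % D;
}

With the example above, hash(486) returns 1.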
Overflow: While inserting keys into the hash table, if no empty location is available for a further insertion, the status of the hash table is referred to as an overflow situation. In that state insertion is not possible, since the hash table is already full with the given keys.
The overflow problem can be avoided by choosing the size of the hash table to be greater than the number of keys to be inserted.
Collision: While inserting keys into the hash table, if two or more keys try to access the same location of the hash table, the status of the hash table is referred to as a collision situation.
Example: Insert the keys 3, 7, 15 and 60 into a hash table of size 5 using H(K) = K % 5.

    Key 3  => H(3)  = 3 % 5  = 3
    Key 7  => H(7)  = 7 % 5  = 2
    Key 15 => H(15) = 15 % 5 = 0
    Key 60 => H(60) = 60 % 5 = 0

    Index   Key
      4
      3      3
      2      7
      1
      0      15

Key 60 also hashes to index 0, which is already occupied by 15, so a collision occurs.
Collision resolution techniques are classified into two categories:
A. Closed hashing
B. Open hashing
A) CLOSED HASHING
The simplest method to resolve a collision problem is the closed hashing process. In this process, whenever a collision occurs, alternative locations are tried until an empty location is found.
The alternative empty locations can be obtained with various location methods as:
i) Linear probing
ii) Quadratic probing
iii) Double hashing
iv) Rehashing
In every method, the table is treated as circular, so probing wraps around from the last location back to the first.
i) Linear Probing:
In linear probing, when a collision occurs the empty locations are probed as the 1st location from the collision, the 2nd location from the collision, and so on, i.e. the increment function is
f(i) = i
Example: Insert the keys 15, 60, 7 and 3 into a hash table of size 5 using H(K) = K % 5.

    0    1    2    3    4
    15   60   7    3

Key 60 hashes to index 0, which is occupied by 15. With f(1) = 1, the 1st location from the collision is index 1, which is empty. Hence, the key 60 is inserted into index position 1 of the hash table.
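A minimal C sketch of insertion with linear probing, assuming -1 marks an empty slot and SIZE is the table size:

#define SIZE 5
#define EMPTY -1

/* returns the index where key was placed, or -1 if the table is full */
int insert_linear(int table[SIZE], int key) {
    int home = key % SIZE;
    for (int i = 0; i < SIZE; i++) {
        int pos = (home + i) % SIZE;   /* circular probing: f(i) = i */
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;                          /* overflow: table is full */
}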
ii) Quadratic Probing:
In quadratic probing, when a collision occurs the empty locations are probed as the 1st location from the collision, the 4th location from the collision, the 9th location from the collision, and so on, i.e. the increment function is
f(i) = i²
Example: Insert the keys 89, 18, 49 and 58 into a hash table of size 10 using H(K) = K % 10.

    0    1    2    3    4    5    6    7    8    9
    49        58                            18   89

Key 49 hashes to index 9, which is occupied by 89, so a collision occurs. With f(1) = 1, (9 + 1) % 10 = 0 is empty, so 49 is placed at index 0. Key 58 hashes to index 8, which is occupied by 18; (8 + 1) % 10 = 9 is also occupied, but with f(2) = 4, (8 + 4) % 10 = 2 is empty. Hence, the key 58 is inserted into index position 2 of the hash table.
iii) Double Hashing:
In double hashing, the probe increment is computed with a second hash function:
f(k) = i * h1(k), where i = 1, 2, 3, ...
h1(k) = q − (k mod q), where q is a prime number smaller than the table size.
Example: Insert the keys 89, 18, 49, 58 and 69 into a hash table of size 10 using H(K) = K % 10 and taking q = 7.

    0    1    2    3    4    5    6    7    8    9
    69             58             49        18   89

Key 49 hashes to index 9, which is occupied by 89, so a collision occurs. With h1(49) = 7 − (49 mod 7) = 7, the probe (9 + 7) % 10 = 6 is empty, so 49 is placed at index 6. Similarly 58 (h1(58) = 5) goes to (8 + 5) % 10 = 3, and 69 (h1(69) = 1) goes to (9 + 1) % 10 = 0.
iv) Rehashing:
In the rehashing method, if more than half of the locations are filled with the given keys, then build another hash table roughly twice as big as the old hash table and rehash all the keys into the new hash table.
Example: Insert the keys 13, 15, 24 and 60 into a hash table of size 7.

    0    1    2    3    4    5    6
         15        24   60        13

At this stage, it uses the hash function H(k) = k % 7, and more than half of the hash table is full with the given keys.
Now, suppose we wish to insert another key, 23, into the hash table. Then it is better to apply the rehashing process.
In this method, create a new table of size 17, since 17 is the first prime number near twice the old hash table size. The new hash function becomes:
H(k) = k % 17
After the rehashing process, the elements are positioned in the new table as:

    0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
                            23  24      60              13          15
B) OPEN HASHING
Open hashing is also known as the separate chaining method. This method uses the hash table as an array of pointers; each pointer points to a linked list. Hence, this method is also treated as linked-list collision resolution.
In this procedure, the hash table location of the given key is first calculated with the help of the division hash function. Control then enters that location and searches the linked list pointed to by the pointer stored there. The key is inserted at the end of that list.
The main advantage of this type of hashing is that every key stays at the index position specified by the division hash function; collisions simply lengthen the chain.
Example: Insert the keys 10, 24, 12, 23, 36, 57, 64, 32, 54 into a hash table of size 10.

    H(10) = 10 % 10 = 0        H(57) = 57 % 10 = 7
    H(24) = 24 % 10 = 4        H(64) = 64 % 10 = 4
    H(12) = 12 % 10 = 2        H(32) = 32 % 10 = 2
    H(23) = 23 % 10 = 3        H(54) = 54 % 10 = 4
    H(36) = 36 % 10 = 6

    Index   Chain
      9
      8
      7     57
      6     36
      5
      4     24 -> 64 -> 54
      3     23
      2     12 -> 32
      1
      0     10
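A minimal C sketch of separate-chaining insertion (the node type and SIZE are assumptions for illustration):

#include <stdlib.h>
#define SIZE 10

struct node {
    int key;
    struct node *next;
};

struct node *table[SIZE];   /* array of pointers, initially all NULL */

/* insert key at the end of the chain in its home bucket */
void insert_chain(int key) {
    struct node *n = malloc(sizeof *n);
    n->key = key;
    n->next = NULL;
    struct node **p = &table[key % SIZE];
    while (*p != NULL)
        p = &(*p)->next;    /* walk to the end of the list */
    *p = n;
}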
Load Factor:
The load factor (α) of a hashed list is the number of filled elements in the list divided by the number of physical elements allocated for the list, expressed as a percentage, i.e., the load factor is given by
α = (k/n) * 100
Where,
k is the number of filled elements and n is the total number of elements allocated to
the list.
Clustering:
As data are added to a list and collisions are resolved, the keys tend to spread unevenly across the hashed list. This tendency of data to build up unevenly across a hashed list is known as clustering.
Clustering is divided into two types as: primary clustering and secondary clustering.
BUCKET HASHING
In the bucket hashing method, consider a hash table with M slots that is divided into B buckets, each bucket consisting of M/B slots.
The hash function assigns each record to the first slot within one of the buckets. If that slot is occupied by another key, the bucket's slots are searched sequentially until an open slot is found. If a bucket is entirely full with the given keys, the new key is placed in an overflow bucket, which is maintained at the end of the hash table. All buckets of the hash table share the same overflow bucket.
Example: Insert the keys 23, 16, 30, 20, 48, 25 and 8 into a hash table of size 10 divided into 5 buckets of 2 slots each, using the bucket function H(K) = K % 5.
Hash Table:

    Slot:    0    1    2    3    4    5    6    7    8    9
    Key:     30   20   16             23   48
    Bucket:  [ B0 ]    [ B1 ]  [ B2 ] [ B3 ]    [ B4 ]

Overflow bucket:

    0    1    2    3    4    5
    25   8

For example, H(20) = 20 % 5 = 0, i.e. key 20 is assigned to the first slot of bucket B0. That slot is not empty (it holds 30), so control moves to the next location of the B0 bucket, which is empty; hence key 20 is inserted into the second slot of B0. Keys 25 and 8 find their buckets (B0 and B3) entirely full, so they go to the overflow bucket.
***
Unit-V
File Organization: Sequential file organization, Direct file organization, Indexed sequential
file organization.
Advanced sorting: Sorting on several keys, Lists and Table sorts, Summary on Internal
sorting and External sorting
***
INTRODUCTION
A file is a collection of records which contain data about individual entities. A record may be decomposed into smaller units called fields, which contain specific values.
Here, the file organizations and the access modes they support are:
Organization Access
Sequential File Sequential
Direct File Direct
Indexed Sequential File Sequential and Direct
SEQUENTIAL FILE ORGANIZATION
In sequential file organization, elements are stored in sequential order, i.e., the (i+1)th element of a file is stored immediately after the ith element. Searching techniques applicable to such files include:
a) Sequential search
b) Binary search
c) Interpolation search etc.
a) Sequential Search
In the sequential search technique, compare the key element with each element of K from index 1 to index n. At any position i, if K[i] = Key, then return the index i, which refers to a successful search (Element Found). If K[i] ≠ Key for the entire procedure and no elements remain for further comparison, then return -1, which refers to an unsuccessful search (Element Not Found). The functional procedure can be placed as shown below.
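The routine itself is not reproduced in the notes; the following is a minimal C sketch consistent with the description (1-based indexing assumed, to match the text):

/* returns the index of Key in K[1..n], or -1 if not found */
int sequential_search(int K[], int n, int Key) {
    for (int i = 1; i <= n; i++)
        if (K[i] == Key)
            return i;    /* successful search: element found */
    return -1;           /* unsuccessful search: element not found */
}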
Analysis:
Hence, the worst case and average case time complexity of sequential search is O (n).
b) Binary Search
Binary search can be applied only to array elements that are available in sorted order. Consider K, an array consisting of n elements in sorted order such that K[1] ≤ K[2] ≤ ... ≤ K[n]. Suppose an item of information to search for is given in the variable Key. Then the binary search procedure works with
Mid = (Low + High) / 2
where Low refers to the first index and High refers to the last index of the array at the initial call. Now, the process falls into one of the following three cases.
Case 1: If Key = K[Mid]; Then the search is successful search i.e., Element Found.
Case 2: If Key > K[Mid]; Then the Key element can appear only in the right half of
the array. So, we reset the Low value as Low = Mid+1 and begin search
again.
Case 3: If Key < K[Mid]; Then the Key element can appear only in the left half of the
array. So, we reset the High value as High = Mid-1 and begin search again.
The above procedure is repeated until we reach Low > High. When this condition is obtained, it indicates that the search is unsuccessful, i.e., the element is not found.
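The three cases translate directly into code; a minimal C sketch (1-based indexing assumed, as in the text):

/* returns the index of Key in sorted K[1..n], or -1 if not found */
int binary_search(int K[], int n, int Key) {
    int Low = 1, High = n;
    while (Low <= High) {
        int Mid = (Low + High) / 2;
        if (Key == K[Mid])
            return Mid;        /* Case 1: element found */
        else if (Key > K[Mid])
            Low = Mid + 1;     /* Case 2: search the right half */
        else
            High = Mid - 1;    /* Case 3: search the left half */
    }
    return -1;                 /* Low > High: element not found */
}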
Analysis:
The complexity is measured by the number of comparisons to locate the search item
in the given array elements.
In binary search, each comparison reduces the size of the search range by half, so the number of comparisons is smaller than in linear search. Hence, the worst case and average case time complexity of binary search is O(log n).
c) Interpolation Search
Interpolation search is another searching technique that can be used only when the list of elements is ordered and uniformly distributed.
The interpolation search is an improvement over binary search for instances, where
the values in a sorted array are uniformly distributed. Binary search always goes to the
middle element to check. On the other hand, interpolation search may go to different
locations according to the value of key being searched.
Let K is an array that consists of n elements from index 0 to n-1. Search element is
given in the variable Key. Then interpolation search procedure works as:
Step 1: In a loop, calculate the value of "Pos" using the formula
Pos = Low + ((Key − K[Low]) * (High − Low)) / (K[High] − K[Low])
where Low refers to the starting index 0 and High refers to the ending index n−1 at the first call.
Step 2: Compare the K[Pos] element with the Key element. If it is a match, return position Pos, which refers to element found.
Step 3: If the Key element is less than K[Pos], calculate the next position in the left sub-array by changing the High value as High = Pos − 1.
Step 4: If the Key element is greater than K[Pos], calculate the next position in the right sub-array by changing the Low value as Low = Pos + 1.
Step 5: Repeat the above procedure until the search element is found or the sub-array reduces to zero.
/* interpolation search in sorted, uniformly distributed K[0..n-1] */
int interpolation_search(int K[], int n, int Key) {
    int Low = 0, High = n - 1, Pos;
    while (Low <= High && Key >= K[Low] && Key <= K[High]) {
        if (K[High] == K[Low])               /* avoid division by zero */
            return (K[Low] == Key) ? Low : -1;
        Pos = Low + ((Key - K[Low]) * (High - Low) / (K[High] - K[Low]));
        if (K[Pos] == Key)
            return Pos;                      /* element found */
        else if (K[Pos] < Key)
            Low = Pos + 1;                   /* probe the right sub-array */
        else
            High = Pos - 1;                  /* probe the left sub-array */
    }
    return -1;                               /* element not found */
}
Analysis:
The worst case time complexity of interpolation search is O(n), and its average case time complexity is O(log₂ log₂ n).
KEY OBSERVATIONS
Sequential search is preferable when data is stored in primary memory and only a limited amount of data needs to be handled.
Binary search is preferable when data is stored in primary memory and it must be
available in sorted order.
Interpolation search is preferable when data is stored in auxiliary memory due to
additional calculations and it must be available in sorted order and uniformly
distributed.
SELF-ORGANIZING SEQUENTIAL SEARCH
In the previous search methods, all the records in the file are accessed equally often. To improve the retrieval performance of a sequential search, the file can be organized by considering the frequency of access of the records and placing the most frequently accessed records at the beginning of the file.
A self-organizing sequential search therefore modifies the order of the records, moving the most frequently retrieved records to the beginning of the file, to improve the performance of sequential search.
a) Move_to_front Method
The Move_to_front algorithm handles locality of access well; locality means that a record that has recently been accessed is likely to be accessed again in the near future. This process is essentially the same as the LRU (Least Recently Used) paging algorithm used by operating systems.
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Assume the frequently occurring records are E and I; then they are shifted to the front positions as:
E I A B C D F G H J K L M N O P Q R S T U V W X Y Z
Note: Move_to_front is appropriate when space is not limited and locality of access
is important.
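A minimal C sketch of Move_to_front on a singly linked list (the node type is an assumption for illustration):

struct node {
    char key;
    struct node *next;
};

/* search for key; on a hit, unlink the node and move it to the front */
struct node *move_to_front(struct node *head, char key) {
    struct node *prev = NULL, *cur = head;
    while (cur != NULL && cur->key != key) {
        prev = cur;
        cur = cur->next;
    }
    if (cur == NULL || prev == NULL)
        return head;          /* not found, or already at the front */
    prev->next = cur->next;   /* unlink the accessed node */
    cur->next = head;         /* relink it at the front */
    return cur;               /* new head of the list */
}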
b) Transpose Method
In the transpose method, the accessed record is interchanged with its immediate predecessor unless it is already in the first position. With this approach, a record needs to be accessed many times before it moves to the front of the list. This method is more stable than Move_to_front and avoids big mistakes.
c) Count Method
The count method keeps a count of the number of accesses of each record. The record is then moved in the file to a position in front of all the records with fewer accesses. The file is thus always ordered in decreasing order of frequency of access.
The disadvantage of the count algorithm is that it requires extra storage to keep the count of accesses, and it does not handle the locality-of-access phenomenon well.
Note: Because of its storage representation, use it only when the counts are needed
for another purpose.
DIRECT FILE ORGANIZATION
Direct file organization refers to accessing the records directly, without accessing all the preceding records. In such cases, keys are sometimes also treated as addresses. In direct file organization, we consider two types of methods for finding direct locations: a unique address based on the subscript value, and hashing methods.
A) UNIQUE ADDRESS METHOD
In this method, if the subscript of an array element is given, its address in contiguous storage can be determined directly; for example, with elements numbered 1, 2, 3, 4, 5, 6, the address of the ith element is the base address plus (i − 1) times the element size.
B) HASHING PROCESS
The process of mapping a collection of records into appropriate positions of the hash
table is known as a hashing process.
Example: Insert the keys 14, 46 and 30 into a hash table of size 3 using H(K) = K % 3.

    Index   Key
      2      14
      1      46
      0      30
The hash functions (truncation, folding, mid square and division methods), the collision resolution techniques (closed hashing with linear probing, quadratic probing, double hashing and rehashing; open hashing with separate chaining) and bucket hashing are the same as those described earlier under HASH TABLES.
***
INDEXED SEQUENTIAL FILE ORGANIZATION
To improve performance, order the information and put tabs or an index at particular points to group it. Then search the index until we find the group in which the desired record should be present. This reduces the number of comparisons compared to the normal sequential file accessing method. To improve performance still further, higher-level indexes can also be placed, and so on.
For example, consider a two-level index in which
100 and 200 are the high level index values, and
50, 100, 150, 200 and 250 are the low level index values.
Now, to locate a particular record, the search key value is first compared with the high level index values. If the value is less than or equal to a specific index value, the search moves to its left side; otherwise, it moves to its right side. The key is then compared with the low level index values, applying the same procedure.
Based on these comparisons, control finally moves to one particular set of records, where the search proceeds sequentially. If the key is found, the search is successful; otherwise, it is unsuccessful.
BASIC STRUCTURE
In general indexed sequential file organization can handle huge amount of data and
uses auxiliary devices to store a collection of records. A file organized with an indexed
sequential structure is commonly referred to as an ISAM (Indexed Sequential Access
Method) file.
To maintain indexed sequential file structure, use tracks as the lowest level of
grouping information and cylinders as the highest level of grouping information.
The cylinder index contains a pair of entries such as key and ptr terms.
Key Ptr
Each track in a cylinder has two pairs of entries associated with it in the track index. One pair contains information on the primary storage area, and the other pair has information on the overflow records associated with the track:

    Primary          Overflow
    Key | Ptr        Key | Ptr
Here, the Key in the primary storage area is the highest key on that track and Ptr
indicates the track containing the primary records. The key in the overflow is the highest key
in the overflow area associated with the track and Ptr indicates the first overflow record, if
one exists for the primary track.
Here, the cylinder index gives explicit entries for cylinders 1, 2, 3 and so on. The highest key on cylinder 1 is 350, and the pointer notation is in x-y format, where x gives the cylinder number and y gives the track number where the track index for that cylinder is stored. So the pointer 1-0 means cylinder 1, track 0.
The track index holds primary storage index values and overflow storage index values. Initially, a track's key in the overflow pair is the same as the highest key in its primary pair, the track pointers point to the primary storage areas, and the overflow pointer values are filled with NULL values.
Assume a record with key 8 needs to be added to the existing structure. First compare it with the cylinder index value 350; since 8 <= 350, it belongs on cylinder 1. The value is then compared with the track index entry for track 0 of cylinder 1; since 8 <= 10, control enters primary storage area 1 and compares the values in sequential order. Key 8 is placed between 7 and 9, so key 13 is pushed out of primary area 1 and enters the overflow storage, with the necessary modifications to the track index.
Performance: As the above example shows, only one overflow pointer is maintained for each primary track. If the number of records in the overflow storage area grows large, retrieval performance degrades and causes complexity problems.
Overflow
The overflow area may be implemented in either one of two ways – cylinder
overflow or independent overflow.
In cylinder overflow, the overflow area is on the same cylinder as the primary storage area. The advantage of a cylinder overflow area is that the read/write head of the auxiliary storage disk does not need to be repositioned to another cylinder to access an overflow record. In independent overflow, by contrast, a separate overflow area apart from the primary cylinders is reserved for all overflow records.
***
RADIX SORT
Radix sort is one of the sorting algorithms used to sort a list of integer numbers in
order. In radix sort algorithm, a list of integer numbers will be sorted based on the digits of
individual numbers. Sorting is performed from least significant digit to the most significant
digit.
Radix sort algorithm requires the number of passes which are equal to the number of
digits present in the largest number among the list of numbers. For example, if the largest
number is a 3 digit number then that list is sorted with 3 passes.
To implement radix sort, consider the base (radix) value r. In general, the base value is taken as 10 (decimal numbers). Based on the base value, maintain 10 individual buckets with index values from 0 to 9 to store sorted values temporarily at each pass.
The radix sort algorithm proceeds by distributing the numbers into the buckets according to the digit of the current pass and then collecting them bucket by bucket, for as many passes as there are digits.
Example: Sort the list 271, 93, 33, 984, 55, 306, 208, 179, 859, 9 using radix sort.
Solution:
Pass 1: 0:
1: 271
2:
3: 93 33
4: 984
5: 55
6: 306
7:
8: 208
9: 179 859 9
After pass 1, collecting bucket by bucket gives the order: 271, 93, 33, 984, 55, 306, 208, 179, 859, 9.
Pass 2: 0: 306 208 9
1:
2:
3: 33
4:
5: 55 859
6:
7: 271 179
8: 984
9: 93
Similarly, collecting after pass 2 gives: 306, 208, 9, 33, 55, 859, 271, 179, 984, 93.
Pass 3: 0: 9 33 55 93
1: 179
2: 208 271
3: 306
4:
5:
6:
7:
8: 859
9: 984
Therefore,
Sorted list of elements = 9 33 55 93 179 208 271 306 859 984
The collection step at the end of each pass gathers the numbers back from the buckets in order:

i = 0;
for (k = 0; k < 10; k++)              /* scan buckets 0..9 in order */
{
    for (j = 0; j < bucket_count[k]; j++)
    {
        a[i] = bucket[k][j];          /* copy bucket contents back */
        i++;
    }
}
divisor *= 10;                        /* move on to the next digit */
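For completeness, a minimal self-contained C sketch of the whole procedure, assuming at most MAX numbers per bucket (the fragment above is the collection loop inside it):

#define MAX 100

void radix_sort(int a[], int n) {
    int bucket[10][MAX], bucket_count[10];
    int largest = a[0];
    for (int i = 1; i < n; i++)                /* number of passes is the */
        if (a[i] > largest) largest = a[i];    /* digit count of largest  */
    for (int divisor = 1; largest / divisor > 0; divisor *= 10) {
        for (int k = 0; k < 10; k++)
            bucket_count[k] = 0;
        for (int i = 0; i < n; i++) {          /* distribution step */
            int d = (a[i] / divisor) % 10;
            bucket[d][bucket_count[d]++] = a[i];
        }
        int i = 0;                             /* collection step */
        for (int k = 0; k < 10; k++)
            for (int j = 0; j < bucket_count[k]; j++)
                a[i++] = bucket[k][j];
    }
}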
Analysis:
Radix sort makes d number of passes over the data, each pass taking O(n+r) time.
The value of d will depend on the choice of the radix r and also on the largest key. Hence, the
total computing time is O(d (n+r)) .
A) LIST SORT
Apart from radix sort and merge sort, all the sorting methods require excessive data movement: as a result of comparisons, records may be physically moved. When the records are large, this movement degrades the performance of the sorting procedure. Thus, with large records, it is necessary to modify the sorting methods so as to minimize data movement.
Methods such as insertion sort and iterative merge sort can be modified to work with a linked list rather than a sequential list. In this case, each record requires an additional link field, and the sort simply changes the link fields to reflect the change in the position of the record in the list. At the end of the sorting process, the records are linked together in the required order. This modified sorting process is known as list sort.
Example: Consider the following list of records with the link fields as:
i R1 R2 R3 R4 R5
Key 26 5 77 1 61
Link 9 6 0 2 3
Pass-1: first = 4
Interchange R1 and R4 records, then it becomes
i R1 R2 R3 R4 R5
Key 1 5 77 26 61
Link 2 6 0 9 3
Pass-2: first = 2
R2 is already in its correct position (R2 is interchanged with itself), so the list is unchanged:
i R1 R2 R3 R4 R5
Key 1 5 77 26 61
Link 2 6 0 9 3
Pass-3: first = 4
Interchange R3 and R4 records, then it becomes
i R1 R2 R3 R4 R5
Key 1 5 26 77 61
Link 2 6 9 0 3
Pass-4: first = 5
Interchange R4 and R5 records, then it becomes
i R1 R2 R3 R4 R5
Key 1 5 26 61 77
Link 2 6 9 3 0
The records are now physically in sorted order: 1, 5, 26, 61, 77.
Analysis:
If the list consists of n records and the number of interchange operations is m, then the total time complexity is O(nm).
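A hedged C sketch of the rearrangement pass, assuming 1-based arrays key[1..n] and link[1..n] that are already chained in sorted order (link value 0 ends the chain, and first indexes the record with the smallest key):

/* physically rearrange records whose link fields chain them in
   sorted order */
void list_sort(int key[], int link[], int n, int first) {
    for (int i = 1; i < n; i++) {       /* last record falls into place */
        while (first < i)
            first = link[first];        /* follow forwarding addresses */
        int next = link[first];         /* successor in sorted order */
        if (first != i) {
            int t = key[i];             /* interchange records i, first */
            key[i] = key[first];
            key[first] = t;
            link[first] = link[i];
            link[i] = first;            /* leave a forwarding address */
        }
        first = next;
    }
}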
B) TABLE SORT
The list sort technique is not well suited for quick sort and heap sort. For these sort
methods, as well as for methods suited to list sort, it is necessary to maintain an auxiliary
table as t with one entry per record. The entries in this table serve as an indirect reference to
the records. This modified sorting process is referred to as a table sort.
Consider the list of records given in the array a, with the table index values in t. In the table sort process, when records are shifted to other locations, the table entries are changed correspondingly. At the end of the sorting process, the record with the smallest key is a[t[1]] and the record with the largest key is a[t[n]].
Here, the rearrangement of values followed cycle process. It involves two types such
as trivial cycle and nontrivial cycle.
In a trivial cycle for Ri (i.e., t[i] = i), no rearrangement is required, since the condition
t[i] = i means that the record with the ith smallest key is Ri.
In a nontrivial cycle for Ri (i.e., t[i] ≠ i), rearrangement is required to some other
locations based on the specified conditions and so on.
Example: Consider the following records and sorted table t, where t[i] is the index of the record with the ith smallest key:
i      R1  R2  R3  R4  R5  R6  R7  R8
Key    35  14  12  42  26  50  31  18
t      3   2   8   5   7   1   4   6
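A hedged C sketch of the in-place rearrangement that follows the cycles of t (1-based arrays assumed, with int keys standing in for full records; afterwards a[i] holds the ith smallest key):

/* rearrange a[1..n] in place by following the cycles of table t */
void table_sort(int a[], int t[], int n) {
    for (int i = 1; i <= n; i++) {
        if (t[i] != i) {                 /* nontrivial cycle starting at i */
            int temp = a[i];
            int current = i;
            while (t[current] != i) {    /* shift records along the cycle */
                a[current] = a[t[current]];
                int next = t[current];
                t[current] = current;    /* mark position as placed */
                current = next;
            }
            a[current] = temp;           /* close the cycle */
            t[current] = current;
        }                                /* trivial cycle: nothing to do */
    }
}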
Analysis:
If the list consists of n records and the number of record moves in nontrivial cycles is m, then the total time complexity is O(nm).
SUMMARY ON INTERNAL SORTING
Sorting refers to the arrangement of data items either in ascending (increasing) order or in descending (decreasing) order. Some of the most important sorting techniques are:
a) Bubble Sort
b) Insertion Sort
c) Selection Sort
d) Quick Sort
e) Merge Sort etc.,
Sorting techniques are classified into two types as: Internal sorting techniques and
External sorting techniques.
Sorting performed entirely in main memory is called internal sorting, i.e., in an internal sort, all the data is held in primary memory during the sorting process. Internal sorting techniques are used to handle small amounts of data.
Examples: Bubble Sort, Insertion Sort, Selection Sort, Quick Sort etc.
Sorting performed with the interaction of secondary storage devices like disks or tapes is called external sorting, i.e., in an external sort, part of the data is in primary memory during the sorting process and the remaining data, which does not fit in primary memory, is on secondary storage devices. External sorting techniques are used to handle large volumes of data.
Key observations:
1) When we observe several sorting methods, no one method is best under all
circumstances.
2) Internal sorting techniques are good to handle limited amount of data and external
sorting techniques are good to handle large volume of data.
3) The sorting techniques have different performance characteristics (see the comparison table under SORT EFFICIENCY below). In particular:
Insertion sort is good when the list is already partially ordered and consists of a small amount of data.
Merge sort has the best average case behavior, but it requires more space compared to the other sorting techniques.
Quick sort has good average case behavior, but its worst case takes more time.
In this way, the techniques lead to different performance characteristics based on the size and order of the input.
Finally, we can conclude that quick sort is the best overall choice among these techniques, since its average case time complexity is O(n log n) and it takes less space than merge sort.
EXTERNAL SORTING
As defined above, external sorting interacts with secondary storage devices such as disks or tapes, handling data volumes that do not fit in primary memory.
Assume that the list (or file) to be sorted resides on a disk. At this stage, a unit of data
is necessary to read from or write to a disk at a time referred to as block. A block generally
consists of several records. For a disk, there are three factors contributing to the read/write
time:
a) Seek time: Time taken to position the read/write heads to correct cylinder. This
will depend on the number of cylinders across which the heads have to move.
b) Latency time: Time until the right sector of the track is under the read/write head.
c) Transmission time: Time to transmit the block of data to/from the disk.
The most popular method for sorting on external storage devices is merge sort. This
method consists of two phases.
First, segments of the input list are sorted using an internal sorting method. These sorted segments, known as runs, are written onto external storage as they are generated. Second, the runs generated in phase one are merged together in merge passes until only one run is left.
Example: Assume that a file of 2300 records needs to be sorted, and suppose primary memory is capable of sorting a maximum of 500 records at a time.
In this case, the procedure begins by reading and sorting the first 500 records and writing them to a merge output file. At this stage, the remaining 1800 records are still on the secondary storage device. After the first merge run, another 500 records are read and stored in another merge output file, and so on. This processing of data into merge runs is known as the sort phase. After the sort phase is complete, the merge phase proceeds. Merging the input merge files into one or more merge output files is known as the merge phase. The number of merge phases depends on the number of merge runs.
In computer technology, merge phases are available in different forms as: Natural
Merge, Balanced Merge and Poly-phase Merge.
Natural Merge
A natural merge merges a constant number of input files into one output file. Between merge phases, a distribution phase is required to redistribute the merge runs to the input files for remerging.
Example: [flow diagram] the 2300-record input file goes through the sort phase, then alternating distribution phases and merge phases, until a single sorted file remains.
Balanced Merge
A balanced merge merges a constant number of input files into the same number of output files. In addition, no distribution phase is required between the merge phases.
Example: [flow diagram] the 2300-record input file goes through the sort phase, then directly through successive merge phases, with no distribution phases in between.
Poly-phase Merge
A poly-phase merge merges a constant number of input files into one output file. As soon as the data in an input file are completely merged, it immediately becomes the output file, and the former output file becomes an input file for the next merge phase.
Example: [flow diagram] the sort phase is followed by merge phases in which the exhausted input file and the output file repeatedly exchange roles.
NOTE:
1. A sort phase followed by natural merging is referred to as "Natural Two-way Merge Sort".
2. A sort phase followed by balanced merging is referred to as "Balanced Two-way Merge Sort".
3. A sort phase followed by poly-phase merging is referred to as "Polyphase Merge Sort".
SORT STABILITY
Stability of a sorting method refers to preserving, in the output, the relative input order of two or more data elements that have the same key.
Stable sorting methods : Bubble sort, Insertion sort, Merge sort etc.
Unstable sorting methods : Shell sort, Selection sort, Quick sort etc.
Example: If the input contains two records with the same key, say 4a followed by 4c, a stable sort keeps 4a before 4c in the output, while an unstable sort may reverse them.
SORT EFFICIENCY
Among the sorting techniques compared below, quick sort is generally regarded as the best, since its average case time complexity O(n log n) is low and it requires less working space than merge sort.

    Technique    Average      Worst        Best         Stability   Method
    Quick Sort   O(n log n)   O(n²)        O(n log n)   Unstable    Partition exchange
    Merge Sort   O(n log n)   O(n log n)   O(n log n)   Stable      Merging
THE END