Data Structure Notes Ugca 1915

Data structure bca 4 semester


Course Code: UGCA1915

Course Name: Data Structures


Unit -1
Introduction to Data Structures:
Algorithms and Flowcharts, Basic Analysis of Algorithms,
Complexity of Algorithms, Introduction and Definition of Data
Structure, Classification of Data, Arrays, Various Types of Data
Structure, Static and Dynamic Memory Allocation, Functions,
Recursion.
Arrays, Pointers and Strings:
Introduction to Arrays, Definition, One-Dimensional and Multi-
Dimensional Arrays, Pointers, Pointer to Structure, Various Programs
for Arrays and Pointers. Strings: Introduction to Strings, Definition,
Library Functions of Strings.
Unit -2
Stacks and Queue
Introduction to Stack, Definition, Stack Implementation, Operations
of Stack, Applications of Stack, Multiple Stacks and Implementation
of Multiple Stacks. Queues: Introduction to Queue, Definition, Queue
Implementation, Operations of Queue, Circular Queue, De-queue and
Priority Queue.
Unit -3
Linked Lists and Trees
Introduction, Representation and Operations of Linked Lists: Singly
Linked List, Doubly Linked List, Circular Linked List and Circular
Doubly Linked List.
Trees
Introduction to Tree, Tree Terminology, Binary Tree, Binary Search
Tree, Strictly Binary Tree, Complete Binary Tree, Tree Traversal,
Threaded Binary Tree, AVL Tree, B Tree, B+ Tree.
Unit 4
Graphs, Searching, Sorting and Hashing
Graphs: Introduction, Representation of Graphs, Graph Traversals,
Shortest Path Algorithms.
Searching and Sorting: Searching, Types of Searching, Sorting,
Types of Sorting such as Quick Sort, Bubble Sort, Merge Sort and
Selection Sort.
Hashing: Hash Function, Types of Hash Functions, Collision,
Collision Resolution Technique (CRT), Perfect Hashing
UNIT -I Introduction to Data Structures:
1.1 Algorithms
1.1.1 Flowcharts
1.1.2 Complexity of Algorithm
1.1.3 Introduction and Definition of Data Structure
1.1.4 Classification of Data
1.1.5 Various Types of Data Structure
1.1.6 Static and Dynamic Memory Allocation
1.1.7 Function
1.1.8 Recursion.
1.2 Arrays, Pointers and Strings:
1.2.1 Introduction to Arrays
1.2.2 Definition
1.2.3 One Dimensional Array
1.2.4 Multi-Dimensional Arrays
1.2.5 Pointer
1.2.6 Introduction to Strings
1.2.7 Library Functions of Strings
UNIT-II Stacks and Queue
2.1 Introduction to Stack
2.2 Stack Implementation
2.3 Operations of stack
2.4 Applications of stack
2.5 Implementation of Multiple stack
2.6 Introduction to Queue
2.7 Queue implementation
2.8 Operations of queue
2.9 Circular queue
2.10 De-queue
2.11 Priority Queue
Unit -3 Linked lists and Trees
3.1 Introduction
3.2 Representation of linked list
3.3 Operations of linked list
3.4 Types of linked list
3.4.1 Singly linked list
3.4.2 Doubly linked list
3.4.3 Circular linked list
3.4.4 Circular doubly linked list
3.5 Introduction to Tree
3.6 Tree Terminology, Binary Tree
3.7 Binary Search Tree
3.8 Strictly Binary Tree
3.9 Complete Binary Tree
3.10 Tree Traversal
3.11 Threaded Binary Tree
3.12 AVL Tree
3.13 B Tree
Unit 4 Graphs, Searching, Sorting and Hashing
4.1 Introduction to Graphs
4.2 Representation of Graphs
4.3 Graph Traversals
4.4 Shortest Path Algorithms
4.4.1 Floyd-Warshall Algorithm
4.4.2 Shortest Path Algorithm
4.5 Searching and Sorting
4.5.1 Searching
4.5.2 Types of Searching
4.5.3 Linear Search
4.5.4 Binary Search
4.6 Sorting
4.7 Types of Sorting
4.7.1 Bubble Sort
4.7.2 Insertion Sort
4.7.3 Selection Sort
4.7.4 Quick Sort
4.7.5 Merge Sort
4.7.6 Heap Sort
4.8 Hashing
4.9 Hash Functions
4.10 Collision
4.11 Collision Resolution Technique
4.12 Perfect Hashing
UNIT -I Introduction to Data Structures:
1.1 Algorithms
The word “algorithm” comes from the name of the mathematician
Al-Khwarizmi and denotes a procedure or a technique. Software
engineers commonly use algorithms for planning and solving
problems. An algorithm is a sequence of steps to solve a particular
problem; equivalently, an algorithm is an ordered set of unambiguous
steps that produces a result and terminates in a finite time.
An algorithm has the following characteristics:
• Input: An algorithm may or may not require input.
• Output: Each algorithm is expected to produce at least one result.
• Definiteness: Each instruction must be clear and unambiguous.
• Finiteness: If the instructions of an algorithm are executed, the
algorithm should terminate after a finite number of steps.
Algorithms and flowcharts use the following three types of control
structures.
1. Sequence: In the sequence structure, statements are placed one
after the other and execution proceeds from top to bottom.
2. Branching (Selection): In branch control there is a condition, and
according to that condition a decision of either TRUE or FALSE is
made. In the case of TRUE, one of the two branches is taken; in the
case of FALSE, the other alternative is taken. Generally ‘IF-THEN’
is used to represent branch control.
3. Loop (Repetition): The loop or repetition structure allows a
statement or group of statements to be executed repeatedly based on
a loop condition, e.g. WHILE and FOR loops.
Advantages of algorithm
• It is a step-wise representation of a solution to a given problem,
which makes it easy to understand.
• An algorithm uses a definite procedure.
• It is not dependent on any programming language, so it is easy to
understand for anyone even without programming knowledge.
• Every step in an algorithm has its own logical sequence so it is easy
to debug.

HOW TO WRITE ALGORITHMS


Step 1 Define your algorithm's input: Many algorithms take in data
to be processed, e.g. to calculate the area of a rectangle the input may
be the rectangle's height and width.
Step 2 Define the variables: An algorithm's variables allow it to be
used in more than one place. We can define two variables for the
rectangle's height and width as HEIGHT and WIDTH (or H & W).
We should use meaningful variable names, e.g. instead of H & W use
HEIGHT and WIDTH.
Step 3 Outline the algorithm's operations: Use the input variables
for computation, e.g. to find the area of the rectangle multiply the
HEIGHT and WIDTH variables and store the value in a new variable,
say AREA. An algorithm's operations can take the form of multiple
steps and may even branch, depending on the values of the input
variables.
Step 4 Output the results of your algorithm's operations: In the case
of the area of a rectangle, the output will be the value stored in the
variable AREA. If the input variables described a rectangle with a
HEIGHT of 2 and a WIDTH of 3, the algorithm would output the
value 6.
1.1.1 Flowcharts
The first flowchart design goes back to 1945 and is credited to
John von Neumann. Unlike an algorithm, a flowchart uses different
symbols to design a solution to a problem. It is another commonly
used programming tool. By looking at a flowchart one can understand
the operations performed in a system and their sequence. A flowchart
is often considered a blueprint of a design used for solving a specific
problem.
Advantages of flowchart:
• A flowchart is an excellent way of communicating the logic of a
program.
• It is easy and efficient to analyze a problem using a flowchart.
• During the program development cycle, the flowchart plays the role
of a blueprint, which makes the development process easier.
• After successful development, a program needs continuous, timely
maintenance during the course of its operation. The flowchart makes
program or system maintenance easier.
• It is easy to convert a flowchart into code in any programming
language.
A flowchart is a diagrammatic/graphical representation of the
sequence of steps to solve a problem. The following standard symbols
are used to draw a flowchart:

Symbol            Function
Oval              Represents the start and end of the flowchart.
Parallelogram     Used for input and output operations.
Rectangle         Processing: used for arithmetic operations and
                  data manipulation.
Diamond           Decision making: used for operations with two or
                  three alternatives, e.g. true and false.
Arrow (flow line) Indicates the flow of logic by connecting symbols.
Circle            On-page connector.
Pentagon          Off-page connector.
Double-sided
rectangle         Predefined process/function: represents a group of
                  statements performing one processing task.
Open bracket      Comment/annotation.
The language used to write an algorithm is simple and similar to
day-to-day language. Variable names are used to store the values.
1.1.2 Complexity of Algorithm
Algorithm complexity is a measure that evaluates the order of the
count of operations performed by a given algorithm, as a function of
the size of the input data. To put it more simply, complexity is a rough
approximation of the number of steps necessary to execute an
algorithm. When we evaluate complexity we speak of the order of the
operation count, not of the exact count. For example, if we have on
the order of N² operations to process N elements, then N²/2 and 3·N²
are of one and the same quadratic order.
Algorithm complexity is commonly represented with the O(f)
notation, also known as asymptotic notation or “Big O notation”,
where f is the function of the size of the input data. The asymptotic
computational complexity O(f) measures the order of the consumed
resources (CPU time, memory, etc.) by certain algorithm expressed as
function of the input data size.
Complexity can be constant, logarithmic, linear, n·log(n),
quadratic, cubic, exponential, etc. This is, respectively, the order of
the number of steps (constant, logarithmic, linear and so on) executed
to solve a given problem. For simplicity, instead of “algorithm
complexity” or just “complexity” we sometimes use the term
“running time”.
Complexity and Execution Time
The execution speed of a program depends on the complexity of the
algorithm, which is executed. If this complexity is low, the program
will execute fast even for a big number of elements. If the complexity
is high, the program will execute slowly or will not even work (it will
hang) for a big number of elements.
If we take an average computer from 2008, we can assume it can
perform about 50,000,000 elementary operations per second. This
number is a rough approximation, of course: different processors
work at different speeds, different elementary operations take
different amounts of time, and computer technology constantly
evolves. Still, assuming an average home computer from 2008, we
can draw conclusions about the execution speed of a given program
from the algorithm's complexity and the size of the input data.

Time Complexity
The amount of time needed by a program to complete its execution is
known as Time complexity.
The measurement of time is done in terms of number of instructions
executed by the program during its execution.
Thus Time Complexity depends on the Size of the program and type
of the algorithm being used.
Space Complexity
The amount of memory needed by a program during its execution is
known as space complexity.
The total memory space needed by the program is the sum of the
following two parts:
(1) Fixed-size memory: the space required for simple variables,
constants, instructions and fixed-size structured variables such as
arrays.
(2) Variable-size memory: the space required for structured variables
to which memory is allocated at run time. It also includes the stack
space required while a function is calling itself.
Best Case Time Complexity
The measurement of the minimum time required by an algorithm to
complete its execution is known as best case time complexity.
The time complexity of a particular algorithm can be calculated by
providing different input values to the algorithm.
Consider the example of sorting N elements. If we supply input
values that are already sorted, the algorithm requires less time to sort
them. This is known as the best case time complexity.
However, best case time complexity does not guarantee that the
algorithm will always execute within this time for different input
values.
Average Case Time Complexity
The measurement of the average time required by an algorithm to
complete its execution is known as average case time complexity.
Consider the example of sorting N elements. Average time
complexity can be calculated by measuring the time required to
complete the execution of the algorithm for different input values and
then computing the average time required to sort N elements.
Worst Case Time Complexity
The measurement of the maximum time required by an algorithm to
complete its execution is known as worst case time complexity.
Consider the example of sorting N elements. If we supply input
values that are in reverse order, the algorithm requires the maximum
time to sort them. This is known as the worst case time complexity.
Thus, worst case time complexity guarantees that the algorithm will
always execute within this time for any input values.
Asymptotic Notations
Asymptotic notations are used to describe the complexity of an
algorithm. The complexity of an algorithm indicates how much time
the algorithm needs to complete its execution for a given set of input
data.
The same problem can be solved using different algorithms. In order
to select the best algorithm for a problem, we need to determine how
much time the different algorithms will take to run and then select the
better algorithm.
There are various asymptotic notations available to describe the
complexity of an algorithm:
1. Big-Oh Notation
2. Big-Omega Notation
3. Big-Theta Notation
4. Little-oh Notation
5. Little-omega Notation

Big-O Notation
Big-O notation is used to describe the time complexity of an
algorithm: how much time is needed by the algorithm to complete its
execution for an input size of N. For example, a sorting algorithm
takes longer to sort 5000 elements than 50.
The following are commonly used orders of an algorithm.
(1) O(1): An algorithm that will always execute in the same time
regardless of the size of the input data is having complexity of O(1).
(2) O(n): An algorithm whose performance is directly proportional to
the size of the input data is having complexity of O(n). It is also
known as linear complexity. If an algorithm uses looping structure
over the data then it is having linear complexity of O(n). Linear
Search is an example of O(n).
(3) O(n²): An algorithm whose performance is directly proportional
to the square of the size of the input data has complexity O(n²). If an
algorithm uses a nested looping structure over the data, it has
quadratic complexity O(n²). Bubble Sort and Selection Sort are
examples of O(n²).
(4) O(log n): An algorithm in which during each iteration the input
data set is partitioned into two sub-parts has complexity O(log n).
Binary Search is an example of O(log n) complexity; divide-and-
conquer sorts such as Quick Sort and Merge Sort take O(n log n)
time on average.
Arrays
An array is a container which can hold a fixed number of items, and
these items should all be of the same type. Most data structures make
use of arrays to implement their algorithms. The following are
important terms for understanding the concept of an array.
 Element − Each item stored in an array is called an element.
 Index − Each location of an element in an array has a numerical
index, which is used to identify the element.

1.1.3 Introduction and Definition of Data Structure
Data: Data is a collection of numbers, alphabets and symbols
combined to represent information. A computer takes raw data as
input and, after processing the data, produces refined data as output.
We might say that computer science is the study of data.
Data Structure
Data Structure is a mathematical or logical way of organizing data in
memory. Data Structure does not represent only data in memory but it
also represents the relationship among the data that is stored in
memory.
There are various operations that can be performed on Data Structure:
(1) Traversal
(2) Insertion
(3) Deletion
(4) Searching
(5) Sorting
(6) Merging

A data structure allows us to manipulate data by specifying a set of
values, a set of operations that can be performed on those values, and
a set of rules to be followed while performing the operations.
1.1.4 Classification of Data
Atomic data: Atomic data are non-decomposable entities. For
example, an integer value 523 or a character value ‘a’ cannot be
further divided; if we divide the value 523 into the three digits ‘5’,
‘2’ and ‘3’, its meaning may be lost.
Composite data: Composite data is a composition of several atomic
data, and hence it can be further divided into atomic data.
Data types: A data type is a term which refers to the kind of data that
a variable may hold in a programming language.
Ex. int x; [x can hold integer-type data]
Every programming language has a method for declaring a set of
variables of a particular type. A value stored in a variable cannot be
interpreted properly without knowing its type. A byte of information
stored in computer memory may represent an integer value, a
character value, a BCD (binary coded decimal) value or a Boolean
value. Therefore, the value stored in memory must be treated as
being of a particular type and interpreted accordingly.
1.1.5 Various Types of Data Structure
Data Structure can be classified into two categories:
(1) Primitive Data Structure
(2) Non Primitive Data Structure

Primitive Data Structure
"The data structure which is directly operated on by machine-level
instructions is known as a primitive data structure."
All primary (built-in) data types are known as primitive data
structures.
Following are Primitive Data Structure:
Integer
Integer Data Structure is used to represent numbers without
decimal points.
For Example: 23, -45, 11 etc… Using this Data Structure we can
represent positive as well as negative numbers without decimal
point.

Floating point
Float Data Structure is used to represent numbers having decimal
point.
For Example: 3.2, 4.56 etc…
In computer floating point numbers can be represented using
normalized floating point representation. In this type of
representation, a floating point number is expressed as a
combination of mantissa and exponent.

Character
Character is used to represent single character enclosed between
single inverted comma.
It can store letters (a-z, A-Z), digit (0-9) and special symbols.
Non Primitive Data Structure
"The Data Structure which is not directly operated by machine
level instruction is known as Non Primitive Data Structure."
Non Primitive Data Structure is derived from Primitive Data
Structure.
Non Primitive Data Structure are classified into two categories:
(1) Linear Data Structure
(2) Non Linear Data Structure
Linear data structure
"The Data Structure in which elements are arranged such that we
can process them in linear fashion (sequentially) is called linear
data structure."
Following are Examples of Linear Data Structure:
Array
Array is a collection of variables of same data type that share common
name. Array is an ordered set which consist of fixed number of
elements. In array memory is allocated sequentially to each element.
So it is also known as sequential list.
Stack
Stack is a linear Data Structure in which insertion and deletion
operation are performed at same end.
Queue
Queue is a linear Data Structure in which insertion operation is
performed at one end called Rear end and deletion operation is
performed at another end called front end.
Linked list
Linked list is an ordered set which consist of variable number of
elements. In Linked list elements are logically adjacent to each other
but they are physically not adjacent. It means elements of linked list
are not sequentially stored in memory.
Non-Linear data structure
"The Data Structure in which elements are arranged such that we
can not process them in linear fashion (sequentially) is called
Non-Linear data structure."
Non Linear Data Structures are useful to represent more complex
relationships as compared to Linear Data Structures.
Following are Examples of Non-Linear Data Structure:
Tree
A tree is a collection of one or more nodes such that:
(1) There is a special node called the root node.
(2) Every node in the tree except the root node has exactly one
predecessor.
(3) Each node in the tree has 0 or more successors.
A tree is used to represent hierarchical information.
Graph
A Graph is a collection of nodes and edges which is used to represent
relationship between pair of nodes. A graph G consists of set of nodes
and set of edges and a mapping from the set of edges to a set of pairs
of nodes.
1.1.6 Static and Dynamic Memory Allocation
Memory allocation is the process of setting aside sections of memory
in a program to be used to store variables, and instances of structures
and classes.
There are two types of memory allocations possible in C:
1. Compile-time or Static allocation.
2. Run-time or Dynamic allocation (using pointers).
Compile-time or Static allocation
Static memory is allocated by the compiler. The exact size and type
of the memory must be known at compile time.
int x,y;
float a[5];
When the first statement is encountered, the compiler allocates
memory (two bytes per int on old 16-bit compilers, four on most
modern ones) to each of the variables x and y. The second statement
results in the allocation of 20 bytes to the array a (5*4, where there
are five elements and each element of float type takes four bytes).
Note that there is no bounds checking for arrays in C, i.e., if you have
declared an array of five elements as above and by mistake you
attempt to read more than five values into the array a, it may still
compile and run without an error, but the behavior is undefined.
For example, reading beyond the bounds of the above array:
for (i = 0; i < 10; i++)
{
    scanf("%f", &a[i]);
}
(Note that a is an array of float, so the correct conversion specifier is
%f, and only indices 0 to 4 are valid.)
Run-time or Dynamic allocation
Dynamic memory allocation is when an executing program requests
that the operating system give it a block of main memory. The
program then uses this memory for some purpose. Usually the
purpose is to add a node to a data structure. In object-oriented
languages, dynamic memory allocation is used to get the memory for
a new object.
 The memory comes from above the static part of the data segment.
Programs may request memory and may also return previously
dynamically allocated memory. Memory may be returned whenever it
is no longer needed. Memory can be returned in any order without
any relation to the order in which it was allocated. The heap may
develop "holes" where previously allocated memory has been
returned between blocks of memory still in use.
 A new dynamic request for memory might return a range of
addresses out of one of the holes. But it might not use up all the hole,
so further dynamic requests might be satisfied out of the original hole.
C provides the following dynamic allocation and de-allocation
functions:
 malloc( )
 calloc( )
 free( )
 realloc( )
The malloc( ) Function
The malloc( ) function allocates a block of memory in bytes. The
user must explicitly specify the block size required. A call to
malloc( ) is a request to the system to allocate memory from the
heap.
Syntax:-
malloc (number of elements * size of each element);
Example :
int *ptr;
ptr = malloc (10*sizeof(int));
The calloc( ) Function
This function works similarly to the malloc( ) function, except that it
takes two arguments instead of the one argument required by
malloc( ), and it initializes the allocated memory to zero.
Example:-
int *ptr;
ptr = (int*) calloc (10,2);
The free( ) Function
The free( ) function is used to de-allocate memory previously
allocated using the malloc( ) or calloc( ) functions.
Syntax:-
free(ptr_var);
// where ptr_var is the pointer holding the address of the allocated
memory block.
The realloc( ) Function
This function is used to resize a memory block which is already
allocated. It is useful in two situations:
 If the allocated memory block is insufficient for the current
application.
 If the allocated memory is much more than what is required by the
current application.
Syntax:-
ptr_var = realloc (ptr_var,new_size);
1.1.7 Function
A function is a set of statements that takes inputs, performs some
specific computation and produces output.
Need of functions
 Functions help us in reducing code redundancy. If functionality is
performed at multiple places in software, then rather than writing the
same code, again and again, we create a function and call it
everywhere. This also helps in maintenance as we have to change at
one place if we make future changes to the functionality.
 Functions make code modular. Consider a big file having many lines
of codes. It becomes really simple to read and use the code if the code
is divided into functions.
 Functions provide abstraction. For example, we can use library
functions without worrying about their internal working.
Function Declaration
A function declaration tells the compiler about the number of
parameters function takes, data-types of parameters and return type of
function. Putting parameter names in function declaration is optional
in the function declaration, but it is necessary to put them in the
definition.

It is always recommended to declare a function before it is used


Parameter Passing to Functions
The parameters passed to a function are called actual parameters. For
example, in a call such as sum(10, 20), the values 10 and 20 are
actual parameters.
The parameters received by the function are called formal
parameters. For example, in a definition such as int sum(int x, int y),
x and y are formal parameters.
There are two most popular ways to pass parameters.
Pass by Value: In this parameter passing method, values of actual
parameters are copied to function’s formal parameters and the two
types of parameters are stored in different memory locations. So any
changes made inside functions are not reflected in actual parameters
of caller.
Pass by Reference: Both actual and formal parameters refer to the
same locations, so any changes made inside the function are reflected
in the actual parameters of the caller.
Following are some important points about functions in C.
1) Every program has a function called main() that is called by
operating system when a user runs the program.
2) Every function has a return type. If a function doesn’t return any
value, then void is used as return type. Moreover, if the return type of
the function is void, we still can use return statement in the body of
function definition by not specifying any constant, variable, etc. with
it, by only mentioning the ‘return;’ statement which would symbolise
the termination of the function as shown below:
void function_name(int a)
{
....... //Function Body
return; //Function execution would get terminated
}
3) Functions can return any type except arrays and functions. We can
get around this limitation by returning pointer to array or pointer to
function.
4) An empty parameter list means that the parameter list is not
specified and the function can be called with any parameters. It is not
a good idea to declare a function like fun(). To declare a function
that can only be called without any parameters, we should use
“void fun(void)”.
5) If in a program a function is called before its declaration, then the
compiler automatically assumes a declaration of that function of the
following form:
int function_name();
In that case, if the actual return type of the function is different from
int, the compiler will show an error.
1.1.8 Recursion
Recursion is one of the most powerful tools in a programming
language, but also one of the most intimidating topics, as most
beginners, and not surprisingly even experienced students, feel.
When a function is called within the same function, it is known as
recursion in C. A function which calls itself is known as a recursive
function.
Recursion means defining something in terms of itself. Many
problems that are solved with iteration can also be expressed
recursively.
Types of Recursion
There are two types of Recursion
 Direct recursion
 Indirect recursion
Direct Recursion
When in the body of a method there is a call to the same method, we
say that the method is directly recursive.
There are three types of Direct Recursion
 Linear Recursion
 Binary Recursion
 Multiple Recursion
Linear Recursion
 Linear recursion begins by testing for a set of base cases (there
should be at least one).
In linear recursion we proceed as follows:
 Perform a single recursive call. This recursive step may involve a
test that decides which of several possible recursive calls to make,
but it should ultimately make just one of these calls each time this
step is performed.
 Define each possible recursive call so that it makes progress
towards a base case.
Binary Recursion
 Binary recursion occurs whenever there are two recursive calls for
each non base case.
Multiple Recursion
 In multiple recursion we make not just one or two but many
recursive calls.
Disadvantages of Recursion
 It consumes more storage space, because the recursive calls along
with their automatic variables are stored on the stack.
 The computer may run out of memory if the recursive calls are not
checked.
 It is less efficient in terms of speed and execution time.
 According to some computer professionals, recursion does not
offer any concrete advantage over non-recursive procedures/
functions.
 A recursive solution can be very difficult to trace (debug and
understand).
 In a recursive function we must have an if statement somewhere to
force the function to return without the recursive call being executed;
otherwise the function will never return.
 Recursion takes a lot of stack space, although this is usually not
significant when the program is small and running on a PC.
 Recursion uses more processor time.
 Recursion is not advocated when the problem can be solved
through iteration.
 Recursion may be treated as a software tool to be applied carefully
and selectively.
1.2 Arrays, Pointers and Strings:
1.2.1 Introduction to Arrays
Data structures are classified as either linear or nonlinear. A data
structure is said to be linear if its elements form a sequence, or, in
other words, a linear list. There are two basic ways of representing
such linear structures in memory. One way is to have the linear
relationship between the elements represented by means of sequential
memory locations. These linear structures are called arrays. The other
way is to have the linear relationship between the elements
represented by means of pointers or links.
1.2.2 Definition
Arrays are defined as the collection of similar type of data items
stored at contiguous memory locations.
Arrays are the derived data type in C programming language which
can store the primitive type of data such as int, char, double, float, etc.
Array is the simplest data structure where each data element can be
randomly accessed by using its index number.
For example, if we want to store the marks of a student in 6 subjects,
we don't need to define a different variable for the marks in each
subject. Instead, we can define an array which stores the marks for
each subject at contiguous memory locations.
An array marks[6] holds the marks of the student in 6 different
subjects, where each subject's mark is located at a particular subscript
in the array, i.e. marks[0] denotes the marks in the first subject,
marks[1] denotes the marks in the second subject, and so on.
Properties of the Array
1. Each element is of the same data type and carries the same size,
e.g. int = 4 bytes.
2. Elements of the array are stored at contiguous memory locations
where the first element is stored at the smallest memory location.
3. Elements of the array can be randomly accessed since we can
calculate the address of each element of the array with the given base
address and the size of data element.
There are two types of Arrays
 One Dimensional Arrays
 Two Dimensional Arrays
1.2.3 One Dimensional Arrays
 A one-dimensional array is one in which only one subscript
specification is needed to specify a particular element of the array.
 A one-dimensional array is a list of related variables. Such lists are
common in programming.
One-dimensional array can be declared as follows:
Data_type var_name[Expression];
Initializing One-Dimensional Array
ANSI C allows automatic array variables to be initialized in
declaration by constant initializers as we have seen we can do for
scalar variables.
These initializing expressions must be constant values; expressions
with identifiers or function calls may not be used in the initializers.
The initializers are specified within braces and separated by commas.
int ex[5] = {10, 5, 15, 20, 25};
char word[10] = {'h', 'e', 'l', 'l', 'o'};
Passing Arrays to Function
 Arrays like other simple variables can be passed to function. To pass,
its name is written inside the argument list in the call statement.
 Arrays are by default passed to function by call by reference method
because array name is itself a pointer to the first memory location of
the array.
 However, we can pass individual array elements through call by
value method also.
1.2.4 Two-Dimensional Array
 Two dimensional arrays are also called table or matrix, two
dimensional arrays have two subscripts.
 Two dimensional array in which elements are stored column by
column is called as column major matrix.
 Two dimensional array in which elements are stored row by row is
called as row major matrix.
 First subscript denotes number of rows and second subscript denotes
the number of columns.
 The simplest form of the Multi-Dimensional Array is the Two-
Dimensional Array. A Multi-Dimensional Array is in essence a list
of One-Dimensional Arrays.
Two dimensional arrays can be declared as follows:
int int_array2d[10][10]; // A two dimensional array
Initializing a Two Dimensional Array
A 2D array can be initialized like this:
int array[3][3] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
1.2.5 Pointers
Pointers are the variables that are used to store the location of value
present in the memory. A pointer to a location stores its memory
address. The process of obtaining the value stored at a location being
referenced by a pointer is known as dereferencing. It is the same as
the index for a textbook where each page is referred by its page
number present in the index. One can easily find the page using the
location referred to there. Such pointers usage helps in the dynamic
implementation of various data structures such as stack or list.
Need of Pointers
Pointers help in optimizing our code and improving the running time
of an algorithm, since using pointers reduces the time needed by an
algorithm to copy data from one place to another. Since the memory
locations are used directly, any change made to the value will be
reflected at all the locations.
Example:
 Call_by_value needs the value of arguments to be copied every time
any operation needs to be performed.
 Call_by_reference makes this task easier using its memory location
to update the value at memory locations.
Control Program Flow: Another use of pointers is to control the
program flow. This is implemented by control tables that use these
pointers. These pointers are stored in a table to point to the entry point
of each subroutine to be executed one after the other. These pointers
reference the addresses of the various procedures. This helps while
working with a recursive procedure or traversal of algorithms where
there is a need to store the location of the calling step.
1.2.6 Introduction to Strings
Computers are widely used for word processing applications such as
creating, inserting, updating, and modifying textual data. Besides this,
we need to search for a particular pattern within a text, delete it, or
replace it with another pattern. So, there is a lot that we as users do to
manipulate the textual data.
 Strings in C are stored as null character ('\0') terminated
character arrays.
 This means that storing a string takes the number of characters it
contains plus one more for the null character.
 Common string operations include finding lengths, copying,
searching, replacing and counting the occurrences of specific
characters and words.
1.2.7 Library Functions of Strings
1. strcpy : strcpy copies a string, including the null character
terminator from the source string to the destination. This function
returns a pointer to the destination string. Its prototype is:
char *strcpy(char *dst, const char *src);
2. strncpy : strncpy is similar to strcpy, but it allows the number of
characters to be copied to be specified.
char *strncpy(char *dst, const char *src, size_t len);
3. strcat : This function appends a source string to the end of a
destination string. This function returns a pointer to the destination
string, or NULL pointer on error. Its prototype is:
char *strcat(char *dst, const char *src);
4. strncat : This function appends at most N characters from the
source string to the end of the destination string. This function returns
a pointer to the destination string, or a NULL pointer on error. Its
prototype is:
char *strncat(char *dst, const char *src, size_t N);
5. strcmp : Two strings are compared with this function. If the first
string is greater than the second, it returns a number greater than zero.
If the second string is greater, it returns a number less than zero. If the
strings are equal, it returns 0. Its prototype is:
int strcmp(const char *first, const char *second);
6. strncmp : This function compares the first N characters of each
string. If the first string is greater than the second, it returns a number
greater than zero. If the second string is greater, it returns a number
less than zero. If the strings are equal, it returns 0. Its prototype is :
int strncmp(const char *first, const char *second, size_t N);
7. strlen: This function returns the length of a string, not counting the
null character at the end. That is, it returns the character count of the
string, without the terminator. Its prototype is :
size_t strlen(const char *str);
Arrays of Strings
An array of strings is just a two dimensional array of characters.
Consider this:
char names[7][6] =
{"Abc", "Tom", "Chad", "Hello", "Xyz", "SSD", "TTT"};
Passing Strings to Functions
 It is often best to treat an array as an abstract data type with a set of
allowed operations on the array which are performed by functional
modules.
 Let us return to our exam score example to read and store scores in
array and then print them, except that we now wish to use functions to
read and print the array.
 Read an array and print a list of scores using functional modules.
n = read_intarray(exam_scores,MAX);
print_intarray(exam_scores,n) ;
UNIT-II Stacks and Queue
2.1 Introduction to Stack
There are certain frequent situations in computer science when one
wants to restrict insertions and deletions so that they can take place
only at the beginning or the end of the list, not in the middle. Two of
the data structures that are useful in such situations are stacks and
queues.
A stack is a linear structure in which items may be added and
removed only at one end. Everyday Example of such data structure is
a stack of dishes, a stack of pennies and a stack of folded towels.
Observe that an item may be added or removed only from the top of
any of the stacks. This means, in particular, that the last item to be
added to a stack is the first item to be removed. Stacks are also called
Last in first out (LIFO) lists. Other names used for the stacks are
“piles” and “push-down lists”. Although the stack may seem to be a
very restricted type of data structure, it has many important
applications in computer science.
A stack is a basic data structure that can be logically thought of as a
linear structure represented by a real physical stack or pile, a structure
where insertion and deletion of items takes place at one end called top
of the stack. The basic concept can be illustrated by thinking of your
data set as a stack of plates or books where you can only take the top
item off the stack in order to remove things from it. This structure is
used all throughout programming.
The basic implementation of a stack is also called a LIFO (Last In
First Out) to demonstrate the way it accesses data, since as we will
see there are various variations of stack implementations.
2.2 Operations of stack
There are basically three operations that can be performed on stacks.
They are
1) Inserting an item into a stack (push).
2) Deleting an item from the stack (pop).
3) Displaying the contents of the stack (peek).
Stacks are dynamic data structures that follow the Last In First Out
(LIFO) principle. The last item to be inserted into a stack is the first
one to be deleted from it.
For example, you have a stack of trays on a table. The tray at the top
of the stack is the first item to be moved if you require a tray from
that stack.
Inserting and deleting elements
Stacks have restrictions on the insertion and deletion of elements.
Elements can be inserted or deleted only from one end of the stack,
i.e. from the top. The element at the top is called the top element.
operations of inserting and deleting elements are called push() and
pop() respectively.
When the top element of a stack is deleted, if the stack remains non-
empty, then the element just below the previous top element becomes
the new top element of the stack.
For example, in the stack of trays, if you take the tray on the top and
do not replace it, then the second tray automatically becomes the top
element (tray) of that stack.
Features of stacks
 Dynamic data structures
 Do not have a fixed size
 Do not consume a fixed amount of memory
 Size of stack changes with each push() and pop() operation. Each
push() and pop() operation increases and decreases the size of the
stack by 1 , respectively.
2.3 Stack Implementation
Array representation of Stack
Stack can be represented by means of a one-way list or a linear array.
A pointer variable TOP contains the location of the top element of the
stack, and a variable MAXSTK gives the maximum number of elements
that can be held by the stack. The condition TOP = 0 indicates that
the stack is empty.
Push Operation on Stack
This procedure pushes an item onto the Stack via Top.
PUSH(Stack, Top, MaxStk, Item)
1. If Top == MaxStk, then print “Overflow” and Return // stack is already full
2. Set Top = Top + 1 // increase top by 1
3. Stack[Top] = Item // insert item at the new top position
4. Return
Pop operation on Stack
This procedure deletes the top element of Stack and assigns it to the
variable item.
POP(Stack, Top, Item)
1. If Top == 0, then print “Underflow” and Return // no element to delete
2. Item = Stack[Top] // assign top element to item
3. Set Top = Top - 1 // decrease top by 1
4. Return
Stack using Linked List
The major problem with the stack implemented using array is, it
works only for fixed number of data values. That means the amount
of data must be specified at the beginning of the implementation
itself. Stack implemented using array is not suitable, when we don't
know the size of data which we are going to use. A stack data
structure can be implemented by using linked list data structure. The
stack implemented using linked list can work for unlimited number of
values. That means, stack implemented using linked list works for
variable size of data. So, there is no need to fix the size at the
beginning of the implementation. The Stack implemented using
linked list can organize as many data values as we want.
In the linked list implementation of a stack, every new element is
inserted as the 'top' element. That means every newly inserted element is pointed
by 'top'. Whenever we want to remove an element from the stack,
simply remove the node which is pointed by 'top' by moving 'top' to
its next node in the list. The next field of the first element must be
always NULL.
Example
In the above example, the last inserted node is 99 and the first
inserted node is 25. The order of elements inserted is 25, 32, 50 and 99.
PUSH OPERATION
This procedure pushes an ITEM into a linked stack
PUSH_LINKSTACK (INFO, LINK, TOP, AVAIL, ITEM)
1. [Available space?] If AVAIL = NULL, then Write: OVERFLOW and Exit.
2. [Remove first node from AVAIL list.]
Set NEW := AVAIL and AVAIL := LINK[AVAIL].
3. Set INFO[NEW] := ITEM. [Copies ITEM into the new node.]
4. Set LINK[NEW] := TOP. [New node points to the original top node in
the stack.]
5. Set TOP := NEW. [Reset TOP to point to the new node at the top of
the stack.]
6. Exit.
POP OPERATION
This procedure deletes the top element of a linked stack and assigns it
to the variable ITEM
POP_LINKSTACK (INFO, LINK, TOP, AVAIL, ITEM)
1. [Stack has an element to be removed?]
If TOP = NULL, then Write: UNDERFLOW and Exit.
2. Set ITEM := INFO[TOP]. [Copies the top element of the stack into
ITEM.]
3. Set TEMP := TOP and TOP := LINK[TOP].
[Remember the old value of the TOP pointer in TEMP and reset TOP to
point to the next element in the stack.]
4. [Return deleted node to the AVAIL list.]
Set LINK[TEMP] := AVAIL and AVAIL := TEMP.
5. Exit.
2.4 Applications of Stack
1. Polish notation
Arithmetic Expression
Arithmetic Expression is an expression having operators and operands;
it involves constants and operations.
Binary operators have different levels of precedence. There are three
levels of precedence for the usual five binary operations:
i. Highest Precedence- Exponentiation (^)
ii. Medium Precedence- Multiplication and Division (*) and (/)
iii. Lowest Precedence- Addition and Subtraction (+) and (-)
An arithmetic expression in Polish notation is of 3 types:-
1. Prefix
2. Infix
3. Postfix
1. Infix
When the operator is placed between its 2 operands, the expression is
in Infix form. Example: a+b
2. Prefix
When the operator is placed before its 2 operands, the expression is in
Prefix form. Example: +ab
3. Postfix
When the operator is placed after its 2 operands, the expression is in
Postfix form. Example: ab+
Example of Infix to Prefix Conversion
The conversion is done according to the precedence of operators.
First, the sub-expressions inside brackets are converted. Put the
converted sub-expression in square brackets for readability. Then the
operator next in precedence is converted, and so on.
1. (A + B) * C + D
[+ AB] * C + D
[*+ ABC] + D
+*+ ABCD
2. (A + B) * (C + D)
[+ AB] * [+ CD]
*+ AB + CD
Example of Infix to Postfix Conversion
1. (A + B) * C + D
[AB + ] * C + D
[AB + C * ] + D
AB + C * D +
2. A + B - (C * D) + E / F + G * H
A + B - [CD *] + E / F + G * H
A + B - [CD *] + [EF /] + [GH *]
[AB +] - [CD *] + [EF /] + [GH *]
[AB + CD * -] + [EF /] + [GH *]
[AB + CD * - EF / +] + [GH *]
AB + CD * - EF / + GH * +
Conversion of Infix to Postfix using Stack
For converting an infix expression to postfix using a stack, certain
steps are to be followed. The steps are:
1) Add a left parenthesis '(' at the start of the expression and
a right parenthesis ')' at the end of the expression.
2) Initially push a symbol of lowest precedence onto the stack.
3) Each operand is simply added to the postfix expression and
does not change the state of the stack.
4) If an operator of higher or equal precedence is on the stack
and the next operator read has lower precedence, pop the
higher-precedence operator to the output and push the
lower-precedence operator onto the stack.
5) A left parenthesis on the stack is removed only when the
matching right parenthesis of the same level is encountered.
6) Exit
Conversion of Infix to Prefix using Stack
For converting an infix expression to prefix using a stack, certain
steps are to be followed. The steps are:
1) Reverse the given infix expression.
2) Make every open bracket a close bracket and vice-versa.
3) Convert the expression into postfix form.
4) Reverse the expression.
5) The resulting expression is the prefix expression.
2. Recursion
A procedure that calls itself directly or indirectly is called Recursion.
Every Recursive procedure must follow the following recursive
properties:
1) Every recursion must have a base criterion at which the
recursive procedure terminates.
2) In each step the recursive procedure must move closer to the
base criterion.
The procedure that follows the above Recursive properties is said to
be well defined Recursive Procedure.
Basic requirements of Recursion
For implementing and designing the good recursive program we must
make certain assumptions which are as follows-
1) Base Case: the terminating condition for the problem. While
designing any recursive algorithm, we must choose a proper
terminating condition for the problem.
2) An if condition defines the terminating condition.
3) Every time a new recursive call is made, new memory is
automatically allocated to each variable used by the recursive
routine.
4) On each recursive call, copies of the local variables of the
call are pushed onto the stack within the respective call, and
all these values are available to the respective function call
when it is popped off the stack.
5) Recursive Case: the else part of the recursive definition
calls the function recursively.
Types of Recursion
The characterization is based on:
1) Whether the function calls itself or not (Direct or Indirect
Recursion).
2) Whether there are pending operations at each recursive
calls (Tail Recursion or not).
3) The shape of the calling pattern whether pending
operations are also recursive (Linear or Tree Recursion)
3. Tower of Hanoi
In the Tower of Hanoi problem there are 3 pegs (posts or towers) and
n discs of different sizes. Each disc has a hole in the middle so
that it can fit on any peg. At the beginning of the game, all n discs
are on the first peg, arranged such that the largest is on the bottom
and the smallest is on the top.
The goal of the game is to end up with all discs on the third peg in
the same order, i.e. smallest on top and increasing in size towards
the bottom. There are some restrictions on how the discs are moved -
1) The only allowed type of move is to grab one disc from
the top of one peg and drop it on another peg, i.e. only one disc
can be moved at a time.
2) A larger disc can never lie above a smaller disc.
3) The solution of the problem takes 2^n - 1 steps, where 'n'
is the number of discs.
Tower of Hanoi problem for n = 3 and pegs A, B, C:
2^n - 1 = 2^3 - 1 = 7 steps
2.5 Implementation of Multiple stack
When a stack is created using a single array, we cannot store a large
amount of data; this problem is rectified by using more than one
stack in the same array of sufficient size. This technique is called
Multiple Stacks.
To implement multiple stacks in a single array, one approach is to
divide the array in k slots of size n/k each, and fix the slots for
different stacks, we can use arr[0] to arr[n/k-1] for first stack, and
arr[n/k] to arr[2n/k-1] for stack2 and so on where arr[] is the array of
size n.
Although this method is easy to understand, the problem with it is
inefficient use of array space. A stack push operation may result in
stack overflow even if there is space available in arr[].
Algorithm:
1. Here we use 2 arrays min[] and max[] to represent the lower
and upper bounds for a stack
2. Array s[] stores the elements of the stack
3. Array top[] is used to store the top index for each stack
4. Variable ns represents the stack number
5. Variable size represents the size for each stack in an array
6. First we build a function init() to initialize the starting values
7. Then we have a function createstack() to create the stack
8. Function Push() & Pop() are used to push and pop an element to
and from the stack
9. Function Display() is used to display the elements in a particular
stack
2.6 Introduction to Queue
Like Stacks, Queues are also an ordered collection of items. But
unlike Stacks, which have only one end for insertion and deletion,
Queues have 2 ends, one end for insertion and other for deletion. The
end at which we insert an item is called Rear End (Back) and end
where we remove an item is called Front End.
The items are removed from the queue in the same order as they were
inserted in the queue i.e the first item inserted into the queue will be
serviced first and so it has to be removed first from the queue i.e the
queue operations are performed in FIFO(First in First Out) basis.
Whenever an item is removed or deleted from the queue, then the
value of front is incremented by 1.
Example- At the ticket window, people are served on a First In First
Out (FIFO) basis.
Queue Representation
2.7 Queue implementation
Queue can be implemented using an Array, Stack or Linked List. The
easiest way of implementing a queue is by using an Array.
Initially the head(FRONT) and the tail(REAR) of the queue points at
the first index of the array (starting the index of array from 0). As we
add elements to the queue, the tail keeps on moving ahead, always
pointing to the position where the next element will be inserted, while
the head remains at the first index.
When we remove an element from a Queue, we can follow two possible
approaches (mentioned [A] and [B] in the above diagram). In approach
[A], we remove the element at the head position, and then one by one
shift all the other elements one position forward.
In approach [B] we remove the element from head position and then
move head to the next position.
In approach [A] there is an overhead of shifting the elements one
position forward every time we remove the first element.
In approach [B] there is no such overhead, but whenever we move the
head one position ahead after removal of the first element, the
usable size of the Queue is reduced by one space each time.
Algorithm for ENQUEUE operation
1. Check if the queue is full or not.
2. If the queue is full, then print overflow error and exit the program.
3. If the queue is not full, then increment the tail and add the element.
Algorithm for DEQUEUE operation
1. Check if the queue is empty or not.
2. If the queue is empty, then print underflow error and exit the
program.
3. If the queue is not empty, then print the element at the head and
increment the head.
2.8 Operations on Queue
Insertion: Insert an item in the queue. The value of rear will be
incremented by 1.
Rear = Rear + 1
Deletion: Delete an item from the queue. The value of front will be
incremented by 1.
Front = Front + 1
Algorithm of Queue in Array Representation
N is the size of Queue, FRONT is the first element in the Queue,
REAR is the last element in the Queue and the ITEM is the element to
be inserted or deleted.
Insertion Algorithm
INSERT (QUEUE, N, FRONT, REAR, ITEM)
1) If FRONT = 1 and REAR = N, or
FRONT = REAR + 1, then write OVERFLOW and Exit
2) If FRONT = REAR = NULL then
Set FRONT = 1 and REAR = 1
Else if REAR = N then set REAR = 1
Else REAR = REAR + 1
3) Set QUEUE [REAR] = ITEM
4) EXIT
Deletion Algorithm
DELETE (QUEUE, N, FRONT, REAR, ITEM)
1) If FRONT = NULL, then write UNDERFLOW and Exit
2) ITEM = QUEUE [FRONT]
3) If FRONT = REAR = 1 then
Set FRONT = NULL and REAR = NULL
Else if FRONT = N then
Set FRONT = 1
Else FRONT = FRONT + 1
4) EXIT
Applications of Queue
Queue, as the name suggests is used whenever we need to manage
any group of objects in an order in which the first one coming in, also
gets out first while the others wait for their turn, like in the following
scenarios:
1. Serving requests on a single shared resource, like a printer, CPU
task scheduling etc.
2. In real life scenario, Call Center phone systems uses Queues to hold
people calling them in an order, until a service representative is free.
3. Handling of interrupts in real-time systems. The interrupts are
handled in the same order as they arrive i.e First come first served.
2.9 Circular queue
A Circular Queue is one in which the insertion of a new element is
done at the very first location of the queue, if the last location of the
queue is full.
In other words, we can say that a Circular Queue is one in which the
first element comes just after the last element.
This implies that even if the last location is occupied, a new value
can be inserted behind it at the first location of the array
(provided the first location is empty).
Example of Circular Queue
Consider a Queue of size 8. Initially the queue is empty. Perform
following operations on queue.
2.10 De-queue
Double Ended Queue (DEQue)
In a DEQue, both insertion and deletion operations are performed at
either end of the queue, i.e. we can insert an element at the rear or
the front end, and deletion is also possible from both ends.
This DEQue can be used both as a stack and as a Queue. There are
various ways by which this DEQue can be represented-
i. using a circular array
ii. Doubly Linked List
Types of DEQueues
Input Restricted DEQue :
In this, element can be added only at one end but we can delete the
element from both the ends.
Output Restricted DEQue :
In this, element can be deleted only from one end but insertion is
allowed at both the ends.
2.11 Priority Queue
Priority Queue is a more specialized data structure than Queue. Like
an ordinary queue, a priority queue has the same methods but with a
major difference. In a priority queue, items are ordered by key value
so that the item with the lowest key is at the front and the item
with the highest key is at the rear, or vice versa. So we assign a
priority to an item based on its key value: the lower the value, the
higher the priority.
Following are the principal methods of a Priority Queue.
Basic Operations
 insert / enqueue − add an item to the rear of the queue.
 remove / dequeue − remove an item from the front of the queue.
Priority Queue Representation
We're going to implement the priority queue using an array here.
There are a few more operations supported by the queue, which are
the following.
 Peek − get the element at front of the queue.
 isFull − check if queue is full.
 isEmpty − check if queue is empty.
Insert / Enqueue Operation
Whenever an element is inserted into queue, priority queue inserts the
item according to its order. Here we're assuming that data with high
value has low priority.
Remove / Dequeue Operation
Whenever an element is to be removed from the queue, the queue gets
the element using the item count. Once the element is removed, the
item count is reduced by one.
Unit -3 Linked lists and Trees
3.1 Introduction
A linked list is a sequence of data structures, which are connected
together via links.
Linked List is a sequence of links which contains items. Each link
contains a connection to another link. Linked list is the second most-
used data structure after array. Following are the important terms to
understand the concept of Linked List.
 Link − Each link of a linked list can store a data called an element.
 Next − Each link of a linked list contains a link to the next link
called Next.
 LinkedList − A Linked List contains the connection link to the first
link called First.
3.2 Representation and operations of linked list
Linked List Representation
Linked list can be visualized as a chain of nodes, where every node
points to the next node.
As per the above illustration, following are the important points to
be considered.
 Linked List contains a link element called first.
 Each link carries a data field(s) and a link field called next.
 Each link is linked with its next link using its next link.
 Last link carries a link as null to mark the end of the list.
3.3 Operations of Linked List
Following are the basic operations supported by a list.
 Insertion − Adds an element at the beginning of the list.
 Deletion − Deletes an element at the beginning of the list.
 Display − Displays the complete list.
 Search − Searches an element using the given key.
 Delete − Deletes an element using the given key.
Insertion Operation
Adding a new node in linked list is a more than one step activity. We
shall learn this with diagrams here. First, create a node using the same
structure and find the location where it has to be inserted.
Imagine that we are inserting a node B (NewNode)
between A (LeftNode) and C (RightNode). Then point B.next to C −
NewNode.next −> RightNode;
It should look like this −
Now, the next node at the left should point to the new node.
LeftNode.next −> NewNode;
This will put the new node in the middle of the two. The new list
should look like this −
Similar steps should be taken if the node is being inserted at the
beginning of the list. While inserting it at the end, the second last
node of the list should point to the new node and the new node will
point to NULL.
Insertion algorithms
a) Insertion at the beginning of list
b) Insertion after a given node
c) Inserting into a sorted linked list.
We assume that the linked list is in memory in the form
LIST(INFO,LINK,START,AVAIL) and that the variable ITEM
contains the new information to be added to the list.
Since our insertion algorithms will use a node from the AVAIL list,
all of the algorithms will include the following steps:
(a) Checking to see if space is available in the AVAIL list. If not,
that is, if AVAIL = NULL, then the algorithm will print the message
OVERFLOW.
(b) Removing the first node from the AVAIL list. Using the variable
NEW to keep track of the location of the new node, this step can be
implemented by the pair of assignments
NEW := AVAIL, AVAIL := LINK[AVAIL]
(c) Copying new information into the new node.
INFO [NEW]:= ITEM
Insertion at the beginning of list
The easiest place to insert the node is at the beginning of the list.
Algorithm: INSFIRST(INFO, LINK, START, AVAIL, ITEM)
This algorithm inserts ITEM as the first node in the list.
1. [OVERFLOW?]If AVAIL=NULL, then: Write: OVERFLOW,
and EXIT.
2. [Remove first node from AVAIL list.]
Set NEW := AVAIL and AVAIL := LINK[AVAIL].
3. Set INFO [NEW] := ITEM. [ copies new data into new node]
4. Set LINK [NEW] := START. [New node now points to original
first node.]
5. Set START := NEW.[ Changes START so it points to the new
node.]
6. EXIT.
Insertion after a given node
Suppose we are given the value of LOC, where either LOC is the
location of a node A in the linked LIST or LOC = NULL. When
LOC = NULL, ITEM is inserted as the first node.
Otherwise, let N denote the new node. We let N point to the node
that originally followed A by the assignment
LINK[NEW] := LINK[LOC]
and we let node A point to the new node N by the assignment
LINK[LOC] := NEW
Algorithm:
INSLOC(INFO, LINK, START , AVAIL, LOC, ITEM)
This algorithm inserts ITEM so that ITEM follows the node with
location LOC, or inserts ITEM as the first node when LOC = NULL.
1. [OVERFLOW?] If AVAIL=NULL, then write: OVERFLOW ,
and Exit.
2. [Remove first node from AVAIL list.]
Set NEW := AVAIL and AVAIL := LINK[AVAIL].
3. Set INFO[NEW] := ITEM .[copies new data into new node.]
4. If LOC=NULL, then:[insert as first node.]
Set LINK[NEW] := START and START := NEW.
Else: [insert after node with location LOC.]
Set LINK[NEW] := LINK[LOC] and LINK[LOC] := NEW.
[End of if structure.]
5. Exit.
Inserting into a sorted list
Suppose ITEM is to be inserted into a sorted linked LIST. Then ITEM
must be inserted between nodes A and B so that
INFO (A)< ITEM < INFO(B)
The following is a procedure which finds the location LOC of node
A , that is, which finds the location LOC of the last node in LIST
whose value is less than ITEM.
Traverse the list, using a pointer variable PTR and comparing ITEM
with INFO[PTR] at each node. While traversing, keep track of the
location of the preceding node with the variable SAVE. Thus, SAVE
and PTR are updated by the assignments
SAVE := PTR and PTR := LINK[PTR]
The traversing continuous as long as INFO[PTR] > ITEM, or in other
words, the traversing stops as soon as ITEM<= INFO[PTR].Then
PTR points to node B, so SAVE will contain the location of the node
A.
The formal statement of procedure follows. The cases where the list is
empty or where ITEM < INFO[START], so LOC= NULL, are treated
separately, since they do not involve the variable SAVE.
PROCEDURE: FINDA (INFO, LINK, START, ITEM, LOC)
This procedure finds the location LOC of the last node in a sorted list
such that INFO[LOC]<ITEM, or sets LOC=NULL.
1. [List empty?] If START = NULL, then: Set LOC := NULL, and
Return.
2. [Special case?] If ITEM < INFO[START], then: Set LOC :=
NULL, and Return.
3. Set SAVE := START and PTR := LINK[START].
[Initializes pointers.]
4. Repeat Steps 5 and 6 while PTR ≠ NULL:
5. If ITEM < INFO[PTR], then:
Set LOC := SAVE, and Return.
[End of If structure.]
6. Set SAVE := PTR and PTR := LINK[PTR]. [Updates pointers.]
[End of Step 4 loop.]
7. Set LOC := SAVE.
8. Return.
Now we have all the components needed to present an algorithm which
inserts ITEM into a sorted linked list. The simplicity of the
algorithm comes from using the previous two procedures.
Algorithm: INSERT(INFO, LINK, START, AVAIL, ITEM)
This algorithm inserts ITEM into a sorted linked list.
1. [use procedure to find the location of the node preceding ITEM.]
Call FINDA (INFO, LINK, START, ITEM,LOC)
2. [use algorithm to insert ITEM after the node with location
LOC.]
Call INSLOC( INFO, LINK, START, AVAIL, LOC , ITEM).
3. Exit.
Deletion Operation
Deletion is also a more-than-one-step process; we shall learn it with a
pictorial representation. First, locate the target node to be removed
by using a searching algorithm.
The left (previous) node of the target node should now point to the
next node of the target node:
LeftNode.next −> TargetNode.next;
This removes the link that was pointing to the target node. Now, using
the following code, we remove what the target node is pointing at:
TargetNode.next −> NULL;
If we need to use the deleted node, we can keep it in memory;
otherwise we can simply deallocate the memory and wipe off the target
node completely.
Let LIST be a linked list with a node N between nodes A and B.
Suppose node N is to be deleted from the linked list. The deletion
occurs as soon as the next pointer field of node A is changed so that
it points to node B.
Suppose our linked list is maintained in memory in the form
LIST(INFO, LINK, START, AVAIL).
When a node N is deleted from our list, we will immediately return
its memory space to the AVAIL list. Specifically, for easier
processing, it will be returned to the beginning of the AVAIL list.
Observe that three pointer fields are changed as follows:
1. The next pointer field of node A now points to node B , where
node N previously pointed.
2. The next pointer field of N now points to the original first node
in the free pool, where AVAIL previously pointed.
3. AVAIL now points to the deleted node N .
There are also two special cases . if the deleted node N is the first
node in the list, then START will point to node B; and if the deleted
node N is the last node in the list, then node A will contain the NULL
pointer.
Deletion algorithm
Assume that the linked list is in memory in the form
LIST(INFO,LINK, START, AVAIL).
Our algorithm returns the memory space of the deleted node N to the
beginning of the AVAIL list. In the algorithm we include the following
pair of assignments, where LOC is the location of the deleted node
N:
LINK[LOC] := AVAIL and then AVAIL := LOC
Some of our algorithms may want to delete the first node or the last
node from the list. Such an algorithm must check whether there is a
node in the list at all; if not, i.e., if START = NULL, then the
algorithm will print the message UNDERFLOW.
Deleting the node following a given node
Let LIST be a linked list in memory, and suppose we are given the
location LOC of a node N in LIST. Furthermore, suppose we are
given the location LOCP of the node preceding N or, when N is the
first node, LOCP = NULL.
Algorithm:
DEL(INFO,LINK,START, AVAIL, LOC, LOCP)
This algorithm deletes the node N with location LOC. LOCP is the
location of the node which precedes N or, when N is the first node,
LOCP = NULL.
1. If LOCP= NULL, then:
Set START := LINK[START].[deletes first node.]
Else:
Set LINK[LOCP] := LINK[LOC].[deletes node N.]
[end of if structure.]
2. [return deleted node to the AVAIL list.]
Set LINK[LOC] := AVAIL and AVAIL := LOC.
3. Exit.
Reverse Operation
This operation is a thorough one: we need to make the head node point
to the last node and reverse the whole linked list.
First, we traverse to the end of the list; the last node should be
pointing to NULL. Now, we shall make it point to its previous node
instead.
We have to make sure that the last node is not lost. So we'll have
some temp node, which looks like the head node, pointing to the last
node. Now, we shall make all left-side nodes point to their previous
nodes one by one.
Except for the node (first node) pointed to by the head node, all
nodes should point to their predecessor, making them their new
successor. The first node will point to NULL.
We'll make the head node point to the new first node by using the
temp node.
The linked list is now reversed.
3.4 Types of Linked Lists
There are three different implementations of linked lists available.
They are:
 Simple Linked List − Item navigation is forward only.
 Doubly Linked List − Items can be navigated forward and
backward.
 Circular Linked List − Last item contains link of the first element
as next and the first element has a link to the last element as previous.
3.4.1 Simple Linked List
A list in which each node has a single link to another node is called
a singly linked list. A singly linked list does not store any pointer
or reference to the previous node; each node stores the contents of
the node and a reference to the next node in the list.
In a singly linked list, the last node has a NULL pointer which
indicates that it is the last node, and a reference to the first node
is all that is required to store the whole list. Successive nodes are
linked together in a linear way, each node containing the address of
the node to be followed. Every node except the first has a
predecessor, and every node except the last has a successor; the last
node's successor reference is NULL. Since each node has only a single
link to the next node, only forward sequential movement is possible;
no direct access is allowed.
In the above figure, the address of the first node is always stored in
a reference node known as Head or Front. The reference part of the
last node must be NULL.
3.4.2 Doubly Linked List
Doubly linked list is a sequence of elements in which every node has
link to its previous node and next node. Traversing can be done in
both directions and displays the contents in the whole list.
In the above figure, the Link1 field stores the address of the
previous node and the Link2 field stores the address of the next node.
The Data Item field stores the actual value of that node. If we insert
data into the linked list, it will look as follows:
Note:
The first node is always pointed to by head. In a doubly linked list,
the previous field of the first node is always NULL (it must be NULL)
and the next field of the last node must be NULL.
In the above figure we see that, doubly linked list contains three
fields. In this, link of two nodes allow traversal of the list in either
direction. There is no need to traverse the list to find the previous
node. We can traverse from head to tail as well as tail to head.
Advantages of Doubly Linked List
 Doubly linked list can be traversed in both forward and backward
directions.
 To delete a node in singly linked list, the previous node is required,
while in doubly linked list, we can get the previous node using
previous pointer.
 It is more convenient than a singly linked list: a doubly linked
list maintains the links for bidirectional traversing.
Disadvantages of Doubly Linked List
 In doubly linked list, each node requires extra space for previous
pointer.
 All operations such as Insert, Delete, Traverse etc. require extra
previous pointer to be maintained.
3.4.3 Circular Linked List
Circular linked list is similar to singly linked list. The only difference
is that in circular linked list, the last node points to the first node in
the list.
It is a sequence of elements in which every element has link to its
next element in the sequence and has a link to the first element in the
sequence.
In the above figure we see that each node points to its next node in
the sequence, but the last node points to the first node in the list.
Each element stores the address of the next element, and the last
element stores the address of the starting element. It forms a
circular chain because the elements point to each other in a circular
way.
In circular linked list, the memory can be allocated when it is required
because it has a dynamic size.
Circular linked list is used in personal computers, where multiple
applications are running. The operating system provides a fixed time
slot for all running applications and the running applications are kept
in a circular linked list until all the applications are completed. This is
a real life example of circular linked list.
We can insert elements anywhere in circular linked list, but in the
array we cannot insert elements anywhere in the list because it is in
the contiguous memory.
3.4.4 Doubly circular Linked List
Doubly circular linked list is a linked data structure which consists of
a set of sequentially linked records called nodes.
Doubly circular linked list can be conceptualized as two singly linked
lists formed from the same data items, but in opposite sequential
orders.
The above diagram represents the basic structure of a doubly circular
linked list. In a doubly circular linked list, the previous link of
the first node points to the last node and the next link of the last
node points to the first node.
In doubly circular linked list, each node contains two fields called
links used to represent references to the previous and the next node in
the sequence of nodes.
Advantages of Linked Lists
 They are dynamic in nature, allocating memory when it is required.
 Insertion and deletion operations can be easily implemented.
 Stacks and queues can be easily implemented.
 Linked List reduces the access time.
Disadvantages of Linked Lists
 The memory is wasted as pointers require extra memory for
storage.
 No element can be accessed randomly; it has to access each
node sequentially.
 Reverse Traversing is difficult in linked list.
Applications of Linked Lists
 Linked lists are used to implement stacks, queues, graphs, etc.
 Linked lists let you insert elements at the beginning and end of
the list.
 In Linked Lists we don't need to know the size in advance.
3.5 Introduction to Tree
A tree represents nodes connected by edges. We will discuss the binary
tree and the binary search tree specifically.
A binary tree is a special data structure used for data storage
purposes. A binary tree has the special condition that each node can
have a maximum of two children. A binary tree has the benefits of both
an ordered array and a linked list: search is as quick as in a sorted
array, and insertion or deletion operations are as fast as in a
linked list.
3.6 Tree Terminology Binary Tree
Following are the important terms with respect to tree.
 Path − Path refers to the sequence of nodes along the edges of a tree.
 Root − The node at the top of the tree is called root. There is only
one root per tree and one path from the root node to any node.
 Parent − Any node except the root node has one edge upward to a
node called parent.
 Child − The node below a given node connected by its edge
downward is called its child node.
 Leaf − The node which does not have any child node is called the
leaf node.
 Subtree − Subtree represents the descendants of a node.
 Visiting − Visiting refers to checking the value of a node when
control is on the node.
 Traversing − Traversing means passing through nodes in a specific
order.
 Levels − Level of a node represents the generation of a node. If the
root node is at level 0, then its next child node is at level 1, its
grandchild is at level 2, and so on.
 Keys − Key represents a value of a node based on which a search
operation is to be carried out for a node.
3.7 Binary Search Tree
Binary Search Tree Representation
A binary search tree exhibits a special behavior: a node's left child
must have a value less than its parent's value, and the node's right
child must have a value greater than its parent's value.
We're going to implement the tree using node objects and connecting
them through references.
Tree Node
The code to write a tree node would be similar to what is given below.
It has a data part and references to its left and right child nodes.
struct node
{
int data;
struct node *leftChild;
struct node *rightChild;
};
In a tree, all nodes share this common construct.
BST Basic Operations
The basic operations that can be performed on a binary search tree
data structure, are the following −
 Insert − Inserts an element in a tree/create a tree.
 Search − Searches an element in a tree.
 Preorder Traversal − Traverses a tree in a pre-order manner.
 Inorder Traversal − Traverses a tree in an in-order manner.
 Postorder Traversal − Traverses a tree in a post-order manner.
We shall learn creating (inserting into) a tree structure and searching a
data item in a tree in this chapter. We shall learn about tree traversing
methods in the coming chapter.
Insert Operation
The very first insertion creates the tree. Afterwards, whenever an
element is to be inserted, first locate its proper location. Start
searching from the root node, then if the data is less than the key
value, search for the empty location in the left subtree and insert the
data. Otherwise, search for the empty location in the right subtree and
insert the data.
Algorithm
If root is NULL
then create root node
return
If root exists then
compare the data with node.data
while until insertion position is located
If data is greater than node.data
goto right subtree
else
goto left subtree
endwhile
insert data
end If
Search Operation
Whenever an element is to be searched, start searching from the root
node, then if the data is less than the key value, search for the element
in the left subtree. Otherwise, search for the element in the right
subtree. Follow the same algorithm for each node.
Algorithm
If root.data is equal to search.data
return root
else
while data not found
If data is greater than node.data
goto right subtree
else
goto left subtree
If data found
return node
endwhile
return data not found
end if
3.8 Strictly Binary Tree
A binary tree in which every node has either zero or two children is
called a strict binary tree.
Properties:
 A strict binary tree with x internal nodes has exactly x+1 leaves.
 A strict binary tree with y external (leaf) nodes has (2y – 1) nodes
(internal + external) and y-1 internal nodes exactly.
 A strict binary tree with n nodes (internal + external) has exactly (n-
1)/2 internal nodes and (n+1)/2 external (leaf) nodes exactly.
3.9 Complete Binary Tree
A complete binary tree is a binary tree in which all the levels are
completely filled except possibly the lowest one, which is filled from
the left.
A complete binary tree is just like a full binary tree, but with two
major differences
1. All the leaf elements must lean towards the left.
2. The last leaf element might not have a right sibling i.e. a complete
binary tree doesn't have to be a full binary tree.
3.10 Tree Traversal
There are three types of binary tree traversals.
1. In - Order Traversal
2. Pre - Order Traversal
3. Post - Order Traversal
Consider the following binary tree...
1. In - Order Traversal ( leftChild - root - rightChild )
In In-Order traversal, the root node is visited between the left child
and right child. In this traversal, the left child node is visited first,
then the root node is visited and later we go for visiting the right child
node. This in-order traversal is applicable for every root node of all
subtrees in the tree. This is performed recursively for all nodes in the
tree.
In the above example of a binary tree, first we try to visit left child of
root node 'A', but A's left child 'B' is a root node for left subtree. so we
try to visit its (B's) left child 'D' and again D is a root for subtree with
nodes D, I and J. So we try to visit its left child 'I' and it is the leftmost
child. So first we visit 'I' then go for its root node 'D' and later we
visit D's right child 'J'. With this we have completed the left part of
node B. Then visit 'B' and next B's right child 'F' is visited. With this
we have completed left part of node A. Then visit root node 'A'. With
this we have completed left and root parts of node A. Then we go for
the right part of the node A. In right of A again there is a subtree with
root C. So go for left child of C and again it is a subtree with root G.
But G does not have left part so we visit 'G' and then visit G's right
child K. With this we have completed the left part of node C. Then
visit root node 'C' and next visit C's right child 'H' which is the
rightmost child in the tree. So we stop the process.

That means here we have visited in the order of I - D - J - B - F - A -


G - K - C - H using In-Order Traversal.
In-Order Traversal for above example of binary tree is
I-D-J-B-F-A-G-K-C-H
2. Pre - Order Traversal ( root - leftChild - rightChild )
In Pre-Order traversal, the root node is visited before the left child and
right child nodes. In this traversal, the root node is visited first, then
its left child and later its right child. This pre-order traversal is
applicable for every root node of all subtrees in the tree.
In the above example of binary tree, first we visit root node 'A' then
visit its left child 'B' which is a root for D and F. So we visit B's left
child 'D' and again D is a root for I and J. So we visit D's left
child 'I' which is the leftmost child. So next we go for visiting D's
right child 'J'. With this we have completed root, left and right parts
of node D and root, left parts of node B. Next visit B's right child 'F'.
With this we have completed root and left parts of node A. So we go
for A's right child 'C' which is a root node for G and H. After visiting
C, we go for its left child 'G' which is a root for node K. So next we
visit left of G, but it does not have left child so we go for G's right
child 'K'. With this, we have completed node C's root and left parts.
Next visit C's right child 'H' which is the rightmost child in the tree.
So we stop the process.

That means here we have visited in the order of A-B-D-I-J-F-C-G-K-


H using Pre-Order Traversal.
Pre-Order Traversal for above example binary tree is
A-B-D-I-J-F-C-G-K-H
3. Post - Order Traversal ( leftChild - rightChild - root )
In Post-Order traversal, the root node is visited after left child and
right child. In this traversal, left child node is visited first, then its
right child and then its root node. This is recursively performed until
the right most node is visited.
Here we have visited in the order of I - J - D - F - B - K - G - H - C -
A using Post-Order Traversal.
Post-Order Traversal for above example binary tree is
I-J-D-F-B-K-G-H-C-A
3.11 Threaded Binary Tree
A binary tree is threaded by making all right child pointers that would
normally be null point to the inorder successor of the node (if it
exists), and all left child pointers that would normally be null point to
the inorder predecessor of the node.
 We have the pointers reference the next node in an inorder traversal;
called threads
 We need to know if a pointer is an actual link or a thread, so we keep
a boolean for each pointer
Why do we need Threaded Binary Tree?
 Binary trees have a lot of wasted space: the leaf nodes each have 2
null pointers. We can use these pointers to help us in inorder
traversals.
 Threaded binary tree makes the tree traversal faster since we do not
need stack or recursion for traversal
Types of threaded binary trees:
 Single Threaded: each node is threaded towards either the in-order
predecessor or successor (left or right), meaning all right NULL
pointers will point to the inorder successor OR all left NULL pointers
will point to the inorder predecessor.
 Double Threaded: each node is threaded towards both the in-order
predecessor and successor (left and right), meaning all right NULL
pointers will point to the inorder successor AND all left NULL
pointers will point to the inorder predecessor.
3.12 AVL Tree
An AVL tree is a binary search tree in which the difference between
the heights of the left and right subtrees of any node is at most one.
The technique of balancing the height of binary trees was developed by
Adelson-Velskii and Landis, and hence the tree is given the short form
AVL tree or Balanced Binary Tree.
An AVL tree can be defined as follows:
Let T be a non-empty binary tree with TL and TR as its left and right
subtrees. The tree is height balanced if:
 TL and TR are height balanced
 |hL - hR| <= 1, where hL and hR are the heights of TL and TR
The balance factor of a node in a binary tree can have the value 1,
-1 or 0, depending on whether the height of its left subtree is
greater than, less than or equal to the height of its right subtree.
Advantages of AVL tree
Since AVL trees are height-balanced trees, operations like insertion
and deletion have low time complexity. Consider an example: if the
keys 1, 2, 3, 4, 5, 6, 7 are inserted in order into a plain binary
search tree, the tree degenerates into a chain. To insert a node with
a key Q into that binary tree, the algorithm requires seven
comparisons, but if you insert the same key into an AVL tree holding
the same keys, the algorithm requires only three comparisons.
Representation of AVL Trees
struct AVLNode
{
int data;
struct AVLNode *left, *right;
int balfactor;
};
Algorithm for operations on AVL Tree
For Insertion:
Step 1: First, insert a new element into the tree using BST's (Binary
Search Tree) insertion logic.
Step 2: After inserting the element, check the balance factor of each
node.
Step 3: If the balance factor of every node is found to be 0, 1 or
-1, the algorithm proceeds to the next operation.
Step 4: If the balance factor of any node is other than these three
values, the tree is said to be imbalanced. Perform the suitable
rotation to make it balanced, and then the algorithm proceeds to the
next operation.
For Deletion:
Step 1: First, find the node where k is stored.
Step 2: Second, delete the contents of that node (suppose the node
is x).
Step 3: Claim: Deleting a node in an AVL tree can be reduced by
deleting a leaf. There are three possible cases:
 When x has no children then, delete x
 When x has one child, let x' becomes the child of x.
 Notice: x' cannot have a child, since subtrees of T can differ in
height by at most one :
o then replace the contents of x with the contents of x'
o then delete x' (a leaf)
 Step 4: When x has two children,
o then find x's successor z (which has no left child)
o then replace x's contents with z's contents, and
o delete z
3.13 B Tree
B-Tree is a self-balancing search tree. In most of the other self-
balancing search trees (like AVL and Red-Black Trees), it is assumed
that everything is in main memory. To understand the use of B-Trees,
we must think of the huge amount of data that cannot fit in main
memory. When the number of keys is high, the data is read from disk
in the form of blocks. Disk access time is very high compared to main
memory access time. The main idea of using B-Trees is to reduce the
number of disk accesses. Most of the tree operations (search, insert,
delete, max, min, ..etc ) require O(h) disk accesses where h is the
height of the tree. B-tree is a fat tree. The height of B-Trees is kept
low by putting maximum possible keys in a B-Tree node. Generally, a
B-Tree node size is kept equal to the disk block size. Since h is low
for B-Tree, total disk accesses for most of the operations are reduced
significantly compared to balanced Binary Search Trees like AVL
Tree, Red-Black Tree, ..etc.
Properties of B-Tree
1) All leaves are at same level.
2) A B-Tree is defined by the term minimum degree ‘t’. The value of t
depends upon disk block size.
3) Every node except root must contain at least t-1 keys. Root may
contain minimum 1 key.
4) All nodes (including root) may contain at most 2t – 1 keys.
5) Number of children of a node is equal to the number of keys in it
plus 1.
6) All keys of a node are sorted in increasing order. The child
between two keys k1 and k2 contains all keys in the range from k1
to k2.
7) B-Tree grows and shrinks from the root which is unlike Binary
Search Tree. Binary Search Trees grow downward and also shrink
from downward.
8) Like other balanced Binary Search Trees, the time complexity to
search, insert and delete is O(log n).
Following is an example B-Tree of minimum degree 3. Note that in
practical B-Trees, the value of minimum degree is much more than 3.
Operations on a B-Tree
The following operations are performed on a B-Tree...
1. Search
2. Insertion
3. Deletion
Search Operation in B-Tree
The search operation in B-Tree is similar to the search operation in
Binary Search Tree. In a Binary search tree, the search process starts
from the root node and we make a 2-way decision every time (we go
to either left subtree or right subtree). In B-Tree also search process
starts from the root node but here we make an n-way decision every
time. Where 'n' is the total number of children the node has. In a B-
Tree, the search operation is performed with O(log n) time
complexity. The search operation is performed as follows...
 Step 1 - Read the search element from the user.
 Step 2 - Compare the search element with first key value of root
node in the tree.
 Step 3 - If both are matched, then display "Given node is found!!!"
and terminate the function
 Step 4 - If both are not matched, then check whether search element
is smaller or larger than that key value.
 Step 5 - If search element is smaller, then continue the search
process in left subtree.
 Step 6 - If the search element is larger, then compare the search
element with the next key value in the same node and repeat Steps 3,
4, 5 and 6 until we find an exact match or until the search element is
compared with the last key value in the leaf node.
 Step 7 - If the last key value in the leaf node is also not matched
then display "Element is not found" and terminate the function.
Insertion Operation in B-Tree
In a B-Tree, a new element must be added only at the leaf node. That
means, the new keyValue is always attached to the leaf node only.
The insertion operation is performed as follows...
 Step 1 - Check whether tree is Empty.
 Step 2 - If tree is Empty, then create a new node with new key
value and insert it into the tree as a root node.
 Step 3 - If tree is Not Empty, then find the suitable leaf node to
which the new key value is added using Binary Search Tree logic.
 Step 4 - If that leaf node has empty position, add the new key value
to that leaf node in ascending order of key value within the node.
 Step 5 - If that leaf node is already full, split that leaf node by
sending the middle value to its parent node. Repeat the same until the
value being sent up is fixed into a node.
 Step 6 - If the splitting is performed at the root node, then the
middle value becomes the new root node for the tree and the height of
the tree is increased by one.
Example
Construct a B-Tree of Order 3 by inserting numbers from 1 to 10.
Unit 4 Graphs, Searching, Sorting and Hashing
4.1 Introduction to Graphs
A graph can be defined as group of vertices and edges that are used to
connect these vertices. A graph can be seen as a cyclic tree, where the
vertices (Nodes) maintain any complex relationship among them
instead of having parent child relationship.
Definition
A graph G can be defined as an ordered set G(V, E) where V(G)
represents the set of vertices and E(G) represents the set of edges
which are used to connect these vertices.
A Graph G(V, E) with 5 vertices (A, B, C, D, E) and six edges ((A,B),
(B,C), (C,E), (E,D), (D,B), (D,A)) is shown in the following figure.
Graph Terminology
Path
A path can be defined as the sequence of nodes that are followed in
order to reach some terminal node V from the initial node U.
Closed Path
A path is called a closed path if the initial node is the same as the
terminal node, that is, if V0 = VN.
Simple Path
If all the nodes of the path are distinct, with the exception that V0
may equal VN, then such a path P is called a simple path; when
V0 = VN it is a closed simple path.
Cycle
A cycle can be defined as the path which has no repeated edges or
vertices except the first and last vertices.
Connected Graph
A connected graph is the one in which some path exists between
every two vertices (u, v) in V. There are no isolated nodes in
connected graph.
Complete Graph
A complete graph is one in which every node is connected with all
other nodes. A complete graph contains n(n-1)/2 edges, where n is the
number of nodes in the graph.
Weighted Graph
In a weighted graph, each edge is assigned with some data such as
length or weight. The weight of an edge e can be given as w(e) which
must be a positive (+) value indicating the cost of traversing the edge.
Digraph
A digraph is a directed graph in which each edge of the graph is
associated with some direction and the traversing can be done only in
the specified direction.
Loop
An edge whose two endpoints are the same node is called a loop.
Adjacent Nodes
If two nodes u and v are connected via an edge e, then the nodes u and
v are called neighbours or adjacent nodes.
Degree of the Node
The degree of a node is the number of edges that are connected with
that node. A node with degree 0 is called an isolated node.
4.2 Representation of Graphs
Graphs are mathematical structures that represent pairwise
relationships between objects. A graph is a flow structure that
represents the relationship between various objects. It can be
visualized by using the following two basic components:
 Nodes: These are the most important components in any graph.
Nodes are entities whose relationships are expressed using edges. If a
graph comprises 2 nodes A and B and an undirected edge between
them, then it expresses a bi-directional relationship between the nodes
and edge.
 Edges: Edges are the components that are used to represent the
relationships between various nodes in a graph. An edge between two
nodes expresses a one-way or two-way relationship between the
nodes.
Types of nodes
 Root node: The root node is the ancestor of all other nodes in a
graph. It does not have any ancestor. Each graph consists of exactly
one root node. Generally, you must start traversing a graph from the
root node.
 Leaf nodes: In a graph, leaf nodes represent the nodes that do not
have any successors. These nodes only have ancestor nodes. They can
have any number of incoming edges but they will not have any
outgoing edges.
Types of graphs
 Undirected: An undirected graph is a graph in which all the edges
are bi-directional i.e. the edges do not point in any specific direction.
 Directed: A directed graph is a graph in which all the edges are uni-
directional i.e. the edges point in a single direction.
 Weighted: In a weighted graph, each edge is assigned a weight or
cost. Consider a graph of 4 nodes as in the diagram below. As you can
see, each edge has a weight/cost assigned to it. If you want to go
from vertex 1 to vertex 3, you can take one of the following 3 paths:
o 1 -> 2 -> 3
o 1 -> 3
o 1 -> 4 -> 3
Therefore the total cost of each path will be as follows:
o The total cost of 1 -> 2 -> 3 will be (1 + 2), i.e. 3 units
o The total cost of 1 -> 3 will be 1 unit
o The total cost of 1 -> 4 -> 3 will be (3 + 2), i.e. 5 units
 Cyclic: A graph is cyclic if it comprises a path that starts from a
vertex and ends at the same vertex. That path is called a cycle. An
acyclic graph is a graph that has no cycle.
A tree is an undirected graph in which any two vertices are connected
by exactly one path. A tree is an acyclic graph and has N - 1 edges,
where N is the number of vertices. Each node in a graph may have one
or multiple parent nodes. However, in a tree, each node (except the
root node) has exactly one parent node.
Note: A root node has no parent.
A tree cannot contain any cycles or self-loops; however, the same
does not apply to graphs.
Graph representation
You can represent a graph in many ways. The two most common
ways of representing a graph are as follows:
Adjacency matrix
An adjacency matrix is a V x V binary matrix A. Element A[i][j] is 1
if there is an edge from vertex i to vertex j; otherwise A[i][j] is 0.
Note: A binary matrix is a matrix in which each cell can have only
one of two possible values - either a 0 or a 1.
The adjacency matrix can also be modified for a weighted graph:
instead of storing 0 or 1 in A[i][j], the weight or cost of the edge
is stored.
In an undirected graph, if A[i][j] = 1, then A[j][i] = 1. In a directed
graph, if A[i][j] = 1, then A[j][i] may or may not be 1.
The adjacency matrix provides constant-time access (O(1)) to determine
whether there is an edge between two nodes. The space complexity of
the adjacency matrix is O(V^2).
The adjacency matrix of the following graph is:
i/j: 1 2 3 4
1:   0 1 0 1
2:   1 0 1 0
3:   0 1 0 1
4:   1 0 1 0
The adjacency matrix of the following graph is:
i/j: 1 2 3 4
1:   0 1 0 0
2:   0 0 0 1
3:   1 0 0 1
4:   0 1 0 0
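As a sketch of this representation in C (the function names and the 0-based vertex numbering are our own choices, not from the text), the 4-vertex undirected cycle whose matrix appears above can be stored and queried like this:

```c
#include <assert.h>

#define V 4  /* number of vertices in the example graph */

/* adj[i][j] = 1 if there is an edge from vertex i to vertex j (0-based) */
static int adj[V][V];

void add_edge_undirected(int i, int j) {
    adj[i][j] = 1;
    adj[j][i] = 1;   /* undirected: the matrix stays symmetric */
}

int has_edge(int i, int j) {
    return adj[i][j]; /* O(1) edge lookup, as noted above */
}
```

For a directed graph, `add_edge_undirected` would set only `adj[i][j]`, so the matrix need not be symmetric.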
3.3 Graph Traversals
Graph traversal is a technique used for searching a vertex in a graph.
Graph traversal also decides the order in which the vertices are
visited during the search. A graph traversal finds the edges to be
used in the search process without creating loops; that is, using
graph traversal we visit all the vertices of the graph without getting
into a looping path.
There are two graph traversal techniques and they are as follows...
1. DFS (Depth First Search)
2. BFS (Breadth First Search)
DFS (Depth First Search)
DFS traversal of a graph produces a spanning tree as the final
result. A spanning tree is a graph without loops. We use a Stack data
structure, with maximum size equal to the total number of vertices in
the graph, to implement DFS traversal.
We use the following steps to implement DFS traversal...
 Step 1 - Define a Stack of size total number of vertices in the graph.
 Step 2 - Select any vertex as starting point for traversal. Visit that
vertex and push it on to the Stack.
 Step 3 - Visit any one of the non-visited adjacent vertices of a
vertex which is at the top of stack and push it on to the stack.
 Step 4 - Repeat step 3 until there is no new vertex to be visited from
the vertex which is at the top of the stack.
 Step 5 - When there is no new vertex to visit then use back
tracking and pop one vertex from the stack.
 Step 6 - Repeat steps 3, 4 and 5 until stack becomes Empty.
 Step 7 - When stack becomes Empty, then produce final spanning
tree by removing unused edges from the graph
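The stack-based steps above can be sketched in C. This is an illustrative sketch rather than the only possible implementation: it assumes an adjacency-matrix representation, 0-based vertex numbers (the text counts from 1), and hypothetical names such as `dfs` and `MAXV`:

```c
#include <assert.h>

#define MAXV 10

/* Iterative DFS following Steps 1-6 above. 'order' receives the
   vertices in visit order; the return value is how many were visited. */
int dfs(int n, int adj[][MAXV], int start, int order[]) {
    int stack[MAXV], top = -1;       /* Step 1: stack of size n */
    int visited[MAXV] = {0};
    int count = 0;

    visited[start] = 1;              /* Step 2: visit start, push it */
    order[count++] = start;
    stack[++top] = start;

    while (top >= 0) {               /* Step 6: until the stack is empty */
        int v = stack[top], w, found = 0;
        for (w = 0; w < n; w++) {    /* Step 3: any non-visited neighbour */
            if (adj[v][w] && !visited[w]) {
                visited[w] = 1;
                order[count++] = w;
                stack[++top] = w;
                found = 1;
                break;
            }
        }
        if (!found)
            top--;                   /* Step 5: backtrack by popping */
    }
    return count;
}
```

Only the vertices reachable from `start` are visited, which is why the count is returned.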
BFS (Breadth First Search)
BFS traversal of a graph produces a spanning tree as the final
result. A spanning tree is a graph without loops. We use a Queue data
structure, with maximum size equal to the total number of vertices in
the graph, to implement BFS traversal.
We use the following steps to implement BFS traversal...
 Step 1 - Define a Queue of size total number of vertices in the
graph.
 Step 2 - Select any vertex as starting point for traversal. Visit that
vertex and insert it into the Queue.
 Step 3 - Visit all the non-visited adjacent vertices of the vertex
which is at front of the Queue and insert them into the Queue.
 Step 4 - When there is no new vertex to be visited from the vertex
which is at front of the Queue then delete that vertex.
 Step 5 - Repeat steps 3 and 4 until queue becomes empty.
 Step 6 - When queue becomes empty, then produce final spanning
tree by removing unused edges from the graph
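The queue-based steps above can likewise be sketched in C, under the same assumptions as the DFS sketch (adjacency matrix, 0-based vertices, illustrative names):

```c
#include <assert.h>

#define MAXV 10

/* BFS following Steps 1-6 above. 'order' receives the vertices in
   visit order; the return value is how many were visited. */
int bfs(int n, int adj[][MAXV], int start, int order[]) {
    int queue[MAXV], front = 0, rear = 0;  /* Step 1: queue of size n */
    int visited[MAXV] = {0};
    int count = 0;

    visited[start] = 1;              /* Step 2: visit start, enqueue it */
    order[count++] = start;
    queue[rear++] = start;

    while (front < rear) {           /* Step 5: until the queue is empty */
        int v = queue[front++];      /* Step 4: delete the front vertex */
        for (int w = 0; w < n; w++) {/* Step 3: all non-visited neighbours */
            if (adj[v][w] && !visited[w]) {
                visited[w] = 1;
                order[count++] = w;
                queue[rear++] = w;
            }
        }
    }
    return count;
}
```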
3.4 Shortest Path Algorithms
3.4.1 Floyd-Warshall Algorithm
Floyd-Warshall Algorithm is an algorithm for finding the shortest
path between all the pairs of vertices in a weighted graph. This
algorithm works for both the directed and undirected weighted
graphs. But, it does not work for the graphs with negative cycles
(where the sum of the edges in a cycle is negative).
A weighted graph is a graph in which each edge has a numerical
value associated with it.
The Floyd-Warshall algorithm is also called Floyd's algorithm, the
Roy-Floyd algorithm, the Roy-Warshall algorithm or the WFI algorithm.
This algorithm follows the dynamic programming approach to find
the shortest paths.
How Floyd-Warshall Algorithm Works?
Let the given graph be:
Follow the steps below to find the shortest path between all the pairs
of vertices.
1. Create a matrix A0 of dimension n*n where n is the number of
vertices. The rows and the columns are indexed as i and j
respectively; i and j are the vertices of the graph.
Each cell A0[i][j] is filled with the distance from the ith vertex to
the jth vertex. If there is no edge from the ith vertex to the jth
vertex, the cell is left as infinity.
2. Now, create a matrix A1 using matrix A0. The elements in the
first column and the first row are left as they are. The remaining
cells are filled in the following way.
Let k be the intermediate vertex in the shortest path from source to
destination. In this step, k is the first vertex. A[i][j] is filled
with (A[i][k] + A[k][j]) if (A[i][j] > A[i][k] + A[k][j]).
That is, if the direct distance from the source to the destination is
greater than the path through the vertex k, then the cell is filled
with A[i][k] + A[k][j].
In this step, k is vertex 1. We calculate the distance from the source
vertex to the destination vertex through this vertex k.
For example: for A1[2, 4], the direct distance from vertex 2 to 4 is 4,
and the sum of the distances from vertex 2 to 4 through vertex 1 (i.e.
from vertex 2 to 1 and from vertex 1 to 4) is 7. Since 4 < 7, A1[2, 4]
is filled with 4.
3. In a similar way, A2 is created using A1. The elements in the
second column and the second row are left as they are.
In this step, k is the second vertex (i.e. vertex 2). The remaining
steps are the same as in step 2.
4. Similarly, A3 and A4 are also created.
5. A4 gives the shortest path between each pair of vertices.
Floyd-Warshall Algorithm
n = number of vertices
A = matrix of dimension n*n
for k = 1 to n
    for i = 1 to n
        for j = 1 to n
            Ak[i, j] = min(Ak-1[i, j], Ak-1[i, k] + Ak-1[k, j])
return A
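The pseudocode above translates directly into C. One common simplification, used in this sketch (our choice, not stated in the text), is to keep a single matrix and update it in place instead of storing A0 ... An; this is safe because row k and column k do not change during iteration k:

```c
#include <assert.h>

#define N 4
#define INF 100000   /* stands in for infinity (no edge) */

/* In-place Floyd-Warshall: on return, d[i][j] holds the length of the
   shortest path from vertex i to vertex j (0-based indices). */
void floyd_warshall(int d[N][N]) {
    for (int k = 0; k < N; k++)          /* intermediate vertex */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (d[i][k] + d[k][j] < d[i][j])
                    d[i][j] = d[i][k] + d[k][j];
}
```

With the weighted example from earlier (edges 1-2 of cost 1, 1-3 of cost 1, 1-4 of cost 3, 2-3 of cost 2, 3-4 of cost 2), the matrix after the call confirms that the shortest path from vertex 1 to vertex 3 has length 1.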
3.4.2 Shortest Path Algorithm
Let G be a directed graph with m nodes, v1, v2, ..., vm. Suppose G is
weighted; that is, suppose each edge e in G is assigned a nonnegative
number w(e) called the weight or length of the edge e. Then G may be
maintained in memory by its weight matrix W = (wij), defined as
follows:
wij = w(e) if there is an edge e from vi to vj
wij = 0 if there is no edge from vi to vj
The path matrix P tells us whether or not there are paths between the
nodes. Now we want to find a matrix Q which will tell us the lengths
of the shortest paths between the nodes or, more exactly, a matrix
Q = (qij) where
qij = length of a shortest path from vi to vj
Next we describe a modification of Warshall's algorithm which finds
us the matrix Q.
Here we define a sequence of matrices Q0, Q1, ..., Qm (analogous to
the matrices P0, P1, ..., Pm above) whose entries are defined as
follows:
Qk[i, j] = the smaller of the length of the preceding path from vi to
vj, or the sum of the lengths of the preceding paths from vi to vk and
from vk to vj
More exactly,
Qk[i, j] = MIN(Qk-1[i, j], Qk-1[i, k] + Qk-1[k, j])
The initial matrix Q0 is the same as the weight matrix W except that
each 0 in W is replaced by ∞ (or a very, very large number). The final
matrix Qm will be the desired matrix Q.
Algorithm: (Shortest Path Algorithm) A weighted graph G with M
nodes is maintained in memory by its weight matrix W. This
algorithm finds a matrix Q such that Q[I, J] is the length of a
shortest path from node VI to node VJ. INFINITY is a very large
number, and MIN is the minimum value function.
1. Repeat for I, J = 1, 2, ..., M: [Initializes Q.]
If W[I, J] = 0, then: Set Q[I, J] := INFINITY;
Else: Set Q[I, J] := W[I, J].
[End of loop]
2. Repeat Steps 3 and 4 for K = 1, 2, ..., M: [Updates Q.]
3. Repeat Step 4 for I = 1, 2, ..., M:
4. Repeat for J = 1, 2, ..., M:
Set Q[I, J] := MIN(Q[I, J], Q[I, K] + Q[K, J]).
[End of loop]
[End of Step 3 loop]
[End of Step 2 loop]
5. Exit.
3.5 Searching and Sorting
3.5.1 Searching
Searching is the process of finding a particular item in a collection of
items. A search typically answers whether the item is present in the
collection or not. Searching requires a key field such as name, ID,
code which is related to the target item. When the key field of a target
item is found, a pointer to the target item is returned. The pointer may
be an address, an index into a vector or array, or some other indication
of where to find the target. If a matching key field isn’t found, the
user is informed.
Let DATA be a collection of data elements in memory, and suppose a
specific ITEM of information is given. Searching refers to the
operation of finding the location LOC of ITEM in DATA, or printing
some message that ITEM does not appear there. The search is said to
be successful if ITEM does appear in DATA and unsuccessful otherwise.
There are many different searching algorithms. The algorithm that one
chooses generally depends on the way the information in DATA is
organised.
The complexity of a searching algorithm is measured in terms of the
number f(n) of comparisons required to find ITEM in DATA, where
DATA contains n elements.
3.5.2 Types of Searching
The most common searching algorithms are:
a) Linear search
b) Binary search
3.5.3 Linear search
The Linear Search Algorithm is the simplest search algorithm. In this
search algorithm a sequential search is made over all the items one by
one to look for the targeted item. Each item is checked in sequence
until a match is found. If a match is found, that particular item is
returned; otherwise the search continues till the end.
Suppose DATA is a linear array with n elements. Given no
information about DATA, the most intuitive way to search for a given
ITEM in DATA is to compare ITEM with element of DATA one by
one. That is, first we test whether DATA [1] = ITEM, then we test
whether DATA [2] =ITEM, and so on. This method, which traverses
DATA sequentially to locate ITEM, is called linear search or
sequential search.
To simplify the matter, we first assign ITEM to DATA [N+1], the
position following the last element of DATA. Then the outcome
LOC = N +1
Where LOC denotes the location where ITEM first occurs in DATA,
signifies the search is unsuccessful. The purpose of this initial
assignment is to avoid repeatedly testing whether or not we have
reached the end of the array DATA. This way, the search must
eventually “succeed”.
Observe that step 1 guarantees that the loop in step 3 must terminate.
Without step 1, repeat statement in step 3 must be replaced by the
following statement, which involves two comparisons, not one:
Repeat while LOC <= N and DATA[LOC] != ITEM:
On the other hand, in order to use step 1, one must guarantee that
there is an unused memory location at the end of the array DATA;
otherwise, one must use the version of the algorithm with the
two-comparison loop.
Algorithm: (Linear search) LINEAR (DATA, N, ITEM, LOC)
Here DATA is a linear array with N elements, and ITEM is a given
item of information. this algorithm finds the location LOC of ITEM in
DATA, or sets LOC: = 0 if the search is unsuccessful.
1. [Insert ITEM at the end of DATA.] Set DATA[N + 1] := ITEM.
2. [Initialize counter.] Set LOC := 1.
3. [Search for ITEM.]
Repeat while DATA[LOC] != ITEM:
Set LOC := LOC + 1.
[End of loop]
4. [Successful?] If LOC = N + 1, then: Set LOC := 0.
5. Exit.
Complexity of the linear search algorithm
f(n) = (n + 1)/2, the average number of comparisons for a successful
search.
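The sentinel technique of the algorithm above can be sketched in C with 0-based indices (the function name is our own, and the caller must reserve one extra slot at the end of the array for the sentinel, matching the "unused memory location" requirement in the text):

```c
#include <assert.h>

/* Sentinel linear search: item is placed at data[n] (Step 1), so the
   scanning loop needs only one comparison per step (Step 3). Returns
   the 0-based index of the first match, or -1 if item is absent. */
int linear_search(int data[], int n, int item) {
    int loc = 0;
    data[n] = item;              /* sentinel at the end of the array */
    while (data[loc] != item)    /* guaranteed to terminate */
        loc++;
    return (loc == n) ? -1 : loc;
}
```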
3.5.4 Binary search
Binary search is a fast search algorithm with run-time complexity of
O(log n). This search algorithm works on the principle of divide and
conquer. For this algorithm to work properly, the data collection
should be in sorted form.
Binary search looks for a particular item by comparing the middle
most item of the collection. If a match occurs, then the index of item
is returned. If the middle item is greater than the item, then the item is
searched in the sub-array to the left of the middle item. Otherwise, the
item is searched for in the sub-array to the right of the middle item.
This process continues on the sub-array as well until the size of the
subarray reduces to zero.
Working of Binary Search
For a binary search to work, it is mandatory for the target array to be
sorted. We shall learn the process of binary search with a pictorial
example. The following is our sorted array and let us assume that we
need to search the location of value 31 using binary search.
First, we shall determine the mid of the array by using this formula:
mid = low + (high - low) / 2
Here it is, 0 + (9 - 0) / 2 = 4 (the integer part of 4.5). So, 4 is
the mid of the array.
Now we compare the value stored at location 4, with the value being
searched, i.e. 31. We find that the value at location 4 is 27, which is
not a match. As the value is greater than 27 and we have a sorted
array, so we also know that the target value must be in the upper
portion of the array.
We change our low to mid + 1 and find the new mid value again.
low = mid + 1
mid = low + (high - low) / 2
Our new mid is 7 now. We compare the value stored at location 7
with our target value 31.
The value stored at location 7 is not a match; rather, it is more than
what we are looking for. So, the value must be in the lower part from
this location.
Hence, we calculate the mid again. This time it is 5.
We compare the value stored at location 5 with our target value. We
find that it is a match.
We conclude that the target value 31 is stored at location 5.
Binary search halves the searchable items and thus reduces the count
of comparisons to be made to very few.
Suppose DATA is an array which is sorted in increasing numerical
order or, equivalently, alphabetically. Then there is an extremely
efficient searching algorithm, called binary search, which can be used
to find the location LOC of a given ITEM of information in DATA.
Before formally discussing the algorithm, we indicate the general idea
of this algorithm by means of an idealized version of a familiar
everyday example.
The binary search algorithm applied to our array DATA works as
follows. During each stage of our algorithm, our search for ITEM is
reduced to a segment of elements of DATA:
DATA[BEG], DATA[BEG + 1], DATA[BEG + 2], ..., DATA[END]
Variables BEG and END denote the beginning and end locations of
the segment under consideration. The algorithm compares ITEM with
the middle element DATA[MID] of the segment, where MID is
obtained by
MID = INT((BEG + END)/2)
If DATA[MID] = ITEM, then the search is successful and we set
LOC := MID. Otherwise a new segment of DATA is obtained as
follows:
(a) If ITEM < DATA[MID], then ITEM can appear only in the left
half of the segment:
DATA[BEG], DATA[BEG + 1], ..., DATA[MID - 1]
So we reset END := MID - 1 and begin searching again.
(b) If ITEM > DATA[MID], then ITEM can appear only in the right
half of the segment:
DATA[MID + 1], DATA[MID + 2], ..., DATA[END]
So we reset BEG := MID + 1 and begin searching again.
Initially, we begin with the entire array DATA; i.e., we begin with
BEG = 1 and END = n, or, more generally, with BEG = LB and
END = UB.
If ITEM is not in DATA, then eventually we obtain
END < BEG
This condition signals that the search is unsuccessful, and in such a
case we assign LOC := NULL. Here NULL is a value that lies outside
the set of indices of DATA.
Algorithm: (Binary search) BINARY (DATA, LB, UB, ITEM, LOC)
Here DATA is a sorted array with lower bound LB and upper bound
UB, and ITEM is a given item of information. The variables BEG,
END, and MID denote, respectively, the beginning, end and middle
locations of a segment of elements of DATA. This algorithm finds the
location LOC of ITEM in DATA or sets LOC := NULL.
1. [Initialize segment variables.]
Set BEG := LB, END := UB and MID := INT((BEG + END)/2).
2. Repeat Steps 3 and 4 while BEG <= END and DATA[MID] != ITEM.
3. If ITEM < DATA[MID], then
Set END: = MID – 1.
Else:
Set BEG: = MID + 1.
[End of if structure.]
4. Set MID := INT((BEG + END)/2).
[End of Step 2 loop]
5. If DATA[MID] = ITEM, then:
Set LOC := MID.
Else:
Set LOC := NULL.
[End of if structure.]
6. Exit.
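The algorithm above can be sketched in C with 0-based indices, returning -1 in place of the NULL value of the text (the function name is our own):

```c
#include <assert.h>

/* Binary search on a sorted array. Returns the index of item,
   or -1 if it is absent. */
int binary_search(const int data[], int n, int item) {
    int beg = 0, end = n - 1;
    while (beg <= end) {                  /* END < BEG signals failure */
        int mid = beg + (end - beg) / 2;  /* INT((BEG+END)/2), written
                                             to avoid integer overflow */
        if (data[mid] == item)
            return mid;                   /* successful: LOC := MID */
        if (item < data[mid])
            end = mid - 1;                /* search the left half */
        else
            beg = mid + 1;                /* search the right half */
    }
    return -1;                            /* unsuccessful: LOC := NULL */
}
```

On the sorted array of the pictorial example, searching for 31 returns location 5, matching the walkthrough above.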
Complexity of the binary search algorithm
The number of comparisons f(n) satisfies 2^f(n) > n or, equivalently,
f(n) = INT(log2 n) + 1
Limitations of the binary search algorithm
The binary search algorithm is very efficient, but it requires two
conditions:
(1) The list must be sorted.
(2) One must have direct access to the middle element in any sublist.
3.6 Sorting
Sorting is the process of ordering the elements of a collection, in
ascending or descending order. It arranges the data in a sequence
which makes searching easier. The term sorting came into the picture
as humans realised the importance of searching quickly.
There are so many things in our real life that we need to search for,
like a particular record in database, roll numbers in merit list, a
particular telephone number in telephone directory, a particular page
in a book etc. All this would have been a mess if the data was kept
unordered and unsorted, but fortunately the concept of sorting came
into existence, making it easier for everyone to arrange data in an
order, hence making it easier to search.
3.7 Types of Sorting
There are many different techniques available for sorting,
differentiated by their efficiency and space requirements.
1. Bubble Sort
2. Insertion Sort
3. Selection Sort
4. Quick Sort
5. Merge Sort
6. Heap Sort
3.7.1 Bubble Sort
Bubble Sort is a simple algorithm which is used to sort a given set
of n elements provided in the form of an array. Bubble Sort compares
the elements one by one and sorts them based on their values.
If the given array has to be sorted in ascending order, then bubble sort
will start by comparing the first element of the array with the second
element, if the first element is greater than the second element, it
will swap both the elements, and then move on to compare the second
and the third element, and so on.
If we have total n elements, then we need to repeat this process for n-
1 times.
It is known as bubble sort, because with every complete iteration the
largest element in the given array, bubbles up towards the last place or
the highest index, just like a water bubble rises up to the water
surface.
Sorting takes place by stepping through all the elements one-by-one
and comparing it with the adjacent element and swapping them if
required.
Implementing Bubble Sort Algorithm
Following are the steps involved in bubble sort (for sorting a given
array in ascending order):
1. Starting with the first element (index = 0), compare the current
element with the next element of the array.
2. If the current element is greater than the next element of the array,
swap them.
3. If the current element is less than the next element, move to the next
element. Repeat Step 1.

Let's consider an array with values {5, 1, 6, 2, 4, 3}


Below, we have a pictorial representation of how bubble sort will sort
the given array.
So as we can see in the representation above, after the first
iteration, 6 is placed at the last index, which is the correct position
for it.
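The steps above can be sketched in C for ascending order (the function name is our own):

```c
#include <assert.h>

/* Bubble sort: n-1 passes; each pass bubbles the largest remaining
   element to the end of the unsorted portion, as described above. */
void bubble_sort(int a[], int n) {
    for (int pass = 0; pass < n - 1; pass++) {
        for (int i = 0; i < n - 1 - pass; i++) {
            if (a[i] > a[i + 1]) {    /* adjacent pair out of order */
                int tmp = a[i];       /* swap them */
                a[i] = a[i + 1];
                a[i + 1] = tmp;
            }
        }
    }
}
```

After the first pass on {5, 1, 6, 2, 4, 3}, the largest element 6 sits at the last index, exactly as in the walkthrough above.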
3.7.2 Insertion Sort
Consider you have 10 cards out of a deck of cards in your hand. And
they are sorted, or arranged in the ascending order of their numbers.
If I give you another card, and ask you to insert the card in just the
right position, so that the cards in your hand are still sorted. What will
you do?
Well, you will have to go through each card from the start or the
back and find the right position for the new card, comparing its value
with each card. Once you find the right position, you will insert the
card there.
Similarly, if more new cards are provided to you, you can easily
repeat the same process and insert the new cards and keep the cards
sorted too.
This is exactly how insertion sort works. It starts from the
index 1(not 0), and each index starting from index 1 is like a new
card, that you have to place at the right position in the sorted sub array
on the left.
Following are some of the important characteristics of Insertion
Sort:
1. It is efficient for smaller data sets, but very inefficient for larger
lists.
2. Insertion Sort is adaptive, that means it reduces its total number of
steps if a partially sorted array is provided as input, making it
efficient.
3. It is better than Selection Sort and Bubble Sort algorithms.
4. Its space complexity is less. Like bubble Sort, insertion sort also
requires a single additional memory space.
5. It is a stable sorting technique, as it does not change the relative
order of elements which are equal.

Insertion Sort Working


Following are the steps involved in insertion sort:
1. We start by making the second element of the given array, i.e.
element at index 1, the key. The key element here is the new card that
we need to add to our existing sorted set of cards (remember the
example with cards above).
2. We compare the key element with the element(s) before it, in this
case, element at index 0:
o If the key element is less than the first element, we insert
the key element before the first element.
o If the key element is greater than the first element, then we insert it
after the first element.
3. Then, we make the third element of the array as key and will
compare it with elements to its left and insert it at the right position.
4. And we go on repeating this, until the array is sorted.

Let's consider an array with values {5, 1, 6, 2, 4, 3}


Below, we have a pictorial representation of how bubble sort will sort
the given array.
As we can see in the diagram above, after picking a key, we start
iterating over the elements to the left of the key.
We continue to move towards left if the elements are greater than
the key element and stop when we find the element which is less than
the key element.
And, insert the key element after the element which is less than
the key element.
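The card-insertion idea above can be sketched in C (the function name is our own):

```c
#include <assert.h>

/* Insertion sort: starting at index 1, each element is the "new card",
   shifted left past larger elements into its place in the sorted
   prefix on the left. */
void insertion_sort(int a[], int n) {
    for (int i = 1; i < n; i++) {
        int key = a[i];               /* the card to insert */
        int j = i - 1;
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];          /* shift larger elements right */
            j--;
        }
        a[j + 1] = key;               /* insert after the first element
                                         that is not larger */
    }
}
```

Because equal elements are never shifted past each other (`a[j] > key`, not `>=`), the sort is stable, matching characteristic 5 above.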
3.7.3 Selection Sort
Selection sort is conceptually the simplest sorting algorithm. This
algorithm will first find the smallest element in the array and swap it
with the element in the first position, then it will find the second
smallest element and swap it with the element in the second position,
and it will keep on doing this until the entire array is sorted.
It is called selection sort because it repeatedly selects the next-
smallest element and swaps it into the right place.
Working of Selection Sort
Following are the steps involved in selection sort (for sorting a given
array in ascending order):
1. Starting from the first element, we search the smallest element in
the array, and replace it with the element in the first position.
2. We then move on to the second position, and look for smallest
element present in the subarray, starting from index 1, till the last
index.
3. We replace the element at the second position in the original array,
or we can say at the first position in the subarray, with the second
smallest element.
4. This is repeated, until the array is completely sorted.
Let's consider an array with values {3, 6, 1, 8, 4, 5}
Below, we have a pictorial representation of how selection sort will
sort the given array.
In the first pass, the smallest element will be 1, so it will be placed at
the first position.
Then leaving the first element, next smallest element will be
searched, from the remaining elements. We will get 3 as the smallest,
so it will be then placed at the second position.
Then leaving 1 and 3(because they are at the correct position), we will
search for the next smallest element from the rest of the elements and
put it at third position and keep doing this until array is sorted.
Finding Smallest Element in a sub array
In selection sort, in the first step, we look for the smallest element in
the array and replace it with the element at the first position. This
seems doable, doesn't it?
Consider that you have an array with following values {3, 6, 1, 8, 4,
5}. Now as per selection sort, we will start from the first element and
look for the smallest number in the array, which is 1 and we will find
it at the index 2. Once the smallest number is found, it is swapped
with the element at the first position.
Well, in the next iteration, we will have to look for the second
smallest number in the array. How can we find the second smallest
number? This one is tricky.
If you look closely, we already have the smallest number/element at
the first position, which is the right position for it and we do not have
to move it anywhere now. So, we can say, that the first element is
sorted, but the elements to the right, starting from index 1 are not.
So, we will now look for the smallest element in the subarray, starting
from index 1, to the last index.
After we have found the second smallest element and replaced it with
element on index 1(which is the second position in the array), we will
have the first two positions of the array sorted.
Then we will work on the sub array, starting from index 2 now, and
again looking for the smallest element in this sub array.
Complexity Analysis of Selection Sort
Selection Sort requires two nested for loops to complete itself:
one for loop is in the function selectionSort, and inside the first loop
we are making a call to another function indexOfMinimum, which
has the second (inner) for loop.
Hence for a given input size of n, following will be the time and space
complexity for selection sort algorithm:
Worst Case Time Complexity [Big-O]: O(n^2)
Best Case Time Complexity [Big-omega]: O(n^2)
Average Time Complexity [Big-theta]: O(n^2)
Space Complexity: O(1)
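The two nested loops described above can be sketched in C, splitting out the minimum-finding helper as the analysis suggests (names adapted to C's snake_case convention):

```c
#include <assert.h>

/* Index of the smallest element in a[start..n-1]; this is the
   inner loop from the complexity analysis above. */
int index_of_minimum(int a[], int start, int n) {
    int min = start;
    for (int i = start + 1; i < n; i++)
        if (a[i] < a[min])
            min = i;
    return min;
}

/* Selection sort: repeatedly select the next-smallest element and
   swap it into the front of the unsorted suffix. */
void selection_sort(int a[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int min = index_of_minimum(a, i, n);
        int tmp = a[i];
        a[i] = a[min];
        a[min] = tmp;
    }
}
```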
3.7.4 Quick Sort
Quick Sort is also based on the concept of Divide and Conquer, just
like merge sort. But in quick sort all the heavy lifting (major work) is
done while dividing the array into subarrays, while in case of merge
sort, all the real work happens during merging the subarrays. In case
of quick sort, the combine step does absolutely nothing.
It is also called partition-exchange sort. This algorithm divides the
list into three main parts:
1. Elements less than the Pivot element
2. Pivot element (Central element)
3. Elements greater than the pivot element
Pivot element can be any element from the array, it can be the first
element, the last element or any random element. In this tutorial, we
will take the rightmost element or the last element as pivot.
For example: In the array {52, 37, 63, 14, 17, 8, 6, 25}, we
take 25 as pivot. So after the first pass, the list will be changed like
this.
{6 8 17 14 25 63 37 52}
Hence after the first pass, pivot will be set at its position, with all the
elements smaller to it on its left and all the elements larger than to its
right. Now 6 8 17 14 and 63 37 52 are considered as two separate
subarrays, and the same recursive logic will be applied on them, and
we will keep doing this until the complete array is sorted.
Quick Sort Working
Following are the steps involved in quick sort algorithm:
1. After selecting an element as pivot, which is the last index of the
array in our case, we divide the array for the first time.
2. In quick sort, we call this partitioning. It is not simple breaking
down of array into 2 subarrays, but in case of partitioning, the array
elements are so positioned that all the elements smaller than
the pivot will be on the left side of the pivot and all the elements
greater than the pivot will be on the right side of it.
3. And the pivot element will be at its final sorted position.
4. The elements to the left and right, may not be sorted.
5. Then we pick subarrays, elements on the left of pivot and elements
on the right of pivot, and we perform partitioning on them by
choosing a pivot in the subarrays.
Let's consider an array with values {9, 7, 5, 11, 12, 2, 14, 3, 10, 6}
Below, we have a pictorial representation of how quick sort will sort
the given array.
In step 1, we select the last element as the pivot, which is 6 in this
case, and call for partitioning, hence re-arranging the array in such a
way that 6 will be placed in its final position and to its left will be all
the elements less than it and to its right, we will have all the elements
greater than it.
Then we pick the subarray on the left and the subarray on the right
and select a pivot for them, in the above diagram, we chose 3 as pivot
for the left subarray and 11 as pivot for the right subarray.
And we again call for partitioning.
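The partitioning scheme above, with the last element as pivot, can be sketched in C (function names are our own):

```c
#include <assert.h>

/* Partition a[low..high] around the last element as pivot: smaller
   elements end up on its left, larger on its right. Returns the
   pivot's final (sorted) index. */
static int partition(int a[], int low, int high) {
    int pivot = a[high];
    int i = low - 1;                  /* boundary of the "smaller" zone */
    for (int j = low; j < high; j++) {
        if (a[j] < pivot) {           /* grow the smaller zone by one */
            int t = a[++i]; a[i] = a[j]; a[j] = t;
        }
    }
    int t = a[i + 1]; a[i + 1] = a[high]; a[high] = t;
    return i + 1;
}

void quick_sort(int a[], int low, int high) {
    if (low < high) {
        int p = partition(a, low, high);  /* pivot now in final place */
        quick_sort(a, low, p - 1);        /* left subarray */
        quick_sort(a, p + 1, high);       /* right subarray */
    }
}
```

On the example array {52, 37, 63, 14, 17, 8, 6, 25}, the first partition places the pivot 25 at index 4 with smaller elements on its left, matching the first pass shown above.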
3.7.5 Merge Sort
Merge Sort follows the rule of Divide and Conquer to sort a given
set of numbers/elements, recursively, hence consuming less time.
In the last two tutorials, we learned about Selection Sort and Insertion
Sort, both of which have a worst-case running time of O(n^2). As the
size of input grows, insertion and selection sort can take a long time
to run.
Merge sort, on the other hand, runs in O(n*log n) time in all the cases.
Before jumping on to how merge sort works and its implementation,
let us first understand the rule of Divide and Conquer.
Divide and Conquer
If we can break a single big problem into smaller sub-problems, solve
the smaller sub-problems and combine their solutions to find the
solution for the original big problem, it becomes easier to solve the
whole problem.
Let's take an example, Divide and Rule.
When Britishers came to India, they saw a country with different
religions living in harmony, hardworking but naive citizens, unity in
diversity, and found it difficult to establish their empire. So, they
adopted the policy of Divide and Rule. Where the population of India
was collectively a one big problem for them, they divided the problem
into smaller problems, by instigating rivalries between local kings,
making them stand against each other, and this worked very well for
them.
Well that was history, and a socio-political policy (Divide and Rule),
but the idea here is, if we can somehow divide a problem into smaller
sub-problems, it becomes easier to eventually solve the whole
problem.
In Merge Sort, the given unsorted array with n elements, is divided
into n subarrays, each having one element, because a single element is
always sorted in itself. Then, it repeatedly merges these subarrays, to
produce new sorted subarrays, and in the end, one complete sorted
array is produced.
The concept of Divide and Conquer involves three steps:
1. Divide the problem into multiple small problems.
2. Conquer the subproblems by solving them. The idea is to break
down the problem into atomic subproblems, where they are actually
solved.
3. Combine the solutions of the subproblems to find the solution of
the actual problem.

Working of Merge Sort


As we have already discussed that merge sort utilizes divide-and-
conquer rule to break the problem into sub-problems, the problem in
this case being, sorting a given array.
In merge sort, we break the given array midway, for example if the
original array had 6 elements, then merge sort will break it down into
two subarrays with 3 elements each.
But breaking the original array into 2 smaller subarrays does not by
itself sort the array.
So, we will break these subarrays into even smaller subarrays, until
we have multiple subarrays with a single element in them. Now, the
idea here is that an array with a single element is already sorted, so
once we break the original array into subarrays which each hold only
a single element, we have successfully broken down our problem into
base problems.
And then we have to merge all these sorted subarrays, step by step to
form one single sorted array.
Let's consider an array with values {14, 7, 3, 12, 9, 11, 6, 12}
Below, we have a pictorial representation of how merge sort will sort
the given array.
In merge sort we follow the following steps:
1. We take a variable p and store the starting index of our array in this.
And we take another variable r and store the last index of array in it.
2. Then we find the middle of the array using the formula (p + r)/2 and
mark the middle index as q, and break the array into two subarrays,
from p to q and from q + 1 to r index.
3. Then we divide these 2 subarrays again, just like we divided our
main array and this continues.
4. Once we have divided the main array into subarrays with single
elements, then we start merging the subarrays.
3.7.6 Heap Sort
Heap Sort is one of the best sorting methods, being in-place and
having no quadratic worst-case running time. Heap sort involves
building a Heap data structure from the given array and then utilizing
the heap to sort the array.
You must be wondering how converting an array of numbers into a
heap data structure will help in sorting the array. To understand this,
let's start by understanding what a heap is.
Heap
Heap is a special tree-based data structure, that satisfies the following
special heap properties:
1. Shape Property: A heap is always a Complete Binary Tree, which
means all levels of the tree are fully filled, except possibly the last
level, which is filled from left to right.
2. Heap Property: Every node is either greater than or equal to, or
less than or equal to, each of its children. If the parent nodes are
greater than their child nodes, the heap is called a Max-Heap, and if
the parent nodes are smaller than their child nodes, the heap is called
a Min-Heap.
Working of Heap Sort
Heap sort algorithm is divided into two basic parts:
• Creating a Heap of the unsorted list/array.
• Then a sorted array is created by repeatedly removing the
largest/smallest element from the heap, and inserting it into the array.
The heap is reconstructed after each removal.
Initially on receiving an unsorted list, the first step in heap sort is to
create a Heap data structure (Max-Heap or Min-Heap). Once heap is
built, the first element of the Heap is either largest or smallest
(depending upon Max-Heap or Min-Heap), so we put the first element
of the heap in our array. Then we again make heap using the
remaining elements, to again pick the first element of the heap and put
it into the array. We keep on doing the same repeatedly until we have
the complete sorted list in our array.
In the below algorithm, the heap_sort() function is called first, which
calls heapify() to build the heap.
3.8 Hashing
Hashing is the process of mapping a large amount of data to a
smaller table with the help of a hashing function.
Hashing is also known as Hashing Algorithm or Message Digest
Function.
It is a technique to convert a range of key values into a range of
indexes of an array.
It is used to facilitate the next level searching method when compared
with the linear or binary search.
Hashing allows us to update and retrieve any data entry in constant
time O(1).
Constant time O(1) means the operation does not depend on the size
of the data.
Hashing is used with a database to enable items to be retrieved more
quickly.
It is used in the encryption and decryption of digital signatures.
3.9 Hash Functions
• A fixed process that converts a key to a hash key is known as a
Hash Function.
• This function takes a key and maps it to a value of a certain length,
which is called a Hash value or Hash.
• The hash value represents the original string of characters, but it is
normally smaller than the original.
• In digital signatures, the sender computes the hash value, and both
the hash value and the signature are sent to the receiver. The receiver
uses the same hash function to generate the hash value and then
compares it to the one received with the message.
• If the hash values are the same, the message was transmitted
without errors.
Hash Table
• A hash table or hash map is a data structure used to store key-value
pairs.
• It is a collection of items stored so as to make it easy to find them
later.
• It uses a hash function to compute an index into an array of buckets
or slots from which the desired value can be found.
• It is an array of lists, where each list is known as a bucket.
• It contains values based on keys.
• In Java, the Hashtable class implements the Map interface and
extends the Dictionary class; it is synchronized and contains only
unique elements.

• The above figure shows a hash table of size n = 10. Each position
of the hash table is called a slot. In the above hash table there are n
slots, named {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}: slot 0, slot 1, slot 2 and so
on. Initially the hash table contains no items, so every slot is empty.
• As we know, the mapping between an item and the slot where that
item belongs in the hash table is called the hash function. The hash
function takes any item in the collection and returns an integer in the
range of slot names, 0 to n-1.
• Suppose we have the integer items {26, 70, 18, 31, 54, 93}. One
common method of determining a hash key is the division method of
hashing, with the formula:
Hash Key = Key Value % Number of Slots in the Table
• The division method (or remainder method) takes an item, divides
it by the table size, and returns the remainder as its hash value.

Data Item   Value % No. of Slots   Hash Value
26          26 % 10 = 6            6
70          70 % 10 = 0            0
18          18 % 10 = 8            8
31          31 % 10 = 1            1
54          54 % 10 = 4            4
93          93 % 10 = 3            3

• After computing the hash values, we can insert each item into the
hash table at the designated position as shown in the above figure. In
this hash table, 6 of the 10 slots are occupied. The fraction of
occupied slots is referred to as the load factor, denoted λ = No. of
items / table size. For example, λ = 6/10.
• It is easy to search for an item using the hash function: compute the
slot name for the item and then check the hash table to see if it is
present.
• A constant amount of time O(1) is required to compute the hash
value and index into the hash table at that location.
Linear Probing
• Continuing the above example, if we insert the next item 40 into
our collection, it would have a hash value of 0 (40 % 10 = 0). But 70
also has a hash value of 0, and this becomes a problem. This problem
is called a Collision or Clash. Collisions create a problem for the
hashing technique.
• Linear probing is used for resolving collisions in hash tables, data
structures for maintaining a collection of key-value pairs.
• Linear probing was invented by Gene Amdahl, Elaine M. McGraw
and Arthur Samuel in 1954 and analyzed by Donald Knuth in 1963.
• It is a component of the open addressing scheme for using a hash
table to solve the dictionary problem.
• The simplest method is called Linear Probing. The formula to
compute the next probe position is:
P = (P + 1) % table_size
For example, if we insert the next item 40 into our collection, it
would have a hash value of 0 (40 % 10 = 0). But 70 also has a hash
value of 0, so there is a collision.
Linear probing solves this problem:
P = H(40) = 40 % 10 = 0
Position 0 is occupied by 70, so we look elsewhere for a position to
store 40.
Using linear probing:
P = (P + 1) % table_size
(0 + 1) % 10 = 1
But position 1 is occupied by 31, so we look elsewhere for a position
to store 40.
Using linear probing, we try the next position: (1 + 1) % 10 = 2
Position 2 is empty, so 40 is inserted there.

3.10 Collision
Since a hash function gives us a small number for a key which is a
big integer or string, there is a possibility that two keys result in the
same value. The situation where a newly inserted key maps to an
already occupied slot in the hash table is called a collision, and it
must be handled using some collision handling technique.
A situation where the resultant hashes for two or more data elements
in the data set U map to the same location in the hash table is called
a hash collision. In such a situation, two or more data elements would
qualify to be stored/mapped to the same location in the hash table.

3.11 Collision Resolution Techniques
Open Hashing (Separate Chaining)
Open Hashing is a technique in which the data is not stored directly
at the hash key index (k) of the hash table. Instead, the entry at index
(k) in the hash table is a pointer to the head of a data structure where
the data is actually stored. In the simplest and most common
implementations, the data structure adopted for storing the elements
is a linked list.
With this technique, when a value needs to be searched, it might
become necessary (in the worst case) to traverse all the nodes in the
linked list to retrieve the data.
Note that the order in which the data is stored in each of these linked
lists (or other data structures) is entirely implementation-dependent.
Popular criteria include insertion order, frequency of access, etc.
Closed Hashing (Open Addressing)
In this technique a hash table of pre-identified size is used, and all
items are stored in the hash table itself. In addition to the data, each
hash bucket also maintains one of three states: EMPTY, OCCUPIED,
DELETED. While inserting, if a collision occurs, alternative cells are
tried until an empty bucket is found, using one of the following
techniques:
1. Linear Probing
2. Quadratic Probing
3. Double Hashing
A COMPARATIVE ANALYSIS OF CLOSED HASHING
VS OPEN HASHING
Open Addressing:
• All elements are stored in the hash table itself; no additional data
structure is needed.
• Simple and effective approach to collision resolution. Keys may or
may not be unique.
• Determining a size of the hash table adequate for storing all the
data is difficult.
• State needs to be maintained for the data (additional work).
• Uses space efficiently.
Closed Addressing:
• An additional data structure needs to be used to accommodate
collision data.
• In cases of collisions, a unique hash key must be obtained.
• Performance deterioration of closed addressing is much slower as
compared to open addressing.
• No state data needs to be maintained (easier to maintain).
• Expensive on space.

3.12 Perfect Hashing
Perfect hashing is implemented using two levels of hash tables, each
using universal hashing. The first level is the same as hashing with
chaining: n elements are hashed into m slots in the hash table, using
a hash function selected from a universal family of hash functions.
The second level uses a second hash table (instead of the linked list
used in chaining). Elements that hash to the same slot j in the first
hash table are stored in a second hash table, known as the secondary
hash table. The hash function hj for slot j is carefully chosen such
that there is no collision in the secondary table.
To ensure there are no collisions in the secondary hash table Sj, we
make the size mj of the secondary table equal to the square of the
number of keys nj hashing into slot j in the first table. That is:
mj = nj^2
The hash function for the primary hash table is carefully chosen so
that the expected total amount of space used is limited to O(n).
