
LECTURE NOTES

ON

23CS312-DESIGN AND ANALYSIS OF ALGORITHMS

III Semester

Computer Science & Engineering

Prepared by

V. SARANYA, M.Tech.

RAGAVI, M.E.
INDEX

1.1 Understanding of Algorithm


1.2 Algorithm analysis
1.2.1 Space Complexity
1.2.2 Time Complexity
1.3 Asymptotic notation
1.4 Solving Recurrence (Recurrence relation)
1.4.1 Substitution method
1.5 Lower and Upper Bound
1.6 HASH FUNCTION
1.7 Searching
1.7.1 Linear Search

1.7.2 Binary Search


1.7.3 Interpolation Search

1.8 Pattern Search

1.8.1 Naïve pattern searching

1.8.2 Rabin-Karp matching pattern


1.9 Sorting
1.9.1 Insertion sort

2.GRAPH ALGORITHM

2.1 Graph traversal

2.1.1 BFS (Breadth First Search)


2.1.2 DFS (Depth First Search)
2.3 Minimum Spanning Tree
2.3.1 Prim’s Algorithm
2.3.2 Kruskal Algorithm

2.4 Shortest Path Algorithm


2.4.1 Bellman-Ford Algorithm
2.4.2 Dijkstra Algorithm
2.4.3 Floyd Warshall Algorithm
2.5 Maximum Flow
2.6 Network Flow

2.7 Maximum Bipartite Matching


Introduction
Algorithm: The word algorithm comes from the name of the ninth-century Persian mathematician Abu Jafar
Mohammed Ibn Musa Al Khowarizmi. An algorithm is simply a set of rules used to perform some
calculations, either by hand or, more usually, on a machine (computer).

Definition: An algorithm is a finite set of instructions that accomplishes a particular task. Another
definition is a sequence of unambiguous instructions for solving a problem, i.e., for obtaining a
required output for any legitimate (genuine) input in a finite amount of time.
In addition, all algorithms must satisfy the following criteria (characteristics).

Input: zero or more quantities are externally supplied as input.


Consider a Fibonacci numbers program whose aim is to display the first ten Fibonacci numbers. No
input is required; the problem itself clearly specifies ten Fibonacci values, so zero input items are
required.
Another problem is displaying a given count of even numbers, so the user must supply how many are
required. Based on this input, that many even numbers are displayed. So, one data item is required as
input.
Output: At least one quantity is produced by a given algorithm as output.
In the case of the Fibonacci numbers program, after executing the program the first ten Fibonacci
values are displayed as output.
In the second case, the program should display the requested count of even numbers. A negative count
is invalid input, and the program should then display a proper error message as output. So this program
produces at least one output: either an error message or the requested numbers.
1. Definiteness: Each instruction is clear and unambiguous, i.e., each step must be easy to
understand and convey only a single meaning.
2. Effectiveness: Each instruction must be very basic, so that it can be carried out by a
person using only pencil and paper.
This requirement applies to both the Fibonacci and even-numbers programs. For example, suppose the
user enters a negative number as input and the algorithm contains a step like

Step: If N < 0 then go to ERROR

where ERROR is never defined. Such an instruction cannot be carried out, so steps of this kind should
not appear in an algorithm.
3. Finiteness: If we trace out the instructions of an algorithm, then for all cases the
algorithm terminates after a finite number of steps.
Both the Fibonacci and even-numbers problems must be solved in some finite number of steps; for
example, displaying the Fibonacci series continuously without terminating violates this criterion.

Criteria for Algorithms


Input: Zero or more inputs.
Output: At least one output.
Finiteness: Terminates after a finite number of steps.
Definiteness: Every step is clear and unambiguous.
Effectiveness: Every step can actually be carried out.

An algorithm is a finite set of instructions that accomplishes a particular task. Another definition is a
sequence of unambiguous instructions for solving a problem i.e, for obtaining a required output for any
legitimate (genuine) input in a finite amount of time.
An algorithm is the backbone of a program; it is like the recipe for the program.
1.1 Understanding of Algorithm
An algorithm is a sequence of unambiguous instructions for solving a problem, i.e., for obtaining a
required output for any legitimate input in a finite amount of time. There are various methods to solve
the same problem. The important points to be remembered are:
1. The non-ambiguity requirement for each step of an algorithm cannot be compromised.
2. The range of input for which an algorithm works has to be specified carefully.
3. The same algorithm can be represented in different ways.
4. Several algorithms for solving the same problem may exist.
5. Algorithms for the same problem can be based on very different ideas and
can solve the problem with dramatically different speeds.

The example here is to find the gcd of two integers in three different ways. The gcd of
two nonnegative, not-both-zero integers m and n, denoted gcd(m, n), is defined as the
largest integer that divides both m and n evenly, i.e., with a remainder of zero.
Euclid of Alexandria outlined an algorithm for solving this problem in one of the
volumes of his Elements.

gcd(m, n) = gcd(n, m mod n) is applied repeatedly until m mod n is equal to 0,

since gcd(m, 0) = m. (The last value of m is also the gcd of the initial m and n.)

The structured description of this algorithm is:


Step 1: If n = 0, return the value of m as the answer and stop; otherwise, proceed to step 2.
Step 2: Divide m by n and assign the value of the remainder to r.
Step 3: Assign the value of n to m and the value of r to n. Go to step 1.

Euclid's algorithm: ALGORITHM Euclid(m, n)


//Computes gcd(m, n) by Euclid's algorithm
//Input: Two nonnegative, not-both-zero integers m and n
//Output: Greatest common divisor of m and n
while n ≠ 0 do
    r ← m mod n
    m ← n
    n ← r
return m

This algorithm comes to a stop when the second number becomes 0. The second number of the pair gets
smaller with each iteration and it cannot become negative: indeed, the new value of n on the next
iteration is m mod n, which is always smaller than n. Hence, the value of the second number in
the pair eventually becomes 0, and the algorithm stops.

Example: gcd (60, 24) = gcd (24,12) = gcd (12,0) = 12.
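A minimal C sketch of Euclid's algorithm as described above (the function name gcd is our choice):

#include <stdio.h>

// Euclid's algorithm: repeatedly replace (m, n) with (n, m mod n)
// until n becomes 0; the last value of m is the gcd.
int gcd(int m, int n) {
    while (n != 0) {
        int r = m % n;
        m = n;
        n = r;
    }
    return m;
}

int main() {
    printf("%d\n", gcd(60, 24));   // prints 12
    return 0;
}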

The second method for the same problem is obtained from the definition itself: the gcd of m and n
is the largest integer that divides both numbers evenly. Obviously, that number cannot be greater
than the smaller of the two numbers, which we will denote by t = min{m, n}. So start checking
whether t divides both m and n: if it does, t is the answer; if it does not, decrease t by 1 and
try again. (Do this repeatedly until you reach 12, for the example given below, and then stop.)

Consecutive integer checking algorithm:


Step 1: Assign the value of min {m, n} to t.
Step 2: Divide m by t. If the remainder of this division is 0, go to step 3; otherwise go to step 4.
Step 3: Divide n by t. If the remainder of this division is 0, return the value of t as the answer
and stop; otherwise, proceed to step 4.
Step 4: Decrease the value of t by 1. Go to step 2.
Note: this algorithm will not work when one of its inputs is zero. So we have to specify the range
of input explicitly and carefully.
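A minimal C sketch of the consecutive integer checking method (the function name gcd_cic is our
choice; as noted above, it assumes both inputs are positive):

#include <stdio.h>

// Consecutive integer checking, following steps 1-4 above.
int gcd_cic(int m, int n) {
    int t = (m < n) ? m : n;              // step 1: t = min(m, n)
    while (m % t != 0 || n % t != 0)      // steps 2-3: t must divide both
        t = t - 1;                        // step 4: otherwise decrease t
    return t;
}

int main() {
    printf("%d\n", gcd_cic(60, 24));      // prints 12
    return 0;
}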
The third procedure is as follows:
Step 1: Find the prime factors of m.
Step 2: Find the prime factors of n.
Step 3: Identify all the common factors in the two prime expansions found in steps 1 and 2. (If p
is a common factor occurring pm and pn times in m and n, respectively, it should be repeated
min{pm, pn} times.)
Step 4: Compute the product of all the common factors and return it as the gcd of the numbers given.

Example: 60 = 2.2.3.5
24 = 2.2.2.3

gcd (60,24) = 2.2.3 = 12 .

This procedure is more complex, and ambiguity arises since the prime factorization step is not
itself precisely defined. To make it an efficient algorithm, we must incorporate an algorithm to
find the prime factors.
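A C sketch of this procedure (a simple illustration; trial division by each common factor stands in
for a real prime-factorization routine, and the function name gcd_factor is our choice):

#include <stdio.h>

// GCD via common prime factors, following steps 1-4 above: each common
// factor p is taken min(pm, pn) times, where pm and pn are its
// multiplicities in m and n.
int gcd_factor(int m, int n) {
    int g = 1;
    for (int p = 2; p <= m && p <= n; ++p) {
        while (m % p == 0 && n % p == 0) {   // p is a common factor
            g *= p;
            m /= p;
            n /= p;
        }
    }
    return g;
}

int main() {
    printf("%d\n", gcd_factor(60, 24));      // 2 * 2 * 3 = 12
    return 0;
}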

1.2 Algorithm Analysis


Algorithm analysis is the process of determining the time and space complexity of an algorithm, which
are measures of the algorithm's efficiency. Time complexity refers to the amount of time it takes for an
algorithm to run as a function of the size of the input, and is typically expressed using big O notation.
Space complexity refers to the amount of memory required by an algorithm as a function of the size of
the input, and is also typically expressed using big O notation.

To analyze the time complexity of an algorithm, we need to consider the number of operations performed
by the algorithm, and how that number changes as the size of the input increases. This can be done by
counting the basic operations performed in the algorithm, such as comparisons, assignments, and function
calls. The count of basic operations is then used to express the algorithm's time complexity in big O
notation.

To analyze the space complexity of an algorithm, we need to consider the amount of memory used by the
algorithm, and how that amount changes as the size of the input increases. This can be done by counting
the variables used by the algorithm and how their number grows with the input size. The amount of memory
used is then expressed as the algorithm's space complexity in big O notation.

It's important to note that analyzing the time and space complexity of an algorithm is a way to evaluate
its efficiency and the trade-off between time and space, but it is not a definitive measure of actual
performance, which depends on the specific implementation, the computer, and the input.

Performance Analysis:
Performance analysis, or analysis of algorithms, refers to the task of determining the efficiency of an
algorithm, i.e., how much computing time and storage an algorithm requires to run (or execute). This
analysis helps in judging the value of one algorithm over another.
To judge an algorithm, two things in particular are taken into consideration:
1. Space complexity
2. Time complexity

1.2.1 Space Complexity: The space complexity of an algorithm (program) is the amount of memory it needs to
run to completion.
The space needed by an algorithm has the following components.
1. Instruction Space.
2. Data Space.
3. Environment Stack Space.

1. Instruction Space: Instruction space is the space needed to store the compiled version of the program
instructions. The amount of instruction space needed depends on factors such as
i). The compiler used to compile the program into machine code.
ii). The compiler options in effect at the time of compilation.
iii). The target computer, i.e., the computer on which the algorithm runs. Note that one compiler may
produce less code than another when the same program is compiled by the two.

2. Data Space: Data space is the space needed to store all constant and variable values.

Data space has two components.


i). Space needed by constants, for example 0, 1, 2.134.
ii). Space needed by dynamically allocated objects such as arrays, structures, classes.

3. Environment Stack Space: Environment stack space is used during the execution of functions. Each time
a function is invoked, the following data are saved on the environment stack.
i). The return address.
ii). Values of local variables.
iii). Values of formal parameters in the function being invoked. Environment stack space is mainly used
in recursive functions.

Thus, the space requirement of any program P may be written as

S(P) = C + Sp (instance characteristics).

This equation shows that the total space needed by a program is divided into two parts.
 A fixed space requirement (C), independent of the instance characteristics of the inputs and outputs:
instruction space, and space for simple variables, fixed-size structure variables, and constants.
 A variable space requirement (Sp), dependent on instance characteristics.
This part includes dynamically allocated space and the recursion stack space.

Example 1 of instance characteristics:

Algorithm NEC (float x, float y, float z)
{
    return (x + y + y * z + (x + y + z)) / (x + y) + 4.0;
}

In the above algorithm there are no instance characteristics, and the space needed by x, y, z is
independent of instance characteristics. Therefore we can write S(NEC) = 3 + 0 = 3 (one word each for
x, y and z), and the space complexity is O(1).

Example 2:
Algorithm ADD (float X[], int n)
{
    sum := 0.0;
    for i := 1 to n do
        sum := sum + X[i];
    return sum;
}

Here we need at least n words, since X must be large enough to hold the n elements to be summed. The
problem instance is characterized by n, the number of elements to be summed. So we can write

S(ADD) = 3 + n

(3: one word each for n, i and sum; n: for the array X[]), and the space complexity is O(n).

1.2.2 Time Complexity :


The time complexity of an algorithm is the amount of computer time it needs to run to completion. We
can measure the time complexity of an algorithm in two ways: 1. a priori analysis (at compile time) and
2. a posteriori analysis (at run/execution time). In a priori analysis, we analyze the behavior of the
algorithm before it is executed; it concentrates on determining the order of execution of statements.
In a posteriori analysis, we measure the execution time while the algorithm runs; it gives accurate
values but is very costly. As we know, the compile time does not depend on the size of the input.
Hence, we confine ourselves to the run time, which depends on the size of the input and is denoted
by TP(n).

Hence Time complexity T(P) = C + TP(n).

The time (T(P)) taken by a program P is the sum of the compile time and execution time. The compile
time does not depend on the instance characteristics, so we concentrate on the runtime of a program.
This runtime is denoted by tp (instance characteristics).

The following equation determines the number of additions, subtractions, multiplications, divisions,
compares, loads, stores and so on that would be made by the code for P:

tp(n) = Ca*ADD(n) + Cs*SUB(n) + Cm*MUL(n) + Cd*DIV(n) + ...

where n denotes the instance characteristics; Ca, Cs, Cm, Cd and so on denote the time needed for an
addition, subtraction, multiplication, division and so on; and ADD, SUB, MUL, DIV and so on are
functions whose values are the numbers of additions, subtractions, multiplications, divisions and so
on performed. Determining all these counts exactly is an impossible task, however, so this method is
not practical for finding the time complexity.

Method 1: introduce a global variable "count", initialized to zero. Each time a statement in the
original program is executed, count is incremented by the step count of that statement.

Example: Algorithm Sum(a, n)


{
s:=0;
for i:=1 to n do
{
s:=s+a[i];
}
return s;
}

Algorithm sum with count statement added


count:=0;
Algorithm Sum(a,n)
{
s:=0;
count:=count+1;
for i:=1 to n do
{
count:=count +1;
s:=s+a[i]; count:=count+1;
}
count:=count+1; //for last time of for loop
count:=count+1; //for return statement
return s;

}

Thus the total number of steps is 2n + 3.

Method 2:

Statement                     S/e   Frequency   Total steps

1. Algorithm Sum(a, n)         0        -           0
2. {                           0        -           0
3.   s := 0;                   1        1           1
4.   for i := 1 to n do        1       n+1         n+1
5.     s := s + a[i];          1        n           n
6.   return s;                 1        1           1
7. }                           0        -           0

Total                                              2n+3 steps

The second method to determine the step count of an algorithm is to build a table in which we list
the total number of steps contributed by each statement.

The S/e (steps per execution) of a statement is the amount by which the count changes as a
result of the execution of that statement. The frequency determines the total number of times each
statement is executed.
Complexity of Algorithms:
1. Best Case: Inputs are provided in such a way that the minimum time is required to
process them.
2. Average Case: The amount of time the algorithm takes on an average set of inputs.
3. Worst Case: The amount of time the algorithm takes on the worst possible set of inputs.
Example: Linear Search

A:     3  4  5  6  7  9  10  12  15
Index: 1  2  3  4  5  6   7   8   9
Best Case: If we want to search an element 3, whether it is present in the array or not. First, A(1) is
compared with 3, match occurs. So the number of comparisons is only one. It is observed that search
takes minimum number of comparisons, so it comes under best case.
Time complexity is O(1).
Average Case: If we want to search an element 7, whether it is present in the array or not.
First, A(1) is compared with 7 i,.e, (3=7), no match occurs. Next, compare A(2) and 7, no match
occurs. Compare A(3) and A(4) with 7, no match occurs. Up to now 4 comparisons takes place. Now
compare A(5) and 7 (i.,e, 7=7), so match occurs. The number of comparisons is 5. It is observed that
search takes average number of comparisons. So it comes under average case.
Note: If there are n elements, then on average we require n/2 comparisons.
Therefore the time complexity is O(n/2) = O(n) (we neglect the constant).
Worst Case: If we want to search an element 15, whether it is present in the array or not.
First, A(1) is compared with 15 (i.e., 3 = 15), no match occurs. Continue this process until either
the element is found or the list is exhausted. The element is found at the 9th comparison, so the
number of comparisons is 9.
Time complexity is O(n).
Note: If the element is not found in the array, then we have to search the entire array, so it comes
under the worst case.

Time and Space Complexity Notation


Time complexity is a measure of how long an algorithm takes to run as a function of the
size of the input. It is typically expressed using big O notation, which describes the upper
bound on the growth of the time required by the algorithm. For example, an algorithm with
a time complexity of O(n) takes longer to run as the input size (n) increases.
There are different types of time complexities:
 O(1) or constant time: the algorithm takes the same amount of time to run regardless of the
size of the input.
 O(log n) or logarithmic time: the algorithm's running time increases logarithmically with
the size of the input.
 O(n) or linear time: the algorithm's running time increases linearly with the size of the input.
 O(n log n) or linearithmic time: the algorithm's running time grows in proportion to n log n,
i.e., slightly faster than linearly with the size of the input.
 O(n^2) or quadratic time: the algorithm's running time increases quadratically with the size
of the input.

 O(2^n) or exponential time: the algorithm's running time increases exponentially with the
size of the input.

Space complexity, on the other hand, is a measure of how much memory an algorithm uses
as a function of the size of the input. Like time complexity, it is typically expressed using
big O notation. For example, an algorithm with a space complexity of O(n) uses more
memory as the input size (n) increases. Space complexities are generally categorized as:
 O(1) or constant space: the algorithm uses the same amount of memory regardless of the
size of the input.
 O(n) or linear space: the algorithm's memory usage increases linearly with the size of the
input.
 O(n^2) or quadratic space: the algorithm's memory usage increases quadratically with the
size of the input.
 O(2^n) or exponential space: the algorithm's memory usage increases exponentially with the
size of the input.
1.3 Asymptotic Notation

Asymptotic Notation is used to describe the running time of an algorithm - how much time an algorithm takes with a
given input, n. There are three different notations: big O, big Theta (Θ), and big Omega (Ω).

 Big O notation (O(f(n))) provides an upper bound on the growth of a function. It describes
the worst-case scenario for the time or space complexity of an algorithm. For example, an
algorithm with a time complexity of O(n^2) means that the running time of the algorithm is
at most n^2, where n is the size of the input.
 Big Ω notation (Ω(f(n))) provides a lower bound on the growth of a function. It describes
the best-case scenario for the time or space complexity of an algorithm. For example, an
algorithm with a space complexity of Ω(n) means that the memory usage of the algorithm
is at least n, where n is the size of the input.
 Big Θ notation (Θ(f(n))) provides a tight bound on the growth of a function: the function grows
at the same rate as f(n), up to constant factors. For example, an algorithm with a time complexity
of Θ(n log n) means that the running time of the algorithm is both O(n log n) and Ω(n log n),
where n is the size of the input.

It's important to note that the asymptotic notation only describes the behavior of the
function for large values of n, and does not provide information about the exact behavior of
the function for small values of n. Also, for some cases, the best, worst and average cases
can be the same, in that case the notation will be simplified to O(f(n)) = Ω(f(n)) = Θ(f(n))

Additionally, these notations can be used to compare the efficiency of different algorithms,
where a lower order of the function is considered more efficient. For example, an algorithm
with a time complexity of O(n) is more efficient than an algorithm with a time complexity
of O(n^2).
It's also worth mentioning that asymptotic notation is not only limited to time and space
complexity but can be used to express the behavior of any function, not just algorithms.
There are three asymptotic notations used to represent the time complexity of an algorithm: Θ (theta),
Ω (omega), and big O. Before defining them, consider a simple linear search example:

 Input: an integer array of size "n" and one integer "k" that we need to search for in that array.
 Output: if the element "k" is found in the array, return 1; otherwise, return 0.

// search "k" in the array "arr" of size "n" (function header restored for completeness)
int search(int arr[], int n, int k)
{
    // for-loop to iterate over each element in the array
    for (int i = 0; i < n; ++i)
    {
        // check if the ith element is equal to "k" or not
        if (arr[i] == k)
            return 1; // return 1, if you find "k"
    }
    return 0; // return 0, if you didn't find "k"
}

Suppose, for the sake of illustration, that each execution of the if-condition takes 1 second.
 If the input array is [1, 2, 3, 4, 5] and you want to find if "1" is present in the array or not,
then the if-condition of the code will be executed 1 time and it will find that the element 1
is there in the array. So, the if-condition will take 1 second here.
 If the input array is [1, 2, 3, 4, 5] and you want to find if "3" is present in the array or not,
then the if-condition of the code will be executed 3 times and it will find that the element 3
is there in the array. So, the if-condition will take 3 seconds here.
 If the input array is [1, 2, 3, 4, 5] and you want to find if "6" is present in the array or not,
then the if-condition of the code will be executed 5 times, it will find that the element 6 is not
there in the array, and the algorithm will return 0 in this case. So, the if-condition will take
5 seconds here.

As we can see that for the same input array, we have different time for different values of
"k". So, this can be divided into three cases:
 Best case: This is the lower bound on the running time of an algorithm. We must know the
case that causes the minimum number of operations to be executed. In the above example,
our array was [1, 2, 3, 4, 5] and we are finding if "1" is present in the array or not. So here,
after only one comparison, we will find that the element is present in the array. So, this is the
best case of our algorithm.

 Average case: We calculate the running time for all possible inputs, sum all the calculated
values and divide the sum by the total number of inputs. We must know (or predict)
distribution of cases.
 Worst case: This is the upper bound on running time of an algorithm. We must know the
case that causes the maximum number of operations to be executed. In our example, the
worst case can be if the given array is [1, 2, 3, 4, 5] and we try to find if element "6" is
present in the array or not. Here, the if-condition of our loop will be executed 5 times and
then the algorithm will give "0" as output.
So, we learned about the best, average, and worst case of an algorithm. Now, let's get back to the
asymptotic notations, of which there are three: Θ notation (theta), Ω notation, and big O notation.

NOTE: In the asymptotic analysis, we generally deal with large input size.

Θ Notation (theta)
The Θ notation is used to find the tight bound of an algorithm, i.e., it defines an upper
bound and a lower bound, and your algorithm's running time will lie in between these levels. So, if a
function is g(n), then the theta representation is shown as Θ(g(n)) and the relation is shown as:
Θ(g(n)) = { f(n): there exist positive constants c1, c2 and n0 such that 0 ≤ c1*g(n) ≤ f(n) ≤ c2*g(n) for all n ≥ n0 }

Ω Notation
The Ω notation denotes the lower bound of an algorithm i.e. the time taken by the
algorithm can't be lower than this. In other words, this is the fastest time in which the
algorithm will return a result.

It is the time taken by the algorithm when provided with its best-case input. So, if a function
is g(n), then the omega representation is shown as Ω(g(n)) and the relation is shown as:
Ω(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ c*g(n) ≤ f(n) for all n ≥ n0 }
The above expression can be read as: omega of g(n) is defined as the set of all functions f(n) for
which there exist some constants c and n0 such that c*g(n) is less than or equal to f(n), for all n
greater than or equal to n0.
If f(n) = 2n² + 3n + 1 and g(n) = n², then for c = 2 and n0 = 1, we can say that f(n) = Ω(n²).

Big O Notation
The Big O notation defines the upper bound of an algorithm, i.e., your algorithm can't take
more time than this. In other words, the big O notation denotes the maximum time taken by an
algorithm, or the worst-case time complexity of an algorithm.
So, big O notation is the most used notation for the time complexity of an algorithm. So, if
a function is g(n), then the big O representation of g(n) is shown as O(g(n)) and the relation
is shown as:
O(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c*g(n) for all n ≥ n0 }
The above expression can be read as: big O of g(n) is defined as the set of functions f(n) for
which there exist some constants c and n0 such that f(n) is greater than or equal to 0 and
f(n) is smaller than or equal to c*g(n), for all n greater than or equal to n0.
If f(n) = 2n² + 3n + 1 and g(n) = n², then for c = 6 and n0 = 1, we can say that f(n) = O(n²).
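As a quick check of these constants (a worked inequality, not part of the original text): for all
n ≥ 1 we have n ≤ n² and 1 ≤ n², so
2n² + 3n + 1 ≤ 2n² + 3n² + n² = 6n² = c*g(n),
which is exactly the condition 0 ≤ f(n) ≤ c*g(n) with c = 6 and n0 = 1.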
Big O notation example of Algorithms

Big O notation is the most used notation to express the time complexity of an algorithm. In
this section of the blog, we will find the big O notation of various algorithms.
Example 1: Finding the sum of the first n numbers.

In this example, we have to find the sum of the first n numbers. For example, if n = 4, then our
output should be 1 + 2 + 3 + 4 = 10. If n = 5, then the output should be 1 + 2 + 3 + 4 + 5 =
15. Let's try various solutions to this problem and compare them.
O(1) solution
// function taking input "n"
int findSum(int n)
{
return n * (n+1) / 2; // this will take some constant time c1
}

In the above code, there is only one statement and we know that a statement takes constant
time for its execution. The basic idea is that if the statement is taking constant time, then it
will take the same amount of time for all the input size and we denote this as O(1) .
O(n) solution

In this solution, we will run a loop from 1 to n and we will add these values to a variable
named "sum".
// function taking input "n"
int findSum(int n)
{
int sum = 0; //----------------> it takes some constant time "c1"
for(int i = 1; i <= n; ++i) // --> here the comparison and increment take place n times (c2*n),
and the creation of i takes some constant time
sum = sum + i; //----------> this statement will be executed n times, i.e. c3*n

return sum; //-----------------> it takes some constant time "c4"


}
/*

* Total time taken = time taken by all the statements to execute.

* Here we have 3 constant-time statements, i.e. "sum = 0", "i = 1", and "return sum",
so we can add all the constants and replace them with some new constant "c".

* Apart from this, we have two statements running n times, i.e. "i <= n (in fact n+1 times)"
and "sum = sum + i", i.e. c2*n + c3*n = c0*n

* Total time taken = c0*n + c

*/

The big O notation of the above code is O(c0*n) + O(c), where c and c0 are constants. So,
the overall time complexity can be written as O(n) .
O(n²) solution

In this solution, we will increment the value of sum variable "i" times i.e. for i = 1, the sum
variable will be incremented once i.e. sum = 1. For i = 2, the sum variable will be
incremented twice. So, let's see the solution.
// function taking input "n"
int findSum(int n)
{
int sum = 0; //--------------------> constant time
for(int i = 1; i <= n; ++i)
for(int j = 1; j <= i; ++j)
sum++; //------------------> it will run [n * (n + 1) / 2]
return sum; //---------------------> constant time
}
/*

* Total time taken = time taken by all the statements to execute.

* The statement executed most often is "sum++", i.e. n * (n + 1) / 2 times.

* So, the total complexity will be: c1*n² + c2*n + c3 [c1 is for the constant factors of n², c2 for
the constant factors of n, and c3 for the rest of the constant time]
*/

The big O notation of the above algorithm is O(c1*n²) +O( c2*n) + O(c3). Since we take
the higher order of growth in big O. So, our expression will be reduced to O(n²) .

So far, we have seen three solutions for the same problem. Now, which algorithm would you
prefer to use when finding the sum of the first "n" numbers? We would prefer
the O(1) solution, because the time taken by the algorithm will be constant irrespective of
the input size.
1.4 Recurrence Relation
A recurrence relation is a mathematical equation that describes the relation between the
input size and the running time of a recursive algorithm. It expresses the running time of a
problem in terms of the running time of smaller instances of the same problem.
A recurrence relation for a divide-and-conquer algorithm typically has the form T(n) = aT(n/b) + f(n), where:
 T(n) is the running time of the algorithm on an input of size n
 a is the number of recursive calls made by the algorithm
 n/b is the size of the input passed to each recursive call (the input is divided by a factor of b)
 f(n) is the time required to perform any non-recursive operations

The recurrence relation can be used to determine the time complexity of the algorithm
using techniques such as the Master Theorem or Substitution Method.
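As a worked instance of this form (merge sort's recurrence, a standard example used here for
illustration): T(n) = 2T(n/2) + Θ(n), so a = 2, b = 2 and f(n) = Θ(n). Since n^(log_b a) =
n^(log_2 2) = n and f(n) = Θ(n) matches it, case 2 of the Master Theorem gives T(n) = Θ(n log n).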
For example, let's consider the problem of computing the nth Fibonacci number. A simple
recursive algorithm for solving this problem is as follows:

Fibonacci(n)
    if n <= 1
        return n
    else
        return Fibonacci(n-1) + Fibonacci(n-2)
The recurrence relation for this algorithm is T(n) = T(n-1) + T(n-2) + O(1), which describes the
running time of the algorithm in terms of the running time of the two smaller instances of the
problem with input sizes n-1 and n-2. This recurrence does not fit the divide-and-conquer form
above, but by bounding it as T(n) ≤ 2T(n-1) + O(1) it can be shown that the time complexity of this
algorithm is O(2^n), which is very inefficient for large input sizes.
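A direct C rendering of the recursive algorithm above (a minimal sketch; the function name fib is
our choice):

#include <stdio.h>

// Naive recursive Fibonacci: T(n) = T(n-1) + T(n-2) + O(1), i.e. O(2^n) time.
long fib(int n) {
    if (n <= 1)
        return n;                       // base cases: fib(0) = 0, fib(1) = 1
    return fib(n - 1) + fib(n - 2);     // two smaller instances of the problem
}

int main() {
    printf("%ld\n", fib(10));           // prints 55
    return 0;
}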

What is a Recurrence Relation?


Whenever a function makes a recursive call to itself, its running time can be computed by a
recurrence relation. A recurrence relation is simply a mathematical relation/equation that
gives the value of any term in terms of some previous, smaller terms. For example,
T(n) = T(n-1) + n
It is a recurrence relation because the value of the nth term is given in terms of its previous,
i.e., the (n-1)th, term.
Types of Recurrence Relation:
There are different types of recurrence relation that can be possible in the mathematical world.
Some of them are-
1. Linear Recurrence Relation: In a linear recurrence relation, every term depends linearly on its
previous terms. An example of a linear recurrence relation is
T(n) = T(n-1) + T(n-2) + T(n-3)
2. Divide and Conquer Recurrence Relation: This is the type of recurrence relation obtained from a
divide and conquer algorithm. An example of such a recurrence relation is
T(n) = 3T(n/2) + 9n
3. First Order Recurrence Relation: This is the type of recurrence relation in which every term
depends on just the previous term. An example of this type of recurrence relation is
T(n) = T(n-1)²
4. Higher Order Recurrence Relation: This is the type of recurrence relation where a term depends
not just on one previous term but on multiple previous terms. If it depends on two previous terms,
it is called second order; similarly, for three previous terms it is called third order, and so on.
An example of a third-order recurrence relation is
T(n) = 2T(n-1)² + KT(n-2) + T(n-3)
Till now we have seen different recurrence relations, but how do we find the time taken by a
recursive algorithm? To calculate the time, we need to solve the recurrence relation. For solving
recurrences we have three famous methods:
 Substitution Method
 Recursive Tree Method
 Master Theorem
Now in this article we are going to focus on Substitution Method.
1.4.1 Substitution Method:
Substitution Method is very famous method for solving any recurrences. There are two types
of substitution methods-
1. Forward Substitution
2. Backward Substitution
1. Forward Substitution:
It is called Forward Substitution because here we substitute recurrence of any term into next
terms. It uses following steps to find Time using recurrences-
 Pick Recurrence Relation and the given initial Condition
 Put the value from previous recurrence into the next recurrence
 Observe and Guess the pattern and the time
 Prove that the guessed result is correct using mathematical Induction.
Now we will use these steps to solve a problem. The problem is-
T(n) = T(n-1) + n, n > 1
T(n) = 1, n = 1
Now we will go step by step-
1. Pick the recurrence and the given initial condition:
T(n) = T(n-1) + n, n > 1
T(n) = 1, n = 1
2. Put the value from the previous recurrence into the next recurrence:
T(1) = 1
T(2) = T(1) + 2 = 1 + 2 = 3
T(3) = T(2) + 3 = 1 + 2 + 3 = 6
T(4) = T(3) + 4 = 1 + 2 + 3 + 4 = 10
3. Observe and guess the pattern and the time:
The guessed pattern will be T(n) = 1 + 2 + 3 + ... + n = (n * (n+1))/2, so the time complexity will
be O(n²).
4. Prove that the guessed result is correct using mathematical induction:
 Prove T(1) is true:
T(1) = 1 * (1+1)/2 = 2/2 = 1, and from the definition of the recurrence we know T(1) = 1.
Hence T(1) is true.
 Assume T(N-1) to be true:
Assume T(N-1) = ((N-1) * (N-1+1))/2 = (N * (N-1))/2 to be true.
 Then prove T(N) is true:
T(N) = T(N-1) + N from the recurrence definition.
Now, T(N-1) = (N * (N-1))/2, so T(N) = T(N-1) + N = (N * (N-1))/2 + N = (N * (N-1) + 2N)/2 = N * (N+1)/2.
From our guess also T(N) = N(N+1)/2, hence T(N) is true. Therefore our guess was correct and the
time will be O(N²).
2. Backward Substitution:
It is called backward substitution because we substitute the recurrences of previous terms into the
main recurrence. It uses the following steps to find the running time:
 Take the main recurrence and try to write the recurrences of previous terms.
 Take the immediately previous recurrence and substitute it into the main recurrence.
 Again take one more previous recurrence and substitute it into the main recurrence.
 Repeat this process until you reach the initial condition.
 Then substitute the value from the initial condition and get the solution.
Now we will use these steps to solve a problem. The problem is:
T(n) = T(n-1) + n, n > 1
T(n) = 1, n = 1
Now we will go step by step:
1. Take the main recurrence and try to write the recurrences of previous terms:
T(n) = T(n-1) + n
T(n-1) = T(n-2) + (n-1)
T(n-2) = T(n-3) + (n-2)
2. Take the immediately previous recurrence and substitute it into the main recurrence:
Put T(n-1) into T(n). So, T(n) = T(n-2) + (n-1) + n
3. Again take one more previous recurrence and substitute into the main recurrence:
Put T(n-2) into T(n). So, T(n) = T(n-3) + (n-2) + (n-1) + n
4. Repeat this process until you reach the initial condition:
Similarly we can find T(n-3), T(n-4) and so on and insert them into T(n). Eventually we get
T(n) = T(1) + 2 + 3 + 4 + ... + (n-1) + n
5. Then substitute the value from the initial condition and get the solution:
Put T(1) = 1: T(n) = 1 + 2 + 3 + 4 + ... + (n-1) + n = n(n+1)/2. So the time will be O(n²).
Limitations of Substitution method:
The Substitution method is a useful technique to solve recurrence relations, but it also has
some limitations. Some of the limitations are:
 It is not guaranteed that we will find the solution, as the substitution method is based on
guesses.
 It doesn't provide guidance on how to make an accurate guess, often relying on
intuition or trial and error.
 It may only yield a specific or approximate solution rather than the most general or
precise one.
 The substitution method isn't universally applicable to all recurrence relations,
especially those with complex or variable forms that are not simplified by substitution.
1.5 Lower and Upper Bound

The Lower and Upper Bound Theory provides a way to find the lowest complexity algorithm to solve a
problem. Before understanding the theory, first, let’s have a brief look at what Lower and Upper bounds
are.
 Lower Bound –
Let L(n) be the running time of an algorithm A(say), then g(n) is the Lower Bound of A if there
exist two constants C and N such that L(n) >= C*g(n) for n > N. Lower bound of an algorithm is
shown by the asymptotic notation called Big Omega (or just Omega).

 Upper Bound –
Let U(n) be the running time of an algorithm A(say), then g(n) is the Upper Bound of A if there
exist two constants C and N such that U(n) <= C*g(n) for n > N. Upper bound of an algorithm is
shown by the asymptotic notation called Big Oh(O) (or just Oh).
1. Lower Bound Theory:
According to the lower bound theory, if L(n) is a lower bound of an algorithm for a problem, then no
algorithm for that problem can have a time complexity less than L(n) on arbitrary input; every
algorithm must take at least L(n) time in the worst case. Note that L(n) here is the minimum, over
all possible algorithms, of their maximum (worst-case) complexity.
The Lower Bound is very important for any algorithm. Once we calculated it, then we can compare it
with the actual complexity of the algorithm and if their order is the same then we can declare our
algorithm as optimal. So in this section, we will be discussing techniques for finding the lower bound of
an algorithm.
Note that our main motive is to get an optimal algorithm, which is the one having its Upper Bound the
Same as its Lower Bound (U(n)=L(n)). Merge Sort is a common example of an optimal algorithm.
Trivial Lower Bound –
This is the easiest method to find the lower bound. Lower bounds that can be observed directly from
the number of inputs taken and the number of outputs produced are called trivial lower bounds. For
example, any algorithm that finds the maximum of n numbers must examine all n inputs, so Ω(n) is a
trivial lower bound for that problem.

1.6 HASH FUNCTION

A hash function is a mathematical function that takes an input (or "message") and produces a fixed-size
string of characters, often in the form of a hash value or hash code. This output is typically a sequence of
numbers and letters, and the same input always produces the same output. Hash functions are widely used
in computer science, cryptography, and data processing.

Characteristics of a Good Hash Function


1. Deterministic: The same input always yields the same output.
2. Fast Computation: Hashing the input should be computationally efficient.
3. Uniform Distribution: Outputs (hash values) should be evenly distributed to minimize collisions.
4. Pre-image Resistance: Given a hash output, it should be computationally infeasible to determine the
original input.
5. Collision Resistance: It should be hard to find two different inputs that produce the same hash value.
6. Avalanche Effect: A small change in input should drastically change the hash value.

Applications of Hash Functions

1. Data Integrity: Ensures data has not been altered during transmission or storage by comparing hash values.
2. Cryptography: Used in digital signatures, password hashing, and secure communication protocols.
3. Hash Tables: Enables fast data retrieval by mapping keys to unique indices.
4. Blockchain: Verifies transactions and maintains chain integrity by linking blocks with hash values.
5. Checksums: Verifies file integrity during downloads or transfers.
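A small illustrative string hash in C (the classic djb2 scheme, used here as a common example of the
deterministic and fast-computation properties listed above; it is not collision-resistant in the
cryptographic sense):

#include <stdio.h>

// djb2: start from 5381 and mix each character into the running hash.
unsigned long djb2_hash(const char *s) {
    unsigned long h = 5381;
    int c;
    while ((c = *s++) != 0)
        h = h * 33 + c;
    return h;
}

int main() {
    // The same input always yields the same output (deterministic),
    // and a one-character change alters the value noticeably.
    printf("%lu\n", djb2_hash("algorithm"));
    printf("%lu\n", djb2_hash("algorithn"));
    return 0;
}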

1.7 Searching
Searching is the process of fetching a specific element in a collection of elements. The
collection can be an array or a linked list. If you find the element in the list, the process is
considered successful, and it returns the location of that element.
Several search strategies are extensively used to find a specific item in a list; the algorithm
chosen is determined by the list's organization.

1. Linear Search
2. Binary Search
3. Interpolation Search

1.7.1 Linear Search


Linear search, often known as sequential search, is the most basic search technique. In this
type of search, we go through the entire list and try to fetch a match for a single
element. If we find a match, then the address of the matching target element is returned.
On the other hand, if the element is not found, then a NULL value is returned.
Following is the step-by-step approach employed to perform the Linear Search Algorithm.

The procedures for implementing linear search are as follows:


Step 1: First, read the search element (target element) of the array.
Step 2: In the second step, compare the search element with the first element of the array.
Step 3: If both match, display "Target element is found" and terminate the linear search function.
Step 4: If they do not match, compare the search element with the next element of the array.
Step 5: Repeat steps 3 and 4 until the search (target) element has been compared with the last
element of the array.
Step 6: If the last element of the list also does not match, terminate the linear search function
and display the message "Element is not found".

Algorithm and Pseudocode of Linear Search Algorithm


Algorithm of the Linear Search Algorithm

Linear Search (Array Arr, Value a) // Arr is the name of the array, and a is the searched element
Step 1: Set i to 0 // i is the index of the array, which starts from 0
Step 2: If i ≥ n, then go to step 7 // n is the number of elements in the array
Step 3: If Arr[i] = a, then go to step 6
Step 4: Set i to i + 1
Step 5: Go to step 2
Step 6: Print "element a found at index i" and go to step 8
Step 7: Print "element not found"
Step 8: Exit

Pseudocode of Linear Search Algorithm

Start
linear_search (Array, value)
    for each element in the array
        if (searched element == value)
            return the searched element location
        end if
    end for
end
Example of Linear Search Algorithm
Consider an array of size 7 with elements 13, 9, 21, 15, 39, 19, and 27 that starts with 0
and ends with size minus one, 6.
Search element = 39
Step 1: The searched element 39 is compared to the first element of an array, which is 13.

Since no match is found, move on to the next element and compare again.

Step 2: Now, search element 39 is compared to the second element of the array, 9.

Step 3: Now, search element 39 is compared with the third element, which is 21.

Again, the elements do not match, so you move on to the next element.

Step 4: Next, search element 39 is compared with the fourth element, which is 15.

Step 5: Next, search element 39 is compared with the fifth element 39.

A perfect match is found, display the element found at location 4.

The Complexity of Linear Search Algorithm


Three different complexities faced while performing Linear Search Algorithm, they are
mentioned as follows.
1. Best Case
2. Worst Case
3. Average Case
Best Case Complexity
 The element being searched could be found in the first position.
 In this case, the search ends with a single successful comparison.
 Thus, in the best-case scenario, the linear search algorithm performs O(1) operations.
Worst Case Complexity
 The element being searched may be at the last position in the array or not at all.
 In the first case, the search succeeds in ‘n’ comparisons.
 In the next case, the search fails after ‘n’ comparisons.
 Thus, in the worst-case scenario, the linear search algorithm performs O(n) operations.
Average Case Complexity
When the element to be searched is in the middle of the array, on average n/2 comparisons are
needed, so the average case of the Linear Search Algorithm is O(n).
Space Complexity of Linear Search Algorithm
The linear search algorithm takes up no extra space: its auxiliary space complexity is O(1)
(the input array of n elements itself occupies O(n)).
Application of Linear Search Algorithm
The linear search algorithm has the following applications:
 Linear search can be applied to both single-dimensional and multi-dimensional arrays.
 Linear search is easy to implement and effective when the array contains only a few
elements.
 Linear search is also efficient when a single search is performed in an unordered list.

Code Implementation of Linear Search Algorithm

#include <stdio.h>

int main()
{
    int array[50], i, target, num;
    printf("How many elements do you want in the array? ");
    scanf("%d", &num);
    printf("Enter array elements: ");
    for (i = 0; i < num; ++i)
        scanf("%d", &array[i]);
    printf("Enter element to search: ");
    scanf("%d", &target);
    for (i = 0; i < num; ++i)
        if (array[i] == target)
            break;
    if (i < num)
        printf("Target element found at location %d", i);
    else
        printf("Target element not found in the array");
    return 0;
}
1.7.2 Binary Search

Binary search is the search technique that works efficiently on sorted lists. Hence, to search
an element into some list using the binary search technique, we must ensure that the list is
sorted.
Binary search follows the divide and conquer approach in which the list is divided into two
halves, and the item is compared with the middle element of the list. If the match is found
then, the location of the middle element is returned. Otherwise, we search into either of
the halves depending upon the result produced through the match
NOTE: Binary search can be implemented on sorted array elements. If the list elements are
not arranged in a sorted manner, we have first to sort them.

Algorithm
1. Binary_Search(a, lower_bound, upper_bound, val) // 'a' is the given array, 'lower_bound' is
the index of the first array element, 'upper_bound' is the index of the last array element, and
'val' is the value to search for
2. Step 1: set beg = lower_bound, end = upper_bound, pos = - 1
3. Step 2: repeat steps 3 and 4 while beg <=end
4. Step 3: set mid = (beg + end)/2
5. Step 4: if a[mid] = val
6. set pos = mid
7. print pos
8. go to step 6
9. else if a[mid] > val
10. set end = mid - 1
11. else
12. set beg = mid + 1
13. [end of if]
14. [end of loop]
15. Step 5: if pos = -1
16. print "value is not present in the array"
17. [end of if]
18. Step 6: exit
Procedure binary_search
    A ← sorted array
    n ← size of array
    x ← value to be searched

    Set lowerBound = 1
    Set upperBound = n

    while x not found
        if upperBound < lowerBound
            EXIT: x does not exist.

        set midPoint = lowerBound + (upperBound - lowerBound) / 2

        if A[midPoint] < x
            set lowerBound = midPoint + 1

        if A[midPoint] > x
            set upperBound = midPoint - 1

        if A[midPoint] = x
            EXIT: x found at location midPoint
    end while
end procedure

Working of Binary search


To understand the working of the Binary search algorithm, let's take a sorted array. It will
be easy to understand the working of Binary search with an example.
There are two methods to implement the binary search algorithm -
o Iterative method
o Recursive method
The recursive method of binary search follows the divide and conquer approach.

Consider a sorted array of 9 elements, and let the element to search be K = 56.

We have to use the following formula to calculate the mid of the array:

mid = (beg + end)/2

So, in the given array:

beg = 0
end = 8
mid = (0 + 8)/2 = 4. So, 4 is the mid of the array.

K is compared with the element at index mid; if K is larger, the search continues in the right
half, and if it is smaller, in the left half, recomputing mid each time. When the element to search
is found, the algorithm returns the index of the matched element.

Binary Search complexity
Now, let's see the time complexity of Binary search in the best case, average case, and
worst case. We will also see the space complexity of Binary search.
1. Time Complexity

Case            Time Complexity

Best Case       O(1)
Average Case    O(log n)
Worst Case      O(log n)

o Best Case Complexity - In Binary search, best case occurs when the element to search is
found in first comparison, i.e., when the first middle element itself is the element to be
searched. The best-case time complexity of Binary search is O(1).
o Average Case Complexity - The average case time complexity of Binary search is O(logn).
o Worst Case Complexity - In Binary search, the worst case occurs, when we have to keep
reducing the search space till it has only one element. The worst-case time complexity of
Binary search is O(logn).
2. Space Complexity

Space Complexity: O(1)

o The space complexity of the iterative binary search is O(1). (A recursive implementation, such as
the one below, uses O(log n) stack space.)

Implementation of Binary Search


Program: Write a program to implement Binary search in C language.
#include <stdio.h>

int binarySearch(int a[], int beg, int end, int val)
{
    int mid;
    if (end >= beg)
    {
        mid = (beg + end) / 2;
        /* if the item to be searched is present at the middle */
        if (a[mid] == val)
        {
            return mid + 1;
        }
        /* if the item to be searched is greater than the middle, then it can only be in the right subarray */
        else if (a[mid] < val)
        {
            return binarySearch(a, mid + 1, end, val);
        }
        /* if the item to be searched is smaller than the middle, then it can only be in the left subarray */
        else
        {
            return binarySearch(a, beg, mid - 1, val);
        }
    }
    return -1;
}

int main() {
    int a[] = {11, 14, 25, 30, 40, 41, 52, 57, 70}; // given array
    int val = 40; // value to be searched
    int n = sizeof(a) / sizeof(a[0]); // size of array
    int res = binarySearch(a, 0, n - 1, val); // store result
    printf("The elements of the array are - ");
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);
    printf("\nElement to be searched is - %d", val);
    if (res == -1)
        printf("\nElement is not present in the array");
    else
        printf("\nElement is present at %d position of array", res);
    return 0;
}
Output

The elements of the array are - 11 14 25 30 40 41 52 57 70
Element to be searched is - 40
Element is present at 5 position of array

1.7.3 Interpolation Search


Interpolation search is an improved variant of binary search. This search algorithm works
on the probing position of the required value. For this algorithm to work properly, the data
collection should be in a sorted form and equally distributed.
Binary search has a huge advantage of time complexity over linear search. Linear search
has worst- case complexity of Ο(n) whereas binary search has Ο(log n).
There are cases where the location of the target data may be known in advance. For example, in the
case of a telephone directory, if we want to search for the telephone number of Morphius, linear
search and even binary search will seem slow, as we can directly jump to the memory space where
names starting with 'M' are stored.
Position Probing in Interpolation Search
Interpolation search finds a particular item by computing the probe position. Initially, the
probe position is the position of the middle most item of the collection.

If a match occurs, then the index of the item is returned. To split the list into two parts, we
use the following method −

mid = Lo + ((Hi - Lo) / (A[Hi] - A[Lo])) * (X - A[Lo])

where:
A    = the list
Lo   = lowest index of the list
Hi   = highest index of the list
A[n] = value stored at index n in the list

If the middle item is greater than the item, then the probe position is again calculated in the
sub- array to the right of the middle item. Otherwise, the item is searched in the subarray to
the left of the middle item. This process continues on the sub-array as well until the size of
subarray reduces to zero.
The runtime complexity of the interpolation search algorithm is Ο(log(log n)), as compared to the
Ο(log n) of binary search, in favorable situations.
Algorithm
As it is an improvement on the existing binary search algorithm, we list the steps to search for
the 'target' data value index, using position probing −
Step 1 − Start searching data from the middle of the list.
Step 2 − If it is a match, return the index of the item and exit.
Step 3 − If it is not a match, compute the probe position.
Step 4 − Divide the list using the probing formula and find the new middle.
Step 5 − If the data is greater than the middle, search the higher sub-list.
Step 6 − If the data is smaller than the middle, search the lower sub-list.
Step 7 − Repeat until a match is found.

Pseudocode
A → Array list
N → Size of A
X → Target Value

Procedure Interpolation_Search()
    Set Lo → 0
    Set Mid → -1
    Set Hi → N-1

    While X does not match
        if Lo equals to Hi OR A[Lo] equals to A[Hi]
            EXIT: Failure, Target not found
        end if

        Set Mid = Lo + ((Hi - Lo) / (A[Hi] - A[Lo])) * (X - A[Lo])

        if A[Mid] = X
            EXIT: Success, Target found at Mid
        else
            if A[Mid] < X
                Set Lo to Mid+1
            else if A[Mid] > X
                Set Hi to Mid-1
            end if
        end if
    End While
End Procedure

Implementation of interpolation search in C

#include <stdio.h>
#define MAX 10

// array of items on which interpolation search will be conducted
int list[MAX] = { 10, 14, 19, 26, 27, 31, 33, 35, 42, 44 };

int find(int data) {
    int lo = 0;
    int hi = MAX - 1;
    int mid = -1;
    int comparisons = 1;
    int index = -1;

    while (lo <= hi) {
        printf("\nComparison %d\n", comparisons);
        printf("lo : %d, list[%d] = %d\n", lo, lo, list[lo]);
        printf("hi : %d, list[%d] = %d\n", hi, hi, list[hi]);

        comparisons++;

        // probe the mid point
        mid = lo + (((double)(hi - lo) / (list[hi] - list[lo])) * (data - list[lo]));
        printf("mid = %d\n", mid);

        // data found
        if (list[mid] == data) {
            index = mid;
            break;
        } else {
            if (list[mid] < data) {
                // if data is larger, it is in the upper half
                lo = mid + 1;
            } else {
                // if data is smaller, it is in the lower half
                hi = mid - 1;
            }
        }
    }

    printf("\nTotal comparisons made: %d", --comparisons);
    return index;
}

int main() {
    // find the location of 33
    int location = find(33);

    // if the element was found
    if (location != -1)
        printf("\nElement found at location: %d", (location + 1));
    else
        printf("Element not found.");
    return 0;
}
If we compile and run the above program, it will produce the following result −

Output

Comparison 1
lo : 0, list[0] = 10
hi : 9, list[9] = 44
mid = 6

Total comparisons made: 1
Element found at location: 7
Time Complexity
 Best case - O(1)
The best case occurs when the target is found exactly at the first expected position computed
using the formula. As we perform only one comparison, the time complexity is O(1).

 Worst case - O(n)
The worst case occurs when the given data set is exponentially distributed.

 Average case - O(log(log(n)))
If the data set is sorted and uniformly distributed, then it takes O(log(log(n))) time, as on
average log(log(n)) comparisons are made.

Space Complexity
O(1) as no extra space is required.

1.8 Pattern Search


Pattern searching algorithms are used to find a pattern or substring within another, bigger
string. There are several such algorithms; the main goal in designing them is to reduce the time
complexity, since the traditional approach may take a long time to complete the pattern searching
task for a longer text.
Here we will see different algorithms that give better performance for pattern
matching. In this section we are going to cover:
 Aho-Corasick Algorithm
 Anagram Pattern Search
 Bad Character Heuristic
 Boyer Moore Algorithm
 Efficient Construction of Finite Automata
 Kasai’s Algorithm
 Knuth-Morris-Pratt Algorithm
 Manacher’s Algorithm
 Naive Pattern Searching
 Rabin-Karp Algorithm
 Suffix Array
 Trie of all Suffixes
 Z Algorithm
1.8.1 Naïve pattern searching

Naïve pattern searching is the simplest method among the pattern searching algorithms.
It checks each position of the main string against the pattern. This algorithm is helpful for
smaller texts. It does not need any pre-processing phase; the substring is found in a single scan
of the string, and no extra space is occupied to perform the operation.
The time complexity of the naïve pattern search method is O(m*n), where m is the size of the
pattern and n is the size of the main string.

Input and Output


Input:
Main String: “ABAAABCDBBABCDDEBCABC”, pattern: “ABC”
Output:
Pattern found at position: 4
Pattern found at position: 10
Pattern found at position: 18

Algorithm
naive_algorithm(pattern, text)
Input − The text and the pattern
Output − locations, where the pattern is present in the text
Start
   pat_len := pattern size
   str_len := string size
   for i := 0 to (str_len - pat_len), do
      for j := 0 to pat_len, do
         if text[i+j] ≠ pattern[j], then
            break
      if j == pat_len, then
         display the position i, as the pattern is found there
End

Implementation in C

#include <stdio.h>
#include <string.h>

int main() {
   char txt[] = "tutorialsPointisthebestplatformforprogrammers";
   char pat[] = "a";
   int M = strlen(pat);
   int N = strlen(txt);
   for (int i = 0; i <= N - M; i++) {
      int j;
      for (j = 0; j < M; j++)
         if (txt[i + j] != pat[j])
            break;
      if (j == M)
         printf("Pattern matches at index %d\n", i);
   }
   return 0;
}
Output
Pattern matches at index 6
Pattern matches at index 25
Pattern matches at index 39

1.8.2 Rabin-Karp matching pattern


Rabin-Karp is another pattern searching algorithm. It is the string matching algorithm that
was proposed by Rabin and Karp to find the pattern in a more efficient way. Like the Naive
Algorithm, it also checks the pattern by moving the window one by one, but without
checking all characters for all cases, it finds the hash value. When the hash value is
matched, then only it proceeds to check each character. In this way, there is only one
comparison per text subsequence making it a more efficient algorithm for pattern
searching.
Preprocessing time- O(m)
The time complexity of the Rabin-Karp Algorithm is O(m+n), but for the worst case, it is O(mn).
Algorithm
rabinkarp_algo(text, pattern, prime)
Input − The main text and the pattern, and a prime number used to compute hash locations
Output − locations, where the pattern is found
Start
   pat_len := pattern length
   str_len := string length
   patHash := 0 and strHash := 0, h := 1
   maxChar := total number of characters in the character set
   repeat (pat_len - 1) times:
      h := (h * maxChar) mod prime
   for all character index i of pattern, do
      patHash := (maxChar * patHash + pattern[i]) mod prime
      strHash := (maxChar * strHash + text[i]) mod prime
   for i := 0 to (str_len - pat_len), do
      if patHash = strHash, then
         for charIndex := 0 to pat_len - 1, do
            if text[i + charIndex] ≠ pattern[charIndex], then
               break
         if charIndex = pat_len, then
            print the location i as pattern found at position i
      if i < (str_len - pat_len), then
         strHash := (maxChar * (strHash - text[i] * h) + text[i + pat_len]) mod prime
         if strHash < 0, then
            strHash := strHash + prime
End
Implementation in C

#include <stdio.h>
#include <string.h>

int main() {
   char txt[80], pat[80];
   int q;
   printf("Enter the container string: ");
   scanf("%s", txt);
   printf("Enter the pattern to be searched: ");
   scanf("%s", pat);
   int d = 256;
   printf("Enter a prime number: ");
   scanf("%d", &q);
   int M = strlen(pat);
   int N = strlen(txt);
   int i, j;
   int p = 0;   /* hash value of the pattern */
   int t = 0;   /* hash value of the current text window */
   int h = 1;
   /* h = d^(M-1) mod q, the weight of the leading character */
   for (i = 0; i < M - 1; i++)
      h = (h * d) % q;
   /* compute the initial hash values of the pattern and the first window */
   for (i = 0; i < M; i++) {
      p = (d * p + pat[i]) % q;
      t = (d * t + txt[i]) % q;
   }
   for (i = 0; i <= N - M; i++) {
      if (p == t) {
         /* hash values match; verify character by character */
         for (j = 0; j < M; j++) {
            if (txt[i + j] != pat[j])
               break;
         }
         if (j == M)
            printf("Pattern found at index %d\n", i);
      }
      if (i < N - M) {
         /* roll the hash to the next window */
         t = (d * (t - txt[i] * h) + txt[i + M]) % q;
         if (t < 0)
            t = (t + q);
      }
   }
   return 0;
}
Output
Enter the container string: tutorials point is the best programming website
Enter the pattern to be searched: p
Enter a prime number: 3
Pattern found at index 8
Pattern found at index 21

1.8.3 Knuth-Morris-Pratt (KMP) pattern searching

In this problem, we are given two strings, a text and a pattern. Our task is to create a program for the KMP algorithm for pattern search; it will find all the occurrences of the pattern in the text string.
Here, we have to find all the occurrences of the pattern in the text.
Let’s take an example to understand the problem,
Input
text = “xyztrwqxyzfg”
pattern = “xyz”
Output
Found at index 0
Found at index 7
Here, we will discuss the solution to the problem using the KMP (Knuth Morris Pratt) pattern searching algorithm. It uses a preprocessing array built from the pattern, which is used during matching in the text, and it helps in continuing the search when a run of matching characters is followed by a character of the string that does not match the pattern.
We will preprocess the pattern to create an array that contains the proper prefix and suffix lengths of the pattern, which will help in handling mismatches.
Program for KMP Algorithm for Pattern Searching

// C Program for KMP Algorithm for Pattern Searching Example
#include <stdio.h>
#include <string.h>

void prefixSuffixArray(char* pat, int M, int* pps) {
   int length = 0;
   pps[0] = 0;
   int i = 1;
   while (i < M) {
      if (pat[i] == pat[length]) {
         length++;
         pps[i] = length;
         i++;
      } else {
         if (length != 0)
            length = pps[length - 1];
         else {
            pps[i] = 0;
            i++;
         }
      }
   }
}

void KMPAlgorithm(char* text, char* pattern) {
   int M = strlen(pattern);
   int N = strlen(text);
   int pps[M];
   prefixSuffixArray(pattern, M, pps);
   int i = 0;
   int j = 0;
   while (i < N) {
      if (pattern[j] == text[i]) {
         j++;
         i++;
      }
      if (j == M) {
         printf("Found pattern at index %d\n", i - j);
         j = pps[j - 1];
      } else if (i < N && pattern[j] != text[i]) {
         if (j != 0)
            j = pps[j - 1];
         else
            i = i + 1;
      }
   }
}

int main() {
   char text[] = "xyztrwqxyzfg";
   char pattern[] = "xyz";
   printf("The pattern is found in the text at the following index:\n");
   KMPAlgorithm(text, pattern);
   return 0;
}
Output
The pattern is found in the text at the following index −
Found pattern at index 0
Found pattern at index 7

1.9 Sorting
Sorting is a fundamental operation in data structures and algorithms. It involves arranging elements of a
list or array in a specific order, usually ascending or descending. Sorting is essential for optimizing search
operations, organizing data, and improving the efficiency of algorithms.

1.9.1 Insertion sort


Insertion sort works like the sorting of playing cards in one's hands. It is assumed that the first card is already sorted in the card game, and then we select an unsorted card. If the selected unsorted card is greater than the first card, it is placed at the right side; otherwise, at the left side. Similarly, all unsorted cards are taken and put in their exact place.

The same approach is applied in insertion sort. The idea behind insertion sort is to take one element at a time and insert it into its correct position within the already-sorted part of the array. Although it is simple to use, it is not appropriate for large data sets, as the time complexity of insertion sort in the average case and worst case is O(n²), where n is the number of items. Insertion sort is less efficient than other sorting algorithms like heap sort, quick sort, merge sort, etc.

Algorithm
The simple steps of achieving the insertion sort are listed as follows -
Step 1 - If the element is the first element, assume that it is already sorted. Return 1.
Step 2 - Pick the next element, and store it separately in a key.
Step 3 - Now, compare the key with all elements in the sorted array.
Step 4 - If the element in the sorted array is smaller than the current element, then move to the next element. Else, shift greater elements in the array towards the right.
Step 5 - Insert the value.
Step 6 - Repeat until the array is sorted.

Working of Insertion sort Algorithm
Now, let's see the working of the insertion sort Algorithm.
To understand the working of the insertion sort algorithm, let's take an unsorted array. It
will be easier to understand the insertion sort via an example.
Let the elements of the array be {12, 31, 25, 8, 32, 17}.

Initially, the first two elements are compared in insertion sort.

Here, 31 is greater than 12. That means both elements are already in ascending order. So,
for now, 12 is stored in a sorted sub-array.

Now, move to the next two elements and compare them.

Here, 25 is smaller than 31. So, 31 is not at the correct position. Now, swap 31 with 25.
Along with swapping, insertion sort will also check it with all elements in the sorted array.
For now, the sorted array has only one element, i.e. 12. So, 25 is greater than 12. Hence,
the sorted array remains sorted after swapping.

Now, two elements in the sorted array are 12 and 25. Move forward to the next elements
that are 31 and 8.
Both 31 and 8 are not sorted. So, swap them.

After swapping, elements 25 and 8 are unsorted.

So, swap them.

Now, elements 12 and 8 are unsorted.

So, swap them too.

Now, the sorted array has three items that are 8, 12 and 25. Move to the next items that are
31 and 32.

Hence, they are already sorted. Now, the sorted array includes 8, 12, 25 and 31.

Move to the next elements that are 32 and 17.

17 is smaller than 32. So, swap them.

Swapping makes 31 and 17 unsorted. So, swap them too.

Now, swapping makes 25 and 17 unsorted. So, perform swapping again.

Now, the array is completely sorted.

Insertion sort complexity


1. Time Complexity

Case            Time Complexity
Best Case       O(n)
Average Case    O(n²)
Worst Case      O(n²)

o Best Case Complexity - It occurs when no sorting is required, i.e. the array is already sorted. The best-case time complexity of insertion sort is O(n).
o Average Case Complexity - It occurs when the array elements are in jumbled order, neither properly ascending nor properly descending. The average case time complexity of insertion sort is O(n²).
o Worst Case Complexity - It occurs when the array elements have to be sorted in reverse order, i.e. suppose you have to sort the array elements in ascending order, but the elements are given in descending order. The worst-case time complexity of insertion sort is O(n²).

2. Space Complexity

Space Complexity    O(1)
Stable              Yes

o The space complexity of insertion sort is O(1), because insertion sort needs only one extra variable for swapping.
Implementation of insertion sort

Program: Write a program to implement insertion sort in C language.

#include <stdio.h>

void insert(int a[], int n) /* function to sort an array with insertion sort */
{
   int i, j, temp;
   for (i = 1; i < n; i++) {
      temp = a[i];
      j = i - 1;
      /* move the elements greater than temp one position ahead of their current position */
      while (j >= 0 && temp <= a[j]) {
         a[j + 1] = a[j];
         j = j - 1;
      }
      a[j + 1] = temp;
   }
}

void printArr(int a[], int n) /* function to print the array */
{
   int i;
   for (i = 0; i < n; i++)
      printf("%d ", a[i]);
}

int main()
{
   int a[] = { 12, 31, 25, 8, 32, 17 };
   int n = sizeof(a) / sizeof(a[0]);
   printf("Before sorting array elements are - \n");
   printArr(a, n);
   insert(a, n);
   printf("\nAfter sorting array elements are - \n");
   printArr(a, n);
   return 0;
}
Output:

Before sorting array elements are -
12 31 25 8 32 17
After sorting array elements are -
8 12 17 25 31 32
1.9.2 Heap Sort

Heap Sort Algorithm

Heap sort processes the elements by creating a min-heap or max-heap from the elements of the given array. A min-heap or max-heap represents an ordering of the array in which the root element holds the minimum or maximum element of the array.

Heap sort repeatedly performs two main operations -

o Build a heap H, using the elements of the array.

o Repeatedly delete the root element of the heap formed in the first phase.

A heap is a complete binary tree, and a binary tree is a tree in which a node can have at most two children. A complete binary tree is a binary tree in which all the levels except the last level, i.e., the leaf level, are completely filled, and all the nodes are left-justified.
Heapsort is a popular and efficient sorting algorithm. The concept of heap sort is to eliminate the elements one by one from the heap part of the list, and then insert them into the sorted part of the list.
Algorithm

HeapSort(arr)
   BuildMaxHeap(arr)
   for i = length(arr) to 2
      swap arr[1] with arr[i]
      heap_size[arr] = heap_size[arr] - 1
      MaxHeapify(arr, 1)
End

BuildMaxHeap(arr)

BuildMaxHeap(arr)
   heap_size(arr) = length(arr)
   for i = length(arr)/2 to 1
      MaxHeapify(arr, i)
End

MaxHeapify(arr, i)

MaxHeapify(arr, i)
   L = left(i)
   R = right(i)
   if L ≤ heap_size[arr] and arr[L] > arr[i]
      largest = L
   else
      largest = i
   if R ≤ heap_size[arr] and arr[R] > arr[largest]
      largest = R
   if largest != i
      swap arr[i] with arr[largest]
      MaxHeapify(arr, largest)
End

Working of Heap sort Algorithm

In heap sort, there are basically two phases involved in the sorting of elements. Using the heap sort algorithm, they are as follows -

o The first step includes the creation of a heap by adjusting the elements of the array.

o After the creation of the heap, remove the root element of the heap repeatedly by swapping it with the last element of the array, and then restore the heap structure with the remaining elements.

First, we have to construct a heap from the given array and convert it into max heap.

After converting the given heap into max heap, the array elements are -
Next, we have to delete the root element (89) from the max heap. To delete this node, we
have to swap it with the last node, i.e. (11). After deleting the root element, we again have
to heapify it to convert it into max heap.

After swapping the array element 89 with 11, and converting the heap into max-heap, the
elements of array are -
In the next step, again, we have to delete the root element (81) from the max heap. To delete
this node, we have to swap it with the last node, i.e. (54). After deleting the root element, we
again have to heapify it to convert it into max heap.

After swapping the array element 81 with 54 and converting the heap into max-heap, the
elements of array are -

In the next step, we have to delete the root element (76) from the max heap again. To delete
this node, we have to swap it with the last node, i.e. (9). After deleting the root element, we
again have to heapify it to convert it into max heap.

After swapping the array element 76 with 9 and converting the heap into max-heap, the
elements of array are -

In the next step, again we have to delete the root element (54) from the max heap. To delete
this node, we have to swap it with the last node, i.e. (14). After deleting the root element, we
again have to heapify it to convert it into max heap.

After swapping the array element 54 with 14 and converting the heap into max-heap, the
elements of array are -

In the next step, again we have to delete the root element (22) from the max heap. To delete
this node, we have to swap it with the last node, i.e. (11). After deleting the root element, we
again have to heapify it to convert it into max heap.
After swapping the array element 22 with 11 and converting the heap into max-heap, the
elements of array are -

In the next step, again we have to delete the root element (14) from the max heap. To delete
this node, we have to swap it with the last node, i.e. (9). After deleting the root element, we
again have to heapify it to convert it into max heap.

After swapping the array element 14 with 9 and converting the heap into max-heap, the
elements of array are -

In the next step, again we have to delete the root element (11) from the max heap. To delete
this node, we have to swap it with the last node, i.e. (9). After deleting the root element, we
again have to heapify it to convert it into max heap.

After swapping the array element 11 with 9, the elements of array are -

Now, heap has only one element left. After deleting it, heap will be empty.

After completion of sorting, the array elements are -


Time complexity of Heap sort in the best case, average case, and worst case
1. Time Complexity

Case            Time Complexity
Best Case       O(n log n)
Average Case    O(n log n)
Worst Case      O(n log n)

o Best Case Complexity - It occurs when no sorting is required, i.e. the array is already sorted. The best-case time complexity of heap sort is O(n log n).
o Average Case Complexity - It occurs when the array elements are in jumbled order, neither properly ascending nor properly descending. The average case time complexity of heap sort is O(n log n).
o Worst Case Complexity - It occurs when the array elements have to be sorted in reverse order, i.e. suppose you have to sort the array elements in ascending order, but the elements are given in descending order. The worst-case time complexity of heap sort is O(n log n).

The time complexity of heap sort is O(n log n) in all three cases (best case, average case, and worst case). The height of a complete binary tree having n elements is log n.

2. Space Complexity

Space Complexity    O(1)
Stable              No

o The space complexity of Heap sort is O(1).
Implementation of Heapsort

Program: Write a program to implement heap sort in C language.

#include <stdio.h>

/* function to heapify a subtree. Here 'i' is the
   index of root node in array a[], and 'n' is the size of heap. */
void heapify(int a[], int n, int i)
{
   int largest = i;          // Initialize largest as root
   int left = 2 * i + 1;     // left child
   int right = 2 * i + 2;    // right child
   // If left child is larger than root
   if (left < n && a[left] > a[largest])
      largest = left;
   // If right child is larger than root
   if (right < n && a[right] > a[largest])
      largest = right;
   // If root is not largest
   if (largest != i) {
      // swap a[i] with a[largest]
      int temp = a[i];
      a[i] = a[largest];
      a[largest] = temp;
      heapify(a, n, largest);
   }
}

/* Function to implement the heap sort */
void heapSort(int a[], int n)
{
   for (int i = n / 2 - 1; i >= 0; i--)
      heapify(a, n, i);
   // One by one extract an element from heap
   for (int i = n - 1; i >= 0; i--) {
      /* Move current root element to end */
      // swap a[0] with a[i]
      int temp = a[0];
      a[0] = a[i];
      a[i] = temp;
      heapify(a, i, 0);
   }
}

/* function to print the array elements */
void printArr(int arr[], int n)
{
   for (int i = 0; i < n; ++i) {
      printf("%d ", arr[i]);
   }
}

int main()
{
   int a[] = {48, 10, 23, 43, 28, 26, 1};
   int n = sizeof(a) / sizeof(a[0]);
   printf("Before sorting array elements are - \n");
   printArr(a, n);
   heapSort(a, n);
   printf("\nAfter sorting array elements are - \n");
   printArr(a, n);
   return 0;
}
Output

Before sorting array elements are -
48 10 23 43 28 26 1
After sorting array elements are -
1 10 23 26 28 43 48
UNIT 2
2.GRAPH ALGORITHMS
Graph algorithms are methods for solving problems related to graph data structures, which
consist of nodes (vertices) connected by edges. Graphs are widely used in various fields,
including computer networks, social networks, transportation, and computational biology.
Definition

A graph G = (V, E) is a non-linear data structure that consists of a set of vertices (nodes) V and a set of edges E, where each edge links a pair of vertices.

There are 2 types of graphs:

 Directed
 Undirected

Directed graph

A graph with only directed edges is said to be a directed graph.

Example

The following directed graph has 5 vertices and 8 edges. This graph G can be defined as G = (V, E), where V = {A, B, C, D, E} and E = {(A,B), (A,C), (B,E), (B,D), (D,A), (D,E), (C,D), (D,D)}.

Directed Graph

Undirected graph

A graph with only undirected edges is said to be an undirected

graph. Example

The following is an undirected graph.


Undirected Graph

Representation of Graphs
Graph data structure is represented using the following
representations.

1. Adjacency Matrix
2. Adjacency List

Adjacency Matrix

 In this representation, the graph can be represented


using a matrix of size n x n, where n is the number of
vertices.
 This matrix is filled with either 1’s or 0’s.
 Here, 1 represents that there is an edge from row vertex
to column vertex, and 0 represents that there is no edge
from row vertex to column vertex.

Directed graph representation

Adjacency list

 In this representation, every vertex of the graph contains


a list of its adjacent vertices.
 If the graph is not dense, i.e., the number of edges is less,
then it is efficient to represent the graph through the
adjacency list.

Adjacency List
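To make the two representations concrete, here is a minimal C sketch (not from the original notes) that builds the same small directed graph both as an adjacency matrix and as an adjacency list; the 4-vertex graph, the edge list, and the helper name addEdge are illustrative assumptions.

#include <stdio.h>
#include <stdlib.h>

#define V 4  /* illustrative number of vertices */

/* adjacency-list node */
struct Node {
    int vertex;
    struct Node* next;
};

/* prepend dst to the list of src (directed edge src -> dst) */
void addEdge(struct Node* adj[], int src, int dst) {
    struct Node* n = malloc(sizeof(struct Node));
    n->vertex = dst;
    n->next = adj[src];
    adj[src] = n;
}

int main(void) {
    /* adjacency matrix: 1 marks an edge from row vertex to column vertex */
    int matrix[V][V] = {0};
    struct Node* adj[V] = {NULL};

    int edges[][2] = { {0,1}, {0,2}, {1,3}, {2,3} };
    for (int e = 0; e < 4; e++) {
        matrix[edges[e][0]][edges[e][1]] = 1;
        addEdge(adj, edges[e][0], edges[e][1]);
    }

    /* print both representations side by side */
    for (int i = 0; i < V; i++) {
        for (int j = 0; j < V; j++) printf("%d ", matrix[i][j]);
        printf("   list of %d:", i);
        for (struct Node* p = adj[i]; p; p = p->next) printf(" %d", p->vertex);
        printf("\n");
    }
    return 0;
}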


Applications of graphs
1. Social network graphs : To tweet or not to tweet. Graphs that
represent who knows whom, who communicates with whom,
who influences whom, or other relationships in social
structures. An example is the twitter graph of who follows
whom.
2. Graphs in epidemiology: Vertices represent individuals and
directed edges to view the transfer of an infectious disease
from one individual to another. Analyzing such graphs has
become an important component in understanding and
controlling the spread of diseases.
3. Protein-protein interactions graphs: Vertices represent
proteins and edges represent interactions between them that
carry out some biological function in the cell. These graphs
can be used to, for example, study molecular pathway—
chains of molecular interactions in a cellular process.
4. Network packet traffic graphs: Vertices are IP (Internet
protocol) addresses and edges are the packets that flow
between them. Such graphs are used for analyzing network
security, studying the spread of worms, and tracking
criminal or non- criminal activity.
5. Neural networks: Vertices represent neurons and edges are
the synapses between them. Neural networks are used to
understand how our brain works and how connections
change when we learn. The human brain has about 10¹¹
neurons and close to 10¹⁵ synapses.
2.1 Graph traversal

Graph traversal is a technique used for searching a vertex in a graph. It is also used to decide the order in which vertices are visited during the search. A graph traversal finds the edges to be used in the search process without creating loops; that means using graph traversal we can visit all the vertices of the graph without getting into a looping path.
There are two graph traversal techniques and they are as follows...

1. DFS (Depth First Search)


2. BFS (Breadth First Search)
2.1.1 BFS (Breadth First Search)

BFS traversal of a graph produces a spanning tree as its final result. A spanning tree is a graph without loops. We use a queue data structure, with a maximum size equal to the total number of vertices in the graph, to implement BFS traversal.

BFS Algorithm
For the BFS implementation, we will categorize each vertex of the graph into two categories:

1. Visited
2. Not Visited

The reason for this division is to prevent the traversal of the same node again thus
avoiding cycles in the graph. Breadth First Traversal in Data Structure uses a queue to keep
track of the vertices to visit. By processing vertices in the order they were added to the queue,
BFS ensures that all the vertices at the current level are visited before moving on to the next
level. A boolean visited array is used to mark the visited vertices. This leads to a breadth-first
exploration of the graph or tree, hence the name "breadth-first traversal."
Here's a step-by-step explanation of how the BFS algorithm works:

1. Start by selecting a starting vertex or node.


2. Add the starting vertex to the end of the queue.
3. Mark the starting vertex as visited and add it to the visited array/list.
4. While the queue is not empty, do the following steps:
o Remove the front vertex from the queue.
o Visit the removed vertex and process it.
o Enqueue all the adjacent vertices of the removed vertex that have not been visited
yet.
o Mark each visited adjacent vertex as visited and add it to the visited array/list.
5. Repeat step 4 until the queue is empty.

The graph might have disconnected parts, so to make sure that we cover every vertex, we can also run the BFS algorithm from every unvisited node.
Example

Step 1 − Initialize the queue.

Step 2 − We start by visiting S (the starting node), and mark it as visited.

Step 3 − We then see an unvisited adjacent node from S. In this example, we have three nodes but alphabetically we choose A, mark it as visited and enqueue it.

Step 4 − Next, the unvisited adjacent node from S is B. We mark it as visited and enqueue it.

Step 5 − Next, the unvisited adjacent node from S is C. We mark it as visited and enqueue it.

Step 6 − Now, S is left with no unvisited adjacent nodes. So, we dequeue and find A.

Step 7 − From A we have D as an unvisited adjacent node. We mark it as visited and enqueue it.

BFS pseudocode

create a queue Q
mark v as visited and put v into Q
while Q is non-empty
   remove the head u of Q
   mark and enqueue all (unvisited) neighbours of u
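As a concrete illustration of the pseudocode above, here is a minimal C sketch of BFS over an adjacency matrix; the 5-vertex sample graph and the fixed-size array queue are illustrative assumptions (each vertex is enqueued at most once, so a queue of size V suffices).

#include <stdio.h>

#define V 5  /* assumed number of vertices */

/* BFS over an adjacency matrix, following the pseudocode above */
void bfs(int graph[V][V], int start) {
    int visited[V] = {0};
    int queue[V], front = 0, rear = 0;

    visited[start] = 1;          /* mark v as visited and put v into Q */
    queue[rear++] = start;

    while (front < rear) {       /* while Q is non-empty */
        int u = queue[front++];  /* remove the head u of Q */
        printf("%d ", u);
        for (int v = 0; v < V; v++) {
            if (graph[u][v] && !visited[v]) {  /* enqueue unvisited neighbours */
                visited[v] = 1;
                queue[rear++] = v;
            }
        }
    }
}

int main(void) {
    /* illustrative undirected graph */
    int graph[V][V] = {
        {0,1,1,0,0},
        {1,0,0,1,0},
        {1,0,0,1,0},
        {0,1,1,0,1},
        {0,0,0,1,0}
    };
    bfs(graph, 0);   /* visits level by level: 0 1 2 3 4 */
    return 0;
}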
BFS Algorithm Complexity
The time complexity of the BFS algorithm is represented in the form of O(V + E),
where V is the number of nodes and E is the number of edges.
The space complexity of the algorithm is O(V).

Applications of Breadth First Search Algorithm

1. Minimum spanning tree for unweighted graphs: In Breadth-First Search we can reach from any given source vertex to another vertex with the minimum number of edges, and this principle can be used to find a minimum spanning tree, i.e. the tree that covers all vertices along such shortest paths.
2. Peer-to-peer networking: In Peer-to-peer networking, to find the neighboring
peer from any other peer, the Breadth-First Search is used.
3. Crawlers in search engines: Search engines need to crawl the internet. To do so,
they can start from any source page, follow the links contained in that page in the
Breadth-First Search manner, and therefore explore other pages.
4. GPS navigation systems: To find locations within a given radius from any source
person, we can find all neighboring locations using the Breadth-First Search, and
keep on exploring until those are within the K radius.
5. Broadcasting in networks: While broadcasting from any source, we find all its
neighboring peers and continue broadcasting to them, and so on.
6. Path Finding: To find if there is a path between 2 vertices, we can take any vertex
as a source, and keep on traversing until we reach the destination vertex. If we
explore all vertices reachable from the source and cannot find the destination
vertex, then that means there is no path between these 2 vertices.
7. Finding all reachable Nodes from a given Vertex: All vertices that are
reachable from a given vertex can be found using the BFS approach in
any disconnected graph. The vertices that are marked as visited in
the visited array after the BFS is complete contain all those reachable vertices.

2.1.2 DFS(Depth First Traversal )


The depth-first search algorithm works by starting from an arbitrary node of the graph and exploring as
far as possible before backtracking i.e. moving to an adjacent node until there is no unvisited adjacent
node. After backtracking, it repeats the same process for all the remaining vertices which have not been
visited till now. It is a recursive algorithm for searching all the vertices of a graph or tree data structure.

DFS Algorithm

For the DFS implementation, we will categorize each vertex of the graph into two
categories:

1. Visited
2. Not Visited
The reason for this division is to prevent the traversal of the same node again thus, avoiding cycles in
the graph. Depth First Traversal in Data Structure uses a stack to keep track of the vertices to visit. A
boolean visited array is used to mark the visited vertices.
Here's a step-by-step explanation of how the DFS algorithm works:

1. Start by selecting a starting vertex or node.


2. Add the starting vertex on top of a stack.
3. Take the top item of the stack and add it to the visited list/array.
4. Create a list of that vertex's adjacent nodes. Add the ones that aren't in the visited list to the top
of the stack.

Keep repeating steps 3 and 4 until the stack is empty. The graph might have disconnected parts, so to make sure that we cover every vertex, we can also run the DFS algorithm from every unvisited node.
Depth First Search (DFS) algorithm traverses a graph in a depthward motion and uses a
stack to remember to get the next vertex to start a search, when a dead end occurs in any
iteration.

As in the example given above, DFS algorithm traverses from S to A to D to G to E to


B first, then to F and lastly to C. It employs the following rules.
 Rule 1 − Visit the adjacent unvisited vertex. Mark it as visited. Display it. Push it in a stack.
 Rule 2 − If no adjacent vertex is found, pop up a vertex from the stack. (It will
pop up all the vertices from the stack, which do not have adjacent vertices.)
 Rule 3 − Repeat Rule 1 and Rule 2 until the stack is empty.

Step 1 − Initialize the stack.

Step 2 − Mark S as visited and put it onto the stack. Explore any unvisited adjacent node from S. We have three nodes and we can pick any of them. For this example, we shall take the nodes in alphabetical order.

Step 3 − Mark A as visited and put it onto the stack. Explore any unvisited adjacent node from A. Both S and D are adjacent to A, but we are concerned with unvisited nodes only.

Step 4 − Visit D and mark it as visited and put it onto the stack. Here, we have B and C nodes, which are adjacent to D and both are unvisited. However, we shall again choose in alphabetical order.

Step 5 − We choose B, mark it as visited and put it onto the stack. Here B does not have any unvisited adjacent node. So, we pop B from the stack.

Step 6 − We check the stack top for returning to the previous node and check if it has any unvisited nodes. Here, we find D to be on the top of the stack.

Step 7 − The only unvisited adjacent node from D is C now. So we visit C, mark it as visited and put it onto the stack.

Pseudocode of DFS

DFS(G, u)
   u.visited = true
   for each v ∈ G.Adj[u]
      if v.visited == false
         DFS(G, v)

init() {
   for each u ∈ G
      u.visited = false
   for each u ∈ G
      DFS(G, u)
}
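Following the pseudocode, here is a minimal recursive C sketch of DFS over an adjacency matrix; the sample graph is an illustrative assumption. The loop in main() plays the role of init(), running DFS from every unvisited vertex so that disconnected parts are also covered.

#include <stdio.h>

#define V 5  /* assumed number of vertices */

/* recursive DFS over an adjacency matrix, mirroring DFS(G, u) above */
void dfs(int graph[V][V], int visited[V], int u) {
    visited[u] = 1;              /* u.visited = true */
    printf("%d ", u);
    for (int v = 0; v < V; v++)
        if (graph[u][v] && !visited[v])  /* for each unvisited v in G.Adj[u] */
            dfs(graph, visited, v);
}

int main(void) {
    /* illustrative undirected graph */
    int graph[V][V] = {
        {0,1,1,0,0},
        {1,0,0,1,0},
        {1,0,0,1,0},
        {0,1,1,0,1},
        {0,0,0,1,0}
    };
    int visited[V] = {0};
    /* init(): run DFS from every vertex to cover disconnected parts */
    for (int u = 0; u < V; u++)
        if (!visited[u])
            dfs(graph, visited, u);
    return 0;
}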

Applications of Depth First Search Algorithm


1. Finding any path between two vertices u and v in the graph.
2. Finding the number of connected components present in a given undirected graph.
3. For doing topological sorting of a given directed graph.
4. Finding strongly connected components in the directed graph.
5. Finding cycles in the given graph.
6. To check if the given graph is bipartite.

Complexity Analysis of BFS and DFS Algorithms

Algorithm   Time Complexity   Space Complexity
BFS         O(V + E)          O(V)
DFS         O(V + E)          O(V)

Here, V is the number of nodes and E is the number of edges.
Difference between BFS and DFS

Full form − BFS is Breadth First Search. DFS is Depth First Search.

Traversal Order − BFS first explores the graph through all nodes on the same level before moving on to the next level. DFS explores the graph from the root node and proceeds through the nodes as far as possible until we reach a node with no unvisited nearby nodes.

Data Structure − BFS uses a Queue to find the shortest path. DFS uses a Stack to find the shortest path.

Principle − BFS works on the concept of FIFO (First In First Out). DFS works on the concept of LIFO (Last In First Out).

Time Complexity − Both are O(V + E) when an Adjacency List is used and O(V²) when an Adjacency Matrix is used, where V stands for vertices and E stands for edges.

Suitable for − BFS is more suitable for searching vertices closer to the given source. DFS is more suitable when there are solutions away from the source.

Speed − BFS is slower than DFS. DFS is faster than BFS.

Memory − BFS requires more memory space. DFS requires less memory space.

Suitability for decision tree − BFS considers all neighbors first and is therefore not suitable for decision-making trees used in games or puzzles. DFS is more suitable for decision trees: with one decision, we need to traverse further to augment the decision, and if we reach a conclusion, we won.

Backtracking − In BFS there is no concept of backtracking. DFS is a recursive algorithm that uses the idea of backtracking.

Trapping in loops − In BFS, there is no problem of getting trapped in infinite loops. In DFS, we may be trapped in infinite loops.

Visiting of Siblings/Children − In BFS, siblings are visited before the children. In DFS, children are visited before their siblings.

Applications − BFS is used in various applications such as bipartite graphs, shortest paths, etc. DFS is used in various applications such as acyclic graphs, topological ordering, etc.

Conceptual Difference − BFS builds the tree level by level. DFS builds the tree sub-tree by sub-tree.

Optimality − BFS is optimal for finding the shortest path. DFS is not optimal for finding the shortest path.

2.3 Minimum Spanning Tree


A spanning tree is a tree which has V vertices and V-1 edges. All nodes in a spanning tree are reachable from each other.
A Minimum Spanning Tree (MST), or minimum weight spanning tree, for a weighted, connected, undirected graph is a spanning tree having a weight less than or equal to the weight of every other possible spanning tree. The weight of a spanning tree is the sum of the weights given to each edge of the spanning tree. In short, out of all spanning trees of a given graph, the spanning tree having minimum weight is the MST.

PROPERTIES OF THE MINIMUM SPANNING TREE

 A minimum spanning tree of a graph is unique, if the weight of all the edges are distinct.
Otherwise, there may be multiple minimum spanning trees. (Specific algorithms typically output
one of the possible minimum spanning trees).
 Minimum spanning tree is also the tree with minimum product of weights of edges. (It can be
easily proved by replacing the weights of all edges with their logarithms)
 In a minimum spanning tree of a graph, the maximum weight of an edge is the minimum possible
from all possible spanning trees of that graph. (This follows from the validity of Kruskal's
algorithm).
 The maximum spanning tree (spanning tree with the sum of weights of edges being maximum) of a
graph can be obtained similarly to that of the minimum spanning tree, by changing the signs of the
weights of all the edges to their opposite and then applying any of the minimum spanning tree
algorithm.

Algorithms for finding Minimum Spanning Tree(MST):-


1. Prim’s Algorithm
2. Kruskal’s Algorithm
2.3.1 Prim’s Algorithm
Prim's algorithm is a minimum spanning tree algorithm that takes a graph as input
and finds the subset of the edges of that graph which
 form a tree that includes every vertex
 has the minimum sum of weights among all the trees that can be formed from the graph

How Prim's algorithm works


It falls under a class of algorithms called greedy algorithms that find the local optimum
in the hopes of finding a global optimum.
We start from one vertex and keep adding edges with the lowest weight until we
reach our goal. The steps for implementing Prim's algorithm are as follows:
1. Initialize the minimum spanning tree with a vertex chosen at random.
2. Find all the edges that connect the tree to new vertices, find the minimum and add it to
the tree
3. Keep repeating step 2 until we get a minimum spanning tree

Example of Prim's algorithm

Start with a weighted graph

Choose a vertex

Choose the shortest edge from this vertex and add it

Choose the nearest vertex not yet in the solution


Choose the nearest edge not yet in the solution, if there are multiple choices, choose one at random

Prim's Algorithm pseudocode

The pseudocode for Prim's algorithm shows how we create two sets of vertices U and V-U. U contains the list of vertices that have been visited and V-U the list of vertices that haven't. One by one, we move vertices from set V-U to set U by connecting the least weight edge.

T = ∅;
U = { 1 };
while (U ≠ V)
   let (u, v) be the lowest cost edge such that u ∈ U and v ∈ V - U;
   T = T ∪ {(u, v)}
   U = U ∪ {v}
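To make the pseudocode concrete, here is a minimal O(V²) C sketch of Prim's algorithm on an adjacency matrix (0 meaning no edge); the 5-vertex sample graph is an illustrative assumption.

#include <stdio.h>
#include <limits.h>

#define V 5  /* assumed number of vertices */

/* O(V^2) Prim's algorithm on an adjacency matrix (0 means no edge) */
void primMST(int graph[V][V]) {
    int key[V];      /* cheapest edge weight connecting each vertex to U */
    int parent[V];   /* the other endpoint of that cheapest edge */
    int inMST[V] = {0};

    for (int i = 0; i < V; i++) { key[i] = INT_MAX; parent[i] = -1; }
    key[0] = 0;      /* start the tree from vertex 0 */

    for (int count = 0; count < V; count++) {
        /* pick the unvisited vertex with the smallest key */
        int u = -1;
        for (int v = 0; v < V; v++)
            if (!inMST[v] && (u == -1 || key[v] < key[u])) u = v;
        inMST[u] = 1;

        /* relax edges from u to vertices still outside the tree */
        for (int v = 0; v < V; v++)
            if (graph[u][v] && !inMST[v] && graph[u][v] < key[v]) {
                key[v] = graph[u][v];
                parent[v] = u;
            }
    }
    for (int v = 1; v < V; v++)
        printf("%d - %d : %d\n", parent[v], v, graph[parent[v]][v]);
}

int main(void) {
    /* illustrative weighted undirected graph */
    int graph[V][V] = {
        {0, 2, 0, 6, 0},
        {2, 0, 3, 8, 5},
        {0, 3, 0, 0, 7},
        {6, 8, 0, 0, 9},
        {0, 5, 7, 9, 0}
    };
    primMST(graph);   /* prints the MST edges: total weight 16 */
    return 0;
}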

Complexity Analysis of Prim’s Algorithm:


Time Complexity: O(V²). If the input graph is represented using an adjacency list, then the time complexity of Prim's algorithm can be reduced to O(E * logV) with the help of a binary heap. In this implementation, we always consider the spanning tree to start from the root of the graph.
Auxiliary Space: O(V)
Advantages:
1. Prim’s algorithm is guaranteed to find the MST in a connected, weighted graph.
2. It has a time complexity of O(E log V) using a binary heap or Fibonacci heap, where E is the
number of edges and V is the number of vertices.
3. It is a relatively simple algorithm to understand and implement compared to some other MST
algorithms.
Disadvantages:
1. Like Kruskal’s algorithm, Prim’s algorithm can be slow on dense graphs with many edges, as it
requires iterating over all edges at least once.
2. Prim’s algorithm relies on a priority queue, which can take up extra memory and slow down the
algorithm on very large graphs.
3. The choice of starting node can affect the MST output, which may not be desirable in some
applications.
Other Implementations of Prim’s Algorithm:
Given below are some other implementations of Prim’s Algorithm
 Prim’s Algorithm for Adjacency Matrix Representation – In this article we have discussed the
method of implementing Prim’s Algorithm if the graph is represented by an adjacency matrix.
 Prim’s Algorithm for Adjacency List Representation – In this article Prim’s Algorithm
implementation is described for graphs represented by an adjacency list.
 Prim’s Algorithm using Priority Queue: In this article, we have discussed a time-efficient approach
to implement Prim’s algorithm.

2.3.2 Kruskal Algorithm

Kruskal's algorithm is a minimum spanning tree algorithm that takes a graph as input
and finds the subset of the edges of that graph which
 form a tree that includes every vertex
 has the minimum sum of weights among all the trees that can be formed from the graph
How Kruskal's algorithm works
It falls under a class of algorithms called greedy algorithms that find the local optimum
in the hopes of finding a global optimum.
We start from the edges with the lowest weight and keep adding edges until we reach
our goal. The steps for implementing Kruskal's algorithm are as follows:
1. Sort all the edges from low weight to high
2. Take the edge with the lowest weight and add it to the spanning tree. If
adding the edge created a cycle, then reject this edge.

3. Keep adding edges until we reach all vertices.

Example of Kruskal's algorithm

Start with a weighted graph

Choose the edge with the least weight, if there are more than 1, choose anyone
Choose the next shortest edge and add it

Choose the next shortest edge that doesn't create a cycle and add it

Choose the next shortest edge that doesn't create a cycle and add it

Repeat until you have a spanning tree

Kruskal Algorithm Pseudocode

KRUSKAL(G):
A = ∅
For each vertex v ∈ G.V:
   MAKE-SET(v)
For each edge (u, v) ∈ G.E ordered by increasing order by weight(u, v):
   if FIND-SET(u) ≠ FIND-SET(v):
      A = A ∪ {(u, v)}
      UNION(u, v)
return A
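A minimal C sketch of this pseudocode, using an edge list, qsort for the sorting step, and a simple union-find with path compression for MAKE-SET/FIND-SET/UNION; the 4-vertex sample graph is an illustrative assumption.

#include <stdio.h>
#include <stdlib.h>

/* an edge of the weighted undirected graph */
struct Edge { int u, v, w; };

int parent[100];

/* FIND-SET with path compression */
int find(int x) {
    return parent[x] == x ? x : (parent[x] = find(parent[x]));
}

/* comparator for qsort: ascending edge weight */
int cmp(const void* a, const void* b) {
    return ((struct Edge*)a)->w - ((struct Edge*)b)->w;
}

int main(void) {
    int V = 4;
    struct Edge edges[] = { {0,1,10}, {0,2,6}, {0,3,5}, {1,3,15}, {2,3,4} };
    int E = 5;

    for (int i = 0; i < V; i++) parent[i] = i;   /* MAKE-SET(v) */
    qsort(edges, E, sizeof(struct Edge), cmp);   /* sort edges by weight */

    int total = 0;
    for (int i = 0; i < E; i++) {
        int ru = find(edges[i].u), rv = find(edges[i].v);
        if (ru != rv) {            /* no cycle: take the edge, then UNION */
            parent[ru] = rv;
            printf("%d - %d : %d\n", edges[i].u, edges[i].v, edges[i].w);
            total += edges[i].w;
        }
    }
    printf("MST weight = %d\n", total);   /* 19 for this sample graph */
    return 0;
}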
Time Complexity: O(E * logE) or O(E * logV)
 Sorting of edges takes O(E * logE) time.
 After sorting, we iterate through all edges and apply the find-union algorithm. The find and union operations can take at most O(logV) time.
 So the overall complexity is O(E * logE + E * logV) time.
 The value of E can be at most O(V²), so O(logV) and O(logE) are the same. Therefore, the overall time complexity is O(E * logE) or O(E * logV).
 Auxiliary Space: O(V + E), where V is the number of vertices and E is the number of edges in the graph.

Applications of Kruskal's algorithm


Kruskal's algorithm has several applications, including:
 Designing efficient network connections or cable layout planning.
 Constructing efficient road networks or transportation systems.
 Cluster analysis or data grouping.
 Approximate solutions to the traveling salesman problem.

2.4 Shortest Path Algorithm

The shortest path problem is about finding a path between vertices in a graph such
that the total sum of the edges weights is minimum.

Algorithm for Shortest Path


1. Bellman Algorithm
2. Dijkstra Algorithm
3. Floyd Warshall Algorithm

2.4.1 Bellman Algorithm


Bellman Ford algorithm helps us find the shortest path from a vertex to all other
vertices of a weighted graph.
It is similar to Dijkstra's algorithm but it can work with graphs in which edges can
have negative weights.
How Bellman Ford's algorithm works
Bellman Ford algorithm works by overestimating the length of the path from the starting
vertex to all other vertices. Then it iteratively relaxes those estimates by finding new
paths that are shorter than the previously overestimated paths.
By doing this repeatedly for all vertices, we can guarantee that the result is optimized.
Bellman Ford Pseudocode
We need to maintain the path distance of every vertex. We can store that in an array
of size v, where v is the number of vertices.
We also want to be able to get the shortest path, not only know the length of the shortest
path. For this, we map each vertex to the vertex that last updated its path length.
Once the algorithm is over, we can backtrack from the destination vertex to the source
vertex to find the path.
function bellmanFord(G, S)
   for each vertex V in G
      distance[V] <- infinite
      previous[V] <- NULL
   distance[S] <- 0

   for each vertex V in G
      for each edge (U, V) in G
         tempDistance <- distance[U] + edge_weight(U, V)
         if tempDistance < distance[V]
            distance[V] <- tempDistance
            previous[V] <- U

   for each edge (U, V) in G
      if distance[U] + edge_weight(U, V) < distance[V]
         Error: Negative Cycle Exists

   return distance[], previous[]

Bellman Ford's Complexity

Case            Time Complexity
Best Case       O(E)
Average Case    O(VE)
Worst Case      O(VE)
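A minimal C sketch of the Bellman-Ford pseudocode above over an edge list; the sample graph (which contains negative edge weights but no negative cycle) is an illustrative assumption.

#include <stdio.h>
#include <limits.h>

/* an edge of the weighted directed graph */
struct Edge { int u, v, w; };

int main(void) {
    int V = 5, E = 8, S = 0;
    struct Edge edges[] = {
        {0,1,-1}, {0,2,4}, {1,2,3}, {1,3,2},
        {1,4,2},  {3,2,5}, {3,1,1}, {4,3,-3}
    };
    int dist[5];
    for (int i = 0; i < V; i++) dist[i] = INT_MAX;
    dist[S] = 0;

    /* relax every edge V-1 times */
    for (int i = 1; i < V; i++)
        for (int j = 0; j < E; j++)
            if (dist[edges[j].u] != INT_MAX &&
                dist[edges[j].u] + edges[j].w < dist[edges[j].v])
                dist[edges[j].v] = dist[edges[j].u] + edges[j].w;

    /* one more pass: any further improvement means a negative cycle */
    for (int j = 0; j < E; j++)
        if (dist[edges[j].u] != INT_MAX &&
            dist[edges[j].u] + edges[j].w < dist[edges[j].v]) {
            printf("Error: Negative cycle exists\n");
            return 1;
        }

    for (int i = 0; i < V; i++)
        printf("distance[%d] = %d\n", i, dist[i]);
    return 0;
}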
2.4.2 Dijkstra Algorithm

Dijkstra's algorithm allows us to find the shortest path between any two vertices of a graph.
It differs from the minimum spanning tree because the shortest distance between
two vertices might not include all the vertices of the graph.

How Dijkstra's Algorithm works


Dijkstra's Algorithm works on the basis that any subpath B -> D of the shortest path A -
> D between vertices A and D is also the shortest path between vertices B and D.
Each subpath is the shortest path

Dijkstra used this property in the opposite direction, i.e. we overestimate the distance of each vertex from the starting vertex. Then we visit each node and its neighbors to find the shortest subpath to those neighbors.
The algorithm uses a greedy approach in the sense that we find the next best solution hoping
that the end result is the best solution for the whole problem.

Example of Dijkstra's algorithm


It is easier to start with an example and then think about the algorithm.

Start with a weighted graph

Choose a starting vertex and assign infinity path values to all other devices

Go to each vertex and update its path length


If the path length of the adjacent vertex is lesser than new path length, don't update it

Avoid updating path lengths of already visited vertices

After each iteration, we pick the unvisited vertex with the least path length. So we choose 5 before 7
Notice how the rightmost vertex has its path length updated twice

Repeat until all the vertices have been visited

Dijkstra's algorithm pseudocode


We need to maintain the path distance of every vertex. We can store that in an array of
size v, where v is the number of vertices.
We also want to be able to get the shortest path, not only know the length of the shortest
path. For this, we map each vertex to the vertex that last updated its path length.
Once the algorithm is over, we can backtrack from the destination vertex to the source vertex
to find the path.
A minimum priority queue can be used to efficiently receive the vertex with the least path distance.

function dijkstra(G, S)
   for each vertex V in G
      distance[V] <- infinite
      previous[V] <- NULL
      if V != S, add V to Priority Queue Q
   distance[S] <- 0

   while Q IS NOT EMPTY
      U <- Extract MIN from Q
      for each unvisited neighbour V of U
         tempDistance <- distance[U] + edge_weight(U, V)
         if tempDistance < distance[V]
            distance[V] <- tempDistance
            previous[V] <- U
   return distance[], previous[]
Dijkstra's Algorithm Complexity

Time Complexity: O(E log V), where E is the number of edges and V is the number of vertices.
Space Complexity: O(V)
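A minimal C sketch of Dijkstra's algorithm; for simplicity it uses a linear scan in place of the priority queue, which gives O(V²) rather than O(E log V). The 6-vertex sample graph is an illustrative assumption.

#include <stdio.h>
#include <limits.h>

#define V 6  /* assumed number of vertices */

/* O(V^2) Dijkstra sketch on an adjacency matrix (0 means no edge) */
void dijkstra(int graph[V][V], int src) {
    int dist[V], visited[V] = {0};
    for (int i = 0; i < V; i++) dist[i] = INT_MAX;
    dist[src] = 0;

    for (int count = 0; count < V; count++) {
        /* extract the unvisited vertex with the minimum distance */
        int u = -1;
        for (int v = 0; v < V; v++)
            if (!visited[v] && (u == -1 || dist[v] < dist[u])) u = v;
        if (dist[u] == INT_MAX) break;   /* remaining vertices unreachable */
        visited[u] = 1;

        /* relax edges out of u */
        for (int v = 0; v < V; v++)
            if (graph[u][v] && !visited[v] &&
                dist[u] + graph[u][v] < dist[v])
                dist[v] = dist[u] + graph[u][v];
    }
    for (int i = 0; i < V; i++)
        printf("distance[%d] = %d\n", i, dist[i]);
}

int main(void) {
    /* illustrative weighted undirected graph */
    int graph[V][V] = {
        {0, 4, 4, 0, 0, 0},
        {4, 0, 2, 0, 0, 0},
        {4, 2, 0, 3, 1, 6},
        {0, 0, 3, 0, 0, 2},
        {0, 0, 1, 0, 0, 3},
        {0, 0, 6, 2, 3, 0}
    };
    dijkstra(graph, 0);
    return 0;
}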

2.4.3 Ford-Fulkerson algorithm


The Ford-Fulkerson algorithm is a widely used algorithm to solve the maximum flow problem in a flow
network. The maximum flow problem involves determining the maximum amount of flow that can be sent
from a source vertex to a sink vertex in a directed weighted graph, subject to capacity constraints on the
edges.
The algorithm works by iteratively finding an augmenting path, which is a path from the source to the sink in
the residual graph, i.e., the graph obtained by subtracting the current flow from the capacity of each edge.
The algorithm then increases the flow along this path by the maximum possible amount, which is the
minimum capacity of the edges along the path.
Problem:
Given a graph which represents a flow network where every edge has a capacity. Also, given two
vertices source ‘s’ and sink ‘t’ in the graph, find the maximum possible flow from s to t with the following
constraints:
 Flow on an edge doesn’t exceed the given capacity of the edge.
 Incoming flow is equal to outgoing flow for every vertex except s and t.
For example, consider the following graph from the CLRS book.

The maximum possible flow in the above graph is 23.

Prerequisite : Max Flow Problem Introduction


Ford-Fulkerson Algorithm
The following is simple idea of Ford-Fulkerson algorithm:
1. Start with initial flow as 0.
2. While there exists an augmenting path from the source to the sink:
 Find an augmenting path using any path-finding algorithm, such as breadth-first search or depth-first
search.
 Determine the amount of flow that can be sent along the augmenting path, which is the minimum
residual capacity along the edges of the path.
 Increase the flow along the augmenting path by the determined amount.
3. Return the maximum flow.

Time Complexity: Time complexity of the above algorithm is O(max_flow * E). We run a loop while there
is an augmenting path. In worst case, we may add 1 unit flow in every iteration. Therefore the time
complexity becomes O(max_flow * E).

Ford-Fulkerson Algorithm

Initially, the value of the flow is 0. Find some augmenting path p and increase the flow f on each edge of p by the residual capacity cf (p). When no augmenting path exists, flow f is a maximum flow.
FORD-FULKERSON METHOD (G, s, t)
1. Initialize flow f to 0
2. while there exists an augmenting path p
3.    do augment flow f along p
4. Return f

FORD-FULKERSON (G, s, t)
1. for each edge (u, v) ∈ E [G]
2.    do f [u, v] ← 0
3.       f [v, u] ← 0
4. while there exists a path p from s to t in the residual network Gf
5.    do cf (p) ← min {cf (u, v) : (u, v) is on p}
6.       for each edge (u, v) in p
7.          do f [u, v] ← f [u, v] + cf (p)
8.             f [v, u] ← − f [u, v]
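A minimal C sketch of the Ford-Fulkerson method, using BFS to find augmenting paths (the Edmonds-Karp variant); the capacity matrix follows the standard CLRS example referred to above, so the expected maximum flow is 23.

#include <stdio.h>
#include <string.h>

#define V 6  /* assumed number of vertices: source 0, sink 5 */

/* BFS on the residual graph: fills parent[] and reports if t is reachable */
int bfs(int rGraph[V][V], int s, int t, int parent[]) {
    int visited[V] = {0};
    int queue[V], front = 0, rear = 0;
    queue[rear++] = s;
    visited[s] = 1;
    parent[s] = -1;
    while (front < rear) {
        int u = queue[front++];
        for (int v = 0; v < V; v++)
            if (!visited[v] && rGraph[u][v] > 0) {
                visited[v] = 1;
                parent[v] = u;
                queue[rear++] = v;
            }
    }
    return visited[t];
}

/* Ford-Fulkerson with BFS (Edmonds-Karp) on a capacity matrix */
int fordFulkerson(int graph[V][V], int s, int t) {
    int rGraph[V][V];   /* residual capacities */
    memcpy(rGraph, graph, sizeof(rGraph));
    int parent[V], maxFlow = 0;

    while (bfs(rGraph, s, t, parent)) {
        /* bottleneck: minimum residual capacity along the augmenting path */
        int pathFlow = 1 << 30;
        for (int v = t; v != s; v = parent[v]) {
            int u = parent[v];
            if (rGraph[u][v] < pathFlow) pathFlow = rGraph[u][v];
        }
        /* update residual capacities along the path */
        for (int v = t; v != s; v = parent[v]) {
            int u = parent[v];
            rGraph[u][v] -= pathFlow;
            rGraph[v][u] += pathFlow;
        }
        maxFlow += pathFlow;
    }
    return maxFlow;
}

int main(void) {
    /* capacities of the CLRS example network */
    int graph[V][V] = {
        {0, 16, 13, 0, 0, 0},
        {0, 0, 10, 12, 0, 0},
        {0, 4, 0, 0, 14, 0},
        {0, 0, 9, 0, 0, 20},
        {0, 0, 0, 7, 0, 4},
        {0, 0, 0, 0, 0, 0}
    };
    printf("Maximum flow: %d\n", fordFulkerson(graph, 0, 5));  /* 23 */
    return 0;
}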

Example: Each Directed Edge is labeled with capacity. Use the Ford-Fulkerson algorithm to
find the maximum flow.

Solution: The left side of each part shows the residual network Gf with a shaded augmenting path p, and the right side of each part shows the net flow f.
2.5 MAXIMUM FLOW

It is defined as the maximum amount of flow that the network would allow to flow from source to sink.
Multiple algorithms exist in solving the maximum flow problem. Two major algorithms to solve these kind
of problems are Ford-Fulkerson algorithm and Dinic's Algorithm.

In optimization theory, maximum flow problems involve finding a feasible flow through a flow network that
obtains the maximum possible flow rate.

The maximum flow problem can be seen as a special case of more complex network flow problems, such as
the circulation problem. The maximum value of an s-t flow (i.e., flow from source s to sink t) is equal to the
minimum capacity of an s-t cut (i.e., cut severing s from t) in the network, as stated in the max-flow min-cut
theorem.

2.6 Network Flow


Flow Network is a directed graph that is used for modeling material Flow. There are two
different vertices; one is a source which produces material at some steady rate, and another
one is sink which consumes the content at the same constant speed. The flow of the material
at any mark in the system is the rate at which the element moves.
Some real-life problems like the flow of liquids through pipes, the current through wires and
delivery of goods can be modelled using flow networks.

Definition: A Flow Network is a directed graph G = (V, E) such that

1. For each edge (u, v) ∈ E, we associate a nonnegative weight capacity c (u, v) ≥ 0. If (u, v) ∉ E, we assume that c (u, v) = 0.
2. There are two distinguishing points, the source s and the sink t;
3. For every vertex v ∈ V, there is a path from s to t containing v.

Let G = (V, E) be a flow network. Let s be the source of the network, and let t be the sink. A flow in G is a real-valued function f: V x V → R such that the following properties hold:

o Capacity Constraint: For all u, v ∈ V, we need f (u, v) ≤ c (u, v).

o Skew Symmetry: For all u, v ∈ V, we need f (u, v) = − f (v, u).

o Flow Conservation: For all u ∈ V − {s, t}, we need Σ (v ∈ V) f (u, v) = 0.

The quantity f (u, v), which can be positive or negative, is known as the net flow from vertex u to vertex v. In the maximum-flow problem, we are given a flow network G with source s and sink t, and we wish to find a flow of maximum value from s to t.

2.7 Maximum Bipartite Matching

 A bipartite graph is an undirected graph G = (V, E) in which V can be partitioned into two sets V1 and V2 such that (u, v) ∈ E implies either u ∈ V1 and v ∈ V2, or vice versa. That is, all edges go between the two sets V1 and V2 and not within V1 or V2.

 A bipartite matching is a set of edges in a graph chosen in such a way that no two edges in that set share an endpoint. The maximum matching is a matching with the maximum number of edges.

When the maximum matching is found, we cannot add another edge. If one edge is added to the maximum matched graph, it is no longer a matching. For a bipartite graph, there can be more than one maximum matching.

Algorithm

bipartiteMatch(u, visited, assign)
Input − Starting node, a visited list to keep track, and an assign list that assigns a node to another node.
Output − Returns true when a matching for vertex u is possible.
Begin
   for all vertices v, which are adjacent with u, do
      if v is not visited, then
         mark v as visited
         if v is not assigned, or bipartiteMatch(assign[v], visited, assign) is true, then
            assign[v] := u
            return true
   done
   return false
End

maxMatch(graph)
Input − The given graph.
Output − The maximum number of matches.
Begin
   initially no vertex is assigned
   count := 0
   for all applicants u in M, do
      mark all nodes as unvisited
      if bipartiteMatch(u, visited, assign), then
         increase count by 1
   done
End
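A minimal C sketch of this augmenting-path matching (often called Kuhn's algorithm); the 3x3 applicant/job compatibility matrix is an illustrative assumption.

#include <stdio.h>
#include <string.h>

#define M 3  /* assumed applicants (left set) */
#define N 3  /* assumed jobs (right set) */

/* returns true if applicant u can be matched, possibly by reassigning others */
int bipartiteMatch(int graph[M][N], int u, int visited[N], int assign[N]) {
    for (int v = 0; v < N; v++) {
        if (graph[u][v] && !visited[v]) {
            visited[v] = 1;   /* mark v as visited */
            /* v is free, or its current partner can be moved elsewhere */
            if (assign[v] < 0 ||
                bipartiteMatch(graph, assign[v], visited, assign)) {
                assign[v] = u;
                return 1;
            }
        }
    }
    return 0;
}

int maxMatch(int graph[M][N]) {
    int assign[N], count = 0;
    memset(assign, -1, sizeof(assign));   /* initially no vertex is assigned */
    for (int u = 0; u < M; u++) {
        int visited[N] = {0};             /* mark all nodes as unvisited */
        if (bipartiteMatch(graph, u, visited, assign))
            count++;
    }
    return count;
}

int main(void) {
    /* graph[u][v] = 1 if applicant u is interested in job v (illustrative) */
    int graph[M][N] = {
        {1, 1, 0},
        {1, 0, 1},
        {0, 1, 1}
    };
    printf("Maximum matching: %d\n", maxMatch(graph));   /* prints 3 */
    return 0;
}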

Applications of Bipartite Graph


Various applications of bipartite graph are:
Matching Problems
In bipartite graphs, vertices are divided into two disjoint sets, and edges only connect vertices from different
sets. This property makes bipartite graphs useful for modeling matching problems, such as assigning tasks to
workers or matching students to schools.
Recommendation Systems
Bipartite graphs can represent relationships between users and items in recommendation systems. Users are
in one set, items (like products or movies) are in the other, and edges indicate user-item interactions.
Analyzing these graphs can help recommend items to users based on their preferences or similarities with
other users.
Social Networks
In social networks, bipartite graphs can represent connections between two different kinds of things, like
users and events, or users and interests. For instance, in a user-event graph, lines link users to events they go
to, helping with things like suggesting events or finding groups of users who like similar things.
Resource Allocation
Bipartite graphs can represent allocation problems, such as assigning resources to tasks or employees to
projects. By modeling resources and tasks as two disjoint sets of vertices and edges indicating compatibility
or assignment, bipartite graphs can help optimize resource allocation and scheduling.
Information Retrieval
In information retrieval systems, bipartite graphs can model relationships between documents and terms. Documents
and terms are represented as two disjoint sets of vertices, and edges indicate which terms appear in which documents.
Analyzing these graphs helps improve search algorithms and document clustering techniques.
Transportation Networks
Bipartite graphs can represent transportation networks, where one set of vertices represents locations (e.g., cities or
nodes), and the other set represents transportation routes (e.g., roads or edges). These graphs are used for optimizing
transportation systems, route planning, and logistics management.
