Handnotes DAA

Recursive relations are fundamental in defining sequences where each term relies on previous terms. A prime example is the factorial function, defined recursively as follows:

For example:

// Function to calculate factorial recursively
int factorial(int n)
{
    if (n == 0 || n == 1)        // base case: 0! = 1! = 1
    {
        return 1;
    }
    return n * factorial(n - 1); // recursive case
}
A function that calls itself is called recursive. For n = 5, the calls unfold as:

fact(5) = 5 × fact(4)
        = 5 × 4 × fact(3)  = 20 × fact(3)
        = 20 × 3 × fact(2) = 60 × fact(2)
        = 60 × 2 × fact(1) = 120 × fact(1)
        = 120 × 1          = 120

So 5! = 5 × 4 × 3 × 2 × 1 = 120. (Compare the related sum 1 + 2 + 3 + ⋯ + n, which equals n(n+1)/2.)

Recurrence Relation

T(n) = n × T(n−1)

Base Case

For factorial, the base case is defined as:

T(0) = 1

Recursive Case

The recursive definition is unpacked as:

T(n) = n × T(n−1)

To understand the recursive relation T(n)=n×T(n−1) and its
derivation for factorial, let’s substitute the recursive relation
multiple times to observe the pattern. We will proceed step by step
and eventually reach the closed form.

Given:

T(n)=n×T(n−1)

Step-by-step Substitution (replacing n with n − 1 repeatedly)

Substitute T(n−1) using the recursive relation:

T(n−1) = (n−1) × T(n−2), and likewise T(n−2) = (n−2) × T(n−3), …, down to T(n−k)

So,

T(n)=n×(n−1)×T(n−2)

Now, substitute T(n−2) using the same recursive relation:

T(n−2)=(n−2)×T(n−3)

Therefore,

T(n)=n×(n−1)×(n−2)×T(n−3)

Continue this process:

T(n)=n×(n−1)×(n−2)×⋯×T(1)

Finally, at T(1), the base case is defined as:

T(1) = 1

General Form

By substituting all terms down to the base case T(1)=1, we get the
following expression for T(n):

T(n)=n×(n−1)×(n−2)×⋯×1

This is precisely the definition of the factorial function:

T(n)=n!

Summary of the Derivation

We start with the recursive relation T(n)=n×T(n−1).

By repeatedly substituting the recursive relation, we unfold it into a
product of all integers from n down to 1. This gives the closed-form
solution for the factorial function: T(n)=n!, where n!=n×(n−1)×⋯×1.
Thus, the recursive relation T(n)=n×T(n−1) is simply a recursive
definition of the factorial function, and the substitution leads
directly to the standard factorial formula.

Derivation of T(n)

T(n)=n×T(n−1).
For n = 1:

T(1) = 1 × T(0) = 1 × 1 = 1

For n = 2:

T(2) = 2 × T(1) = 2 × 1 = 2

For n = 3:

T(3) = 3 × T(2) = 3 × 2 = 6

For n = 4:

T(4) = 4 × T(3) = 4 × 6 = 24

Continuing this pattern, we can represent:

T(n) = n × (n−1) × (n−2) × ⋯ × (n−k+1) × T(n−k)

The factorial of n is defined as the product of all positive integers up
to n:

Base Case

0! = 1

Recursive Case

Using recursion, the factorial can be expressed as:

n! = n × (n−1)!

Recursive Relation

This leads to the recursive relation:

T(n) = n × T(n−1)

with the base case:

T(0) = 1

General Expression

This simplifies to:

T(n) = n × (n−1) × (n−2) × ... × 1 = n!

Closed-Form Solution

Thus, the closed-form solution for the recurrence relation is:

T(n) = n!

=================

The time complexity of the recursive factorial function is derived from its recursive structure. (Note the distinction: the value of factorial satisfies T(n) = n × T(n−1), while the running time satisfies T(n) = T(n−1) + O(1), since each call does constant work and makes one recursive call on n − 1.)

Step-by-Step Analysis:

Recursive Formula:

The recursive definition of factorial is:

T(n) = n × T(n−1)

This relation calls the factorial function recursively for n−1, reducing the input size by 1 with each call.

Base Case:

The base case is when n=0 or n=1, where the function returns 1,
without making any further recursive calls:

T(0) = T(1) = 1

Number of Recursive Calls:

The recursion depth is proportional to n, since each recursive call
reduces the problem size by 1 until reaching the base case. For
example, if n=5, the recursion unfolds as:

T(5)=5×T(4)→

5×(4×T(3))→

5×(4×(3×T(2)))→

5×(4×(3×(2×T(1))))

As calls on the stack:

fact(5) → 5 × fact(4)
fact(4) → 4 × fact(3)
fact(3) → 3 × fact(2)
fact(2) → 2 × fact(1)
fact(1) → 1 (base case)

Therefore, there are n recursive calls in total, starting from T(n) down to T(1).
Work Done per Recursive Call:

Each recursive call performs a constant amount of work (i.e.,
multiplying two numbers). Therefore, the work done in each call
is O(1).

Best case: O(1) (the input is already a base case, n ≤ 1).

Worst case: O(n).

Average case: O(n).

Total Time Complexity:

Since there are n recursive calls and each call takes constant time
O(1), the total time complexity is:

T(n)=O(n)

Conclusion:

The time complexity of the factorial function using the recursive
relation T(n) = n × T(n−1) is O(n), because the function makes n
recursive calls, each taking constant time.

============

Recursive relations Types

Recursive relations can be categorized into several types based
on their properties and structures:

1. Linear Recursive Relations

These relations define a sequence where each term is a linear
combination of previous terms. For example:

T(n) = a*T(n-1) + b*T(n-2)
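For instance, the Fibonacci numbers fit this form with a = b = 1. A minimal C sketch (an illustrative example, not from the notes):

// Fibonacci: a linear recurrence with a = b = 1
int fib(int n)
{
    if (n <= 1)                      // base cases: fib(0) = 0, fib(1) = 1
        return n;
    return fib(n - 1) + fib(n - 2);  // linear combination of previous terms
}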

2. Non-Linear Recursive Relations

These relations involve non-linear combinations of previous terms. An example is:

T(n) = T(n−1) × T(n−2)

(Note that T(n) = T(n−1) + T(n−2) + T(n−3) is still a linear relation; non-linearity requires previous terms to be combined non-linearly, e.g., multiplied or squared.)

3. Homogeneous Recursive Relations

Relations with no constant term are called homogeneous. For instance:

T(n) = a*T(n-1) + b*T(n-2)

4. Non-Homogeneous Recursive Relations

These relations include constant or non-recursive terms, such as:

T(n) = a*T(n-1) + b*T(n-2) + c

5. First-Order Recursive Relations

Relations that depend solely on the preceding term are first-order, exemplified by:

T(n) = T(n-1) + k

6. Higher-Order Recursive Relations

Relations that involve multiple previous terms are higher-order, such as:

T(n) = T(n-1) + T(n-2)

7. Mutual Recursion

Occurs when two or more functions call each other recursively. For example:

function1() calls function2(); function2() calls function1();
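A minimal C sketch of mutual recursion (illustrative; the parity functions are an assumed example, not from the notes):

#include <stdbool.h>

bool is_odd(int n);                 // forward declaration

// is_even calls is_odd, and is_odd calls is_even (for n >= 0).
bool is_even(int n)
{
    if (n == 0)
        return true;
    return is_odd(n - 1);
}

bool is_odd(int n)
{
    if (n == 0)
        return false;
    return is_even(n - 1);
}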

Conclusion

Understanding the types of recursive relations helps in choosing appropriate methods for solving problems and analyzing their complexity.

===================================

The substitution method is a technique used to solve systems of equations, typically in algebra. It involves solving one of the equations for one variable in terms of the other variable(s), then substituting that expression into the other equation(s). This allows
you to reduce the system to a single equation with one variable,
making it easier to solve.

Solve one equation for one variable:

Choose one of the equations and solve it for one variable in terms of the other. For example, given the pair y = 2x + 1 and a second equation in x and y, the first equation is already solved for y in terms of x.

Verifying points:
To verify that a point lies on the line y = 2x + 1, plug in the value of x and check whether y matches. For example, if x = 3:
y = 2(3) + 1 = 6 + 1 = 7
So the point (3, 7) is on the line, and likewise (4, 9), (5, 11), and (6, 13).

Graph Interpretation:
If you graph the equation y = 2x + 1, you'll see that it forms a straight line. The steepness (slope) and the intercept help you understand how changes in x affect y.

For Example
The Merge Sort algorithm follows a divide-and-conquer
strategy, where a problem of size n is recursively divided into two
smaller subproblems of size n/2, until the base case of size 1 is
reached (where the list is trivially sorted).
After sorting the subarrays, the two sorted halves are merged back
together.
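A minimal C sketch of this strategy (illustrative; the sample array and its size are assumptions):

#include <stdio.h>
#include <string.h>

// Merge the two sorted halves a[lo..mid] and a[mid+1..hi].
void merge(int a[], int tmp[], int lo, int mid, int hi)
{
    int i = lo, j = mid + 1, k = lo;
    while (i <= mid && j <= hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i <= mid) tmp[k++] = a[i++];
    while (j <= hi)  tmp[k++] = a[j++];
    memcpy(&a[lo], &tmp[lo], (hi - lo + 1) * sizeof(int));
}

// Divide a[lo..hi] into halves, sort each recursively, then merge.
void merge_sort(int a[], int tmp[], int lo, int hi)
{
    if (lo >= hi) return;             // base case: size 0 or 1 is sorted
    int mid = lo + (hi - lo) / 2;
    merge_sort(a, tmp, lo, mid);      // subproblem of size n/2
    merge_sort(a, tmp, mid + 1, hi);  // subproblem of size n/2
    merge(a, tmp, lo, mid, hi);       // O(n) merge step
}

int main(void)
{
    int a[] = {5, 2, 9, 1, 6}, tmp[5];
    merge_sort(a, tmp, 0, 4);
    for (int i = 0; i < 5; i++) printf("%d ", a[i]);  // 1 2 5 6 9
    return 0;
}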

The time complexity of Merge Sort can be described using the following recurrence relation:

T(n) = 2T(n/2) + O(n), with T(1) = O(1)

where 2T(n/2) covers sorting the two halves and O(n) covers merging them.

Conclusion: solving this recurrence gives T(n) = O(n log n).
=================
A recurrence tree is a method used to visually analyze the time
complexity of recursive algorithms. It helps to break down how a
problem is split into smaller subproblems and how much work is
done at each level of recursion. The tree structure makes it easier
to see how the work adds up across the different levels of
recursion, allowing us to determine the overall time complexity.
How a Recurrence Tree Works:
➢ Nodes: Each node represents a subproblem.
➢ Edges: The edges show the recursive division of a problem
into smaller subproblems.
➢ Levels: Each level of the tree corresponds to a recursive step
(breaking the problem into smaller parts).
➢ Costs: At each level, you calculate the total cost of the
recursive calls and any additional work (e.g., combining
results).

General Example of Recurrence:
Let's consider the recurrence relation for a typical divide-and-conquer algorithm, for instance:

T(n) = 2T(n/2) + cn

(two subproblems of half the size, plus cn work to divide and combine).
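Summing the work level by level of this tree makes the total explicit (level i has 2^i nodes, each doing c·n/2^i work):

T(n) = \sum_{i=0}^{\log_2 n} 2^i \cdot c\,\frac{n}{2^i}
     = \sum_{i=0}^{\log_2 n} cn
     = cn(\log_2 n + 1)
     = O(n \log n)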
General Use of Recurrence Trees:
• Understanding Recursive Algorithms: Recurrence trees
help to visualize how recursive algorithms work and how
work is distributed across different levels of recursion.
• Analyzing Time Complexity: They provide a way to easily
calculate the total time complexity by summing the work done
at each level.
Conclusion:
• Recurrence tree is a powerful tool to break down recursive
algorithms and calculate their time complexity. By visualizing
how the problem divides into subproblems, you can see how
the total work grows and how many levels of recursion exist.

• For divide-and-conquer algorithms like Merge Sort, the recurrence tree shows that the total time complexity is O(n log n).
===============
The recurrence relation for Binary Search describes the time
complexity of the algorithm as it recursively divides the problem
into smaller subproblems.
Binary Search Recurrence Relation
Binary Search works by repeatedly dividing the search interval in
half. If the array is sorted, you compare the middle element to the
target value:
• If the target is equal to the middle element, you return the
index of that element.
• If the target is smaller than the middle element, you
recursively search in the left half.
• If the target is larger, you search in the right half.
For an array of size n, at each step, you reduce the problem size
by half. This leads to the following recurrence relation:

T(n) = T(n/2) + O(1), with T(1) = O(1)
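A minimal C sketch of recursive Binary Search (illustrative; the sample array is an assumption):

#include <stdio.h>

// Recursive binary search over the sorted array a[lo..hi].
// Returns the index of target, or -1 if it is not present.
int binary_search(const int a[], int lo, int hi, int target)
{
    if (lo > hi)
        return -1;                    // empty interval: not found
    int mid = lo + (hi - lo) / 2;     // middle element (overflow-safe)
    if (a[mid] == target)
        return mid;
    if (target < a[mid])
        return binary_search(a, lo, mid - 1, target);  // search left half
    return binary_search(a, mid + 1, hi, target);      // search right half
}

int main(void)
{
    int a[] = {2, 3, 5, 7, 11, 13};
    printf("%d\n", binary_search(a, 0, 5, 11));   // prints 4
    return 0;
}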
Explanation of the Recurrence:
• Subproblem size: Each recursive call reduces the problem
size from n to n/2.
• Work outside recursion: The comparison between the
middle element and the target element takes constant time,
O(1), regardless of the input size.

Conclusion
The time complexity of Binary Search is O(log n).
This result makes sense because at each step, you reduce the
search space by half, leading to a logarithmic number of
comparisons.

Recurrence tree for Binary Search is a visual representation of
how the problem is broken down at each recursive step. It shows
how the problem size is halved at each level of recursion, helping
us understand how the time complexity is derived.
Binary Search Recurrence Relation
For Binary Search, the recurrence relation is:

T(n) = T(n/2) + O(1)
Recursive Tree Breakdown
Let’s visualize the recursive tree for this recurrence relation. At
each level, the problem size reduces by half, and the comparison
step takes constant time at each level.
Root Level (Level 0):
• At the top (root) level, we start with the full array of size n.
• The work done at this level is O(1) to compare the middle
element with the target.
Recursive Tree Visualization for Binary Search

Here's a simplified diagram of the recursive tree for Binary Search, where each level does O(1) work on a problem half the size:

n  →  n/2  →  n/4  →  ⋯  →  1        (about log₂ n levels, O(1) work each)
Conclusion
• Recursive Tree: The recursive tree for Binary Search shows
how the problem size is halved at each level, with constant
work O(1) performed at each level.
• Number of Levels: There are about log₂ n levels in the tree.
• Total Work: The total time complexity is O(log n), since O(1) work per level over log₂ n levels adds up to O(log n).
This tree clearly illustrates why Binary Search has logarithmic
time complexity, O(log n), since each recursive step reduces the
problem size by half.

============================

The Master Method is often applied in the analysis of recursive
algorithms, especially those that follow a divide-and-conquer
strategy.
It is particularly useful in understanding the time complexity of
algorithms in data structures that involve recursion, such as
binary search, tree traversal, and algorithms like Merge Sort,
Quicksort, and operations on heaps.

The Master Theorem provides a way to analyze the time complexity of divide-and-conquer algorithms that follow a recurrence of the form:

T(n) = a·T(n/b) + f(n),   with a ≥ 1 and b > 1

Comparing f(n) against n^(log_b a) gives three cases:

1. If f(n) = O(n^(log_b a − ε)) for some ε > 0, then T(n) = Θ(n^(log_b a)).
2. If f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a) · log n).
3. If f(n) = Ω(n^(log_b a + ε)) for some ε > 0, and a·f(n/b) ≤ c·f(n) for some c < 1, then T(n) = Θ(f(n)).

For example, Merge Sort (a = 2, b = 2, f(n) = Θ(n)) falls under case 2, giving T(n) = Θ(n log n); Binary Search (a = 1, b = 2, f(n) = Θ(1)) also falls under case 2, giving T(n) = Θ(log n).
Complexity classes P and NP
Definition of Class P (Polynomial Time):
➢ Definition: P is the class of decision problems (problems with
a yes/no answer) that can be solved in polynomial time by a
deterministic Turing machine.
➢ Polynomial Time: A problem is in P if there exists an algorithm
that can solve the problem in a number of steps that grows as a
polynomial function of the input size.
➢ In other words, for an input of size n, the algorithm's running
time is bounded by O(n^k), where k is a constant.
➢ Key Feature: P represents problems that are considered to be
efficiently solvable. This means that for large input sizes, the
time required to solve the problem grows at a reasonable rate
(as a polynomial function of the input size).
Examples of Problems in P:
➢ Searching Algorithms: Binary Search.

➢ Sorting Algorithms: Merge Sort, Quick Sort (average case).

➢ Graph Algorithms: Finding the shortest path in a graph (e.g., Dijkstra's algorithm), checking if a graph is connected.

In summary, P consists of problems for which an efficient solution exists, where "efficient" means that the solution can be computed in polynomial time relative to the size of the input.

An example of a problem in Class P (Polynomial Time) is
Binary Search.
Binary Search Problem:
• Problem: Given a sorted array of n elements, find the index of
a target element (if it exists) or determine that the element is not
in the array.
• Algorithm: Binary search works by repeatedly dividing the
search interval in half. If the target value is less than the middle
element, the search continues on the left half, otherwise on the
right half. This process is repeated until the target value is found
or the interval is empty.

Time Complexity:
➢ In each step, Binary Search reduces the size of the problem by
half. So, after the first comparison, the problem size becomes
n/2, then n/4, and so on.

➢ The time complexity is O(log n), which is logarithmic and much smaller than linear or quadratic time for large n.

➢ Since O(log n) is bounded above by a polynomial in the input size n, Binary Search belongs to the class P.

Definition of the classes NP
Class NP (Nondeterministic Polynomial Time):
• Definition: NP is the class of decision problems (problems with
a yes/no answer) for which, if the answer is "yes," a solution (or
certificate) can be verified in polynomial time by a
deterministic Turing machine.
• Nondeterministic Polynomial Time:
• The term "nondeterministic" refers to the hypothetical idea of a
nondeterministic machine that can explore multiple
possibilities simultaneously. NP problems are ones that such a
machine could solve in polynomial time. In practice, this means
that while we may not know how to solve the problem
efficiently, we can verify a solution efficiently.

Example Problems in NP:
Boolean Satisfiability Problem (SAT): Given a Boolean
expression, is there an assignment of truth values to variables that
makes the entire expression true?
If someone gives a solution (an assignment of variables), you can
check in polynomial time whether it satisfies the Boolean
expression.
Hamiltonian Path Problem: Does a path exist in a graph that
visits every vertex exactly once? If given a path, you can verify
that it visits each vertex once in polynomial time.
Subset Sum Problem: Given a set of numbers, is there a subset
whose sum equals a particular number? If given a subset, you can
easily check if the sum equals the target number in polynomial
time.
NP vs. P:
• P (Polynomial Time): Problems in P are both solvable and
verifiable in polynomial time.
• NP: Problems in NP are verifiable in polynomial time, but it
is not guaranteed that they are solvable in polynomial time
(unless P = NP).
The P vs. NP Question:
• One of the biggest open questions in computer science is
whether P = NP. This means: Can every problem for which a
solution can be verified in polynomial time also be solved in
polynomial time?

• If P = NP: This would mean that all problems that are
verifiable in polynomial time can also be solved in
polynomial time.
• If P ≠ NP: This means that some problems can be verified in
polynomial time but cannot be solved in polynomial time.
In summary, NP consists of problems for which, if given a
solution, you can verify it efficiently in polynomial time, but
finding that solution may be more difficult.

(Example: the Travelling Salesman Problem takes on the order of 2^n time to solve with known methods, yet a proposed tour is quick to verify, so it is in NP; whether every such problem can also be solved in polynomial time, i.e., "made P", is the open question. NP-hard problems are those we do not know how to make P.)
NP-hard (Nondeterministic Polynomial-time Hard) refers to a
class of problems in computational complexity theory that are at
least as difficult as the hardest problems in NP (Nondeterministic
Polynomial time). The term "hard" indicates that solving an NP-
hard problem is as challenging as solving the most difficult
problems in NP, but it does not necessarily mean the problem
itself is in NP.
Example of an NP-hard Problem:
Travelling Salesman Problem (TSP) – Optimization Version:
• Problem: Given a set of cities and distances between them,
find the shortest possible route that visits each city exactly
once and returns to the starting point.

NP-complete
NP-complete is a class of problems in computational complexity
theory that are both in NP (Nondeterministic Polynomial time)
and NP-hard. These problems are of great interest in computer
science because they represent the most challenging problems
within NP, and solving one NP-complete problem efficiently
would imply that all NP problems can be solved efficiently.
Key Points:
• If you can solve any NP-complete problem in polynomial
time, then P = NP, meaning every problem in NP can also be
solved in polynomial time.

• NP-complete problems are the hardest problems in NP, and
they are often used to understand the boundary between what
can be solved efficiently and what likely cannot.
• Most researchers believe P ≠ NP, meaning that there are no
polynomial-time solutions for NP-complete problems, but
this has not been definitively proven.
Travelling Salesman Problem (TSP) – Decision Version:
• Description: Given a set of cities and distances between them, and a number D, is there a tour that visits each city exactly once and returns to the starting city with a total distance less than or equal to D?
• Why NP-complete: The decision version of TSP is NP-complete. It is in NP because if you are given a tour, you can easily verify whether the total distance is at most D. It is NP-hard because it can be used to solve any NP problem.
• Knapsack Problem (Decision Version):
➢ Description: Given a set of items, each with a weight and a
value, and a knapsack with a weight capacity, is there a subset
of items that fits in the knapsack and achieves at least a given
total value?
➢ Why NP-complete: The decision version of the knapsack
problem is NP-complete. You can verify a solution in
polynomial time (by checking the total weight and value of
the selected items), and it is NP-hard because it can be used
to solve any NP problem.

Huffman coding
Huffman coding is a method for compressing data using variable-
length codes assigned to characters based on their frequencies.
Steps to Create Huffman Codes
Calculate Character Frequencies:
• Count how often each character appears in the input data.
Build a Priority Queue:
• Create a priority queue (min-heap) where each node
represents a character and its frequency.

Construct the Huffman Tree:
• While there is more than one node in the queue:
o Remove the two nodes with the lowest frequencies.
o Create a new internal node with these two nodes as
children and with a frequency equal to the sum of their
frequencies.
o Insert the new node back into the queue.
• The remaining node is the root of the Huffman tree.
Assign Codes:
• Traverse the Huffman tree from the root to each leaf node:
o Assign '0' for a left edge and '1' for a right edge.
o The path to each character forms its unique binary code.
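As a worked illustration (the frequencies are chosen for this example, not taken from the notes): consider six characters with frequencies A:45, B:13, C:12, D:16, E:9, F:5. Repeatedly merging the two lowest-frequency nodes gives 5 + 9 = 14, then 12 + 13 = 25, then 14 + 16 = 30, then 25 + 30 = 55, and finally 45 + 55 = 100. Reading '0' for left and '1' for right yields codes such as A = 0, C = 100, B = 101, D = 111, F = 1100, E = 1101: the most frequent character gets the shortest code.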

Job sequencing
Job sequencing is a problem in operations research and computer
science that deals with scheduling a set of jobs to optimize a
particular objective, such as minimizing total time, maximizing
profit, or meeting deadlines.
The most common version of the problem is the Job Sequencing
with Deadlines problem, which aims to maximize profit by
selecting and sequencing jobs to be completed within their
deadlines.
Problem Statement:
• You are given a set of n jobs, each with:
o A deadline by which the job must be completed.

o A profit associated with completing the job before or on
its deadline.
• The goal is to find the best sequence of jobs such that the total
profit is maximized, and no job is scheduled after its deadline.
Approach:
The problem can be solved using a greedy algorithm. Here's a
step-by-step approach:
Sort Jobs by Profit:
o Sort all the jobs in decreasing order of their profit.
Schedule Jobs:
Initialize an array to keep track of free time slots.
Iterate over the sorted jobs and place each job in the latest
available time slot before its deadline (if possible).
Calculate Profit:
o Keep track of the total profit as jobs are scheduled.
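A minimal C sketch of this greedy method (illustrative; the five jobs and their deadlines/profits are assumptions chosen for the example):

#include <stdio.h>
#include <stdlib.h>

typedef struct { int id, deadline, profit; } Job;

// Sort jobs by profit, highest first.
int by_profit_desc(const void *a, const void *b)
{
    return ((const Job *)b)->profit - ((const Job *)a)->profit;
}

int main(void)
{
    Job jobs[] = {{1, 2, 100}, {2, 1, 19}, {3, 2, 27}, {4, 1, 25}, {5, 3, 15}};
    int n = 5;
    int slot[4] = {0};               // slot[t] = id of the job run at time t (1..3)
    int total = 0;

    qsort(jobs, n, sizeof(Job), by_profit_desc);

    for (int i = 0; i < n; i++) {
        // Place job i in the latest free slot on or before its deadline.
        for (int t = jobs[i].deadline; t >= 1; t--) {
            if (slot[t] == 0) {
                slot[t] = jobs[i].id;
                total += jobs[i].profit;
                break;
            }
        }
    }
    printf("Total profit: %d\n", total);   // 142 (jobs 1, 3 and 5)
    return 0;
}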
Example: in the sketch above, scheduling jobs 1, 3, and 5 within their deadlines yields the maximum total profit of 142.
Applications:
• Task scheduling in operating systems.
• Job scheduling in manufacturing.
• Allocating resources in cloud computing environments.

Complexity:
• Sorting the jobs takes O(n log n).
• Scheduling each job takes O(n).
• Total time complexity: O(n²) in the worst case.
This basic greedy method is suitable for most simple job
sequencing problems but may need modifications for more
complex constraints or objectives.

Fractional Knapsack Problem in Greedy Methods
The Fractional Knapsack Problem is a variation of the classic
knapsack problem, and it is solved using a greedy algorithm
approach. Unlike the 0/1 Knapsack Problem, where each item
must be taken in whole or not at all, the Fractional Knapsack
Problem allows you to take fractions of an item, making it easier
to achieve the optimal solution.
Problem Statement
Given a set of items, each with a weight and a value, the goal is
to maximize the total value in a knapsack that can hold up to a
certain weight capacity. You can take any fraction of an item.
• Input:
o A list of items, each with a value v and a weight w.

o A maximum weight capacity W of the knapsack.
• Output:
o The maximum total value that can be achieved by filling
the knapsack, possibly with fractional items.
Greedy Algorithm Approach
The greedy algorithm approach works as follows:
Calculate the Value-to-Weight Ratio:
o For each item, calculate the value-to-weight ratio v/w.
This ratio indicates the value you get for each unit of
weight.
Sort Items by Value-to-Weight Ratio:

o Sort the items in descending order based on their value-to-
weight ratio. This ensures that the item with the highest
value per unit weight is considered first.
Fill the Knapsack:
o Start adding items to the knapsack in the sorted order:
▪ If the entire item can fit into the knapsack, take it
completely.
▪ If only part of the item can fit, take as much as possible
(a fraction of the item).
o Continue until the knapsack is full or all items have been
considered.
Calculate the Total Value:

o Sum the total value of the items (or fractions thereof)
included in the knapsack.
Algorithm Steps
Calculate the value-to-weight ratio for each item.
Sort all items based on the ratio in descending order.
Initialize the total value as 0 and the current weight of the
knapsack as 0.
For each item in the sorted list:
o If adding the entire item won't exceed the capacity, add it
and update the total value.
o If adding the entire item would exceed the capacity, add
the fraction that fits, update the total value, and break the
loop.
Return the total value.
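A minimal C sketch of these steps (illustrative; the three items and the capacity of 50 are assumptions):

#include <stdio.h>
#include <stdlib.h>

typedef struct { double value, weight; } Item;

// Sort by value-to-weight ratio, highest first.
int by_ratio_desc(const void *a, const void *b)
{
    double ra = ((const Item *)a)->value / ((const Item *)a)->weight;
    double rb = ((const Item *)b)->value / ((const Item *)b)->weight;
    return (rb > ra) - (rb < ra);
}

int main(void)
{
    Item items[] = {{60, 10}, {100, 20}, {120, 30}};
    int n = 3;
    double capacity = 50.0, total = 0.0;

    qsort(items, n, sizeof(Item), by_ratio_desc);

    for (int i = 0; i < n && capacity > 0; i++) {
        if (items[i].weight <= capacity) {          // take the whole item
            total += items[i].value;
            capacity -= items[i].weight;
        } else {                                    // take the fraction that fits
            total += items[i].value * (capacity / items[i].weight);
            capacity = 0;
        }
    }
    printf("Maximum value: %.1f\n", total);         // 240.0
    return 0;
}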
Time Complexity of the Fractional Knapsack Problem Using Greedy Methods
The time complexity of the Fractional Knapsack problem using a
greedy algorithm primarily depends on the sorting step and the
process of filling the knapsack. Here’s a detailed breakdown
along with example derivations.
Steps of the Greedy Algorithm
Calculate Value-to-Weight Ratio:
o For each item, calculate the ratio value/weight.
Sort the Items:

o Sort all items based on their value-to-weight ratio in
descending order.
Select Items for the Knapsack:
o Iterate through the sorted items, adding them fully or
partially to the knapsack until the capacity is reached.
Time Complexity Analysis
Calculating the Value-to-Weight Ratio:
o This step involves calculating the ratio for each item,
which takes O(n) time, where n is the number of items.
Sorting the Items:
o Sorting the items based on their value-to-weight ratio
takes O(n log n) time. This is the most computationally
expensive step in the algorithm.
Selecting Items for the Knapsack:
o Once the items are sorted, the process of iterating through
them and adding them to the knapsack takes O(n) time.
Overall Time Complexity:
• The dominant term in the time complexity is from the sorting
step, which is O(n log n). Therefore, the overall time
complexity of the Fractional Knapsack problem using a
greedy approach is: O(n log n)
Conclusion
The Fractional Knapsack problem is well-suited to a greedy
algorithm because it allows for selecting parts of items, enabling
straightforward maximization of value. This makes it

computationally efficient and straightforward to implement
compared to its 0/1 counterpart.
Dynamic programming and greedy algorithms are both used to
solve optimization problems, but they differ fundamentally in
their approach and when they are applicable.
Greedy Method
• Approach: It builds a solution piece by piece, always
choosing the next piece that offers the most immediate
benefit. The choice is made based on some locally optimal
criterion.
• Optimal Substructure: A greedy choice at every step should
lead to an optimal solution.
• Use Case: It is typically used for problems where a local
choice can lead to a global optimum, such as in activity
selection, minimum spanning trees (Kruskal's and Prim's

algorithms), or coin change problems with denominations that
follow certain rules.
The time complexity of greedy methods varies depending on the
specific problem and implementation. In general, greedy
algorithms have time complexities that are better or similar to
those of dynamic programming solutions because they do not
involve solving overlapping subproblems multiple times. The key
factor is often the sorting step or selection criteria used to make
the "greedy choice."
General Factors Affecting Time Complexity
Sorting: Many greedy algorithms involve sorting the input,
which has a time complexity of O(n log n).
Selection of Greedy Choice: This is typically O(1) or O(n),
depending on how the choice is made.
Number of Iterations: This is usually linear (O(n)) as the
algorithm processes each element exactly once.

Dynamic Programming (DP)


• Approach: It breaks a problem into overlapping subproblems
and solves each subproblem just once, storing its solution
(usually in a table). The key idea is to use these solutions to
build up the solution to the overall problem.
• Optimal Substructure: The problem can be solved optimally
by breaking it into subproblems that can also be solved
optimally.
• Overlapping Subproblems: The same subproblems are
solved multiple times.

• Use Case: It is typically used when the problem has
overlapping subproblems and optimal substructure, such as
the Fibonacci sequence, shortest paths in graphs (Bellman-
Ford algorithm), or knapsack problem.

Dynamic Programming Solution:
• Create a table where rows represent items and columns
represent weights.
• Fill in the table with the maximum value that can be obtained
for each weight capacity.
• The solution for the entire problem is found at the last cell of
the table.

The time complexity of dynamic programming (DP) algorithms
depends on the number of subproblems and the time required to
compute each subproblem. It can vary widely based on the
specific problem being solved. Here’s a structured way to think
about it:
General Formula for Time Complexity
Number of Subproblems (P): This is the total number of distinct
subproblems that need to be solved.
Time per Subproblem (T): This is the time required to solve
each subproblem, which often involves looking up values from a
table and combining them.
Total Time Complexity = Number of Subproblems (P) × Time
per Subproblem (T)

Analyzing Different Problems
Fibonacci Sequence
Recursive Approach: Exponential time complexity, O(2^n).
Dynamic Programming Approach:
Number of Subproblems: n (one for each Fibonacci number).
Time per Subproblem: O(1) (just a table lookup and addition).
Total Time Complexity: O(n) × O(1) = O(n).
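A minimal C sketch of the bottom-up DP (illustrative; it assumes the convention F(0) = 0, F(1) = 1 and compresses the table to its last two cells):

// Bottom-up DP for Fibonacci: n subproblems, O(1) work each => O(n).
long long fib_dp(int n)
{
    if (n <= 1) return n;              // base cases: F(0) = 0, F(1) = 1
    long long prev = 0, curr = 1;      // the DP table compressed to two cells
    for (int i = 2; i <= n; i++) {
        long long next = prev + curr;  // F(i) = F(i-1) + F(i-2): lookup + add
        prev = curr;
        curr = next;
    }
    return curr;
}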
Matrix Chain Multiplication
o Problem: Determine the most efficient way to multiply a
given sequence of matrices.
o Dynamic Programming Approach:

▪ Number of Subproblems: O(n^2) (one for each pair
of matrices).
▪ Time per Subproblem: O(n) (evaluating the split
point for multiplication).
▪ Total Time Complexity: O(n^3).
o Space Complexity: O(n^2) for the DP table.

Dynamic programming (DP) is a method used in computer
science and mathematics to solve complex problems by breaking
them down into simpler overlapping subproblems. It is
particularly useful for optimization problems, where the goal is
to find the best solution among many possible options.
Fibonacci sequence, the sequence of numbers 1, 1, 2, 3, 5, 8, 13,
21, …, each of which, after the second, is the sum of the two
previous numbers; that is, the nth Fibonacci number satisfies F(n) = F(n−1) + F(n−2).

The Matrix Chain Multiplication problem is a classic example
of dynamic programming. The goal is to determine the most
efficient way to multiply a given sequence of matrices, which
minimizes the number of scalar multiplications.
Time Complexity:
The time complexity of solving the Matrix Chain Multiplication
problem using dynamic programming is O(n³), where n is the
number of matrices.
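A minimal C sketch of this DP (illustrative; the dimension array p[] and the four-matrix chain are assumptions):

#include <stdio.h>
#include <limits.h>

#define N 4   // number of matrices (assumed for the example)

// p[] holds dimensions: matrix A_i is p[i-1] x p[i], for i = 1..N.
// m[i][j] = minimum scalar multiplications to compute A_i ... A_j.
int matrix_chain(const int p[N + 1])
{
    int m[N + 1][N + 1] = {0};                 // m[i][i] = 0: a single matrix

    for (int len = 2; len <= N; len++) {       // chain length
        for (int i = 1; i + len - 1 <= N; i++) {
            int j = i + len - 1;
            m[i][j] = INT_MAX;
            for (int k = i; k < j; k++) {      // try every split point
                int cost = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j];
                if (cost < m[i][j])
                    m[i][j] = cost;
            }
        }
    }
    return m[1][N];
}

int main(void)
{
    int p[N + 1] = {10, 20, 30, 40, 30};  // A1:10x20, A2:20x30, A3:30x40, A4:40x30
    printf("%d\n", matrix_chain(p));      // 30000 for this chain
    return 0;
}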

(Note: take an example from other resources.)

Multistage graph
A multistage graph is a directed graph where vertices are
grouped into stages, and edges only connect vertices from one
stage to the next. The goal in many problems involving
multistage graphs is to find the shortest path from the start
vertex in the first stage to the end vertex in the last stage.
Time Complexity for Solving Multistage Graph Problems
A common algorithm used to solve multistage graph problems is
Dynamic Programming. Here's an outline of the process and its
time complexity:
Dynamic Programming Approach:
o You start at the last stage and work backward to the first
stage.
o For each vertex in the current stage, you calculate the
minimum cost to reach the destination using the costs
already calculated for the next stage.
o Store the result for each vertex in the current stage to
avoid recalculating the same values.
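A minimal backward-DP sketch in C (illustrative; the vertex count, the topological numbering with 0 as source and V−1 as sink, and the INF sentinel are all assumptions):

#define V 8                 // total vertices; 0 = source, V-1 = sink (assumed)
#define INF 1000000         // sentinel for "no edge"

// cost[i][j] = weight of edge i -> j, or INF if absent. Vertices are
// numbered so that edges only run from lower to higher stages.
int multistage_shortest(int cost[V][V])
{
    int dist[V];
    dist[V - 1] = 0;                       // the sink costs nothing to reach
    for (int i = V - 2; i >= 0; i--) {     // work backward, stage by stage
        dist[i] = INF;
        for (int j = i + 1; j < V; j++)    // try every outgoing edge of i
            if (cost[i][j] != INF && cost[i][j] + dist[j] < dist[i])
                dist[i] = cost[i][j] + dist[j];
    }
    return dist[0];                        // minimum cost from source to sink
}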
Let:
V be the number of vertices in the graph.
E be the number of edges in the graph.
k be the number of stages in the graph.
In the worst case:
• You will process each vertex once, checking each outgoing
edge.
• For each edge, you calculate the minimum cost to reach the
destination, which takes constant time.
The total time complexity can be analyzed as:
• O(V + E):
o Processing each vertex takes O(V) and evaluating each edge takes O(E), hence O(V + E) overall.

In a fully connected multistage graph (where each vertex in one stage is connected to all vertices in the next stage), the time complexity can also be written as O(k · n²), where n is the number of vertices in each stage, and k is the number of stages.

Summary:
• The time complexity of solving a multistage graph problem
using dynamic programming is typically O(V + E).

In short: T(n) = O(V + E) in general, since each vertex's value is propagated along every outgoing edge exactly once; for a fully connected multistage graph this becomes O(k · n²).

The 0/1 Knapsack problem is a classic problem in combinatorial
optimization, and the dynamic programming approach is often
used to solve it efficiently.
Problem Recap:
In the 0/1 Knapsack problem, you are given:
• A set of n items, each with a weight w[i] and a value v[i].
• A knapsack with a maximum capacity W.
The goal is to maximize the total value of items placed in the
knapsack without exceeding its capacity. You can either include
an item or exclude it (hence the name "0/1 Knapsack").

Dynamic Programming Approach:
The dynamic programming solution involves building a table
dp[i][j] where:
• i represents the first i items.
• j represents the knapsack capacity from 0 to W.
• dp[i][j] holds the maximum value that can be achieved using
the first i items with a knapsack of capacity j.
Transition:
For each item i:
• If the weight of item i is less than or equal to the current
capacity j, you have two choices:

Exclude the item: The value remains the same as dp[i-1][j].
Include the item: The value is the value of item i plus the value
of the remaining capacity (dp[i-1][j-w[i]]).
Thus, the recurrence relation is:

dp[i][j] = dp[i−1][j]                                    if w[i] > j
dp[i][j] = max(dp[i−1][j], v[i] + dp[i−1][j−w[i]])       if w[i] ≤ j
Initialization:
• dp[0][j] = 0 for all j (with 0 items, the value is 0).
• dp[i][0] = 0 for all i (with a knapsack of 0 capacity, the value
is 0).
Time Complexity:
Table Size:
o The table dp has dimensions (n+1) x (W+1), where n is the
number of items, and W is the capacity of the knapsack.
Filling the Table:
o For each item i (from 1 to n), you iterate through each
capacity j (from 0 to W), and for each cell dp[i][j], you

perform a constant-time operation (comparing and
possibly updating values).
Thus, the time complexity for filling the table is O(n * W), where:
• n is the number of items.
• W is the capacity of the knapsack.
Space Complexity:
• Standard DP Table: The space complexity is also O(n * W)
since the table requires storing values for n+1 items and W+1
capacities.
• Space Optimization: It’s possible to reduce the space
complexity to O(W) by using a 1D array. Instead of
maintaining the entire table, you only need to keep track of

the current and previous rows. This reduces space usage
significantly but does not change the time complexity.
Summary:
• Time Complexity: O(n * W).
• Space Complexity: O(n * W) (or O(W) with optimization).
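A minimal C sketch of the table-filling described above (illustrative; the three items and the capacity are assumptions):

#include <stdio.h>

#define N 3            // number of items (assumed for the example)
#define W 50           // knapsack capacity

int max(int a, int b) { return a > b ? a : b; }

int knapsack(const int w[N + 1], const int v[N + 1])
{
    static int dp[N + 1][W + 1];   // dp[i][j]: best value, first i items, capacity j
    for (int i = 1; i <= N; i++) {
        for (int j = 0; j <= W; j++) {
            dp[i][j] = dp[i - 1][j];                     // exclude item i
            if (w[i] <= j)                               // include item i if it fits
                dp[i][j] = max(dp[i][j], v[i] + dp[i - 1][j - w[i]]);
        }
    }
    return dp[N][W];
}

int main(void)
{
    int w[N + 1] = {0, 10, 20, 30};        // 1-indexed weights
    int v[N + 1] = {0, 60, 100, 120};      // 1-indexed values
    printf("%d\n", knapsack(w, v));        // 220 (items 2 and 3)
    return 0;
}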

(Example: take the example from class or other resources.)

The Branch and Bound method is a popular algorithm used to
solve combinatorial optimization problems, and it is particularly
effective for problems like the 0/1 Knapsack problem. The 0/1
Knapsack problem is a classic problem where you are given a set
of items, each with a weight and a value, and you need to
determine which items to include in a knapsack of limited
capacity to maximize the total value without exceeding the
capacity.
Problem Definition: 0/1 Knapsack Problem
Given:
• A set of n items, each with:
o Weight w_i
o Value v_i

• A knapsack with maximum capacity W.
The goal is to select a subset of items such that:
• The total weight of the selected items is less than or equal to
W.
• The total value of the selected items is maximized.
In the 0/1 Knapsack problem, each item can either be included
(1) or excluded (0) from the knapsack, hence the name "0/1".
Branch and Bound Approach to the 0/1 Knapsack Problem
The Branch and Bound method for solving the 0/1 Knapsack
problem explores subsets of the solution space in a tree structure
and uses bounds to prune suboptimal solutions early.

Key Concepts
State Space Tree:
o Each node in the tree represents a partial solution (a subset
of items selected or excluded).
o The root node represents no items selected, and each child
node represents a decision about including or excluding
an item.
o The depth of the tree corresponds to the number of items
considered.
Bounding:
o At each node, we compute an upper bound on the
maximum value that can be obtained from that point

onward. This helps in pruning branches that cannot lead
to an optimal solution.
o If a node’s bound is lower than the current best solution,
we can safely discard (prune) that node and its children.
Branching:
o From each node, two branches are created:
▪ One branch represents the decision to include the
current item.
▪ The other branch represents the decision to exclude the
current item.
Pruning:

o If the weight of the selected items in the current node
exceeds the capacity W, the node is pruned because it
cannot lead to a feasible solution.
o If the upper bound of a node is lower than the value of the
best solution found so far, that node is also pruned.
Steps for Solving the 0/1 Knapsack Problem Using Branch and
Bound
Initialization:
o Sort items based on their value-to-weight ratio v_i / w_i in descending order. This sorting helps in estimating an upper bound effectively.
Starting Point:

o Begin with the root node, which corresponds to
considering no items (an empty knapsack).
Upper Bound Calculation:
o For each node (partial solution), calculate an upper bound.
This is done using a greedy approach by filling the
knapsack as much as possible with remaining items.
o If there is a fractional item (i.e., the next item in line
cannot fully fit in the remaining capacity), its fractional
value is added to the bound, but no fractional items are
actually selected (because this is a 0/1 problem, not a
fractional knapsack problem).
Recursive Branching:
o For each item, branch into two nodes:

▪ One node represents including the item in the
knapsack.
▪ The other node represents excluding the item.
o Move to the next item and repeat the process.
Bounding and Pruning:
o At each node, calculate the total weight and value of the
items selected so far.
o If the total weight exceeds the knapsack's capacity W,
prune that node.
o If the upper bound of the node is less than or equal to the
current best solution, prune the node.
o Otherwise, explore the node further by branching.
Backtracking:
o Backtrack to explore other branches when one branch
completes or is pruned.
Optimal Solution:
o Keep track of the best solution found during the process.
Once all nodes have been explored or pruned, the best
solution is the optimal one.
Example of Branch and Bound for 0/1 Knapsack
Conclusion
The Branch and Bound method provides an efficient way to solve
the 0/1 Knapsack problem by systematically exploring potential
solutions while pruning those that cannot lead to an optimal
result. It leverages upper bounds to reduce the solution space and
thus avoid unnecessary computations.

Time complexity
The time complexity of the Branch and Bound algorithm for the
0/1 Knapsack problem is hard to pin down exactly because it
depends heavily on the structure of the solution space and the
efficiency of the bounding and pruning steps. However, we can
provide some insights into its worst-case, best-case, and average-
case complexities.
Worst-Case Time Complexity
In the worst case, the Branch and Bound algorithm explores all
possible solutions before arriving at the optimal solution. The
number of possible solutions to the 0/1 Knapsack problem
corresponds to all subsets of items, which is 2^n, where n is the number of items. Thus, in the worst case:
• Worst-case time complexity: O(2^n).
This occurs when the algorithm is unable to prune many
branches, and it ends up evaluating almost all possible
combinations of items. This time complexity is exponential
because it must explore every combination of including or
excluding each item.
Best-Case Time Complexity
In the best case, the algorithm can prune a significant portion of
the search tree early on, dramatically reducing the number of
nodes it needs to explore. If the bounding function is very tight
and quickly leads to the optimal solution:
• Best-case time complexity: O(n).

This is because if the first solution found is optimal and most
branches can be pruned, the algorithm will only need to explore
a few nodes before terminating. In this case, the algorithm
essentially behaves like a greedy or heuristic approach that finds
the solution quickly.
Average-Case Time Complexity
The average-case time complexity of the Branch and Bound
algorithm is somewhere between the best and worst cases. In
practice, the bounding and pruning techniques often allow the
algorithm to prune a large portion of the search tree, so it doesn't
need to explore all 2^n possibilities. The efficiency of the bounding
function, as well as the nature of the items (values and weights),
significantly influence the performance.

• Average-case time complexity: O(2^n) in theory, but typically
much faster due to effective pruning.
In most real-world applications, Branch and Bound is more
efficient than the brute-force approach, but it can still take
exponential time in cases where the pruning is not effective.
Comparison with Other Algorithms
• Dynamic Programming: The 0/1 Knapsack problem can also
be solved using dynamic programming, which has a time
complexity of O(nW), where W is the capacity of the
knapsack. This is often more efficient than the worst-case
Branch and Bound complexity, especially when n is large and
W is small.
• Greedy Algorithm: The greedy algorithm is not applicable for
the 0/1 Knapsack problem as it does not guarantee an optimal
solution. However, it is efficient with O(n log n) time
complexity when sorting items by value-to-weight ratio for
fractional knapsack.

The Branch and Bound algorithm is used to solve
optimization problems such as the 15-puzzle problem, where
the goal is to move tiles on a 4x4 grid into a particular order
with the fewest number of moves. Here's a general
explanation of how the Branch and Bound approach is applied
to the 15-puzzle problem, along with an example of how the
game tree might look.
Problem Setup
The 15-puzzle consists of a 4x4 grid with 15 numbered tiles
and one empty space. The goal is to move the tiles in such a
way that the puzzle reaches its goal configuration (e.g., tiles
are arranged in ascending order with the empty space at the
bottom right corner).
Key Components
• State: The current arrangement of the tiles on the grid.
• Goal State: The desired configuration of the puzzle.
• Cost Function (g(n)): The number of moves made so far to
reach the current state.
• Heuristic Function (h(n)): An estimate of the number of
moves required to reach the goal state from the current state.
Common heuristics for this problem include:
o Manhattan Distance: The sum of the distances of each
tile from its goal position.
• Evaluation Function (f(n)): The total estimated cost for reaching the goal from the initial state through the current state. It's computed as: f(n) = g(n) + h(n).
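A minimal C sketch of the Manhattan-distance heuristic (illustrative; the row-major board encoding with 0 for the blank is an assumption):

#include <stdlib.h>

// Manhattan-distance heuristic h(n) for the 15-puzzle.
// board[16] holds tiles 1..15 and 0 for the blank, row-major on a 4x4 grid.
// Tile t belongs at index t-1; the blank is not counted.
int manhattan(const int board[16])
{
    int h = 0;
    for (int i = 0; i < 16; i++) {
        int t = board[i];
        if (t == 0) continue;              // skip the blank
        int goal = t - 1;                  // goal index of tile t
        h += abs(i / 4 - goal / 4)         // row distance
           + abs(i % 4 - goal % 4);        // column distance
    }
    return h;
}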
Branch and Bound Method
Branch and Bound systematically explores the possible
moves (branches) and evaluates the cost function. It keeps
track of the best solution found so far (bound) and prunes
branches that are guaranteed to lead to worse solutions than
the current bound. The algorithm uses a priority queue to
explore the most promising states first (those with the lowest
evaluation function value).
Steps:
Start with the initial state (root node) and add it to a priority
queue.
Explore the current node:

o Move the blank tile in the possible directions (up, down,
left, right) to generate new states.
o For each new state, compute the evaluation function
f(n)=g(n)+h(n).
o Add each new state to the priority queue if it hasn't been
visited before or if it's a better path than previously
recorded.
Branching: Continue exploring the next state with the lowest f(n).
Bounding: If a state has a higher f(n) than the current best solution (bound), discard it (prune the branch).

Goal: The algorithm terminates when the goal state is
reached, and the path is reconstructed.
Example of a Game Tree
Here is a simple representation of a 15-puzzle game tree using
Branch and Bound.
(Handled in the classroom.)
Advantages of Branch and Bound
• It explores the solution space more efficiently than brute-force
methods by avoiding redundant or unnecessary paths.
• The heuristic helps guide the search toward the goal faster.
Conclusion
Branch and Bound is a powerful technique for solving the 15-
puzzle problem. By using the heuristic function to prioritize
promising states and pruning unpromising ones, it minimizes
the number of moves required to reach the solution. The game
tree represents the exploration of possible states, and the
algorithm continues until the goal state is achieved.
Time complexity of Branch and Bound 15 Puzzle
The time complexity of the Branch and Bound algorithm for
solving the 15-puzzle problem depends heavily on the
efficiency of the heuristic function used, the branching factor
of the puzzle, and how many nodes are pruned during the
search. However, in general, the time complexity of solving
the 15-puzzle is quite high because it involves exploring a
large search space.

Here’s a breakdown of factors that influence the time
complexity of the 15-puzzle problem using the Branch and
Bound approach:
1. State Space Size
• The 15-puzzle has a search space of 16!/2 ≈ 10^13 possible configurations. This is because the puzzle has 16 positions (15 numbered tiles and one empty space), and only half of these configurations are solvable, leading to the division by 2.
2. Branching Factor
• The branching factor of the puzzle depends on the position
of the empty space. On average, each move has 2–4 possible
directions (up, down, left, right) where the blank space can
move.

• On average, the branching factor is around 3 since the empty
space usually has three possible moves in most configurations
(except near edges or corners).
3. Depth of the Optimal Solution
• The depth of the optimal solution varies but is typically
around 80–100 moves for most random configurations of the
15-puzzle.
4. Heuristic Impact
• Good heuristics like the Manhattan Distance significantly
reduce the search space, but the problem remains difficult due
to the large state space.
• The quality of the heuristic directly impacts the number of
nodes that need to be explored. A more accurate heuristic will

guide the search faster toward the solution and reduce the
number of nodes expanded.
5. Worst-Case Time Complexity
In the worst case, the algorithm would need to explore a large
number of nodes before finding the optimal solution. Without
effective pruning, the time complexity would be related to the
number of nodes explored, which can be expressed as O(b^d), where:

• b is the branching factor (approximately 3 for the 15-puzzle),
• d is the depth of the solution (around 80–100 moves for typical puzzles).
This exponential complexity in the worst case can be very large: for example, 3^80 ≈ 10^38.
6. Average Case
In practice, the Branch and Bound algorithm with a good
heuristic (like Manhattan Distance) usually explores
thousands to millions of nodes rather than the full search
space, depending on the specific puzzle configuration and
how close the heuristic is to the actual cost.
For instance, on average:
• The search space might be reduced to millions of states or
fewer due to the pruning, making the average time complexity
closer to O(n log n) in well-behaved cases, though this is still
quite high for large puzzles.

Conclusion
• Worst-case time complexity: O(3^d), where d is the depth of the optimal solution.
• Average-case time complexity: Significantly reduced with
good heuristics, but still exponential in nature, often requiring
the exploration of thousands to millions of nodes.
Thus, while the Branch and Bound algorithm can find an
optimal solution to the 15-puzzle problem, its time
complexity can be very large without an effective heuristic to
prune the search tree.

The time complexity formula O(b^d) in search algorithms like
Branch and Bound is derived based on the structure of the
search tree and the depth at which the solution is found. Let's
break down the derivation step-by-step:
1. Understanding the Search Tree
In a search problem like the 15-puzzle:
• Root node: The initial configuration of the puzzle.
• Children of a node: Each possible move from the current
configuration generates a new node in the search tree.
• Depth (d): The number of moves required to reach the
solution from the root node.

• Branching factor (b): The average number of child nodes
generated from any given node, which corresponds to the
number of possible moves from a given configuration. For the
15-puzzle, the branching factor b is approximately 3 because
the blank tile can move in 2–4 directions on average.
2. Nodes at Each Level of the Tree
At each level of the search tree, the number of nodes increases
exponentially, depending on the branching factor b.
• Level 0 (Root Level): There is only 1 node (the initial state
of the puzzle).
• Level 1: Each move generates b new nodes from the root.
Therefore, at level 1, there are b nodes.

• Level 2: Each of the b nodes at level 1 can generate b new nodes. Therefore, at level 2, there are b² nodes.
• Level 3: Each of the b² nodes at level 2 can generate b new nodes. Therefore, at level 3, there are b³ nodes.
Continuing this process, at level d (where d is the depth at which the solution is found), the total number of nodes is b^d.
3. Total Number of Nodes Explored
To calculate the total number of nodes explored by the algorithm, we sum up the number of nodes at all levels from 0 to d (where d is the depth of the solution).
• Nodes at level 0: 1 (root node).
• Nodes at level 1: b.
• Nodes at level 2: b².
• Nodes at level 3: b³.
• ...
• Nodes at level d: b^d.
The total number of nodes explored, N, can be expressed as the sum of the nodes at each level:

N = 1 + b + b² + ⋯ + b^d = (b^(d+1) − 1) / (b − 1) = O(b^d)
N-Queens problem
The N-Queens problem is a classic combinatorial problem in
computer science and mathematics, where the goal is to place
N queens on an N×N chessboard in such a way that no two
queens threaten each other.
This means that no two queens can share the same row,
column, or diagonal.
Problem Statement:
Place N queens on an N×N chessboard so that:
No two queens are in the same row.
No two queens are in the same column.

No two queens are on the same diagonal (both main diagonal
and anti-diagonal).
Backtracking Approach:
The most common way to solve the N-Queens problem is by
using backtracking. Backtracking systematically searches
for solutions by placing queens one by one in each column
and checking if the placement is valid.
Algorithm Steps:
Start with the first column and try to place a queen in any
row of that column.

Check for conflicts:
If the queen placement is safe (i.e., no other queens threaten it),
move to the next column and try placing another queen.
If placing a queen in the current row and column leads to a
conflict, try the next row in the same column.
Backtrack:
o If no valid placement can be found in the current column
(i.e., no row works), backtrack to the previous column and
try a different row for the previously placed queen.
Continue this process until either:
o All queens are placed safely (a solution is found).
o All possibilities are exhausted (if no solution exists for that N).
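A minimal C sketch of this backtracking scheme (illustrative; it places one queen per row, the mirror image of the column-by-column description, and assumes N = 8):

#include <stdio.h>
#include <stdbool.h>

#define N 8   // board size (assumed for the example)

int col_of[N];   // col_of[r] = column of the queen placed in row r

// Check that a queen at (row, col) conflicts with no earlier row.
bool safe(int row, int col)
{
    for (int r = 0; r < row; r++) {
        if (col_of[r] == col) return false;            // same column
        if (col_of[r] - r == col - row) return false;  // same diagonal
        if (col_of[r] + r == col + row) return false;  // same anti-diagonal
    }
    return true;
}

// Place queens row by row; backtrack when no column works.
bool solve(int row)
{
    if (row == N) return true;            // all queens placed
    for (int col = 0; col < N; col++) {
        if (safe(row, col)) {
            col_of[row] = col;
            if (solve(row + 1)) return true;
            // otherwise: backtrack and try the next column
        }
    }
    return false;
}

int main(void)
{
    if (solve(0))
        for (int r = 0; r < N; r++)
            printf("row %d -> col %d\n", r, col_of[r]);
    return 0;
}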
Time Complexity of the N-Queens Problem Using Backtracking
To analyze the time complexity of the N-Queens problem using backtracking,
we need to break down how the algorithm systematically explores all possible
configurations of placing N queens on an N×N chessboard.

Basic Idea of the N-Queens Problem:
• You need to place N queens on an N×N chessboard in such a
way that no two queens can attack each other (i.e., no two
queens can be placed in the same row, column, or diagonal).
• The backtracking algorithm attempts to place queens column
by column, row by row, and checks if each placement is valid.
If a placement is valid, it moves on to the next column, and if
not, it backtracks to try a different row.
• The worst-case time complexity of solving the N-Queens problem using backtracking is O(N!), since the first queen has up to N choices, the second at most N−1, and so on: N × (N−1) × (N−2) × ⋯ × 1 = N!.

Graph Coloring Problem Using Backtracking
The Graph Coloring Problem is a classic combinatorial
optimization problem.
The goal is to assign colors to the vertices of a graph such that:
1. No two adjacent vertices share the same color.
2. The number of colors used is minimized.
This problem can be solved using backtracking, where we try
assigning colors to the vertices one by one and check if the current
assignment is valid.
Problem Definition
Given a graph and m colors, determine if it is possible to color the vertices using at most m colors such that no two adjacent vertices share the same color.

Steps of the Backtracking Algorithm:
1. Start with the first vertex and assign a color.
2. Check if the assignment is valid, i.e., the assigned color is not the same as the color of any adjacent vertex.
3. If the assignment is valid, recur to color the next vertex.
4. If the current color assignment does not lead to a solution, backtrack by removing the color assignment and try another color.
5. Repeat the process until either all vertices are colored, or it is determined that no valid coloring exists.
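A minimal C sketch of these steps (illustrative; the adjacency-matrix representation and vertex count are assumptions):

#include <stdbool.h>

#define V 4   // number of vertices (assumed for the example)

// Does giving `vertex` color c conflict with any already-colored neighbor?
bool safe_color(bool adj[V][V], const int color[V], int vertex, int c)
{
    for (int u = 0; u < V; u++)
        if (adj[vertex][u] && color[u] == c)
            return false;
    return true;
}

// Try colors 1..m on vertices vertex..V-1; color[v] == 0 means uncolored.
bool color_graph(bool adj[V][V], int color[V], int vertex, int m)
{
    if (vertex == V) return true;              // every vertex colored
    for (int c = 1; c <= m; c++) {
        if (safe_color(adj, color, vertex, c)) {
            color[vertex] = c;
            if (color_graph(adj, color, vertex + 1, m))
                return true;
            color[vertex] = 0;                 // backtrack: undo, try next color
        }
    }
    return false;                              // no color works for this vertex
}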

Time Complexity
The time complexity of the backtracking solution depends on the number of vertices V and the number of colors m. In the worst case, for each vertex, the algorithm tries all m colors. If there are V vertices, the total number of colorings that the algorithm needs to try is:

O(m^V)
Pattern in the context of string matching algorithms refers to the
substring or sequence of characters that you are searching for
within a larger string, known as the text.
The goal of string matching algorithms is to determine whether
the pattern exists in the text and, if so, at which positions it occurs.
Example:
Let’s consider the following:
• Text T="this is a simple example "
• Pattern P="simple"
In this case:
• The pattern is "simple", which is a sequence of characters
you are looking for in the text.
• The text is "this is a simple example", which is the larger
sequence where you search for the pattern.
String Matching:
Using a string matching algorithm (such as the Naive String
Matching Algorithm or Knuth-Morris-Pratt), you would try to
find the occurrences of the pattern "simple" in the text.
Visualization:
• Text (T): "this is a simple example"
• Pattern (P): "simple"
Here, the pattern "simple" occurs at position 10 in the text.
Applications of Patterns:
• Text searching: Finding words or substrings in documents.
• DNA sequence analysis: Searching for genetic sequences
within a longer DNA string.
• Log file analysis: Finding specific error codes or messages in
large logs.
• Spam filtering: Identifying patterns that indicate spam
content in emails.
The pattern is essentially the "target" you are trying to locate
within a larger body of data (the text).
The Naive String Matching Algorithm is a straightforward
approach to the string-matching problem, where the goal is to find
all occurrences of a pattern P within a larger text T.
It is simple but can be inefficient in cases of large texts or patterns
due to its brute-force nature. Here's a step-by-step explanation of
how it works:
Steps of the Naive String Matching Algorithm:
1. Initialization:
o Let n be the length of the text T and m be the length of the
pattern P.
o Loop through all possible starting positions in T where the
pattern P could match.

2. Pattern Comparison:
o For each starting position i in T, check if the substring of
T starting at position i and extending for m characters
matches the pattern P. This means you compare T[i] to
P[0], T[i+1] to P[1], and so on until either all characters
match or a mismatch is found.
3. Match:
o If the entire substring T[i:i+m] matches the pattern P,
record the starting index i as a match.
4. Shift and Repeat:
o Move to the next starting position in T (increment i) and
repeat the comparison process until you have checked all
possible positions (i.e., until i + m > n).
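A minimal C sketch of these steps (illustrative; it prints every match index and reuses the text and pattern from the earlier example):

#include <stdio.h>
#include <string.h>

// Naive string matching: report every position i where P occurs in T.
void naive_search(const char *T, const char *P)
{
    int n = strlen(T), m = strlen(P);
    for (int i = 0; i + m <= n; i++) {        // every candidate start
        int j = 0;
        while (j < m && T[i + j] == P[j])     // compare character by character
            j++;
        if (j == m)                           // full match
            printf("match at index %d\n", i);
    }
}

int main(void)
{
    naive_search("this is a simple example", "simple");  // match at index 10
    return 0;
}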

Time Complexity:
• The worst-case time complexity is O((n - m + 1) * m), where
n is the length of the text, and m is the length of the pattern.
This occurs when the algorithm has to check every character
in the text for every possible starting position.

Knuth-Morris-Pratt (KMP) String Matching Algorithm
The Knuth-Morris-Pratt (KMP) String Matching Algorithm
is an efficient algorithm used for finding occurrences of a pattern
within a text. Unlike the Naive String Matching Algorithm, which
may have to backtrack and start comparisons again after
mismatches, the KMP algorithm avoids unnecessary
comparisons by preprocessing the pattern.
The key idea behind KMP is to preprocess the pattern to create a
Longest Prefix Suffix (LPS) array (also called a failure

function) that helps to determine how much of the pattern can be
reused after a mismatch.
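A minimal C sketch of the LPS preprocessing and the scan (illustrative; the example strings in main are assumptions):

#include <stdio.h>
#include <string.h>

// Build the LPS array for P: lps[i] = length of the longest proper
// prefix of P[0..i] that is also a suffix of it.
void build_lps(const char *P, int m, int lps[])
{
    int len = 0;                  // length of the current matched prefix
    lps[0] = 0;
    for (int i = 1; i < m; ) {
        if (P[i] == P[len]) {
            lps[i++] = ++len;
        } else if (len > 0) {
            len = lps[len - 1];   // fall back to a shorter border
        } else {
            lps[i++] = 0;
        }
    }
}

// KMP search: scan T once, reusing the LPS array after mismatches.
void kmp_search(const char *T, const char *P)
{
    int n = strlen(T), m = strlen(P);
    int lps[m];
    build_lps(P, m, lps);
    for (int i = 0, j = 0; i < n; ) {
        if (T[i] == P[j]) {
            i++; j++;
            if (j == m) {                     // full match ends at i-1
                printf("match at index %d\n", i - m);
                j = lps[j - 1];
            }
        } else if (j > 0) {
            j = lps[j - 1];                   // reuse matched prefix; i stays
        } else {
            i++;
        }
    }
}

int main(void)
{
    kmp_search("ababcabcabababd", "ababd");   // match at index 10
    return 0;
}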
Time Complexity:
• Preprocessing the LPS array: O(m), where m is the length
of the pattern.
• Text scanning: O(n), where n is the length of the text.
Thus, the overall time complexity of KMP is O(n + m), which is
much more efficient than the O(n×m) time complexity of the
Naive String Matching Algorithm.

Advantages of KMP:
• Avoids rechecking parts of the text that have already been
processed.
• Efficient for large texts and patterns with repetitive structures.

Rabin-Karp Algorithm
The Rabin-Karp algorithm is an efficient string searching
algorithm used to find a pattern (substring) within a larger text.
Its core strength lies in using hashing to compare strings, which
allows it to search for multiple patterns in the text simultaneously.
Key Concepts:
1. Hashing:
o The Rabin-Karp algorithm converts a string (both the
pattern and the substrings of the text) into a hash value.
Instead of comparing strings character by character, it
compares hash values. This reduces the comparison
overhead.
2. Sliding Window:
o The algorithm uses a sliding window approach to shift
across the text and calculate the hash for each substring of
the same length as the pattern. If the hash of the substring
matches the hash of the pattern, the actual characters are
compared to confirm the match.
Steps of the Algorithm:
1. Calculate the Hash of the Pattern:
o Compute the hash for the pattern you are searching for.
2. Calculate the Hash of the First Substring:
o Compute the hash for the first substring of the text (of the
same length as the pattern).
3. Compare the Hashes:
o If the hash of the current substring matches the hash of the
pattern, perform a direct comparison of the characters to
confirm the match.
o If the hash doesn't match, slide the window one character
to the right, and recalculate the hash of the new substring
using a rolling hash function.
4. Repeat:
o Continue this process across the entire text.
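A minimal C sketch of the algorithm (illustrative; the base 256 and the prime modulus 101 are assumed parameters of the rolling hash):

#include <stdio.h>
#include <string.h>

#define BASE 256          // alphabet size for the rolling hash
#define MOD 101           // a prime modulus (assumed; any prime works)

// Rabin-Karp: compare rolling hashes, confirm matches character by character.
void rabin_karp(const char *T, const char *P)
{
    int n = strlen(T), m = strlen(P);
    if (m > n) return;

    int h = 1;                                 // BASE^(m-1) % MOD
    for (int i = 0; i < m - 1; i++)
        h = (h * BASE) % MOD;

    int hp = 0, ht = 0;                        // hashes of P and current window
    for (int i = 0; i < m; i++) {
        hp = (hp * BASE + P[i]) % MOD;
        ht = (ht * BASE + T[i]) % MOD;
    }

    for (int i = 0; i + m <= n; i++) {
        if (hp == ht && strncmp(&T[i], P, m) == 0)  // verify on hash match
            printf("match at index %d\n", i);
        if (i + m < n) {                       // roll the window one character
            ht = ((ht - T[i] * h) * BASE + T[i + m]) % MOD;
            if (ht < 0) ht += MOD;             // keep the hash non-negative
        }
    }
}

int main(void)
{
    rabin_karp("abcdeabc", "abc");   // matches at index 0 and 5
    return 0;
}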
Efficiency:
• Best Case: O(n+m) where n is the length of the text and m is
the length of the pattern. This occurs when hash values allow
quick rejection of non-matching substrings.
• Worst Case: O(n × m), when all hash values match but the
actual strings do not, forcing the algorithm to compare
characters directly.
Advantages:
• Efficient for multiple pattern matching in a single pass
through the text.
• Works well for large texts with many patterns, as the hashing
reduces unnecessary comparisons.
Limitations:
• Hash collisions can lead to extra comparisons, which may
affect the worst-case performance.
Example:
Suppose you want to search for the pattern "abc" in the text "abcdeabc". Rabin-Karp computes the hash of "abc" and then compares it with the hash of each window "abc", "bcd", "cde", "dea", "eab", and "abc" using the sliding (rolling-hash) method.

The algorithm shines when searching for multiple patterns
simultaneously because it uses a hash-based approach that can
accommodate this more efficiently than methods like brute force
or the Knuth-Morris-Pratt algorithm.
