Unit 5 Daa
Unit: 5
• Selected Topics
• String Matching
• Theory of NP-completeness
• Approximation algorithms
• Randomized algorithms.
• Qualification:
• First, consider the internet, which is essential to daily life; we can hardly imagine life
without it, and it is the outcome of clever and creative algorithms. Numerous sites on the
internet can operate and manage this huge volume of data only with the help of these algorithms.
• Everyday electronic commerce depends heavily on protecting our data, for example, credit
or debit card numbers, passwords, OTPs, and many more. The core technologies used
include public-key cryptography and digital signatures, which depend on mathematical
algorithms.
• Even an application that doesn't need algorithmic content at the application level depends heavily
on algorithms, because the application relies on hardware, GUI, networking, or object orientation,
and all of these make substantial use of algorithms.
• Algorithms also drive other important use cases: for example, after we watch a video on
YouTube, we later get related videos suggested as recommendations.
CO2 To apply different problem-solving approaches for advanced data structures. (Knowledge, Analysis and Apply)
CO3 To apply divide and conquer method for solving merge sort, quick sort, matrix multiplication
and Greedy Algorithm for solving different Graph Problems. (Knowledge, Analysis and Apply)
CO4 To analyze and apply different optimization techniques like dynamic programming,
backtracking and Branch & Bound to solve complex problems. (Knowledge, Analysis and Apply)
CO5 To understand advanced concepts like NP Completeness and Fast Fourier Transform, and to
analyze and apply String Matching, Approximation and Randomized Algorithms to solve complex
problems. (Knowledge, Analysis and Apply)
The engineer and society: Apply reasoning informed by the contextual knowledge to assess
societal, health, safety, legal and cultural issues and the consequent responsibilities
relevant to the professional engineering practice.
Environment and sustainability: Understand the impact of the professional engineering
solutions in societal and environmental contexts, and demonstrate the knowledge of, and
need for sustainable development.
Ethics: Apply ethical principles and commit to professional ethics and responsibilities
and norms of the engineering practice.
CO.K PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
ACSE0501.1 3 3 3 3 2 - - - 2 2 - 3
ACSE0501.2 3 3 3 3 2 2 - 1 1 1 - 3
ACSE0501.3 3 3 2 3 3 2 - 2 1 1 2 3
ACSE0501.4 3 3 3 3 2 2 - 2 2 1 3 3
ACSE0501.5 2 2 2 2 2 2 - 2 1 1 1 2
Average 2.8 2.8 2.6 2.8 2.2 1.6 - 1.8 1.4 1.2 1.2 2.8
SECTION B
2. Attempt any three of the following: 3 x 10 = 30
SECTION C
3. Attempt any one part of the following: 1 x 10 = 10
Q. No. Question Marks CO
1 10
2 10
• Prerequisite
• Basic concept of c programming language.
• Concept of stack, queue and link list.
• Recap
• Flow Chart
• Algorithm
• Algebraic Structures
• Representation
Prerequisite
• Algorithms
• Finite automata
Recap
• Algebraic Computation
• String matching algorithms have greatly influenced computer science and play an essential role
in various real-world problems. They help perform tasks time-efficiently in multiple domains.
These algorithms are used to search for one string within another string. String
matching is also used in database schemas and network systems.
• Let us look at a few string-matching algorithms before proceeding to their applications in real
world. String Matching Algorithms can broadly be classified into two types of algorithms –
• Given a text array T [1.....n] of n characters and a pattern array P [1......m] of m
characters, the problem is to find all integers s, called valid shifts, where 0 ≤ s ≤ n-m and
T [s+1......s+m] = P [1......m].
• In other words, to find whether P occurs in T, i.e., whether P is a substring of T.
• The items of P and T are characters drawn from some finite alphabet such as {0, 1} or {A,
B .....Z, a, b..... z}.
• Given a string T [1......n], the substrings are represented as T [i......j] for some 1 ≤ i ≤ j ≤ n,
the string formed by the characters in T from index i to index j, inclusive. This means that
a string is a substring of itself (take i = 1 and j = n).
• A proper substring of string T [1......n] is T [i......j] for some 1 ≤ i ≤ j ≤ n other than T
itself. That is, we must have either i > 1 or j < n.
• Using these descriptions, we can say given any string T [1......n], the substrings are
T [i.....j] = T [i] T [i +1] T [i+2]......T [j] for some 1 ≤ i ≤ j ≤ n.
And proper substrings are
T [i.....j] = T [i] T [i +1] T [i+2]......T [j] for some 1 ≤ i ≤ j ≤ n with i > 1 or j < n.
• Note: If i>j, then T [i.....j] is equal to the empty string or null, which has length zero.
• The Rabin-Karp-Algorithm
• Finite Automata
The naïve approach tests all the possible placements of pattern P [1.......m] relative to text T
[1......n]. We try shifts s = 0, 1.......n-m successively, and for each shift s we compare T
[s+1.......s+m] to P [1......m].
The naïve algorithm finds all valid shifts using a loop that checks the condition P [1.......m]
= T [s+1.......s+m] for each of the n - m + 1 possible values of s.
NAIVE-STRING-MATCHER (T, P)
• 1. n ← length [T]
• 2. m ← length [P]
• 3. for s ← 0 to n -m
• 4. do if P [1.....m] = T [s + 1....s + m]
• 5. then print "Pattern occurs with shift" s
Analysis: The for loop in lines 3 to 5 executes n-m+1 times (we need at least m characters at the
end), and each iteration makes up to m comparisons. So the total complexity is O ((n-m+1) m).
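The pseudocode above can be sketched in Python as follows. This is a minimal illustration, assuming 0-based Python strings, so a returned shift s means the pattern starts at text[s]; the function name is chosen here for illustration.

```python
def naive_string_matcher(text, pattern):
    """Return every valid shift s (0-based) at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    # Try each of the n - m + 1 possible shifts in turn.
    for s in range(n - m + 1):
        # The slice comparison performs up to m character comparisons.
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


if __name__ == "__main__":
    print(naive_string_matcher("abababa", "aba"))
```

Overlapping occurrences are reported because the shift always advances by exactly one position.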
• The Rabin-Karp string matching algorithm calculates a hash value for the pattern, as well as for
each m-character substring of the text to be compared.
• If the hash values are unequal, the algorithm moves on and computes the hash value of the next
m-character substring.
• If the hash values are equal, the algorithm compares the pattern and the m-character substring
character by character.
• In this way, there is only one hash comparison per text substring, and character matching is only
required when the hash values match.
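• For string matching with working modulus q = 11, how many spurious hits does the Rabin-Karp
matcher encounter in the text T = 31415926535.......
T = 31415926535.......
P = 26
Here T.length = 11 and q = 11 (given)
And P mod q = 26 mod 11 = 4
Now find every 2-digit window of T whose value mod q equals P mod q = 4.
•Solution:
For the 11 digits shown, the windows and their values mod 11 are:
31 → 9, 14 → 3, 41 → 8, 15 → 4, 59 → 4, 92 → 4, 26 → 4, 65 → 10, 53 → 9, 35 → 2.
Four windows have hash value 4. The window 26 is the valid match; 15, 59 and 92 are spurious
hits, so the matcher encounters 3 spurious hits.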
Complexity:
• The running time of RABIN-KARP-MATCHER in the worst case is O ((n-m+1) m), but it has a good
average-case running time. If the expected number of valid shifts is small, O (1), and the prime q is
chosen to be quite large, then the Rabin-Karp algorithm can be expected to run in time O (n+m) plus the
time required to process spurious hits.
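The hashing scheme described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the text and pattern are strings of decimal digits, the radix d = 10 is assumed, and the function name and the separate lists of valid and spurious shifts are choices made here, not part of the slides.

```python
def rabin_karp(text, pattern, q, d=10):
    """Return (valid_shifts, spurious_shifts) for digit strings, hashing mod q."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], []
    h = pow(d, m - 1, q)          # weight of the leading digit, mod q
    p = t = 0
    for i in range(m):            # hash of the pattern and of the first window
        p = (d * p + int(pattern[i])) % q
        t = (d * t + int(text[i])) % q
    valid, spurious = [], []
    for s in range(n - m + 1):
        if p == t:
            # Hash values agree: confirm with a character-by-character check.
            if text[s:s + m] == pattern:
                valid.append(s)
            else:
                spurious.append(s)
        if s < n - m:
            # Rolling update: drop text[s], append text[s + m].
            t = (d * (t - int(text[s]) * h) + int(text[s + m])) % q
    return valid, spurious


if __name__ == "__main__":
    print(rabin_karp("31415926535", "26", 11))
```

Run on the 11 digits of the example (T = 31415926535, P = 26, q = 11), the sketch reports one valid shift and three spurious hits, using 0-based shifts.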
Anshul Varshney
12/19/2024 DAA Unit V 36
String Matching (CO5)
• The finite automaton M starts in state q0 and reads the characters of its input string one at a time.
• If the automaton is in state q and reads input character a, it moves from state q to state δ (q, a).
• Whenever its current state q is a member of the set of accepting states A, the machine M has
accepted the string read so far. An input that is not accepted is rejected.
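A string-matching automaton built on these rules can be sketched as below. This is a minimal sketch, assuming the states are 0..m (state q meaning "the last q characters read match the first q pattern characters") with m as the only accepting state; the table is built by a simple, non-optimized construction, and both function names are hypothetical.

```python
def build_transition_table(pattern, alphabet):
    """delta[q][a] = length of the longest prefix of pattern that is a
    suffix of pattern[:q] + a."""
    m = len(pattern)
    delta = [{} for _ in range(m + 1)]
    for q in range(m + 1):
        for a in alphabet:
            k = min(m, q + 1)
            while k > 0 and not (pattern[:q] + a).endswith(pattern[:k]):
                k -= 1
            delta[q][a] = k
    return delta


def automaton_matcher(text, pattern, alphabet):
    """Return the 0-based shifts at which the automaton reaches state m."""
    delta = build_transition_table(pattern, alphabet)
    m = len(pattern)
    q = 0                                  # start state q0
    shifts = []
    for i, a in enumerate(text):
        q = delta[q].get(a, 0)             # characters outside the alphabet reset to 0
        if q == m:                         # accepting state: a full match ends at i
            shifts.append(i - m + 1)
    return shifts


if __name__ == "__main__":
    print(automaton_matcher("abababacaba", "ababaca", "abc"))
```

Once the table is built, the scan reads each text character exactly once, which is the appeal of the automaton approach.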
• We discussed the Naive pattern-searching algorithm in the previous section. The worst-case
complexity of the Naive algorithm is O(m(n-m+1)). The time complexity of the KMP
algorithm is O(n+m) in the worst case.
• Knuth, Morris and Pratt introduced a linear-time algorithm for the string-matching problem.
• A matching time of O (n) is achieved by avoiding comparisons with elements of 'S' that
have previously been involved in a comparison with some element of the pattern 'p' to be
matched, i.e., backtracking on the string 'S' never occurs.
1. The Prefix Function (Π): The prefix function Π for a pattern encapsulates knowledge
about how the pattern matches against shifts of itself. This information can be used to
avoid useless shifts of the pattern 'p'. In other words, it enables avoiding backtracking on
the string 'S'.
2. The KMP Matcher: With string 'S', pattern 'p' and prefix function 'Π' as inputs, it finds the
occurrences of 'p' in 'S' and returns the number of shifts of 'p' after which occurrences are
found.
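The two components can be sketched in Python as follows. This is an illustrative sketch rather than the slides' numbered pseudocode: it uses 0-based indices, so pi[q] holds the length of the longest proper prefix of pattern[:q+1] that is also a suffix of it, and the function names are chosen here.

```python
def compute_prefix_function(pattern):
    """Return the list pi, where pi[q] is the length of the longest proper
    prefix of pattern[:q + 1] that is also a suffix of pattern[:q + 1]."""
    m = len(pattern)
    pi = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]                  # fall back to a shorter border
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(text, pattern):
    """Return the 0-based shifts at which pattern occurs in text."""
    m = len(pattern)
    if m == 0:
        return []
    pi = compute_prefix_function(pattern)
    q = 0                                  # number of pattern characters matched
    shifts = []
    for i, c in enumerate(text):           # the text index i never moves backwards
        while q > 0 and pattern[q] != c:
            q = pi[q - 1]
        if pattern[q] == c:
            q += 1
        if q == m:
            shifts.append(i - m + 1)
            q = pi[q - 1]                  # keep going to find overlapping matches
    return shifts


if __name__ == "__main__":
    print(compute_prefix_function("ababaca"))
    print(kmp_matcher("abababacaba", "ababaca"))
```

On a mismatch only q is reduced via pi, which is why the text is never re-read.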
In the pseudo code for calculating the prefix function, the for loop from step 4 to step 10 runs 'm'
times. Step 1 to Step 3 take constant time. Hence the running time of computing the
prefix function is O (m).
Solution:
Initially: m = length [p] = 7 Π [1] = 0 k = 0
• The B-M algorithm takes a 'backward' approach: the pattern string (P) is aligned with the
start of the text string (T), and the characters of the pattern are then compared from right to
left, beginning with the rightmost character.
• If the text character being compared does not occur in the pattern at all, no match can be
found at any position overlapping it, so the pattern can be shifted entirely past the
mismatching character.
• Two strategies, called the heuristics of B-M, are used to reduce the search. They
are
• Bad Character Heuristics
• Good Suffix Heuristics
• Suppose there is a character in the text that does not occur in the pattern at all. When a
mismatch happens at this character (called the bad character), the whole pattern can be
shifted past it, and matching begins again from the substring next to this 'bad character'.
• On the other hand, the bad character may be present in the pattern; in this case, shift the
pattern so that its rightmost occurrence of that character is aligned with the bad character in
the text (if that would move the pattern backwards, shift by one position instead).
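The bad character heuristic alone can be sketched as below. This is a simplified illustration, not the full Boyer-Moore algorithm: the good suffix heuristic is omitted, and after a full match the sketch conservatively shifts by one position. The function name and the `last` dictionary are illustrative.

```python
def bad_character_search(text, pattern):
    """Return 0-based shifts of pattern in text using only the bad character rule."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Rightmost index of each character in the pattern.
    last = {c: i for i, c in enumerate(pattern)}
    shifts = []
    s = 0
    while s <= n - m:
        j = m - 1
        # Compare right to left, starting at the rightmost pattern character.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            shifts.append(s)
            s += 1                          # conservative shift after a match
        else:
            bad = text[s + j]
            # Align the rightmost occurrence of the bad character with it,
            # or jump past it entirely if it is absent (last index -1).
            s += max(1, j - last.get(bad, -1))
    return shifts


if __name__ == "__main__":
    print(bad_character_search("here is a simple example", "example"))
```

The `max(1, ...)` guard covers the case where the rightmost occurrence lies to the right of the mismatch, so the pattern never moves backwards.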
Prerequisite
• Different Problems like graph colouring
• Travelling Salesman Problem
Recap
• String Matching Algorithm
• Algorithms for solving real world problems
• A problem is in the class NPC if it is in NP and is as hard as any problem in NP. A problem
is NP-hard if all problems in NP are polynomial time reducible to it, even though it may not
be in NP itself.
• If a polynomial time algorithm exists for any of these problems, all problems in NP would
be polynomial time solvable. These problems are called NP-complete. The phenomenon
of NP-completeness is important for both theoretical and practical reasons.
NP Completeness(CO5)
A language B is NP-Complete if (1) B is in NP and (2) every language A in NP is polynomial
time reducible to B. If a language satisfies the second property, but not necessarily the first
one, the language B is known as NP-Hard. Informally, a search problem B is NP-Hard if there
exists some NP-Complete problem A that Turing reduces to B.
No NP-Hard problem can be solved in polynomial time unless P = NP. If a problem is
proved to be NPC, there is little point in spending time trying to find an efficient exact algorithm
for it. Instead, we can focus on designing an approximation algorithm.
• NP-Complete Problems
Following are some NP-Complete problems, for which no polynomial time algorithm is
known.
• Determining whether a graph has a Hamiltonian cycle
• Determining whether a Boolean formula is satisfiable, etc.
• NP-Hard Problems
The following problems are NP-Hard
• The circuit-satisfiability problem
• Set Cover
• Vertex Cover
• Travelling Salesman Problem
Circuit Satisfiability
The circuit-satisfiability problem asks whether a given Boolean circuit has some input
assignment that makes its output 1 (high). Given a circuit and a proposed assignment, we can
evaluate the gates and verify the stated output in polynomial time, so the problem is in NP.
[Figure: example circuit]
Although a proposed assignment can be verified in polynomial time, no polynomial time method
is known for finding an assignment that produces the high output: a circuit with k inputs has
2^k possible input assignments. Since every problem in NP can also be converted (reduced) to
circuit satisfiability in polynomial time, the problem is NPC.
• No polynomial time solution is known for the set cover problem, as it is a known
NP-Hard problem
• Example:
• U = {1,2,3,4,5}, S = {S1,S2,S3}
•
• S1 = {4,1,3}, Cost(S1) = 5
• S2 = {2,5}, Cost(S2) = 10
• S3 = {1,4,3,2}, Cost(S3) = 3
•
• Output: Minimum cost of set cover is 13 and set cover is {S2, S3}
• There are two possible set covers with no redundant set: {S1, S2} with cost 15 and {S2, S3} with cost 13
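The example can be checked with a brute-force search over all sub-collections. This is a minimal sketch for tiny instances only (it is exponential in the number of sets, as expected for an NP-Hard problem); the function name and the dictionary layout are assumptions made for illustration.

```python
from itertools import combinations


def min_cost_set_cover(universe, subsets, costs):
    """Try every sub-collection and return (best_cost, best_cover_names)."""
    names = list(subsets)
    best_cost, best_cover = None, None
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            covered = set().union(*(subsets[name] for name in combo))
            if covered >= universe:         # every element of U is covered
                cost = sum(costs[name] for name in combo)
                if best_cost is None or cost < best_cost:
                    best_cost, best_cover = cost, set(combo)
    return best_cost, best_cover


if __name__ == "__main__":
    U = {1, 2, 3, 4, 5}
    S = {"S1": {4, 1, 3}, "S2": {2, 5}, "S3": {1, 4, 3, 2}}
    cost = {"S1": 5, "S2": 10, "S3": 3}
    print(min_cost_set_cover(U, S, cost))
```

On the example it finds cost 13 with the cover {S2, S3}, matching the stated output.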
The minimum vertex cover problem is the optimization problem of finding a smallest vertex
cover in a given graph. The vertex cover problem is an NP-complete problem.
A vertex-cover of an undirected graph G = (V, E) is a subset of vertices V' ⊆ V such that if
(u, v) is an edge of G, then either u is in V' or v is in V' or both.
Example:
The set of edges of the given graph is −
{(1,6),(1,2),(1,4),(2,3),(2,4),(6,7),(4,7),(7,8),(3,8),(3,5),(8,5)}
• The salesman has to visit each one of the cities starting from a certain one and
returning to the same city.
• The challenge of the problem is that the traveling salesman wants to minimize the total
length of the trip
• Suppose the cities are x1 x2..... xn where cost cij denotes the cost of travelling from
city xi to xj. The travelling salesperson problem is to find a route starting and ending at
x1 that will take in all cities with the minimum cost.
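The problem statement can be made concrete with an exhaustive search. This is a sketch for very small inputs only, since it tries all (n-1)! orderings; the function name is chosen here, city x1 is taken as index 0, and the 4-city cost matrix in the usage example is made-up illustrative data, not from the slides.

```python
from itertools import permutations


def tsp_brute_force(cost):
    """cost[i][j] is the cost of travelling from city i to city j.
    Return (best_cost, best_tour) for tours starting and ending at city 0."""
    n = len(cost)
    best_cost, best_tour = None, None
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)           # start and end at the same city
        total = sum(cost[tour[i]][tour[i + 1]] for i in range(n))
        if best_cost is None or total < best_cost:
            best_cost, best_tour = total, list(tour)
    return best_cost, best_tour


if __name__ == "__main__":
    # Hypothetical symmetric cost matrix for four cities.
    c = [[0, 10, 15, 20],
         [10, 0, 35, 25],
         [15, 35, 0, 30],
         [20, 25, 30, 0]]
    print(tsp_brute_force(c))
```

The factorial growth of the loop is what makes this approach impractical beyond a handful of cities.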
Proof
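1. To prove TSP is NP-Complete, first we have to prove that TSP belongs to NP.
• In TSP, given a tour as a certificate, we check that the tour contains each vertex exactly
once and that its total cost is within the given bound. Both checks take polynomial time.
2. Next, we prove that TSP is NP-Hard by reducing the Hamiltonian cycle problem to it.
• Given a graph G = (V, E), construct the complete graph G' = (V, E') in which an edge has
cost 0 if it belongs to E and cost 1 otherwise. This construction takes polynomial time.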
• Now, suppose that a Hamiltonian cycle h exists in G. It is clear that the cost of each
edge in h is 0 in G' as each edge belongs to E. Therefore, h has a cost of 0 in G'. Thus,
if graph G has a Hamiltonian cycle, then graph G' has a tour of 0 cost.
• Conversely, we assume that G' has a tour h' of cost at most 0. The cost of edges
in E' are 0 and 1 by definition. Hence, each edge must have a cost of 0 as the cost
of h' is 0. We therefore conclude that h' contains only edges in E.
• We have thus proven that G has a Hamiltonian cycle, if and only if G' has a tour of cost
at most 0. TSP is NP-complete.
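The construction used in the proof can be sketched as follows. This is an illustrative sketch, assuming vertices numbered 0..n-1 and an undirected edge list; the zero-cost tour check is done by brute force only to demonstrate the "if and only if" on tiny graphs, and both function names are hypothetical.

```python
from itertools import permutations


def reduce_hc_to_tsp(n, edges):
    """Build the cost matrix of G': cost 0 for edges of G, cost 1 otherwise."""
    edge_set = {frozenset(e) for e in edges}
    return [[0 if frozenset((u, v)) in edge_set else 1 for v in range(n)]
            for u in range(n)]


def has_zero_cost_tour(cost):
    """Brute-force check: does G' have a tour of total cost 0?"""
    n = len(cost)
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        if sum(cost[tour[i]][tour[i + 1]] for i in range(n)) == 0:
            return True
    return False


if __name__ == "__main__":
    cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]   # has a Hamiltonian cycle
    path = [(0, 1), (1, 2), (2, 3)]            # has none
    print(has_zero_cost_tour(reduce_hc_to_tsp(4, cycle)))
    print(has_zero_cost_tour(reduce_hc_to_tsp(4, path)))
```

Only `reduce_hc_to_tsp` is part of the reduction and it runs in polynomial time; the exhaustive check stands in for a TSP solver.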
A simple example of an approximation algorithm is one for the Minimum Vertex Cover
problem, where the goal is to choose the smallest set of vertices such that every edge in
the input graph contains at least one chosen vertex.
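One well-known approximation for this problem repeatedly picks an uncovered edge and adds both of its endpoints; the result is at most twice the size of a minimum cover. The sketch below assumes the graph is given as an edge list and uses illustrative function names; the slides do not specify which algorithm they intend, so this is offered as one standard choice.

```python
def approx_vertex_cover(edges):
    """2-approximation: take both endpoints of every edge not yet covered."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


def is_vertex_cover(edges, cover):
    """Check that every edge has at least one endpoint in cover."""
    return all(u in cover or v in cover for u, v in edges)


if __name__ == "__main__":
    edges = [(1, 6), (1, 2), (1, 4), (2, 3), (2, 4), (6, 7),
             (4, 7), (7, 8), (3, 8), (3, 5), (8, 5)]
    cover = approx_vertex_cover(edges)
    print(sorted(cover), is_vertex_cover(edges, cover))
```

On the example edge list, processed in the order given, the sketch selects all eight vertices, while {1, 2, 3, 7, 8} is a cover of size 5, so the answer stays within the factor-2 guarantee without being optimal.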
An algorithm that uses random numbers to decide what to do next anywhere in its logic is
called a Randomized Algorithm. For example, in Randomized Quick Sort, we use a
random number to pick the next pivot (or we randomly shuffle the array). And in
Karger's algorithm, we randomly pick an edge.
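Randomized Quick Sort can be sketched as below. This is a minimal, non-in-place version for illustration, assuming the input is a list of mutually comparable items; the function name is chosen here.

```python
import random


def randomized_quicksort(items):
    """Return a sorted copy of items, choosing each pivot uniformly at random."""
    if len(items) <= 1:
        return list(items)
    pivot = random.choice(items)            # the random decision
    less = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    greater = [x for x in items if x > pivot]
    return randomized_quicksort(less) + equal + randomized_quicksort(greater)


if __name__ == "__main__":
    print(randomized_quicksort([5, 3, 8, 1, 9, 2, 3]))
```

The output is always the same sorted list; only the running time depends on the random pivot choices.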
For example:
Select a random number from stream, with O(1) space.
Birthday Paradox
Linearity of Expectation
Random Number generator in arbitrary probability distribution fashion
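The first item in the list, selecting a random element from a stream with O(1) space, is commonly done with reservoir sampling using a reservoir of size one. The sketch below assumes the stream is any Python iterable of unknown length; the function name and the optional `rng` parameter are illustrative.

```python
import random


def pick_random_from_stream(stream, rng=random):
    """Return one item chosen uniformly at random, storing only one item."""
    chosen = None
    for count, item in enumerate(stream, start=1):
        # Replace the current choice with probability 1 / count.
        if rng.randrange(count) == 0:
            chosen = item
    return chosen


if __name__ == "__main__":
    print(pick_random_from_stream(iter(range(100))))
```

The k-th item replaces the current choice with probability 1/k, so after n items each one has been kept with probability 1/n.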
4. Explain the Rabin-Karp algorithm. For the text 2359023141526739921 and the pattern
31415, working modulo q=13, how many valid matches and spurious hits does the Rabin-Karp
matcher encounter? [CO5]
5. Which of the following is incorrect for the given phrase: 'solvable by
non-deterministic algorithms in polynomial time'?
a) NP Problems
b) During control flow, non-deterministic algorithm may have more than one choice
c) If the choices that nondeterministic algorithm makes are correct, the amount of
time it takes is bounded by polynomial time.
d) None of the mentioned
7. Which of the following does not belong to the closure properties of NP class?
a) Union
b) Concatenation
c) Reversal
d) Complement
Q.2 Problems that can be solved in polynomial time are known as____________.
a) intractable b) tractable
c) decision d) complete
Q.3 _________ is the class of decision problems that can be solved by non-deterministic
polynomial algorithms.
a) NP b) P
c) Hard d) Complete
Q.4 Problems that cannot be solved by any algorithm are called _________.
a) tractable problems b) intractable problems
c) undecidable problems d) decidable problems
Q.8 The choice of polynomial class has led to the development of an extensive theory called
________
a) computational complexity b) time complexity
c) problem complexity d) decision complexity
Q.9 A randomized algorithm uses random bits as input in order to achieve a _____________
good performance over all possible choice of random bits.
a) worst case b) best case
c) average case d) none of the mentioned
This unit gives insight into the different classes of problems: the P class, the NP class and NPC.
The different string-matching algorithms have been discussed. Approximation algorithms
and Randomized algorithms have also been explained.
Thank You