
Unit 7

String Matching & NP Completeness


Dr. Meghana Harsh Ghogare
String matching & NP Completeness

• Introduction to String Matching


• Naive String Matching
• Rabin-Karp Algorithm
• Knuth-Morris-Pratt Algorithm
• String Matching using Finite Automata
• Introduction to NP Completeness
• P Class Problems
• NP Class Problems
• Hamiltonian Cycle
Introduction to String Matching

• String Matching: finding a smaller string (the pattern) inside a bigger one (the text).
• Example: looking for "cat" in the sentence "I have a cat."
• Naive method: check every letter one by one to see if the pattern matches.
• Faster methods:
  • KMP: skips unnecessary checks.
  • Rabin-Karp: uses numbers (hashing) to speed up matching.
• Exact match: the pattern must match the text exactly.
• Approximate match: allows small differences or mistakes.
• Uses:
  • Search engines.
  • Finding genes in DNA.
  • Checking passwords.
• Challenge: matching can be slow on large texts, so we use smarter methods.
Naive String Matching

• It is a brute-force approach.
• It checks every possible starting position in the text.
• It compares the characters one by one.
• How it works:
1. Start from the first character of the text.
2. Check whether the next characters match the pattern.
3. Move to the next position in the text and repeat the process until the end of the text.
4. If a match is found, return the position of the match.
5. If no match is found, report that the pattern doesn't exist in the text.
Naive String Matching Algorithm (Steps)
• Input: a text T of length n and a pattern P of length m.
• Loop through the text: for each starting position i in the text, from 0 to n - m:
• Compare the characters from position i in the text with the pattern.
• If all characters of the pattern match the corresponding characters in the text, report a match.
• Return the result: report the positions where matches are found.
• Time complexity (worst case): O(n × m), where n is the length of the text and m is the length of the pattern.
def naive_string_matching(text, pattern):
    n = len(text)
    m = len(pattern)

    # Loop through the text
    for i in range(n - m + 1):
        # Check if the substring of text starting at i matches the pattern
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1

        # If a complete match is found, print the index
        if j == m:
            print(f"Pattern found at index {i}")

# Example usage
text = "ABCABCABC"
pattern = "ABC"
naive_string_matching(text, pattern)
Rabin-Karp Algorithm
• Instead of comparing characters one by one at every position, Rabin-Karp computes a hash value for the pattern and for each window of the text.
• If the hash values match, it performs a character-by-character comparison to confirm the match.
• If the hash values don't match, it skips the detailed comparison and slides the pattern to the next position.
Step-by-Step Process
Hash calculation:
Assume a simple hash function where each letter is assigned a numerical value (e.g., a = 1, b = 2, c = 3, and so on) and these values are summed over the pattern:
Hash of "abc" = 1 + 2 + 3 = 6.
Sliding through the text: compute the hash of each window of the text and compare it with the pattern's hash; when the hashes match, confirm the match character by character.
Continue sliding until the end of the text is reached.
Spurious hits (fake hits) problem: different strings can produce the same hash value, so a matching hash does not guarantee a real match. Every hash match must therefore be confirmed by a character-by-character comparison; hash matches that fail this check are called spurious hits.
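A minimal Python sketch of this idea, using the slides' simple additive letter-value hash with a rolling update (a production Rabin-Karp would normally use a polynomial rolling hash taken modulo a prime); the function name and the sample strings are illustrative assumptions:

def rabin_karp_additive(text, pattern):
    # Sketch using the additive hash from the slides (a = 1, b = 2, ...);
    # assumes lowercase letters for simplicity.
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return

    def value(ch):
        return ord(ch) - ord('a') + 1

    pattern_hash = sum(value(c) for c in pattern)
    window_hash = sum(value(c) for c in text[:m])

    for i in range(n - m + 1):
        # Compare hashes first; confirm character by character
        # to rule out spurious hits.
        if window_hash == pattern_hash and text[i:i + m] == pattern:
            print(f"Pattern found at index {i}")
        # Roll the hash: drop the leftmost character, add the next one.
        if i + m < n:
            window_hash += value(text[i + m]) - value(text[i])

# Example usage: windows "bca" and "cab" give spurious hash hits
# that the character check rejects.
rabin_karp_additive("abcxabcabc", "abc")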
Knuth-Morris-Pratt Algorithm
KMP's Smart Way:

• The KMP algorithm is a faster way of searching because it avoids repeating checks by remembering where things already matched.

• Example:
• Text: "abcxabcdabxabcdabcdabcy"
• Pattern: "abcdabcy"

• Pattern: a b c d a b c y
• LPS:     0 0 0 0 1 2 3 0
Core idea of KMP
• When you compare the pattern with the text and find a mismatch, instead of restarting the comparison from the next character of the text, KMP uses information from the pattern itself to "jump" to a better position, saving time. It does this using a partial match table (also called the LPS array).
Steps of KMP:
1. Build the LPS (Longest Prefix Suffix) array: this table helps us skip unnecessary comparisons.
2. Use the LPS array to search: when a mismatch occurs, the LPS tells us how much to shift the pattern without missing any matches.
• Pattern: A B A B C A B
• LPS: 0 0 1 2 0 1 2
• Text:    A B A B A B C A B A B A A B A B
• Pattern: A B A B C A B
• Align the pattern at the start of the text: the first four characters (A B A B) match, but the text then has A where the pattern has C, so there is a mismatch.
• Wherever there is a mismatch, look into the LPS value of the character just before the mismatch. Before C the last matched character is B, and its LPS value is 2, so resume comparing from pattern position 2 without moving back in the text (skip re-checking A, B).
• Continuing from there, the rest of the pattern matches, and the pattern is found starting at index 2 of the text.
• Practice (find the pattern in the text using KMP):
• Text: ABCABCDBABABCDABC, Pattern: ABCDABD
• Text: ABABAABA, Pattern: ABAB
• Text: ABCDABCABC, Pattern: ABCABC
Knuth-Morris-Pratt Algorithm
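A compact Python sketch of KMP (illustrative names): build the LPS array for the pattern, then scan the text once, never moving backwards in the text.

def build_lps(pattern):
    # lps[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of pattern[:i + 1]
    lps = [0] * len(pattern)
    length = 0
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length != 0:
            # Fall back to the previous longest prefix-suffix
            length = lps[length - 1]
        else:
            lps[i] = 0
            i += 1
    return lps

def kmp_search(text, pattern):
    lps = build_lps(pattern)
    i = j = 0  # i indexes the text, j indexes the pattern
    while i < len(text):
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == len(pattern):
                print(f"Pattern found at index {i - j}")
                j = lps[j - 1]  # keep looking for further matches
        elif j != 0:
            # Mismatch after j matches: jump using the LPS array
            # instead of moving back in the text.
            j = lps[j - 1]
        else:
            i += 1

# Example usage (the text and pattern from the worked example above)
print(build_lps("abcdabcy"))               # [0, 0, 0, 0, 1, 2, 3, 0]
kmp_search("ABABABCABABAABAB", "ABABCAB")  # Pattern found at index 2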
String Matching using Finite Automata
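One common way to do string matching with a finite automaton (a hedged sketch, not necessarily the exact construction intended here): precompute a transition table delta[state][character], where being in state q means "the last q characters read match the first q characters of the pattern", then scan the text once. The function names and sample strings are illustrative assumptions.

def build_transition_table(pattern, alphabet):
    # delta[q][c] = length of the longest prefix of the pattern that is
    # a suffix of pattern[:q] + c
    m = len(pattern)
    delta = [dict() for _ in range(m + 1)]
    for q in range(m + 1):
        for c in alphabet:
            k = min(m, q + 1)
            while k > 0 and not (pattern[:q] + c).endswith(pattern[:k]):
                k -= 1
            delta[q][c] = k
    return delta

def automaton_match(text, pattern):
    alphabet = set(text) | set(pattern)
    delta = build_transition_table(pattern, alphabet)
    state = 0
    for i, c in enumerate(text):
        state = delta[state].get(c, 0)
        if state == len(pattern):
            # Accepting state: a full match ends at position i
            print(f"Pattern found at index {i - len(pattern) + 1}")

# Example usage
automaton_match("ABCABCABC", "ABC")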
Introduction to NP Completeness
• NP stands for Nondeterministic Polynomial time.
• NP completeness helps classify problems based on how hard they are to solve.
• A problem is NP-complete if it meets two conditions:
1. It is in NP: a proposed solution to the problem can be verified as correct or incorrect in polynomial time (quickly, relative to the input size). E.g., the outputs of merge sort, finding a maximum, or computing a GCD can all be checked quickly.
2. It is NP-hard: every problem in NP can be transformed (reduced) into this problem in polynomial time. If we could find a polynomial-time solution to an NP-complete problem, we could solve all NP problems quickly.
For example, the Subset Sum problem:
• Set: {2, 3, 7, 8, 10}
• Target: 11
Is there a subset that sums to exactly 11? Checking a proposed subset (such as {3, 8}) is quick, but finding one in general is hard; a small verification sketch follows.
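A hedged Python sketch of the "verified quickly" half of this Subset Sum example; the helper name and the candidate subsets are illustrative assumptions.

def verify_subset_sum(full_set, candidate_subset, target):
    # Polynomial-time verification of a proposed certificate: the candidate
    # must consist of distinct elements of the set and sum to the target.
    return (len(set(candidate_subset)) == len(candidate_subset)
            and all(x in full_set for x in candidate_subset)
            and sum(candidate_subset) == target)

# Example usage with the set and target above
print(verify_subset_sum({2, 3, 7, 8, 10}, [3, 8], 11))   # True
print(verify_subset_sum({2, 3, 7, 8, 10}, [2, 7], 11))   # False (sums to 9)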
• P Class Problems
1. Definition: Problems that can be solved quickly (in polynomial time) using a deterministic algorithm.
2. Easy to Solve: Solutions can be found efficiently as the problem size grows.
3. Predictable Time: The time it takes to solve these problems can be expressed as a polynomial function of the
input size.
• Examples
1. Sorting:
1. Example: Merge Sort
2. Description: Organizing a list of numbers in ascending or descending order.
2. Finding the Greatest Common Divisor (GCD):
1. Example: Using the Euclidean algorithm (see the sketch after this list).
2. Description: Determining the largest number that divides two numbers without leaving a remainder.
3. Graph Traversal:
1. Example: Breadth-First Search (BFS).
2. Description: Exploring all nodes and edges of a graph to find a path or analyze its structure.
4. Searching:
1. Example: Linear Search or Binary Search.
2. Description: Finding a specific value in a list of numbers.
5. Minimum Spanning Tree:
1. Example: Kruskal’s or Prim’s algorithm.
2. Description: Connecting all points in a graph with the least total edge weight.
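Following example 2 in the list above, a minimal sketch of the Euclidean algorithm, a classic polynomial-time (class P) computation; the sample numbers are illustrative.

def gcd(a, b):
    # Euclidean algorithm: repeatedly replace (a, b) with (b, a mod b).
    while b:
        a, b = b, a % b
    return a

# Example usage
print(gcd(48, 36))  # 12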
NP Class Problems
1. Definition: Problems for which a solution can be verified quickly (in polynomial time) by a deterministic algorithm, but finding that
solution may take a long time.
2. Hard to Solve: Finding the actual solution can be very challenging and may require a lot of time or guessing.
3. Easy to Verify: If someone gives you a solution, you can check if it’s correct quickly.
• Examples
1. Sudoku:
1. Example: Solving a Sudoku puzzle.
2. Description: Finding a valid arrangement of numbers in a 9x9 grid that follows Sudoku rules is difficult, but checking a completed grid is easy.
2. Boolean Satisfiability Problem (SAT):
1. Example: Determining if there is a way to assign true/false values to variables in a logical expression so that the expression is true.
2. Description: Checking if a proposed assignment of values satisfies the expression is quick, but finding that assignment can be hard.
3. Hamiltonian Path:
1. Example: Finding a path in a graph that visits each vertex exactly once.
2. Description: Verifying if a given path is Hamiltonian (visits every vertex once) is easy, as sketched after this list, but finding such a path is difficult.
4. Graph Coloring:
1. Example: Assigning colors to the vertices of a graph such that no two adjacent vertices share the same color.
2. Description: Checking if a coloring is valid is easy, but finding a valid coloring can be complex.
5. Knapsack Problem:
1. Example: Selecting items with given weights and values to maximize value without exceeding a weight limit.
2. Description: Given a selection of items and a target value, checking that the total weight is within the limit and the total value reaches the target is easy, but finding the best selection is challenging.
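To make the "easy to verify" property concrete for example 3 above, a minimal polynomial-time checker; the adjacency-set graph format and the sample graph are illustrative assumptions.

def is_hamiltonian_path(graph, path):
    # Polynomial-time check: every vertex appears exactly once and
    # consecutive vertices in the path are joined by an edge.
    if len(path) != len(graph) or set(path) != set(graph):
        return False
    return all(v in graph[u] for u, v in zip(path, path[1:]))

# Example usage on a small illustrative graph (a 4-cycle A-B-C-D-A)
graph = {
    'A': {'B', 'D'},
    'B': {'A', 'C'},
    'C': {'B', 'D'},
    'D': {'A', 'C'},
}
print(is_hamiltonian_path(graph, ['A', 'B', 'C', 'D']))  # True
print(is_hamiltonian_path(graph, ['A', 'C', 'B', 'D']))  # False (A and C not adjacent)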
• What is an NP problem?
• Answer: An NP (Nondeterministic Polynomial time) problem is a class
of decision problems for which a proposed solution can be verified as
correct or incorrect in polynomial time. This means that if we are
given a potential solution, we can check if it’s correct in a reasonable
amount of time. However, finding the solution from scratch may not
necessarily be done in polynomial time.
• 2. Define P vs NP problem.
• Answer: The P vs NP problem is a fundamental unsolved question in
computer science. It asks whether every problem for which a solution
can be verified in polynomial time (NP) can also be solved in
polynomial time (P). Formally, it asks whether P = NP. If P
equals NP, it would mean that any problem that we can verify quickly
can also be solved quickly, which would have profound implications in
fields like cryptography, optimization, and more.
• What is NP-Complete?
• Answer: An NP-Complete problem is a problem in NP that is as hard
as any other problem in NP, meaning any NP problem can be reduced
to it in polynomial time. If we find a polynomial-time solution for any
NP-Complete problem, it would imply that P = NP, because
all problems in NP could be solved in polynomial time by
transformation.
• 4. What is a reduction in NP problems?
• Answer: Reduction is a method used to show that one problem is at
least as hard as another. In NP problems, if we can transform one
problem to another in polynomial time, we say that the first problem
is reducible to the second. If we can reduce an NP problem to another
problem, and that problem is solved in polynomial time, then the
original NP problem can also be solved in polynomial time.
• Give an example of an NP-Complete problem.
• Answer: One example of an NP-Complete problem is the Traveling
Salesman Problem (TSP). In TSP, given a list of cities and the distances
between each pair, the problem is to determine the shortest possible route
that visits each city exactly once and returns to the origin city. The decision
version of this problem (is there a route whose total length is at most a given
bound?) is NP-Complete: checking a given solution (a specific route and its
total distance against the bound) can be done in polynomial time, but finding
the optimal route is very challenging.
• 6. Explain the difference between NP, NP-Complete, and NP-Hard.
• Answer:
• NP (Nondeterministic Polynomial time): The set of decision problems for which a
solution can be verified in polynomial time.
• NP-Complete: A subset of NP that represents the hardest problems in NP, meaning if
we can solve an NP-Complete problem in polynomial time, then all NP problems can
be solved in polynomial time.
• NP-Hard: Problems that are at least as hard as NP-Complete problems but are not
necessarily in NP (i.e., they might not be decision problems or may not have
solutions verifiable in polynomial time).
• Why are NP problems important in computer science?
• Answer: NP problems are important because they include many real-
world problems for which finding solutions is computationally
challenging. These problems are essential in fields like cryptography,
logistics, artificial intelligence, and more. Understanding NP problems
and the P vs NP question helps guide research in efficient algorithms
and complexity theory.
• 8. What is the significance of the Cook-Levin theorem in NP
problems?
• Answer: The Cook-Levin theorem states that the Boolean Satisfiability
Problem (SAT) is NP-Complete. This theorem was the first to identify
an NP-Complete problem. It demonstrated that every problem in NP can
be reduced to SAT, so if SAT could be solved in polynomial time, then
every NP problem could be solved in polynomial time. This discovery led to the establishment of the class
of NP-Complete problems and the foundation for complexity theory.
• What is a decision problem in the context of NP?
• Answer: A decision problem is a problem that can be phrased as a yes-or-
no question for which an answer can be verified as correct or incorrect. In
the context of NP, these are problems for which a solution can be verified
in polynomial time. Examples include determining whether there exists a
path in a graph with a certain property or if a particular assignment
satisfies a Boolean formula.
• 10. How does backtracking relate to solving NP problems?
• Answer: Backtracking is an algorithmic technique that explores all possible
solutions by building up a solution incrementally and abandoning (or
"backtracking") when it determines that a solution cannot work. Although
backtracking is often inefficient, it can be applied to NP problems as a way
to explore the solution space and can sometimes find solutions faster than
brute force, particularly with optimizations like pruning. However, it still
doesn’t guarantee a polynomial time solution for NP problems.
• What does it mean if P ≠ NP?
• Answer: If P ≠ NP, it means that there are some problems in NP that cannot
be solved in polynomial time, only verified in polynomial time. In this case,
no efficient solution exists for NP-Complete problems (unless a non-
polynomial time solution is acceptable), and many challenging problems
would remain practically unsolvable for large inputs.
• 12. Why can’t we find polynomial time solutions for NP-Complete
problems?
• Answer: There is currently no known method to solve NP-Complete
problems in polynomial time because these problems inherently require
exploring an enormous solution space. Unless there’s a breakthrough
showing that P = NP, it’s assumed that a polynomial-time algorithm for NP-
Complete problems is unlikely due to their computational complexity and
the need to test numerous potential solutions.
Hamiltonian Cycle
• Definition: A Hamiltonian cycle is a path in a graph that visits each vertex exactly once and returns to the starting vertex.
• Key Points
1. Graph: Can be directed or undirected. The cycle must cover all vertices in the graph.
2. Existence: Not all graphs contain a Hamiltonian cycle. Determining if such a cycle exists is a challenging problem.
3. Complexity: The problem of finding a Hamiltonian cycle is NP-complete, meaning there is no known efficient algorithm to solve all
instances of this problem quickly.
• Examples
1. Simple Graph:
1. For a triangle graph with vertices A, B, and C, a Hamiltonian cycle could be A → B → C → A.
2. Complex Graph:
1. In a square graph with vertices A, B, C, and D, a Hamiltonian cycle could be A → B → C → D → A.
• Applications
• Traveling Salesman Problem (TSP): Finding the shortest Hamiltonian cycle that visits a set of cities is a variation of this problem.
• Routing: Applications in networking and logistics where visiting each point exactly once is required.
• What is a Hamiltonian Cycle?
• Answer: A Hamiltonian Cycle in a graph is a cycle that visits each
vertex exactly once and returns to the starting vertex. In other words,
it's a closed loop that includes each vertex of the graph exactly once,
except for the starting and ending vertex, which are the same.
• How can you check if a Hamiltonian Cycle exists in a given graph?
• Answer: There is no known polynomial-time algorithm for deciding whether a
Hamiltonian Cycle exists in general graphs, as the problem is NP-
complete. For small graphs, a brute-force approach by generating all
permutations of vertices and checking for cycles works, but for larger
graphs, backtracking and heuristic algorithms, like the use of dynamic
programming or branch-and-bound, are more practical.
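A hedged backtracking sketch along the lines mentioned above; the adjacency-set graph format and the sample graph are illustrative assumptions.

def hamiltonian_cycle(graph):
    # Backtracking search: returns one Hamiltonian cycle as a list of vertices
    # (ending back at the start), or None if no such cycle exists.
    vertices = list(graph)
    start = vertices[0]

    def extend(path, visited):
        if len(path) == len(vertices):
            # All vertices used: check the closing edge back to the start.
            return path + [start] if start in graph[path[-1]] else None
        for nxt in graph[path[-1]]:
            if nxt not in visited:
                result = extend(path + [nxt], visited | {nxt})
                if result:
                    return result
        return None  # dead end: backtrack

    return extend([start], {start})

# Example usage on a small illustrative graph
graph = {
    'A': {'B', 'C', 'D'},
    'B': {'A', 'C'},
    'C': {'A', 'B', 'D'},
    'D': {'A', 'C'},
}
print(hamiltonian_cycle(graph))  # e.g., ['A', 'B', 'C', 'D', 'A']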
• Why is finding a Hamiltonian Cycle considered an NP-complete
problem?
• Answer: The Hamiltonian Cycle problem is NP-complete because:
• It is in NP (we can verify a Hamiltonian Cycle in polynomial time if one is
provided).
• There’s no known polynomial-time algorithm to solve it for all graphs.
• It’s as hard as any other problem in NP, meaning any problem in NP can be
reduced to the Hamiltonian Cycle problem in polynomial time.
• Can you give an example of a real-world application of Hamiltonian
Cycles?
• Answer: One of the most common real-world applications is in routing and
logistics, where each location needs to be visited exactly once before
returning to the starting point (e.g., delivery trucks or drone paths).
Another example is in DNA sequencing, where finding Hamiltonian Paths
and Cycles can help in reconstructing DNA sequences.
• What is the time complexity of the brute-force algorithm to find a
Hamiltonian Cycle?
• Answer: The time complexity of the brute-force algorithm is
O(n!), where n is the number of vertices. This is because it
checks all permutations of vertices, making it infeasible for large graphs.
• Is there a relationship between the Hamiltonian Cycle problem and
the Travelling Salesman Problem (TSP)?
• Answer: Yes, the Hamiltonian Cycle problem can be seen as a special
case of the Travelling Salesman Problem (TSP) where all edge weights
are the same. TSP requires finding the shortest path that visits all
vertices and returns to the starting point, while Hamiltonian Cycle
simply requires the existence of such a cycle without considering edge
weights.
