Short Notes On Knuth

Uploaded by

Janhavi Bhati
Short Notes on Knuth-Morris-Pratt (KMP) Algorithm

1. Purpose:
The KMP algorithm is a pattern-matching algorithm used to find occurrences of a pattern P of length
m in a text T of length n. It avoids redundant comparisons, achieving a time complexity of
O(n + m).
2. Key Idea:
Instead of starting over after a mismatch, the algorithm uses information from the pattern itself to skip
unnecessary comparisons. This is done using a failure function.
3. Failure Function:
o The failure function f(j) for a pattern P represents the length of the longest prefix of P that
is also a suffix of P[1..j]. Indices are 0-based, so P[1..j] leaves out P[0] and the matching prefix
is always shorter than P[0..j].
o Example for P = "abacab":
    j:     0  1  2  3  4  5
    P[j]:  a  b  a  c  a  b
    f(j):  0  0  1  0  1  2
4. Algorithm Steps:
o Preprocessing: Compute the failure function f for P in O(m) time.
o Matching: Use f to determine how far to shift the pattern after a mismatch, reducing unnecessary
comparisons.
5. Performance:
o Worst-case time complexity: O(n + m).
o This bound is linear in the total input size. A character of T can be compared more than once,
but the index into T never moves backward and the matching loop runs at most 2n iterations.
6. Advantages:
o Efficient for large T and P.
o Reduces the need to recheck previously matched characters.
7. C++ Implementation:
The algorithm can be implemented in C++ using two functions: one for matching (KMPMatch) and another
for computing the failure function (computeFailFunction).
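
The two functions named above could look roughly like the following. This is a minimal sketch, not the notes' own code: it assumes 0-based indexing, returns the index of the first match (or -1), and treats an empty pattern as matching at index 0, which is a convention chosen here for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// fail[j] = length of the longest prefix of P that is also a suffix of P[1..j].
std::vector<int> computeFailFunction(const std::string& P) {
    std::vector<int> fail(P.size(), 0);
    std::size_t i = 1;  // position whose failure value is being computed
    std::size_t j = 0;  // length of the prefix matched so far
    while (i < P.size()) {
        if (P[i] == P[j]) {
            fail[i] = static_cast<int>(j + 1);  // matched prefix grows by one
            ++i;
            ++j;
        } else if (j > 0) {
            j = static_cast<std::size_t>(fail[j - 1]);  // try the next shorter prefix
        } else {
            fail[i] = 0;  // no prefix of P ends at position i
            ++i;
        }
    }
    return fail;
}

// Returns the index in T where the first occurrence of P starts, or -1.
int KMPMatch(const std::string& T, const std::string& P) {
    if (P.empty()) return 0;  // assumed convention for the empty pattern
    const std::vector<int> fail = computeFailFunction(P);
    std::size_t i = 0;  // index into T, never moves backward
    std::size_t j = 0;  // index into P
    while (i < T.size()) {
        if (T[i] == P[j]) {
            if (j == P.size() - 1) return static_cast<int>(i - j);  // full match
            ++i;
            ++j;
        } else if (j > 0) {
            j = static_cast<std::size_t>(fail[j - 1]);  // shift the pattern, keep i
        } else {
            ++i;  // mismatch at the first pattern character
        }
    }
    return -1;
}
```

For instance, KMPMatch("abacaabaccabacabaabb", "abacab") returns 10, and computeFailFunction("abacab") yields 0 0 1 0 1 2, matching the table in item 3.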

Descriptive Questions and Answers


Q1: Explain the key concept of the KMP algorithm. Why is it more efficient than the brute-force approach?
Answer:
The KMP algorithm avoids redundant comparisons by using the failure function. When a mismatch occurs,
the failure function gives the pattern index at which to resume, so characters already known to match
are not compared again. In contrast, the brute-force approach shifts the pattern by one position and
restarts the comparison from the beginning of the pattern, leading to redundant checks. The KMP
algorithm thus achieves O(n + m) time complexity, whereas brute force can take O(n · m) in the worst case.
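
To make the O(n · m) worst case concrete, here is a small sketch of the brute-force matcher with a comparison counter. The function name bruteForceMatch and the comparisons out-parameter are hypothetical, added only for illustration.

```cpp
#include <cstddef>
#include <string>

// Brute-force matching: try every shift s and compare from the start of P.
// `comparisons` (hypothetical out-parameter) receives the number of
// character comparisons performed.
int bruteForceMatch(const std::string& T, const std::string& P, long& comparisons) {
    comparisons = 0;
    if (P.size() > T.size()) return -1;
    for (std::size_t s = 0; s + P.size() <= T.size(); ++s) {
        std::size_t j = 0;
        while (j < P.size()) {
            ++comparisons;
            if (T[s + j] != P[j]) break;  // mismatch: give up on this shift
            ++j;
        }
        if (j == P.size()) return static_cast<int>(s);  // all of P matched
    }
    return -1;
}
```

With T made of 20 copies of 'a' and P = "aaab", each of the 17 possible shifts matches three characters and fails on the fourth, for 17 × 4 = 68 comparisons. KMP handles the same input without ever moving backward in T.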

Q2: Define the failure function f(j) and explain its significance in the KMP algorithm.
Answer:
The failure function f(j) is defined as the length of the longest prefix of P that is also a suffix of
P[1..j]. It helps the KMP algorithm efficiently shift the pattern P in the text T after a mismatch,
ensuring no redundant comparisons are made. It encodes information about repeated substrings within the pattern.

Q5: Analyze the time complexity of the KMP algorithm.


Answer:
• The failure function is computed in O(m) time.
• The matching phase processes the n characters of T and uses f to skip unnecessary comparisons. Each
iteration either increments i (the index into T) or reduces j (the index into P), which shifts the
pattern to the right, ensuring at most 2n iterations.
• Total time complexity: O(m + n).
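
The 2n bound can be justified with a short counting argument, sketched below. The quantity k = i - j (where the current alignment of P starts in T) is introduced only for this argument, and the fallback j → f(j-1) is assumed to be the standard formulation of the matching loop.

```latex
\begin{align*}
k &= i - j, \qquad 0 \le j \le i \le n \;\Longrightarrow\; 0 \le k \le n \\[4pt]
\text{match } (T[i] = P[j]) &:\quad i \to i+1,\; j \to j+1
  && \Longrightarrow\; i \text{ grows by } 1,\; k \text{ unchanged} \\
\text{mismatch},\; j > 0 &:\quad j \to f(j-1) \le j-1
  && \Longrightarrow\; i \text{ unchanged},\; k \text{ grows by at least } 1 \\
\text{mismatch},\; j = 0 &:\quad i \to i+1
  && \Longrightarrow\; i \text{ and } k \text{ both grow by } 1
\end{align*}
```

Every iteration increases i or k by at least one, neither ever decreases, and both are bounded by n, so the loop runs at most 2n times. The same argument applied to the preprocessing loop gives at most 2m iterations.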

Let's compute the failure function f(j) for the pattern P = "ababaca" step by step. The failure function
f(j) represents the length of the longest prefix of P that is also a suffix of P[1..j].
Pattern P:
P = "a b a b a c a"
Steps for f(j):
1. Initialization:
o f(0) = 0 (by definition).
o Start with i = 1 (current position) and j = 0 (length of longest prefix).

2. Step-by-Step Calculation:
o i = 1:
P[1] = b ≠ P[0] = a, so f(1) = 0.
No prefix of P is a suffix of P[1..1] = "b".
o i = 2:
P[2] = a = P[0], so f(2) = 1.
Prefix "a" is a suffix of P[1..2] = "ba".
o i = 3:
P[3] = b = P[1], so f(3) = 2.
Prefix "ab" is a suffix of P[1..3] = "bab".
o i = 4:
P[4] = a = P[2], so f(4) = 3.
Prefix "aba" is a suffix of P[1..4] = "baba".
o i = 5:
P[5] = c ≠ P[3] = b, so we fall back to j = f(2) = 1.
P[5] = c ≠ P[1] = b, so we fall back to j = f(0) = 0.
P[5] = c ≠ P[0] = a, so f(5) = 0.
No prefix of P is a suffix of P[1..5] = "babac".
o i = 6:
P[6] = a = P[0], so f(6) = 1.
Prefix "a" is a suffix of P[1..6] = "babaca".

Final Failure Function:


    j:     0  1  2  3  4  5  6
    P[j]:  a  b  a  b  a  c  a
    f(j):  0  0  1  2  3  0  1

Explanation:
• f(j) gives us the information to skip unnecessary comparisons in the Knuth-Morris-Pratt algorithm when
there's a mismatch during pattern matching.
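
One way to double-check a hand-computed table is to evaluate f(j) straight from its definition. The sketch below is a deliberately slow checker, not the O(m) method; the name failFunctionByDefinition is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Computes f(j) directly from the definition: the largest k such that the
// prefix P[0..k-1] equals the last k characters of P[1..j].
std::vector<int> failFunctionByDefinition(const std::string& P) {
    std::vector<int> f(P.size(), 0);
    for (std::size_t j = 1; j < P.size(); ++j) {
        // P[1..j] has j characters, so try lengths k = j, j-1, ..., 1.
        for (std::size_t k = j; k >= 1; --k) {
            // Compare prefix P[0..k-1] against suffix P[j-k+1..j].
            if (P.compare(0, k, P, j - k + 1, k) == 0) {
                f[j] = static_cast<int>(k);
                break;  // first hit is the longest
            }
        }
    }
    return f;
}
```

For P = "ababaca" this returns 0 0 1 2 3 0 1, in agreement with the final failure function above.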
