Pattern Matching

Pattern matching is the process of finding occurrences of a smaller string (the pattern) within a larger string (the text). The Knuth-Morris-Pratt (KMP) algorithm searches in linear time by preprocessing the pattern into a Longest Prefix-Suffix (LPS) array, which lets the search skip comparisons whose outcome is already known. In the given example, the pattern 'ABCDABD' occurs once in the text 'ABC ABCDAB ABCDABCDABDE', starting at index 15 (0-based).

Uploaded by

poorvikabv117

What do you mean by pattern matching? Outline the Knuth-Morris-Pratt (KMP) algorithm and
illustrate it to find the occurrences of the following pattern.
P: ABCDABD
S: ABC ABCDAB ABCDABCDABDE

What is Pattern Matching?


Pattern matching is the process of finding occurrences of a pattern (a smaller string) within a
text (a larger string). Here we want every position in the text at which the pattern appears as a
substring.
For example, given the pattern P = "ABCDABD" and the text S = "ABC ABCDAB ABCDABCDABDE",
the goal is to find all occurrences of P within S.

Knuth-Morris-Pratt (KMP) Algorithm


The Knuth-Morris-Pratt (KMP) algorithm is an efficient string-searching algorithm that finds all
occurrences of a pattern of length m within a text of length n in O(n + m) time. It improves on the
naive approach, which restarts the comparison one position further along after every mismatch and
so re-checks text characters it has already matched, taking O(n * m) time in the worst case.
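For comparison, here is a minimal sketch of the naive approach that KMP improves on. The function name naive_search and the comparison counter are illustrative choices, not part of the original notes.

```python
def naive_search(text, pattern):
    """Return (match start indices, number of character comparisons)."""
    n, m = len(text), len(pattern)
    positions, comparisons = [], 0
    # Try every possible alignment of the pattern against the text.
    for start in range(n - m + 1):
        k = 0
        while k < m:
            comparisons += 1
            if text[start + k] != pattern[k]:
                break  # mismatch: give up on this alignment
            k += 1
        if k == m:
            positions.append(start)
    return positions, comparisons


positions, comparisons = naive_search("ABC ABCDAB ABCDABCDABDE", "ABCDABD")
print(positions)  # [15]
```

After each mismatch the naive search shifts the pattern by one and starts again at P[0], so text characters that were already matched get compared again. KMP avoids exactly this repeated work.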

Steps of the KMP Algorithm:

1. Preprocessing Step: Create the Longest Prefix-Suffix (LPS) Array:

The LPS array is built from the pattern alone, before the search begins, and lets us avoid
re-evaluating characters when a mismatch occurs. The value of lps[i] is the length of the longest
proper prefix of the substring P[0..i] that is also a suffix of P[0..i].

2. Search Step:
Using the LPS array, we can avoid unnecessary comparisons while searching for the pattern in the
text. When a mismatch occurs after i pattern characters have matched, lps[i-1] gives the next
pattern position to compare. The text pointer never moves backwards.

Pre-processing Phase: Compute LPS (Longest Prefix-Suffix) Array
Given the pattern P = "ABCDABD", we create the LPS array for this pattern. Here's how we
compute the LPS array:

- Initialize lps[0] = 0, because a single character has no non-empty proper prefix.
- Starting from i = 1, let len = lps[i-1], the length of the prefix-suffix matched so far. If
P[i] = P[len], set lps[i] = len + 1. If not and len > 0, fall back to len = lps[len-1] and compare
again. If len reaches 0 and P[i] ≠ P[0], set lps[i] = 0.

For the pattern P = "ABCDABD", we compute the LPS array as follows:

1. P[0] = "A" → no non-empty proper prefix, so lps[0] = 0.
2. P[1] = "B" → P[1] ≠ P[0], so lps[1] = 0.
3. P[2] = "C" → P[2] ≠ P[0], so lps[2] = 0.
4. P[3] = "D" → P[3] ≠ P[0], so lps[3] = 0.
5. P[4] = "A" → P[4] = P[0], so lps[4] = 1.
6. P[5] = "B" → P[5] = P[1], so lps[5] = 2.
7. P[6] = "D" → P[6] ≠ P[2], so fall back to len = lps[1] = 0; P[6] ≠ P[0], so lps[6] = 0.

Final LPS Array:


LPS = [0, 0, 0, 0, 1, 2, 0]
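
The LPS computation can be sketched in Python as follows. The function name compute_lps and the variable name length are illustrative choices. This is one common way to write the preprocessing step, not necessarily the exact form the notes have in mind.

```python
def compute_lps(pattern):
    """lps[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of pattern[:i+1]."""
    lps = [0] * len(pattern)
    length = 0  # length of the prefix-suffix currently being extended
    for i in range(1, len(pattern)):
        # On a mismatch, fall back to the next shorter candidate prefix.
        while length > 0 and pattern[i] != pattern[length]:
            length = lps[length - 1]
        if pattern[i] == pattern[length]:
            length += 1
        lps[i] = length
    return lps


print(compute_lps("ABCDABD"))  # [0, 0, 0, 0, 1, 2, 0]
```

The last entry is 0 because no non-empty proper prefix of "ABCDABD" is also a suffix: the pattern ends in "D" but begins with "A".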

Searching Phase
With the LPS array, we now search for the pattern in the text. Let's use the pattern P =
"ABCDABD" and the text S = "ABC ABCDAB ABCDABCDABDE".

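- Compare the characters of the pattern P with the text S from left to right.
- If the characters match, advance both pointers.
- If there's a mismatch after i > 0 pattern characters have matched, set the pattern pointer to
lps[i-1] and compare the same text character again, rather than starting from the beginning of
the pattern. If i = 0, advance the text pointer.
- When all m pattern characters have matched, report an occurrence starting at j - m (where j is
the text pointer) and continue with the pattern pointer set to lps[m-1].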
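
The search rules above can be sketched in Python as follows. The names kmp_search and compute_lps are illustrative, the positions returned are 0-based, and returning an empty list for an empty pattern is an assumption of this sketch.

```python
def compute_lps(pattern):
    """Longest proper prefix that is also a suffix, for each prefix of pattern."""
    lps = [0] * len(pattern)
    length = 0
    for i in range(1, len(pattern)):
        while length > 0 and pattern[i] != pattern[length]:
            length = lps[length - 1]
        if pattern[i] == pattern[length]:
            length += 1
        lps[i] = length
    return lps


def kmp_search(text, pattern):
    """Return the 0-based start index of every occurrence of pattern in text."""
    if not pattern:
        return []  # assumption of this sketch: an empty pattern has no matches
    lps = compute_lps(pattern)
    matches = []
    i = 0  # pattern pointer
    j = 0  # text pointer; it never moves backwards
    while j < len(text):
        if text[j] == pattern[i]:
            i += 1
            j += 1
            if i == len(pattern):
                matches.append(j - i)  # full match ends at j - 1
                i = lps[i - 1]         # keep going; allows overlapping matches
        elif i > 0:
            i = lps[i - 1]  # reuse the matched prefix, keep j in place
        else:
            j += 1  # nothing matched at this text position
    return matches


print(kmp_search("ABC ABCDAB ABCDABCDABDE", "ABCDABD"))  # [15]
```

Because j only ever increases and each fallback strictly shrinks i, the loop performs at most about 2n comparisons for a text of length n.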

Illustration: Pattern Matching in S = "ABC ABCDAB ABCDABCDABDE" with P = "ABCDABD"

1. Initialize:
- Pattern: P = "ABCDABD"
- Text: S = "ABC ABCDAB ABCDABCDABDE" (indices 0 to 22; S[3] and S[10] are spaces)
- LPS Array: [0, 0, 0, 0, 1, 2, 0]
- Text pointer j = 0 (points to the start of the text)
- Pattern pointer i = 0 (points to the start of the pattern)

2. Step-by-step matching:
- Compare S[0] = A with P[0] = A → match.
- Compare S[1] = B with P[1] = B → match.
- Compare S[2] = C with P[2] = C → match.
- Compare S[3] = ' ' (space) with P[3] = D → mismatch.
Mismatch at P[3] and S[3]:
- Three characters had matched, so we look up lps[2] = 0 and move the pattern pointer i to 0. The
text pointer j stays at 3.

3. Next comparisons:
- Compare S[3] = ' ' (space) with P[0] = A → mismatch.
Mismatch at S[3] with i = 0:
- Nothing has matched, so there is no LPS value to fall back on. The pattern pointer stays at 0
and the text pointer moves ahead to j = 4.

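4. Matching continues in the same way, using the LPS array to skip over already matched portions:
- S[4..9] = ABCDAB matches P[0..5]. S[10] = ' ' (space) ≠ P[6] = D, so i falls back to
lps[5] = 2, then to lps[1] = 0, and finally j advances to 11.
- S[11..16] = ABCDAB matches P[0..5]. S[17] = C ≠ P[6] = D, so i falls back to lps[5] = 2. Now
S[17] = C = P[2], so matching resumes without moving j backwards.
- S[18..21] = DABD matches P[3..6], completing the pattern. The occurrence starts at index
21 - 6 = 15 (0-based).
- S[22] = E does not match P[0], and the search ends. The pattern occurs exactly once.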
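
To check a hand trace like this one, the search can be instrumented to record every fallback. The sketch below is illustrative: kmp_trace and the (text index, old pattern index, new pattern index) tuple layout are names and conventions chosen here, not part of the original notes.

```python
def compute_lps(pattern):
    lps = [0] * len(pattern)
    length = 0
    for i in range(1, len(pattern)):
        while length > 0 and pattern[i] != pattern[length]:
            length = lps[length - 1]
        if pattern[i] == pattern[length]:
            length += 1
        lps[i] = length
    return lps


def kmp_trace(text, pattern):
    """Run KMP on a non-empty pattern and return (matches, fallbacks).

    Each fallback is (text index j, pattern index before, pattern index after)
    and is recorded only for mismatches that occur with i > 0.
    """
    lps = compute_lps(pattern)
    matches, fallbacks = [], []
    i = j = 0
    while j < len(text):
        if text[j] == pattern[i]:
            i += 1
            j += 1
            if i == len(pattern):
                matches.append(j - i)
                i = lps[i - 1]
        elif i > 0:
            fallbacks.append((j, i, lps[i - 1]))
            i = lps[i - 1]
        else:
            j += 1
    return matches, fallbacks


matches, fallbacks = kmp_trace("ABC ABCDAB ABCDABCDABDE", "ABCDABD")
print(matches)    # [15]
print(fallbacks)  # [(3, 3, 0), (10, 6, 2), (10, 2, 0), (17, 6, 2)]
```

The entry (17, 6, 2) is the step where KMP pays off: after the mismatch at S[17], the already matched "AB" at S[15..16] is reused as the first two characters of the next attempt.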

Summary of KMP Algorithm

1. Preprocess the pattern to build the LPS array.


2. Use the LPS array to efficiently search the text, skipping unnecessary comparisons.

For this example, the KMP algorithm finds one occurrence of the pattern "ABCDABD" in the
string "ABC ABCDAB ABCDABCDABDE", starting at index 15 (0-based).
