KMP Algorithm

Uploaded by pavithra.r
Copyright © All Rights Reserved

The Knuth–Morris–Pratt (KMP) algorithm is an efficient string-matching algorithm used to search for occurrences of a "pattern" string within a "text" string. It was developed by Donald Knuth, Vaughan Pratt, and James H. Morris and published in 1977. The key idea behind KMP is to preprocess the pattern so that, after a mismatch, the algorithm never re-examines text characters it has already matched: the text position only moves forward, and the precomputed table tells it where to resume in the pattern.

Key Concepts

1. Partial Match Table (LPS Array):
o The Longest Prefix Suffix (LPS) array is computed in the preprocessing step of KMP.
o For each position i in the pattern, LPS[i] stores the length of the longest proper prefix of pattern[0..i] that is also a suffix of pattern[0..i]. ("Proper" means the prefix is not the whole substring.)
o After a mismatch, the LPS array tells the algorithm how many pattern characters are already known to match, so those comparisons can be skipped.
2. String Matching:
o While comparing the pattern with the text, if a mismatch occurs, the algorithm keeps its position in the text and uses the LPS array to determine the pattern position from which to resume matching.
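The definition of LPS[i] can be checked directly with a brute-force sketch. The function name lps_by_definition is hypothetical, and this version runs in roughly cubic time, so it is only meant for verifying small examples, not for use inside KMP itself.

```python
def lps_by_definition(pattern: str) -> list[int]:
    """Compute the LPS array straight from its definition (slow; for illustration only)."""
    lps = []
    for i in range(len(pattern)):
        prefix = pattern[: i + 1]  # the substring pattern[0..i]
        best = 0
        # Try every proper prefix length k (k < i + 1) and keep the longest
        # one that is also a suffix of pattern[0..i].
        for k in range(1, i + 1):
            if prefix[:k] == prefix[-k:]:
                best = k
        lps.append(best)
    return lps


print(lps_by_definition("ABABCABAB"))  # [0, 0, 1, 2, 0, 1, 2, 3, 4]
```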

Steps of the Algorithm

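1. Preprocessing:
o Construct the LPS array for the pattern string.
2. Search:
o Compare the pattern with the text, character by character.
o If there is a mismatch, use the LPS array to decide how far to shift the pattern.
o If the entire pattern matches, report an occurrence; to find further (possibly overlapping) occurrences, continue with the pattern position set to LPS[m - 1], where m is the pattern length.
o Repeat until the entire text has been scanned, or stop at the first match if only one occurrence is needed.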
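The two steps above can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: the function names build_lps and kmp_search are hypothetical, and it assumes that all (possibly overlapping) occurrences should be reported and that an empty pattern yields no matches.

```python
def build_lps(pattern: str) -> list[int]:
    """Preprocessing: LPS[i] = length of the longest proper prefix of pattern[0..i] that is also its suffix."""
    lps = [0] * len(pattern)
    length = 0  # length of the current prefix-suffix of pattern[0..i-1]
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            # Fall back to the next shorter prefix-suffix; do not advance i.
            length = lps[length - 1]
        else:
            lps[i] = 0
            i += 1
    return lps


def kmp_search(text: str, pattern: str) -> list[int]:
    """Search: return the start indices of all (possibly overlapping) occurrences of pattern in text."""
    if not pattern:
        return []  # assumption: an empty pattern reports no matches
    lps = build_lps(pattern)
    matches = []
    j = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        # On a mismatch, shrink j using the LPS array; the text index i never moves back.
        while j > 0 and ch != pattern[j]:
            j = lps[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = lps[j - 1]  # keep going to find overlapping occurrences
    return matches


print(kmp_search("ABABDABACDABABCABAB", "ABABCABAB"))  # [10]
```

Because j is reset to LPS[m - 1] after each full match rather than to 0, overlapping occurrences are found as well; for example, searching "ABABABAB" for "ABAB" returns [0, 2, 4].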

Example

Given:

• Text: "ABABDABACDABABCABAB"
• Pattern: "ABABCABAB"

Step 1: Build the LPS Array

The LPS array is constructed as follows:

Index (i)   Pattern[i]   LPS[i]
0           A            0
1           B            0
2           A            1
3           B            2
4           C            0
5           A            1
6           B            2
7           A            3
8           B            4
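The table can be reproduced with the usual linear-time LPS construction. The sketch below (the function name build_lps_traced and its trace format are hypothetical) also records each fallback step. For this pattern the only fallback happens at index 4, where C fails to extend the current prefix-suffix AB.

```python
def build_lps_traced(pattern: str) -> tuple[list[int], list[str]]:
    """Build the LPS array in O(m) time and record every fallback step (trace format is illustrative)."""
    lps = [0] * len(pattern)
    trace = []
    length = 0  # length of the longest prefix-suffix of pattern[0..i-1]
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            # Mismatch: retry with the next-shorter prefix-suffix, keeping i fixed.
            trace.append(f"i={i}: {pattern[i]} != {pattern[length]}, length {length} -> {lps[length - 1]}")
            length = lps[length - 1]
        else:
            lps[i] = 0
            i += 1
    return lps, trace


lps, trace = build_lps_traced("ABABCABAB")
print(lps)    # [0, 0, 1, 2, 0, 1, 2, 3, 4]
print(trace)  # ['i=4: C != A, length 2 -> 0']
```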

Step 2: Pattern Matching

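Now compare the pattern "ABABCABAB" with the text "ABABDABACDABABCABAB" (indices below refer to positions in the text):

• Match starts at index 0. Compare each character:
o Match ABAB → mismatch at index 4 (text: D, pattern: C).
o LPS[3] = 2, so the prefix AB is already known to match; resume comparing at pattern index 2. Text D ≠ pattern A, so fall back again (LPS[1] = 0). D also differs from pattern A at index 0, so the text position advances.
• Match resumes at index 5:
o Match ABA → mismatch at index 8 (text: C, pattern: B).
o Fall back with LPS[2] = 1, then LPS[0] = 0; C matches neither pattern B nor pattern A, so the text position advances. Index 9 (D) also fails against pattern A.
• Match resumes at index 10:
o Match ABABCABAB: the pattern occurs at index 10.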
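The walkthrough can be reproduced by logging each mismatch during the search. In this hypothetical kmp_trace sketch, comparisons that fail while no pattern characters are matched (j = 0) are not logged, since they simply advance the text position; the log format is an assumption made for illustration.

```python
def kmp_trace(text: str, pattern: str) -> list[str]:
    """Run the KMP search and log each fallback and each match (log format is hypothetical)."""
    if not pattern:
        return []
    # LPS array, built as in the preprocessing step.
    lps = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = lps[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        lps[i] = k

    log = []
    j = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while j > 0 and ch != pattern[j]:
            log.append(f"mismatch at text[{i}]={ch} vs pattern[{j}]={pattern[j]}, j -> {lps[j - 1]}")
            j = lps[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            log.append(f"match at index {i - j + 1}")
            j = lps[j - 1]
    return log


for line in kmp_trace("ABABDABACDABABCABAB", "ABABCABAB"):
    print(line)
# mismatch at text[4]=D vs pattern[4]=C, j -> 2
# mismatch at text[4]=D vs pattern[2]=A, j -> 0
# mismatch at text[8]=C vs pattern[3]=B, j -> 1
# mismatch at text[8]=C vs pattern[1]=B, j -> 0
# match at index 10
```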

Advantages

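• Efficient: Time complexity is $O(m + n)$, where $m$ is the length of the pattern and $n$ is the length of the text: building the LPS array takes $O(m)$ time and the search takes $O(n)$ time.
• The text position never moves backward, so the search phase performs at most $2n$ character comparisons, and the LPS array needs only $O(m)$ extra space.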
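The linear bound can be checked empirically by counting comparisons. The hypothetical function below counts only search-phase comparisons (LPS construction is not counted). Each comparison either advances the text position or decreases the matched length j, and j can decrease at most as many times as it increases (at most n), which gives the 2n bound.

```python
def kmp_count_comparisons(text: str, pattern: str) -> int:
    """Count character comparisons made by the KMP search phase (LPS construction not counted)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    lps = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = lps[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        lps[i] = k

    comparisons = 0
    j = 0  # number of pattern characters currently matched
    for ch in text:
        while True:
            comparisons += 1
            if ch == pattern[j]:
                j += 1          # success: extend the match, move to the next text character
                break
            if j == 0:
                break           # failure with nothing matched: move to the next text character
            j = lps[j - 1]      # failure: fall back in the pattern, stay on the same text character
        if j == len(pattern):
            j = lps[j - 1]      # full match: continue for overlapping occurrences
    return comparisons


print(kmp_count_comparisons("ABABDABACDABABCABAB", "ABABCABAB"), 2 * 19)  # 23 38
print(kmp_count_comparisons("A" * 20, "AAAB"), 2 * 20)                    # 37 40
```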

Applications

• Pattern matching in text processing.
• Finding substrings.
• Plagiarism detection.
• DNA sequence analysis.
