Text Search in a Document
Using the Knuth-Morris-Pratt (KMP) Algorithm: Efficient Pattern Matching in C++
Presented by Muskan Keshri (221FA04669)

Introduction
• Objective: Efficiently search for patterns in text documents.
• Algorithm used: Knuth-Morris-Pratt (KMP).
• Applications: Searching large text documents, DNA sequence analysis, and search engines.

Problem Description
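• Search for the pattern “fox”.
• Count the occurrences of each pattern in the document.
• Replace “fox” with “cat”.
• Compare KMP with naive search, Rabin-Karp, and Boyer-Moore.

Understanding the KMP Algorithm
• KMP algorithm: Uses a partial match table (the LPS array) so that, after a mismatch, the pattern is shifted without re-examining text characters that have already been matched.
• Steps:
  1. Preprocess pattern: Build the longest prefix-suffix (LPS) array, where LPS[i] is the length of the longest proper prefix of pattern[0..i] that is also a suffix of it.
  2. Search phase: Scan the text once, using the LPS array to decide how far to fall back in the pattern on a mismatch.
• Time complexity: O(m + n), where m is the length of the text and n is the length of the pattern.

Implementing KMP in C++ (Code - Preprocess Function)

Implementing KMP in C++ (Code - Search Function)

Extending for Multiple Patterns
#include <string>
#include <unordered_map>
#include <vector>

// Runs KMPSearch (from the search function slide) once per pattern and
// maps each pattern to the start indices of its matches in the text.
std::unordered_map<std::string, std::vector<int>> findPatterns(const std::string &text, const std::vector<std::string> &patterns) {
    std::unordered_map<std::string, std::vector<int>> results;
    for (const auto &pattern : patterns) {
        results[pattern] = KMPSearch(text, pattern);
    }
    return results;
}

Counting Occurrences of Patterns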
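Counting depends on KMPSearch, whose code, like the preprocess function's, survives in this copy only as a slide title. The following is a minimal sketch of both, not the original slides. It assumes KMPSearch(text, pattern) returns the 0-based start index of every match, overlapping matches included, as a std::vector<int> (the type findPatterns stores); computeLPS is a hypothetical name for the preprocess function, and treating an empty pattern as matching nothing is also an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Preprocess step (hypothetical name computeLPS): lps[i] is the length of the
// longest proper prefix of pattern[0..i] that is also a suffix of pattern[0..i].
std::vector<int> computeLPS(const std::string &pattern) {
    std::vector<int> lps(pattern.size(), 0);
    std::size_t len = 0;  // length of the prefix matched so far
    std::size_t i = 1;
    while (i < pattern.size()) {
        if (pattern[i] == pattern[len]) {
            ++len;
            lps[i] = static_cast<int>(len);
            ++i;
        } else if (len > 0) {
            len = static_cast<std::size_t>(lps[len - 1]);  // fall back, do not advance i
        } else {
            lps[i] = 0;
            ++i;
        }
    }
    return lps;
}

// Search step: returns the 0-based start index of every match,
// including overlapping ones.
std::vector<int> KMPSearch(const std::string &text, const std::string &pattern) {
    std::vector<int> matches;
    if (pattern.empty()) return matches;  // assumption: empty pattern matches nothing
    const std::vector<int> lps = computeLPS(pattern);
    std::size_t j = 0;  // number of pattern characters currently matched
    for (std::size_t i = 0; i < text.size(); ++i) {
        while (j > 0 && text[i] != pattern[j]) {
            j = static_cast<std::size_t>(lps[j - 1]);  // reuse the matched prefix
        }
        if (text[i] == pattern[j]) ++j;
        if (j == pattern.size()) {
            matches.push_back(static_cast<int>(i + 1 - j));
            j = static_cast<std::size_t>(lps[j - 1]);  // continue to find overlaps
        }
    }
    return matches;
}
```

For example, KMPSearch("the quick brown fox jumps over the lazy fox", "fox") returns {16, 40}, so countOccurrences reports 2 for “fox”. Because the text index i never moves backward, the search stays linear in the text length.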
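// The number of occurrences is the number of start indices returned by KMPSearch.
int countOccurrences(const std::vector<int> &indices) {
    return static_cast<int>(indices.size());
}

Replacing Patterns in Text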
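#include <cstddef>
#include <string>

// The start of this signature is cut off in the source; the name
// replacePattern is a hypothetical placeholder.
std::string replacePattern(const std::string &text, const std::string &pattern, const std::string &replacement) {
    std::string result = text;
    auto indices = KMPSearch(text, pattern);  // start indices in the original text
    long long shift = 0;                      // net length change so far (may be negative)
    std::size_t nextFree = 0;                 // first original index not yet replaced
    for (int index : indices) {
        // KMPSearch reports overlapping matches; skip any that start inside a replaced region.
        if (static_cast<std::size_t>(index) < nextFree) continue;
        result.replace(static_cast<std::size_t>(index + shift), pattern.size(), replacement);
        shift += static_cast<long long>(replacement.size()) - static_cast<long long>(pattern.size());
        nextFree = static_cast<std::size_t>(index) + pattern.size();
    }
    return result;
}

Code Execution

Complexity Comparison with Other Algorithms

Summary and Conclusion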
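• KMP algorithm: Preprocesses the pattern once into the LPS array, then scans the text in a single pass without backtracking, so the preprocessed pattern can be reused across repeated searches.
• Advantages: O(m + n) worst-case time versus O(m·n) for naive search, which makes it suitable for large documents.
• Applications: Text editors, search engines, DNA matching, etc.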
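The complexity comparison slide is not preserved in this copy. To make the naive-versus-KMP contrast concrete, the sketch below instruments both searches to count character comparisons on the same input. The names buildLPS, naiveComparisons, and kmpComparisons are hypothetical, the LPS construction is deliberately left out of the KMP count, and Rabin-Karp and Boyer-Moore are not covered.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// LPS array: lps[i] = length of the longest proper prefix of p[0..i]
// that is also a suffix of p[0..i].
std::vector<int> buildLPS(const std::string &p) {
    std::vector<int> lps(p.size(), 0);
    std::size_t len = 0;
    for (std::size_t i = 1; i < p.size();) {
        if (p[i] == p[len]) {
            lps[i++] = static_cast<int>(++len);
        } else if (len > 0) {
            len = static_cast<std::size_t>(lps[len - 1]);
        } else {
            lps[i++] = 0;
        }
    }
    return lps;
}

// Naive search: try every alignment, compare left to right, stop at the
// first mismatch. Returns the number of character comparisons made.
long long naiveComparisons(const std::string &text, const std::string &pattern) {
    long long comparisons = 0;
    for (std::size_t s = 0; s + pattern.size() <= text.size(); ++s) {
        for (std::size_t k = 0; k < pattern.size(); ++k) {
            ++comparisons;
            if (text[s + k] != pattern[k]) break;
        }
    }
    return comparisons;
}

// KMP search instrumented the same way; building the LPS array is not counted.
long long kmpComparisons(const std::string &text, const std::string &pattern) {
    if (pattern.empty()) return 0;
    const std::vector<int> lps = buildLPS(pattern);
    long long comparisons = 0;
    std::size_t j = 0;  // pattern characters currently matched
    for (std::size_t i = 0; i < text.size(); ++i) {
        while (true) {
            ++comparisons;
            if (text[i] == pattern[j]) { ++j; break; }
            if (j == 0) break;                         // mismatch at pattern start: next text char
            j = static_cast<std::size_t>(lps[j - 1]);  // fall back, retry the same text char
        }
        if (j == pattern.size()) {
            j = static_cast<std::size_t>(lps[j - 1]);  // full match: keep going for overlaps
        }
    }
    return comparisons;
}
```

On a text of 1000 'a' characters with the pattern "aaaab", naive search tries all 996 alignments and makes 5 comparisons at each (4980 total), while KMP makes 1996. In general the KMP count stays at or below 2m for a text of length m, whereas the naive count can grow as m·n.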