
Unit2 Rabinkarp

The document discusses two string-searching algorithms: Rabin-Karp and Knuth-Morris-Pratt (KMP). Rabin-Karp uses hashing for efficient pattern matching, while KMP optimizes searches by using a prefix table to avoid redundant comparisons. Both algorithms have various applications in fields such as plagiarism detection, DNA analysis, and spam filtering, with their respective complexities analyzed for best, average, and worst cases.

Uploaded by

sgithub9572
Copyright
© All Rights Reserved

CSE408

DESIGN AND ANALYSIS OF ALGORITHMS

Rabin-Karp Algorithm, Knuth-Morris-Pratt Algorithm

Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 2nd ed., Ch. 1
Introduction

• String Searching: Find a substring (pattern) in a large text.
• Challenge: Search efficiently in large datasets.
• Rabin-Karp Solution:
  • Uses hashing for efficient matching.
  • Compares hash values first, and compares individual characters only when the hashes agree.

Rabin-Karp Algorithm

• A string-searching algorithm that uses hashing to efficiently
find a pattern in a text.
• Compares the hash value of the pattern with the hash values of
substrings in the text.
• Confirms matches by verifying actual characters when hash
values are the same.
• Key Advantage:
  • Efficient for multiple-pattern searches in large datasets: the hashes of
  many patterns can be checked against each window hash in a single pass.
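
The multiple-pattern advantage can be illustrated with a minimal sketch. It assumes all patterns have the same length, and the function name `rabin_karp_multi`, the base 256, and the modulus 1,000,000,007 are illustrative choices, not values taken from the slides. Pattern hashes are stored in a dictionary, so each text window needs one hash lookup regardless of how many patterns there are.

```python
def rabin_karp_multi(text, patterns, base=256, mod=1_000_000_007):
    """Return {pattern: [start indices]} for a list of equal-length patterns.

    Hypothetical helper for illustration; base and mod are assumed values.
    """
    patterns = list(dict.fromkeys(patterns))  # drop duplicates, keep order
    found = {p: [] for p in patterns}
    if not patterns:
        return found
    m = len(patterns[0])
    if any(len(p) != m for p in patterns):
        raise ValueError("this sketch assumes equal-length patterns")
    n = len(text)
    if m == 0 or m > n:
        return found

    def full_hash(s):
        # Polynomial hash of a whole string, computed left to right.
        value = 0
        for ch in s:
            value = (value * base + ord(ch)) % mod
        return value

    # Map each hash value to the patterns that produce it.
    by_hash = {}
    for p in patterns:
        by_hash.setdefault(full_hash(p), []).append(p)

    high = pow(base, m - 1, mod)  # weight of the window's leading character
    window = full_hash(text[:m])
    for s in range(n - m + 1):
        # One dictionary lookup covers every pattern; verify on a hash hit.
        for p in by_hash.get(window, ()):
            if text[s:s + m] == p:
                found[p].append(s)
        if s < n - m:
            # Rolling hash: drop text[s], append text[s + m].
            window = ((window - ord(text[s]) * high) * base
                      + ord(text[s + m])) % mod
    return found


print(rabin_karp_multi("ABABDABACDABABCABAB", ["ABAB", "DABA", "XYZW"]))
```

Patterns of different lengths would need one rolling hash per distinct length; that extension is left out to keep the sketch short.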

Steps of Rabin-Karp Algorithm

1. Compute the hash of the pattern (length m).
2. Compute the hash of the first m-character window of the text.
3. Compare the pattern hash with the window hash.
4. If the hashes match, verify the characters (to rule out a hash collision).
5. Slide the window by one character.
6. Use the rolling hash to compute the next window hash in constant time.
7. Repeat until the end of the text.
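
The seven steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name `rabin_karp`, the base 256, and the modulus 1,000,000,007 are assumptions, since the slides do not specify a hash function.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return every start index where pattern occurs in text.

    base and mod are assumed values chosen for illustration.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    high = pow(base, m - 1, mod)  # weight of the window's leading character

    # Steps 1-2: hash the pattern and the first window of the text.
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod

    matches = []
    for s in range(n - m + 1):
        # Steps 3-4: compare hashes, then verify characters on a hit.
        if p_hash == w_hash and text[s:s + m] == pattern:
            matches.append(s)
        # Steps 5-6: slide the window and update the hash in O(1).
        if s < n - m:
            w_hash = ((w_hash - ord(text[s]) * high) * base
                      + ord(text[s + m])) % mod
    # Step 7: the loop ends when the last window has been checked.
    return matches


print(rabin_karp("ABABDABACDABABCABAB", "ABABCABAB"))  # [10]
```

Python's `%` operator returns a non-negative result for a positive modulus, so the subtraction in the rolling update needs no extra correction; other languages may need `+ mod` before taking the remainder.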

Real-Life Applications

• Plagiarism Detection
• Search Engines
• Intrusion Detection
• DNA Sequence Matching
• Data Deduplication
• Digital Forensics

Complexity of Rabin-Karp Algorithm

(n = length of the text, m = length of the pattern)

• Best Case: O(n + m)
  No spurious hash matches occur, so characters are compared only
  at true occurrences of the pattern.

• Average Case: O(n + m)
  Few or no hash collisions occur during matching.

• Worst Case: O(n × m)
  Hash collisions require character-by-character comparison for
  each window.
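
The gap between the average and worst case can be made visible by counting hash hits. The sketch below is illustrative only: `count_hash_hits` is a hypothetical helper, and the modulus of 1 is a deliberately degenerate assumption that forces every window to collide with the pattern, so every window triggers a full character comparison.

```python
def count_hash_hits(text, pattern, base=256, mod=1_000_000_007):
    """Return (true_matches, spurious_hits) for one Rabin-Karp scan.

    A spurious hit is a window whose hash equals the pattern hash
    although the characters differ. base and mod are assumed values.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return (0, 0)
    high = pow(base, m - 1, mod)
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod

    true_matches = spurious_hits = 0
    for s in range(n - m + 1):
        if p_hash == w_hash:
            # Each hash hit costs up to m character comparisons.
            if text[s:s + m] == pattern:
                true_matches += 1
            else:
                spurious_hits += 1
        if s < n - m:
            w_hash = ((w_hash - ord(text[s]) * high) * base
                      + ord(text[s + m])) % mod
    return (true_matches, spurious_hits)


text, pattern = "ABABDABACDABABCABAB", "ABABCABAB"
# With mod=1 every hash is 0, so all 11 windows are hash hits.
print(count_hash_hits(text, pattern, mod=1))  # (1, 10)
```

With a large modulus, spurious hits become rare, which is what keeps the expected running time at O(n + m).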

Knuth-Morris-Pratt (KMP) Algorithm

• Finds occurrences of a pattern in a given text.
• Avoids redundant comparisons by using a prefix table.
• Preprocesses the pattern to optimize the search.
• Shifts the pattern intelligently after mismatches to improve
efficiency.
• Runs in time linear in the combined length of the text and the pattern.

Steps of KMP Algorithm

• Preprocessing: Construct the prefix table (LPS).
• Pattern Matching: Compare the pattern with the text, left to right.
• Mismatch Handling: Shift the pattern using the LPS table.
• Efficient Search: Avoid re-comparing text characters that already matched.
• Continue Search: Repeat until the pattern is found or the text is exhausted.
• Final Match: Return the match index if found.

Example

• Text: ABABDABACDABABCABAB
• Pattern: ABABCABAB

• Steps:

1. Preprocessing Phase (LPS Table)

• Compute the LPS array for the pattern. LPS[i] is the length of the
longest proper prefix of pattern[0..i] that is also a suffix of it:
Pattern: ABABCABAB
LPS Table: [0, 0, 1, 2, 0, 1, 2, 3, 4]
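
The preprocessing phase can be sketched as below. The function name `build_lps` is an illustrative choice; the logic is the standard prefix-table construction, which reuses already computed entries to fall back after a mismatch.

```python
def build_lps(pattern):
    """Return the LPS table: lps[i] is the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it."""
    lps = [0] * len(pattern)
    length = 0  # length of the current matched prefix
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            # The matched prefix grows by one character.
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            # Fall back to the next shorter candidate prefix; i stays put.
            length = lps[length - 1]
        else:
            # No prefix matches at this position.
            lps[i] = 0
            i += 1
    return lps


print(build_lps("ABABCABAB"))  # [0, 0, 1, 2, 0, 1, 2, 3, 4]
```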

2. Pattern Matching Phase

• Start matching the pattern with the text from left to right:
• Compare A (text) with A (pattern) → Match.
• Compare B (text) with B (pattern) → Match.
• Compare A (text) with A (pattern) → Match.
• Compare B (text) with B (pattern) → Match.
• Compare D (text, index 4) with C (pattern, index 4) → Mismatch.

3. Mismatch Handling (Shifting the Pattern)

• Use the LPS table to shift the pattern:
• The mismatch is at pattern index 4, so consult LPS[3] = 2: the last 2
matched characters (AB) are kept and the pattern shifts by 4 - 2 = 2
characters, not 1.
• Continue matching from pattern index 2 without moving back in the text.

4. Final Match
• Continue matching, and you find that the pattern occurs at index 10 in
the text.

• Output: Pattern found at index: 10
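
The walkthrough above can be reproduced with a short sketch. The names `build_lps` and `kmp_search` are illustrative, and the function collects every occurrence rather than stopping at the first one, which is a design choice not stated in the slides.

```python
def build_lps(pattern):
    """Longest-proper-prefix-that-is-also-a-suffix table for pattern."""
    lps = [0] * len(pattern)
    length, i = 0, 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            length = lps[length - 1]
        else:
            i += 1
    return lps


def kmp_search(text, pattern):
    """Return every start index where pattern occurs in text."""
    if not pattern:
        return []
    lps = build_lps(pattern)
    matches = []
    j = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        # On a mismatch, fall back in the pattern; i never moves backward.
        while j > 0 and ch != pattern[j]:
            j = lps[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = lps[j - 1]  # keep going to find overlapping occurrences
    return matches


print(kmp_search("ABABDABACDABABCABAB", "ABABCABAB"))  # [10]
```

At text index 4 the inner loop falls back from j = 4 to j = LPS[3] = 2 and then to j = LPS[1] = 0, which corresponds to the shift described in the mismatch-handling step.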

Real-Life Applications

• String Searching: Quickly searches for patterns in long texts.
• Compilers: Used for searching tokens or keywords in source
code.
• DNA Analysis: Locates genetic sequences efficiently.
• Spam Filtering: Detects specific spam phrases in messages.

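Complexity of KMP Algorithm

(n = length of the text, m = length of the pattern)

• Best Case: O(n)
  Few mismatches occur; the text is scanned once from left to right.
• Average Case: O(n)
  Efficient due to reduced comparisons using the prefix table.
• Worst Case: O(n)
  The text pointer never moves backward, so the scan makes at most
  2n character comparisons.
• Preprocessing: Building the LPS table takes O(m), so the total is
  O(n + m).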
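
The linear worst case can be checked empirically by counting character comparisons. The sketch below is illustrative: the helper names and the adversarial input (a run of `a` characters searched for a pattern ending in `b`) are assumptions chosen to make a naive left-to-right scan do the most work.

```python
def naive_comparisons(text, pattern):
    """Count character comparisons made by a brute-force scan."""
    count = 0
    for s in range(len(text) - len(pattern) + 1):
        for j in range(len(pattern)):
            count += 1
            if text[s + j] != pattern[j]:
                break
    return count


def kmp_comparisons(text, pattern):
    """Count text-versus-pattern comparisons made by the KMP scan
    (the O(m) LPS construction is not counted)."""
    # Build the LPS table.
    lps = [0] * len(pattern)
    length, i = 0, 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            length = lps[length - 1]
        else:
            i += 1

    count = 0
    j = 0
    for ch in text:
        while True:
            count += 1
            if ch == pattern[j]:
                j += 1
                break
            if j == 0:
                break
            j = lps[j - 1]  # fall back and compare the same text character
        if j == len(pattern):
            j = lps[j - 1]
    return count


text = "a" * 1000
pattern = "a" * 9 + "b"
print(naive_comparisons(text, pattern))             # 9910
print(kmp_comparisons(text, pattern) <= 2 * len(text))  # True
```

The brute-force count grows with n × m on this input, while the KMP count stays within 2n regardless of the pattern.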

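Advantages

• Efficient: each text character is examined a bounded number of times.
• Fast: no hashing or arithmetic on the text is required.
• Linear: O(n + m) total running time.
• Optimal: any exact matcher must read the input, which already costs linear time in the worst case.
• No Backtracking: the text pointer never moves backward.
• Reliable: the worst-case bound is guaranteed for every input.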

Thank You!!!
