0% found this document useful (0 votes)
11 views19 pages

Unit 2 - Letter ManipilationPattern Searching

Uploaded by

snoopy mouse
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views19 pages

Unit 2 - Letter ManipilationPattern Searching

Uploaded by

snoopy mouse
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 19

Letter Manipulation

What is Pattern Searching


• Pattern searching in Data Structures and Algorithms
(DSA) is a fundamental concept that involves
searching for a specific pattern or sequence of
elements within a given data structure.
• This technique is commonly used in string matching
algorithms to find occurrences of a particular pattern
within a text or a larger string.
• Pattern searching plays a crucial role in tasks such as
text processing, data retrieval, and computational
biology.
• Text Processing: Searching for keywords in a document, finding and replacing
text, spell checking, and plagiarism detection.
• Information Retrieval: Finding relevant documents in a database, web search,
and data mining.
• Bioinformatics: Searching for DNA sequences in a genome, protein analysis,
and gene expression analysis.
• Network Security: Detecting malicious patterns in network traffic, intrusion
detection, and malware analysis.
Applications
Data Mining: Identifying patterns in large datasets, customer
segmentation, and fraud detection.
Important Pattern Searching
Algorithms:
 Naive String Matching
 Rabin-Karp Algorithm
 Knuth-Morris-Pratt (KMP) Algorithm
 Aho-Corasick algorithm
Examples
• Input: T[] = “THIS IS A TEST TEXT”, P[] = “TEST”

Output: Pattern found at index 10

• Input: T[] = “AABAACAADAABAABA”, P[] = “AABA”


Output: Pattern found at index 0
Pattern found at index 9
Pattern found at index 12
Brute Force-Complexity
• Given a pattern M characters in length, and a text N characters in
length...
• Worst case: compares pattern to each substring of text of length M.
For example, M=5.
• This kind of case can occur for image data.

Total number of comparisons: M (N-M+1)


8
Worst case time complexity: O(MN)
Brute Force-
Complexity(cont.)
• Given a pattern M characters in length, and a text N characters in
length...
• Best case if pattern found: Finds pattern in first M positions of text.
For example, M=5.

Total number of comparisons: M


Best case time complexity: O(M) 9
Brute Force-
Complexity(cont.)
• Given a pattern M characters in length, and a text N characters in length...
• If pattern not found: Always mismatch on first character. For example, M=5.

Total number of comparisons: N


10
time complexity: O(N)
Rabin-Karp Algorithm
• The Rabin Karp Algorithm is string searching
algorithm that uses hashing to find the patterns
in strings
• It make use of hash functions and rolling hash
technique
Rabin-Karp
• The Rabin-Karp string searching algorithm calculates a hash
value for the pattern, and for each M-character subsequence of
text to be compared.
• If the hash values are unequal, the algorithm will calculate the
hash value for next M-character sequence.
• If the hash values are equal, the algorithm will do a Brute Force
comparison between the pattern and the M-character
sequence.
• In this way, there is only one comparison per text subsequence,
and Brute Force is only needed when hash values match.
12
Rabin-Karp Example
• Hash value of “AAAAA” is 37
• Hash value of “AAAAH” is 100

13
Rabin-Karp Algorithm
pattern is M characters long
hash_p=hash value of pattern
hash_t=hash value of first M letters in body of text
do
if (hash_p == hash_t)
brute force comparison of pattern and selected section of
text
hash_t= hash value of next section of text, one character over
while (end of text)

14
What is the hash
function used to
calculate values for
character sequences?
Hash Function
• Let b be the number of letters in the alphabet. The text subsequence t[i .. i+M-1] is
mapped to the number

• Furthermore, given x(i) we can compute x(i+1) for the next


subsequence t[i+1 .. i+M] in constant time, as follows:

• In this way, we never explicitly compute a new value. We


simply adjust the existing value as we move over one
16
character.
Rabin-Karp Mods
• If M is large, then the resulting value (~bM) will be enormous. For this reason, we hash the value
by taking it mod a prime number q.
• The mod function is particularly useful in this case due to several of its inherent properties:
[(x mod q) + (y mod q)] mod q = (x+y) mod q
(x mod q) mod q = x mod q
• For these reasons:
h(i)=((t[i] bM-1 mod q) +(t[i+1] bM-2 mod q) + … +(t[i+M-1] mod q))mod q
h(i+1) =( h(i) b mod q
Shift left one digit
-t[i] bM mod q
Subtract leftmost digit
+t[i+M] mod q )
Add new rightmost digit
mod q
17
Rabin-Karp Algorithm
• Given a text T[0. . .n-1] and a pattern P[0. . .m-1]

Write a function search(char P[], char T[]) that prints all occurrences of
P[] present in T[] using Rabin Karp algorithm.

Note:Assume that n > m.


Limitations

You might also like