Pattern matching algorithms
1 author:
Kamran Mahmoudi
Imam Khomeini International University
All content following this page was uploaded by Kamran Mahmoudi on 05 July 2018.
1/56
Formal Definition
2/56
Classification
using preprocessing as the main criterion
3/56
Basic classification
Single Pattern Algorithms
✓ Naïve String Search
✓ Knuth-Morris-Pratt Algorithm
✓ Boyer-Moore Algorithm
✓ Rabin-Karp String Search Algorithm
✓ Finite State Automaton Based Search
Bitap algorithm (shift-or, shift-and, Baeza–Yates–Gonnet)
Two-way string-matching algorithm
BNDM (Backward Non-Deterministic Dawg Matching)
BOM (Backward Oracle Matching)
4/56
Basic classification
5/56
Basic classification
6/56
Naïve string search
Input: a pattern p = p_1…p_m and a text t = t_1…t_n
I := ∅
for j := 0 to n-m do
    i := 1
    while i <= m and p_i = t_{j+i} do
        i := i+1
    if i = m+1 then    {p_1…p_m = t_{j+1}…t_{j+m}}
        I := I ∪ {j+1}
Output: the set I of positions at which an occurrence of p as a substring of t starts
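The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name `naive_search` is an assumption, and the returned positions are 1-based to match the set I in the pseudocode.

```python
def naive_search(pattern, text):
    """Return the 1-based start positions of pattern in text (hypothetical helper)."""
    m, n = len(pattern), len(text)
    positions = []
    for j in range(n - m + 1):          # j plays the role of the shift 0..n-m
        i = 0
        # Check the bound first, then compare p_(i+1) with t_(j+i+1).
        while i < m and pattern[i] == text[j + i]:
            i += 1
        if i == m:                      # the whole pattern matched at shift j
            positions.append(j + 1)     # report 1-based position, as in the slide
    return positions


print(naive_search("AAT", "TAGACAATCG"))
```

Every shift is tried and up to m characters are compared at each one, so the worst-case running time is O(n·m).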
7/56
Knuth–Morris–Pratt algorithm
KMP-Prefix(P)
begin
    m ← |P|
    T[1] ← 0
    i ← 0
    for j = 2 upto m step 1 do
        while i > 0 and P[i+1] != P[j] do
            i ← T[i]
        if P[i+1] = P[j] then
            i ← i+1
        T[j] ← i
    return T
end
8/56
Knuth–Morris–Pratt algorithm
KMP-Matcher(T, P)
begin
    n ← |T|
    m ← |P|
    Table ← KMP-Prefix(P)
    i ← 0
    for j = 1 upto n step 1 do
        while i > 0 and P[i+1] != T[j] do
            i ← Table[i]
        Wend
        if P[i+1] = T[j] then
            i ← i+1
        end if
        if i = m then
            output(j-m)    {shift of the occurrence, which starts at T[j-m+1]}
            i ← Table[i]
        end if
    end for
end
9/56
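As a rough sketch, the prefix function and the matcher can be written in Python as below. The names `kmp_prefix` and `kmp_search` are assumptions made for this illustration. Python strings are 0-indexed, so `table[j]` here corresponds to T[j+1] in the pseudocode, and the matcher reports 0-based shifts.

```python
def kmp_prefix(pattern):
    """table[j] = length of the longest proper prefix of pattern[:j+1]
    that is also a suffix of it."""
    m = len(pattern)
    table = [0] * m
    i = 0                                   # length of the current border
    for j in range(1, m):
        while i > 0 and pattern[i] != pattern[j]:
            i = table[i - 1]                # fall back to the next shorter border
        if pattern[i] == pattern[j]:
            i += 1
        table[j] = i
    return table


def kmp_search(text, pattern):
    """Return the 0-based shifts at which pattern occurs in text."""
    m = len(pattern)
    if m == 0:
        return []
    table = kmp_prefix(pattern)
    shifts = []
    i = 0                                   # number of pattern characters matched
    for j, ch in enumerate(text):
        while i > 0 and pattern[i] != ch:
            i = table[i - 1]
        if pattern[i] == ch:
            i += 1
        if i == m:                          # full match ending at text[j]
            shifts.append(j - m + 1)
            i = table[i - 1]                # keep going to find overlapping matches
    return shifts


print(kmp_prefix("ababaca"))
print(kmp_search("TAGACAATCG", "AAT"))
```

The text index j never moves backwards, which is what gives the O(n + m) total running time.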
The Boyer-Moore algorithm
10/56
The Bad Character Rule
11/56
Ex.1: the bad character rule
[4]
12/56
Preprocessing for the bad character rule
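A minimal sketch of this preprocessing step, under stated assumptions: the table stores only the rightmost position of each character in the pattern (a common simplification of the bad character rule), and the search uses the bad character rule alone, without the good suffix rule. The function names are hypothetical and not taken from the slides.

```python
def bad_character_table(pattern):
    """Map each character to the index of its rightmost occurrence in pattern."""
    return {ch: idx for idx, ch in enumerate(pattern)}


def search_bad_character(text, pattern):
    """Find all 0-based occurrences using only the bad character rule."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    last = bad_character_table(pattern)
    hits = []
    s = 0                                   # pattern[0] is aligned with text[s]
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1                          # compare from right to left
        if j < 0:
            hits.append(s)
            s += 1                          # no good suffix rule here: shift by one
        else:
            # Align the rightmost occurrence of the mismatched text character
            # with position j; always move at least one step to the right.
            s += max(1, j - last.get(text[s + j], -1))
    return hits


print(search_bad_character("TAGACAATCG", "AAT"))
```

A character that does not occur in the pattern at all lets the pattern jump completely past the mismatch position, which is where the rule gains most of its speed.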
13/56
Good suffix rule
14/56
Bad match rule & good suffix rule
15/56
( https://www.youtube.com/watch?v=4Xyhb72LCX4 )
Rabin-Karp – the idea
16/56
Example
Pattern = AAT
Text = TAACGGCATACAATCG
Character values: A=1, T=2, C=3, G=4
Prime number = 7
Calculate the new hash from the old hash:
1. X = oldHash - val(old char)
2. X = X / prime
3. newHash = X + prime^(m-1) * val(new char)
17/56
Example, Rabin-Karp algorithm
Pattern = AAT
H(AAT)= 1 + 1*7 + 2*49 = 106
▪ Text = TAGACAATCG H(TAG)=2+1*7+4*49 = 205 != 106
▪ Text = TAGACAATCG H(AGA)=(205-2)/7+1*49 = 78 != 106
▪ Text = TAGACAATCG H(GAC)=(78-1)/7+3*49 = 158 != 106
▪ Text = TAGACAATCG H(ACA)=(158-4)/7+1*49 = 71 != 106
▪ Text = TAGACAATCG H(CAA)=(71-1)/7+1*49 = 59 != 106
✓ Text = TAGACAATCG H(AAT)=(59-3)/7+2*49 = 106 == 106
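The example can be reproduced with a short Python sketch. It follows the hashing scheme of the slides (character values A=1, T=2, C=3, G=4, prime 7, no modulus), which works here because Python integers do not overflow; practical implementations usually reduce the hash modulo a large prime. The names `window_hash`, `rolling_hashes` and `rabin_karp` are assumptions made for this illustration.

```python
PRIME = 7
VALUES = {"A": 1, "T": 2, "C": 3, "G": 4}


def window_hash(s):
    """Hash as on the slide: val(s[0]) + val(s[1])*7 + val(s[2])*49 + ..."""
    return sum(VALUES[ch] * PRIME ** k for k, ch in enumerate(s))


def rolling_hashes(text, m):
    """Yield the hash of every length-m window, updating in O(1) per step."""
    if m == 0 or m > len(text):
        return
    h = window_hash(text[:m])
    yield h
    for s in range(1, len(text) - m + 1):
        h = (h - VALUES[text[s - 1]]) // PRIME        # drop the old character
        h += PRIME ** (m - 1) * VALUES[text[s + m - 1]]  # add the new one
        yield h


def rabin_karp(text, pattern):
    """Return the 0-based shifts where pattern occurs in text."""
    target = window_hash(pattern)
    m = len(pattern)
    return [s for s, h in enumerate(rolling_hashes(text, m))
            # Verify on a hash hit, since different windows may share a hash.
            if h == target and text[s:s + m] == pattern]


print(list(rolling_hashes("TAGACAATCG", 3))[:6])
print(rabin_karp("TAGACAATCG", "AAT"))
```

The division in the update step is exact: after the old character's value is subtracted, every remaining term carries a factor of 7.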
18/56
Finite state automaton
19/56
Informal definition of automata
20/56
Formal definition
Time complexity:
Preprocessing: O(m^3 |Σ|)
Matching: Θ(n)
22/56
String matching with FSM
23/56
( https://www.youtube.com/watch?v=nNb9lu5Hvio )
FSM Matching algorithm
FINITE-AUTOMATON-MATCHER(T, δ, m)
n ← length[T]
q ← 0
for i ← 1 to n
    do q ← δ(q, T[i])
        if q = m then
            print "Pattern occurs with shift" i-m
24/56
Transition-function construction algorithm
m ← length[P]
for q ← 0 to m                      (for each state)
    do for each character a ∈ Σ     (|Σ|)
        do k ← min(m+1, q+2)
            repeat k ← k-1          (1 ≤ k ≤ m+1)
            until P_k ⊐ P_q a       (Σ k )
            δ(q, a) ← k
return δ
Here P_k denotes the first k characters of P, and x ⊐ y means that x is a suffix of y.
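The construction and the matcher can be sketched together in Python as follows. This is an illustrative version under stated assumptions: the transition function is stored as a dictionary keyed by (state, character), the function names are hypothetical, and every character of the text must belong to the given alphabet.

```python
def compute_transition_function(pattern, alphabet):
    """delta[(q, a)] = length of the longest prefix of pattern that is
    a suffix of pattern[:q] + a."""
    m = len(pattern)
    delta = {}
    for q in range(m + 1):                  # one state per matched prefix length
        for a in alphabet:
            candidate = pattern[:q] + a
            k = min(m, q + 1)
            # Decrease k until pattern[:k] is a suffix of candidate
            # (k = 0 always qualifies, since the empty string is a suffix).
            while k > 0 and not candidate.endswith(pattern[:k]):
                k -= 1
            delta[(q, a)] = k
    return delta


def finite_automaton_matcher(text, delta, m):
    """Return the shifts at which the pattern of length m occurs in text."""
    shifts = []
    q = 0
    for i, ch in enumerate(text, start=1):  # i is 1-based, as in the pseudocode
        q = delta[(q, ch)]                  # raises KeyError for unknown characters
        if q == m:
            shifts.append(i - m)
    return shifts


delta = compute_transition_function("AAT", "ACGT")
print(finite_automaton_matcher("TAGACAATCG", delta, 3))
```

Once the table is built, matching does a single dictionary lookup per text character, which corresponds to the Θ(n) matching time.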
25/56
Better solution: suffix trees
26/56
[8]
Weiner’s Algorithm I
Definitions
W_i: suffix tree for S_i = S[i..n]$
WHead(i): longest prefix of S_i that is also a prefix of some S_j with j > i
Procedure
Build W_{n+1} = edge (root, n+1) labelled $
For j from n down to 1 do
    Find WHead(j) in W_{j+1}
    w = node labelled WHead(j) (newly created if it does not exist yet)
    Create new leaf j and edge (w, j) labelled S[j..n] - WHead(j),
    i.e. the rest of the suffix after WHead(j)
52/56
[7]
53/56
54/56
[9]
Ukkonen’s suffix tree
( https://www.youtube.com/watch?v=WbLKFzqvacg )
55/56
Suffix array
P.S. 1
Suffix array, example
P.S. 2
Suffix array, example (continue)
P.S. 3
Suffix array – pattern matching
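def search(P, text, A):
    # Return the half-open range (s, r) of indices into the suffix array A
    # whose suffixes of text start with P.
    n = len(A)
    m = len(P)

    def prefixAt(k):
        # First m characters of the suffix starting at text position k.
        return text[k:k + m]

    # Lower bound: first suffix whose m-character prefix is >= P.
    l = 0; r = n
    while l < r:
        mid = (l + r) // 2
        if P > prefixAt(A[mid]):
            l = mid + 1
        else:
            r = mid
    # Upper bound: first suffix whose m-character prefix is > P.
    s = l; r = n
    while l < r:
        mid = (l + r) // 2
        if P < prefixAt(A[mid]):
            r = mid
        else:
            l = mid + 1
    return (s, r)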
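To try the binary search on a concrete input, a suffix array is needed first. The sketch below builds one by plainly sorting the suffix start positions, which is simple but far from the fastest construction, and then looks a pattern up with two binary searches. The names `build_suffix_array` and `occurrences` and the sample text "banana" are assumptions made for this illustration.

```python
def build_suffix_array(text):
    """Naive construction: sort start positions by the suffix they begin."""
    return sorted(range(len(text)), key=lambda k: text[k:])


def occurrences(text, sa, pattern):
    """Return the sorted 0-based positions where pattern occurs in text."""
    m = len(pattern)

    def prefix(idx):
        # First m characters of the suffix stored at rank idx.
        return text[sa[idx]:sa[idx] + m]

    lo, hi = 0, len(sa)
    while lo < hi:                          # first rank with prefix >= pattern
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:                          # first rank with prefix > pattern
        mid = (lo + hi) // 2
        if prefix(mid) > pattern:
            hi = mid
        else:
            lo = mid + 1
    return sorted(sa[start:lo])


sa = build_suffix_array("banana")
print(sa)
print(occurrences("banana", sa, "ana"))
```

All suffixes that start with the pattern sit next to each other in the sorted order, so the two bounds delimit exactly the matching ranks; each binary search makes O(log n) comparisons of at most m characters.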
P.S. 4
References
[1]: Hans-Joachim Böckenhauer, Dirk Bongartz, "Algorithmic Aspects of Bioinformatics", Natural Computing Series, Springer, 2007, ISSN 1619-7127
[2]: https://en.wikipedia.org/wiki/String_searching_algorithm
[3]: https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm
[4]: http://www.cs.jhu.edu/~langmea/resources/lecture_notes/boyer_moore.pdf
[5]: http://u.cs.biu.ac.il/~rosenfa5/Alg2/fingerpainting.ppt
[6]: http://web.cs.mun.ca/~wang/courses/cs6783-13f/n2-string-1.pdf
[7]: http://www.zbh.uni-hamburg.de/pubs/pdf/GieKur1997.pdf
[8]: http://bix.ucsd.edu/bioalgorithms/presentations/Ch09_CombinatorialPatternMatching.pdf
[9]: http://wwwmayr.in.tum.de/konferenzen/Jass03/presentations/pentenrieder.pdf
56/56