CSU22012 2024 Lecture 7
Algorithms II
Lecture 7: Substrings
Dr Anthony Ventresque
Outline of Substring search algorithms
• Brute force
• KMP (Knuth-Morris-Pratt)
• Boyer-Moore
• Rabin-Karp
• Many many many others
• Suffix arrays
• LCP (longest common prefix) arrays
• Implement a needle-in-a-haystack
• public int Search(String haystack, String needle)
• Implement strstr()
• Find the first instance of a string in another string
• Longest common substring between 2 files
• Longest substring that’s a palindrome
• Longest repeated substring
• Etc etc
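As a baseline for the needle-in-a-haystack problem listed above, here is a minimal brute-force sketch in Java. The class name `BruteForceSearch` and the lower-case method name `search` are assumptions made for illustration (the slide writes the signature as `Search`); the method tries every alignment of the needle and returns the index of the first match, or -1 if there is none.

```java
// Brute-force substring search: try every alignment of needle in haystack.
final class BruteForceSearch {

    // Returns the index of the first occurrence of needle in haystack,
    // or -1 if needle does not occur. An empty needle matches at index 0.
    static int search(String haystack, String needle) {
        int n = haystack.length();
        int m = needle.length();
        for (int i = 0; i <= n - m; i++) {
            int j = 0;
            // Compare the needle against the haystack starting at position i.
            while (j < m && haystack.charAt(i + j) == needle.charAt(j)) {
                j++;
            }
            if (j == m) {
                return i; // all m characters matched
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(search("BCBAABACAABABACAA", "ABABAC")); // prints 9
        System.out.println(search("BCBAABACAABABACAA", "CCC"));    // prints -1
    }
}
```

In the worst case this compares about n * m characters, which is the cost the DFA-based approach later in the lecture avoids.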
• Deterministic Finite Automata are always complete: they define a transition for
each state and each input symbol.
• DFA simulation
• assume DFA given
• DFA construction
• manual
• DFA construction
• Algorithm/code
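Text: B C B A A B A C A A B A B A C A A

            j   0 1 2 3 4 5
pat.charAt(j)   A B A B A C
            A   1 1 3 1 5 1
 dfa[][j]   B   0 2 0 4 0 4
            C   0 0 0 0 0 6

Match transitions: 0 -A-> 1 -B-> 2 -A-> 3 -B-> 4 -A-> 5 -C-> 6
Other transitions: 0 -B,C-> 0; 1 -A-> 1; 1 -C-> 0; 2 -B,C-> 0; 3 -A-> 1; 3 -C-> 0; 4 -B,C-> 0; 5 -A-> 1; 5 -B-> 4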
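The simulation can be sketched in Java as follows, assuming the DFA is given (here hard-coded from the table above for the pattern ABABAC). The class name `DfaSimulation` and the mapping of characters to rows via `c - 'A'` are assumptions for illustration, and the sketch only handles texts over the alphabet {A, B, C}.

```java
// Simulates a given DFA over a text; assumes the text contains only A, B, C.
final class DfaSimulation {

    // DFA[c][j] is the next state when character c is read in state j
    // (row 0 = A, row 1 = B, row 2 = C), for the pattern ABABAC.
    static final int[][] DFA = {
        {1, 1, 3, 1, 5, 1}, // A
        {0, 2, 0, 4, 0, 4}, // B
        {0, 0, 0, 0, 0, 6}, // C
    };

    static final int ACCEPT = 6; // accept state = pattern length

    // Returns the index where the first match starts, or -1 if none.
    static int search(String txt) {
        int j = 0; // current state = number of pattern characters matched
        for (int i = 0; i < txt.length(); i++) {
            j = DFA[txt.charAt(i) - 'A'][j];
            if (j == ACCEPT) {
                return i - ACCEPT + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(search("BCBAABACAABABACAA")); // prints 9
    }
}
```

Each text character is read exactly once and the text pointer never backs up, so the simulation takes time proportional to the length of the text.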
Trinity College Dublin, The University of Dublin
DFA States – number of characters matched
j 0 1 2 3 4
char? I V A N A
A 0 0 3 0 5
I 1 1 1 1 1
N 0 0 0 4 0
V 0 2 0 0 0
j 0 1 2 3 4
char? I V A N A
A 0 3 0 5
I 1 1 1 1 1
N 0 0 0 4 0
V 0 2 0 0 0
j 0 1 2 3 4
char? I V A N A
A 0 0 3 0 0
I 1 1 1 1 1
N 0 0 0 4 0
V 0 2 0 0 5
Text ANVAIVAAIVANAAN
Search string IVANA
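Because each state is the number of pattern characters matched so far, a table for IVANA can be checked against the definition directly. The sketch below is not the KMP construction: it is a slower brute-force check, and the class and method names (`StatesByDefinition`, `nextState`, `row`, `search`) are assumptions for illustration. The next state after reading c in state j is the length of the longest prefix of the pattern that is a suffix of the first j pattern characters followed by c.

```java
// Computes DFA entries straight from the "number of characters matched"
// definition, for checking a hand-built table.
final class StatesByDefinition {

    // Next state when c is read in state j (0 <= j < pat.length()).
    static int nextState(String pat, int j, char c) {
        String seen = pat.substring(0, j) + c; // what the DFA has just read
        for (int k = seen.length(); k > 0; k--) {
            // Longest prefix of pat that is also a suffix of seen.
            if (seen.endsWith(pat.substring(0, k))) {
                return k;
            }
        }
        return 0;
    }

    // One row of the table: next states for character c in states 0..m-1.
    static int[] row(String pat, char c) {
        int[] r = new int[pat.length()];
        for (int j = 0; j < pat.length(); j++) {
            r[j] = nextState(pat, j, c);
        }
        return r;
    }

    // Runs the DFA over txt; returns the start of the first match or -1.
    static int search(String txt, String pat) {
        int j = 0;
        for (int i = 0; i < txt.length(); i++) {
            j = nextState(pat, j, txt.charAt(i));
            if (j == pat.length()) {
                return i - pat.length() + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        for (char c : new char[] {'A', 'I', 'N', 'V'}) {
            System.out.println(c + " " + java.util.Arrays.toString(row("IVANA", c)));
        }
        System.out.println(search("ANVAIVAAIVANAAN", "IVANA")); // prints 8
    }
}
```

The printed rows can be compared entry by entry with the tables above.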
KMP Java Implementation
Include one state for each character in pattern (plus accept state).

States: 0 1 2 3 4 5 6

Match transitions (entries not yet filled are shown as "-"):

            j   0 1 2 3 4 5
pat.charAt(j)   A B A B A C
            A   1 - 3 - 5 -
 dfa[][j]   B   - 2 - 4 - -
            C   - - - - - 6

0 -A-> 1 -B-> 2 -A-> 3 -B-> 4 -A-> 5 -C-> 6

Mismatch transitions, filled in one column at a time:

j = 0 (adds 0 -B,C-> 0)
            A   1 - 3 - 5 -
 dfa[][j]   B   0 2 - 4 - -
            C   0 - - - - 6

j = 1 (adds 1 -A-> 1 and 1 -C-> 0)
            A   1 1 3 - 5 -
 dfa[][j]   B   0 2 - 4 - -
            C   0 0 - - - 6

j = 2 (adds 2 -B,C-> 0)
            A   1 1 3 - 5 -
 dfa[][j]   B   0 2 0 4 - -
            C   0 0 0 - - 6

j = 3, restart state X = 1 (adds 3 -A-> 1 and 3 -C-> 0)
            A   1 1 3 1 5 -
 dfa[][j]   B   0 2 0 4 - -
            C   0 0 0 0 - 6

j = 4, restart state X = 2 (adds 4 -B,C-> 0)
            A   1 1 3 1 5 -
 dfa[][j]   B   0 2 0 4 0 -
            C   0 0 0 0 0 6

j = 5, restart state X = 3 (adds 5 -A-> 1 and 5 -B-> 4); the DFA is now complete
            j   0 1 2 3 4 5
pat.charAt(j)   A B A B A C
            A   1 1 3 1 5 1
 dfa[][j]   B   0 2 0 4 0 4
            C   0 0 0 0 0 6
• Construct the DFA table and graphical representation for the search word “banana”.
• Make up a 15-letter string in which to search for the word, assuming the alphabet contains only letters.
• You can decide whether the string contains the search word or not, but if it does, do not place it too early in the string.
• Write out the trace of DFA states while searching for the word in the made-up string.
Match transition. In state j, on the character pat.charAt(j), go to state j+1.
Mismatch transition. In state j, on any other character c, go where the restart state X goes on c: dfa[c][j] = dfa[c][X]. Then update X = dfa[pat.charAt(j)][X] before moving to the next column.
DFA Construction – Java code
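A minimal sketch of the construction in Java, using a restart state x as in the walk-through for ABABAC. The class name `KmpDfa`, the method name `build`, and the parameters `r` (alphabet size) and `first` (first character of the alphabet) are assumptions for illustration; the sketch also assumes a non-empty pattern whose characters all lie in the range first .. first + r - 1.

```java
// Builds the KMP DFA table for a pattern over a small contiguous alphabet.
final class KmpDfa {

    // Returns dfa[c][j]: the next state when character (first + c) is read
    // in state j. Assumes pat is non-empty and uses only r consecutive
    // characters starting at `first` (e.g. r = 3, first = 'A' for {A, B, C}).
    static int[][] build(String pat, int r, char first) {
        int m = pat.length();
        int[][] dfa = new int[r][m];
        dfa[pat.charAt(0) - first][0] = 1; // match transition out of state 0
        for (int x = 0, j = 1; j < m; j++) {
            for (int c = 0; c < r; c++) {
                dfa[c][j] = dfa[c][x]; // mismatch: copy from restart state x
            }
            dfa[pat.charAt(j) - first][j] = j + 1; // match: advance one state
            x = dfa[pat.charAt(j) - first][x];     // update restart state
        }
        return dfa;
    }

    public static void main(String[] args) {
        int[][] dfa = build("ABABAC", 3, 'A');
        for (int[] row : dfa) {
            System.out.println(java.util.Arrays.toString(row));
        }
        // prints:
        // [1, 1, 3, 1, 5, 1]
        // [0, 2, 0, 4, 0, 4]
        // [0, 0, 0, 0, 0, 6]
    }
}
```

The construction fills each of the m columns once for each of the r alphabet characters, so it takes time and space proportional to r * m.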