String Algorithms for CS Students

Module 06: String Algorithms (Contd.)
Knuth Morris Pratt Algorithm
• The KMP matching algorithm exploits the degenerating property of the
pattern (the same sub-patterns appearing more than once in the pattern)
and improves the worst-case matching complexity to O(n), where n is the
text length, after O(m) preprocessing of a pattern of length m.
• The basic idea behind KMP’s algorithm is: whenever we detect a
mismatch (after some matches), we already know some of the
characters in the text of the next window. We take advantage of this
information to avoid re-matching the characters that we already know
will match.
Matching Overview

txt = "AAAAABAAABA"
pat = "AAAA"

We compare the first window of txt with pat:

txt = "AAAAABAAABA"
pat = "AAAA"            [Initial position]

txt = "AAAAABAAABA"
pat =  "AAAA"           [Pattern shifted one position]

txt = "AAAAABAAABA"
pat =   "AAAA"          [Pattern shifted one position]
• The KMP algorithm preprocesses pat[] and constructs an auxiliary array lps[] of size m
(the same as the size of the pattern), which is used to skip characters while matching.
• The name lps stands for the longest proper prefix which is also a suffix.
• A proper prefix is a prefix that is not the whole string.
• For example, the prefixes of “ABC” are “”, “A”, “AB” and “ABC”.
• The proper prefixes are “”, “A” and “AB”.
• The suffixes of the string are “”, “C”, “BC” and “ABC”.
• We search for lps in sub-patterns.
• For each sub-pattern pat[0..i], where i = 0 to m-1, lps[i] stores the length of the
longest proper prefix of pat[0..i] that is also a suffix of pat[0..i].
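The lps[] definition above can be sketched in Python as follows. This is a minimal illustration, not the slides' own code; the function name compute_lps and the variable names are assumptions made for this example.

```python
def compute_lps(pat: str) -> list[int]:
    """Return lps, where lps[i] is the length of the longest proper prefix
    of pat[0..i] that is also a suffix of pat[0..i]."""
    lps = [0] * len(pat)
    length = 0  # length of the longest prefix-suffix found for pat[0..i-1]
    i = 1       # lps[0] is always 0, so start at the second character
    while i < len(pat):
        if pat[i] == pat[length]:
            # The current prefix-suffix can be extended by one character.
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            # Fall back to the next shorter candidate without advancing i.
            length = lps[length - 1]
        else:
            # No proper prefix of pat[0..i] is also a suffix.
            lps[i] = 0
            i += 1
    return lps


print(compute_lps("AAAA"))  # [0, 1, 2, 3]
```

For the pattern "AAAA" every shorter run of A's is both a prefix and a suffix, which is why the values grow by one at each position.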
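Algorithm: Use a value from lps[] to decide the next characters to be
matched. The idea is to not match a character that we know will match
anyway. On a mismatch after j matched characters (j > 0), the text
position stays where it is and matching resumes from pat[lps[j-1]].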
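A possible Python sketch of the full matching loop is shown below, using the txt and pat values from the Matching Overview slide. The function name kmp_search and the choice to return all match positions are assumptions for this example; the lps table is rebuilt inside the function only so that the block runs on its own.

```python
def kmp_search(txt: str, pat: str) -> list[int]:
    """Return the starting indices of all occurrences of pat in txt."""
    if not pat:
        return []

    # Preprocessing: build lps[] for the pattern.
    lps = [0] * len(pat)
    k = 0
    for i in range(1, len(pat)):
        while k > 0 and pat[i] != pat[k]:
            k = lps[k - 1]
        if pat[i] == pat[k]:
            k += 1
        lps[i] = k

    # Matching: j counts how many pattern characters currently match.
    matches = []
    j = 0
    for i, ch in enumerate(txt):
        while j > 0 and ch != pat[j]:
            j = lps[j - 1]      # reuse known matches instead of restarting
        if ch == pat[j]:
            j += 1
        if j == len(pat):
            matches.append(i - j + 1)
            j = lps[j - 1]      # keep going to find overlapping matches
    return matches


print(kmp_search("AAAAABAAABA", "AAAA"))  # [0, 1]
```

The text index i never moves backwards, which is where the O(n) matching bound comes from.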
Hashing based approach (Rabin Karp) for pattern matching
Rabin-Karp
• The Rabin-Karp string searching algorithm calculates a hash value for the pattern
and for each M-character substring of the text to be compared.
• If the hash values are unequal, the algorithm moves on and calculates the hash
value for the next M-character substring.
• If the hash values are equal, the algorithm does a brute-force comparison
between the pattern and the M-character substring.
• In this way, there is only one hash comparison per text substring, and brute force
is only needed when the hash values match.

Rabin-Karp Example
• Hash value of “AAAAA” is 37
• Hash value of “AAAAH” is 100

Rabin-Karp Algorithm
pattern is M characters long
hash_p = hash value of pattern
hash_t = hash value of first M letters in body of text
do
    if (hash_p == hash_t)
        brute force comparison of pattern and selected section of text
    hash_t = hash value of next section of text, one character over
while (not end of text)

Hash Function
• Let b be the number of letters in the alphabet. The text substring t[i .. i+M-1] is
mapped to the number

$$x(i) = t[i]\,b^{M-1} + t[i+1]\,b^{M-2} + \dots + t[i+M-1]$$

• Furthermore, given x(i) we can compute x(i+1) for the next
substring t[i+1 .. i+M] in constant time, as follows:

$$x(i+1) = x(i)\,b - t[i]\,b^{M} + t[i+M]$$

• In this way, we never explicitly compute a new value. We
simply adjust the existing value as we move over one
character.
Rabin-Karp Math Example
• Let’s say that our alphabet consists of 10 letters, so b = 10.
• Our alphabet = a, b, c, d, e, f, g, h, i, j
• Let’s say that “a” corresponds to 1, “b” corresponds to 2 and so
on.
The hash value for the string “cah” would be:

3*100 + 1*10 + 8*1 = 318
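The arithmetic for “cah” and the constant-time update to the next window can be checked with a short Python sketch. The helper names digit, window_value and roll are assumptions for this example, and the text "cahb" is made up purely to have a next window to slide into.

```python
B = 10  # alphabet size from the example: the letters a..j


def digit(ch: str) -> int:
    """Map 'a' -> 1, 'b' -> 2, ..., 'j' -> 10, as in the example."""
    return ord(ch) - ord("a") + 1


def window_value(s: str) -> int:
    """Compute the value of a window from scratch (Horner's rule)."""
    x = 0
    for ch in s:
        x = x * B + digit(ch)
    return x


def roll(x: int, old: str, new: str, m: int) -> int:
    """Slide an m-character window by one: shift left one digit,
    subtract the old leftmost digit, add the new rightmost digit."""
    return x * B - digit(old) * B**m + digit(new)


x = window_value("cah")
print(x)                     # 318
print(roll(x, "c", "b", 3))  # 182, the value of "ahb"
```

Sliding from “cah” to “ahb” gives 318*10 - 3*1000 + 2 = 182, which equals 1*100 + 8*10 + 2*1 computed from scratch.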

Rabin-Karp Mods
• If M is large, then the resulting value (~b^M) will be enormous. For this reason, we
hash the value by taking it mod a prime number q.
• The mod function is particularly useful in this case due to several of its inherent
properties:
    [(x mod q) + (y mod q)] mod q = (x + y) mod q
    (x mod q) mod q = x mod q
• For these reasons:

$$h(i) = \big( (t[i]\,b^{M-1} \bmod q) + (t[i+1]\,b^{M-2} \bmod q) + \dots + (t[i+M-1] \bmod q) \big) \bmod q$$

$$h(i+1) = \big( h(i)\,b \bmod q \;-\; t[i]\,b^{M} \bmod q \;+\; t[i+M] \bmod q \big) \bmod q$$

In h(i+1), the term h(i) b shifts left one digit, the term t[i] b^M subtracts the
leftmost digit, and the term t[i+M] adds the new rightmost digit.
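Putting the modular rolling hash together with the brute-force check gives the sketch below. It is one possible Python rendering, not a definitive implementation: the function name rabin_karp, the base b = 256 and the prime q = 1000003 are assumptions chosen for this example.

```python
def rabin_karp(txt: str, pat: str, b: int = 256, q: int = 1_000_003) -> list[int]:
    """Return the starting indices of all occurrences of pat in txt.

    b (the base) and q (a prime modulus) are illustrative defaults.
    """
    n, m = len(txt), len(pat)
    if m == 0 or m > n:
        return []

    high = pow(b, m, q)  # b^M mod q, used to remove the leftmost character

    # Hash the pattern and the first window of the text.
    hash_p = hash_t = 0
    for k in range(m):
        hash_p = (hash_p * b + ord(pat[k])) % q
        hash_t = (hash_t * b + ord(txt[k])) % q

    matches = []
    for i in range(n - m + 1):
        # Equal hashes may be a collision, so confirm by direct comparison.
        if hash_p == hash_t and txt[i:i + m] == pat:
            matches.append(i)
        if i + m < n:
            # Shift left, subtract the leftmost digit, add the new rightmost one.
            # Python's % always returns a non-negative result for positive q.
            hash_t = (hash_t * b - ord(txt[i]) * high + ord(txt[i + m])) % q
    return matches


print(rabin_karp("AAAAABAAABA", "AAAA"))  # [0, 1]
```

A tiny modulus such as q = 3 still returns correct results because of the confirming comparison, but it triggers that comparison far more often.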
Rabin-Karp Complexity
• If a sufficiently large prime number is used for the hash function,
the hashed values of two different strings will usually be distinct.
• If this is the case, searching takes O(N) expected time, where N is
the number of characters in the larger body of text.
• It is always possible to construct a scenario with a worst-case
complexity of O(MN). This, however, is likely to happen only if the
prime number used for hashing is small.

Data structures (Tries and compressed tries) for strings
Tries
• A trie is an efficient information retrieval data structure (the name
comes from reTRIEval).
• Tries can reduce search complexity to the optimal limit (key length).
• A well-balanced binary search tree needs time proportional to
M * log N, where M is the maximum string length and N is the number of
keys in the tree.
• Using a trie, we can search for a key in O(M) time, but the space
complexity of tries can be a limitation.
• All the descendants of a node in a trie share the same prefix, hence
tries are also known as prefix trees.
Trie Node
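// Trie node
#include <stdbool.h>

#define ALPHABET_SIZE 26 // assumed here: lowercase letters 'a'..'z'

struct TrieNode
{
    // children[c] points to the next-level node for character c, or NULL
    struct TrieNode *children[ALPHABET_SIZE];
    // isEndOfWord is true if the node
    // represents end of a word
    bool isEndOfWord;
};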

• Every node of Trie consists of multiple branches.

• Each branch represents a possible character of keys.
• Last node of every key is marked as end of word node.
Insertion in Tries
• Every character of the input key is inserted as an individual trie node.
• children is an array of pointers/references to next-level trie nodes.
• The key character acts as an index into the array children.
• If the input key is new or an extension of an existing key, construct the
non-existing nodes of the key and mark the last node as end of word.
• If the input key is a prefix of an existing key in the trie, we simply mark the
last node of the key as end of word.
• The key length determines the trie depth.
Example:

void insert(String s) {
    current_node = root;
    for (every char c in string s) {
        if (child node of current_node belonging to c is null) {
            child_node = new Node();
        }
        current_node = child_node;
    }
    mark current_node.isEndOfWord = true;
}
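The insertion steps can be sketched in Python as follows. This is a minimal illustration assuming keys made only of the lowercase letters a..z; the names TrieNode, insert, is_end_of_word and the helper _index are choices made for this example.

```python
ALPHABET_SIZE = 26  # assumption: keys use only the letters 'a'..'z'


class TrieNode:
    def __init__(self):
        # children[k] is the next-level node for the k-th letter, or None
        self.children = [None] * ALPHABET_SIZE
        self.is_end_of_word = False


def _index(ch: str) -> int:
    """The key character acts as an index into the children array."""
    return ord(ch) - ord("a")


def insert(root: TrieNode, key: str) -> None:
    node = root
    for ch in key:
        k = _index(ch)
        if node.children[k] is None:
            node.children[k] = TrieNode()  # construct the missing node
        node = node.children[k]
    node.is_end_of_word = True             # mark the last node of the key


root = TrieNode()
for word in ("bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"):
    insert(root, word)
```

Inserting a key that is a prefix of an existing one, such as "be" here, creates no new nodes and only sets the end-of-word flag on an existing node.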
Standard trie for {bear, bell, bid, bull, buy, sell, stock, stop}
Compressed trie: obtained from the standard trie by joining chains
of single-child nodes.
Searching in Tries
• Searching for a key is similar to the insert operation, except that we only
compare the characters and move down.
• The search can terminate due to the end of the key string or the lack of a
matching child in the trie.
• In the former case, if the isEndOfWord field of the last node is true, then
the key exists in the trie.
• In the second case, the search terminates without examining all the
characters of the key, since the key is not present in the trie.
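The two ways a search can end are visible in the Python sketch below. It assumes lowercase keys a..z; the names TrieNode, insert and search are illustrative, and the insert helper is included only so that the block runs on its own.

```python
ALPHABET_SIZE = 26  # assumption: keys use only the letters 'a'..'z'


class TrieNode:
    def __init__(self):
        self.children = [None] * ALPHABET_SIZE
        self.is_end_of_word = False


def insert(root: TrieNode, key: str) -> None:
    node = root
    for ch in key:
        k = ord(ch) - ord("a")
        if node.children[k] is None:
            node.children[k] = TrieNode()
        node = node.children[k]
    node.is_end_of_word = True


def search(root: TrieNode, key: str) -> bool:
    node = root
    for ch in key:
        node = node.children[ord(ch) - ord("a")]
        if node is None:
            # Lack of key in the trie: stop before reading the whole key.
            return False
    # End of the key string: present only if this node ends a word.
    return node.is_end_of_word


root = TrieNode()
for word in ("bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"):
    insert(root, word)

print(search(root, "bell"))  # True
print(search(root, "be"))    # False: "be" is only a prefix of stored keys
print(search(root, "bulk"))  # False: no child for 'k' below "bul"
```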
Lecture 36-Suffix tree and Suffix array data structures and related
operation for string handling
Suffix Tree
• A Suffix Tree for a given text is a compressed trie for all suffixes of
the given text.
Example:
• Given words: {bear, bell, bid, bull, buy, sell, stock, stop}
Build a Suffix Tree for a given text
1) Generate all suffixes of the given text.
2) Consider all suffixes as individual words and build a compressed
trie.
• Example text: “banana\0”
• Following are all suffixes of “banana\0”:
• banana\0
• anana\0
• nana\0
• ana\0
• na\0
• a\0
• \0
Search in Suffix Tree
Example:
• Panamabananas$
Possible suffixes:
index  suffix
0      Panamabananas$
1      anamabananas$
2      namabananas$
3      amabananas$
4      mabananas$
5      abananas$
6      bananas$
7      ananas$
8      nanas$
9      anas$
10     nas$
11     as$
12     s$
13     $
Pattern searching in suffix tree
• Step 1: Check whether the given pattern really exists in the string. To do
this, traverse the suffix tree against the pattern.
• Step 2: If you find the pattern in the suffix tree, then traverse the subtree
below that point and find all suffix indices on the leaf nodes. All those
suffix indices are the starting indices of the pattern in the string.
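The two steps can be illustrated with the Python sketch below. To stay short it builds an uncompressed suffix trie rather than a compressed suffix tree, which is a simplification that takes quadratic space and is suitable only for small examples. The names Node, build_suffix_trie and find_occurrences are assumptions for this example, and the text is assumed to end with a unique terminator such as "$".

```python
class Node:
    def __init__(self):
        self.children = {}        # maps a character to the child Node
        self.suffix_index = None  # set only on the leaf that ends a suffix


def build_suffix_trie(text: str) -> Node:
    """Insert every suffix of text (which should end with a unique
    terminator) into an uncompressed trie."""
    root = Node()
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.children.setdefault(ch, Node())
        node.suffix_index = start
    return root


def find_occurrences(root: Node, pattern: str) -> list[int]:
    # Step 1: walk down the trie along the pattern.
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []  # the pattern does not occur in the text
    # Step 2: collect the suffix indices of all leaves below this point.
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        if current.suffix_index is not None:
            found.append(current.suffix_index)
        stack.extend(current.children.values())
    return sorted(found)


trie = build_suffix_trie("Panamabananas$")
print(find_occurrences(trie, "ana"))  # [1, 7, 9]
```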
Suffix Arrays
A suffix array is a sorted array of all suffixes of a given string,
stored as their starting indices.
A suffix array can be constructed from a suffix tree by doing a DFS
traversal of the suffix tree, visiting children in lexicographic order.
A suffix array and a suffix tree can both be constructed from each other
in linear time.
Example
Let the given string be "banana".

<suffixes> <sorted suffixes>

0 banana                              5 a
1 anana      Sort the suffixes        3 ana
2 nana     --------------------->     1 anana
3 ana         alphabetically          0 banana
4 na                                  4 na
5 a                                   2 nana

• So the suffix array for "banana" is {5, 3, 1, 0, 4, 2}
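The "banana" example can be reproduced with a naive Python sketch that simply sorts the suffix start positions. This is for illustration only: sorting whole suffixes is much slower than the linear-time constructions, and the function name build_suffix_array is an assumption for this example.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the starting indices of the suffixes of text in sorted order.

    Naive approach: each comparison may look at a whole suffix, so this is
    meant for small inputs only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


print(build_suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```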

Search a pattern using Suffix Array
• Preprocess the text and build a suffix array of the text.
• Because the suffixes are sorted, binary search can be applied to search for
the given pattern, using O(log n) comparisons of at most m characters each.
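A possible Python sketch of the binary search is shown below. It runs two searches to find the block of suffixes that start with the pattern. The function name find_occurrences is an assumption for this example, and the suffix array is built naively by sorting only to keep the block self-contained.

```python
def find_occurrences(text: str, suffix_array: list[int], pattern: str) -> list[int]:
    """Return all starting indices of pattern in text, using binary search
    over the sorted suffixes."""
    m = len(pattern)

    def prefix(k: int) -> str:
        # First m characters of the k-th smallest suffix.
        start = suffix_array[k]
        return text[start:start + m]

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


text = "banana"
suffix_array = sorted(range(len(text)), key=lambda i: text[i:])  # naive build
print(find_occurrences(text, suffix_array, "ana"))  # [1, 3]
```

All suffixes that begin with the pattern sit next to each other in the sorted order, so the two bounds delimit every occurrence at once.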
Applications of Suffix Array

Following are some famous problems where Suffix array can be used.
1) Pattern Searching
2) Finding the longest repeated substring
3) Finding the longest common substring
4) Finding the longest palindrome in a string
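As an illustration of application 2, the longest repeated substring can be found by comparing neighbouring entries of the suffix array, since any repeated substring is a common prefix of two adjacent sorted suffixes. The sketch below is a simple version with a naive suffix array build; the function name longest_repeated_substring is an assumption for this example.

```python
def longest_repeated_substring(text: str) -> str:
    """Return a longest substring that occurs at least twice in text
    (occurrences may overlap), or "" if there is none."""
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])  # naive build
    best = ""
    # Compare each suffix with its neighbour in sorted order.
    for a, b in zip(suffix_array, suffix_array[1:]):
        k = 0
        while a + k < len(text) and b + k < len(text) and text[a + k] == text[b + k]:
            k += 1  # extend the common prefix of the two suffixes
        if k > len(best):
            best = text[a:a + k]
    return best


print(longest_repeated_substring("banana"))  # ana
```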
