
Pattern Searching Using Suffix Tree
Trie
A trie is a tree-based data structure used to store and retrieve a dynamic set of strings.
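As a rough illustration of the definition above, here is a minimal trie sketch in C++. The names `TrieNode`, `Trie`, `insert`, `contains` and `hasPrefix` are hypothetical and chosen only for this example; children are kept in a `std::map` for brevity rather than speed.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical minimal trie: one node per character, children keyed by char.
struct TrieNode {
    std::map<char, std::unique_ptr<TrieNode>> children;
    bool isWord = false;  // true if a stored string ends at this node
};

class Trie {
public:
    // Walk down from the root, creating missing nodes along the way.
    void insert(const std::string& word) {
        TrieNode* node = &root_;
        for (char c : word) {
            std::unique_ptr<TrieNode>& child = node->children[c];
            if (!child) child = std::make_unique<TrieNode>();
            node = child.get();
        }
        node->isWord = true;
    }

    // True only if `word` was inserted as a whole string.
    bool contains(const std::string& word) const {
        const TrieNode* node = find(word);
        return node != nullptr && node->isWord;
    }

    // True if at least one stored string begins with `prefix`.
    bool hasPrefix(const std::string& prefix) const {
        return find(prefix) != nullptr;
    }

private:
    // Follow `s` from the root; return nullptr if the path does not exist.
    const TrieNode* find(const std::string& s) const {
        const TrieNode* node = &root_;
        for (char c : s) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return nullptr;
            node = it->second.get();
        }
        return node;
    }

    TrieNode root_;
};
```

After inserting "world" and "word", `contains("word")` is true, `contains("wor")` is false, and `hasPrefix("wor")` is true, because both stored strings share that prefix.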
Compressed Trie
A compressed trie is a variation of the trie data structure used for storing and searching dynamic sets of strings. Memory usage is minimised by sharing common prefixes.
In a compressed trie, a node with only one child is merged with that child, so a chain of single-child nodes collapses into a single edge labelled with a substring.
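The merging step can be sketched as follows. This is only an illustration under stated assumptions: `CNode`, `insertWord`, `collectEdges` and `compressedEdges` are hypothetical names, and instead of building a separate compressed structure the code builds a plain trie and then lists the edge labels that the compressed trie would have, in pre-order.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical plain trie node used as the starting point for compression.
struct CNode {
    std::map<char, std::unique_ptr<CNode>> children;
    bool isWord = false;  // true if a stored word ends here
};

void insertWord(CNode& root, const std::string& word) {
    CNode* node = &root;
    for (char c : word) {
        std::unique_ptr<CNode>& child = node->children[c];
        if (!child) child = std::make_unique<CNode>();
        node = child.get();
    }
    node->isWord = true;
}

// Emit one label per compressed edge: every chain of single-child nodes
// that does not end a word is folded into the edge leading into it.
void collectEdges(const CNode& node, std::vector<std::string>& out) {
    for (const auto& entry : node.children) {
        std::string label(1, entry.first);
        const CNode* cur = entry.second.get();
        while (cur->children.size() == 1 && !cur->isWord) {
            const auto& next = *cur->children.begin();
            label += next.first;
            cur = next.second.get();
        }
        out.push_back(label);
        collectEdges(*cur, out);  // continue below the merged chain
    }
}

// Convenience wrapper: build the trie, then list its compressed edges.
std::vector<std::string> compressedEdges(const std::vector<std::string>& words) {
    CNode root;
    for (const std::string& w : words) insertWord(root, w);
    std::vector<std::string> edges;
    collectEdges(root, edges);
    return edges;
}
```

For the words "test", "team" and "toast", the plain trie has 10 nodes below the root, while `compressedEdges` returns the 5 edge labels "t", "e", "am", "st" and "oast".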
Suffix Tree
A suffix tree is a data structure used in string processing to store and search all suffixes of a given string. It is a compressed trie of those suffixes: each edge is labelled with a substring, and the path from the root to a node spells out a substring of the original string. The root node represents the empty string, and each leaf node corresponds to one suffix of the string.
Creating a Suffix Tree for a Given String
Generate all the suffixes for the given string.
For example, take the word "world" with a terminating character \0 appended.
Suffixes of "world\0" are:
world\0
orld\0
rld\0
ld\0
d\0
\0
Treating each suffix as a separate word, create a compressed trie.
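The suffix-generation step can be sketched in a few lines of C++. The function name `allSuffixes` is hypothetical, and the sketch assumes the terminator is left implicit, so the final suffix "\0" shows up as an empty string.

```cpp
#include <string>
#include <vector>

// Returns every suffix of `text`, longest first. The last entry is the
// empty suffix, which plays the role of "\0" in the list above.
std::vector<std::string> allSuffixes(const std::string& text) {
    std::vector<std::string> suffixes;
    for (std::size_t i = 0; i <= text.size(); ++i) {
        suffixes.push_back(text.substr(i));  // substr(size()) yields ""
    }
    return suffixes;
}
```

Calling `allSuffixes("world")` gives six strings: "world", "orld", "rld", "ld", "d" and the empty string.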
Problem Statement
Given an input string 'str' and a pattern string 'ptr', use a suffix tree to determine whether 'ptr' is present in 'str' and, if it is, report the (0-based) indices at which it occurs.
Sample Example 1
Input: str = "aabcdaaabcdbabc", ptr = "abc"
Output: 1, 7, 12
Explanation
The pattern "abc" is present at indices 1, 7, and 12 in the input string.
Sample Example 2
Input: str = "minimization", ptr = "ma"
Output: False
Explanation
The pattern "ma" is not found in the input string.
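Before building any tree, the two samples can be cross-checked with a brute-force scan. The sketch below is not the suffix tree approach; it is a hypothetical helper, `findAllNaive`, built on `std::string::find`, that is handy as a reference when testing a suffix tree implementation.

```cpp
#include <string>
#include <vector>

// Brute-force reference: returns every 0-based index at which `ptr`
// occurs in `str`. An empty pattern is treated as "no matches" here.
std::vector<std::size_t> findAllNaive(const std::string& str,
                                      const std::string& ptr) {
    std::vector<std::size_t> positions;
    if (ptr.empty()) return positions;
    std::size_t pos = str.find(ptr);
    while (pos != std::string::npos) {
        positions.push_back(pos);
        pos = str.find(ptr, pos + 1);  // advance by one to keep overlapping matches
    }
    return positions;
}
```

For Sample Example 1 this returns 1, 7 and 12; for Sample Example 2 it returns an empty list, which corresponds to the "False" output.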
Solution Approach
To search for ptr in the suffix tree, we match the first character of the pattern against the children of the root node. If there is a match, we continue recursively from that child with the rest of the pattern. If at any point no child matches the next character of the pattern, the pattern is not present in the string. If the whole pattern is consumed, the indices stored at the node we have reached identify the occurrences of the pattern.
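The walk described above can also be written as a loop. The sketch below is an assumption-laden variant, not the article's implementation: `SNode` and `SuffixTrie` are hypothetical names, children live in a `std::map`, the trie is uncompressed, and each node stores the start index of every suffix passing through it, so no offset correction is needed when reporting matches.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical uncompressed suffix trie node. `starts` holds the start
// index of every suffix whose path passes through this node.
struct SNode {
    std::map<char, std::unique_ptr<SNode>> children;
    std::vector<std::size_t> starts;
};

class SuffixTrie {
public:
    explicit SuffixTrie(const std::string& text) {
        // Insert the suffix starting at each index i, one character at a time.
        for (std::size_t i = 0; i < text.size(); ++i) {
            SNode* node = &root_;
            for (std::size_t j = i; j < text.size(); ++j) {
                std::unique_ptr<SNode>& child = node->children[text[j]];
                if (!child) child = std::make_unique<SNode>();
                node = child.get();
                node->starts.push_back(i);
            }
        }
    }

    // Match the pattern character by character from the root. A missing
    // child means the pattern does not occur; otherwise the node reached
    // lists every start position. An empty pattern returns an empty list.
    std::vector<std::size_t> search(const std::string& pattern) const {
        const SNode* node = &root_;
        for (char c : pattern) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return {};
            node = it->second.get();
        }
        return node->starts;
    }

private:
    SNode root_;
};
```

With `SuffixTrie tree("aabcdaaabcdbabc")`, `tree.search("abc")` yields 1, 7 and 12, and `tree.search("abcx")` yields an empty list. Iterating over the pattern avoids both recursion and the repeated `substr` copies.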
Pseudocode
class Node
    children[256]: Node array
    ind: List of integers

    constructor()
        ind <- create new empty list of integers
        for i from 0 to 255
            children[i] <- NULL

    function insertSuffix(suffix: string, index: integer)
        ind.push_back(index)
        if suffix.length() > 0
            cIndex <- suffix.at(0)
            if children[cIndex] is NULL
                children[cIndex] <- create new Node
            children[cIndex].insertSuffix(suffix.substr(1), index + 1)

    function search(pat: string): List of integers
        if pat.length() is 0
            return ind
        if children[pat.at(0)] is not NULL
            return children[pat.at(0)].search(pat.substr(1))
        else
            return NULL

class SuffixTree
    root: Node

    constructor(txt: string)
        root <- create new Node
        for i from 0 to txt.length() - 1
            root.insertSuffix(txt.substr(i), i)

    function search(ptr: string)
        ans <- root.search(ptr)
        if ans is NULL
            print "Pattern not found"
        else
            for each i in ans
                print "Pattern found at position " + (i - ptr.length())
Example: C++ Implementation
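The following code searches for a pattern in a string using a suffix tree. To keep the code short, each edge holds a single character, so the tree is built as an uncompressed trie of the suffixes.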
#include <iostream>
#include <list>
#include <string>
using namespace std;

// Node of the suffix tree (one character per edge)
class Node {
private:
    Node *children[256];
    list<int> ind;  // for each suffix through this node: start index + depth

public:
    Node() {
        for (int i = 0; i < 256; i++)
            children[i] = NULL;
    }

    // Release the subtree owned by this node
    ~Node() {
        for (int i = 0; i < 256; i++)
            delete children[i];
    }

    // Insert a suffix into the tree, recording the index at every node on its path
    void insertSuffix(const string &suffix, int index) {
        ind.push_back(index);
        if (suffix.length() > 0) {
            // unsigned char keeps the array index in the range 0..255
            unsigned char cIndex = suffix.at(0);
            if (children[cIndex] == NULL)
                children[cIndex] = new Node();
            children[cIndex]->insertSuffix(suffix.substr(1), index + 1);
        }
    }

    // Search for a pattern in the subtree rooted at this node
    const list<int> *search(const string &pat) const {
        if (pat.length() == 0)
            return &ind;
        unsigned char cIndex = pat.at(0);
        if (children[cIndex] != NULL)
            return children[cIndex]->search(pat.substr(1));
        return NULL;
    }
};

// Definition of the suffix tree
class SuffixTree {
private:
    Node root;

public:
    // Insert every suffix of txt together with its starting index
    SuffixTree(const string &txt) {
        for (int i = 0; i < (int)txt.length(); i++)
            root.insertSuffix(txt.substr(i), i);
    }

    // Search for a pattern in the tree and print every match
    void search(const string &ptr) const {
        const list<int> *ans = root.search(ptr);
        if (ans == NULL)
            cout << "Pattern not found" << endl;
        else {
            int ptrLength = ptr.length();
            // Stored values are end positions, so subtract the pattern length
            for (list<int>::const_iterator i = ans->begin(); i != ans->end(); i++)
                cout << "Pattern found at position " << *i - ptrLength << endl;
        }
    }
};

int main() {
    string str = "aabcdaaabcdbabc";
    string ptr = "abcx";
    SuffixTree Tree(str);
    cout << "Searching for " << ptr << endl;
    Tree.search(ptr);
    return 0;
}
Output
Searching for abcx
Pattern not found
Time Complexity
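Suffix Tree Construction: O(N^2) node insertions in the worst case, where N is the length of the input string. The repeated substr copies in the implementation above add further overhead on top of this.
Pattern Searching: O(M) node visits, where M is the length of the pattern, plus the time needed to print the matches.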
Space Complexity
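Suffix Tree Construction: O(N^2) in the worst case, where N is the length of the input string.
Pattern Searching: O(1) extra space if the search is written iteratively. The recursive version above uses O(M) stack space, where M is the length of the pattern.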
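The quadratic bound for construction can be checked on small inputs. In an uncompressed trie of all suffixes, every node below the root corresponds to exactly one distinct non-empty substring, so counting those substrings gives the node count. The function name `suffixTrieNodeCount` is hypothetical, and the brute-force counting is meant only for short strings.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Number of non-root nodes in an uncompressed trie of all suffixes of
// `text`, computed as the number of distinct non-empty substrings.
std::size_t suffixTrieNodeCount(const std::string& text) {
    std::set<std::string> distinct;
    for (std::size_t i = 0; i < text.size(); ++i) {
        for (std::size_t len = 1; i + len <= text.size(); ++len) {
            distinct.insert(text.substr(i, len));
        }
    }
    return distinct.size();
}
```

A string of N distinct characters hits the worst case of N(N+1)/2 nodes: "abcd" gives 10 and "world" gives 15. A highly repetitive string such as "aaaa" needs only 4.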
Conclusion
In conclusion, the suffix tree is a powerful data structure for efficiently storing and manipulating strings. It supports various string operations, including substring searches, pattern matching, and prefix/suffix queries. Once the tree has been built, which takes O(N^2) time and space in the worst case for the construction shown here, a pattern can be located in O(M) steps, where M is the length of the pattern string, with little extra space beyond the tree itself.