In computer science, a suffix automaton is an efficient data structure for representing the substring index of a given string. It allows the storage, processing, and retrieval of compressed information about all of the string's substrings. In this article, we explore the suffix automaton: its components, construction process, implementation, and real-world applications.
Suffix Tree and Suffix Links:
To appreciate the suffix automaton, it helps to first understand the suffix tree and its related concept, suffix links.
Suffix Tree:
A suffix tree is a tree-like data structure that represents all the substrings of a given string S. Each leaf node in the tree represents a unique suffix of the string (assuming a unique terminator character such as $ is appended to S), and the path from the root to a leaf spells out that suffix. Because every substring of S is a prefix of some suffix, every path starting at the root spells out a substring of S. Suffix trees are used for various string processing tasks, such as pattern matching, substring searching, and substring counting.
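The key fact behind a suffix tree is that every substring of S is a prefix of some suffix of S. The sketch below illustrates this with a plain list of suffixes rather than a real suffix tree. The function names `suffixes` and `is_substring` are illustrative assumptions, and the approach takes quadratic time, so it is meant only to show the idea.

```python
def suffixes(s):
    """Return all non-empty suffixes of s, longest first."""
    return [s[i:] for i in range(len(s))]


def is_substring(s, pattern):
    """A pattern occurs in s exactly when it is a prefix of some suffix of s."""
    return any(suffix.startswith(pattern) for suffix in suffixes(s))


print(suffixes("abab"))            # ['abab', 'bab', 'ab', 'b']
print(is_substring("abab", "ba"))  # True
print(is_substring("abab", "bb"))  # False
```

A suffix tree stores these suffixes in a shared, compressed form so that the same check can be answered by walking down from the root instead of scanning every suffix.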
Suffix Links:
Suffix links are a key concept when constructing a suffix automaton. In a suffix tree, they are pointers that link internal nodes to other internal nodes. Specifically, a suffix link connects a node corresponding to a non-empty substring S[i, j] to the node representing the substring S[i+1, j], which is the same string with its first character removed. Suffix links play a crucial role in constructing the suffix automaton efficiently.
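To make the definition concrete, here is a minimal sketch that computes suffix links on an uncompressed suffix trie (one node per distinct substring) rather than on a compressed suffix tree. This is a simplification chosen for readability: it uses quadratic time and space, and the helper names `build_suffix_trie` and `compute_suffix_links` are assumptions made for this example.

```python
from collections import deque


def build_suffix_trie(s):
    """Insert every suffix of s into a trie; return (children, label)."""
    children = [{}]   # children[v] maps a character to a child node id; node 0 is the root
    label = [""]      # label[v] is the string spelled on the path from the root to v
    for start in range(len(s)):
        node = 0
        for ch in s[start:]:
            if ch not in children[node]:
                children[node][ch] = len(children)
                children.append({})
                label.append(label[node] + ch)
            node = children[node][ch]
    return children, label


def compute_suffix_links(children):
    """link[v] is the node whose label is label[v] without its first character."""
    link = [None] * len(children)
    queue = deque()
    for child in children[0].values():
        link[child] = 0          # one-character labels link to the root
        queue.append(child)
    while queue:
        node = queue.popleft()
        for ch, child in children[node].items():
            # label[child] = label[node] + ch, so dropping the first character
            # gives label[link[node]] + ch, which is also a substring of s.
            link[child] = children[link[node]][ch]
            queue.append(child)
    return link


children, label = build_suffix_trie("abab")
link = compute_suffix_links(children)
for v in range(1, len(children)):
    print(f"{label[v]!r} -> {label[link[v]]!r}")
```

Each printed pair shows a substring and the shorter substring its suffix link points to, for example 'aba' -> 'ba'.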
Constructing the Suffix Automaton:
The suffix automaton is a deterministic finite automaton that efficiently represents all substrings of a given string. It can be derived from a suffix tree with the help of suffix links, although in practice it is usually built directly from the string with an online algorithm, as the implementation in this article does. The key steps of the suffix-tree route are as follows:
- Suffix Tree Construction: Start by constructing a suffix tree for the given string S. This can be done in linear time using algorithms such as Ukkonen's algorithm or McCreight's algorithm.
- Suffix Links: Determine the suffix links in the suffix tree. Ukkonen's and McCreight's algorithms compute them during construction; they can also be computed afterwards with a traversal of the tree. For a node whose path label is a character x followed by a string w, the suffix link points to the node whose path label is w.
- Compact Suffix Automaton: The compact suffix automaton (also called the compact directed acyclic word graph) can be obtained from the suffix tree by merging identical subtrees. The uncompacted suffix automaton is the minimal deterministic finite automaton that accepts all the suffixes of the original string S, and every path from its initial state spells out a substring of S.
Suffix Automaton Implementation:
Implementing a suffix automaton requires some care with data structures and algorithms. The following are some steps to consider:
- Data Structure: Choose an appropriate data structure to represent the automaton efficiently. Typically, a graph-based representation is used, with states stored in an array and referenced by index.
- Transition Functions: Define the transition function of the automaton. Given a state and a character, it should return the next state, or report that no such transition exists.
- Suffix Links: Store a suffix link in every state. The links are needed during construction and by applications such as substring matching.
- Construction: Build the automaton one character at a time (or derive it from a previously constructed suffix tree and its suffix links). Ensure that it represents all substrings of the input string.
Here's a simplified example to get you started. The code builds the automaton directly from the input string, one character at a time, and maintains the suffix links as it goes, so no suffix tree is required.
C++
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

struct SuffixAutomatonNode {
    unordered_map<char, int> next; // Transitions to next states, keyed by character
    int length; // Length of the longest substring represented by this state
    int link; // Suffix link to another state (-1 for the root)
};

vector<SuffixAutomatonNode> suffixAutomaton;
int last; // Index of the state representing the whole string read so far

// Initialize the suffix automaton with the root state
void initialize() {
    suffixAutomaton.clear();
    SuffixAutomatonNode initialNode;
    initialNode.length = 0;
    initialNode.link = -1;
    suffixAutomaton.push_back(initialNode);
    last = 0;
}

// Extend the automaton with a new character
void extendAutomaton(char c) {
    // Append the new state first so its index is fixed before any cloning
    int cur = static_cast<int>(suffixAutomaton.size());
    SuffixAutomatonNode newNode;
    newNode.length = suffixAutomaton[last].length + 1;
    newNode.link = -1;
    suffixAutomaton.push_back(newNode);

    int current = last;
    while (current != -1 && suffixAutomaton[current].next.find(c) == suffixAutomaton[current].next.end()) {
        suffixAutomaton[current].next[c] = cur; // Add the missing transition to the new state
        current = suffixAutomaton[current].link;
    }
    if (current == -1) {
        suffixAutomaton[cur].link = 0; // The root state
    } else {
        int next = suffixAutomaton[current].next[c];
        if (suffixAutomaton[current].length + 1 == suffixAutomaton[next].length) {
            suffixAutomaton[cur].link = next;
        } else {
            // Clone the state: same transitions and suffix link, shorter length
            int clone = static_cast<int>(suffixAutomaton.size());
            SuffixAutomatonNode cloneNode = suffixAutomaton[next];
            cloneNode.length = suffixAutomaton[current].length + 1;
            suffixAutomaton.push_back(cloneNode);
            // Redirect transitions that pointed to the original state
            while (current != -1 && suffixAutomaton[current].next[c] == next) {
                suffixAutomaton[current].next[c] = clone;
                current = suffixAutomaton[current].link;
            }
            suffixAutomaton[next].link = clone;
            suffixAutomaton[cur].link = clone;
        }
    }
    last = cur;
}

// Print every state with its length, suffix link, and transitions
void traverseAutomaton() {
    cout << "Traversing Suffix Automaton:\n";
    for (size_t i = 0; i < suffixAutomaton.size(); ++i) {
        cout << "State " << i << ", Length: " << suffixAutomaton[i].length << ", Suffix Link: " << suffixAutomaton[i].link << "\n";
        for (const auto& transition : suffixAutomaton[i].next) {
            cout << " Transition on '" << transition.first << "' to State " << transition.second << "\n";
        }
    }
}

int main() {
    string input = "abab";
    initialize();
    for (char c : input) {
        extendAutomaton(c);
    }
    // Traverse the constructed suffix automaton
    traverseAutomaton();
    return 0;
}
Java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SuffixAutomatonNode {
    Map<Character, Integer> next; // Transitions to next states, keyed by character
    int length; // Length of the longest substring represented by this state
    int link; // Suffix link to another state (-1 for the root)

    SuffixAutomatonNode() {
        next = new HashMap<>();
        length = 0;
        link = -1;
    }
}

public class SuffixAutomaton {
    static List<SuffixAutomatonNode> suffixAutomaton;
    static int last; // Index of the state representing the whole string read so far

    // Initialize the suffix automaton with the root state
    static void initialize() {
        suffixAutomaton = new ArrayList<>();
        suffixAutomaton.add(new SuffixAutomatonNode());
        last = 0;
    }

    // Extend the automaton with a new character
    static void extendAutomaton(char c) {
        // Append the new state first so its index is fixed before any cloning
        int cur = suffixAutomaton.size();
        SuffixAutomatonNode newNode = new SuffixAutomatonNode();
        newNode.length = suffixAutomaton.get(last).length + 1;
        suffixAutomaton.add(newNode);

        int current = last;
        while (current != -1 && !suffixAutomaton.get(current).next.containsKey(c)) {
            suffixAutomaton.get(current).next.put(c, cur); // Add the missing transition
            current = suffixAutomaton.get(current).link;
        }
        if (current == -1) {
            newNode.link = 0; // The root state
        } else {
            int next = suffixAutomaton.get(current).next.get(c);
            if (suffixAutomaton.get(current).length + 1 == suffixAutomaton.get(next).length) {
                newNode.link = next;
            } else {
                // Clone the state: copy transitions and suffix link, use a shorter length
                int clone = suffixAutomaton.size();
                SuffixAutomatonNode cloneNode = new SuffixAutomatonNode();
                cloneNode.next = new HashMap<>(suffixAutomaton.get(next).next);
                cloneNode.link = suffixAutomaton.get(next).link;
                cloneNode.length = suffixAutomaton.get(current).length + 1;
                suffixAutomaton.add(cloneNode);
                // Redirect transitions that pointed to the original state
                while (current != -1 && suffixAutomaton.get(current).next.get(c) == next) {
                    suffixAutomaton.get(current).next.put(c, clone);
                    current = suffixAutomaton.get(current).link;
                }
                suffixAutomaton.get(next).link = clone;
                newNode.link = clone;
            }
        }
        last = cur;
    }

    // Print every state with its length, suffix link, and transitions
    static void traverseAutomaton() {
        System.out.println("Traversing Suffix Automaton:");
        for (int i = 0; i < suffixAutomaton.size(); ++i) {
            System.out.println("State " + i + ", Length: " +
                               suffixAutomaton.get(i).length +
                               ", Suffix Link: " +
                               suffixAutomaton.get(i).link);
            for (Map.Entry<Character, Integer> transition :
                     suffixAutomaton.get(i).next.entrySet()) {
                System.out.println(" Transition on '" +
                                   transition.getKey() + "' to State " +
                                   transition.getValue());
            }
        }
    }

    public static void main(String[] args) {
        String input = "abab";
        initialize();
        for (char c : input.toCharArray()) {
            extendAutomaton(c);
        }
        // Traverse the constructed suffix automaton
        traverseAutomaton();
    }
}
Python3
class SuffixAutomatonNode:
    def __init__(self):
        self.next = {}    # Transitions to next states, keyed by character
        self.length = 0   # Length of the longest substring represented by this state
        self.link = -1    # Suffix link to another state (-1 for the root)


class SuffixAutomaton:
    def __init__(self):
        self.suffix_automaton = []
        self.last = 0  # Index of the state representing the whole string read so far

    # Initialize the suffix automaton with the root state
    def initialize(self):
        self.suffix_automaton = [SuffixAutomatonNode()]
        self.last = 0

    # Extend the automaton with a new character
    def extend_automaton(self, c):
        states = self.suffix_automaton
        # Append the new state first so its index is fixed before any cloning
        cur = len(states)
        new_node = SuffixAutomatonNode()
        new_node.length = states[self.last].length + 1
        states.append(new_node)

        current = self.last
        while current != -1 and c not in states[current].next:
            states[current].next[c] = cur  # Add the missing transition
            current = states[current].link
        if current == -1:
            new_node.link = 0  # The root state
        else:
            next_state = states[current].next[c]
            if states[current].length + 1 == states[next_state].length:
                new_node.link = next_state
            else:
                # Clone the state: copy transitions and suffix link, use a shorter length
                clone = len(states)
                clone_node = SuffixAutomatonNode()
                clone_node.next = dict(states[next_state].next)
                clone_node.link = states[next_state].link
                clone_node.length = states[current].length + 1
                states.append(clone_node)
                # Redirect transitions that pointed to the original state
                while current != -1 and states[current].next.get(c) == next_state:
                    states[current].next[c] = clone
                    current = states[current].link
                states[next_state].link = clone
                new_node.link = clone
        self.last = cur

    # Print every state with its length, suffix link, and transitions
    def traverse_automaton(self):
        print("Traversing Suffix Automaton:")
        for i, state in enumerate(self.suffix_automaton):
            print(f"State {i}, Length: {state.length}, Suffix Link: {state.link}")
            for char, next_state in state.next.items():
                print(f" Transition on '{char}' to State {next_state}")


# Main function
def main():
    input_str = "abab"
    suffix_automaton_instance = SuffixAutomaton()
    suffix_automaton_instance.initialize()
    for char in input_str:
        suffix_automaton_instance.extend_automaton(char)
    # Traverse the constructed suffix automaton
    suffix_automaton_instance.traverse_automaton()


if __name__ == "__main__":
    main()
C#
using System;
using System.Collections.Generic;

class SuffixAutomatonNode
{
    public Dictionary<char, int> Next; // Transitions to next states, keyed by character
    public int Length; // Length of the longest substring represented by this state
    public int Link; // Suffix link to another state (-1 for the root)

    public SuffixAutomatonNode()
    {
        Next = new Dictionary<char, int>();
        Length = 0;
        Link = -1;
    }
}

class GFG
{
    static List<SuffixAutomatonNode> SuffixAutomaton;
    static int Last; // Index of the state representing the whole string read so far

    // Initialize the suffix automaton with the root state
    static void Initialize()
    {
        SuffixAutomaton = new List<SuffixAutomatonNode>();
        SuffixAutomaton.Add(new SuffixAutomatonNode());
        Last = 0;
    }

    // Extend the automaton with a new character
    static void ExtendAutomaton(char c)
    {
        // Append the new state first so its index is fixed before any cloning
        int cur = SuffixAutomaton.Count;
        SuffixAutomatonNode newNode = new SuffixAutomatonNode();
        newNode.Length = SuffixAutomaton[Last].Length + 1;
        SuffixAutomaton.Add(newNode);

        int current = Last;
        while (current != -1 && !SuffixAutomaton[current].Next.ContainsKey(c))
        {
            SuffixAutomaton[current].Next[c] = cur; // Add the missing transition
            current = SuffixAutomaton[current].Link;
        }
        if (current == -1)
        {
            newNode.Link = 0; // The root state
        }
        else
        {
            int next = SuffixAutomaton[current].Next[c];
            if (SuffixAutomaton[current].Length + 1 == SuffixAutomaton[next].Length)
            {
                newNode.Link = next;
            }
            else
            {
                // Clone the state: copy transitions and suffix link, use a shorter length
                int clone = SuffixAutomaton.Count;
                SuffixAutomatonNode cloneNode = new SuffixAutomatonNode();
                cloneNode.Next = new Dictionary<char, int>(SuffixAutomaton[next].Next);
                cloneNode.Link = SuffixAutomaton[next].Link;
                cloneNode.Length = SuffixAutomaton[current].Length + 1;
                SuffixAutomaton.Add(cloneNode);
                // Redirect transitions that pointed to the original state
                while (current != -1 && SuffixAutomaton[current].Next[c] == next)
                {
                    SuffixAutomaton[current].Next[c] = clone;
                    current = SuffixAutomaton[current].Link;
                }
                SuffixAutomaton[next].Link = clone;
                newNode.Link = clone;
            }
        }
        Last = cur;
    }

    // Print every state with its length, suffix link, and transitions
    static void TraverseAutomaton()
    {
        Console.WriteLine("Traversing Suffix Automaton:");
        for (int i = 0; i < SuffixAutomaton.Count; ++i)
        {
            Console.WriteLine($"State {i}, Length: {SuffixAutomaton[i].Length}, Suffix Link: {SuffixAutomaton[i].Link}");
            foreach (var transition in SuffixAutomaton[i].Next)
            {
                Console.WriteLine($" Transition on '{transition.Key}' to State {transition.Value}");
            }
        }
    }

    public static void Main()
    {
        string input = "abab";
        Initialize();
        foreach (char c in input)
        {
            ExtendAutomaton(c);
        }
        // Traverse the constructed suffix automaton
        TraverseAutomaton();
    }
}
JavaScript
class SuffixAutomatonNode {
    constructor() {
        this.next = new Map(); // Transitions to next states, keyed by character
        this.length = 0; // Length of the longest substring represented by this state
        this.link = -1; // Suffix link to another state (-1 for the root)
    }
}

let suffixAutomaton = [];
let last; // Index of the state representing the whole string read so far

// Initialize the suffix automaton with the root state
function initialize() {
    suffixAutomaton = [new SuffixAutomatonNode()];
    last = 0;
}

// Extend the automaton with a new character
function extendAutomaton(c) {
    // Append the new state first so its index is fixed before any cloning
    const cur = suffixAutomaton.length;
    const newNode = new SuffixAutomatonNode();
    newNode.length = suffixAutomaton[last].length + 1;
    suffixAutomaton.push(newNode);

    let current = last;
    while (current !== -1 && !suffixAutomaton[current].next.has(c)) {
        suffixAutomaton[current].next.set(c, cur); // Add the missing transition
        current = suffixAutomaton[current].link;
    }
    if (current === -1) {
        newNode.link = 0; // The root state
    } else {
        const next = suffixAutomaton[current].next.get(c);
        if (suffixAutomaton[current].length + 1 === suffixAutomaton[next].length) {
            newNode.link = next;
        } else {
            // Clone the state: copy transitions and suffix link, use a shorter length
            const clone = suffixAutomaton.length;
            const cloneNode = new SuffixAutomatonNode();
            cloneNode.next = new Map(suffixAutomaton[next].next);
            cloneNode.link = suffixAutomaton[next].link;
            cloneNode.length = suffixAutomaton[current].length + 1;
            suffixAutomaton.push(cloneNode);
            // Redirect transitions that pointed to the original state
            while (current !== -1 && suffixAutomaton[current].next.get(c) === next) {
                suffixAutomaton[current].next.set(c, clone);
                current = suffixAutomaton[current].link;
            }
            suffixAutomaton[next].link = clone;
            newNode.link = clone;
        }
    }
    last = cur;
}

// Print every state with its length, suffix link, and transitions
function traverseAutomaton() {
    console.log("Traversing Suffix Automaton:");
    for (let i = 0; i < suffixAutomaton.length; ++i) {
        console.log(`State ${i}, Length: ${suffixAutomaton[i].length}, Suffix Link: ${suffixAutomaton[i].link}`);
        for (const [char, nextState] of suffixAutomaton[i].next) {
            console.log(` Transition on '${char}' to State ${nextState}`);
        }
    }
}

function main() {
    const input = "abab";
    initialize();
    for (const c of input) {
        extendAutomaton(c);
    }
    // Traverse the constructed suffix automaton
    traverseAutomaton();
}

main();
Output
Traversing Suffix Automaton:
State 0, Length: 0, Suffix Link: -1
 Transition on 'a' to State 1
 Transition on 'b' to State 2
State 1, Length: 1, Suffix Link: 0
 Transition on 'b' to State 2
State 2, Length: 2, Suffix Link: 0
 Transition on 'a' to State 3
State 3, Length: 3, Suffix Link: 1
 Transition on 'b' to State 4
State 4, Length: 4, Suffix Link: 2
For the input string "abab" the automaton has five states: the root plus one state per character, with no cloned states. The order in which a state's transitions are printed depends on the map implementation and may differ between languages (for example, the C++ unordered_map may list 'b' before 'a').
Time Complexity: The construction takes O(n) time overall, where n is the length of the input string, assuming a fixed-size alphabet and constant expected time per hash-map operation. A single extension is not constant time in the worst case, because the loops may follow several suffix links and a clone copies a state's transitions, but the total work over all characters is amortized linear.
Auxiliary Space Complexity: The space complexity is also O(n). Each character adds one new state and at most one cloned state, so the automaton has at most 2n - 1 states (for n >= 2) and at most 3n - 4 transitions (for n >= 3). Both time and space are therefore linear in the length of the input string.
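The size bounds can be checked empirically. The following sketch is a compact dictionary-based variant of the construction, written for this check and not identical to the listings above; the function name `build_suffix_automaton` and the sample strings are assumptions. It also exercises the cloning branch, which the input "abab" never triggers but "abb" does.

```python
def build_suffix_automaton(s):
    """Return a list of states; each state is a dict with 'next', 'len' and 'link'."""
    states = [{"next": {}, "len": 0, "link": -1}]
    last = 0
    for ch in s:
        cur = len(states)
        states.append({"next": {}, "len": states[last]["len"] + 1, "link": -1})
        p = last
        while p != -1 and ch not in states[p]["next"]:
            states[p]["next"][ch] = cur
            p = states[p]["link"]
        if p == -1:
            states[cur]["link"] = 0
        else:
            q = states[p]["next"][ch]
            if states[p]["len"] + 1 == states[q]["len"]:
                states[cur]["link"] = q
            else:
                # Clone q with a shorter length and redirect transitions to the clone
                clone = len(states)
                states.append({"next": dict(states[q]["next"]),
                               "len": states[p]["len"] + 1,
                               "link": states[q]["link"]})
                while p != -1 and states[p]["next"].get(ch) == q:
                    states[p]["next"][ch] = clone
                    p = states[p]["link"]
                states[q]["link"] = clone
                states[cur]["link"] = clone
        last = cur
    return states


def automaton_size(s):
    """Return (number of states, number of transitions) for the automaton of s."""
    states = build_suffix_automaton(s)
    return len(states), sum(len(state["next"]) for state in states)


for text in ["abab", "abb", "abcbc", "aaaa", "mississippi"]:
    n = len(text)
    num_states, num_transitions = automaton_size(text)
    assert num_states <= 2 * n - 1       # state bound, valid for n >= 2
    assert num_transitions <= 3 * n - 4  # transition bound, valid for n >= 3
    print(text, num_states, num_transitions)
```

For "abb" both bounds are met exactly (5 states and 5 transitions), which shows that they cannot be tightened in general.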
Applications of Suffix Automaton:
The suffix automaton finds applications in various string processing tasks, often with linear time and space requirements:
- Substring Matching: A suffix automaton can be used to search for substrings within a text. After linear-time construction over the text, checking whether a pattern P occurs takes O(|P|) time, which makes it suitable for search engines and text editors.
- Longest Common Substring: The longest common substring of two strings can be found by building the automaton for one string and matching the other against it, enabling applications like plagiarism detection and bioinformatics.
- Palindromes: Matching the reverse of a string against its suffix automaton can help find long palindromic substrings, although candidate matches must be checked for position; this is useful in text analysis and data compression.
- Shortest Non-Overlapping Repeats: Repeated substrings, including non-overlapping repeats, can be identified by storing occurrence positions in the states. This is useful in DNA sequence analysis and compression algorithms.
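The first two applications can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `build_suffix_automaton`, `contains`, and `longest_common_substring` are chosen for this example, and the automaton is a compact dictionary-based variant of the construction shown earlier.

```python
def build_suffix_automaton(s):
    """Return a list of states; each state is a dict with 'next', 'len' and 'link'."""
    states = [{"next": {}, "len": 0, "link": -1}]
    last = 0
    for ch in s:
        cur = len(states)
        states.append({"next": {}, "len": states[last]["len"] + 1, "link": -1})
        p = last
        while p != -1 and ch not in states[p]["next"]:
            states[p]["next"][ch] = cur
            p = states[p]["link"]
        if p == -1:
            states[cur]["link"] = 0
        else:
            q = states[p]["next"][ch]
            if states[p]["len"] + 1 == states[q]["len"]:
                states[cur]["link"] = q
            else:
                clone = len(states)
                states.append({"next": dict(states[q]["next"]),
                               "len": states[p]["len"] + 1,
                               "link": states[q]["link"]})
                while p != -1 and states[p]["next"].get(ch) == q:
                    states[p]["next"][ch] = clone
                    p = states[p]["link"]
                states[q]["link"] = clone
                states[cur]["link"] = clone
        last = cur
    return states


def contains(states, pattern):
    """Follow one transition per pattern character; O(len(pattern)) expected time."""
    v = 0
    for ch in pattern:
        v = states[v]["next"].get(ch)
        if v is None:
            return False
    return True


def longest_common_substring(states, t):
    """Match t against the automaton, shortening via suffix links on a mismatch."""
    v, length = 0, 0
    best, best_end = 0, 0
    for i, ch in enumerate(t):
        while v != 0 and ch not in states[v]["next"]:
            v = states[v]["link"]
            length = states[v]["len"]
        if ch in states[v]["next"]:
            v = states[v]["next"][ch]
            length += 1
        if length > best:
            best, best_end = length, i + 1
    return t[best_end - best:best_end]


sam = build_suffix_automaton("abab")
print(contains(sam, "bab"))                    # True
print(contains(sam, "bb"))                     # False
print(longest_common_substring(sam, "xabay"))  # aba
```

The automaton is built once for the text, after which each query only walks transitions and suffix links, so the cost of a query depends on the query string and not on the length of the indexed text.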