C++ Program to Implement Suffix Array
Last Updated: 30 Sep, 2024
A suffix array is a data structure that represents all suffixes of a given string in sorted order. Rather than storing the suffixes themselves, it stores their starting indices, arranged in the lexicographical order of the suffixes. It serves a purpose similar to a trie built over the suffixes but is more space-efficient than a trie.
Example
Let the given string be "banana".
All suffixes                Sorted alphabetically
0 banana                    5 a
1 anana                     3 ana
2 nana      --- sort -->    1 anana
3 ana                       0 banana
4 na                        4 na
5 a                         2 nana
So the suffix array for "banana" is {5, 3, 1, 0, 4, 2}
In this article, we will learn how to implement a suffix array for a given string in C++.
How to Create Suffix Array?
Following are the steps involved in creating the suffix array:
- Create a vector of strings to hold all the suffixes, and a vector of integers to hold the starting position of each suffix.
- Generate every suffix with a simple loop and store it in the vector of strings.
- Sort the suffixes alphabetically.
- Walk through the sorted suffixes and record the starting position of each one in the original string. The resulting sequence of positions is the suffix array.
Code Implementation
Below is the program for creating the suffix array:
C++
// C++ Program to illustrate how to create the
// suffix array
#include <bits/stdc++.h>
using namespace std;

vector<int> buildSufArr(const string &s) {
    int n = s.length();
    vector<int> sufArr(n);
    vector<string> suf(n);

    // Generate all the suffixes
    for (int i = 0; i < n; i++)
        suf[i] = s.substr(i);

    // Sort all suffixes alphabetically
    sort(suf.begin(), suf.end());

    // A suffix of length len starts at position n - len
    // in the original string, so the starting position
    // can be recovered from the length of each suffix
    for (int i = 0; i < n; i++)
        sufArr[i] = n - suf[i].length();

    return sufArr;
}

int main() {
    string s = "banana";
    vector<int> sufArr = buildSufArr(s);

    // Print the starting indices in sorted-suffix order
    for (int i : sufArr)
        cout << i << " ";
    return 0;
}
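Time Complexity: O(k * n log n), where n is the length of the string and k is the maximum length of a suffix. Since k can be as large as n, this is O(n^2 log n) in the worst case.
Auxiliary Space: O(n^2), where n is the length of the string, because every suffix is stored as a separate string and the suffixes together contain n(n+1)/2 characters.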
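The extra memory comes from copying each suffix into its own string. A possible alternative, sketched below, sorts the starting indices directly and compares suffixes in place inside the original string, so only the integer array is allocated. The function name buildSufArrByIndex is a hypothetical name chosen for this illustration. The time complexity is still O(n^2 log n) in the worst case, so this is a space optimisation only.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper (not part of the program above): builds the
// suffix array by sorting start positions instead of suffix copies.
std::vector<int> buildSufArrByIndex(const std::string &s) {
    int n = static_cast<int>(s.size());
    std::vector<int> sufArr(n);

    // Start with the positions 0, 1, ..., n - 1
    std::iota(sufArr.begin(), sufArr.end(), 0);

    // Order two positions by comparing the suffixes that begin there.
    // std::string::compare works on sub-ranges of s, so no suffix is copied.
    std::sort(sufArr.begin(), sufArr.end(), [&s](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });

    return sufArr;
}
```

Calling buildSufArrByIndex("banana") should give the same array as the example, {5, 3, 1, 0, 4, 2}, while using O(n) auxiliary space for the index vector.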
Applications of Suffix Arrays
Suffix arrays can be used in various problems, some of which are given below:
- Pattern Matching: It quickly finds a substring within a larger string.
- Data Compression: It helps in algorithms that reduce the size of data.
- Bioinformatics: It is used in analysing DNA sequences.
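As an illustration of the pattern matching use case, the sketch below binary-searches a suffix array for a pattern. Because the suffixes are in sorted order, every suffix that begins with the pattern sits in one contiguous range, so a search needs only O(m log n) character comparisons for a pattern of length m. The function name containsPattern and its parameters are hypothetical, and the sketch assumes sufArr is a valid suffix array of text and that the pattern is non-empty.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns true if pat occurs somewhere in text.
// Assumes sufArr is the suffix array of text and pat is non-empty.
bool containsPattern(const std::string &text,
                     const std::vector<int> &sufArr,
                     const std::string &pat) {
    int lo = 0;
    int hi = static_cast<int>(sufArr.size()) - 1;

    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;

        // Compare pat against the first pat.size() characters of the
        // suffix starting at sufArr[mid] (fewer if the suffix is shorter)
        int cmp = text.compare(sufArr[mid], pat.size(), pat);

        if (cmp == 0)
            return true;      // this suffix starts with pat
        if (cmp < 0)
            lo = mid + 1;     // suffix sorts before pat: search right half
        else
            hi = mid - 1;     // suffix sorts after pat: search left half
    }
    return false;
}
```

With the suffix array {5, 3, 1, 0, 4, 2} for "banana", searching for "nan" should succeed and searching for "nab" should fail.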
Advantages
- Once built, it allows quick substring searches.
- It uses less space than comparable structures such as suffix trees.