-
Abelian Powers and Repetitions in Sturmian Words
Authors:
Gabriele Fici,
Alessio Langiu,
Thierry Lecroq,
Arnaud Lefebvre,
Filippo Mignosi,
Jarkko Peltomäki,
Élise Prieur-Gaston
Abstract:
Richomme, Saari and Zamboni (J. Lond. Math. Soc. 83: 79-95, 2011) proved that an abelian power of exponent $k$ starts at every position of a Sturmian word, for every $k > 0$. We improve on this result by studying the maximum exponents of abelian powers and abelian repetitions (an abelian repetition is an analogue of a fractional power) in Sturmian words. We give a formula for computing the maximum exponent of an abelian power of abelian period $m$ starting at a given position in any Sturmian word of rotation angle $α$. As an analogue of the critical exponent, we introduce the abelian critical exponent $A(s_α)$ of a Sturmian word $s_α$ of angle $α$ as the quantity $A(s_α) = \limsup k_{m}/m = \limsup k'_{m}/m$, where $k_{m}$ (resp. $k'_{m}$) denotes the maximum exponent of an abelian power (resp. of an abelian repetition) of abelian period $m$ (the superior limits coincide for Sturmian words). We show that $A(s_α)$ equals the Lagrange constant of the number $α$. This yields a formula for computing $A(s_α)$ in terms of the partial quotients of the continued fraction expansion of $α$. Using this formula, we prove that $A(s_α) \geq \sqrt{5}$ and that equality holds for the Fibonacci word. We further prove that $A(s_α)$ is finite if and only if $α$ has bounded partial quotients, that is, if and only if $s_α$ is $β$-power-free for some real number $β$. Concerning the infinite Fibonacci word, we prove that: i) The longest prefix that is an abelian repetition of period $F_j$, $j>1$, has length $F_j(F_{j+1}+F_{j-1}+1)-2$ if $j$ is even or $F_j(F_{j+1}+F_{j-1})-2$ if $j$ is odd, where $F_{j}$ is the $j$th Fibonacci number; ii) The minimum abelian period of any factor is a Fibonacci number. Further, we derive a formula for the minimum abelian periods of the finite Fibonacci words.
Submitted 18 April, 2016; v1 submitted 9 June, 2015;
originally announced June 2015.
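To make the notion of an abelian power in the abstract above concrete, here is a minimal brute-force sketch. It is not the formula from the paper: it simply counts how many consecutive blocks of length $m$ share the same letter counts on a finite prefix of the Fibonacci word. The function names and the choice of the letters `a`/`b` are assumptions made for illustration.

```python
from collections import Counter


def fibonacci_word_prefix(min_length):
    """Return a prefix of the infinite Fibonacci word of length >= min_length.

    The word is built by iterating the morphism a -> ab, b -> a from "a".
    """
    word = "a"
    while len(word) < min_length:
        word = "".join("ab" if c == "a" else "a" for c in word)
    return word


def abelian_power_exponent(word, start, m):
    """Largest k such that word[start:start + k*m] splits into k blocks of
    length m that all have the same letter counts.

    Only blocks lying entirely inside `word` are counted, so on a finite
    prefix the result is a lower bound for the infinite word.
    """
    if m < 1:
        raise ValueError("the abelian period m must be positive")
    reference = Counter(word[start:start + m])
    k = 0
    while (start + (k + 1) * m <= len(word)
           and Counter(word[start + k * m:start + (k + 1) * m]) == reference):
        k += 1
    return k


prefix = fibonacci_word_prefix(40)
print(abelian_power_exponent(prefix, 0, 3))
```

On this prefix the blocks `aba`, `aba`, `baa`, `baa` all contain two `a` and one `b`, while the fifth block `bab` does not, so the printed exponent for period 3 at position 0 is 4.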
-
Algorithms for Longest Common Abelian Factors
Authors:
Ali Alatabbi,
Costas S. Iliopoulos,
Alessio Langiu,
M. Sohel Rahman
Abstract:
In this paper we consider the problem of computing the longest common abelian factor (LCAF) between two given strings. We present a simple $O(σ n^2)$ time algorithm, where $n$ is the length of the strings and $σ$ is the alphabet size, and a sub-quadratic-time solution for the binary string case, both requiring linear space. Furthermore, we present a modified algorithm applying some interesting tricks and experimentally show that the resulting algorithm runs faster.
Submitted 27 February, 2015;
originally announced March 2015.
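The quadratic approach mentioned in the abstract above can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: for each candidate length it slides a window over both strings, maintains the letter-count (Parikh) vector incrementally, and looks for a vector common to both. The function name and the returned triple are assumptions; hashing each vector costs $O(σ)$, so the expected running time is about $O(σ n^2)$.

```python
def longest_common_abelian_factor(s, t):
    """Return (length, start_in_s, start_in_t) of a longest pair of factors
    of s and t that are permutations of each other, or (0, 0, 0) if none."""
    alphabet = sorted(set(s) | set(t))
    index = {c: k for k, c in enumerate(alphabet)}

    def parikh_windows(word, length):
        # Yield (start, Parikh vector) for every window of the given length,
        # updating the counts in O(1) per shift.
        counts = [0] * len(alphabet)
        for c in word[:length]:
            counts[index[c]] += 1
        yield 0, tuple(counts)
        for start in range(1, len(word) - length + 1):
            counts[index[word[start - 1]]] -= 1
            counts[index[word[start + length - 1]]] += 1
            yield start, tuple(counts)

    # Try lengths from longest to shortest and stop at the first hit.
    for length in range(min(len(s), len(t)), 0, -1):
        seen = {vec: i for i, vec in parikh_windows(s, length)}
        for j, vec in parikh_windows(t, length):
            if vec in seen:
                return length, seen[vec], j
    return 0, 0, 0


print(longest_common_abelian_factor("aab", "bba"))
```

For `"aab"` and `"bba"` the factors `ab` and `ba` (both starting at index 1) have the same letter counts, and no common abelian factor of length 3 exists.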
-
A Note on the Longest Common Compatible Prefix Problem for Partial Words
Authors:
Maxime Crochemore,
Costas S. Iliopoulos,
Tomasz Kociumaka,
Marcin Kubica,
Alessio Langiu,
Jakub Radoszewski,
Wojciech Rytter,
Bartosz Szreder,
Tomasz Waleń
Abstract:
For a partial word $w$ the longest common compatible prefix of two positions $i,j$, denoted $lccp(i,j)$, is the largest $k$ such that $w[i,i+k-1]\uparrow w[j,j+k-1]$, where $\uparrow$ is the compatibility relation of partial words (it is not an equivalence relation). The LCCP problem is to preprocess a partial word in such a way that any query $lccp(i,j)$ about this word can be answered in $O(1)$ time. It is a natural generalization of the longest common prefix (LCP) problem for regular words, for which an $O(n)$ preprocessing time and $O(1)$ query time solution exists.
Recently an efficient algorithm for this problem has been given by F. Blanchet-Sadri and J. Lazarow (LATA 2013). The preprocessing time was $O(nh+n)$, where $h$ is the number of "holes" in $w$. The algorithm was designed for partial words over a constant alphabet and was quite involved.
We present a simple solution to this problem with slightly better runtime that works for any linearly-sortable alphabet. Our preprocessing is in time $O(nμ+n)$, where $μ$ is the number of blocks of holes in $w$. Our algorithm uses ideas from alignment algorithms and dynamic programming.
Submitted 9 December, 2013;
originally announced December 2013.
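As a baseline for the LCCP problem described above, the following sketch precomputes every $lccp(i,j)$ with a plain quadratic dynamic program. It is deliberately naive and is not the $O(nμ+n)$ algorithm of the paper; it only illustrates the definition and the $O(1)$ query interface. Representing a hole by `"?"`, using 0-based positions, and the function names are assumptions.

```python
HOLE = "?"  # assumed symbol for a hole in the partial word


def compatible(a, b):
    """Two symbols are compatible if they are equal or either is a hole."""
    return a == HOLE or b == HOLE or a == b


def build_lccp_table(w):
    """Return a table with table[i][j] = lccp(i, j) for 0-based positions.

    Filled from the end of the word: if w[i] and w[j] are compatible, the
    compatible prefix extends the one starting at (i + 1, j + 1).
    Takes O(n^2) time and space.
    """
    n = len(w)
    table = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if compatible(w[i], w[j]):
                table[i][j] = 1 + table[i + 1][j + 1]
    return table


table = build_lccp_table("ab?ab")
print(table[0][3], table[1][2])
```

After preprocessing, each query is a single table lookup. Note that compatibility is not transitive (`a` and `b` are both compatible with a hole but not with each other), which is why the usual LCP machinery for regular words does not apply directly.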
-
Order-Preserving Suffix Trees and Their Algorithmic Applications
Authors:
Maxime Crochemore,
Costas S. Iliopoulos,
Tomasz Kociumaka,
Marcin Kubica,
Alessio Langiu,
Solon P. Pissis,
Jakub Radoszewski,
Wojciech Rytter,
Tomasz Walen
Abstract:
Recently Kubica et al. (Inf. Process. Lett., 2013) and Kim et al. (submitted to Theor. Comp. Sci.) introduced order-preserving pattern matching. In this problem we are looking for consecutive substrings of the text that have the same "shape" as a given pattern. These results include a linear-time order-preserving pattern matching algorithm for polynomially bounded alphabets and an extension of this result to pattern matching with multiple patterns. We make one step forward in the analysis and give an $O(\frac{n\log{n}}{\log\log{n}})$ time randomized algorithm constructing suffix trees in the order-preserving setting. We show a number of applications of order-preserving suffix trees to identify patterns and repetitions in time series.
Submitted 27 March, 2013;
originally announced March 2013.
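The notion of "shape" in the abstract above can be illustrated with a naive matcher. This sketch is neither the linear-time algorithm nor the suffix-tree construction from the cited work: it replaces every window by the ranks of its values and compares that rank sequence with the pattern's. The function names are assumptions, and ties are treated as equal ranks.

```python
def shape(values):
    """Replace each value by its rank among the distinct values of the sequence."""
    ranks = {v: r for r, v in enumerate(sorted(set(values)))}
    return tuple(ranks[v] for v in values)


def order_preserving_matches(text, pattern):
    """Return all start positions where the window has the same relative
    order as the pattern. Naive: O(n * m log m) for text length n and
    pattern length m."""
    m = len(pattern)
    if m == 0:
        return []
    target = shape(pattern)
    return [i for i in range(len(text) - m + 1)
            if shape(text[i:i + m]) == target]


print(order_preserving_matches([10, 20, 15, 5, 9, 7], [1, 3, 2]))
```

Here the pattern `[1, 3, 2]` (low, high, middle) matches the windows `[10, 20, 15]` and `[5, 9, 7]`, so the positions 0 and 3 are reported.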
-
Note on the Greedy Parsing Optimality for Dictionary-Based Text Compression
Authors:
Maxime Crochemore,
Alessio Langiu,
Filippo Mignosi
Abstract:
Dynamic dictionary-based compression schemes have been the most widely used data compression schemes in daily practice since they appeared in the foundational papers of Ziv and Lempel in 1977, commonly referred to as LZ77. Their work is the basis of Deflate, gZip, WinZip, 7Zip and many other compression programs. All of those compression schemes use variants of the greedy approach to parse the text into dictionary phrases. Greedy parsing optimality was proved by Cohn et al. (1996) for fixed-length codes and unbounded dictionaries. The optimality of the greedy parsing was never proved for bounded-size dictionaries, which all of those schemes actually require. We define the suffix-closed property for dynamic dictionaries and we show that any LZ77-based dictionary, including the bounded variants, satisfies this property. Under this condition we prove the optimality of the greedy parsing as a variant of the proof by Cohn et al.
Submitted 22 November, 2012;
originally announced November 2012.
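For readers unfamiliar with the greedy approach referred to above, here is a toy LZ77-style greedy parser. It is only a sketch under simplifying assumptions (unbounded window, quadratic brute-force matching, hypothetical `("lit", c)` and `("copy", offset, length)` phrase tuples) and does not reproduce any real compressor or the optimality proof: at each position it takes the longest factor that also starts earlier in the text.

```python
def greedy_parse(text):
    """Parse text greedily into literals and (offset, length) copies."""
    phrases = []
    i = 0
    while i < len(text):
        best_len, best_off = 0, 0
        for start in range(i):
            # Length of the match between text[start:] and text[i:];
            # the match may overlap position i, as in LZ77.
            length = 0
            while i + length < len(text) and text[start + length] == text[i + length]:
                length += 1
            # ">=" keeps the rightmost among the longest matches.
            if length > 0 and length >= best_len:
                best_len, best_off = length, i - start
        if best_len == 0:
            phrases.append(("lit", text[i]))
            i += 1
        else:
            phrases.append(("copy", best_off, best_len))
            i += best_len
    return phrases


def decode(phrases):
    """Rebuild the text from the phrase list produced by greedy_parse."""
    out = []
    for phrase in phrases:
        if phrase[0] == "lit":
            out.append(phrase[1])
        else:
            _, offset, length = phrase
            for _ in range(length):
                out.append(out[-offset])
    return "".join(out)


print(greedy_parse("abababa"))
```

For `"abababa"` the parse consists of the literals `a` and `b` followed by a single overlapping copy of length 5 at offset 2, and `decode` restores the original text.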
-
The Rightmost Equal-Cost Position Problem
Authors:
Maxime Crochemore,
Alessio Langiu,
Filippo Mignosi
Abstract:
LZ77-based compression schemes compress the input text by replacing factors in the text with an encoded reference to a previous occurrence, given by the pair (length, offset). For a given factor, the smaller the offset, the smaller the resulting compression ratio. This is optimally achieved by using the rightmost occurrence of a factor in the previous text. Given a cost function, for instance the minimum number of bits used to represent an integer, we define the Rightmost Equal-Cost Position (REP) problem as the problem of finding one of the occurrences of a factor whose cost is equal to the cost of the rightmost one. We present the Multi-Layer Suffix Tree data structure that, for a text of length n, provides at any time i: REP(LPF) in constant time, where LPF is the longest previous factor, i.e. the greedy phrase; a reference to the list of REP({set of prefixes of LPF}) in constant time; and REP(p) in time O(|p| log log n) for any given pattern p.
Submitted 21 November, 2012;
originally announced November 2012.
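The REP problem stated above can be illustrated by brute force. The sketch below is not the Multi-Layer Suffix Tree: it scans all previous occurrences of a factor and keeps those whose offset costs as many bits as the rightmost one. Taking the cost to be the bit length of the offset follows the example cost function in the abstract; the function names are assumptions.

```python
def offset_cost(offset):
    """Cost of an offset: the minimum number of bits needed to write it."""
    return offset.bit_length()


def rightmost_offset(text, i, length):
    """Offset of the rightmost occurrence of text[i:i+length] that starts
    before position i, or None if there is none."""
    factor = text[i:i + length]
    for start in range(i - 1, -1, -1):
        if text[start:start + length] == factor:
            return i - start
    return None


def equal_cost_offsets(text, i, length):
    """All offsets of previous occurrences whose cost equals the cost of the
    rightmost occurrence; any of them is a valid answer to REP."""
    best = rightmost_offset(text, i, length)
    if best is None:
        return []
    target = offset_cost(best)
    factor = text[i:i + length]
    return [i - start for start in range(i - 1, -1, -1)
            if text[start:start + length] == factor
            and offset_cost(i - start) == target]


print(equal_cost_offsets("aaxa", 3, 1))
```

In `"aaxa"` the factor `a` at position 3 has previous occurrences at offsets 2 and 3. Both offsets need two bits, so either one solves REP, even though only offset 2 is the rightmost occurrence.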
-
Abelian Repetitions in Sturmian Words
Authors:
Gabriele Fici,
Alessio Langiu,
Thierry Lecroq,
Arnaud Lefebvre,
Filippo Mignosi,
Élise Prieur-Gaston
Abstract:
We investigate abelian repetitions in Sturmian words. We exploit a bijection between factors of Sturmian words and subintervals of the unit interval that allows us to study the periods of abelian repetitions by using classical results of elementary number theory. We prove that in any Sturmian word the superior limit of the ratio between the maximal exponent of an abelian repetition of period $m$ and $m$ is a number $\geq\sqrt{5}$, and the equality holds for the Fibonacci infinite word. We further prove that the longest prefix of the Fibonacci infinite word that is an abelian repetition of period $F_j$, $j>1$, has length $F_j(F_{j+1}+F_{j-1}+1)-2$ if $j$ is even or $F_j(F_{j+1}+F_{j-1})-2$ if $j$ is odd. This allows us to give an exact formula for the smallest abelian periods of the Fibonacci finite words. More precisely, we prove that for $j\geq 3$, the Fibonacci word $f_j$ has abelian period equal to $F_n$, where $n = \lfloor{j/2}\rfloor$ if $j \equiv 0, 1, 2 \pmod{4}$, or $n = 1 + \lfloor{j/2}\rfloor$ if $j \equiv 3 \pmod{4}$.
Submitted 9 May, 2013; v1 submitted 26 September, 2012;
originally announced September 2012.
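The case distinction at the end of the abstract above can be transcribed directly. The sketch returns only the index $n$ of the Fibonacci number $F_n$ and does not evaluate $F_n$ itself, because the abstract does not state the indexing convention for $F_n$ or $f_j$; the function name is an assumption.

```python
def abelian_period_index(j):
    """Index n such that the finite Fibonacci word f_j has smallest abelian
    period F_n, following the formula quoted in the abstract (j >= 3)."""
    if j < 3:
        raise ValueError("the formula is stated for j >= 3")
    n = j // 2
    if j % 4 == 3:
        n += 1
    return n


print([abelian_period_index(j) for j in range(3, 9)])
```

For $j = 3, \dots, 8$ this gives the indices 2, 2, 2, 3, 4, 4.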