Near-Optimal Quantum Algorithms For String Problems
https://doi.org/10.1007/s00453-022-01092-x
Received: 26 April 2022 / Accepted: 29 December 2022 / Published online: 27 January 2023
© The Author(s) 2023, corrected publication 2023
Abstract
We study quantum algorithms for several fundamental string problems, including
Longest Common Substring, Lexicographically Minimal String Rotation, and Longest
Square Substring. These problems have been widely studied in the stringology lit-
erature since the 1970s, and are known to be solvable by near-linear time classical
algorithms. In this work, we give quantum algorithms for these problems with near-
optimal query complexities and time complexities. Specifically, we show that: Longest
Common Substring can be solved by a quantum algorithm in Õ(n^{2/3}) time, improving
upon the recent Õ(n^{5/6})-time algorithm by Le Gall and Seddighin (in: Proceedings
of the 13th innovations in theoretical computer science conference (ITCS 2022), pp
97:1–97:23, 2022. https://doi.org/10.4230/LIPIcs.ITCS.2022.97). Our algorithm uses
the MNRS quantum walk framework, together with a careful combination of string
synchronizing sets (Kempa and Kociumaka, in: Proceedings of the 51st annual ACM
SIGACT symposium on theory of computing (STOC 2019), ACM, pp 756–767, 2019.
https://doi.org/10.1145/3313276.3316368) and generalized difference covers. Lexi-
cographically Minimal String Rotation can be solved by a quantum algorithm in
n^{1/2+o(1)} time, improving upon the recent Õ(n^{3/4})-time algorithm by Wang and Ying
(in: Quantum algorithm for lexicographically minimal string rotation. CoRR, 2020.
arXiv:2012.09376). We design our algorithm by first giving a new classical divide-
and-conquer algorithm in near-linear time based on exclusion rules, and then speeding
it up quadratically using nested Grover search and quantum minimum finding. Longest
Square Substring can be solved by a quantum algorithm in Õ(√n) time. Our algo-
rithm is an adaptation of the algorithm by Le Gall and Seddighin (2022) for the
Longest Palindromic Substring problem, but uses additional techniques to overcome
the difficulty that binary search no longer applies. Our techniques naturally extend to
other related string problems, such as Longest Repeated Substring, Longest Lyndon
Substring, and Minimal Suffix.
1 Introduction
1 Throughout this paper, Õ(·) hides poly log n factors where n denotes the input length, and Ω̃(·), Θ̃(·) are
defined analogously. In particular, Õ(1) means O(poly log n).
Fig. 1 Near-optimal quantum algorithms for string problems (see definitions in Sect. 2.2). Problems are
grouped based on similarity. All problems listed here have near-linear time classical algorithms
Despite these results, our knowledge about the quantum computational complexities
of basic string problems is far from complete. For the LCS problem and the Minimal
String Rotation problem mentioned above, there are n^{Ω(1)} gaps between current upper
bounds and lower bounds. Better upper bounds are only known in special cases: Le Gall
and Seddighin [12] gave an Õ(n^{2/3})-time algorithm for (1 − ε)-approximating LCS
in non-repetitive strings, matching the query lower bound in this setting. Wang and
Ying [14] gave an Õ(√n)-time algorithm for Minimal String Rotation in randomly
generated strings, and showed a matching average-case query lower bound. However,
these algorithms do not immediately extend to the general cases. Moreover, there
remain many other string problems which have near-linear time classical algorithms
with no known quantum speed-up.
In this work, we develop new quantum query algorithms for many fundamental string
problems. All our algorithms are near-optimal and time-efficient: they have time com-
plexities that match the corresponding query complexity lower bounds up to n^{o(1)}
factors. In particular, we close the gaps for Longest Common Substring and Minimal
String Rotation left open in previous work [12, 14]. We summarize our contributions
(together with some earlier results) in Fig. 1. See Sect. 2.2 for formal definitions of
the studied problems.
We give high-level overviews of our quantum algorithms for Longest Common Sub-
string (LCS), Minimal String Rotation, and Longest Square Substring.
We consider the decision version of LCS with threshold length d: given two length-n
input strings s, t, decide whether they have a common substring of length d.
Le Gall and Seddighin [12, Section 3.1.1] observed a simple reduction from this
decision problem to the (bipartite version of) Element Distinctness problem, which
asks whether the two input lists A, B contain a pair of identical items A_i = B_j.
Ambainis [13] gave a comparison-based algorithm for this problem in Õ(n^{2/3} · T)
time, where T denotes the time complexity of comparing two items. In the LCS
problem of threshold length d, each item is a length-d substring of s or t (specified by
the starting position), and the lexicographical order between two length-d substrings
can be compared in T = Õ(√d) time using binary search and Grover search (see Lemma
2.5). Hence, this problem can be solved in Õ(n^{2/3} · √d) time.
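For intuition, here is a minimal classical sketch of this reduction (Python, illustration only): each list item stands for a length-d substring identified by its starting position, and a YES instance is exactly a collision between the two lists. The quantum algorithm replaces the brute-force collision search below by Ambainis' walk, and each O(d)-time comparison by an Õ(√d)-time quantum comparison.

def has_common_substring_of_length(s: str, t: str, d: int) -> bool:
    # Bipartite Element Distinctness view: list A holds the length-d substrings of s,
    # list B those of t; the answer is YES iff the two lists share an item.
    if d <= 0:
        return True
    if d > len(s) or d > len(t):
        return False
    B = {t[j:j + d] for j in range(len(t) - d + 1)}
    return any(s[i:i + d] in B for i in range(len(s) - d + 1))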
The anchoring technique The inefficiency of the algorithm described above comes
from the fact that there are n − d + 1 = Θ(n) positions to be considered in each input
string. This seems rather unnecessary for larger d, since intuitively there is a lot of
redundancy from the large overlap between these length-d substrings.
This is the idea behind the so-called anchoring technique, which has been widely
applied in designing classical algorithms for various versions of the LCS problem
[22–28]. In this technique, we carefully pick subsets C1 , C2 ⊆ [n] of anchors, such
that in a YES input instance there must exist an anchored common substring, i.e., a
common string with occurrences s[i . . i + d) = t[ j . . j + d) and a shift 0 ≤ h < d
such that i + h ∈ C1 and j + h ∈ C2 . Then, the task reduces to the Two String
Families LCP problem [23], where we want to find a pair of anchors i ∈ C1 , j ∈
C2 that can be extended in both directions to get a length-d common substring, or
equivalently, the longest common prefix of s[i . .], t[ j . .] and the longest common
suffix of s[. . i −1], t[. . j −1] have total length at least d. Intuitively, finding a smaller
set of anchors would make our algorithm have better running time.
anchor can be reported in T = Õ(√d) quantum time. Our construction (Sect. 3.3)
is based on an approximate version of difference covers, combined with the string
synchronizing sets recently introduced by Kempa and Kociumaka [32] (adapted to the
sublinear setting using tools from pseudorandomness). Roughly speaking, allowing
errors in the difference cover makes the size much smaller, while also introducing
slight misalignments between the anchors, which are then to be fixed by the string
synchronizing sets.
Anchoring via quantum walks Now we explain how to use small and explicit anchor sets
to obtain better quantum LCS algorithms with time complexity Õ(m^{2/3} · (√d + T)) =
Õ(n^{2/3}), where m = Õ(n/d^{3/4}) is the number of anchors, and T = Õ(√d) is the
time complexity of computing the i-th anchor. Our algorithm uses the MNRS quantum
walk framework [33] (see Sect. 2.5) on Johnson graphs. Informally speaking, to apply
this framework, we need to solve the following dynamic problem: maintain a subset
of r anchors which undergoes insertions and deletions (called update steps), and in
each query (called a checking step) we need to solve the Two String Families LCP
problem on this subset, i.e., answer whether the current subset contains a pair of
anchors that can extend to a length-d common substring. If each update step takes
time U, and each checking step takes time C, then the MNRS quantum walk algorithm
has overall running time Õ(rU + (m/r) · (√r · U + C)). We will achieve U = Õ(√d + T)
and C = Õ(√(rd)), and obtain the claimed time complexity by setting r = m^{2/3}.
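For concreteness, plugging these costs into the MNRS bound (a routine calculation; here we also use m = Õ(n/d^{3/4}) and T = Õ(√d) from above) recovers the claimed running time:

\tilde{O}\Big(rU + \tfrac{m}{r}\big(\sqrt{r}\,U + C\big)\Big)
  \;=\; \tilde{O}\Big(m^{2/3}\sqrt{d} \;+\; m^{1/3}\cdot m^{1/3}\sqrt{d} \;+\; m^{1/3}\cdot m^{1/3}\sqrt{d}\Big)
  \;=\; \tilde{O}\big(m^{2/3}\sqrt{d}\big)
  \;=\; \tilde{O}\Big(\big(n/d^{3/4}\big)^{2/3}\sqrt{d}\Big)
  \;=\; \tilde{O}\big(n^{2/3}\big),

using r = m^{2/3}, U = Õ(√d + T) = Õ(√d), and C = Õ(√(rd)) = Õ(m^{1/3}√d).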
To solve this dynamic problem, we maintain the lexicographical ordering of the
length-d substrings specified by the current subset of anchors, as well as the corre-
sponding LCP array which contains the length of the longest common prefix between
every two lexicographically adjacent substrings. Note that the maintained information
uniquely defines the compact trie of these substrings. This information can be updated
easily after each insertion (or deletion) operation: we first compute the inserted anchor
in T time, and then use binary search with Grover search to find its lexicographical
rank and the LCP values with its neighbors, in Õ(√d) quantum time.
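The bookkeeping performed by an update step can be illustrated classically as follows (Python sketch, illustration only): the keys are kept in sorted order together with the LCP values of adjacent keys, and an insertion recomputes exactly two LCP values while discarding the one between the former neighbors. In the quantum walk, the linear-time operations below are replaced by binary search plus Grover search, and the containers by the history-independent data structures of Sect. 3.2.2.

import bisect

def lcp(a: str, b: str) -> int:
    # Length of the longest common prefix of a and b.
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def insert_key(keys: list[str], lcps: list[int], new: str) -> None:
    # Insert `new` into the sorted list `keys`, keeping lcps[i] = lcp(keys[i], keys[i+1]).
    pos = bisect.bisect_left(keys, new)        # lexicographical rank of the new key
    keys.insert(pos, new)
    h_pre = lcp(keys[pos - 1], new) if pos > 0 else None
    h_suc = lcp(new, keys[pos + 1]) if pos + 1 < len(keys) else None
    # The old LCP value between the two former neighbors is discarded ("uncomputed")
    # and replaced by the two new adjacent LCP values.
    if pos == 0:
        if h_suc is not None:
            lcps.insert(0, h_suc)
    elif pos == len(keys) - 1:
        lcps.append(h_pre)
    else:
        lcps[pos - 1] = h_pre
        lcps.insert(pos, h_suc)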
The maintained information will be useful for the checking step. In fact, if we
only care about query complexity, then we are already done, since the maintained
information already uniquely determines the answer of the Two String Families LCP
problem, and no additional queries to the input strings are needed. The main challenge
is to implement this checking step time-efficiently. Unfortunately, the classical near-
linear-time algorithm [23] for solving the Two String Families LCP problem is too
slow compared to our goal of C = Õ(√(rd)), and it is not clear how to obtain a quantum
speedup over this classical algorithm. Hence, we should try to dynamically maintain
the solution using data structures, instead of solving it from scratch every time. In
fact, such a data structure with poly log(n) time per operation was already given
by Charalampopoulos, Gawrychowski, and Pokorski [27], and was used to obtain a
classical data structure for maintaining Longest Common Substring under character
substitutions. However, this data structure cannot be applied to the quantum walk
algorithm, since it violates two requirements that are crucial for the correctness of
quantum walk algorithms: (1) It should have worst-case time complexity (instead of
being amortized), and (2) it should be history-independent (see the discussion in Sect.
3.2.1 for more details). Instead, we will design a different data structure that satisfies
these two requirements, and can solve the Two String Families LCP problem on the
maintained subset in Õ(√(rd)) quantum time. This time complexity is worse than the
poly log(n) time achieved by the classical data structure of [27], but suffices for our
application.
A technical hurdle: limitations of 2D range query data structures Our solution for
the Two String Families LCP problem is straightforward, but a key component in
the algorithm relies on dynamic 2-dimensional orthogonal range queries. This is a
well-studied problem in the data structure literature, and many poly log (n)-time data
structures are known (see [34–36] and the references therein). However, for our results,
the 2-dimensional (2D) range query data structure in question has to satisfy not only the
two requirements mentioned above, but also a third requirement of being comparison-
based. In particular, we are not allowed to treat the coordinates of the 2D points as
poly(n)-bounded integers, because the coordinates actually correspond to substrings
of the input string, and should be compared by lexicographical order. Unfortunately,
no data structures satisfying all three requirements are known.
To bypass this difficulty, our novel idea is to use a sampling procedure that lets
us estimate the rank of a coordinate of the inserted 2D point among all the possible
coordinates, which effectively allows us to convert the non-integer coordinates into
integer coordinates. By a version of the Balls-and-Bins hashing argument, the inac-
curacy incurred by the sampling can be controlled for most of the vertices on the
Johnson graph which the quantum walk operates on. This then lets us apply 2D range
query data structures over integer coordinates (see Sect. 3.2.3 for the details of this
argument), which can be implemented with worst-case time complexity and history-
independence as required. Combining this method with the tools and ideas mentioned
before lets us get a time-efficient implementation of the quantum walk algorithm for
computing the LCS.
We believe this sampling idea will find further applications in improving the time
efficiency of quantum walk algorithms (for example, it can simplify the implemen-
tation of Ambainis’ Õ(n^{2/3})-time Element Distinctness algorithm, as noted in Sect.
6).
In the Minimal String Rotation problem, we are given a string s of length n and are
tasked with finding the cyclic rotation of s which is lexicographically the smallest. We
sketch the main ideas of our improved quantum algorithm for Minimal String Rotation
by comparing it to the previous best solution for this problem.
The simplest version of Wang and Ying’s algorithm [14, Theorem 5.2] works by
identifying a small prefix of the minimal rotation using Grover search, and then apply-
ing pattern matching with this small prefix to find the starting position of the minimum
rotation. More concretely, let B be some size parameter. By quantum minimum finding
over all prefixes of length B among the rotations
of s, we can find the length-B prefix
P of the minimal rotation in asymptotically √B · √n time. Next, split the string s into
Θ(n/B) blocks of size Θ(B) each. Within each block, we find the leftmost occurrence
of P via quantum Exact String Matching [9]. It turns out that one of these positions is
guaranteed to be a starting position of the minimal rotation (this property is called an
“exclusion rule” or “Ricochet Property” in the literature). By minimum finding over
these O(n/B) candidate starting positions (and comparisons of length-n strings via
Grover search), we can find the true minimum rotation in asymptotically √(n/B) · √n
time. So overall the algorithm takes asymptotically

√(Bn) + (n/√B)
time, which is minimized at B = √n and yields a runtime of Õ(n^{3/4}).
This algorithm is inefficient in its first step, where it uses quantum minimum finding
to obtain the minimum length-B prefix P. The length-B prefixes we are searching over
all come from rotations of the same string s. Due to this common structure, we should
be able to find their minimum more efficiently than just using the generic algorithm for
minimum finding. At a high level, we improve this step by finding P using recursion
instead. Intuitively, this is possible because the Minimal Rotation problem is already
about finding the minimum “prefix” (just of length n) among rotations of s. This then
yields a recursive algorithm running in n^{1/2+o(1)} quantum time.
In the presentation of this algorithm in Sect. 4, we use a chain of reductions and
actually solve a more general problem to get this recursion to work. The argument also
relies on a new “exclusion rule,” adapted from previous work, to prove that we only
need to consider a constant number of candidate starting positions of the minimum
rotation within each small block of the input string.
A square string is a string of even length with the property that its first half is identical
to its second half. In other words, a string is square if it can be viewed as some string
repeated twice in a row.
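As a baseline that pins down this definition, a naive classical search for the longest square substring looks as follows (Python, quadratic-time illustration only; it is not the algorithm developed in this paper):

def longest_square_substring(s: str) -> str:
    # Return a longest substring of s that is a square (some block repeated twice).
    n = len(s)
    for length in range(n - n % 2, 0, -2):          # try even lengths, longest first
        half = length // 2
        for i in range(n - length + 1):
            if s[i:i + half] == s[i + half:i + length]:
                return s[i:i + length]
    return ""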
We show how to find the longest square substring in an input string of length n
using a quantum algorithm which runs in Õ(√n) time. Our algorithm mostly fol-
lows the ideas used in [12] to solve the Longest Palindromic Substring problem, but
makes some modifications due to the differing structures of square substrings and
palindromic substrings (for example, [12] exploits the fact that if a string contains a
large palindromic substring it has smaller palindromic substrings centered at the same
position; in contrast, it is possible for a string to have a large square substring but not
contain any smaller square substrings, so we cannot leverage this sort of property).
At a high level, our algorithm starts by guessing the size of the longest square
substring within a (1 + ε) factor for some small constant ε > 0. We then guess a
large substring P contained in the first half of an optimal solution, and then use the
quantum algorithm for Exact String Matching to find a copy of this P in the second
half of the corresponding solution. If we find a unique copy of P, we can use a Grover
search to extend outwards from our copies of P and recover a longest square substring.
Otherwise, if we find multiple copies, it implies our substring is periodic, so we can
use a Grover search to find a maximal periodic substring containing a large square
substring, and then employ some additional combinatorial arguments to recover the
solution.
Quantum algorithms on string problems Wang and Ying [14] improved the logarithmic
factors of the quantum Exact String Matching algorithm by Ramesh and Vinay [9]
(and filled in several gaps in their original proof), and showed that the same technique
can be used to find the smallest period of a periodic string [14, Appendix D].
Another important string problem is computing the edit distance between two
strings (the minimum number of deletions, insertions, and substitutions needed to turn
one string into the other). The best known classical algorithm has O(n²/log² n) time
complexity [37], which is near-optimal under the Strong Exponential Time Hypoth-
esis [38]. It is open whether quantum algorithms can compute edit distance in truly
subquadratic time. For the approximate version of the edit distance problem, the break-
through work of Boroujeni et al. [39] gave a truly subquadratic time quantum algorithm
for computing a constant factor approximation. The quantum subroutines of this algo-
rithm were subsequently replaced with classical randomized algorithms in [40] to get a
truly subquadratic classical algorithm that approximates the edit distance to a constant
factor.
Le Gall and Seddighin [12] also considered the (1 + ε)-approximate Ulam distance
problem (i.e., edit distance
on non-repetitive strings), and showed a quantum algorithm
with near-optimal Õ(√n) time complexity. Their algorithm was based on the classical
algorithm by Naumovitz, Saks, and Seshadhri [41].
Montanaro [42] gave quantum algorithms for the d-dimensional pattern matching
problem with random inputs. Ambainis et al. [43] gave quantum algorithms for decid-
ing Dyck languages. There are also some results [44, 45] on string problems with
non-standard quantum queries to the input.
Quantum walks and time-efficient quantum algorithms Quantum walks [13, 33, 46]
are a useful method to obtain query-efficient quantum algorithms for many important
problems, such as Element Distinctness [13] and Triangle Finding [47–49]. Ambainis
showed that the query-efficient algorithm for element distinctness [13] can also be
implemented in a time-efficient manner with only a poly log(n) blowup, by applying
history-independent data structures in the quantum walk. Since then, this “quantum
walk plus data structure” strategy has been used in many quantum algorithms to obtain
improved time complexity. For example, Belovs, Childs, Jeffery, Kothari, and Magniez
[50] used nested quantum walk with Ambainis’ data structure to obtain time-efficient
algorithms for the 3-distinctness problem. Bernstein, Jeffery, Lange, and Meurer [51]
designed a simpler data structure called quantum radix tree [52], and applied it in their
quantum walk algorithms for the Subset Sum problem on random input. Aaronson,
Chia, Lin, Wang, and Zhang [53] gave a quantum walk algorithm for the Closest-Pair
problem in O(1)-dimensional space with near-optimal time complexity Õ(n^{2/3}). The
previous Õ(n^{2/3})-time algorithm for approximating LCS in non-repetitive strings [12]
also applied quantum walks.
On the other hand, query-efficient quantum algorithms do not always have
time-efficient implementations. This motivated the study of quantum fine-grained
complexity. Aaronson et al. [53] formulated the QSETH conjecture, which is a quan-
tum analogue of the classical Strong Exponential Time Hypothesis, and showed that
Orthogonal Vectors and Closest-Pair in poly log(n)-dimensional space require n 1−o(1)
quantum time under QSETH. In contrast, these two problems have simple quantum
walk algorithms with only O(n^{2/3}) query complexity. Buhrman, Patro, and Speelman
[54] formulated another version of QSETH, which implies a conditional Ω(n^{1.5})-time
lower bound for quantum algorithms solving the edit distance problem. Recently,
Buhrman, Loff, Patro, and Speelman [55] proposed the quantum 3SUM hypothesis,
and used it to show that the quadratic quantum speedups obtained by Ambainis and
Larka [56] for many computational geometry problems are conditionally optimal.
Notably, in their fine-grained reductions, they employed a quantum walk with data
structures to bypass the linear-time preprocessing stage that a naive approach would
require.
Some of our results were improved after the conference version of this paper was pub-
lished. Jin and Nogler [81] gave a quantum algorithm that decides whether the Longest
Common Substring of two input strings has length at least d in Õ((n/d)^{2/3} · d^{1/2+o(1)})
quantum query and time complexity. Childs, Kothari, Kovacs-Deak, Sundaram, and
Wang [82] gave a quantum algorithm for a decision version of the Lexicographically
Minimal String Rotation problem in Õ(n^{1/2}) quantum query complexity.
In Sect. 2, we provide useful definitions and review some quantum primitives which
will be used in our algorithms. In Sect. 3, we present our algorithm for Longest
Common Substring. In Sect. 4, we present our algorithm for Minimal String Rotation
and several related problems. In Sect. 5, we present our algorithm for Longest Square
Substring. Finally, we mention several open problems in Sect. 6.
2 Preliminaries
not divide |s|, we say that s is primitive. We will need the following classical results
regarding periods of strings for some of our algorithms.
Lemma 2.2 (Weak Periodicity Lemma, [84]) If a string s has periods p and q such
that p + q ≤ |s|, then gcd( p, q) is also a period of s.
Lemma 2.3 (e.g., [79, 85]) Let s, t be two strings with |s|/2 ≤ |t| ≤ |s|, and let
s[i_1 . . i_1 + |t|) = s[i_2 . . i_2 + |t|) = · · · = s[i_m . . i_m + |t|) = t be all the occurrences
of t in s (where i_k < i_{k+1}). Then, i_1, i_2, . . . , i_m form an arithmetic progression.
Moreover, if m ≥ 3, then per(t) = i_2 − i_1.
We say string s is a (cyclic) rotation of string t, if |s| = |t| = n and there exists
an index 1 ≤ i ≤ n such that s = t[i . . n]t[1 . . i − 1]. If string s is primitive and
is lexicographically minimal among its cyclic rotations, we call s a Lyndon word.
Equivalently, s is a Lyndon word if and only if s ≺ t for all proper suffixes t of s. For
a periodic string s with minimal period per(s) = p, the Lyndon root of s is defined
as the lexicographically minimal rotation of s[1 . . p], which can be computed by a
classical deterministic algorithm in linear time (e.g., [6–8]).
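For reference, one standard linear-time classical routine of this kind is Booth's algorithm; the sketch below (Python) returns a starting position of the lexicographically minimal rotation, and applying it to s[1 . . p] yields the Lyndon root. This is only an illustration of the classical primitive; the cited algorithms [6–8] may differ in details.

def minimal_rotation_start(s: str) -> int:
    # Booth's algorithm: 0-indexed starting position of the lexicographically
    # minimal rotation of s, in O(|s|) time.
    # e.g. minimal_rotation_start("ba") == 1 (rotation "ab")
    ss = s + s
    f = [-1] * len(ss)              # failure function over the doubled string
    k = 0                           # current candidate starting position
    for j in range(1, len(ss)):
        i = f[j - k - 1]
        while i != -1 and ss[j] != ss[k + i + 1]:
            if ss[j] < ss[k + i + 1]:
                k = j - i - 1
            i = f[i]
        if ss[j] != ss[k + i + 1]:  # here i == -1, so ss[k + i + 1] == ss[k]
            if ss[j] < ss[k]:
                k = j
            f[j - k] = -1
        else:
            f[j - k] = i + 1
    return k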
Maximal Suffix
Input: A string s
Task: Output the position i ∈ [1 . . |s|] such that s[i . . |s|] ≻ s[j . . |s|] holds for
all j ∈ [|s|] \ {i}.
Minimal Suffix
Input: A string s
Task: Output the position i ∈ [1 . . |s|] such that s[i . . |s|] ≺ s[ j . . |s|] holds for
all j ∈ [|s|] \ {i}.
In the first four problems, we only require the algorithm to output the maximum
length. The locations of the witness substrings can be found by a binary search.
We assume the input strings can be accessed in a quantum query model [86, 87], which
is standard in the literature of quantum algorithms. More precisely, letting s be an input
string of length n, we have access to an oracle O_s that, for any index i ∈ [n] and any
b ∈ Σ, performs the unitary mapping O_s : |i, b⟩ → |i, b ⊕ s[i]⟩, where ⊕ denotes
the XOR operation on the binary encodings of characters. The oracles can be queried
in superposition, and each query has unit cost. Besides the input queries, the algorithm
can also apply intermediate unitary operators that are independent of the input oracles.
Finally, the query algorithm should return the correct answer with success probability
at least 2/3 (which can be boosted to high probability2 by a majority vote over O(log n)
repetitions). The query complexity of an algorithm is the number of queries it makes
to the input oracles.
In this paper, we are also interested in the time complexity of the quantum algorithms,
which counts not only the queries to the input oracles, but also the elementary gates [88]
for implementing the unitary operators that are independent of the input. In order to
implement the query algorithms in a time-efficient manner, we also need the quantum
random access gate, defined as |i, b, z_1, . . . , z_m⟩ ↦ |i, z_i, z_1, . . . , z_{i−1}, b, z_{i+1}, . . . , z_m⟩, which allows us
to access at unit cost the i-th element from the quantum working memory |z_1, . . . , z_m⟩.
Assuming quantum random access, a classical time-T algorithm that uses random
access memory can be converted into a quantum subroutine in time O(T ), which can be
2 We say an algorithm succeeds with high probability (w.h.p), if the success probability can be made at
least 1 − 1/n c for any desired constant c > 1.
invoked by quantum search primitives such as Grover search. Quantum random access
has become a standard assumption in designing time-efficient quantum algorithms (for
example, all the time-efficient quantum walk algorithms mentioned in Sect. 1.3 relied
on this assumption).
Grover search (Amplitude amplification) [11, 89]. Let f : [n] → {0, 1} be a function,
where f (i) for each i ∈ [n] can be evaluated in time T . There is a quantum algorithm
that, with high probability, finds an x ∈ f^{−1}(1) or reports that f^{−1}(1) is empty, in
Õ(√n · T) time. Moreover, if it is guaranteed that either |f^{−1}(1)| ≥ M or |f^{−1}(1)| = 0,
then the algorithm runs in Õ(√(n/M) · T) time.
Quantum minimum finding [15]. Let x_1, . . . , x_n be n items with a total order, where
each pair of x_i and x_j can be compared in time T. There is a quantum algorithm that,
with high probability, finds the minimum item among x_1, . . . , x_n in Õ(√n · T) time.
Remark 2.4 If the algorithm for evaluating f (i) (or for comparing xi , x j ) has some
small probability of outputting the wrong answer, we can first boost it to high success
probability, and then the Grover search (or Quantum minimum finding) still works,
since quantum computational errors only accumulate linearly. It is possible to improve
the log-factors in the query complexity of quantum search when the input has errors
[90], but in this paper we do not seek to optimize the log-factors.
Lemma 2.5 (Computing LCP) Given two strings s, t of lengths |s|, |t| ≤ n, there is
a quantum algorithm that computes lcp(s, t) and decides whether s ⪯ t, in Õ(√n)
time.
Proof Note that we can use Grover search to decide whether two strings are identical
in Õ(√n) time. Then we can compute lcp(s, t) by a simple binary search over the
length of the prefix. After that we can easily compare their lexicographical order by
comparing the next position.
Given a string s and positions g and h such that s[g] = s[h], we often use Lemma
2.5 to “extend” these common characters to larger identical strings to some bound d
while keeping them equivalent (i.e. find the largest positive integer j ≤ d such that
s[g . . g+ j) = s[h . . h + j)). We will often refer to this process (somewhat informally)
as “extending strings via Grover search.”
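The structure of this extension step can be written out classically as follows (Python sketch, illustration only): the predicate "the length-j blocks starting at g and h agree" is monotone in j, so binary search applies; in Lemma 2.5 the plain equality test below is replaced by a Grover search, giving the Õ(√n) bound.

def extend_match(s: str, g: int, h: int, d: int) -> int:
    # Largest j <= d with s[g:g+j] == s[h:h+j] (0-indexed), found by binary search.
    d = min(d, len(s) - max(g, h))          # stay inside the string
    lo, hi = 0, d
    while lo < hi:
        mid = (lo + hi + 1) // 2
        # Monotone predicate: the length-mid blocks agree.
        # Classically this costs O(mid); quantumly it is a Grover search.
        if s[g:g + mid] == s[h:h + mid]:
            lo = mid
        else:
            hi = mid - 1
    return lo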
As a final useful subroutine, we appeal to the result of Ramesh and Vinay [9], who
combined Grover search with the deterministic sampling technique of Vishkin [10],
and obtained a quantum algorithm for Exact String Matching.
Theorem 2.6 (Quantum Exact String Matching [9]) We can solve the Exact String
Matching problem with a quantum algorithm on input strings s, t of length at most n
using Õ(√n) query complexity and time complexity.
We use the quantum walk framework developed by Magniez, Nayak, Roland, and
Santha [33], and apply it on Johnson graphs.
The Johnson graph J(m, r) has \binom{m}{r} vertices, each being a subset of [m] with size
r, where two vertices in the graph A, B ∈ \binom{[m]}{r} are connected by an edge if and
only if |A ∩ B| = r − 1, or equivalently there exist a ∈ A, b ∈ [m] \ A such that
B = (A \ {a}) ∪ {b}. Depending on the application, we usually identify a special subset
of the vertices V_marked ⊆ \binom{[m]}{r} as being marked. The quantum walk is analogous to a
random walk on the Johnson graph attempting to find a marked vertex, but provides
quantum speed-up compared to the classical random walk. The vertices in the Johnson
graph are also called the states of the walk.
In the quantum walk algorithm, each vertex K ∈ \binom{[m]}{r} is associated with a data
structure D(K). The setup cost S is the cost to set up the data structure D(K) for any
K ∈ \binom{[m]}{r}, where the cost could be measured in query complexity or time complexity.
The checking cost C is the cost to check whether K is a marked vertex, given the
data structure D(K ). The update cost U is the cost of updating the data structure
from D(K) to D(K′), where K′ = (K \ {a}) ∪ {b} is an adjacent vertex specified
by a ∈ K , b ∈ [m] \ K . The MNRS quantum walk algorithm can be summarized as
follows.
Theorem 2.7 (MNRS Quantum Walk [33]) Suppose |V_marked| / \binom{m}{r} ≥ ε whenever
V_marked is non-empty. Then there is a quantum algorithm that with high probability
determines if V_marked is empty or finds a marked vertex, with cost of order S + (1/√ε) · (√r · U + C).
Readers unfamiliar with the quantum walk approach are referred to [91, Section
8.3.2] for a quick application of this theorem to solve the Element Distinctness problem
using O(n^{2/3}) quantum queries. This algorithm can be implemented in Õ(n^{2/3}) time
by carefully designing the data structures to support time-efficient insertion, deletion,
and searching [13, Section 6.2]. We elaborate on the issue of time efficiency when we
apply quantum walks in our algorithm in Sect. 3.2.
As mentioned in Sect. 1.2.1, our algorithm for LCS is based on the anchoring technique
which previously appeared in classical LCS algorithms. Here, we will implement this
technique using the MNRS quantum walk framework (Sect. 2.5).
Notations and input assumptions To simplify the presentation, we concatenate the two
input strings s, t into S := s$t, where $ is a delimiter symbol that does not appear in
the input strings, and let n = |S| = |s| + 1 + |t|. So s[i] = S[i] for all i ∈ [1 . . |s|],
and t[ j] = S[|s| + 1 + j] for all j ∈ [1 . . |t|].
We will only solve the decision version of LCS: given a length threshold d, deter-
mine whether s and t have a common substring of length d. The algorithm for
computing the length of the longest common substring then follows from a binary
search over the threshold d. We assume d ≥ 100 to avoid corner cases in later anal-
ysis; for smaller d, the problem can be solved in Õ(n^{2/3} d^{1/2}) = Õ(n^{2/3}) time by
reducing to the (bipartite version of) element distinctness problem [12, Section 3.1.1]
and applying Ambainis’ algorithm [13] (see Sect. 1.2.1).
Anchoring We begin by introducing the notion of good anchor sets.
Definition 3.2 (Good anchor sets) For input strings s, t and threshold length d, we call
C ⊆ [1 . . n] a good anchor set if the following holds: if the longest common substring
of s and t has length at least d, then there exist positions i ∈ [1 . . |s| − d + 1], j ∈
[1 . . |t| − d + 1] and a shift h ∈ [0 . . d), such that s[i . . i + d) = t[ j . . j + d), and
i + h, |s| + 1 + j + h ∈ C.
In this definition, the anchor set C is allowed to depend on s and t. If C =
{C(1), C(2), . . . , C(m)} and there is a (quantum) algorithm that, given any index
1 ≤ j ≤ m, computes the element C( j) in T (n, d) time, then we say C is T (n, d)-
(quantum)-time constructible. The elements C(1), C(2), . . . , C(m) are allowed to
contain duplicates (i.e., C could be a multiset), and are not necessarily sorted in any
particular order.
The set [1 . . n] is trivially a good anchor set, but there are constructions of much
smaller size. As a concrete example, one can directly construct good anchor sets using
difference covers.
Definition 3.3 (Difference cover [29, 30]) A set D ⊆ N+ is called a d-cover, if for
every i, j ∈ N+ , there exists an integer h(i, j) ∈ [0 . . d) such that i + h(i, j), j +
h(i, j) ∈ D.
The following construction of d-cover has optimal size (up to a constant factor).
Lemma 3.4 (Construction of d-cover [29, 30]) For every positive integer d ≥ 1, there
is a d-cover D such that D ∩ [n] contains O(n/√d) elements. Moreover, given integer
i ≥ 1, one can compute the i-th smallest element of D ∩ [n] in Õ(1) time.
Here we omit the proof of Lemma 3.4, as a more general version (Lemma 3.19) will be
proved later in Sect. 3.3.1. Using difference covers, we immediately have the following
simple construction of good anchor sets.
are substrings (or reversed substrings) of S obtained by extending from the anchor C(k)
to the right or reversely to the left. The length of P(k) is at most3 d, and the length of
Q(k) is at most d − 1. We say the string pair (P(k), Q(k)) is red if C(k) ∈ [1 . . |s|],
or blue if C(k) ∈ [|s| + 1 . . n]. We also say k ∈ [m] is a red index or a blue index,
depending on the color of the string pair (P(k), Q(k)). Then, from the definition of
good anchor sets, we immediately have the following simple observation.
Proposition 3.7 (Witness Pair) The longest common substring of s and t has length at
least d, if and only if there exist a red string pair (P(k), Q(k)) and a blue string pair
(P(k′), Q(k′)) where k, k′ ∈ [m], such that lcp(P(k), P(k′)) + lcp(Q(k), Q(k′)) ≥ d.
In such case, (k, k′) is called a witness pair.
Proof Suppose s and t have LCS of length at least d. Then the property of the good
anchor set C implies the existence of a shift h ∈ [0 . . d) and a length-d common
substring s[i . . i + d) = t[ j . . j + d) such that i + h = C(k), |s| + 1 + j + h =
C(k′) for some k, k′ ∈ [m]. Then, we must have lcp(P(k), P(k′)) ≥ d − h and
lcp(Q(k), Q(k′)) ≥ h, implying that (k, k′) is a witness pair.
Conversely, the existence of a witness pair immediately implies a common substring
of length at least d.
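The following brute-force sketch (Python, illustration only) makes the witness-pair condition concrete. It assumes the natural reading of the definitions above: P(k) is the (at most d)-length substring of S = s$t extending to the right from position C(k), and Q(k) is the reverse of the (at most (d − 1))-length substring ending just before C(k); positions are 1-indexed as in the text.

def has_witness_pair(s: str, t: str, anchors: list[int], d: int) -> bool:
    # Brute-force version of Proposition 3.7 (classical illustration only).
    S = s + "$" + t

    def P(c): return S[c - 1:c - 1 + d]              # forward extension from anchor c
    def Q(c): return S[max(0, c - d):c - 1][::-1]    # backward extension, reversed

    def lcp(a, b):
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        return n

    red = [c for c in anchors if c <= len(s)]        # anchor lies inside s
    blue = [c for c in anchors if c >= len(s) + 1]   # anchor lies inside $t
    return any(lcp(P(c1), P(c2)) + lcp(Q(c1), Q(c2)) >= d
               for c1 in red for c2 in blue)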
Remark 3.8 The algorithm we are going to describe can be easily adapted to the
Longest Repeated Substring problem: we only have one input string S[1 . . n], and
we drop the red-blue constraint in the definition of witness pairs in Proposition 3.7.
3 Recall that we use the convention S[x . . y) := S[max{1, x} . . min{y + d, n + 1}) for a length-n string S.
Now we shall describe our quantum walk algorithm that solves the decision version
of LCS by searching for a witness pair.
Definition of the Johnson graph Recall that C = {C(1), . . . , C(m)} is a good anchor
set of size |C| = m. We perform a quantum walk on the Johnson graph with vertex set
\binom{[m]}{r}, where r is a parameter to be determined later. A vertex K = {k_1, k_2, . . . , k_r} ⊆
[m] in the Johnson graph is called a marked vertex, if and only if {k_1, k_2, . . . , k_r}
contains a witness pair (Proposition 3.7). If s and t have a common substring of length
d, then at least \binom{m−2}{r−2} / \binom{m}{r} = Ω(r²/m²) fraction of the vertices are marked. Otherwise,
observation, due to the fact that our associated information already uniquely deter-
mines the LCP value of every pair.
Proposition 3.9 (Query complexity of checking step is zero) Using the associated
data defined above, we can determine whether {k1 , . . . , kr } ⊆ [m] is a marked state,
without making any additional queries to the input.
Now, we consider the cost of maintaining the associated data when the subset
{k1 , . . . , kr } undergoes insertion and deletion during the quantum walk algorithm.
Proposition 3.10 (Update cost) Assume the anchor set C is T-time constructible.
Then, each update step of the quantum walk algorithm has query complexity U =
Õ(√d + T).
Proof Let us consider how to update the associated data when a new index k is being
inserted into the subset {k1 , . . . , kr }. The deletion process is simply the reverse oper-
ation of insertion.
The insertion procedure can be summarized by the pseudocode in Algorithm 1. First,
we compute and store C(k) in time T. Then we use a binary search to find the correct
place to insert k into the lexicographical orderings (k_1^P, . . . , k_r^P) (and (k_1^Q, . . . , k_r^Q)).
Since the involved substrings have length d, each lexicographical comparison required
by this binary search can be implemented in Õ(√d) time by Lemma 2.5. After inserting
k into the list, we update the LCP array by computing its LCP values h_pre, h_suc with
two neighboring substrings, and removing (by “uncomputing”) the LCP value h_old
between their neighbors which were adjacent at first, in Õ(√d) time (Lemma 2.5).
Proposition 3.11 (Setup cost) The setup step of the quantum walk has query complexity
S = Õ(r · (√d + T)).
Proof We can set up the initial state for the quantum walk by simply performing r
insertions successively using Proposition 3.10.
Finally, by Theorem 2.7, the query complexity of our quantum walk algorithm
(omitting poly log(n) factors) is
S + √(m²/r²) · (C + √r · U)    (1)
= r · (√d + T) + (m/r) · (0 + √r · (√d + T))
= m^{2/3} · (√d + T),
In this section, we will show how to implement the Õ(n^{2/3})-query quantum walk
algorithm from Sect. 3.1 in time complexity Õ(n^{2/3}).
3.2.1 Overview
Recall that our algorithm described in Sect. 3.1 for input strings s, t and threshold
length d performs a quantum walk on the Johnson graph \binom{[m]}{r}. In this section, we have
to measure the quantum walk costs S, C, U in terms of the time complexity instead of
query complexity. Inspecting Equation 1, we observe that the quantum walk algorithm
can achieve Õ(n^{2/3}) time complexity, as long as we can implement the setup, checking,
and update steps with time complexities S = Õ(r(√d + T)), C = Õ(√(rd)), and
U = Õ(√d + T).
As mentioned in Sect. 3.1, there are two parts in the described quantum walk
algorithm that are time-consuming:
• Maintaining the arrays of associated data under insertions and deletions (Remark
3.12).
• Solving the Two String Families LCP problem in the checking step.
Now we give an overview of how we address these two problems.
Dynamic arrays under insertions and deletions A natural solution to speed up the
insertions and deletions is to maintain the arrays of associated data using appropriate data structures,
which support the required operations in Õ(1) time. This “quantum walk plus data
structures” framework was first used in Ambainis’ element distinctness algorithm
[13], and has been applied to many time-efficient quantum walk algorithms (see the
discussion in Sect. 1.3). However, as noticed by Ambainis [13, Section 6.2], such
data structures have to satisfy the following requirements in order to be applicable in
quantum walk algorithms.
1. The data structure needs to be history-independent, that is, the representation of
the data structure in memory should only depend on the set of elements stored (and
the random coins used) by the data structure, not on the sequence of operations
leading to this set of elements.
2. The data structure should guarantee worst-case time complexity (with high prob-
ability over the random coins) per operation.
The first requirement guarantees that each vertex of the Johnson graph corresponds
to a unique quantum state, which is necessary since having multiple possible states
would destroy the interference during the quantum walk algorithm. This requirement
rules out many types of self-balancing binary search trees5 such as AVL Tree and
Red-Black Tree.
The second requirement rules out data structures with amortized or expected run-
ning time, which may take a very long time on some of the operations. The reason is that,
during the quantum algorithm, each operation is actually applied to a superposition
of many instances of the data structure, so we would like the time complexity of an
operation to have a fixed upper bound that is independent of the particular instance
being operated on.
Ambainis [13] designed a data structure satisfying both requirements based on hash
tables and skip lists, which maintains a sorted list of items, and supports insertions,
deletions, and searching in Õ(1) time with high probability. Buhrman, Loff, Patro, and
Speelman [55] modified this data structure to also support indexing queries, which ask
for the k th item in the current list (see Lemma 3.14 below). Using this data structure
to maintain the arrays in our quantum walk algorithm, we can implement the update
steps and the setup steps time-efficiently.
Dynamic Two String Families LCP The checking step of our quantum walk algorithm
(Proposition 3.9) requires solving a Two String Families LCP instance with r string
pairs of lengths bounded by d. We will not try to solve this problem from scratch for
each instance, since it is not clear how to solve it significantly faster than the Õ(r )-
time classical algorithm [23, Lemma 3] even using quantum algorithms. Instead, we
dynamically maintain the solution using some data structure, which efficiently handles
each update step during the quantum walk where we insert one string pair (P, Q) into
(and remove one from) the current Two String Families LCP instance. As mentioned
in Sect. 1.2.1, the classical data structure for this task given by Charalampopoulos,
Gawrychowski, and Pokorski [27] is not applicable here, since it violates both require-
ments mentioned above: it maintains a heavy-light decomposition of the compact tries
of the input strings, and rebuilds them from time to time to ensure amortized poly log(n)
Proof (Sketch) The construction is similar to [13, Section 6.2]. The hash table has
r buckets, each with the capacity for storing O(log m) many key-value pairs. A pair
(key, value) is stored in the (h(key))th bucket, and the pairs inside each bucket are
sorted in increasing order of keys. If some buckets overflow, we can collect all the
leftover pairs into a separate buffer of size r and store them in sorted order. This
ensures that any set of r key-value pairs has a unique representation in the memory.
And, each basic operation can be implemented in poly log(m) time, unless there is an
overflow. Using an O(log m)-wise independent hash function h : [m] → [r ], for any
possible r -subset of keys, with high probability none of the buckets overflow.6
Dynamic arrays We will need a dynamic array that supports indexing, insertion, dele-
tion, and some other operations.
The skip list [93] is a probabilistic data structure which is usually used as an alter-
native to balanced trees, and satisfies the history-independence property. Ambainis’
quantum Element Distinctness algorithm [13] used the skip list to maintain a sorted
array, supporting insertions, deletions, and binary search. In order to apply the skip
list in the quantum walk, a crucial adaptation in Ambainis’ construction is to show
that the random choices made by the skip list can be simulated using O(log n)-wise
independent functions [13, Section 6.2], which only take poly log(n) random coins to
sample. In the recent quantum fine-grained reduction result by Buhrman, Loff, Patro,
and Speelman [55, Section 3.2], they used a more powerful version of skip lists that
supports efficient indexing. We will use this version of skip lists with some slight
extension.
Proof (Sketch) We will use (a slightly modified version of) the data structure described
in [55, Section 3.2], which extends the construction of [13, Section 6.2] to support
insertion, deletion, and indexing. Their construction is a (bidirectional) skip list of
the items, where a pointer (a “skip”) from an item (key, value) to another item
6 We remark that Ambainis only used a fixed hash function h(i) = r · i/m, which ensures the buckets
do not overflow with high probability over a random r -subset K ⊆ [m] of keys. Ambainis showed that this
property is already sufficient for the correctness of the quantum walk algorithm. Here we choose to state a
different version that achieves high success probability for every fixed r -subset of keys, merely for keeping
consistency with later presentation.
(key′, value′) is stored in a hash table as a key-value pair (key, key′). To support
efficient indexing, for each pointer they also store the distance of this skip, which is
used during an indexing query to keep track of the current position after following the
pointers (similar ideas were also used in, e.g., [94, Section 3.4]). After every insertion
or deletion, the affected distance values are updated recursively, by decomposing a
level-i skip into O(log n) many level-(i − 1) skips.
A difference between their setting and ours is that they always keep the array sorted
in increasing order of value’s, and the position of an inserted item is decided by its
relative order among the values in the array, instead of by a given position 1 ≤ i ≤ r +1.
Nevertheless, it is straightforward to adapt their construction to our setting, by using
the distance values of the skips to keep track of the current position, instead of by
comparing the values of items.
Note that using the distance values we can also efficiently implement the Location
operation in a reversed way compared to Indexing, by following the pointers backwards
and moving up levels.
To implement the range-minimum query operations, we maintain the range-
minimum value of each skip in the skip list, in a similar way to maintaining the
distance values of the skips. They can also be updated recursively after each update.
Then, to answer a query, we can travel from the a-th item to the b-th by following the
pointers (this is slightly trickier if a = 1, where we may first move up levels and then
move down).
We also need a 2D range sum data structure for points with integer coordinates.
Lemma 3.15 (2D range sum) Let integer N ≤ n^{O(1)}. There is a history-
independent data structure of size Õ(r ) that maintains a multi-set of at most r points
{(x1 , y1 ), . . . , (xr , yr )} with integer coordinates xi ∈ [N ], yi ∈ [N ], and supports the
following operations with worst-case Õ(1) time complexity and high success proba-
bility:
• Insertion Add a new point (x, y) into the multiset (duplicates are allowed).
• Deletion Delete the point (x, y) from the multiset (if it appears more than once,
only delete one copy of them).
• Range sum Given 1 ≤ x1 ≤ x2 ≤ N , 1 ≤ y1 ≤ y2 ≤ N , return the number of
points (x, y) in the multiset that are in the rectangle [x1 . . x2 ] × [y1 . . y2 ].
C_1 = {[1 . . N]},
C_2 = {[1 . . N/2], [N/2 + 1 . . N]},
C_3 = {[1 . . N/4], [N/4 + 1 . . 2N/4], [2N/4 + 1 . . 3N/4], [3N/4 + 1 . . N]},
...
C_{log N} = {[1 . . 1], [2 . . 2], . . . , [N . . N]}.
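A minimal classical sketch of this approach (Python, illustration only): each point is charged to every pair of canonical intervals containing it (one from some C_j on each axis, O(log² N) pairs in total), and a rectangle query is answered by decomposing each side into O(log N) canonical intervals. In the actual data structure the counters would be stored in the history-independent hash table of Lemma 3.13 rather than a Python dictionary.

from collections import defaultdict

class RangeSum2D:
    # 2D range sum over integer coordinates in [1..N] via dyadic decomposition.
    def __init__(self, N: int):
        self.N = N
        self.cnt = defaultdict(int)          # (x-interval, y-interval) -> count

    def _canonical(self, v: int):
        # All canonical intervals (lo, hi) of the levels C_1, C_2, ... containing v.
        lo, hi = 1, self.N
        while True:
            yield (lo, hi)
            if lo == hi:
                return
            mid = (lo + hi) // 2
            lo, hi = (lo, mid) if v <= mid else (mid + 1, hi)

    def _cover(self, a: int, b: int):
        # Decompose [a..b] into O(log N) disjoint canonical intervals.
        def rec(lo, hi):
            if a <= lo and hi <= b:
                yield (lo, hi)
            elif lo > b or hi < a:
                return
            else:
                mid = (lo + hi) // 2
                yield from rec(lo, mid)
                yield from rec(mid + 1, hi)
        yield from rec(1, self.N)

    def insert(self, x: int, y: int, delta: int = 1):   # use delta = -1 to delete one copy
        for ix in self._canonical(x):
            for iy in self._canonical(y):
                self.cnt[(ix, iy)] += delta
                if self.cnt[(ix, iy)] == 0:
                    del self.cnt[(ix, iy)]

    def range_sum(self, x1: int, x2: int, y1: int, y2: int) -> int:
        return sum(self.cnt.get((ix, iy), 0)
                   for ix in self._cover(x1, x2) for iy in self._cover(y1, y2))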
Now we will use the data structures described in Sect. 3.2.2 to implement our quantum
walk algorithm from Sect. 3.1 time-efficiently.
Recall that C is the T -quantum-time-constructible good anchor set of size |C| =
m (Definition 3.2). The states of our quantum walk algorithms are r -subsets K =
{k1 , k2 , . . . , kr } ⊆ [m], where each index k ∈ K is associated with an anchor C(k) ∈
[n], which specifies the color (red or blue) of k and the pair (P(k), Q(k)) of strings
of lengths at most d. We need to maintain the lexicographical orderings (k_1^P, . . . , k_r^P)
and LCP arrays (h_1^P, . . . , h_{r−1}^P), so that P(k_1^P) ⪯ P(k_2^P) ⪯ · · · ⪯ P(k_r^P) and h_i^P =
lcp(P(k_i^P), P(k_{i+1}^P)), and similarly maintain (k_1^Q, . . . , k_r^Q), (h_1^Q, . . . , h_{r−1}^Q) for the
strings {Q(k)}_{k∈K}.
For k ∈ K, we use pos_P(k) to denote the position i such that k_i^P = k, i.e., the lexi-
cographical rank of P(k) among all P(k_1), . . . , P(k_r). Similarly, let pos_Q(k) denote
the position i such that k_i^Q = k.
We can immediately see that all the steps in the update step (Algorithm 1) of
our quantum walk can be implemented time-efficiently. In particular, we use a hash
table (Lemma 3.13) to store the anchor C(k) corresponding to each k ∈ K , and
use Lemma 3.14 to maintain the lexicographical orderings and LCP arrays under
insertions and deletions. Each update operation on these data structures takes Õ(1)
time. Additionally, these data structures allow us to efficiently compute some useful
information, as summarized below.
Proof For 1, rather than use T time to compute C(k) (Definition 3.2), we instead look
up the value of C(k) from the hash table. Then, C(k) ∈ [n] determines the color of k
and the string lengths.
For 2, we use the location operation of the dynamic array data structure (Lemma
3.14).
For 3, we first compute i = pos_P(k), i′ = pos_P(k′), and assume i < i′ with-
out loss of generality. Then, by Lemma 2.1, we can compute lcp(P(k), P(k′)) =
lcp(P(k_i^P), P(k_{i′}^P)) = min{h_i^P, h_{i+1}^P, . . . , h_{i′−1}^P} using a range-minimum query
(Lemma 3.14).
Algorithm 2: Solving the Two String Families LCP problem in the checking step
1 Grover-Search over red indices k_red ∈ K, and integers d′ ∈ [0 . . d]
2   Find ℓ_P, r_P such that lcp(P(k_i^P), P(k_red)) ≥ d′ if and only if ℓ_P ≤ i ≤ r_P.
3   Find ℓ_Q, r_Q such that lcp(Q(k_i^Q), Q(k_red)) ≥ d − d′ if and only if ℓ_Q ≤ i ≤ r_Q.
4   if there exists a blue index k_blue ∈ K such that pos_P(k_blue) ∈ [ℓ_P . . r_P] and pos_Q(k_blue) ∈ [ℓ_Q . . r_Q]
    then return True
5 return False
In Algorithm 2, we use Grover search to find a red index k_red ∈ K and
an integer d′ ∈ [0 . . d], such that there exists a blue index k_blue ∈ K with
lcp(P(k_red), P(k_blue)) ≥ d′ and lcp(Q(k_red), Q(k_blue)) ≥ d − d′. The number of
Grover iterations is Õ(√(|K| · d)) = Õ(√(rd)), and we will implement each iteration in
poly log(n) time. By Lemma 2.1, all the strings P(k) that satisfy lcp(P(k), P(k_red)) ≥
d′ form a contiguous segment in the lexicographical ordering P(k_1^P) ⪯ · · · ⪯ P(k_r^P).
In Line 2, we find the left and right boundaries ℓ_P, r_P of this segment, using a binary
search with Proposition 3.16 (3). Line 3 is similar to Line 2. Then, Line 4 checks the
existence of such a blue string pair. It is clear that this procedure correctly solves the
Two String Families LCP problem. The only remaining problem is how to implement
Line 4 efficiently.
Note that Line 4 can be viewed as a 2D orthogonal range query, where each 2D
point is a blue string pair (P(k), Q(k)), with coordinates being strings to be compared
in lexicographical order. We cannot simply replace the coordinates by their ranks
pos P (k) and pos Q (k) among the r substrings in the current state, since their ranks
will change over time. It is also unrealistic to replace the coordinates by their ranks
among all the possible substrings {P(k)}k∈[m] , since m could be much larger than the
desired overall time complexity n 2/3 . These issues seem to require our 2D range query
data structure to be comparison-based, which is also difficult to achieve as mentioned
before.
Instead, we will use a sampling technique, which effectively converts the non-
integer coordinates into integer coordinates. At the very beginning of the algo-
rithm (before running the quantum walk), we uniformly sample r distinct indices
x_1, x_2, . . . , x_r ∈ [m], and sort them so that P(x_1) ⪯ P(x_2) ⪯ · · · ⪯ P(x_r) (breaking
ties by the indices), in Õ(r(√d + T)) total time (this complexity is absorbed by the
time complexity of the setup step S = Õ(r(√d + T))). Then, during the quantum
walk algorithm, when we insert an index k ∈ [m] into K, we assign it an integer
label ρ_P(k) defined as the unique i ∈ [0 . . r] satisfying P(x_i) ⪯ P(k) ≺ P(x_{i+1}),
which can be computed in Õ(√d) time by a binary search on the sorted sequence
P(x_1) ⪯ · · · ⪯ P(x_r). We also sample y_1, . . . , y_r ∈ [m] and sort them so that
Q(y_1) ⪯ Q(y_2) ⪯ · · · ⪯ Q(y_r), and similarly define the integer labels ρ_Q(k). Intu-
itively, the (scaled) label ρ_P(k) · (m/r) estimates the rank of P(k) among all the strings
{P(k′)}_{k′∈[m]}.
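Classically, assigning the label is just a rank query against the fixed sorted sample (Python sketch, illustration only); quantumly, each comparison inside the binary search is the Õ(√d)-time substring comparison of Lemma 2.5.

import bisect

def make_labeler(sample_substrings: list[str]):
    # Given the sample P(x_1), ..., P(x_r), return a function mapping a substring
    # to its integer label rho in [0..r].
    sample = sorted(sample_substrings)
    def rho(p: str) -> int:
        # The unique i with sample[i-1] <= p < sample[i] (with sentinels at the ends).
        return bisect.bisect_right(sample, p)
    return rho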
The following lemma formalizes this intuition. It states that in a typical r -subset
K = {k1 , k2 , . . . , kr } ⊆ [m], not too many indices can receive the same label.
Lemma 3.17 For any c > 1, there is a c′ > 1, such that the following statement holds:
For positive integers r ≤ m, let A, B ⊆ [m] be two independently uniformly random
r-subsets. Let A = {a_1, a_2, . . . , a_r} where a_1 < a_2 < · · · < a_r, and denote
A_i := {x ∈ [m] : a_i ≤ x < a_{i+1}} for 0 ≤ i ≤ r (with the convention a_0 := 0 and a_{r+1} := m + 1). Then,

Pr_{A,B}[ |A_i ∩ B| ≥ c′ log m for some 0 ≤ i ≤ r ] ≤ 1/m^c.
Proof Let k = c′ log m for some c′ > 1 to be determined later, and we can assume
k ≤ r. Observe that, |A_i ∩ B| ≥ k holds for some i only if there exist b, b′ ∈ [m],
such that |[b . . b′] ∩ B| ≥ k and [b + 1 . . b′] ∩ A = ∅.
Let b, b′ ∈ [m], b ≤ b′. For b′ − b ≥ (c + 2)(m ln m)/r, we have

Pr_A[ [b + 1 . . b′] ∩ A = ∅ ] = \binom{m − (b′ − b)}{r} / \binom{m}{r} ≤ (1 − (b′ − b)/m)^r ≤ 1/m^{c+2}.
Algorithm 3: Extra steps in the insertion procedure (in addition to the steps in Algorithm 1)
1 Given an index k ∈ [m]
2 Compute the integer labels ρ_P(k) and ρ_Q(k) using binary search, and store them in the hash table
3 if k is blue then
4   Insert the 2D point (ρ_P(k), ρ_Q(k)) into the 2D range sum data structure
this early stopping would also introduce (one-sided) error if there are too many bound-
ary points which we have no time to check. However, a straightforward application
of Lemma 3.17 implies that, with high success probability over the initial samples
P(x_1) ⪯ P(x_2) ⪯ · · · ⪯ P(x_r) and Q(y_1) ⪯ Q(y_2) ⪯ · · · ⪯ Q(y_r), only a 1/poly(m)
fraction of the r-subsets K = {k_1, . . . , k_r} ∈ \binom{[m]}{r} in the Johnson graph can have
more than c′ log m strings receiving the same label. On these problematic states
K = {k_1, . . . , k_r} ∈ \binom{[m]}{r}, the checking procedure may erroneously recognize K
as unmarked, while other states are handled correctly by Algorithm 4 since there is no
early aborting. This decreases the fraction of marked states in the Johnson graph by
only a 1/poly(m) fraction, which does not affect the overall time complexity of our
quantum walk algorithm.
In this section, we will prove Lemma 3.6 by constructing a good anchor set with
smaller size. Our construction of good anchor sets is based on a careful combination
of a generalized version of difference covers [29, 30] and the string synchronizing sets
[32].
The notion of d-cover (Definition 3.3) used in previous algorithms corresponds to the
τ = 1 case of our new definition. Our generalization to larger τ can be viewed as an
approximate version of difference covers with additive error ≤ τ − 1. As we shall see,
allowing additive error makes the size of the (d, τ )-cover much smaller compared to
Definition 3.3.
We present a construction of approximate difference covers, by adapting previous
constructions from τ = 1 to general values of τ .
I := {zM · τ | z ∈ N+},
and
J := {(xM² − y) · τ | x ∈ N+, y ∈ [M]}.
h_2(i, j) ≤ (j + M² − 1) · τ − j ≤ M²τ − 1 ≤ d − 1,
h_2(i, j) ≥ j · τ − j ≥ 0,
i + h_1(i, j) = (z − j + i′) · τ ≡ 0 (mod Mτ)
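To make the construction concrete, the sketch below (Python) builds D = I ∪ J and brute-force checks the covering property. The parameter choice M = ⌊√(d/τ)⌋ (so that M²τ ≤ d) and the precise form of the covering condition (shifts h_1, h_2 ∈ [0 . . d) with |h_1 − h_2| ≤ τ − 1) are our reading of the surrounding text, so treat them as assumptions.

from math import isqrt

def d_tau_cover(d: int, tau: int, limit: int) -> set[int]:
    # Elements of D = I ∪ J that are at most `limit` (positions are 1-based); assumes tau <= d.
    M = isqrt(d // tau)                           # assumed choice; gives M*M*tau <= d
    I = {z * M * tau for z in range(1, limit // (M * tau) + 2)}
    J = {(x * M * M - y) * tau
         for x in range(1, limit // (M * M * tau) + 2) for y in range(1, M + 1)}
    return {v for v in I | J if 1 <= v <= limit}

def check_cover(d: int, tau: int, positions: int) -> bool:
    # Brute-force check: for all i, j there are shifts h1, h2 in [0, d) with
    # |h1 - h2| <= tau - 1 such that i + h1 and j + h2 both lie in D.
    D = d_tau_cover(d, tau, positions + d)
    for i in range(1, positions + 1):
        for j in range(1, positions + 1):
            if not any(i + h1 in D and j + h2 in D
                       for h1 in range(d)
                       for h2 in range(max(0, h1 - tau + 1), min(d, h1 + tau))):
                return False
    return True

# e.g. check_cover(18, 2, 40) should return True (here M = 3)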
In Corollary 3.5 we obtained a good anchor set using a (d, 1)-cover. If we simply
replace it by a (d, τ )-cover with larger τ , the size of the obtained anchor set would
become smaller, but it would no longer be a good anchor set, due to the misalignment
introduced by approximate difference covers. To deal with the misalignment, we will
use the string synchronizing sets recently introduced by Kempa and Kociumaka [32].
Informally, a synchronizing set of string S is a small set of synchronizing positions,
such that every two sufficiently long matching substrings of S with no short periods
should contain a pair of consistent synchronizing positions.
Definition 3.20 (String synchronizing sets [32, Definition 3.1]) For a string S[1 . . n]
and a positive integer 1 ≤ τ ≤ n/2, we say A ⊆ [1 . . n − 2τ + 1] is a τ -synchronizing
set of S if it satisfies the following properties:
• Consistency If S[i . . i + 2τ ) = S[ j . . j + 2τ ), then i ∈ A if and only if j ∈ A.
• Density For i ∈ [1 . . n − 3τ + 2], A ∩ [i . . i + τ ) = ∅ if and only if per(S[i . . i +
3τ − 2]) ≤ τ/3.
Kempa and Kociumaka gave a linear-time classical randomized algorithm (as well
as a derandomized version, which we will not use here) to construct a τ -synchronizing
set A of optimal size7 |A| = O(n/τ ). However, this classical algorithm for construct-
ing A has to query each of the n input characters, and is not directly applicable to our
sublinear quantum algorithm.
To apply Kempa and Kociumaka’s construction algorithm to the quantum setting,
we observe that this algorithm is local, in the sense that whether an index i should be
included in A is completely decided by its short context S[i . . i + 2τ ) and the random
coins. Moreover, by suitable adaptation of their construction, one can compute all the
synchronizing positions in an O(τ )-length interval in Õ(τ ) time. We summarize all
the desired properties of the synchronizing set in the following lemma.
Lemma 3.21 (Adaptation of [32]) For a string S[1 . . n] and a positive integer 1 ≤
τ ≤ n/2, given a sequence r of O(log n) many random coins, there exists a set A with
the following properties:
• Correctness With high probability over r , A is a τ -synchronizing set of S.
• Locality For every i ∈ [1 . . n − 2τ + 1], whether i ∈ A or not is completely
determined by the random coins r and the substring S[i . . i + 2τ ).
Moreover, given S[i . . i + 4τ ) and r, one can compute all the elements in A ∩
[i . . i + 2τ ) by a classical algorithm in Õ(τ) time.
• Sparsity For every i ∈ [1 . . n − 2τ + 1], E_r[|A ∩ [i . . i + 2τ )|] ≤ 80.
In the following, we first inspect the (slightly adapted) randomized construction of
string synchronizing sets by Kempa and Kociumaka [32], and then show that it satisfies
the properties in Lemma 3.21.
Construction of string synchronizing sets Fix an input string S[1 . . n] and a positive
integer τ ≤ n/2. Define sets
Q = {i ∈ [1 . . n − τ + 1] : per(S[i . . i + τ )) ≤ τ/3},
B = {i ∈ [1 . . n − τ + 1] \ Q : per(S[i . . i + τ − 1)) ≤ τ/3 ∨ per(S[i + 1 . . i + τ )) ≤ τ/3},
7 In the case where S has no highly periodic substrings, every τ -length interval should contain at least one
index from A.
define the function id : [1 . . n − τ + 1] → N⁺ by

id(i) := π(S[i . . i + τ )) if i ∈ B, and id(i) := π(S[i . . i + τ )) + N if i ∉ B.
Lemma 3.22 ([32, Lemma 8.2]) The set A is always a τ -synchronizing set of string
S.
Pr_π[π(x) < min{π(x') : x' ∈ X}] ∈ (1/(|X| + 1)) · (1 ± 0.1).
Lemma 3.24 ( [32, Fact 8.9], adapted) The expected size of A satisfies Eπ [|A|] ≤
20n/τ .
Remark 3.25 We remark that in the original construction of [32], π was chosen to be
a uniformly random bijection P → [|P|], and this is the only part that differs from
our modified version. The main issue with this ideal choice is that, in our quantum
algorithm, we do not have enough time to sample and store π, which could have size
Ω(n). Observe that in their proof of Lemma 3.24, the only required property of π is
that π satisfies (perfect) min-wise independence. Hence, here we can relax it to have
approximate min-wise independence, and their proof of Lemma 3.24 still applies (with
a slightly worse constant factor).
Definition 3.26 (Rolling hash) Let p > |Σ| be a prime, and pick y ∈ F_p uniformly
at random. Then, the rolling hash function ρ_{p,y} : Σ^τ → F_p on length-τ strings is
defined as

ρ_{p,y}(s[1 . . τ]) := Σ_{i=1}^{τ} s[i] · y^i (mod p).
We have the following two folklore facts about rolling hashes.
• Rolling hashes of substrings can be easily computed in batch: on any given string
s of length |s| ≥ τ, one can compute the hash values ρ_{p,y}(s[i . . i + τ )) for all
i ∈ [1 . . |s| − τ + 1] in O(|s| · poly log p) total time (a classical sketch of this batch computation is given after this list).
• By choosing p = poly(n), we can ensure that, with high probability over the choice
of y, the rolling hash function ρ_{p,y} takes distinct values over all the strings in P
(by the Schwartz–Zippel lemma).
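As a concrete illustration of the first fact, here is a short Python sketch (function and variable names are ours, not from [32]) that computes ρ_{p,y}(s[i . . i + τ)) for every window via the standard O(1)-per-window rolling update; characters are treated as integers.

```python
def all_window_hashes(s, tau, y, p):
    """Return rho(s[i:i+tau]) = sum_{k=1..tau} s[i+k-1] * y^k mod p for every window,
    using a rolling update (drop the old character, rescale, add the new one)."""
    n = len(s)
    if n < tau:
        return []
    y_inv = pow(y, -1, p)                 # modular inverse of y (requires 1 <= y < p, p prime)
    y_tau = pow(y, tau, p)
    h = sum(c * pow(y, k + 1, p) for k, c in enumerate(s[:tau])) % p
    hashes = [h]
    for i in range(1, n - tau + 1):
        h = ((h - s[i - 1] * y) * y_inv + s[i + tau - 1] * y_tau) % p
        hashes.append(h)
    return hashes

# Example usage with illustrative (not prescribed) parameters:
# all_window_hashes([ord(c) for c in "abracadabra"], tau=3, y=7, p=10**9 + 7)
```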
After hashing the strings in P to a small integer set [poly(n)], we can apply known
constructions of approximate min-wise independent hash families.
Lemma 3.27 (Approximate min-wise independent hash family, e.g., [95]) Given
parameter n ≥ 1, one can choose N ≤ n O(1) , so that there is a hash family
H = {h : [N ] → [N ]} that satisfies the following properties:
• Injectivity For any subset X ⊆ [N ] of size |X | ≤ n, with high probability over the
choice of h ∈ H, h maps X to distinct elements.
• Approximate min-wise independence For any x ∈ [N ] and subset X ⊆ [N ] \ {x},
Pr_{h∈H}[h(x) < min{h(x') : x' ∈ X}] ∈ (1/(|X| + 1)) · (1 ± 0.1).
• Explicitness Each hash function h ∈ H can be specified using O(log n) bits, and
can be evaluated at any point in poly log(n) time.
Finally, we choose parameters p = poly(n), N = poly(n), p ≤ N, and define the
pseudorandom mapping π : P → [N] by π(s) := h(ρ_{p,y}(s)), where ρ_{p,y} : P → F_p
is the rolling hash function (identifying F_p with [p] ⊆ [N]), and h : [N] → [N] is
the approximate min-wise independent hash function.
Now we are ready to prove that the string synchronizing set A determined by the
random mapping π satisfies the properties stated in Lemma 3.21.
Proof (of Lemma 3.21) First note that the random coins r are used to sample y ∈ F p
and h ∈ H, which only take O(log n) bits of seed.
Correctness By Lemma 3.22, A is correct as long as π is an injection, which holds
with high probability by the injectivity properties of ρ_{p,y} and h.
Locality The first part of the statement is already verified in Proposition 3.23. To show
the moreover part, first note that the values of π(S[j . . j + τ )) over all j ∈ [i . . i + 3τ )
can be computed in Õ(τ) time, by the property of rolling hashes and the explicitness of
h. By [32, Lemma 8.8], the sets Q ∩ [i . . i + 3τ ) and B ∩ [i . . i + 3τ ) can be computed
in O(τ ) time. Hence, we can compute id( j) for all j ∈ [i . . i + 3τ ), and then construct
A ∩ [i . . i + 2τ ) by computing the sliding-window minima, in Õ(τ ) overall time.
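The sliding-window minima in the last step can be computed by the standard monotone-queue subroutine; a short Python rendition follows (the precise rule by which A is read off from the id values is as in [32] and is not reproduced here).

```python
from collections import deque

def sliding_window_minima(vals, w):
    """Minimum of every length-w window of vals in O(len(vals)) total time,
    using a deque of indices whose values are kept increasing."""
    out, dq = [], deque()
    for i, v in enumerate(vals):
        while dq and vals[dq[-1]] >= v:   # drop dominated indices from the back
            dq.pop()
        dq.append(i)
        if dq[0] <= i - w:                # drop the index that left the window
            dq.popleft()
        if i >= w - 1:
            out.append(vals[dq[0]])
    return out

# e.g. sliding_window_minima([5, 2, 7, 1, 4, 6], 3) == [2, 1, 1, 1]
```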
Sparsity Let S' = S[i . . i + 4τ ), and construct a τ-synchronizing set A' of S' using
the same random coins r. Then, from the locality property, we clearly have |A'| ≥
|A ∩ [i . . i + 2τ )|. Hence, by Lemma 3.24, E_r[|A ∩ [i . . i + 2τ )|] ≤ E_r[|A'|] ≤
20 · 4τ/τ = 80.
Now we will construct the good anchor set for input strings s, t and threshold length
d. Recall that S = s$t and |S| = n, and we have assumed d ≥ 100 in order to
avoid corner cases. Our plan is to use string synchronizing sets to fix the misalignment
introduced by the approximate difference covers. However, in highly-periodic parts
where synchronizing fails, we have to rely on periodicity arguments and Grover search.
Construction 3.28 (Anchor set C) Let D be a (d/2, τ)-cover for some parameter
τ ≤ d/100 to be determined later, and let D_S := (D ∩ [|s|]) ∪ (|s| + 1 + (D ∩ [|t|])).
Let A be the τ -synchronizing set of S determined by random coins r .
For every i ∈ D S ∩ [n − 3τ + 2], let L i ⊆ [1 . . n] be defined by the following
procedure.
• Step 1 If A ∩ [i . . i + 2τ ) has at most 1000 elements, then add all the elements
from A ∩ [i . . i + 2τ ) into L i . Otherwise, add the smallest 1000 elements from
A ∩ [i . . i + 2τ ) into L i .
• Step 2 If p := per(S[i + τ . . i + 3τ − 2]) ≤ τ/3, then we do the following:
– Define two boundary indices
r := max{r : r ≤ min{n, i + d} ∧ per(S[i + τ . . r]) = p},
ℓ := min{ℓ : ℓ ≥ max{1, i − d} ∧ per(S[ℓ . . i + 3τ − 2]) = p}.
Proof For any given i ∈ D_S ∩ [n − 3τ + 2], L_i contains at most 1003 elements. Hence,
|C| ≤ 1003 · |D_S| ≤ O(n/√(dτ)) by Lemma 3.19.
In Construction 3.28, Step 1 takes Õ(τ ) classical time by the Locality property in
Lemma 3.21. In Step 2, we can find the period p and the Lyndon root P in Õ(τ )
classical time (see Sect. 2.1). Then, finding the right boundary r is equivalent to
searching for the largest r ∈ [i + 3τ − 2 . . min{n, i + d}] such that p is a period
of S[i + τ . . r] (this is because we already know per(S[i + τ . . i + 3τ − 2]) = p,
and the period is monotonically non-decreasing in r). We do this with a binary search
over r, where each check can be performed by a Grover search in Õ(√d) time, since
the length of S[i + τ . . r] is at most d. The left boundary ℓ can be found similarly.
Finally, the positions i^(b) and i^(e) can be found in Õ(p) time classically, since we must
have i^(b) − ℓ, r − i^(e) ≤ 2p. Hence, our anchor set C is Õ(τ + √d)-quantum-time
constructible.
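For concreteness, the following classical Python sketch mirrors the boundary search just described: a binary search over r whose inner check asks whether p is a period of the prefix considered so far (in the quantum algorithm this inner check is a Grover search for a mismatch, giving the Õ(√d) bound). The function names and 0-indexed conventions are ours.

```python
def has_period(S, lo, hi, p):
    """True iff p is a period of S[lo:hi] (0-indexed, hi exclusive); classically a scan,
    quantumly a Grover search for a position x with S[x] != S[x + p]."""
    return all(S[x] == S[x + p] for x in range(lo, hi - p))

def rightmost_period_end(S, start, r0, r_max, p):
    """Largest r in [r0, r_max] such that p is a period of S[start:r+1], assuming this
    already holds for r = r0; the predicate is monotone in r, so binary search applies."""
    lo, hi = r0, r_max
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_period(S, start, mid + 1, p):
            lo = mid
        else:
            hi = mid - 1
    return lo
```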
Now we show that, with constant probability, C is a good anchor set (Definition
3.2).
Lemma 3.30 For input strings s, t and threshold length d, with at least 0.8 probability
over the random coins r , the set C is a good anchor set.
The proof of this lemma has a similar structure to the proof of [28, Lemma 17], but
is additionally complicated by the fact that (1) we have to deal with the misalignment
introduced by approximate difference covers, and (2) we only considered a subset of
the synchronizing set when defining the anchors.
Here, we first give an informal overview of the proof. By the property of approximate
difference covers, the length-d common substring of s and t should have a pair of
slightly misaligned anchors within a shift of at most τ − 1 from each other. If the
context around these misaligned anchors is not highly periodic (Case 1 in the proof
below), then their O(τ )-neighborhood must contain a pair of synchronizing positions
(by the density property of A), which are included in Step 1 of Construction 3.28,
and form a pair of perfectly aligned anchors (by the consistency property of A). If the
context around the misaligned anchors is highly periodic (Case 2), we can extend the
period to the left or to the right, and look at the first position where the period stops. If
this stopping position is inside the common substring, then we have a pair of anchors
(Cases 2(i), 2(ii)). Otherwise, the whole common substring is highly-periodic, and we
can also obtain anchors by looking at the starting positions of its Lyndon roots (Case
2(iii)). These anchors for Case 2 are included in Step 2 of Construction 3.28.
Proof (of Lemma 3.30) Let s[i . . i + d) = t[j . . j + d) be a length-d common
substring of s and t. Our goal is to show the existence of positions i' ∈ [|s| − d + 1],
j' ∈ [|t| − d + 1] and a shift h ∈ [0 . . d), such that s[i' . . i' + d) = t[j' . . j' + d), and
i' + h, |s| + 1 + j' + h ∈ C.
Recall that we assumed d ≥ 100τ . By the definition of D S , there exist h 1 , h 2 ∈
[0 . . d/2) such that i + h 1 , |s| + 1 + j + h 2 ∈ D S , and |h 1 − h 2 | ≤ τ − 1. These
h 1 , h 2 form a pair of anchors that are slightly misaligned by a shift of at most τ − 1.
Then the plan is to obtain perfectly aligned anchors from h 1 , h 2 , either by finding
synchronizing positions in their O(τ ) neighborhood, or by exploiting periodicity.
Without loss of generality, we assume h 1 ≤ h 2 , and the case of h 1 > h 2 can be proved
analogously by switching the roles of s and t. Now we consider two cases:
• Case 1 per(s[i + h 1 + τ . . i + h 1 + 4τ − 2]) > τ/3.
In this aperiodic case, by the density condition of the τ -synchronizing set A, we
know that A ∩ [i + h_1 + τ . . i + h_1 + 2τ ) is a non-empty set, and let a be an arbitrary element of this set. Consider the corresponding position in t:
b = j + h 2 + (a − i − h 1 ) − (h 2 − h 1 )
∈ [ j + h 2 + τ − (h 2 − h 1 ) . . j + h 2 + 2τ − (h 2 − h 1 ))
⊆ [ j + h 2 + 1 . . j + h 2 + 2τ ),
and |s| + 1 + r_t must be the same as the right boundary r in Step 2 of Construction
3.28 for constructing L_{|s|+1+j+h_2}.
Let P denote the Lyndon root of s[i + h_1 + τ . . r_s], and let s[i^(e) . . i^(e) + p) =
t[j^(e) . . j^(e) + p) be the last occurrences of P in s[i + h_1 + τ . . r_s] and
t[j + h_2 + τ . . r_t]. We must have r_s − i^(e) = r_t − j^(e). Note that i^(e) ∈ L_{i+h_1}
and |s| + 1 + j^(e) ∈ L_{|s|+1+j+h_2}. So setting i' = i, j' = j, h = i^(e) − i
satisfies the requirement.
– Case 2(ii) per(s[i + h_1 + τ . . i + d)) = p, but per(s[i . . i + d)) ≠ p.
In this case, the period p fully extends to the right but not to the left. Using a
similar argument as in Case 2(i), we can show that the left boundaries
Then ℓ_s, |s| + 1 + ℓ_t are the left boundaries in Step 2 of Construction 3.28 for
constructing the sets L_{i+h_1}, L_{|s|+1+j+h_2} respectively. We must have ℓ_s < i
and ℓ_t < j.
Let P be the Lyndon root of s[i . . i + d), and assume the first occurrence of P
in s[i . . i + d) is s[i_0 . . i_0 + p). We also let s[i^(b) . . i^(b) + p) = t[j^(b) . . j^(b) + p)
be the first occurrences of P in s[ℓ_s . . i + d) and t[ℓ_t . . j + d). Then the
second occurrence of P in s[ℓ_s . . i + d) is s[i^(b) + p . . i^(b) + 2p). Observe
that, if we find the first occurrence of the common substring s[i . . i + d)
inside the entire periodic region s[ℓ_s . . i + d), this occurrence should align
s[i_0 . . i_0 + p) with either the first occurrence of P or the second occurrence
of P in this region. Formally, let i'' = i^(b) − (i_0 − i). Then, we have either
s[i . . i + d) = s[i'' . . i'' + d) or s[i . . i + d) = s[i'' + p . . i'' + p + d).
Similarly, letting j'' = j^(b) − (i_0 − i), we have t[j . . j + d) = t[j'' . . j'' + d)
or t[j . . j + d) = t[j'' + p . . j'' + p + d).
Note that i^(b), i^(b) + p ∈ L_{i+h_1}, and |s| + 1 + j^(b), |s| + 1 + j^(b) + p ∈
L_{|s|+1+j+h_2}. Hence, setting i' = i'' (or i' = i'' + p), j' = j'' (or j' = j'' + p),
and h = i^(b) − i'' satisfies the requirement.
This shows that the desired i', j', and h always exist.
Rather than work with the Minimal String Rotation problem directly, we present an
algorithm for the following problem, which is more amenable to our divide-and-conquer approach.
ϕ(c) = |Σ| − c + 1
t = ϕ(s[1]) · · · ϕ(s[n])
if and only if
Thus the solution to the Maximal String Rotation problem on t recovers the solution
to the Minimal String Rotation problem on s, which proves the desired result.
Proposition 4.3 The Maximal String Rotation problem reduces to the Maximal Suffix
problem.
Proof Take an instance of the Maximal String Rotation problem, consisting of a string
s of length n.
Let t = ss be the string of length 2n formed by concatenating s with itself. Suppose
i is the starting index of the maximal rotation of s. Then we claim that i is the starting
index of the maximal suffix of t as well.
Indeed, take any position j ∈ [1 . . 2n] in string t with j ≠ i.
If j > n, then we can write j = n + ℓ for some positive integer ℓ ≤ n. In this
case we have
because the string on the left hand side is a proper prefix of the string on the right hand
side. Thus j cannot be the starting position of a maximal suffix for t.
Otherwise, j ≤ n. Note that we can write
Since i is a solution to the Maximal String Rotation problem, we know that either
by considering the length n prefixes of the two strings. In the second case, since
s[ j . . n]s[1 . . j − 1] = s[i . . n]s[1 . . i − 1] and i < j the decompositions from Eq.
because the string on the left hand side is a proper prefix of the string on the right hand
side. Combining these results, we see that the solution to the Maximal Suffix problem
on t is the index i which solves the Maximal String Rotation problem on s.
Proposition 4.4 The Maximal Suffix problem reduces to the Minimal Suffix problem.
Proof Take an instance of the Maximal Suffix problem, consisting of a string s of length
n over an alphabet Σ = [1 . . |Σ|]. Let σ = |Σ| + 1 denote a character lexicographically
after all the characters in Σ. As in the proof of Proposition 4.2, consider the map
ϕ : Σ → Σ defined by taking
ϕ(c) = |Σ| − c + 1,
and define the string
t = ϕ(s[1])ϕ(s[2]) · · · ϕ(s[n])σ
because the string on the left hand side agrees with the string on the right hand side for
the first j positions, but then at the (n− j +2)th position, the string on the right hand side
has the character σ , which is larger than the corresponding character ϕ(s[n − ( j − i)])
from the string on the left hand side.
Otherwise, Eq. (3) holds because there exists some nonnegative integer ℓ such that
s[j + ℓ] < s[i + ℓ] and s[j + d] = s[i + d] for all nonnegative d < ℓ. By definition,
ϕ(c) ≺ ϕ(c') if and only if c' ≺ c, for all c, c' ∈ Σ. Thus in this case too
we have
because the strings agree for the first ℓ characters, but then at the (ℓ + 1)st position,
the string on the right hand side has the character ϕ(s[j + ℓ]), which is larger than
the corresponding character ϕ(s[i + ℓ]) from the string on the left hand side by our
observation on ϕ. Finally, note that the suffix t[n + 1] = σ is larger than every other
suffix of t by construction, and is thus not a minimal suffix of t. Thus the minimal
suffix of t corresponds to the maximal suffix of s, and the reduction is correct.
Proposition 4.5 The Minimal Suffix problem reduces to the Minimal Length-ℓ Substrings problem.
Proof Take an instance of the Minimal Suffix problem, consisting of a string s of
length n. Consider the string of length 2n − 1 of the form
t = s · 0^{n−1},
that is, s followed by n − 1 copies of a character 0 that is lexicographically smaller than every character of s.
s[i . . n] ≺ s[ j . . n].
Because the string on the left hand side occurs strictly before the string on the right
hand side in lexicographic order, appending any number of 0s to the ends of the strings
above cannot change their relative order. Thus

t[i . . i + n) = s[i . . n] · 0^{i−1} ≺ s[j . . n] · 0^{j−1} = t[j . . j + n)
as well. Because this holds for all j ≠ i, we get that i is the unique position output by
solving the Minimal Length-n Substrings problem on t. This proves the reduction is
correct.
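To see the four reductions working end to end, the following Python sketch (written over an integer alphabet [1 . . σ], with helper names of our own choosing and a brute-force solver standing in for the quantum algorithm) chains Propositions 4.2–4.5 and recovers a minimal rotation index.

```python
def complement(s, sigma):
    """The order-reversing map phi(c) = sigma - c + 1 from Propositions 4.2 and 4.4."""
    return [sigma - c + 1 for c in s]

def minimal_length_l_substring_start(t, l):
    """Brute-force stand-in for the Minimal Length-l Substrings problem on t (1-indexed answer)."""
    best = min(t[i:i + l] for i in range(len(t) - l + 1))
    return 1 + next(i for i in range(len(t) - l + 1) if t[i:i + l] == best)

def minimal_rotation_index(s, sigma):
    """Chain the reductions: Minimal Rotation -> Maximal Rotation -> Maximal Suffix ->
    Minimal Suffix -> Minimal Length-(2n+1) Substrings, then read off the index."""
    n = len(s)
    t = complement(s, sigma)                  # Prop. 4.2: complement the characters
    t = t + t                                 # Prop. 4.3: work with the doubled string
    t = complement(t, sigma) + [sigma + 1]    # Prop. 4.4: complement back, append a largest character
    t = t + [0] * (len(t) - 1)                # Prop. 4.5: pad with a smallest character
    return minimal_length_l_substring_start(t, 2 * n + 1)

if __name__ == "__main__":
    import random
    for _ in range(500):
        s = [random.randint(1, 3) for _ in range(random.randint(1, 12))]
        i = minimal_rotation_index(s, 3)
        assert s[i - 1:] + s[:i - 1] == min(s[j:] + s[:j] for j in range(len(s)))
```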
By chaining the above reductions together, we obtain the following corollary of
Theorem 4.1.
Theorem 4.6 Minimal String Rotation, Maximal Suffix, and Minimal Suffix can be
solved by a quantum algorithm with n 1/2+o(1) query complexity and time complexity.
Proof By combining the results of Propositions 4.2, 4.3, 4.4, and 4.5, we see that all
the problems mentioned in the theorem statement reduce to the Minimal Length-ℓ
Substrings problem. Each of the reductions only involves simple substitutions and
insertions to the input strings.
insertions to the input strings.
In particular, by inspecting the proofs of the propositions, we can verify that for an
input string s and its image t under any of these reductions, any query to a character
Fig. 2 Proof of Lemma 4.8. Here, the answer set of the Minimal Length-k Substrings problem on input string
s[a . . a + 2k) is J = {j_1, j_2, j_3}, and p = j_2 − j_1. This period p extends to the right up to position r
of t can be simulated with O(1) queries to the characters of s. Thus, we can get
an n^{1/2+o(1)} query and time quantum algorithm for each of the listed problems by
using the algorithm of Theorem 4.1 and simulating the aforementioned reductions
appropriately in the query model.
Remark 4.7 We remark that, from the Ω(√n) quantum query lower bound for Minimal
String Rotation [14], this chain of reductions also implies that Maximal Suffix and
Minimal Suffix require Ω(√n) quantum query complexity.
It remains to prove Theorem 4.1. To solve the Minimal Length-ℓ Substrings problem,
it suffices to find any individual solution, and then use the quantum Exact String
Matching algorithm to find all the elements (represented as an arithmetic progression)
in Õ(√n) time. Our approach will invoke
the following “exclusion rule,” which simplifies the previous approach used in [14].
We remark that similar kinds of exclusion rules have been applied previously in parallel
algorithms for Exact String Matching [10] and Minimal String Rotation [17] (under
the name of “Ricochet Property” or “duel”), as well as the quantum algorithm by Wang
and Ying [14, Lemma 5.1]. The advantage of our exclusion rule is that it naturally
yields a recursive approach for solving the Minimal Length-ℓ Substrings problem.
Lemma 4.8 (Exclusion Rule) In the Minimal Length-ℓ Substrings problem with input
s[1 . . n] with n/2 ≤ ℓ ≤ n, let
I := argmin_{1 ≤ i ≤ n−ℓ+1} s[i . . i + ℓ)

a + 2k − 1 ≤ n − ℓ + k ≤ 2(n − ℓ) ≤ n,
s[ j1 . . j1 + k) = s[ j2 . . j2 + k) = · · · = s[ jm . . jm + k)
To motivate our quantum algorithm, we first describe a classical algorithm for the
Minimal n/2-length Substring problem which runs in O(n log n) time (note that other
classical algorithms can solve the problem faster in O(n) time). Our quantum algorithm
will use the same setup, but obtain a speed-up via Grover search. For the purpose of
this overview, we assume n is a power of 2. The classical algorithm works as follows:
Suppose we are given an input string s of length n and target substring size ℓ = n/2.
Set m = ℓ/2 = n/4. Then the first half of the solution (i.e. the first m characters of a
minimal length-ℓ substring) is contained entirely in either the block s_1 = s[1 . . n/2]
or the block s_2 = s[n/4 . . 3n/4).
With that in mind, we recursively solve the problem on the strings s1 and s2 with
target size m in both cases. Let u 1 and v1 be the smallest and largest starting positions
returned by the recursive call to s1 respectively. Define u 2 and v2 as the analogous
positions returned by the recursive call to s2 . Then by Lemma 4.8, the true starting
position of the minimal -length substring of s is in {u 1 , u 2 , v1 , v2 }.
We identify the ℓ-length substrings starting at each of these positions, and find their
lexicographic minimum in O(n) time via linear-time string comparison. This lets us
find at least one occurrence of the minimum substring of length ℓ. Then, to find all
occurrences of this minimum substring, we use a linear time string matching algorithm
(such as the classic Knuth-Morris-Pratt algorithm [1]) to find the first two occurrences
of the minimum length substring in s. The difference between the starting positions
then lets us determine the common difference of the arithmetic sequence of positions
encoding all starting positions of the minimum substring.
If we let T(n) denote the runtime of this algorithm, the recursion above yields a
recurrence T(n) ≤ 2T(n/2) + O(n), which solves to T(n) = O(n log n).
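A runnable classical sketch of this recursion is given below (in Python, with our own function names, a small brute-force base case, and a brute-force cross-check in place of the linear-time string matching step); it is meant only to illustrate the candidate set {u_1, v_1, u_2, v_2} coming from Lemma 4.8, not the quantum speed-up.

```python
def minimal_half_substring(x):
    """Return (first, last) 0-indexed starting positions of the lexicographically minimal
    substring of x of length len(x) // 2; len(x) is assumed to be a power of two."""
    L = len(x)
    t = L // 2
    if L <= 8:                                           # brute-force base case
        best = min(x[i:i + t] for i in range(L - t + 1))
        occ = [i for i in range(L - t + 1) if x[i:i + t] == best]
        return occ[0], occ[-1]
    m = L // 4
    # The first m characters of the answer lie in block 1 = x[0:t] or block 2 = x[m:m+t].
    u1, v1 = minimal_half_substring(x[:t])
    u2, v2 = minimal_half_substring(x[m:m + t])
    candidates = {u1, v1, m + u2, m + v2}                # exclusion rule (Lemma 4.8)
    best = x[min(candidates, key=lambda i: x[i:i + t]):][:t]
    return x.find(best), x.rfind(best)                   # first and last occurrence

def brute_force(x):
    t = len(x) // 2
    best = min(x[i:i + t] for i in range(len(x) - t + 1))
    occ = [i for i in range(len(x) - t + 1) if x[i:i + t] == best]
    return occ[0], occ[-1]

if __name__ == "__main__":
    import random
    for _ in range(300):
        x = "".join(random.choice("ab") for _ in range(32))
        assert minimal_half_substring(x) == brute_force(x)
```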
Proof (of Theorem 4.1) Let b be some parameter to be set later. For convenience
assume that b divides both ℓ and n (this assumption does not affect the validity of our
arguments, and is only used to let us avoid working with floor and ceiling functions).
Set m = ℓ/b.
For each nonnegative integer k ≤ n/m − 2 we define the substring
s_k = s(km . . (k + 2)m].
It remains to check the runtime of the algorithm. Let T (n) denote the runtime of
the algorithm with error probability at most 1/n. Recall that our algorithm solves
Minimum Finding over Θ(b) blocks, where each comparison involves a recursive call
on strings of size 2m = Θ(n/b) and a constant number of string comparisons of length
n (via Lemma 2.5), and finally solves Exact String Matching for strings of size Θ(n).
Hence we have the recurrence (assuming all logarithms are base 2)
T(n) ≤ Õ(√b) · T(n/b) + Õ(√n) + Õ(√n) = c(log b)^e (√b · T(n/b) + √n)
for some constants c, e > 0, where the polylogarithmic factors are inherited from the
subroutines we use and the possibility of repeating our steps O(log n) times to drive
down the error probability. Now set
b = 2^{d(log n)^{2/3}}
for some constant d. We claim that for sufficiently large d, we recover a runtime of
T(n) = n^{1/2} · 2^{d(log n)^{2/3}}.
We prove this by induction. The result holds when n is a small constant by taking
d large enough. Now, suppose we want to prove the result for some arbitrary n, and
that the claimed runtime bound holds on inputs of size less than n. Then using the
recurrence above and the inductive hypothesis we have
T(n) ≤ c(log b)^e (√b · T(n/b) + √n)
     ≤ c(log b)^e (√n · 2^{d(log(n/b))^{2/3}} + √n)
     ≤ 2c(log b)^e · √n · 2^{d(log(n/b))^{2/3}},
where the last inequality follows from d(log(n/b))^{2/3} ≥ d(log(√n))^{2/3} >
(1/2) d(log n)^{2/3} = log(√b) for large enough n. Equivalently, this means that
2 d(log n)
T (n)
(log n)2/3 −(log n−log b))2/3
2/3
≤ 2c · 2e(log log b)−d . (4)
n 1/2 2d(log n)
Using the mean value theorem, we can bound (log n)^{2/3} − (log n − log b)^{2/3} ≥
(2/3)(log n)^{−1/3} · log b = (2/3) d (log n)^{1/3}, so the exponent on the right hand side of
Eq. (4) is at most e(log log b) − (2/3) d² (log n)^{1/3} = e log d + (2/3) e log log n − (2/3) d² (log n)^{1/3},
where the last step uses log log b = log d + (2/3) log log n. Thus, by
taking d to be a large enough constant in terms of c and e, we can force the right hand
side of Equation (4) to be less than 1, which proves that
T(n) ≤ n^{1/2} · 2^{d(log n)^{2/3}}.
This completes the induction, and proves that we can solve the Minimal Length-ℓ
Substrings problem in the desired runtime as claimed.
The technique we use to solve the Minimal String Rotation problem can also be adapted
to get a quantum speed-up for solving the Longest Lyndon Substring problem.
Theorem 4.9 The Longest Lyndon Substring problem can be solved by a quantum
algorithm with n 1/2+o(1) query complexity and time complexity.
Theorem 4.10 For any constant 0 < ε < 1, suppose there is a T(d)-time quantum
algorithm (where T(d) ≥ Ω(√d)) for solving the Longest Lyndon Substring problem
on a string s of length |s| = (1 + 2ε)d with the promise that the longest Lyndon substring
of s has length in the interval [d, (1 + ε) · d). And suppose there is a T(d)-time quantum
algorithm for checking whether an O(d)-length string is a Lyndon word.
Then, there is an algorithm in time Õ(T(n)) for solving the Longest Lyndon Substring problem on length-n strings in the general case.
Proof Let s be the input string of length n. For each nonnegative integer i ≤
(log n)/(log(1 + ε)) − 1, we look for a longest Lyndon substring of s whose length is
in the interval [(1 + ε)^i, (1 + ε)^{i+1}), and return the largest length (after certifying that
it is indeed a Lyndon substring) found. This only blows up the total time complexity
by an O(log n) factor.
For each i, let d = (1 + ε)^i. We define the positions j_k := 1 + k · εd/2 for all 0 ≤ k < 2n/(εd), and
consider the substrings
s[ j0 . . j0 + (1 + 2ε)d), s[ j1 . . j1 + (1 + 2ε)d), . . .
Note that, if the longest Lyndon substring of s has length in the interval [d, (1 + ε)d),
then it must be entirely covered by some of these substrings. For each of these
substrings, its longest Lyndon substring can be computed in T (d)-time by the
assumption. Then, we use the quantum maximum finding algorithm (see Sect. 2.4)
to find the longest among these 2n/(εd) answers, in Õ(√(2n/(εd)) · T(d)) =
Õ(√n · T(d)/√d) ≤ Õ(T(n)) overall time, where we used the assumption
T(d) ≥ Ω(√d).
Now, we are going to describe a d^{1/2+o(1)}-time quantum algorithm for solving the
Longest Lyndon Substring problem on a string s of length |s| = (1 + 2ε)d, with the
promise that the longest Lyndon substring of s has length in the interval [d, (1 + ε) · d).
Combined with the reduction above, this proves Theorem 4.9, since a string is Lyndon
if and only if its minimal suffix is itself (see Sect. 2.1) and can be checked by our
Minimal Suffix algorithm. We will set ε = 0.1.
We will make use of the following celebrated fact related to Lyndon substrings.
Definition 4.11 (Lyndon Factorization [8, 96]) Any string s can be written as a concatenation
s = s_1 s_2 · · · s_k,
where each s_i is a Lyndon word and s_1 ⪰ s_2 ⪰ · · · ⪰ s_k; this factorization is unique.
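The factorization can be computed classically in linear time by Duval's algorithm [8]; a short Python rendition (the standard algorithm, not a construction specific to this paper) is included here for reference.

```python
def lyndon_factorization(s):
    """Duval's algorithm: the unique factorization of s into non-increasing Lyndon words."""
    n, i = len(s), 0
    factors = []
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            factors.append(s[i:i + j - k])    # emit copies of the current Lyndon word
            i += j - k
    return factors

# e.g. lyndon_factorization("banana") == ["b", "an", "an", "a"]
```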
Proposition 4.15 (e.g., [71, Lemma 10]) For any pre-Lyndon string w, there exists a
unique Lyndon word x such that w = x^k x', where k ≥ 1 and x' = x[1 . . i] for some
i ∈ [0 . . |x| − 1]. Here x^k denotes the concatenation of k copies of x.
Lemma 4.16 Given any string w of length d, we can check whether it is pre-Lyndon in
d 1/2+o(1) quantum time. Moreover, if w is pre-Lyndon, we can find its decomposition
described in Proposition 4.15 also in d 1/2+o(1) quantum time.
Remark 4.17 We can show that the Longest Lyndon Substring problem requires
Ω(√n) quantum queries, by a simple reduction from the unstructured search problem.
Suppose we are given a string s ∈ {0, 1}^n and want to decide whether there exists
i ∈ [n] such that s[i] = 1. We create another string s' := s 0^n 1 by appending n zeros
and a one after s. Then, if s = 0^n, the longest Lyndon substring of s' will be s'
itself. If there is at least one 1 in s, then s' cannot be a Lyndon word, since its suffix 0^n 1
must be smaller than s'. Hence, by [21], this implies that the Longest Lyndon Substring
problem requires query complexity Ω(√n).
Recall that in the Longest Square Substring problem, we are given a string s of length
n and tasked with finding the largest positive integer ℓ such that there exists some
index 1 ≤ i ≤ n − 2ℓ + 1 with s[i . . i + ℓ) = s[i + ℓ . . i + 2ℓ). In other words, we
want to find the maximum size L = 2ℓ such that s contains an ℓ-periodic substring of
length L. We call ℓ the shift and L the length of the longest square substring. We refer
to the substrings s[i . . i + ℓ) and s[i + ℓ . . i + 2ℓ) as the first half and second half
of the solution respectively.
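For reference, the following brute-force Python sketch makes the definition concrete (it is cubic-time and purely illustrative; the names are ours).

```python
def longest_square_substring(s):
    """Return (length, start) of a longest square substring of s (0-indexed start),
    or (0, None) if s contains no square substring."""
    n = len(s)
    for shift in range(n // 2, 0, -1):            # try the largest shift first
        for i in range(n - 2 * shift + 1):
            if s[i:i + shift] == s[i + shift:i + 2 * shift]:
                return 2 * shift, i
    return 0, None

# e.g. longest_square_substring("zabcabcy") == (6, 1)   # "abcabc", shift 3
```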
In this section, we present a quantum algorithm which solves this problem on strings
of length n in Õ(√n) time. We follow this up with a brief argument indicating why
this algorithm is optimal up to polylogarithmic factors in the query complexity.
Theorem 5.1 The Longest Square Substring problem can be solved by a quantum
algorithm with Õ(√n) query complexity and time complexity.
P = s(g . . g + 2d/5]
of length 2d/5 starting immediately after g. Since g is good, the end of P is at most
the
Fig. 3 An example of how we extract a square substring when there is only one copy of P. The black
vertical line segments in the top three images bound the first half and second half of the true optimal square
substring A. When we extend the patterns backwards to cover the entire prefix, we are guaranteed to reach
the boundaries of A, but may end up extending further. In the final step we extend the patterns forward and
find a square substring bounded by the ends of the resulting extensions, marked by vertical line segments
in the bottom image. This square substring may be different from A, but is always at least as long as A
Case 2: Multiple Copies It remains to consider the case where S contains multiple
copies of P. In this case, we use the quantum algorithm for Exact String Matching
to find the rightmost and second rightmost copies of P in S in Õ(√d) time. Suppose
that these copies start after positions h and h' respectively, so that
with h' < h. Then since 2|P| = 4d/5 > (3/5 + ε)d = |S|, we know by Lemma 2.3
that P has minimum period p = h − h' and every appearance of P in S starts some
multiple of p away from h. Moreover, all the copies of P in S overlap each other and
together form one larger p-periodic substring in s. Our next step will be to extend
these periodic parts to maximal periodic substrings, which will help us locate a large
square substring.
By Exact String Matching, we can find the leftmost copy s(l . . l + 2d/5] of P in S
in Õ(√d) time, where the integer l + 1 is the starting position of this copy. By our earlier
discussion, we know that the string s(l . . h + 2d/5] is p-periodic.
We now extend the original pattern P as well as the leftmost copy of P in S
backwards while maintaining the property of being p-periodic.
Formally, we find the largest nonnegative integer j1 ≤ 2d/5 such that s(g− j1 . . g+
2d/5] is p-periodic and the largest nonnegative integer j2 ≤ l + 1 − g − 2d/5 such
that s(l − j_2 . . h + 2d/5] is p-periodic. Because we upper bound j_1, j_2 ≤ O(d),
extending the strings in this way takes Õ(√d) time via Grover search. We now split
into two further subcases, depending on how far back the strings are extended.
Case 2a: Single Periodic Substring
Suppose we get j2 = l + 1 − g − 2d/5. This means that we were able to extend
the leftmost copy of P in S so far back that it overlapped with our original pattern
P contained in the first half of A. It follows that the substring s(g . . h + 2d/5] is p-
periodic. In particular, we deduce that the substring starting from the original pattern
P in the first half of A to its -shifted copy in the second half of A is contained in this
p-periodic part. Since A is a square substring, it follows that its prefix which ends at
position g + 2d/5 of s is also p-periodic. Thus, position g − j1 of s occurs before the
first character of A. This reasoning is depicted in the second image of Fig. 4.
We now extend this entire p-periodic substring forward. Find the largest nonnegative
integer k ≤ (1 + ε)d such that s(g − j_1 . . g + k] is p-periodic. Since
j_1, k ≤ O(d), this takes Õ(√d) time via Grover search. As pictured in the bottom
image of Fig. 4, since A is a square string and the end of its first half is p-periodic,
the end of its second half is p-periodic as well. Thus position g + k in s occurs after
the final character of A.
We now have a p-periodic string s(g − j_1 . . g + k] which contains A, and thus
has length at least L. This means that A has period p as well. We claim the shift ℓ
associated with A is an integer multiple of p.
Indeed, by definition, A is ℓ-periodic. Then because A has length 2ℓ ≥ ℓ + p,
by Lemma 2.2 we know that A is gcd(p, ℓ)-periodic as well. Since P is a substring
of A, P must have period gcd(p, ℓ) too. But p is the minimum period of P. Hence
p = gcd(p, ℓ), so ℓ = pm for some positive integer m as claimed.
Fig. 4 An example of how we extract a square substring when there are multiple copies of P and the light
blue substring spanning the copies of P across both halves of A is p-periodic. In this case, because A
is square, when we extend the initial pattern P backwards to a p-periodic string we cover a prefix of A.
Similarly, when we extend copies of P forward to a p-periodic string we cover a suffix of A. Curly braces
indicate parts of the strings guaranteed to be identical because A is square (Color figure online)
Fig. 5 If extending P backwards (pictured by the light blue substrings) to a p-periodic substring does not
allow us to cover the prefix of A, then these p-periodic parts are identical because A is square. Thus the
distance between the beginnings of the light blue substrings equals the true shift (Color figure online)
Fig. 6 If in Case 2b, extending P backwards (pictured by the light blue substring on the left) to a p-periodic
substring covers a prefix of A, then when we extend P forward in the same way (pictured by the left green
substring) the result cannot cross into the second half of A (if it did, we would have a single connected
periodic substring and fall into Case 2a). Then the p-periodic parts at the ends of each half must be identical
because A is square. Thus the distance between the ends of the green substrings equals the true shift
(Color figure online)
the largest square substring among these two. It remains to prove that this procedure
is correct. There are two cases to consider, based on how large j_1 is relative to the
position of A.
First, suppose that position g − j_1 + 1 in s is a character in the first half of A.
Then, as depicted in Fig. 5, since A is square, l − j_2 + 1 must then lie in the second
half of A, and in fact be exactly ℓ characters to the right of g − j_1 + 1 (because if
this position were earlier, it would mean we could have picked j_1 larger and still had
a p-periodic string). Thus the shift ℓ = (l − j_2 + 1) − (g − j_1 + 1) is forced. Then
when we construct the string B by searching backwards and forwards from positions
g − j_1 + 1 and l − j_2 + 1 we will in fact find a square string at least as long as A, and B will
be our desired longest square substring.
Otherwise, position g − j_1 + 1 in s is placed before every character of A. Then
as depicted in Fig. 6, since A is square, position l − j_2 must be in the first half of A.
Consequently, when we extend P forward to position g + k_1, this position is also in
the first half of A (otherwise the p-periodic parts would overlap, and we would have
been in Case 2a instead). As in the previous case, using the fact that A is a square
again, we get that position l + k_2 must be exactly ℓ characters to the right of g + k_1.
So the shift ℓ = (l + k_2) − (g + k_1) is forced. Then when we construct the string C by
searching backwards and forwards from positions g + k_1 and l + k_2 we find a square
string at least as long as A, so C will be our desired longest square substring.
This handles all of the cases. So far, we have described an algorithm that, for
any integer i, will find the longest square substring of s with size in [d, (1 + ε)d)
with probability at least Ω(d/n) (recall this is the probability that g is good), in
time Õ(√d). By amplitude amplification and trying out the O(log n) choices of i in
decreasing order, we recover an algorithm for the Longest Square Substring problem
which runs in
Õ(√d · √(n/d)) = Õ(√n)
time, as desired.
We show that our algorithm is optimal by giving a quantum query lower bound
of Ω(√n) for finding the longest square substring. This proof is essentially already
present in [12], where the authors give a lower bound for finding the longest palindromic
substring, but we sketch the argument here for completeness.
Proposition 5.2 Any quantum algorithm that computes the longest square substring
of a string of length n requires Ω(√n) queries.
Proof Let S be the set of strings of length 2n over the alphabet {0, 1} which contain at
most one occurrence of the character 1. In [21] the authors prove that deciding
whether a given string s ∈ S is the string consisting of all 0s requires Ω(√n) queries
in the quantum setting.
The longest square substring of the 0s string of length 2n is just the entire string,
and has length 2n. However, every other string in S has an odd number of 1s, and
thus has longest square substring of size strictly less than 2n. So solving the Longest
Square Substring problem lets us decide if a string from S is the all-0s string, which
means that any quantum algorithm solving this problem requires Ω(√n) queries as
well.
6 Open Problems
• Our Õ(n 2/3 )-time algorithm for LCS assumes that the input characters are integers
in [poly(n)]. This assumption was used for constructing string synchronizing sets
sublinearly (Sect. 3.3.2). However, the previous Õ(n 5/6 )-time algorithm by Le
Gall and Seddighin [12] can work with general ordered alphabet, where the only
allowed query is to compare two symbols S[i], S[ j] in the input strings (with three
possible outcomes S[i] > S[ j], S[i] = S[ j], or S[i] < S[ j]). Is Õ(n 2/3 ) query
complexity (or even time complexity) achievable in this more restricted setting?
Alternatively, can we show a better query lower bound?
• Our algorithm for the Minimal String Rotation problem (and other related prob-
lems in Sect. 4) has time complexity (and query complexity) n 1/2+o(1) . Can we
reduce the n o(1) factor down to poly log(n)? A subsequent work by Childs, Kothari,
Kovacs-Deak, Sundaram, and Wang [82] showed such an improvement for the
decision version of Minimal String Rotation, but the question remains open for
the search version.
• In our time-efficient implementation of the LCS algorithm, we used a simple sam-
pling technique to bypass certain restrictions on 2D range query data structures
(Sect. 3.2.3). Can this idea have further applications in designing time-efficient
quantum walk algorithms? As a simple example, we can use this idea to get
an Õ(n 2/3 )-time comparison-based algorithm for the element distinctness prob-
lem with simpler implementation. At the beginning, uniformly sample r items
x1 , . . . , xr from the input array, and sort them so that x1 ≤ · · · ≤ xr . Then, we
create a hash table with r + 1 buckets each having O(log n) capacity, where the
hash function h(x) is defined as the index i such that xi ≤ x < xi+1 , which can
be found by binary search. Then, each insertion, deletion, and search operation
can be performed in O(log n) time, provided that the buckets do not overflow (a classical sketch of this bucketing appears below). The
error caused by overflows can be analyzed using Ambainis’ proof of [13, Lemma
6]. In comparison, Ambainis’ implementation [13] additionally used a skip list,
and Jeffery’s (non-comparison-based) implementation used a quantum radix tree
[52, Section 3.3.4].
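A minimal classical Python sketch of this bucketing (our own naming; overflow handling and the quantum-walk bookkeeping are omitted) is:

```python
import bisect

class SampledBuckets:
    """r sorted sample pivots split the key space into r + 1 buckets; insertion, deletion,
    and search locate the bucket by binary search over the pivots in O(log n) time."""

    def __init__(self, pivots, capacity=None):
        self.pivots = sorted(pivots)
        self.capacity = capacity                      # e.g. O(log n); None disables the check
        self.buckets = [[] for _ in range(len(self.pivots) + 1)]

    def _bucket(self, x):
        # bucket index i means pivots[i-1] <= x < pivots[i]
        return self.buckets[bisect.bisect_right(self.pivots, x)]

    def insert(self, x):
        b = self._bucket(x)
        if self.capacity is not None and len(b) >= self.capacity:
            raise OverflowError("bucket overflow (a low-probability event in the analysis)")
        b.append(x)

    def delete(self, x):
        self._bucket(x).remove(x)

    def contains(self, x):
        return x in self._bucket(x)
```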
Acknowledgements We thank Virginia Vassilevska Williams, Ryan Williams, and Yinzhan Xu for several
helpful discussions. We additionally thank Virginia Vassilevska Williams for several useful comments on
the writeup of this paper.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence,
and indicate if changes were made. The images or other third party material in this article are included
in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If
material is not included in the article’s Creative Commons licence and your intended use is not permitted
by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the
copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
1. Knuth, D.E., Morris, J.H., Jr., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2),
323–350 (1977). https://doi.org/10.1137/0206024
2. Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2),
249–260 (1987). https://doi.org/10.1147/rd.312.0249
3. Weiner, P.: Linear pattern matching algorithms. In: Proceedings of the 14th Annual Symposium on
Switching and Automata Theory, pp. 1–11 (1973). https://doi.org/10.1109/SWAT.1973.13
4. Farach, M.: Optimal suffix tree construction with large alphabets. In: Proceedings of the 38th Annual
Symposium on Foundations of Computer Science (FOCS 1997), pp. 137–143 (1997). https://doi.org/
10.1109/SFCS.1997.646102
5. Babenko, M.A., Starikovskaya, T.: Computing longest common substrings via suffix arrays. In: Pro-
ceedings of the 3rd International Computer Science Symposium in Russia (CSR 2008), Theory and
Applications, pp. 64–75 (2008). https://doi.org/10.1007/978-3-540-79709-8_10
6. Booth, K.S.: Lexicographically least circular substrings. Inf. Process. Lett. 10(4/5), 240–242 (1980).
https://doi.org/10.1016/0020-0190(80)90149-0
7. Shiloach, Y.: Fast canonization of circular strings. J. Algorithms 2(2), 107–121 (1981). https://doi.org/
10.1016/0196-6774(81)90013-4
8. Duval, J.-P.: Factorizing words over an ordered alphabet. J. Algorithms 4(4), 363–381 (1983). https://
doi.org/10.1016/0196-6774(83)90017-2
9. Ramesh, H., Vinay, V.: String matching in Õ(√n + √m) quantum time. J. Discrete Algorithms 1(1),
103–110 (2003). https://doi.org/10.1016/S1570-8667(03)00010-8
10. Vishkin, U.: Deterministic sampling: a new technique for fast pattern matching. SIAM J. Comput.
20(1), 22–40 (1991). https://doi.org/10.1137/0220002
11. Grover, L.K.: A fast quantum mechanical algorithm for database search. In: Proceedings of the 28th
Annual ACM Symposium on the Theory of Computing (STOC 1996), pp. 212–219 (1996). https://
doi.org/10.1145/237814.237866
12. Le Gall, F., Seddighin, S.: Quantum meets fine-grained complexity: Sublinear time quantum algo-
rithms for string problems. In: Proceedings of the 13th Innovations in Theoretical Computer Science
Conference (ITCS 2022), pp. 97:1–97:23 (2022). https://doi.org/10.4230/LIPIcs.ITCS.2022.97
13. Ambainis, A.: Quantum walk algorithm for element distinctness. SIAM J. Comput. 37(1), 210–239
(2007). https://doi.org/10.1137/S0097539705447311
14. Wang, Q., Ying, M.: Quantum algorithm for lexicographically minimal string rotation. CoRR (2020).
arXiv:2012.09376
15. Durr, C., Høyer, P.: A quantum algorithm for finding the minimum. Preprint (1996).
arXiv:quant-ph/9607014
16. Apostolico, A., Iliopoulos, C.S., Paige, R.: On O(n log n) cost parallel algorithm for the single function
coarsest partition problem. In: Parallel Algorithms and Architectures, International Workshop, 1987,
Proceedings, pp. 70–76 (1987). https://doi.org/10.1007/3-540-18099-0_30
17. Iliopoulos, C.S., Smyth, W.F.: Optimal algorithms for computing the canonical form of a circular
string. Theor. Comput. Sci. 92(1), 87–105 (1992). https://doi.org/10.1016/0304-3975(92)90137-5
18. Aaronson, S., Shi, Y.: Quantum lower bounds for the collision and the element distinctness problems.
J. ACM 51(4), 595–605 (2004). https://doi.org/10.1145/1008731.1008735
19. Kutin, S.: Quantum lower bound for the collision problem with small range. Theory Comput. 1(1),
29–36 (2005). https://doi.org/10.4086/toc.2005.v001a002
20. Ambainis, A.: Polynomial degree and lower bounds in quantum complexity: collision and element
distinctness with small range. Theory Comput. 1(1), 37–46 (2005). https://doi.org/10.4086/toc.2005.
v001a003
21. Bennett, C.H., Bernstein, E., Brassard, G., Vazirani, U.V.: Strengths and weaknesses of quantum
computing. SIAM J. Comput. 26(5), 1510–1523 (1997). https://doi.org/10.1137/S0097539796300933
22. Starikovskaya, T., Vildhøj, H.W.: Time-space trade-offs for the longest common substring problem.
In: Proceedings of the 24th Annual Symposium on Combinatorial Pattern Matching (CPM 2013), pp.
223–234 (2013). https://doi.org/10.1007/978-3-642-38905-4_22
23. Charalampopoulos, P., Crochemore, M., Iliopoulos, C.S., Kociumaka, T., Pissis, S.P., Radoszewski,
J., Rytter, W., Waleń, T.: Linear-time algorithm for long LCF with k mismatches. In: Proceedings of
the 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018), pp. 23:1–23:16 (2018).
https://doi.org/10.4230/LIPIcs.CPM.2018.23
24. Amir, A., Charalampopoulos, P., Pissis, S.P., Radoszewski, J.: Longest common substring made fully
dynamic. In: Proceedings of the 27th Annual European Symposium on Algorithms (ESA 2019), pp.
6:1–6:17 (2019). https://doi.org/10.4230/LIPIcs.ESA.2019.6
25. Amir, A., Charalampopoulos, P., Pissis, S.P., Radoszewski, J.: Dynamic and internal longest common
substring. Algorithmica 82(12), 3707–3743 (2020). https://doi.org/10.1007/s00453-020-00744-0
26. Ben-Nun, S., Golan, S., Kociumaka, T., Kraus, M.: Time-space tradeoffs for finding a long common
substring. In: Proceedings of the 31st Annual Symposium on Combinatorial Pattern Matching (CPM
2020), pp. 5:1–5:14 (2020). https://doi.org/10.4230/LIPIcs.CPM.2020.5
27. Charalampopoulos, P., Gawrychowski, P., Pokorski, K.: Dynamic longest common substring in poly-
logarithmic time. In: Proceedings of the 47th International Colloquium on Automata, Languages, and
Programming (ICALP 2020), pp. 27:1–27:19 (2020). https://doi.org/10.4230/LIPIcs.ICALP.2020.27
28. Charalampopoulos, P., Kociumaka, T., Pissis, S.P., Radoszewski, J.: Faster algorithms for longest
common substring. In: Proceedings of the 29th Annual European Symposium on Algorithms (ESA
2021), pp. 30:1–30:17 (2021). https://doi.org/10.4230/LIPIcs.ESA.2021.30
29. Burkhardt, S., Kärkkäinen, J.: Fast lightweight suffix array construction and checking. In: Proceedings
of the 14th Annual Symposium on Combinatorial Pattern Matching (CPM 2003), pp. 55–69 (2003).
https://doi.org/10.1007/3-540-44888-8_5
30. Maekawa, M.: A √N algorithm for mutual exclusion in decentralized systems. ACM Trans. Comput.
Syst. 3(2), 145–159 (1985). https://doi.org/10.1145/214438.214445
31. Birenzwige, O., Golan, S., Porat, E.: Locally consistent parsing for text indexing in small space. In:
Proceedings of the 31st ACM-SIAM Symposium on Discrete Algorithms (SODA 2020), pp. 607–626
(2020). https://doi.org/10.1137/1.9781611975994.37
32. Kempa, D., Kociumaka, T.: String synchronizing sets: sublinear-time BWT construction and optimal
LCE data structure. In: Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of
Computing (STOC 2019), pp. 756–767. ACM (2019). https://doi.org/10.1145/3313276.3316368
33. Magniez, F., Nayak, A., Roland, J., Santha, M.: Search via quantum walk. SIAM J. Comput. 40(1),
142–164 (2011). https://doi.org/10.1137/090745854
34. Willard, D.E., Lueker, G.S.: Adding range restriction capability to dynamic data structures. J. ACM
32(3), 597–617 (1985). https://doi.org/10.1145/3828.3839
35. Mortensen, C.W.: Fully dynamic orthogonal range reporting on RAM. SIAM J. Comput.
35(6), 1494–1525 (2006). https://doi.org/10.1137/s0097539703436722
36. Chan, T.M., Tsakalidis, K.: Dynamic orthogonal range searching on the RAM, revisited. In: Proceed-
ings of the 33rd International Symposium on Computational Geometry (SoCG 2017), vol. 77, pp.
28:1–28:13 (2017). https://doi.org/10.4230/LIPIcs.SoCG.2017.28
37. Masek, W.J., Paterson, M.: A faster algorithm computing string edit distances. J. Comput. Syst. Sci.
20(1), 18–31 (1980). https://doi.org/10.1016/0022-0000(80)90002-1
38. Backurs, A., Indyk, P.: Edit distance cannot be computed in strongly subquadratic time (unless SETH
is false). SIAM J. Comput. 47(3), 1087–1097 (2018). https://doi.org/10.1137/15M1053128
39. Boroujeni, M., Ehsani, S., Ghodsi, M., HajiAghayi, M.T., Seddighin, S.: Approximating edit distance
in truly subquadratic time: quantum and MapReduce. J. ACM 68(3), 1–41 (2021). https://doi.org/10.
1145/3456807
40. Chakraborty, D., Das, D., Goldenberg, E., Koucký, M., Saks, M.E.: Approximating edit distance within
constant factor in truly sub-quadratic time. J. ACM 67(6), 36:1-36:22 (2020). https://doi.org/10.1145/
3422823
41. Naumovitz, T., Saks, M.E., Seshadhri, C.: Accurate and nearly optimal sublinear approximations to
ulam distance. In: Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms
(SODA 2017), pp. 2012–2031 (2017). https://doi.org/10.1137/1.9781611974782.131
42. Montanaro, A.: Quantum pattern matching fast on average. Algorithmica 77(1), 16–39 (2017). https://
doi.org/10.1007/s00453-015-0060-4
43. Ambainis, A., Balodis, K., Iraids, J., Khadiev, K., Kļevickis, V., Prūsis, K., Shen, Y., Smotrovs, J.,
Vihrovs, J.: Quantum lower and upper bounds for 2D-grid and Dyck language. In: Proceedings of the
45th International Symposium on Mathematical Foundations of Computer Science (MFCS 2020), pp.
8:1–8:14 (2020). https://doi.org/10.4230/LIPIcs.MFCS.2020.8
44. Ambainis, A., Montanaro, A.: Quantum algorithms for search with wildcards and combinatorial group
testing. Quant. Inf. Comput. 14(5–6), 439–453 (2014). https://doi.org/10.26421/QIC14.5-6-4
45. Cleve, R., Iwama, K., Le Gall, F., Nishimura, H., Tani, S., Teruyama, J., Yamashita, S.: Reconstructing
strings from substrings with quantum queries. In: Proceedings of the 13th Scandinavian Symposium
and Workshops on Algorithm Theory (SWAT 2012), pp. 388–397 (2012). https://doi.org/10.1007/978-
3-642-31155-0_34
46. Szegedy, M.: Quantum speed-up of Markov chain based algorithms. In: Proceedings of the 45th
Symposium on Foundations of Computer Science (FOCS 2004), pp. 32–41 (2004). https://doi.org/10.
1109/FOCS.2004.53
47. Magniez, F., Santha, M., Szegedy, M.: Quantum algorithms for the triangle problem. SIAM J. Comput.
37(2), 413–424 (2007). https://doi.org/10.1137/050643684
48. Jeffery, S., Kothari, R., Magniez, F.: Nested quantum walks with quantum data structures. In: Pro-
ceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2013), pp.
1474–1485 (2013). https://doi.org/10.1137/1.9781611973105.106
49. Le Gall, F.: Improved quantum algorithm for triangle finding via combinatorial arguments. In: Pro-
ceedings of the 55th IEEE Annual Symposium on Foundations of Computer Science (FOCS 2014),
pp. 216–225 (2014). https://doi.org/10.1109/FOCS.2014.31
50. Belovs, A., Childs, A.M., Jeffery, S., Kothari, R., Magniez, F.: Time-efficient quantum walks for
3-distinctness. In: Proceedings of the 40th International Colloquium on Automata, Languages, and
Programming (ICALP 2013), Part I, pp. 105–122 (2013). https://doi.org/10.1007/978-3-642-39206-
1_10
51. Bernstein, D.J., Jeffery, S., Lange, T., Meurer, A.: Quantum algorithms for the subset-sum problem.
In: Proceedings of the 5th International Workshop on Post-Quantum Cryptography (PQCrypto 2013),
pp. 16–33 (2013). https://doi.org/10.1007/978-3-642-38616-9_2
52. Jeffery, S.: Frameworks for quantum algorithms. PhD thesis, University of Waterloo (2014). http://hdl.
handle.net/10012/8710
53. Aaronson, S., Chia, N.-H., Lin, H.-H., Wang, C., Zhang, R.: On the quantum complexity of closest
pair and related problems. In: Proceedings of the 35th Computational Complexity Conference (CCC
2020), pp. 16:1–16:43 (2020). https://doi.org/10.4230/LIPIcs.CCC.2020.16
54. Buhrman, H., Patro, S., Speelman, F.: A framework of quantum strong exponential-time hypotheses.
In: Proceedings of the 38th International Symposium on Theoretical Aspects of Computer Science
(STACS 2021), pp. 19:1–19:19 (2021). https://doi.org/10.4230/LIPIcs.STACS.2021.19
55. Buhrman, H., Loff, B., Patro, S., Speelman, F.: Limits of quantum speed-ups for computational geom-
etry and other problems: Fine-grained complexity via quantum walks. In: Proceedings of the 13th
Innovations in Theoretical Computer Science Conference (ITCS 2022), pp. 31:1–31:12 (2022). https://
doi.org/10.4230/LIPIcs.ITCS.2022.31
56. Ambainis, A., Larka, N.: Quantum algorithms for computational geometry problems. In: Proceedings
of the 15th Conference on the Theory of Quantum Computation, Communication and Cryptography
(TQC 2020), pp. 9:1–9:10 (2020). https://doi.org/10.4230/LIPIcs.TQC.2020.9
57. Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational
Biology. Cambridge University Press (1997). https://doi.org/10.1017/CBO9780511574931
58. Crochemore, M., Rytter, W.: Jewels of Stringology. World Scientific (2002). https://doi.org/10.1142/
4838
59. Crochemore, M., Hancart, C., Lecroq, T.: Algorithms on Strings. Cambridge University Press (2007).
https://doi.org/10.1017/CBO9780511546853
60. Kociumaka, T., Starikovskaya, Ta., Vildhøj, H.W.: Sublinear space algorithms for the longest common
substring problem. In: Proceedings of the 22th Annual European Symposium on Algorithms (ESA
2014), pp. 605–617 (2014). https://doi.org/10.1007/978-3-662-44777-2_50
61. Abboud, A., Williams, R.R., Yu, H.: More applications of the polynomial method to algorithm design.
In: Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2015),
pp. 218–230 (2015). https://doi.org/10.1137/1.9781611973730.17
62. Flouri, T., Giaquinta, E., Kobert, K., Ukkonen, E.: Longest common substrings with k mismatches.
Inf. Process. Lett. 115(6–8), 643–647 (2015). https://doi.org/10.1016/j.ipl.2015.03.006
63. Thankachan, S.V., Apostolico, A., Aluru, S.: A provably efficient algorithm for the k-mismatch average
common substring problem. J. Comput. Biol. 23(6), 472–482 (2016). https://doi.org/10.1089/cmb.
2015.0235
64. Starikovskaya, T.: Longest common substring with approximately k mismatches. In: Proceedings of
the 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016), pp. 21:1–21:11 (2016).
https://doi.org/10.4230/LIPIcs.CPM.2016.21
65. Kociumaka, T., Radoszewski, J., Starikovskaya, T.: Longest common substring with approximately k
mismatches. Algorithmica 81(6), 2633–2652 (2019). https://doi.org/10.1007/s00453-019-00548-x
66. Gourdel, G., Kociumaka, T., Radoszewski, J., Starikovskaya, T.: Approximating longest common
substring with k mismatches: Theory and practice. In: Proceedings of the 31st Annual Symposium on
Combinatorial Pattern Matching (CPM 2020), pp. 16:1–16:15 (2020). https://doi.org/10.4230/LIPIcs.
CPM.2020.16
67. Apostolico, A., Crochemore, M.: Optimal canonization of all substrings of a string. Inf. Comput. 95(1),
76–95 (1991). https://doi.org/10.1016/0890-5401(91)90016-U
68. Babenko, M.A., Kolesnichenko, I.I., Starikovskaya, T.: On minimal and maximal suffixes of a substring.
In: Proceedings of the 24th Annual Symposium on Combinatorial Pattern Matching (CPM 2013), pp.
28–37, Springer (2013). https://doi.org/10.1007/978-3-642-38905-4_5
69. Babenko, M.A., Gawrychowski, P., Kociumaka, T., Kolesnichenko, I.I., Starikovskaya, T.: Computing
minimal and maximal suffixes of a substring. Theor. Comput. Sci. 638, 112–121 (2016). https://doi.
org/10.1016/j.tcs.2015.08.023
70. Kociumaka, T.: Minimal suffix and rotation of a substring in optimal time. In: Proceedings of the 27th
Annual Symposium on Combinatorial Pattern Matching (CPM 2016), pp. 28:1–28:12 (2016). https://
doi.org/10.4230/LIPIcs.CPM.2016.28
71. Urabe, Y., Nakashima, Y., Inenaga, S., Bannai, H., Takeda, M.: Longest Lyndon substring after edit.
In: Proceedings of the 29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018), pp.
19:1–19:10 (2018). https://doi.org/10.4230/LIPIcs.CPM.2018.19
72. Crochemore, M.: An optimal algorithm for computing the repetitions in a word. Inf. Process. Lett.
12(5), 244–250 (1981). https://doi.org/10.1016/0020-0190(81)90024-7
73. Main, M.G., Lorentz, R.J.: An O(n log n) algorithm for finding all repetitions in a string. J. Algorithms
5(3), 422–432 (1984). https://doi.org/10.1016/0196-6774(84)90021-X
74. Amir, A., Boneh, I., Charalampopoulos, P., Kondratovsky, E.: Repetition detection in a dynamic string.
In: Proceedings of the 27th Annual European Symposium on Algorithms (ESA 2019), pp. 5:1–5:18
(2019). https://doi.org/10.4230/LIPIcs.ESA.2019.5
75. Bille, P., Gawrychowski, P., Gørtz, I.L., Landau, G.M., Weimann, O.: Longest common extensions
in trees. In: Proceedings of the 26th Annual Symposium on Combinatorial Pattern Matching (CPM
2015), pp. 52–64 (2015). https://doi.org/10.1007/978-3-319-19929-0_5
76. Gawrychowski, P., Kociumaka, T., Rytter, W., Waleń, T.: Faster longest common extension queries
in strings over general alphabets. In: Proceedings of the 27th Annual Symposium on Combinatorial
Pattern Matching (CPM 2016), pp. 5:1–5:13 (2016). https://doi.org/10.4230/LIPIcs.CPM.2016.5
77. Alzamel, M., Crochemore, M., Iliopoulos, C.S., Kociumaka, T., Radoszewski, J., Rytter, W.,
Straszyński, J., Waleń, T., Zuba, W.: Quasi-linear-time algorithm for longest common circular fac-
tor. In: Proceedings of the 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019),
pp. 25:1–25:14 (2019). https://doi.org/10.4230/LIPIcs.CPM.2019.25
78. Kempa, D., Kociumaka, T.: Breaking the O(n)-barrier in the construction of compressed suffix arrays.
CoRR (2021). To appear in SODA 2023. arXiv:2106.12725
79. Kociumaka, T., Radoszewski, J., Rytter, W., Waleń, T.: Internal pattern matching queries in a text and
applications. In: Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms
(SODA 2015), pp. 532–551 (2015). https://doi.org/10.1137/1.9781611973730.36
80. Kociumaka, T.: Efficient data structures for internal queries in texts. PhD thesis, University of Warsaw
(2018). https://depotuw.ceon.pl/handle/item/3614
81. Jin, C., Nogler, J.: Quantum speed-ups for string synchronizing sets, longest common substring, and
k-mismatch matching. CoRR (2022). To appear in SODA 2023. arXiv:2211.15945
82. Childs, A.M., Kothari, R., Kovacs-Deak, M., Sundaram, A., Wang, D.: Quantum divide and conquer.
CoRR (2022). arXiv:2210.06419
83. Kent, C., Lewenstein, M., Sheinwald, D.: On demand string sorting over unbounded alphabets. Theor.
Comput. Sci. 426, 66–74 (2012). https://doi.org/10.1016/j.tcs.2011.12.001
84. Fine, N.J., Wilf, H.S.: Uniqueness theorems for periodic functions. Proc. Am. Math. Soc. 16(1), 109–
114 (1965). https://doi.org/10.2307/2034009
85. Plandowski, W., Rytter, W.: Application of Lempel-Ziv encodings to the solution of word equations.
In: Proceedings of the 25th International Colloquium on Automata, Languages and Programming
(ICALP 1998), pp. 731–742 (1998). https://doi.org/10.1007/BFb0055097
86. Ambainis, A.: Quantum query algorithms and lower bounds. In: Classical and New Paradigms of
Computation and their Complexity Hierarchies, pp. 15–32. Springer (2004). https://doi.org/10.1007/
978-1-4020-2776-5_2
87. Buhrman, H., de Wolf, R.: Complexity measures and decision tree complexity: a survey. Theor. Comput.
Sci. 288(1), 21–43 (2002). https://doi.org/10.1016/S0304-3975(01)00144-X
88. Barenco, A., Bennett, C.H., Cleve, R., DiVincenzo, D.P., Margolus, N., Shor, P., Sleator, T., Smolin,
J.A., Weinfurter, H.: Elementary gates for quantum computation. Phys. Rev. A 52, 3457–3467 (1995).
https://doi.org/10.1103/PhysRevA.52.3457
89. Brassard, G., Høyer, P., Mosca, M., Tapp, A.: Quantum amplitude amplification and estimation. Preprint
(2000). arXiv:quant-ph/0005055
90. Høyer, P., Mosca, M., de Wolf, R.: Quantum search on bounded-error inputs. In: Proceedings of the
30th International Colloquium on Automata, Languages and Programming (ICALP 2003), pp. 291–299
(2003). https://doi.org/10.1007/3-540-45061-0_25
91. de Wolf, R.: Quantum computing: Lecture notes. CoRR (2019). arXiv:1907.09415v2
92. Blelloch, G.E., Golovin, D., Vassilevska, V.: Uniquely represented data structures for computational
geometry. In: Proceedings of the 11th Scandinavian Workshop on Algorithm Theory (SWAT 2008),
pp. 17–28 (2008). https://doi.org/10.1007/978-3-540-69903-3_4
93. Pugh, W.: Skip lists: a probabilistic alternative to balanced trees. Commun. ACM 33(6), 668–676
(1990). https://doi.org/10.1145/78973.78977
94. Pugh, W.: A skip list cookbook. Technical Report CS-TR-2286.1, University of Maryland at College
Park, USA (1990). http://hdl.handle.net/1903/544
95. Indyk, P.: A small approximately min-wise independent family of hash functions. J. Algorithms 38(1),
84–90 (2001). https://doi.org/10.1006/jagm.2000.1131
96. Chen, K.T., Fox, R.H., Lyndon, R.C.: Free differential calculus, IV: the quotient groups of the lower
central series. Ann. Math. (1958). https://doi.org/10.2307/1970044
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.