
Near-Optimal Quantum Algorithm for Finding the Longest Common Substring between Run-Length Encoded Strings

Tzu-Ching Lee∗   Han-Hsuan Lin†

November 6, 2024

∗ National Tsing Hua University, Taiwan.
† National Tsing Hua University, Taiwan. Supported by NSTC QC project under Grant no. 111-2119-M-001-006- and 110-2222-E-007-002-MY3.

arXiv:2411.02421v1 [quant-ph] 21 Oct 2024

Abstract
We give a near-optimal quantum algorithm for the longest common substring (LCS) problem between two run-length encoded (RLE) strings, with the assumption that the prefix-sums of the run-lengths are given. Our algorithm costs Õ(n^{2/3}/d^{1/6−o(1)} · polylog(ñ)) time, while the query lower bound for the problem is Ω̃(n^{2/3}/d^{1/6}), where n and ñ are the encoded and decoded lengths of the inputs, respectively, and d is the encoded length of the LCS. We justify the use of prefix-sum oracles for two reasons. First, we note that creating the prefix-sum oracle only incurs a constant overhead in the RLE compression. Second, we show that, without the oracles, there is an Ω(n/log^2 n) lower bound on the quantum query complexity of finding the LCS given two RLE strings, due to a reduction from PARITY to the problem. With a small modification, our algorithm also solves the longest repeated substring problem for an RLE string.

1 Introduction
String processing is an important field of research in theoretical computer sci-
ence. There are many results for various classic string processing problems, such
as string matching [26, 4, 23], longest common substring, and edit distance. The
development of string processing algorithms has led to the discovery of many im-
pactful computer science concepts and tools, including dynamic programming,
suffix tree [31, 9] and trie [10]. String processing also has applications in various
fields such as bioinformatics [29], image analysis [16], and compression [32].
A natural extension of string processing is to do it between compressed
strings. Ideally, the time cost of string processing between compressed strings
would be independent of the decoded lengths of the strings. Since the com-
pressed string can be much shorter than the original string, this would signifi-
cantly save computation time. Whether such fast string processing is possible
depends on what kind of compression scheme we are using.
Run-Length Encoding (RLE) is a simple way to compress strings. In RLE,
the consecutive repetition of a character (run) is replaced by a character-length
pair, the character itself and the length of the run. For example, the RLE of the string aaabcccdd is a^3 b^1 c^3 d^2. RLE is a common method to compress fax data
[21], and is also part of the JPEG and TIFF image standard [19, 20]. String
processing on RLE strings has been extensively studied. Apostolico, Landau,
and Skiena gave an algorithm in time O(n^2 log n) to find the longest common subsequence between two RLE compressed strings [3], where n is the length of the
compressed strings. Hooshmand, Tavakoli, Abedin, and Thankachan obtained
an O(n log n)-time algorithm on computing the Average Common Substring
with RLE inputs [17]. Chen and Chao proposed an algorithm to compute the
edit distance between two RLE strings [6], and Clifford, Gawrychowski, Kociu-
maka, Martin and Uznanski further improved the result to near-optimal in [7],
which runs in O(n^2 log n) time.
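As a concrete illustration of the encoding just described (not part of the paper's model), here is a minimal Python sketch of RLE compression into character-length pairs; the function name rle_encode is ours.

```python
def rle_encode(s):
    """Compress a string into a list of (character, run_length) pairs."""
    runs = []
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        runs.append((s[i], j - i))  # one maximal run of the character s[i]
        i = j
    return runs

# Example: "aaabcccdd" -> [('a', 3), ('b', 1), ('c', 3), ('d', 2)], i.e. a^3 b^1 c^3 d^2
print(rle_encode("aaabcccdd"))
```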
Alternatively, another way to speed up string processing is to use quantum
algorithms. If we use quantum algorithms for string processing, it is possible
to get the time cost sublinear in the input length because the quantum com-
puter can read the strings in superposition. One of the earliest such results was by Hariharan and Vinay [15], who constructed an Õ(√n)-time string matching quantum algorithm, in which Grover's search [13] and Vishkin's deterministic sampling technique [30] were used to reach this near-optimal time complexity. Le Gall and Seddighin [27] used quantum walks [28] to obtain several sublinear-time quantum algorithms for various string problems, including an Õ(n^{5/6})-time algorithm for longest common substring (LCS), an Õ(√n)-time algorithm for longest palindrome substring (LPS), and an Õ(√n)-time algorithm for approximating the Ulam distance. Another work, by Akmal and Jin [1], used string synchronizing sets [24] with quantum walks [28] to show that LCS can be solved in Õ(n^{2/3}) quantum time. They also introduced an n^{1/2+o(1)}-time algorithm for the lexicographically minimal string rotation problem, and an Õ(√n)-time algorithm for the longest square substring problem in the same paper. [22] further improves on [1] with a better quantum string synchronizing set construction, achieving Õ(n^{2/3}/d^{1/6−o(1)}) quantum time for the LCS problem, which is near-optimal with respect to both n and d, the length of the common substring. [22] also gives an Õ(k·n^{1/2})-time quantum algorithm for the k-mismatch Matching problem.
In this work, we combine the two above ideas and investigate the possibility
of using quantum algorithms to do string processing on compressed strings, while
keeping the advantages of both methods. Thus, we ask the following question:
Is it possible to have a quantum string processing algorithm on compressed
strings whose time cost is sublinear in the encoded lengths of the strings and

independent of the decoded lengths?1
The main contribution of this paper is the first almost2 affirmative answer
to the above question, an almost optimal quantum algorithm computing the
longest common substring (LCS) between two RLE strings:
Theorem 1 (Informal). There is a quantum algorithm that finds the RLE of an LCS given two RLE strings in O(n^{2/3}/d^{1/6−o(1)} · polylog(n) · polylog(ñ)) time, with oracle access to the RLE strings and the prefix-sums3 of their runs, where n and ñ are the encoded length and the decoded length of the inputs, respectively, and d is the encoded length of the longest common substring.
Note that we modify the RLE compression by adding the prefix-sum oracle
(definition 2), which tells the position of a run in the uncompressed string.
The addition of the prefix-sum oracle is necessary and efficient: finding an LCS from RLE inputs needs at least Ω̃(n) queries due to a reduction from the PARITY problem (corollary 1); constructing the prefix-sum oracle takes O(ñ) time and saving it takes O(n) space, which are the same as those of RLE, so adding the prefix-sum oracle to RLE only incurs a constant overhead in the resources used.
To construct a quantum LCS algorithm between RLE strings, a major chal-
lenge is that the longest common substring in terms of encoded length may
differ from the longest common substring in terms of decoded length. For ex-
ample, between a^1 b^1 c^1 d^1 b^4 c^5 and a^1 b^1 c^1 d^1 @^1 b^4 c^2, the RLE string a^1 b^1 c^1 d^1 is the longest one in terms of encoded length, and b^4 c^2 is the longest one in terms of decoded length, which is what we want to find. Therefore, applying existing
LCS algorithms for strings directly on RLE inputs will not work.
Our algorithm is nearly optimal, and we prove a matching lower bound in
lemma 7.

1.1 Related work



Gibney, Jin, Kociumaka and Thankachan [11, 12] developed an Õ(√(zn))-time quantum algorithm for the Lempel-Ziv77 algorithm (LZ77) [32] and for calculating the Run-Length-encoded Burrows-Wheeler Transform (RL-BWT), where n is the length of the input string and z is the number of factors in the LZ77 factorization of the input, which roughly corresponds to the encoded length of that
string. Given two strings A and B, they showed how to calculate the LZ77
compression and a supporting data structure of A$B. With this compressed
data, the LCS between A and B can be found efficiently.
Note that the model of [11, 12] is different from our model because they need to do preprocessing on the concatenated string A$B, while in our work, we can preprocess and compress A and B independently. Thus we can compress and store the strings during downtime, and when two compressed strings need to be compared, their LCS can be calculated in time almost independent of the uncompressed length, which is potentially much faster than running the compression of [11, 12] on the uncompressed strings.

1 With a non-trivial string problem and a non-trivial compression scheme.
2 We have polylog dependence on the decoded length ñ.
3 Defined in definition 2.

1.2 Overview of the algorithm


Our algorithm is built on and modified from the LCS with threshold algorithm of [22], which decides whether a common substring of length at least d exists in time Õ(n^{2/3}/d^{1/6−o(1)}).
As stated in the introduction, the main obstacle we face is the difference
between encoded length and decoded length. To overcome this difficulty, we
perform “binary searches” on both encoded length and decoded length with a
nested search. The outer loop is a binary search on the decoded length of the an-
swer d̃ ∈ [ñ]. In each iteration of this binary search, we check whether a common substring of decoded length at least d̃ exists. The inner loop searches over encoded lengths d = n/2, n/4, n/8, . . . . In each iteration of the inner loop, we check whether a common substring with encoded length in [d, 2d] and decoded length at least d̃ exists. Running these two loops gives us an O(log(n) log(ñ)) = Õ(1)
overhead.
Note that there is a subtle issue in our inner loop search: unlike the original
LCS problem, where having a common substring of length d guarantees that
there is a common substring of length d − 1, in our LCS between RLE problem,
there might be no common substring of encoded length d − 1 and decoded length d̃, even though there is a common substring of encoded length d and decoded length d̃. Therefore, in our inner-loop search, we need to search over every possible encoded string length. We accomplish this by modifying the algorithms of [22] so that instead of just checking d, they check the range [d, 2d], and we loop over d = n/2, n/4, n/8, . . . .
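To illustrate the loop structure just described, here is a purely classical Python skeleton of the nested search; check_range is a hypothetical stand-in for the quantum-walk subroutine of section 3, so this sketches only how the two searches interact, not the algorithm itself.

```python
def lcs_rle_outer(n, n_tilde, check_range):
    """Sketch of the nested search.  check_range(d, d_tilde) is a hypothetical
    oracle reporting whether a common substring with encoded length in [d, 2d]
    and decoded length >= d_tilde exists."""
    lo, hi, best = 1, n_tilde, 0
    while lo <= hi:                      # outer loop: binary search on decoded length
        d_tilde = (lo + hi) // 2
        found = False
        d = n // 2
        while d >= 1:                    # inner loop: encoded length d = n/2, n/4, ...
            if check_range(d, d_tilde):
                found = True
                break
            d //= 2
        if found:
            best = d_tilde
            lo = d_tilde + 1
        else:
            hi = d_tilde - 1
    return best
```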
Similarly to [22], in each iteration of our inner loop search, we run Ambainis’
element distinctness algorithm [2] on an anchor set of A$B, where A and B are
the input RLE strings of length n. Roughly speaking, a d-anchor set of a
concatenated string A$B is a subset of A$B such that if a common substring of
length d, A[i1 : i1 + d − 1] = B[i2 : i2 + d − 1], exists, the respective copies of the
common substring in A and B will be “anchored” at the same positions, meaning
that there exists a shift 0 ≤ h ≤ d such that both i1 + h and n + 1 + i2 + h are
in the anchor set. The d-anchor set has size roughly n/d, and its elements can be computed in time √d by the construction in [22]. In each iteration of our inner-loop search, we run a quantum walk on the elements of the d-anchor set and check for a "collision": a pair of anchored positions in A and B that can be extended backward and forward into a pair of common substrings with encoded length in [d, 2d] and decoded length at least d̃. To check both conditions on the encoded length and the decoded length, we have a delicate checking procedure: to check everything with encoded length in [d, 2d], we search over all possible shifts of the anchor in encoded length; to efficiently check the decoded length, we store data about the lexicographically sorted decoded prefixes and suffixes of the stored anchors in the data structure.

2 Preliminaries
2.1 Conventions and Notations
We abbreviate both "run-length encoding" and "run-length-encoded" to "RLE".
We use tilde ( e· ) to denote decoded strings and their properties, while notations
without tilde refer to their RLE counterparts. We use calligraphic letters (e.g.,
A and B) to denote algorithms, use teletype letters (e.g., a) to denote strings or
character literals, and use sans-serif letters (e.g., LCS) to denote problems. We
count indices from 1. By [m], we mean the set {1, 2, . . . , m}. The asymptotic
notations Õ( · ) and Ω̃( · ) hide polylog(n) and polylog(ñ) factors, where n is
the encoded length of the input, and ñ is the decoded length of the input.
We say that a quantum algorithm succeeds with high probability if its success
probability is at least Ω(1 − 1/poly(n)).

Strings. A string s̃ ∈ Σ∗ is a sequence of characters over a character set


Σ. The length of a string s̃ is denoted as |s̃|. For a string s̃ of length n, a
substring of s̃ is defined as s̃[i : j] := s̃[i′ : j ′ ] = s̃[i′ ]s̃[i′ + 1] . . . s̃[j ′ ], where
i′ = max(1, i) and j ′ = min(n, j). I.e. it starts at the i′ -th character and ends at
the j ′ -th character. If i > j, we define s̃[i : j] as an empty string ǫ. s̃R denotes
the reverse of s̃: s̃R = s̃[n]s̃[n − 1] . . . s̃[1].
s̃ ≺ t̃ denotes that s̃ is lexicographically smaller than t̃. We use ⪯, ≻, and ⪰ analogously.

2.2 Run-Length Encoding


Run-Length encoding (RLE) of a string s̃, denoted as s, is a sequence of runs
of identical characters s[1]s[2] · · · s[n], where s[i] is a maximal run of identical
characters, n is the length of s, i.e. the number of such runs. For a run s[i],
R(s[i]) is its length and C(s[i]) denotes the unique character comprising the run.
When we write out s explicitly, we write each s[i] in the format C(s[i])^{R(s[i])}, with C(s[i]) in a teletype font (e.g., a^3). Equivalently, each run s[i] can be represented as a character-length pair (C(s[i]), R(s[i])). When there exist i and j ≥ i such that t = s[i : j], we call t a substring of s. In addition, we define
generalized substring of an RLE string as follows:
Definition 1 (Generalized substring of an RLE String). For two RLE strings
s and t, we say s is a generalized substring of t if s̃ is a substring of t̃.
For example, for t = a^3 b^4 c^2 d^5, the RLE string b^4 c^2 is a substring as well as a generalized substring, while a^1 b^4 c^2 d^2 is a generalized substring but not a substring.
Our algorithm needs to know the location of a run of an RLE string in
the original string. This is formalized by the ability to query an oracle of the
following prefix-sum function:
Definition 2 (prefix-sum of the runs of an RLE string). For an RLE string s, P_s[i] is the i-th prefix-sum of the runs, i.e. P_s[i] := Σ_{j=1}^{i} R(s[j]), with P_s[0] := 0.
Intuitively, Ps [i] is the index where s[i], the i-th run in s, ends in the decoded
string s̃. As a consequence, for i ≤ j, the decoded length of s[i : j] is Ps [j] −
Ps [i − 1].
Note that an oracle of prefix-sum can be constructed and stored in QRAM
in linear time while doing the RLE compression, thus constructing it only adds
a constant factor to the preprocessing time.
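As a sketch of this preprocessing step (our own illustration, assuming a plain in-memory table rather than QRAM), the prefix-sum table of definition 2 can be filled in during the same pass that produces the runs.

```python
def rle_with_prefix_sums(s):
    """Compress s and build the prefix-sum table P of definition 2:
    P[0] = 0 and P[i] = total decoded length of the first i runs.
    The decoded length of s[i : j] is then P[j] - P[i - 1]."""
    runs, prefix = [], [0]
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        runs.append((s[i], j - i))
        prefix.append(prefix[-1] + (j - i))  # P[i] = P[i-1] + R(s[i])
        i = j
    return runs, prefix

# "aaabcccdd" -> runs a^3 b^1 c^3 d^2 and P = [0, 3, 4, 7, 9]
print(rle_with_prefix_sums("aaabcccdd"))
```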
Also note that given prefix-sum oracle, the inverse of prefix-sum can be
calculated in O(log n) time:
Lemma 1 (Inverse Prefix-sum, P_S^{-1}). Given a prefix-sum oracle of an RLE string S of encoded length O(n), one can calculate the function P_S^{-1} : [ñ] → [n] that maps indices of S̃, the decoded string, to the corresponding ones of S in O(log n) time.
Proof. Let ĩ ∈ [ñ] be a decoded index. To find the corresponding index i ∈ [n],
we do a binary search over [n] to find the i ∈ [n] such that PS [i − 1] < ĩ ≤ PS [i].
The process is correct since the prefix-sum is strictly increasing.
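A classical sketch of lemma 1 (our own, using Python's bisect for the binary search over the strictly increasing prefix sums):

```python
import bisect

def inverse_prefix_sum(prefix, i_tilde):
    """Given the prefix-sum table P (P[0] = 0, strictly increasing) and a
    decoded index i_tilde in [1, P[n]], return the run index i such that
    P[i-1] < i_tilde <= P[i], as in lemma 1."""
    return bisect.bisect_left(prefix, i_tilde)

P = [0, 3, 4, 7, 9]              # prefix sums of a^3 b^1 c^3 d^2
print(inverse_prefix_sum(P, 5))  # decoded position 5 lies in run 3 (the c-run)
```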
Finally, we will often compute the length of longest decoded common prefix
of two RLE string in our algorithm, so we formalize it as follows:
Definition 3 (length of longest decoded common prefix (ldcp)). For two RLE strings s, t, we define ldcp(s, t) = max{j : s̃[1 : j] = t̃[1 : j]}.
The following lemma follows from a well-known fact.
Lemma 2 (e.g. [25] lemma 1). Given strings s_1 ≺ s_2 ≺ · · · ≺ s_n, we have ldcp(s_1, s_n) = min_{1≤i≤n−1} ldcp(s_i, s_{i+1}).
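For concreteness, here is a classical sketch of ldcp on run-length encoded inputs (our own helper; the paper's algorithm instead locates the first differing run with quantum minimum finding, see section 3.2).

```python
def ldcp(s_runs, t_runs):
    """Length of the longest decoded common prefix of two RLE strings,
    given as lists of (character, run_length) pairs (definition 3).
    Adjacent runs of an RLE string have different characters, so once a run
    differs (in character or length) the common prefix stops there."""
    length = 0
    for (cs, rs), (ct, rt) in zip(s_runs, t_runs):
        if cs != ct:
            return length
        if rs != rt:
            return length + min(rs, rt)
        length += rs
    return length

print(ldcp([('b', 4), ('c', 5)], [('b', 4), ('c', 2)]))  # -> 6 (bbbbcc)
```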

2.3 Computation Model


Quantum Oracle. Let S be an RLE string. In a quantum algorithm, we
access an RLE string S via querying the oracle OS . More precisely,

O_S : |i⟩ |c⟩_char |r⟩_run ↦ |i⟩ |c ⊕ C(S[i])⟩_char |r ⊕ R(S[i])⟩_run        (1)

is a unitary mapping for any i ∈ [|S|], any c ∈ Σ, and any r ∈ [ñ]. The
corresponding prefix-sum PS (see Definition 2) can be accessed from the unitary
mapping
O_P : |i⟩ |x⟩ ↦ |i⟩ |x ⊕ P_S[i]⟩ ,        (2)
for any i ∈ {0} ∪ [|S|] and any x ∈ [ñ].

Word RAM model. We assume basic arithmetic and comparison oper-


ations between two bit strings of length O(log(ñ)) and O(log n) both cost O(1)
quantum time.

2.4 Definitions
Definition 4 (d-anchor set (Definition 4.1, Theorem 4.2, and Theorem 1.1 of [22])). For a concatenated string T = S1$S2 of length n, X = {X(1), X(2), . . . , X(m)} ⊆ [n] is a d-anchor set if either one of the following holds: 1. a common substring S1[i : i + d] = S2[j : j + d] of length d and a shift h ∈ [n] exist such that i + h ∈ X and |S1| + 1 + j + h ∈ X. 2. S1 and S2 do not have a common substring of length d.
The construction of the anchor set X can depend on the contents of S1 and S2. There exists a d-anchor set of size m ≤ n/d^{1−o(1)} whose entries X(i) can be computed using Õ(d^{1/2+o(1)}) quantum time when i ∈ [m] is given.
Definition 5 (Longest Common Substring (LCS)). A string s̃ is a longest common substring (LCS) of strings Ã and B̃ if it is a substring of both, and |s̃| ≥ |t̃| for every common substring t̃.
Definition 6 (LCS Problem on RLE Strings). Given oracle access to two RLE
strings A and B, find the longest common generalized substring s of A and B,
i.e. the RLE of an LCS s̃ between à and B̃, and locate an instance of s in each
input. More precisely, find a tuple (iA , iB , ℓ) such that |s| = ℓ, with an instance
of s starting within the run A[iA ], and another instance of s starting within the
run B[iB ]. We denote this problem by LCS-RLE.
Definition 7 (Decoded Length of LCS on RLE Strings Problem (DL-LCS-RLE)).
Given oracle access to two RLE strings A and B, calculate |s̃| such that s̃ is an
LCS of à and B̃. We denote this problem as DL-LCS-RLE.
Definition 8 (Encoded Length of LCS on RLE Strings Problem). Given oracle
access to two RLE strings A and B, calculate |s| such that s̃ is an LCS of Ã
and B̃. We denote this problem as EL-LCS-RLE.
In section 4 we show a near-linear lower bound on query complexity for
DL-LCS-RLE and a near-linear, Ω(n/log^2 n), lower bound for EL-LCS-RLE. Both
problems are bounded by reductions from the parity problem.
Definition 9 (Parity Problem). Given oracle access to a length-n binary string B ∈ {0, 1}^n, find ⊕_{i=1}^{n} B_i, the parity of B, where ⊕ is addition in Z_2. We denote this problem as PARITY.
With a short reduction, we show that EL-LCS-RLE and LCS-RLE share the
same lower bound on query complexity (corollary 1). As a result, we loosen the
requirement and assume that the oracle of prefix-sum of the inputs is also given.
Our main algorithm solves the LCS problem with the prefix-sum oracle pro-
vided, formalized below.
Definition 10 (LCS Problem on RLE Strings, with Prefix-sum Oracles). Given
oracle access to two RLE strings A and B and prefix-sums of their runs, PA and
PB , find an RLE string s, such that s̃ is an LCS of their decoded counterparts
à and B̃. More precisely, the algorithm outputs the same triplet as the one for
LCS-RLE. We denote this problem as LCS-RLEp .

Definition 11 (Longest Repeated Substring problem on RLE string). Given oracle access to an RLE string A and the prefix-sums of its runs, P_A, find an RLE string s, such that s̃ is a longest repeated substring of the decoded string Ã. More precisely, the algorithm outputs (i_1, i_2, ℓ), the two heads of the generalized substrings and their encoded length.
The longest repeated substring of a string s̃ of size ñ is a string t̃ = s̃[i : i + ℓ − 1] = s̃[j : j + ℓ − 1] for distinct i, j ∈ [ñ] with the maximum possible ℓ.

2.5 Primitives
Grover’s search ([13]). Let f : [n] → {0, 1} be a function. There is
a quantum algorithm A that finds an element x ∈ [n] such that f (x) = 1 or
verifies the absence of such an element. A succeeds with probability at least 2/3 and has time complexity Õ(√n · T), where T is the complexity of computing
f (i).

Amplitude amplification ([5], [14]). Let A be a quantum algorithm


that solves a decision problem with one-sided error and success probability p ∈
(0, 1) in T quantum time. There is another quantum algorithm B that solves
the same decision problem with one-sided error and success probability at least

2/3 in Õ(T/√p) quantum time.

Minimum finding ([8]). Let f : [n] → X be a function, where X is a set


with a total order. There is a quantum algorithm A that finds an index i ∈ [n]
such that f(i) ≤ f(j) for all j ∈ [n]. A succeeds with probability at least 2/3 and costs Õ(√n · T) time, where T is the time to compare f(i) to f(j) for any
i, j ∈ [n].

Element distinctness ([2], [27]).4 Let X and Y be two lists of size n and f : (X ∪ Y) → N be a function. There is a quantum algorithm A that finds an x ∈ X and a y ∈ Y such that f(x) = f(y). A succeeds with probability at least 2/3 and costs Õ(n^{2/3} · T(n)) time; T(n) is the time to make the three-way comparison between f(a) and f(b) for any a, b ∈ X ∪ Y.

4 The definition here is also known as claw finding. The time upper bound is obtained in [27, Section 2.1]. We also explain it in section 3.1.
Lemma 3 (2D range sum (Lemma 3.15 of [1])). Let K be a set of r (possibly
duplicated) points in [n] × [n]. There exists a history-independent data structure
of K that, with 1 ≤ x1 ≤ x2 ≤ n and 1 ≤ y1 ≤ y2 ≤ n given, returns the number
of points in [x1 , x2 ] × [y1 , y2 ] using Õ(1) time. Also, entries can be inserted into
and deleted from the data structure in Õ(1) time.
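As a purely classical illustration of the interface of lemma 3 (our own simplification: a naive point list, not the history-independent Õ(1)-time structure the lemma provides):

```python
class RangeCount2D:
    """Naive classical stand-in for the 2D range-sum structure of lemma 3:
    insertion, deletion, and counting the points inside [x1, x2] x [y1, y2].
    The real structure is history-independent and answers in ~polylog time."""
    def __init__(self):
        self.points = []                 # multiset of (x, y) pairs

    def insert(self, x, y):
        self.points.append((x, y))

    def delete(self, x, y):
        self.points.remove((x, y))

    def count(self, x1, x2, y1, y2):
        return sum(1 for (x, y) in self.points
                   if x1 <= x <= x2 and y1 <= y <= y2)

rc = RangeCount2D()
rc.insert(2, 3); rc.insert(5, 1); rc.insert(2, 3)
print(rc.count(1, 4, 2, 4))  # -> 2 (the duplicated point (2, 3))
```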
Lemma 4 (Dynamic array (Lemma 3.14 in [22])). There is a history-independent
data structure of size Õ(r) that maintains an array of key-value pairs (key1 , value1 )
. . . (keyr , valuer ) with distinct keys and supports the following operations with
worst-case Õ(1) time complexity and high success probability:
4 The definition here is also known as claw finding. The time upper bound is obtained in

[27, Section 2.1]. We also explain it in section 3.1

8
• Indexing: Given an index 1 ≤ i ≤ r, return the i-th key-value pair.
• Insertion: Given an index 1 ≤ i ≤ r + 1 and a new pair, insert it into the array between the (i − 1)-th and the i-th pair and shift the later items to the right.
• Deletion: Given an index 1 ≤ i ≤ r, delete the i-th pair from the array
and shift later pairs to the left.
• Location: Given a key, return its index in the array.
• Range-minimum query: Given 1 ≤ a ≤ b ≤ r, return min_{a≤i≤b} {value_i}.
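Purely as a classical illustration of this interface (our own simplification that ignores the history-independence and the worst-case Õ(1) bounds of lemma 4):

```python
class DynamicArraySketch:
    """Classical sketch of the operations of lemma 4: indexing, insertion,
    deletion, key location, and range-minimum queries over (key, value)
    pairs.  The real structure achieves all of these in ~polylog worst-case
    time and is history-independent; this list-based version is linear."""
    def __init__(self):
        self.pairs = []                          # ordered list of (key, value)

    def index(self, i):                          # 1-indexed, as in the paper
        return self.pairs[i - 1]

    def insert(self, i, key, value):
        self.pairs.insert(i - 1, (key, value))

    def delete(self, i):
        del self.pairs[i - 1]

    def locate(self, key):
        return 1 + next(j for j, (k, _) in enumerate(self.pairs) if k == key)

    def range_min(self, a, b):
        return min(v for (_, v) in self.pairs[a - 1:b])

d = DynamicArraySketch()
d.insert(1, "x", 7); d.insert(2, "y", 3); d.insert(3, "z", 5)
print(d.range_min(1, 3), d.locate("y"))          # -> 3 2
```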
Lemma 5 (Boost to high success probability). Let A be a bounded-error quan-
tum algorithm with time complexity O(T ). By repeating A for O(log n) times
then outputting the majority of the outcomes, we can boost the success probability
of A to Ω(1 − 1/poly(n)) with overall time complexity O(T · log n).
Lemma 5 enables us to do Grover's search over the outcomes of applying A on
different inputs, because quantum computational errors accumulate linearly.5

3 LCS from two RLE strings with Prefix-sum


Oracles
3.1 Quantum walk search
We use the quantum walk framework of [28] on a Johnson graph.
A Johnson graph, denoted as J(m, r), where r is a number to be chosen later, consists of \binom{m}{r} vertices, each being an r-sized subset R of a list S of size m. In J(m, r), two vertices R1 and R2 are connected iff |R1 ∩ R2| = r − 1.
Associated with each vertex R is a data structure D(R) that supports three
operations: setup, update, and checking; whose costs are denoted by s(r), u(r),
and c(r), respectively. The setup operation initializes the D(R) for any vertex
R; the update operation transforms D(R) into D(R′ ) which is associated with
a neighboring vertex R′ of R in the graph; and the checking operation checks
whether the vertex R is marked, where the meaning of marked will be defined
later. The MNRS quantum walk search algorithm can be summarized as:
Theorem 2 (MNRS Quantum Walk search [28]). Assume the fraction of the
marked vertices is zero or at least δ. Then there is a quantum algorithm that
always rejects when no marked vertex exists; otherwise, with high probability, it
finds a marked vertex R. The algorithm has complexity

Õ( s(r) + (1/√δ) · ( √r · u(r) + c(r) ) ) .        (3)
5 In fact, it is possible to apply Grover's search over a bounded-error verifier without the logarithmic overhead [18].

Remark 1. As noted in [2], the data structure associated with the data of each vertex of the quantum walk needs to be history-independent, i.e. the form of the data stored in the data structure is independent of the history of insertions and deletions used to aggregate these data.

3.2 The algorithm


Theorem 3 (Algorithm for LCS-RLEp ). Given oracle access to RLE strings A
and B, and their prefix-sums, there exists a quantum algorithm A that, with
high probability, finds a 3-tuple (iA , iB , |s|) that identifies a longest common
generalized substring (see definition 1) s between A and B if it exists; otherwise
A rejects. A has a time cost Õ(n^{2/3}/d^{1/6−o(1)}) · O(log ñ), where n and ñ are the encoded length and the decoded length of the input strings, respectively.
Proof. We give a constructive algorithm here. The high-level structure of the algorithm is summarized as Algorithm 1.
The algorithm runs two loops. The outer loop binary searches over d̃ ∈ [ñ]. In each iteration of this binary search, we check whether a common substring of decoded length at least d̃ exists. The inner loop searches over encoded lengths d = n/2, n/4, n/8, . . . . In each iteration of the inner loop, we check whether a common substring with encoded length in [d, 2d] and decoded length at least d̃ exists. Running these two loops gives us an O(log(n) log(ñ)) = Õ(1) overhead.

Algorithm 1: Algorithm for LCS-RLEp


Input: RLE strings A, B, their decoded length ñ, encoded length n,
and prefix-sums PA and PB .
Output: (iA , iB , |s|) associated with the longest common generalized
substring of A and B
1 BinarySearch for the maximal d̃ ∈ [ñ] such that search_flag = 1
2   search_flag ← 0
3   for encoded length d ∈ {n/2, n/4, n/8, . . .} do
4     (i_A, i_B, |s|) ← Quantum Walk search on a d-anchor set X of size m = n/d^{1−o(1)} with r = O(m^{2/3}), for anchors i_A, i_B of a common generalized substring s with |s̃| ≥ d̃ and |s| ∈ [d, 2d]
5     if line 4 found a marked item then
6       search_flag ← 1
7       Record (i_A, i_B, |s|)
8       break
9 return (i_A, i_B, |s|)

Let X be the d-anchor set on the concatenated RLE string S = A $^1 B. As stated in definition 4, X has size m = n/d^{1−o(1)}. Let S̃ be the decoded string of S. For an index k ∈ [m], we define the following decoded "prefix" and "suffix" strings of encoded length 2d:

P(k) = S̃[P_S(X(k) − 1) + 1 : P_S(X(k) + 2d)]        (4)

Q(k) = ( S̃[P_S(X(k) − 2d − 1) + 1 : P_S(X(k))] )^R ,        (5)

where the prefix-sum oracle P_S[ · ] is defined in definition 2.


To check whether a common substring with encoded length in [d, 2d] and decoded length at least d̃ exists, we run the MNRS quantum walk of theorem 2 on the Johnson graph J(m, r), where each vertex represents a subset of r items out of the m items in the anchor set X. For each vertex on the Johnson graph,
we store the following data in the associated data structures:
1. Indices (in the anchor set) of the r chosen points sorted according to their values: (k_1, k_2, . . . , k_r) ∈ [m]^r such that k_i < k_{i+1} for all i.
2. The corresponding positions of the chosen anchors on the encoded string: X(k_1), . . . , X(k_r) ∈ [|S|].
3. The indices (k_1, k_2, . . . , k_r) sorted according to the decoded string after them: an array (k_1^P, k_2^P, . . . , k_r^P), which is a permutation of (k_1, k_2, . . . , k_r), satisfying P(k_i^P) ⪯ P(k_{i+1}^P) for all i.
4. The array of lengths of ldcp between consecutive k_i^P: (h_1^P, h_2^P, . . . , h_{r−1}^P), where h_i^P = ldcp(P(k_i^P), P(k_{i+1}^P)).6
5. The indices (k_1, k_2, . . . , k_r) sorted according to the decoded string before them: an array (k_1^Q, k_2^Q, . . . , k_r^Q), which is a permutation of (k_1, k_2, . . . , k_r), satisfying Q(k_i^Q) ⪯ Q(k_{i+1}^Q) for all i.
6. The array of lengths of ldcp between consecutive k_i^Q: (h_1^Q, h_2^Q, . . . , h_{r−1}^Q), where h_i^Q = ldcp(Q(k_i^Q), Q(k_{i+1}^Q)).
7. Additional data needed to apply lemma 6.
We store ((k_1, X(k_1)), (k_2, X(k_2)), . . . , (k_r, X(k_r))), (k_1^P, k_2^P, . . . , k_r^P), (h_1^P, h_2^P, . . . , h_{r−1}^P), (k_1^Q, k_2^Q, . . . , k_r^Q), and (h_1^Q, h_2^Q, . . . , h_{r−1}^Q) in 5 different dynamic arrays of lemma 4.
For every index k ∈ [m], we assign a color to it to specify whether the anchor
is on string A or string B: if X(k) ≤ n, we say k is red. If X(k) ≥ n + 2, we say
k is blue. If X(k) = n + 1, we say k is white. Since we store X(k_1), . . . , X(k_r), we can look up the color of each k_i in Õ(1) time.
For every i ∈ [r], we define pos_P(i) as the index j such that k_i = k_j^P. Similarly, define pos_Q(i) as the index j such that k_i = k_j^Q. Finding pos_P(i) can be done in Õ(1) time with the indexing operation of ((k_1, X(k_1)), (k_2, X(k_2)), . . . , (k_r, X(k_r))) followed by the location operation of (k_1^P, k_2^P, . . . , k_r^P), and similarly for pos_Q(i).
6 Recall that ldcp is defined in definition 3.

Recall that the quantum walk is composed of the setup, update, and check-
ing operations. The setup operation can be done by inserting r elements of the
anchor set into the stored data set. The update operation can be done by insert-
ing an element and deleting an element. Since deletion can be done by reversing
an insertion, both the setup and update operations can be done by multiple ap-
plications of the insertion procedure. This insertion procedure is summarized
in Algorithm 2. The checking operation is summarized in Algorithm 3.

Algorithm 2: The insertion procedure

Input: k ∈ [m] to be inserted
1  Compute X(k)
2  Compute i such that k_i ≤ k < k_{i+1}
3  Update ((k_1, X(k_1)), . . . , (k_r, X(k_r))) ← ((k_1, X(k_1)), . . . , (k_i, X(k_i)), (k, X(k)), (k_{i+1}, X(k_{i+1})), . . . , (k_r, X(k_r)))
4  Compute j such that P(k_j^P) ≤ P(k) < P(k_{j+1}^P)
5  Compute h_p = ldcp(P(k_j^P), P(k))
6  Compute h_s = ldcp(P(k_{j+1}^P), P(k))
7  Compute h_o = ldcp(P(k_j^P), P(k_{j+1}^P))
8  Update (k_1^P, . . . , k_r^P) ← (k_1^P, . . . , k_j^P, k, k_{j+1}^P, . . . , k_r^P)
9  Update (h_1^P, . . . , h_r^P) ← (h_1^P, . . . , h_{j−1}^P, h_p, h_s, h_{j+1}^P, . . . , h_r^P)
10 Compute j, h_p, h_s, h_o for Q
11 Update (k_1^Q, . . . , k_r^Q) ← (k_1^Q, . . . , k_j^Q, k, k_{j+1}^Q, . . . , k_r^Q)
12 Update (h_1^Q, . . . , h_r^Q) ← (h_1^Q, . . . , h_{j−1}^Q, h_p, h_s, h_{j+1}^Q, . . . , h_r^Q)

We now outline the steps involved in Algorithm 2 and analyze its time com-
plexity. To insert an entry k ∈ [m] into the data structure, Algorithm 2 follows
these steps:
1. Compute X(k) in time d^{1/2+o(1)}.
2. Find i such that k_i ≤ k < k_{i+1} in the ordered array (k_1, k_2, . . . , k_r) by binary search in time Õ(1). Insert (k, X(k)) into the i-th position of ((k_1, X(k_1)), (k_2, X(k_2)), . . . , (k_r, X(k_r))) in time Õ(1).
3. Find the position to insert k in the ordered array (k_1^P, . . . , k_r^P) by binary search. In each iteration of the binary search, we need to compare the lexicographical order between P(k) and P(k_j^P) for some j. Since the strings have compressed length O(d), the comparison can be done in time O(√d) by finding the first run where they differ through minimum finding. This step can be done in time Õ(√d).
4. Use minimum finding and the prefix-sum oracle to compute h_p = ldcp(P(k_i^P), P(k)), h_s = ldcp(P(k_{i+1}^P), P(k)), and h_o = ldcp(P(k_i^P), P(k_{i+1}^P)) in O(√d) time. Update (h_1^P, . . . , h_{r−1}^P) by inserting h_p, h_s, and uncompute h_o.
5. Do the same for Q in time Õ(√d).
Therefore the insertion cost is Õ(d^{1/2+o(1)}). To delete an element, we can reverse the insertion in Õ(d^{1/2+o(1)}) time.

Algorithm 3: The checking procedure

1  GroverSearch over d′ ∈ [0, 2d] and r′ ∈ [r]
2    if k_{r′} is red then
3      flag_color ← blue
4    else
5      flag_color ← red
6    L ← P_S(X(k_{r′})) − P_S(X(k_{r′}) − d′ − 1)
7    Find l_Q, r_Q such that ldcp(Q(k_i^Q), Q(k_{r′})) ≥ L if and only if l_Q ≤ i ≤ r_Q
8    Find l_P, r_P such that ldcp(P(k_i^P), P(k_{r′})) ≥ d̃ − L if and only if l_P ≤ i ≤ r_P
9    Check the existence of a j′ ∈ [r] such that the color of k_{j′} is flag_color, l_Q ≤ pos_Q(j′) ≤ r_Q, and l_P ≤ pos_P(j′) ≤ r_P
10   if j′ found then
11     return marked
12 return unmarked

Next, we outline the steps and analyze the time cost of Algorithm 3. To
check whether the subset of r points is marked, we perform a Grover search
over d′ ∈ [0, 2d] and r′ ∈ [r] to determine if the r′-th stored item anchors a common substring s such that r′ is roughly at the d′-th run of s and |s̃| ≥ d̃. The checking for each (d′, r′) can be done by the following sub-algorithm in Õ(1) time:
1. If k_{r′} is red, set flag_color = blue. If k_{r′} is blue, set flag_color = red.
2. Compute L = L(d′) = P_S(X(k_{r′})) − P_S(X(k_{r′}) − d′ − 1).
3. Find l_Q, r_Q such that ldcp(Q(k_i^Q), Q(k_{r′})) ≥ L if and only if l_Q ≤ i ≤ r_Q. To find l_Q, we calculate j′ = pos_Q(r′) and do a binary search to find the minimum l ∈ [j′] such that the range minimum of (h_l^Q, . . . , h_{j′}^Q) is greater than or equal to L. The range-minimum query can be done in Õ(1) time by lemma 4. This guarantees that ldcp(Q(k_i^Q), Q(k_{r′})) ≥ L for l_Q ≤ i ≤ j′ because, by lemma 2, ldcp(Q(k_i^Q), Q(k_{r′})) is equal to the range minimum of (h_i^Q, . . . , h_{j′}^Q). Also, l_Q always exists because h_{j′}^Q ≥ L. Similarly, to find r_Q we do a binary search to find the maximum w ∈ [j′ : r] such that the range minimum of (h_{j′}^Q, . . . , h_w^Q) is greater than or equal to L.
4. Find l_P, r_P such that ldcp(P(k_i^P), P(k_{r′})) ≥ d̃ − L if and only if l_P ≤ i ≤ r_P.
5. Check the existence of a j′ ∈ [r] such that the color of k_{j′} is flag_color, l_Q ≤ pos_Q(j′) ≤ r_Q, and l_P ≤ pos_P(j′) ≤ r_P. If such a j′ exists, return marked. This can be done in Õ(1) time by lemma 6.
If the Grover search does not find a marked item, return unmarked.
The checking cost is Õ(√(rd)) since we are Grover searching over 2dr items.
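Steps 3 and 4 above reduce to the following classical task: given the array h of ldcp values between consecutive stored strings and a start position j′, find the maximal interval around j′ whose range minimum is at least L (lemma 2). A sketch, with the binary search and range-minimum queries of the paper replaced by a linear scan for brevity (function and variable names are ours):

```python
def ldcp_interval(h, j_prime, L):
    """h[i] = ldcp between the i-th and (i+1)-th stored string in sorted order
    (0-indexed here).  Return the maximal interval [lo, hi] of positions whose
    ldcp with position j_prime is >= L.  By lemma 2, that ldcp equals the
    minimum of h over the positions in between; the paper computes this with
    binary search plus range-minimum queries in ~polylog time."""
    lo = j_prime
    while lo > 0 and h[lo - 1] >= L:
        lo -= 1
    hi = j_prime
    while hi < len(h) and h[hi] >= L:
        hi += 1
    return lo, hi

h = [5, 2, 7, 7, 1]              # ldcp values between consecutive sorted strings
print(ldcp_interval(h, 3, 3))    # -> (2, 4): positions 2..4 share an ldcp >= 3 with position 3
```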
Note that the first run of an RLE-compressed common substring does not necessarily equal the corresponding run of either input string, because one of them can be longer. If a common substring exists whose first-run length matches the length of the corresponding run of the red string, with encoded length in [d, 2d] and decoded length at least d̃, it will be anchored by the d-anchor set with some shift h, so when d′ = h we will find a collision with k_{r′} being red. Similarly, for common substrings whose first-run length matches the length of the corresponding run of the blue string, we will find a collision with k_{r′} being blue.
Finally we summarize the costs of the quantum walk. The setup can be done by r insertions, so the cost is s(r) = O(r · d^{1/2+o(1)}). The update is done by an insertion and a deletion, so the update cost is u(r) = O(d^{1/2+o(1)}). The checking cost is Õ(√(rd)).
The fraction of marked vertices δ is lower bounded by Ω(r^2/m^2) since in the worst case there is only one marked vertex, and thus \binom{m−1}{r−1}^2 out of \binom{m}{r}^2 pairs of subsets are marked.
subsets are marked.
Putting things together, by choosing r = O(m^{2/3}) = Õ(n^{2/3}/d^{1/3−o(1)}), the time complexity of the algorithm is

Õ( r · d^{1/2+o(1)} + √(m^2/r^2) · ( √r · d^{1/2+o(1)} + √(rd) ) ) = Õ(n^{2/3}/d^{1/6−o(1)}).

After finding a marked vertex, we do another checking operation on the marked subset to find the anchors X(k_{r′}), X(k_{j′}), and the shift d′. Then we compute i_A = X(k_{r′}) − d′, i_B = X(k_{j′}) − d′, and ℓ = P_A^{-1}(P_A[i_A] + d̃), where P_A^{-1} is the inverse of the prefix-sum oracle.7 Finally, we output (i_A, i_B, ℓ).

7 P_A^{-1} can be computed in Õ(1) time by binary search.
Lemma 6 (Algorithms 3 and 4 in [22]). Item 5 of the checking procedure of the quantum walk can be implemented in Õ(1) time with some modifications to the algorithm.
Proof. We apply the following modifications.
Before the quantum walk, we sample an r-subset V of [m], and denote the ranking of k_i^P among {P(v) | v ∈ V} as ρ_P(k_i^P). Define ρ_Q( · ) similarly. We store V in lexicographical order. This requires O(r√d) time, which is of the same order as the setup cost.
Without loss of generality, let flag_color be blue. To check whether a blue k_{j′} exists, we maintain a dynamic 2D range-sum data structure (lemma 3) to store (ρ_P(k), ρ_Q(k)) for blue k. In the checking operation, after finding l_P, r_P, l_Q, and r_Q, we check whether the range [ρ_P(k_{l_P}) + 1 . . . ρ_P(k_{r_P}) − 1] × [ρ_Q(k_{l_Q}) + 1 . . . ρ_Q(k_{r_Q}) − 1] is non-zero in the dynamic 2D range-sum data structure. If so, such a k_{j′} exists. Otherwise, we explicitly check, for at most O(log m) blue k with ρ_P(k) ∈ {ρ_P(k_{l_P}), ρ_P(k_{r_P})} or ρ_Q(k) ∈ {ρ_Q(k_{l_Q}), ρ_Q(k_{r_Q})}, whether the lexicographical rankings of its prefix and suffix are in [l_P, r_P] and in [l_Q, r_Q], respectively. The insertion and deletion to the 2D range-sum data structure can be done in Õ(1) time. The checking algorithm uses Õ(1) time and has 1/poly(m) one-sided error.

Remark 2. By dropping the red-blue constraint in the checking step, and re-
ceiving only one input string S of encoded length n, we can adapt theorem 3 to
solve the Longest Repeated Substring problem.

4 Lower Bounds
In this section, we first show a query lower bound for LCS-RLEp . We then
investigate the time complexity lower bound for calculating the length (encoded and decoded) of the longest common substring of two RLE strings without access to prefix-sum oracles, and use the results to show that the lower bound for finding an LCS from two RLE strings is Ω̃(n), which is our motivation and justification for introducing the prefix-sum oracle.

4.1 Lower Bound on LCS-RLEp


Lemma 7 (Lower Bound of LCS-RLEp). Any quantum oracle algorithm A requires at least Ω̃(n^{2/3}/d^{1/6}) queries to solve LCS-RLEp with probability at least 2/3.
Proof. Note that for every string A = a_1 a_2 . . . , we can insert a special character @ to create a string A_@ = a_1 @ a_2 @ . . . , so that A_@ cannot be RLE compressed. Also note that the length of the LCS between any pair (A_@, B_@) is exactly twice the length of the LCS between (A, B), so any lower bound on LCS is also a lower bound on LCS-RLEp. The prefix-sum oracle makes no difference since it can be calculated by multiplying the index of the runs by 2. Therefore, this lower bound follows from Theorem 1.2 of [22].
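A small Python sketch of the padding used in this proof (our own illustration):

```python
def pad_with_at(a):
    """Interleave the separator '@' after every character, as in the proof of
    lemma 7: "abc" -> "a@b@c@".  The padded string has no repeated adjacent
    characters, so its RLE consists solely of runs of length 1."""
    return "".join(ch + "@" for ch in a)

A_pad = pad_with_at("aabba")
print(A_pad)                                                        # a@a@b@b@a@
print(all(A_pad[i] != A_pad[i + 1] for i in range(len(A_pad) - 1)))  # True: no run longer than 1
```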

4.2 Lower Bound on DL-LCS-RLE


In this section we show how to reduce PARITY to DL-LCS-RLE, obtaining the
following result.

Lemma 8 (Lower Bound of DL-LCS-RLE). Any quantum oracle algorithm A requires at least Ω(n) queries to solve DL-LCS-RLE, with probability at least 2/3.

The main idea is to encode an n-bit binary string B = B_1 B_2 . . . B_n as an RLE string S_B, in which R(S_B[i]) = 2 + B_i. Then, using A, we find the length of the LCS of S_B with itself. Here, what A outputs is basically the decoded length of S_B, i.e. |S̃_B|. From that, we can calculate the parity of B easily, and thus lemma 8 is proven.
Proof. Given an n-bit binary string B, we can construct an RLE string S_B as:

S_B = a^{B_1+2} b^{B_2+2} a^{B_3+2} b^{B_4+2} · · · γ^{B_n+2} ,        (6)

where γ is a if n is odd, otherwise it is b.
Then we assume the algorithm A exists. With S_B as the inputs, the output of A, the decoded length of the LCS between S_B and itself, is

A(S_B, S_B) = |S̃_B| = Σ_{i=1}^{n} (B_i + 2) = 2n + Σ_{i=1}^{n} B_i ,        (7)

which has the same parity as ⊕_{i∈[n]} B_i, i.e. the parity of B. Therefore, by checking the lowest bit of A(S_B, S_B), we can solve PARITY with no extra query. Since solving PARITY requires Ω(n) queries, solving DL-LCS-RLE needs at least the same number of queries.
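A classical sketch of this reduction (our own illustration; the oracle's answer is simulated here by summing the run lengths directly):

```python
def build_S_B(bits):
    """Construct the RLE string S_B of eq. (6): run i has character a/b
    (alternating) and run length B_i + 2."""
    return [("a" if i % 2 == 0 else "b", b + 2) for i, b in enumerate(bits)]

def parity_from_decoded_length(decoded_len):
    """|S~_B| = 2n + sum(B_i) by eq. (7), so the parity of B is its lowest bit."""
    return decoded_len % 2

B = [1, 0, 1, 1]
S_B = build_S_B(B)
decoded_len = sum(r for _, r in S_B)        # what the DL-LCS-RLE algorithm would return
print(S_B)                                   # [('a', 3), ('b', 2), ('a', 3), ('b', 3)]
print(parity_from_decoded_length(decoded_len), sum(B) % 2)  # both 1
```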

References
[1] Shyan Akmal and Ce Jin. Near-optimal quantum algorithms for string
problems. In Joseph (Seffi) Naor and Niv Buchbinder, editors, Proceedings
of the 2022 ACM-SIAM Symposium on Discrete Algorithms, SODA 2022,
Virtual Conference / Alexandria, VA, USA, January 9 - 12, 2022, pages
2791–2832. SIAM, 2022. doi:10.1137/1.9781611977073.109.
[2] Andris Ambainis. Quantum walk algorithm for element distinctness. SIAM
J. Comput., 37(1):210–239, 2007. doi:10.1137/S0097539705447311.
[3] Alberto Apostolico, Gad M. Landau, and Steven Skiena. Match-
ing for run-length encoded strings. J. Complex., 15(1):4–16, 1999.
doi:10.1006/jcom.1998.0493.
[4] Robert S. Boyer and J. Strother Moore. A fast string searching algorithm.
Commun. ACM, 20(10):762–772, 1977. doi:10.1145/359842.359859.
[5] Gilles Brassard and Peter Høyer. An exact quantum polynomial-time
algorithm for Simon's problem. In Fifth Israel Symposium on The-
ory of Computing and Systems, ISTCS 1997, Ramat-Gan, Israel, June
17-19, 1997, Proceedings, pages 12–23. IEEE Computer Society, 1997.
doi:10.1109/ISTCS.1997.595153.
[6] Kuan-Yu Chen and Kun-Mao Chao. A fully compressed algorithm for
computing the edit distance of run-length encoded strings. Algorithmica,
65(2):354–370, 2013. doi:10.1007/s00453-011-9592-4.

[7] Raphaël Clifford, Pawel Gawrychowski, Tomasz Kociumaka, Daniel P. Mar-
tin, and Przemyslaw Uznanski. RLE edit distance in near optimal time.
In Peter Rossmanith, Pinar Heggernes, and Joost-Pieter Katoen, editors,
44th International Symposium on Mathematical Foundations of Computer
Science, MFCS 2019, August 26-30, 2019, Aachen, Germany, volume 138
of LIPIcs, pages 66:1–66:13. Schloss Dagstuhl - Leibniz-Zentrum für Infor-
matik, 2019. doi:10.4230/LIPIcs.MFCS.2019.66.
[8] Christoph Dürr and Peter Høyer. A quantum algorithm for
finding the minimum. CoRR, quant-ph/9607014, 1996. URL:
http://arxiv.org/abs/quant-ph/9607014.
[9] Martin Farach. Optimal suffix tree construction with large alphabets. In
38th Annual Symposium on Foundations of Computer Science, FOCS ’97,
Miami Beach, Florida, USA, October 19-22, 1997, pages 137–143. IEEE
Computer Society, 1997. doi:10.1109/SFCS.1997.646102.
[10] Edward Fredkin. Trie memory. Commun. ACM, 3(9):490–499, 1960.
doi:10.1145/367390.367400.
[11] Daniel Gibney, Ce Jin, Tomasz Kociumaka, and Sharma V. Thankachan.
Near-optimal quantum algorithms for bounded edit distance and lempel-
ziv factorization. In David P. Woodruff, editor, Proceedings of the 2024
ACM-SIAM Symposium on Discrete Algorithms, SODA 2024, Alexan-
dria, VA, USA, January 7-10, 2024, pages 3302–3332. SIAM, 2024.
doi:10.1137/1.9781611977912.118.
[12] Daniel Gibney and Sharma V. Thankachan. Compressibility-aware
quantum algorithms on strings. CoRR, abs/2302.07235, 2023.
arXiv:2302.07235, doi:10.48550/arXiv.2302.07235.
[13] Lov K. Grover. A fast quantum mechanical algorithm for database
search. In Gary L. Miller, editor, Proceedings of the Twenty-Eighth
Annual ACM Symposium on the Theory of Computing, Philadelphia,
Pennsylvania, USA, May 22-24, 1996, pages 212–219. ACM, 1996.
doi:10.1145/237814.237866.
[14] Lov K. Grover. Quantum computers can search rapidly by using
almost any transformation. Phys. Rev. Lett., 80:4329–4332, 5 1998.
URL: https://link.aps.org/doi/10.1103/PhysRevLett.80.4329,
doi:10.1103/PhysRevLett.80.4329.

[15] Ramesh Hariharan and V. Vinay. String matching in Õ(√n + √m) quantum time. J. Discrete Algorithms, 1(1):103–110, 2003. doi:10.1016/S1570-8667(03)00010-8.
[16] Stuart C. Hinds, James L. Fisher, and Donald P. D’Amato. A document
skew detection method using run-length encoding and the hough transform.
In 10th IAPR International Conference on Pattern Recognition, Conference

A: Computer Vision & Conference B Pattern recognition systems and ap-
plications, ICPR 1990, Atlantic City, NJ, USA, 16-21 June, 1990, Volume
1, pages 464–468. IEEE, 1990. doi:10.1109/ICPR.1990.118147.
[17] Sahar Hooshmand, Neda Tavakoli, Paniz Abedin, and Sharma V.
Thankachan. On computing average common substring over run
length encoded sequences. Fundam. Informaticae, 163(3):267–273, 2018.
doi:10.3233/FI-2018-1743.
[18] Peter Høyer, Michele Mosca, and Ronald de Wolf. Quantum search on
bounded-error inputs. In Jos C. M. Baeten, Jan Karel Lenstra, Joachim
Parrow, and Gerhard J. Woeginger, editors, Automata, Languages and
Programming, 30th International Colloquium, ICALP 2003, Eindhoven,
The Netherlands, June 30 - July 4, 2003. Proceedings, volume 2719
of Lecture Notes in Computer Science, pages 291–299. Springer, 2003.
doi:10.1007/3-540-45061-0\_25.
[19] ISO. ISO/IEC 10918-1:1994: Information technology — Digital compres-
sion and coding of continuous-tone still images: Requirements and guide-
lines. International Organization for Standardization, Geneva, Switzerland,
1994. URL: http://www.iso.ch/cate/d18902.html.
[20] ISO. ISO 12639:1998: Graphic technology — Prepress digital data ex-
change — Tag image file format for image technology (TIFF/IT). Interna-
tional Organization for Standardization, Geneva, Switzerland, 1998. URL:
http://www.iso.ch/cate/d2181.html.
[21] ITU-T. T.4 : standardization of group 3 facsimile terminals
for document transmission. Recommendation E 24901, In-
ternational Telecommunication Union, February 2004. URL:
https://www.itu.int/rec/T-REC-T.4-200307-I/en.
[22] Ce Jin and Jakob Nogler. Quantum speed-ups for string synchronizing sets,
longest common substring, and k-mismatch matching. In Proceedings of
the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA),
pages 5090–5121. SIAM, 2023.
[23] Richard M. Karp and Michael O. Rabin. Efficient randomized
pattern-matching algorithms. IBM J. Res. Dev., 31(2):249–260, 1987.
doi:10.1147/rd.312.0249.
[24] Dominik Kempa and Tomasz Kociumaka. String synchronizing sets:
sublinear-time BWT construction and optimal LCE data structure. In
Moses Charikar and Edith Cohen, editors, Proceedings of the 51st An-
nual ACM SIGACT Symposium on Theory of Computing, STOC 2019,
Phoenix, AZ, USA, June 23-26, 2019, pages 756–767. ACM, 2019.
doi:10.1145/3313276.3316368.

[25] Carmel Kent, Moshe Lewenstein, and Dafna Sheinwald. On demand string
sorting over unbounded alphabets. Theoretical Computer Science, 426:66–
74, 2012.
[26] Donald E. Knuth, James H. Morris Jr., and Vaughan R. Pratt. Fast
pattern matching in strings. SIAM J. Comput., 6(2):323–350, 1977.
doi:10.1137/0206024.
[27] François Le Gall and Saeed Seddighin. Quantum meets fine-grained com-
plexity: Sublinear time quantum algorithms for string problems. In
Mark Braverman, editor, 13th Innovations in Theoretical Computer Sci-
ence Conference, ITCS 2022, January 31 - February 3, 2022, Berkeley, CA,
USA, volume 215 of LIPIcs, pages 97:1–97:23. Schloss Dagstuhl - Leibniz-
Zentrum für Informatik, 2022. doi:10.4230/LIPIcs.ITCS.2022.97.
[28] Frédéric Magniez, Ashwin Nayak, Jérémie Roland, and Miklos Santha.
Search via quantum walk. SIAM J. Comput., 40(1):142–164, 2011.
doi:10.1137/090745854.
[29] Saul B. Needleman and Christian D. Wunsch. A general method ap-
plicable to the search for similarities in the amino acid sequence of
two proteins. Journal of Molecular Biology, 48(3):443–453, 1970. URL:
https://www.sciencedirect.com/science/article/pii/0022283670900574,
doi:10.1016/0022-2836(70)90057-4.
[30] Uzi Vishkin. Deterministic sampling-a new technique for fast pattern
matching. In Harriet Ortiz, editor, Proceedings of the 22nd Annual ACM
Symposium on Theory of Computing, May 13-17, 1990, Baltimore, Mary-
land, USA, pages 170–180. ACM, 1990. doi:10.1145/100216.100235.
[31] Peter Weiner. Linear pattern matching algorithms. In 14th An-
nual Symposium on Switching and Automata Theory, Iowa City, Iowa,
USA, October 15-17, 1973, pages 1–11. IEEE Computer Society, 1973.
doi:10.1109/SWAT.1973.13.
[32] Jacob Ziv and Abraham Lempel. A universal algorithm for sequen-
tial data compression. IEEE Trans. Inf. Theory, 23(3):337–343, 1977.
doi:10.1109/TIT.1977.1055714.

A Lower Bounds on EL-LCS-RLE and LCS-RLE


In this section, we show an Ω̃(n) lower bound on both EL-LCS-RLE (lemma 9) and LCS-RLE (corollary 1). More precisely, we reduce PARITY to EL-LCS-RLE, which is then reduced to LCS-RLE.
Here is a high-level overview of the first reduction (from PARITY to EL-LCS-RLE).
We encode an n-bit binary string B = B1 B2 · · · Bn into an RLE string SB , in
a way similar to eq. (6) in the proof of lemma 8. We then assume an algorithm
A of query complexity Q(A) for EL-LCS-RLE exists. Using A, we construct an

algorithm to compare the decoded length of S_B, i.e. |S̃_B|, with any k > 0. We then use binary search on k to find |S̃_B|, invoking A O(log n) times. From |S̃_B|, we calculate the parity of B with no extra queries. Finally, since PARITY has query lower bound Ω(n), Q(A) is at least Ω̃(n), yielding lemma 9 below.
Lemma 9 (Lower Bound of EL-LCS-RLE). Any quantum oracle algorithm A
requires at least Ω̃(n) queries to solve EL-LCS-RLE, with probability at least 2/3.
Proof. Given an n-bit binary string B = B_1 B_2 B_3 . . . B_n ∈ {0, 1}^n, we can construct an RLE string

S_B = a^{2B_1+2} b^{2B_2+2} a^{2B_3+2} b^{2B_4+2} . . . γ^{2B_n+2} ,        (8)

where γ is a if n is odd, otherwise it is b.


For every positive natural number k, we can also construct an RLE string,
simply by repeating another character: Sk = ck . We then concatenate SB and
Sk together with different characters in the middle, getting

SB,@,k := SB @1 Sk and SB,#,k := SB #1 Sk . (9)

Let us check what we know about the LCS s̃ between S̃_{B,@,k} and S̃_{B,#,k}. Firstly, @ and # are not in s̃ since neither of them appears in both S̃_{B,@,k} and S̃_{B,#,k}. Secondly, s̃ is a substring of S̃_B or S̃_k, but not both, because the character set of S̃_B, {a, b}, and the one of S̃_k, {c}, do not intersect. Finally, s̃ is the "longest" common substring, so it is the longer of S̃_B and S̃_k.
Now we assume the algorithm A in lemma 9 exists, and it has query com-
plexity Q(A).
Additionally, the success probability of A can be boosted from constant
to high probability with an extra logarithmic factor on its query complexity
(lemma 5).
With S_{B,@,k} and S_{B,#,k} as inputs, A outputs

A(S_{B,@,k}, S_{B,#,k}) =
    |S_k|             if |S̃_k| > |S̃_B|
    |S_B| or |S_k|    if |S̃_k| = |S̃_B|        (10)
    |S_B|             if |S̃_k| < |S̃_B|
=
    1          if k > |S̃_B|
    n or 1     if k = |S̃_B|                    (11)
    n          if k < |S̃_B| .

For a given B, we use A_B( · ) as a shorthand for A(S_{B,@,·}, S_{B,#,·}) in the following text. Note that when k = |S̃_B|, two answers (n and 1) are possible, and we only assume A outputs one of them. Thus, A_B(k) is non-deterministic when k = |S̃_B|. We will resolve this issue with a property of binary search later.
To find |S̃_B|, we do a binary search on k to find a k′ ∈ [2n, 4n] such that A_B(k′ − 1) = n and A_B(k′) = 1 (the search range [2n, 4n] comes from 2n ≤ |S̃_B| = Σ_i (2B_i + 2) ≤ 4n). In the binary search, A_B will not be called
with the same k twice so it does not matter whether AB and the underlying A
are deterministic or not. So from now on, we treat AB as if it were deterministic.
Since there are two possible outputs for AB (k) when k = |S̃B | (the middle
case in eq. (11)), each corresponds to a different result k ′ for the binary search.
If AB (|S̃B |) outputs 1, we will get k ′ = |S̃B |, the desired result. But if AB (|S̃B |)
outputs n, we will get k ′ = |S̃B | + 1 instead. WePcan detect if the latter one
n
is the case from the parity of k ′ because |S̃B | = 2 i=1 (Bi + 1) is always even,
and thus we can correct the result accordingly.
With |S̃_B| in hand, we then check if

Σ_{i=1}^{n} B_i = (1/2) Σ_{i=1}^{n} (2B_i + 2) − n = (1/2)|S̃_B| − n        (12)

is odd or even to determine the parity of B.


Alternatively, we can XOR the lowest bit of n with the second-lowest bit of
k ′ , which is the same as the one of |S̃B |, directly. Then the result is the parity
of B. This allows us to avoid correcting k ′ explicitly.
In total, we use Q(A) · log^2 n queries to solve PARITY. The logarithmic factors
come from boosting A to high probability and the binary search. Finally, solving
PARITY requires Ω(n) queries so we have

Q(A) · log^2 n ∈ Ω(n) ⟹ Q(A) ∈ Ω(n/log^2 n) = Ω̃(n),        (13)

and lemma 9 follows.
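To make the binary-search reduction concrete, here is a classical sketch (our own illustration; A_B is a hypothetical stand-in for the assumed EL-LCS-RLE algorithm, and the toy oracle below simulates one of its allowed behaviours):

```python
def parity_via_el_lcs(n, A_B):
    """Sketch of the reduction in lemma 9.  A_B(k) returns the encoded LCS
    length of (S_{B,@,k}, S_{B,#,k}): n when k < |S~_B|, 1 when k > |S~_B|,
    and either value at k = |S~_B|.  Binary search finds the transition point
    k' (each k queried at most once), and the parity of B is the second-lowest
    bit of k' XOR-ed with the lowest bit of n."""
    lo, hi = 2 * n, 4 * n
    while lo < hi:
        mid = (lo + hi) // 2
        if A_B(mid) == 1:
            hi = mid
        else:
            lo = mid + 1
    k_prime = lo                     # equals |S~_B| or |S~_B| + 1
    return ((k_prime >> 1) ^ n) & 1

B = [1, 0, 1]
n = len(B)
S_len = sum(2 * b + 2 for b in B)                    # |S~_B| = 10
toy_oracle = lambda k: 1 if k > S_len else n         # one allowed behaviour of A_B
print(parity_via_el_lcs(n, toy_oracle), sum(B) % 2)  # both 0
```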


Furthermore, an algorithm solving LCS-RLE outputs a triplet (iA , iB , ℓ),
where ℓ is the encoded length of LCS between the inputs, which is also the
answer to EL-LCS-RLE. I.e. EL-LCS-RLE can be reduced to LCS-RLE with no
extra query to the input strings. As a result, LCS-RLE shares the same query
lower bound with EL-LCS-RLE. This gives the corollary below.
Corollary 1 (Lower Bound of LCS-RLE). Any quantum oracle algorithm A
requires at least Ω̃(n) queries to solve LCS-RLE, with probability at least 2/3.
Corollary 1 is our motivation and justification for introducing the prefix-sum oracles.
