Binary Jumbled String Matching: Faster Indexing in Less Space

Badkobeh, Golnaz; Fici, Gabriele; Kroon, Steve; Lipták, Zsuzsanna

Computer Science > Data Structures and Algorithms

arXiv:1206.2523v2 (cs)

[Submitted on 12 Jun 2012 (v1), revised 26 Nov 2012 (this version, v2), latest version 31 May 2013 (v3)]

Title: Binary Jumbled String Matching: Faster Indexing in Less Space

Authors: Golnaz Badkobeh, Gabriele Fici, Steve Kroon, Zsuzsanna Lipták

Abstract: The binary jumbled string matching problem is defined as follows: given a binary string $s$ of length $n$ and a query $(x,y)$, with $x,y$ non-negative integers, decide whether $s$ has a substring $t$ with exactly $x$ $a$'s and $y$ $b$'s. Previous solutions created an index of size $O(n)$ in a pre-processing step, which was then used to answer queries in constant time. The fastest algorithms for constructing this index have running time $O(n^2/\log n)$ [Burcsi et al., FUN 2010; Moosa and Rahman, IPL 2010], or $O(n^2/\log^2 n)$ in the word-RAM model [Moosa and Rahman, JDA 2012]. We propose an index constructed directly from the run-length encoding of $s$ in time $O(\rho^2\log \rho)$, where $\rho$ is the length of the run-length encoding of $s$. This is no worse than previous solutions if $\rho = O(n/\log n)$ and better if $\rho = o(n/\log n)$, which is the case for binary strings in many domains. Our index $L$ can be queried in $O(\log |L|)$ time. While $L$ has worst-case size $O(n)$, extensive experimentation has consistently yielded $|L|$ between $0.8\rho$ and $3\rho$. We characterize the index in terms of the prefix normal forms of $s$, and conjecture that both the index size and the construction working space are $O(\rho)$. Thus, for strings with good run-length compression, our algorithm is vastly superior to all previous approaches, both in construction time and index size, at only a small cost in query time.
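To make the abstract's terminology concrete, the following Python sketch builds the classic $O(n)$-size index with constant-time queries mentioned above. It is an illustration only: it is not the authors' run-length-based index $L$ or their $O(\rho^2\log\rho)$ construction, and it uses a naive $O(n^2)$ sliding-window pass rather than the faster cited algorithms. It relies on the fact that sliding a window of fixed length $m$ by one position changes its number of $a$'s by at most one, so every count between the minimum and the maximum over all windows of length $m$ is attained; a query $(x,y)$ therefore reduces to a single range check at $m = x+y$. The helper names `run_length_encoding`, `build_interval_index`, `query` and `prefix_normal_forms` are hypothetical. The last one assumes the usual definition of the prefix normal forms (the strings whose length-$m$ prefix has as many $a$'s, respectively $b$'s, as the best window of length $m$ in $s$).

```python
# Illustrative sketch with hypothetical helper names. This is the classic
# O(n)-size interval index built naively in O(n^2) time, NOT the paper's
# run-length-based index L. Strings are assumed to be over {a, b}.
from itertools import groupby


def run_length_encoding(s: str) -> list[tuple[str, int]]:
    """Return the maximal runs of s as (character, length) pairs.

    rho, the length of the run-length encoding, is len() of the result.
    """
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def build_interval_index(s: str) -> tuple[list[int], list[int]]:
    """Naive O(n^2) construction of the O(n)-size interval index.

    min_a[m] / max_a[m] are the fewest / most occurrences of 'a' in any
    substring of length m, for m = 0..n.
    """
    n = len(s)
    min_a = [0] * (n + 1)
    max_a = [0] * (n + 1)
    for m in range(1, n + 1):
        count = s[:m].count("a")  # leftmost window s[0:m]
        lo = hi = count
        for i in range(m, n):
            # Slide the window right by one: s[i] enters, s[i - m] leaves,
            # so the count of a's changes by -1, 0 or +1.
            count += (s[i] == "a") - (s[i - m] == "a")
            lo = min(lo, count)
            hi = max(hi, count)
        min_a[m], max_a[m] = lo, hi
    return min_a, max_a


def query(index: tuple[list[int], list[int]], x: int, y: int) -> bool:
    """Decide in O(1) whether s has a substring with exactly x a's and y b's.

    Because window counts change by at most 1 per slide, every value in
    [min_a[m], max_a[m]] is attained by some window of length m = x + y.
    """
    min_a, max_a = index
    m = x + y
    if m >= len(min_a):  # longer than s itself
        return False
    return min_a[m] <= x <= max_a[m]


def prefix_normal_forms(index: tuple[list[int], list[int]]) -> tuple[str, str]:
    """Read off the prefix normal forms of s w.r.t. a and w.r.t. b.

    Assumed definition: the length-m prefix of the a-form has max_a[m] a's,
    and the length-m prefix of the b-form has m - min_a[m] b's.
    """
    min_a, max_a = index
    n = len(max_a) - 1
    pnf_a = "".join("a" if max_a[m] > max_a[m - 1] else "b" for m in range(1, n + 1))
    pnf_b = "".join("b" if min_a[m] == min_a[m - 1] else "a" for m in range(1, n + 1))
    return pnf_a, pnf_b


if __name__ == "__main__":
    s = "aabbab"
    idx = build_interval_index(s)
    print(len(run_length_encoding(s)))  # rho = 4
    print(query(idx, 2, 1), query(idx, 3, 1))  # True False
    print(prefix_normal_forms(idx))  # ('aabbab', 'bbabaa')
```

For $s = aabbab$ the run-length encoding has $\rho = 4$ runs. The query $(2,1)$ succeeds (witness $aab$), while $(3,1)$ fails because no window of length 4 contains three $a$'s. The index stores two integers per length, which is why its size is $O(n)$ regardless of how well $s$ compresses. This size bound is exactly what a run-length-based index aims to improve on.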

Comments:	Submitted
Subjects:	Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR)
MSC classes:	68W32, 68P05, 68P20
ACM classes:	G.2.1
Cite as:	arXiv:1206.2523 [cs.DS]
	(or arXiv:1206.2523v2 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.1206.2523

Submission history

From: Zsuzsanna Lipták
[v1] Tue, 12 Jun 2012 13:33:32 UTC (901 KB)
[v2] Mon, 26 Nov 2012 21:38:40 UTC (860 KB)
[v3] Fri, 31 May 2013 17:32:12 UTC (89 KB)
