BPTree: an $\ell_2$ heavy hitters algorithm using constant memory

Braverman, Vladimir; Chestnut, Stephen R.; Ivkin, Nikita; Nelson, Jelani; Wang, Zhengyu; Woodruff, David P.

Computer Science > Data Structures and Algorithms

arXiv:1603.00759 (cs)

[Submitted on 2 Mar 2016 (v1), last revised 9 Nov 2017 (this version, v4)]

Abstract: The task of finding heavy hitters is one of the best-known and most well-studied problems in the area of data streams. One is given a list $i_1,i_2,\ldots,i_m\in[n]$ and the goal is to identify the items among $[n]$ that appear frequently in the list. In sub-polynomial space, the strongest guarantee available is the $\ell_2$ guarantee, which requires finding all items that occur at least $\epsilon\|f\|_2$ times in the stream, where the vector $f\in\mathbb{R}^n$ is the count histogram of the stream, with $i$th coordinate equal to the number of times $i$ appears, $f_i:=\#\{j\in[m]:i_j=i\}$. The first algorithm to achieve the $\ell_2$ guarantee was the CountSketch of [CCF04], which requires $O(\epsilon^{-2}\log n)$ words of memory and $O(\log n)$ update time and is known to be space-optimal if the stream allows for deletions. The recent work of [BCIW16] gave an improved algorithm for insertion-only streams, using only $O(\epsilon^{-2}\log\epsilon^{-1}\log\log n)$ words of memory. In this work, we give an algorithm, BPTree, for $\ell_2$ heavy hitters in insertion-only streams that achieves $O(\epsilon^{-2}\log\epsilon^{-1})$ words of memory and $O(\log\epsilon^{-1})$ update time, which is the optimal dependence on $n$ and $m$. In addition, we describe an algorithm for tracking $\|f\|_2$ at all times with $O(\epsilon^{-2})$ memory and update time. Our analyses rely on bounding the expected supremum of a Bernoulli process involving Rademachers with limited independence, which we accomplish via a Dudley-like chaining argument that may have applications elsewhere.
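
To make the $\ell_2$ guarantee concrete, the following is a minimal sketch that computes the heavy hitters exactly from the definition $f_i:=\#\{j\in[m]:i_j=i\}$. It is not BPTree, CountSketch, or any algorithm from the paper: it stores the whole histogram $f$ and so uses memory linear in the number of distinct items, and it serves only as a reference for what a small-space algorithm is required to output. The function names and the example stream are assumptions made for illustration. An $\ell_1$ variant is included to show why the $\ell_2$ guarantee is the stronger one.

```python
from collections import Counter
from math import sqrt


def l2_heavy_hitters(stream, eps):
    """Exact reference (NOT the paper's algorithm): items with f_i >= eps * ||f||_2."""
    f = Counter(stream)                          # count histogram f
    l2 = sqrt(sum(c * c for c in f.values()))    # ||f||_2
    return {i for i, c in f.items() if c >= eps * l2}


def l1_heavy_hitters(stream, eps):
    """Same idea with the weaker l1 threshold: items with f_i >= eps * ||f||_1."""
    f = Counter(stream)
    l1 = sum(f.values())                         # ||f||_1, i.e. the stream length m
    return {i for i, c in f.items() if c >= eps * l1}


# Hypothetical stream: item 0 appears 100 times, items 1..10000 appear once each.
# Here ||f||_1 = 10100 while ||f||_2 = sqrt(100**2 + 10000), roughly 141.4.
stream = [0] * 100 + list(range(1, 10001))

l2_result = l2_heavy_hitters(stream, 0.5)   # threshold is about 70.7
l1_result = l1_heavy_hitters(stream, 0.5)   # threshold is 5050
```

In this example item 0 meets the $\ell_2$ threshold but falls far short of the $\ell_1$ threshold, so an algorithm with only an $\ell_1$ guarantee would not be required to report it.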

Comments:	v4: PODS'17 camera-ready version, includes improved space $\ell_2$ tracking (by a $\log(1/\epsilon)$ factor); v3: fixed accidental mis-sorting of author last names; v2: added section explaining why pick-and-drop sampling fails for $\ell_2$ heavy hitters, and fixed minor typos
Subjects:	Data Structures and Algorithms (cs.DS)
Cite as:	arXiv:1603.00759 [cs.DS]
	(or arXiv:1603.00759v4 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.1603.00759

Submission history

From: Jelani Nelson
[v1] Wed, 2 Mar 2016 15:48:36 UTC (20 KB)
[v2] Sun, 6 Mar 2016 02:35:37 UTC (22 KB)
[v3] Tue, 8 Mar 2016 17:56:02 UTC (22 KB)
[v4] Thu, 9 Nov 2017 13:49:13 UTC (38 KB)
