Computer Science > Data Structures and Algorithms
[Submitted on 21 Oct 2020 (v1), last revised 16 Dec 2020 (this version, v2)]
Title:On Adaptive Distance Estimation
View PDFAbstract:We provide a static data structure for distance estimation which supports {\it adaptive} queries. Concretely, given a dataset $X = \{x_i\}_{i = 1}^n$ of $n$ points in $\mathbb{R}^d$ and $0 < p \leq 2$, we construct a randomized data structure with low memory consumption and query time which, when later given any query point $q \in \mathbb{R}^d$, outputs a $(1+\epsilon)$-approximation of $\lVert q - x_i \rVert_p$ with high probability for all $i\in[n]$. The main novelty is our data structure's correctness guarantee holds even when the sequence of queries can be chosen adaptively: an adversary is allowed to choose the $j$th query point $q_j$ in a way that depends on the answers reported by the data structure for $q_1,\ldots,q_{j-1}$. Previous randomized Monte Carlo methods do not provide error guarantees in the setting of adaptively chosen queries. Our memory consumption is $\tilde O((n+d)d/\epsilon^2)$, slightly more than the $O(nd)$ required to store $X$ in memory explicitly, but with the benefit that our time to answer queries is only $\tilde O(\epsilon^{-2}(n + d))$, much faster than the naive $\Theta(nd)$ time obtained from a linear scan in the case of $n$ and $d$ very large. Here $\tilde O$ hides $\log(nd/\epsilon)$ factors. We discuss applications to nearest neighbor search and nonparametric estimation.
Our method is simple and likely to be applicable to other domains: we describe a generic approach for transforming randomized Monte Carlo data structures which do not support adaptive queries to ones that do, and show that for the problem at hand, it can be applied to standard nonadaptive solutions to $\ell_p$ norm estimation with negligible overhead in query time and a factor $d$ overhead in memory.
Submission history
From: Yeshwanth Cherapanamjeri [view email][v1] Wed, 21 Oct 2020 19:12:57 UTC (380 KB)
[v2] Wed, 16 Dec 2020 07:16:56 UTC (381 KB)
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Connected Papers (What is Connected Papers?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
alphaXiv (What is alphaXiv?)
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Hugging Face (What is Huggingface?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.