Computer Science > Data Structures and Algorithms
[Submitted on 18 Nov 2019 (v1), last revised 13 Nov 2020 (this version, v2)]
Title: Finding Skewed Subcubes Under a Distribution
Abstract: Say that we are given samples from a distribution $\psi$ over an $n$-dimensional space. We expect or desire $\psi$ to behave like a product distribution (or a $k$-wise independent distribution over its marginals for small $k$). We propose the problem of enumerating/list-decoding all large subcubes where the distribution $\psi$ deviates markedly from what we expect; we refer to such subcubes as skewed subcubes. Skewed subcubes are certificates of dependencies between small subsets of variables in $\psi$. We motivate this problem by showing that it arises naturally in the context of algorithmic fairness and anomaly detection.
In this work we focus on the special but important case where the space is the Boolean hypercube and the expected marginals are uniform. We show that the obvious definition of skewed subcubes can lead to intractable list sizes, and propose a better definition, that of minimal skewed subcubes: subcubes whose skew cannot be attributed to a larger subcube that contains them. Our main technical contribution is a list-size bound for this definition and an algorithm to efficiently find all such subcubes. Both the bound and the algorithm rely on Fourier-analytic techniques, especially the powerful hypercontractive inequality.
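To make the notion of a skewed subcube concrete, here is a minimal brute-force sketch. It is not the paper's algorithm. It assumes samples are 0/1 tuples of equal length and measures skew as the ratio between a subcube's empirical mass and the mass $2^{-k}$ that a subcube fixing $k$ coordinates has under the uniform distribution. The abstract does not state the paper's exact skew measure, so this ratio and the name `subcube_skews` are illustrative assumptions.

```python
from itertools import combinations, product


def subcube_skews(samples, k):
    """Map each subcube fixing k coordinates to (empirical mass) / 2**-k.

    A subcube is identified by (coords, values): the coordinates that are
    fixed and the bits they are fixed to. A ratio of 1.0 matches the uniform
    expectation; ratios far from 1.0 indicate skew (hypothetical measure).
    """
    n = len(samples[0])
    m = len(samples)
    expected = 2.0 ** -k  # mass of any such subcube under the uniform distribution
    ratios = {}
    for coords in combinations(range(n), k):
        # Count how often each assignment to the chosen coordinates occurs.
        counts = {}
        for x in samples:
            key = tuple(x[i] for i in coords)
            counts[key] = counts.get(key, 0) + 1
        # Include assignments that never occur, since they are skewed too.
        for values in product((0, 1), repeat=k):
            ratios[(coords, values)] = counts.get(values, 0) / m / expected
    return ratios
```

For instance, on the four samples `(0,0,0), (0,0,1), (1,1,0), (1,1,1)`, where the first two bits always agree, the subcube fixing both of those bits to 0 gets ratio 2 and the subcube fixing them to 0 and 1 gets ratio 0, while subcubes over coordinates 0 and 2 get ratio 1. The sketch enumerates all $\binom{n}{k} 2^k$ subcubes, so it is only practical for small $n$ and $k$.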
On the lower-bound side, we show that finding skewed subcubes is as hard as the sparse noisy parity problem, and hence our algorithms cannot be improved substantially without a breakthrough on this problem, which is believed to be intractable. Motivated by this, we study alternate models allowing query access to $\psi$, where finding skewed subcubes might be easier.
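The connection to sparse noisy parity can be illustrated with a small sampler. This is a sketch under the standard textbook formulation of the problem, not the reduction from the paper: each sample is a uniform $x \in \{0,1\}^n$ followed by a label equal to the parity of $x$ on a secret set $S$ of size $k$, flipped with probability $\eta$. Under this formulation, a subcube fixing the $k$ secret coordinates and the label has mass $2^{-(k+1)}(1 \pm (1 - 2\eta))$ rather than the uniform $2^{-(k+1)}$, so the secret set shows up as a skewed subcube. The function names and parameters below are hypothetical.

```python
import random


def sparse_noisy_parity_samples(n, secret, eta, m, rng):
    """Draw m samples (x_0, ..., x_{n-1}, label) of noisy parity on `secret`.

    `secret` is a tuple of coordinate indices, `eta` is the probability that
    the label is flipped, and `rng` is a random.Random instance.
    """
    samples = []
    for _ in range(m):
        x = [rng.randrange(2) for _ in range(n)]
        label = sum(x[i] for i in secret) % 2  # parity on the secret set
        if rng.random() < eta:                 # independent label noise
            label ^= 1
        samples.append(tuple(x) + (label,))
    return samples


def subcube_mass(samples, coords, values):
    """Empirical mass of the subcube that fixes coords[i] to values[i]."""
    hits = sum(all(x[i] == v for i, v in zip(coords, values)) for x in samples)
    return hits / len(samples)
```

With `eta = 0`, the subcube fixing all secret coordinates to 0 and the label to 1 receives no mass at all, while the one fixing the label to 0 receives about $2^{-k}$, twice the uniform expectation. Without knowing `secret`, a sample-based search has to locate these $k$ coordinates among $n$, which is where the hardness comes from.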
Submission history
From: Roie Levin
[v1] Mon, 18 Nov 2019 00:32:58 UTC (47 KB)
[v2] Fri, 13 Nov 2020 02:13:20 UTC (47 KB)