Active Sampling Count Sketch (ASCS) for Online Sparse Estimation of a Trillion Scale Covariance Matrix

Dai, Zhenwei; Desai, Aditya; Heckel, Reinhard; Shrivastava, Anshumali

arXiv:2010.15951 (cs)

[Submitted on 29 Oct 2020 (v1), last revised 11 Jun 2021 (this version, v2)]

Abstract: Estimating and storing the covariance (or correlation) matrix of high-dimensional data is computationally challenging because both memory and computational requirements scale quadratically with the dimension. Fortunately, high-dimensional covariance matrices as observed in text, click-through, meta-genomics datasets, etc., are often sparse. In this paper, we consider the problem of efficient sparse estimation of covariance matrices with possibly trillions of entries. The size of the datasets we target requires the algorithm to be online, as more than one pass over the data is prohibitive. We propose Active Sampling Count Sketch (ASCS), an online, one-pass sketching algorithm that accurately recovers the large entries of the covariance matrix. Count Sketch (CS) and other sub-linear compressed sensing algorithms offer a natural solution to the problem in theory. However, vanilla CS does not work well in practice due to a low signal-to-noise ratio (SNR). At the heart of our approach is a novel active sampling strategy that increases the SNR of classical CS. We demonstrate the practicality of our algorithm on synthetic data and real-world high-dimensional datasets. ASCS significantly improves over vanilla CS, demonstrating the merit of our active sampling strategy.
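For orientation, the vanilla Count Sketch baseline mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class `CountSketch`, the helper `sketch_products`, the multiply-add hash functions, and all parameter defaults are assumptions made for this example. It also assumes zero-mean features, so that the average of the products x_i * x_j approximates the covariance entry. The active sampling step of ASCS is not described in the abstract and is not shown here.

```python
import random
from statistics import median


class CountSketch:
    """Vanilla Count Sketch: `depth` rows of `width` signed counters (illustrative)."""

    _PRIME = (1 << 61) - 1  # modulus for the simple multiply-add hashes below

    def __init__(self, depth, width, seed=0):
        rng = random.Random(seed)
        self.depth, self.width = depth, width
        self.table = [[0.0] * width for _ in range(depth)]
        # One (a, b) pair per row for the bucket hash and another for the sign hash.
        self.bucket_params = [(rng.randrange(1, self._PRIME), rng.randrange(self._PRIME))
                              for _ in range(depth)]
        self.sign_params = [(rng.randrange(1, self._PRIME), rng.randrange(self._PRIME))
                            for _ in range(depth)]

    def _bucket(self, row, key):
        a, b = self.bucket_params[row]
        return ((a * key + b) % self._PRIME) % self.width

    def _sign(self, row, key):
        a, b = self.sign_params[row]
        return 1.0 if ((a * key + b) % self._PRIME) & 1 else -1.0

    def update(self, key, value):
        """Add `value` to the signed counter of `key` in every row."""
        for r in range(self.depth):
            self.table[r][self._bucket(r, key)] += self._sign(r, key) * value

    def estimate(self, key):
        """Median over rows of the sign-corrected counter for `key`."""
        return median(self._sign(r, key) * self.table[r][self._bucket(r, key)]
                      for r in range(self.depth))


def sketch_products(samples, dim, depth=5, width=64, seed=0):
    """One pass over `samples`: accumulate x_i * x_j for every pair i < j.

    Returns the sketch and the number of samples seen, so the caller can
    divide an estimate by that count to obtain an average product.
    """
    sketch = CountSketch(depth, width, seed)
    n = 0
    for x in samples:
        n += 1
        for i in range(dim):
            for j in range(i + 1, dim):
                sketch.update(i * dim + j, x[i] * x[j])  # key encodes the pair (i, j)
    return sketch, n
```

A query for the pair (i, j) would then be `sketch.estimate(i * dim + j) / n`. When many pairs share the same counters, the small entries act as collision noise on the large ones, which is the low-SNR problem the abstract attributes to vanilla CS.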

Comments:	13 pages
Subjects:	Data Structures and Algorithms (cs.DS); Computation (stat.CO)
Cite as:	arXiv:2010.15951 [cs.DS]
	(or arXiv:2010.15951v2 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.2010.15951

Submission history

From: Zhenwei Dai [view email]
[v1] Thu, 29 Oct 2020 21:20:15 UTC (1,645 KB)
[v2] Fri, 11 Jun 2021 02:15:14 UTC (2,541 KB)
