On the Sample Complexity of HGR Maximal Correlation Functions for Large Datasets

Huang, Shao-Lun; Xu, Xiangxiang

doi:10.1109/TIT.2020.3044622

arXiv:1907.00393 (cs)

[Submitted on 30 Jun 2019 (v1), last revised 21 Jul 2020 (this version, v3)]

Abstract: The Hirschfeld-Gebelein-Rényi (HGR) maximal correlation and the corresponding maximal correlation functions have been shown to be useful in many machine learning scenarios. In this paper, we study the sample complexity of estimating the HGR maximal correlation functions with the alternating conditional expectation (ACE) algorithm from training samples drawn from large datasets. Specifically, we develop a mathematical framework that characterizes the learning errors between the maximal correlation functions computed from the true distribution and those estimated by the ACE algorithm. For both supervised and semi-supervised learning scenarios, we establish analytical expressions for the error exponents of the learning errors. Furthermore, we show that for large datasets, upper bounds on the sample complexity of learning the HGR maximal correlation functions with the ACE algorithm can be expressed in terms of these error exponents. Building on these theoretical results, we investigate how to allocate different types of samples in semi-supervised learning under a total sampling budget constraint, and we develop an optimal sampling strategy that maximizes the error exponent of the learning error. Finally, numerical simulations are presented to support our theoretical results.
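The abstract refers to estimating the maximal correlation functions with the ACE algorithm from training samples. As a rough illustration only, the Python sketch below runs the classical ACE iteration on the empirical joint distribution of finite-alphabet pairs: it alternates f(x) = E[g(Y) | X = x] and g(y) = E[f(X) | Y = y], re-standardizing each function to zero mean and unit variance, and reports E[f(X) g(Y)] as the estimated maximal correlation. The function names (empirical_joint, marginals, standardize, ace), the iteration count, and the binary demo distribution are hypothetical choices made for this example. This is not the authors' implementation, and the supervised and semi-supervised estimators analyzed in the paper are not reproduced here.

```python
import random
from collections import Counter


def empirical_joint(samples):
    """Empirical joint pmf P_XY from a list of (x, y) training pairs."""
    n = len(samples)
    return {xy: c / n for xy, c in Counter(samples).items()}


def marginals(pxy):
    """Marginal pmfs P_X and P_Y of a joint pmf stored as {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in pxy.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py


def standardize(h, p):
    """Shift and scale h to zero mean and unit variance under pmf p."""
    mean = sum(p[a] * h[a] for a in p)
    std = sum(p[a] * (h[a] - mean) ** 2 for a in p) ** 0.5
    if std < 1e-12:
        raise ValueError("zero-variance function; X and Y may be independent")
    return {a: (h[a] - mean) / std for a in p}


def ace(samples, iters=200, seed=0):
    """Basic ACE iteration on the empirical distribution of (x, y) pairs.

    Returns (rho, f, g): the estimated HGR maximal correlation and the
    standardized feature functions f: X -> R and g: Y -> R.
    """
    pxy = empirical_joint(samples)
    px, py = marginals(pxy)
    rng = random.Random(seed)
    # Random zero-mean, unit-variance starting point for g.
    g = standardize({y: rng.gauss(0.0, 1.0) for y in py}, py)
    for _ in range(iters):
        # f(x) <- E[g(Y) | X = x], then re-standardize under P_X.
        f = dict.fromkeys(px, 0.0)
        for (x, y), p in pxy.items():
            f[x] += p * g[y] / px[x]
        f = standardize(f, px)
        # g(y) <- E[f(X) | Y = y], then re-standardize under P_Y.
        g = dict.fromkeys(py, 0.0)
        for (x, y), p in pxy.items():
            g[y] += p * f[x] / py[y]
        g = standardize(g, py)
    # Correlation E[f(X) g(Y)] under the empirical joint distribution.
    rho = sum(p * f[x] * g[y] for (x, y), p in pxy.items())
    return rho, f, g


if __name__ == "__main__":
    # Hypothetical demo: X uniform on {0, 1}, Y = X flipped with prob. 0.2,
    # so the true HGR maximal correlation is 1 - 2 * 0.2 = 0.6.
    rng = random.Random(1)
    for n in (100, 1000, 10000):
        data = []
        for _ in range(n):
            x = rng.randint(0, 1)
            y = x ^ (rng.random() < 0.2)
            data.append((x, y))
        rho, _, _ = ace(data)
        print(n, round(rho, 4))
```

For binary alphabets the HGR maximal correlation equals the absolute Pearson correlation, which is why 0.6 serves as the reference value in the demo. Comparing the printed estimates with 0.6 as n grows gives only an informal feel for the estimation error; the sketch does not compute the error exponents derived in the paper. On finite alphabets each ACE pass behaves like one step of power iteration, so convergence can be slow when the two leading nontrivial singular values of the normalized joint-distribution matrix are close.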

Comments:	Submitted to IEEE Transactions on Information Theory
Subjects:	Information Theory (cs.IT)
Cite as:	arXiv:1907.00393 [cs.IT]
	(or arXiv:1907.00393v3 [cs.IT] for this version)
	https://doi.org/10.48550/arXiv.1907.00393
Journal reference:	IEEE Transactions on Information Theory (Volume: 67, Issue: 3, March 2021)
Related DOI:	https://doi.org/10.1109/TIT.2020.3044622

Submission history

From: Xiangxiang Xu
[v1] Sun, 30 Jun 2019 15:24:23 UTC (424 KB)
[v2] Fri, 29 Nov 2019 03:22:52 UTC (397 KB)
[v3] Tue, 21 Jul 2020 03:15:51 UTC (400 KB)
