Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution

Ma, Cong; Wang, Kaizheng; Chi, Yuejie; Chen, Yuxin

doi:10.1007/s10208-019-09429-9

Computer Science > Machine Learning

arXiv:1711.10467 (cs)

[Submitted on 28 Nov 2017 (v1), last revised 30 Jul 2019 (this version, v3)]

Title:Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution

Authors:Cong Ma, Kaizheng Wang, Yuejie Chi, Yuxin Chen

View PDF

Abstract:Recent years have seen a flurry of activities in designing provably efficient nonconvex procedures for solving statistical estimation problems. Due to the highly nonconvex nature of the empirical loss, state-of-the-art procedures often require proper regularization (e.g. trimming, regularized cost, projection) in order to guarantee fast convergence. For vanilla procedures such as gradient descent, however, prior theory either recommends highly conservative learning rates to avoid overshooting, or completely lacks performance guarantees.
This paper uncovers a striking phenomenon in nonconvex optimization: even in the absence of explicit regularization, gradient descent enforces proper regularization implicitly under various statistical models. In fact, gradient descent follows a trajectory staying within a basin that enjoys nice geometry, consisting of points incoherent with the sampling mechanism. This "implicit regularization" feature allows gradient descent to proceed in a far more aggressive fashion without overshooting, which in turn results in substantial computational savings. Focusing on three fundamental statistical estimation problems, i.e. phase retrieval, low-rank matrix completion, and blind deconvolution, we establish that gradient descent achieves near-optimal statistical and computational guarantees without explicit regularization. In particular, by marrying statistical modeling with generic optimization theory, we develop a general recipe for analyzing the trajectories of iterative algorithms via a leave-one-out perturbation argument. As a byproduct, for noisy matrix completion, we demonstrate that gradient descent achieves near-optimal error control --- measured entrywise and by the spectral norm --- which might be of independent interest.

Comments:	accepted to Foundations of Computational Mathematics (FOCM)
Subjects:	Machine Learning (cs.LG); Information Theory (cs.IT); Optimization and Control (math.OC); Statistics Theory (math.ST); Machine Learning (stat.ML)
Cite as:	arXiv:1711.10467 [cs.LG]
	(or arXiv:1711.10467v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1711.10467
Journal reference:	Foundations of Computational Mathematics, vol. 20, no. 3, pp. 451-632, June 2020
Related DOI:	https://doi.org/10.1007/s10208-019-09429-9

Submission history

From: Cong Ma [view email]
[v1] Tue, 28 Nov 2017 18:53:38 UTC (4,323 KB)
[v2] Thu, 14 Dec 2017 10:47:03 UTC (1,664 KB)
[v3] Tue, 30 Jul 2019 02:14:54 UTC (2,023 KB)

Computer Science > Machine Learning

Title:Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Machine Learning

Title:Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators