Momentum Provably Improves Error Feedback!

Fatkhullin, Ilyas; Tyurin, Alexander; Richtárik, Peter

arXiv:2305.15155 (cs)

[Submitted on 24 May 2023 (v1), last revised 30 Oct 2023 (this version, v2)]

Abstract: Due to the high communication overhead when training machine learning models in a distributed environment, modern algorithms invariably rely on lossy communication compression. However, when untreated, the errors caused by compression propagate, and can lead to severely unstable behavior, including exponential divergence. Almost a decade ago, Seide et al. [2014] proposed an error feedback (EF) mechanism, which we refer to as EF14, as an immensely effective heuristic for mitigating this issue. However, despite steady algorithmic and theoretical advances in the EF field in the last decade, our understanding is far from complete. In this work, we address one of the most pressing issues. In particular, in the canonical nonconvex setting, all known variants of EF rely on very large batch sizes to converge, which can be prohibitive in practice. We propose a surprisingly simple fix which removes this issue both theoretically and in practice: the application of Polyak's momentum to the latest incarnation of EF due to Richtárik et al. [2021], known as EF21. Our algorithm, for which we coin the name EF21-SGDM, improves the communication and sample complexities of previous error feedback algorithms under standard smoothness and bounded variance assumptions, and does not require any further strong assumptions such as bounded gradient dissimilarity. Moreover, we propose a double momentum version of our method that improves the complexities even further. Our proof seems to be novel even when compression is removed from the method, and as such, our proof technique is of independent interest in the study of nonconvex stochastic optimization enriched with Polyak's momentum.
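
The abstract names the idea (Polyak's momentum applied to EF21) but does not spell out the update rule. The following is a minimal single-process sketch of one plausible reading: each node keeps a momentum estimate of its stochastic gradient, and the EF21 tracker compresses the difference between that estimate and the node's current tracker state. Everything here is an assumption made for illustration rather than the authors' exact algorithm: the function names `top_k` and `ef21_sgdm`, the Top-k compressor, the toy quadratic objectives, and the parameter values `gamma`, `eta`, `k`, and `noise`.

```python
import random


def top_k(vec, k):
    """Keep the k largest-magnitude entries of vec and zero the rest."""
    keep = sorted(range(len(vec)), key=lambda j: abs(vec[j]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for j in keep:
        out[j] = vec[j]
    return out


def ef21_sgdm(targets, steps=3000, gamma=0.02, eta=0.1, k=1, noise=0.1, seed=0):
    """Toy EF21-with-momentum loop (illustrative assumption, not the paper's code).

    Node i holds f_i(x) = 0.5 * ||x - targets[i]||^2 and only sees noisy
    gradients. The minimizer of the average objective is the mean of targets.
    """
    rng = random.Random(seed)
    n, d = len(targets), len(targets[0])
    x = [0.0] * d
    v = [[0.0] * d for _ in range(n)]  # per-node momentum estimates
    g = [[0.0] * d for _ in range(n)]  # per-node EF21 gradient trackers
    g_avg = [0.0] * d                  # server copy of the averaged trackers

    for _ in range(steps):
        # Server step: move along the current aggregated tracker.
        x = [xj - gamma * gj for xj, gj in zip(x, g_avg)]

        msg_sum = [0.0] * d
        for i in range(n):
            # Noisy gradient of f_i at the new point (batch size 1 stand-in).
            grad = [xj - tj + rng.gauss(0.0, noise)
                    for xj, tj in zip(x, targets[i])]
            # Momentum: exponential moving average of stochastic gradients.
            v[i] = [(1.0 - eta) * vj + eta * gr for vj, gr in zip(v[i], grad)]
            # EF21-style message: compress the gap between estimate and tracker.
            c = top_k([vj - gj for vj, gj in zip(v[i], g[i])], k)
            g[i] = [gj + cj for gj, cj in zip(g[i], c)]
            msg_sum = [s + cj for s, cj in zip(msg_sum, c)]

        # Server aggregates only the compressed messages.
        g_avg = [ga + s / n for ga, s in zip(g_avg, msg_sum)]

    return x


if __name__ == "__main__":
    demo_targets = [[1.0, 2.0, 3.0], [3.0, 0.0, -1.0],
                    [-1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
    print(ef21_sgdm(demo_targets))
```

In this sketch only the compressed vectors `c` travel from nodes to the server, and each node draws a single noisy gradient per step, which mirrors the abstract's point about avoiding large batches. Setting `eta=1.0` disables the averaging and reduces the loop to a plain stochastic EF21-style update, which makes it easy to compare the two on the same toy problem.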

Subjects:	Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
MSC classes:	68W40, 68W15, 90C25, 90C06
ACM classes:	G.1.6; F.2.1; E.4
Cite as:	arXiv:2305.15155 [cs.LG]
	(or arXiv:2305.15155v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2305.15155

Submission history

From: Ilyas Fatkhullin
[v1] Wed, 24 May 2023 13:52:02 UTC (3,339 KB)
[v2] Mon, 30 Oct 2023 16:18:44 UTC (5,390 KB)
