Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets

Liu, Mingrui; Mroueh, Youssef; Ross, Jerret; Zhang, Wei; Cui, Xiaodong; Das, Payel; Yang, Tianbao

arXiv:1912.11940 (math)

[Submitted on 26 Dec 2019 (v1), last revised 25 Dec 2020 (this version, v2)]

Abstract: Adaptive gradient algorithms perform gradient-based updates using the history of gradients and are ubiquitous in training deep neural networks. While the theory of adaptive gradient methods is well understood for minimization problems, the underlying factors driving their empirical success in min-max problems such as GANs remain unclear. In this paper, we aim to bridge this gap from both theoretical and empirical perspectives. First, we analyze a variant of Optimistic Stochastic Gradient (OSG), proposed by Daskalakis et al. (2017), for solving a class of non-convex non-concave min-max problems and establish $O(\epsilon^{-4})$ complexity for finding an $\epsilon$-first-order stationary point. The algorithm requires invoking only one stochastic first-order oracle per iteration while enjoying the state-of-the-art iteration complexity achieved by the stochastic extragradient method of Iusem et al. (2017). Then we propose an adaptive variant of OSG named Optimistic Adagrad (OAdagrad) and reveal an improved adaptive complexity $O\left(\epsilon^{-\frac{2}{1-\alpha}}\right)$, where $\alpha$ characterizes the growth rate of the cumulative stochastic gradient and $0\leq \alpha\leq 1/2$. To the best of our knowledge, this is the first work to establish adaptive complexity in non-convex non-concave min-max optimization. Empirically, our experiments show that adaptive gradient algorithms indeed outperform their non-adaptive counterparts in GAN training. Moreover, this observation can be explained by the slow growth rate of the cumulative stochastic gradient, as observed empirically.
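
The single-oracle property and the complexity exponent mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm statement: the toy game F(x, y) = x * y, the function names, the step size, the zero initialization of the gradient history, and the unconstrained (projection-free) update are all assumptions made for illustration, and OAdagrad's adaptive scaling is not implemented here.

```python
import numpy as np


def bilinear_operator(w):
    """Operator T(x, y) = (dF/dx, -dF/dy) for the toy game F(x, y) = x * y (assumed example)."""
    x, y = w
    return np.array([y, -x])


def optimistic_sg(operator, w0, lr=0.2, steps=2000, noise_std=0.0, seed=0):
    """Optimistic update using one new oracle call per iteration (illustrative sketch).

    z_k = w_{k-1} - lr * g_{k-1}   # reuses the previous stochastic gradient
    w_k = w_{k-1} - lr * g_k       # g_k is the only new oracle call, taken at z_k
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    g_prev = np.zeros_like(w)  # assumption: no gradient history before the first step
    oracle_calls = 0
    for _ in range(steps):
        z = w - lr * g_prev
        # Optional Gaussian noise stands in for a stochastic oracle.
        g = operator(z) + noise_std * rng.standard_normal(w.shape)
        oracle_calls += 1
        w = w - lr * g
        g_prev = g
    return w, oracle_calls


def adaptive_complexity_exponent(alpha):
    """Exponent p in O(eps^-p) with p = 2 / (1 - alpha), for 0 <= alpha <= 1/2."""
    if not 0.0 <= alpha <= 0.5:
        raise ValueError("alpha must lie in [0, 1/2]")
    return 2.0 / (1.0 - alpha)


if __name__ == "__main__":
    w, calls = optimistic_sg(bilinear_operator, [1.0, 1.0])
    print("final iterate:", w, "oracle calls:", calls)
    for alpha in (0.0, 0.25, 0.5):
        print("alpha =", alpha, "-> exponent", adaptive_complexity_exponent(alpha))
```

With noise switched off, the iterates on this toy game move toward the stationary point (0, 0) while the oracle is queried once per step. The helper shows that $\alpha = 1/2$ recovers the non-adaptive exponent 4, and smaller $\alpha$ gives a smaller exponent, which is the sense in which the adaptive complexity is improved.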

Comments:	Accepted by ICLR 2020
Subjects:	Optimization and Control (math.OC); Machine Learning (cs.LG)
Cite as:	arXiv:1912.11940 [math.OC]
	(or arXiv:1912.11940v2 [math.OC] for this version)
	https://doi.org/10.48550/arXiv.1912.11940

Submission history

From: Mingrui Liu
[v1] Thu, 26 Dec 2019 22:10:10 UTC (18,209 KB)
[v2] Fri, 25 Dec 2020 02:17:20 UTC (18,210 KB)
