Global Optimality Guarantees For Policy Gradient Methods

Bhandari, Jalaj; Russo, Daniel

arXiv:1906.01786 (cs)

[Submitted on 5 Jun 2019 (v1), last revised 20 Jun 2022 (this version, v3)]

Abstract: Policy gradient methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies. Unfortunately, even for simple control problems solvable by standard dynamic programming techniques, policy gradient algorithms face non-convex optimization problems and are widely understood to converge only to a stationary point. This work identifies structural properties -- shared by several classic control problems -- that ensure the policy gradient objective function has no suboptimal stationary points despite being non-convex. When these conditions are strengthened, this objective satisfies a Polyak-Łojasiewicz (gradient dominance) condition that yields convergence rates. We also provide bounds on the optimality gap of any stationary point when some of these conditions are relaxed.
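
To make the gradient dominance idea concrete, here is a minimal sketch on a toy one-dimensional function. It is not taken from the paper and is not a policy gradient objective. The function f(x) = x^2 + 3 sin^2(x) is non-convex (its second derivative, 2 + 6 cos(2x), is sometimes negative), yet its only stationary point is the global minimum at x = 0. The constant MU = 1/32 and the smoothness bound L_SMOOTH = 8 are assumptions for this toy function only. The sketch checks the Polyak-Łojasiewicz inequality, 0.5 * f'(x)^2 >= MU * (f(x) - f*), numerically on a grid rather than proving it. Under such a condition, plain gradient descent with step size 1/L is known to shrink the optimality gap by a factor of at most (1 - MU/L) per step.

```python
import math


def f(x):
    """Toy non-convex objective (illustrative, not from the paper); minimum value f* = 0 at x = 0."""
    return x * x + 3.0 * math.sin(x) ** 2


def grad_f(x):
    """Derivative of f: d/dx [x^2 + 3 sin^2(x)] = 2x + 3 sin(2x)."""
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


def gradient_descent(x0, step, num_iters):
    """Plain gradient descent; returns the full list of iterates."""
    xs = [x0]
    for _ in range(num_iters):
        xs.append(xs[-1] - step * grad_f(xs[-1]))
    return xs


# Assumed constants for this toy function only.
MU = 1.0 / 32.0   # gradient dominance constant, checked numerically below
L_SMOOTH = 8.0    # |f''(x)| = |2 + 6 cos(2x)| <= 8

# Numerical check of 0.5 * f'(x)^2 >= MU * (f(x) - f*) on [-10, 10], with f* = 0.
grid = [-10.0 + 0.01 * i for i in range(2001)]
pl_holds_on_grid = all(0.5 * grad_f(x) ** 2 >= MU * f(x) for x in grid)

# Gradient descent with step 1/L from an arbitrary starting point.
iterates = gradient_descent(x0=3.0, step=1.0 / L_SMOOTH, num_iters=200)

print("PL inequality holds on grid:", pl_holds_on_grid)
print("final iterate:", iterates[-1])
```

The starting point, grid range, and iteration count are arbitrary choices. The grid check is a sanity test on a bounded interval, not a proof that the inequality holds everywhere.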

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1906.01786 [cs.LG]
	(or arXiv:1906.01786v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1906.01786

Submission history

From: Jalaj Bhandari
[v1] Wed, 5 Jun 2019 02:12:22 UTC (55 KB)
[v2] Thu, 29 Oct 2020 06:30:38 UTC (9,428 KB)
[v3] Mon, 20 Jun 2022 01:01:28 UTC (283 KB)
