Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms

Xu, Tengyu; Wang, Zhe; Liang, Yingbin


arXiv:2005.03557 (cs)

[Submitted on 7 May 2020 (v1), last revised 8 May 2020 (this version, v2)]



Abstract: Actor-critic (AC) and natural actor-critic (NAC) algorithms, an important class of reinforcement learning algorithms, are typically executed in one of two ways to find optimal policies. In the first, nested-loop design, each policy update by the actor is followed by an entire loop of value-function updates by the critic; the finite-sample analysis of such AC and NAC algorithms has recently been well established. In the second, two time-scale design, the actor and critic update simultaneously but with different learning rates; this design has far fewer tuning parameters than the nested-loop design and is hence substantially easier to implement. Although two time-scale AC and NAC have been shown to converge in the literature, their finite-sample convergence rate has not been established. In this paper, we provide the first such non-asymptotic convergence rate for two time-scale AC and NAC under Markovian sampling and with the actor using a general policy class approximation. We show that two time-scale AC requires an overall sample complexity of order $\mathcal{O}(\epsilon^{-2.5}\log^3(\epsilon^{-1}))$ to attain an $\epsilon$-accurate stationary point, and that two time-scale NAC requires an overall sample complexity of order $\mathcal{O}(\epsilon^{-4}\log^2(\epsilon^{-1}))$ to attain an $\epsilon$-accurate globally optimal point. We develop novel techniques for bounding the bias error of the actor due to dynamically changing Markovian sampling and for analyzing the convergence rate of the linear critic with dynamically changing base functions and transition kernel.
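To make the two time-scale design concrete, the following is a minimal sketch rather than the authors' algorithm. The abstract does not specify the update rules or step sizes, so everything here is an assumption for illustration: the two-state toy MDP, the tabular softmax actor, the one-hot (linear) TD(0) critic, the function name `two_timescale_ac`, and the polynomially decaying step sizes with exponents `sigma` and `nu`. What it does show is the structure the abstract describes: a single Markovian trajectory, with the actor and critic updated at every step, and the actor's step size decaying faster than the critic's.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def two_timescale_ac(steps=20000, gamma=0.9, sigma=0.6, nu=0.4, seed=0):
    """Two time-scale actor-critic on a hypothetical 2-state, 2-action MDP.

    Assumed toy dynamics (not from the paper): action 1 pays reward 1,
    action 0 pays reward 0, and the next state equals the chosen action
    with probability 0.9.
    """
    rng = random.Random(seed)
    n_states, n_actions = 2, 2
    theta = [[0.0] * n_actions for _ in range(n_states)]  # actor parameters
    w = [0.0] * n_states                                  # critic parameters
    s = 0
    for t in range(steps):
        # Assumed schedules: sigma > nu, so the actor moves on the slower
        # time scale and the critic on the faster one.
        alpha = 0.5 / (1 + t) ** sigma  # actor step size
        beta = 0.5 / (1 + t) ** nu      # critic step size

        # Markovian sampling: one transition along a single trajectory.
        probs = softmax(theta[s])
        a = rng.choices(range(n_actions), weights=probs)[0]
        r = 1.0 if a == 1 else 0.0
        s_next = a if rng.random() < 0.9 else 1 - a

        # Both updates happen in the same step; there is no inner critic loop.
        delta = r + gamma * w[s_next] - w[s]  # TD error
        w[s] += beta * delta                  # critic: TD(0) update
        for b in range(n_actions):
            # Gradient of log softmax policy with respect to theta[s][b].
            grad_log = (1.0 if b == a else 0.0) - probs[b]
            theta[s][b] += alpha * delta * grad_log  # actor update
        s = s_next
    return theta, w
```

Calling `theta, w = two_timescale_ac()` and then `softmax(theta[s])` for each state `s` gives the learned action probabilities. A nested-loop variant would instead freeze `theta` and run many critic updates before each actor step, which introduces the extra tuning parameters (such as the inner-loop length) that the abstract contrasts against.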

Comments:	The results of this paper were initially submitted for publication in February 2020
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2005.03557 [cs.LG]
	(or arXiv:2005.03557v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2005.03557

Submission history

From: Tengyu Xu
[v1] Thu, 7 May 2020 15:42:31 UTC (135 KB)
[v2] Fri, 8 May 2020 02:06:12 UTC (136 KB)
