Finite-Sample Analysis for SARSA with Linear Function Approximation

Zou, Shaofeng; Xu, Tengyu; Liang, Yingbin

arXiv:1902.02234 (cs)

[Submitted on 6 Feb 2019 (v1), last revised 19 Nov 2019 (this version, v3)]

Abstract: SARSA is an on-policy algorithm for learning a Markov decision process policy in reinforcement learning. We investigate the SARSA algorithm with linear function approximation under non-i.i.d. data, where only a single sample trajectory is available. With a Lipschitz continuous policy improvement operator that is smooth enough, SARSA has been shown to converge asymptotically \cite{perkins2003convergent,melo2008analysis}. However, its non-asymptotic analysis is challenging and remains unsolved, both because the samples are non-i.i.d. and because the behavior policy changes dynamically with time. In this paper, we develop a novel technique to explicitly characterize the stochastic bias of a class of stochastic approximation procedures with time-varying Markov transition kernels. Our approach enables non-asymptotic convergence analyses of this class of stochastic approximation algorithms, which may be of independent interest. Using our bias characterization technique and a gradient-descent type of analysis, we provide a finite-sample analysis of the mean square error of the SARSA algorithm. We then further study a fitted SARSA algorithm, which includes the original SARSA algorithm and its variant in \cite{perkins2003convergent} as special cases. This fitted SARSA algorithm provides a more general framework for iterative on-policy fitted policy iteration, which is more memory-efficient and computationally efficient. For this fitted SARSA algorithm, we also provide a finite-sample analysis.
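
For readers unfamiliar with the setting, the following is a minimal sketch of SARSA with linear function approximation run on a single trajectory. It is an illustration only and is not taken from the paper: the two-state toy MDP, the one-hot features, the softmax policy (used here as one example of a Lipschitz continuous policy improvement operator), the projection radius, and the constant step size are all assumptions made for this example.

```python
import math
import random

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def softmax_policy(theta, features, state, n_actions, temperature=1.0):
    """Action probabilities from a softmax over Q(s, a) = phi(s, a)^T theta.
    The softmax is an assumed choice of a smooth (Lipschitz) policy."""
    q = [dot(features[state][a], theta) / temperature for a in range(n_actions)]
    m = max(q)  # subtract the max for numerical stability
    w = [math.exp(v - m) for v in q]
    z = sum(w)
    return [v / z for v in w]

def project(theta, radius):
    """Project theta onto the l2 ball of the given radius (assumed safeguard)."""
    norm = math.sqrt(dot(theta, theta))
    if norm <= radius:
        return list(theta)
    return [t * radius / norm for t in theta]

def sarsa_linear(P, R, features, n_steps, alpha=0.05, gamma=0.9,
                 radius=10.0, seed=0):
    """Run SARSA with linear function approximation along one trajectory."""
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    theta = [0.0] * len(features[0][0])
    s = 0
    a = rng.choices(range(n_actions),
                    weights=softmax_policy(theta, features, s, n_actions))[0]
    for _ in range(n_steps):
        r = R[s][a]
        s_next = rng.choices(range(n_states), weights=P[s][a])[0]
        # On-policy: the next action comes from the policy induced by the
        # current theta, so the behavior policy changes over time.
        a_next = rng.choices(
            range(n_actions),
            weights=softmax_policy(theta, features, s_next, n_actions))[0]
        phi, phi_next = features[s][a], features[s_next][a_next]
        td_error = r + gamma * dot(phi_next, theta) - dot(phi, theta)
        theta = project([t + alpha * td_error * f for t, f in zip(theta, phi)],
                        radius)
        s, a = s_next, a_next
    return theta

# Hypothetical toy MDP: 2 states, 2 actions. P[s][a] is the next-state
# distribution, R[s][a] the reward, FEATURES[s][a] a one-hot feature vector.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.1, 0.9]]]
R = [[0.0, 0.0],
     [1.0, 1.0]]
FEATURES = [[[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
            [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]]

theta = sarsa_linear(P, R, FEATURES, n_steps=2000)
print(theta)
```

Because each action is drawn from a policy that depends on the current parameter vector, the samples along the trajectory are neither independent nor drawn from a fixed Markov chain, which is the difficulty the abstract describes.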

Comments:	NeurIPS 2019
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1902.02234 [cs.LG]
	(or arXiv:1902.02234v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1902.02234

Submission history

From: Shaofeng Zou
[v1] Wed, 6 Feb 2019 15:33:45 UTC (57 KB)
[v2] Fri, 4 Oct 2019 18:10:46 UTC (32 KB)
[v3] Tue, 19 Nov 2019 15:53:40 UTC (32 KB)
