Efficient Model-Based Concave Utility Reinforcement Learning through Greedy Mirror Descent

Moreno, Bianca Marin; Brégère, Margaux; Gaillard, Pierre; Oudjane, Nadia


arXiv:2311.18346 (math)

[Submitted on 30 Nov 2023]

Title: Efficient Model-Based Concave Utility Reinforcement Learning through Greedy Mirror Descent

Authors: Bianca Marin Moreno (Thoth), Margaux Brégère (EDF R&D, LPSM, SU), Pierre Gaillard (Thoth), Nadia Oudjane (EDF R&D)


Abstract: Many machine learning tasks can be solved by minimizing a convex function of an occupancy measure over the policies that generate it. These include reinforcement learning and imitation learning, among others. This more general paradigm is called the Concave Utility Reinforcement Learning problem (CURL). Because CURL invalidates the classical Bellman equations, it requires new algorithms. We introduce MD-CURL, a new algorithm for CURL in a finite-horizon Markov decision process. MD-CURL is inspired by mirror descent and uses a non-standard regularization to achieve convergence guarantees and a simple closed-form solution, eliminating the computationally expensive projection steps typically found in mirror descent approaches. We then extend CURL to an online learning scenario and present Greedy MD-CURL, a new method that adapts MD-CURL to an online, episode-based setting with partially unknown dynamics. Like MD-CURL, the online version Greedy MD-CURL benefits from low computational complexity while guaranteeing sub-linear or even logarithmic regret, depending on the level of information available on the underlying dynamics.
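To make the objects named in the abstract concrete, the following is a minimal sketch, not the authors' implementation and not MD-CURL itself. It computes the state-action occupancy measures that a policy induces in a small finite-horizon Markov decision process and evaluates two objectives on them: a linear cost, which corresponds to classical reinforcement learning, and a negative state entropy, which is convex but not linear and so falls under CURL. The two-state MDP, the uniform policy, and all function and variable names are assumptions chosen for illustration.

```python
import math


def occupancy_measures(P, policy, mu0, horizon):
    """Forward recursion for state-action occupancy measures.

    P[x][a][y]      : probability of moving from state x to y under action a
    policy[n][x][a] : probability of playing action a in state x at step n
    mu0[x]          : initial state distribution
    Returns mu with mu[n][x][a] = P(state x and action a at step n).
    """
    n_states, n_actions = len(mu0), len(P[0])
    state_dist = list(mu0)
    mu = []
    for n in range(horizon):
        mu_n = [[state_dist[x] * policy[n][x][a] for a in range(n_actions)]
                for x in range(n_states)]
        mu.append(mu_n)
        # Push the state-action mass through the transition kernel.
        state_dist = [sum(mu_n[x][a] * P[x][a][y]
                          for x in range(n_states) for a in range(n_actions))
                      for y in range(n_states)]
    return mu


def linear_cost(mu, cost):
    """Linear objective: expected cumulative cost (classical RL case)."""
    return sum(cost[x][a] * m for mu_n in mu
               for x, row in enumerate(mu_n) for a, m in enumerate(row))


def negative_state_entropy(mu):
    """Convex, non-linear objective: sum over steps of sum_x rho(x) log rho(x)."""
    total = 0.0
    for mu_n in mu:
        for row in mu_n:
            rho = sum(row)  # state marginal at this step
            if rho > 0.0:   # convention: 0 * log 0 = 0
                total += rho * math.log(rho)
    return total


# Hypothetical toy MDP: action 0 stays in place, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
horizon = 3
uniform_policy = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(horizon)]
mu = occupancy_measures(P, uniform_policy, mu0=[1.0, 0.0], horizon=horizon)

cost = [[0.0, 1.0], [0.0, 1.0]]  # assumed cost: 1 for switching, 0 for staying
lin = linear_cost(mu, cost)
ent = negative_state_entropy(mu)
```

Minimizing `negative_state_entropy` over policies is a pure exploration objective. It is not a fixed per-step cost summed along the trajectory, which is one way to see why such problems fall outside the classical Bellman framework mentioned in the abstract.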

Subjects:	Optimization and Control (math.OC); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
Cite as:	arXiv:2311.18346 [math.OC]
	(or arXiv:2311.18346v1 [math.OC] for this version)
	https://doi.org/10.48550/arXiv.2311.18346

Submission history

From: Bianca Marin Moreno [via CCSD proxy]
[v1] Thu, 30 Nov 2023 08:32:50 UTC (801 KB)
