Mean-Variance Efficient Reinforcement Learning by Expected Quadratic Utility Maximization

Kato, Masahiro; Nakagawa, Kei; Abe, Kenshi; Morimura, Tetsuro

arXiv:2010.01404 (cs)

[Submitted on 3 Oct 2020 (v1), last revised 5 Sep 2021 (this version, v3)]

Abstract: Risk management is critical in decision making, and the mean-variance (MV) trade-off is one of the most common criteria. However, in reinforcement learning (RL) for sequential decision making under uncertainty, most existing methods for MV control suffer from computational difficulties caused by the double sampling problem. In this paper, in contrast to strict MV control, we consider learning MV efficient policies that achieve Pareto efficiency with respect to the MV trade-off. To this end, we train an agent to maximize the expected quadratic utility function, a common objective of risk management in finance and economics. We call our approach direct expected quadratic utility maximization (EQUM). EQUM does not suffer from the double sampling issue because it does not require estimating the gradient of the variance. We confirm that the maximizer of the EQUM objective directly corresponds to an MV efficient policy under a certain condition. We conduct experiments with benchmark settings to demonstrate the effectiveness of EQUM.
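
To illustrate why an expected quadratic utility encodes an MV trade-off, the following minimal sketch assumes the textbook form u(R) = a*R - (b/2)*R^2 with b > 0, for which E[u(R)] = a*E[R] - (b/2)*(Var(R) + E[R]^2). The coefficients a and b, the function names, and the simulated returns are illustrative assumptions and are not taken from the paper. The point of the sketch is that the objective is a plain average of per-sample utilities, so it can be estimated from single samples without a separate estimate of the variance.

```python
import random


def quadratic_utility(r, a=1.0, b=0.5):
    """Textbook quadratic utility u(r) = a*r - (b/2)*r^2 (a, b are assumed values)."""
    return a * r - 0.5 * b * r * r


def expected_quadratic_utility(returns, a=1.0, b=0.5):
    """Sample average of per-return utilities; no variance estimate is needed."""
    return sum(quadratic_utility(r, a, b) for r in returns) / len(returns)


def mean_variance_form(returns, a=1.0, b=0.5):
    """Same quantity written as a*mean - (b/2)*(variance + mean^2)."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / n  # population variance
    return a * mean - 0.5 * b * (var + mean ** 2)


# Hypothetical returns, standing in for cumulative rewards of sampled trajectories.
rng = random.Random(0)
returns = [rng.gauss(1.0, 0.3) for _ in range(1000)]

eu = expected_quadratic_utility(returns)
mv = mean_variance_form(returns)
print(abs(eu - mv) < 1e-9)
```

The two functions agree up to floating-point error, which shows that maximizing the sample-average utility implicitly rewards a higher mean and penalizes a higher variance under the assumed utility form.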

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:2010.01404 [cs.LG] (or arXiv:2010.01404v3 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2010.01404

Submission history

From: Masahiro Kato
[v1] Sat, 3 Oct 2020 18:17:34 UTC (770 KB)
[v2] Sat, 3 Apr 2021 20:49:23 UTC (930 KB)
[v3] Sun, 5 Sep 2021 10:28:58 UTC (952 KB)
