POPCORN: Partially Observed Prediction Constrained Reinforcement Learning

J. Futoma, M. C. Hughes, F. Doshi-Velez - arXiv preprint arXiv:2001.04032, 2020 - arxiv.org
Many medical decision-making tasks can be framed as partially observed Markov decision processes (POMDPs). However, prevailing two-stage approaches that first learn a POMDP and then solve it often fail because the model that best fits the data may not be well suited for planning. We introduce a new optimization objective that (a) produces both high-performing policies and high-quality generative models, even when some observations are irrelevant for planning, and (b) does so in batch off-policy settings that are typical in healthcare, when only retrospective data is available. We demonstrate our approach on synthetic examples and a challenging medical decision-making problem.
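The abstract names a new objective but the snippet does not state its form. As a minimal sketch, assuming the Lagrangian relaxation typical of the prediction-constrained framework referenced in the title, the trade-off between points (a) and (b) might be written as

\[
\max_{\theta} \;\; \log p\big(o_{1:T} \mid a_{1:T}, \theta\big) \;+\; \lambda \, \hat{V}\big(\pi_{\theta}\big)
\]

where \(\theta\) parameterizes the POMDP, \(\pi_{\theta}\) is the policy obtained by planning in the learned model, \(\hat{V}\) is a batch off-policy estimate of that policy's value, and \(\lambda \ge 0\) controls how strongly decision quality constrains generative fit. All symbols here are assumed notation for illustration, not taken from the paper itself.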