Multiagent soft Q-learning

E Wei, D Wicke, D Freelan, S Luke - arXiv preprint arXiv:1804.09817, 2018 - arxiv.org
… We will first derive the policy gradient estimator for the cooperative multiagent case and then
… We first introduce Soft Q-Learning and then describe how we use it for multiagent training. …

Regularized softmax deep multi-agent Q-learning

L Pan, T Rashid, B Peng, L Huang… - Advances in Neural …, 2021 - proceedings.neurips.cc
… Tackling overestimation in Q-learning is an important … multi-agent setting. In this work,
we empirically demonstrate that QMIX, a popular Q-learning algorithm for cooperative multiagent

Inverse factorized soft Q-Learning for cooperative multi-agent imitation learning

T Mai, T Nguyen - 2024 - ink.library.smu.edu.sg
… Recall that when convex activation functions are used in building mixing networks, our
Theorem 4.5 shows that the objective function of the multi-agent inverse soft Q-learning is convex …

Modular Q-learning based multi-agent cooperation for robot soccer

KH Park, YJ Kim, JH Kim - Robotics and Autonomous Systems, 2001 - Elsevier
… from other multi-agent … , Q-learning can be used in the reinforcement scheme as it is
applicable where no model of the environment is available [8], [16]. In this paper, modular Q-learning

IQ-Learn: Inverse soft-Q learning for imitation

D Garg, S Chakraborty, C Cundy… - Advances in Neural …, 2021 - proceedings.neurips.cc
Q-learning update rule for imitation learning that can be implemented on top of soft-Q learning
or soft actor-… Our objective forms a variant of soft-Q learning: to learn the optimal Q-function …

Learning to Coordinate Efficiently through Multiagent Soft Q-Learning in the presence of Game-Theoretic Pathologies

S Danisa - 2022 - open.uct.ac.za
… Lastly, we introduce extensions of multiagent soft Q-learning whose designs are in line
with our hypotheses and their respective experimental investigations. This chapter is then …

Multi-Agent Optimistic Soft Q-Learning: A Co-MARL Algorithm with a Global Convergence Guarantee

R Hu, L Ying - openreview.net
… Based on this observation, we propose multi-agent optimistic soft Q-learning (MAOSQL) by
… it determines local policy resembles that of soft Q-learning. This definition naturally leads to a …

Reducing Q-Value Estimation Bias via Mutual Estimation and Softmax Operation in MADRL

Z Li, X Chen, J Fu, N Xie, T Zhao - Algorithms, 2024 - mdpi.com
… a multi-agent mutual evaluation method and a multi-agent softmax method to reduce the
estimation bias of Q values in multi-agent … This paper primarily studies the Q-learning of agents …

Balancing two-player stochastic games with soft Q-learning

J Grau-Moya, F Leibfried, H Bou-Ammar - arXiv preprint arXiv:1802.03216, 2018 - arxiv.org
… behaviour by generalising soft Q-learning to stochastic games, … On the theory side, we show
that games with soft Q-learning … Markov games as a framework for multi-agent reinforcement …

Maximum entropy GFlowNets with soft Q-learning

S Mohammadpour, E Bengio… - International …, 2024 - proceedings.mlr.press
… It is convenient to learn log n both for numerical purposes and for the synergy it has with
Soft Q-learning. In the log space, the sum in (3) is replaced by a log-sum-exp. Indeed, as the …