
Safe Multi-Agent Reinforcement Learning via Approximate Hamilton-Jacobi Reachability

  • Regular paper
  • Open access
  • Published: 30 December 2024
  • Volume 111, article number 7 (2025)
  • Kai Zhu1,
  • Fengbo Lan1,
  • Wenbo Zhao1 &
  • Tao Zhang1,2

Abstract

Multi-Agent Reinforcement Learning (MARL) promises to address the challenges of cooperation and competition among multiple agents, often in safety-critical scenarios. However, progress toward safe MARL remains limited. Current works extend single-agent safe learning approaches, employing shielding or backup policies to ensure that safety constraints are satisfied. These approaches, however, require good cooperation among the agents, and weakly distributed approaches with centralized shielding become infeasible when agents face complex situations such as non-cooperative agents and coordination failures. In this paper, we integrate Hamilton-Jacobi (HJ) reachability theory into a Centralized Training and Decentralized Execution (CTDE) framework for safe MARL. Our framework learns safety policies without requiring a system model or pre-training of a shielding layer. In addition, we enhance adaptability to varying levels of cooperation through a conservative approximation of the safety value function. Experimental results validate the efficacy of the proposed method, demonstrating that it ensures safety while achieving target tasks under cooperative conditions. Moreover, our approach remains robust to non-cooperative behaviors induced by complex disturbance factors.
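To make the mechanism concrete, below is a minimal sketch of the kind of discounted safety Bellman backup used in HJ-reachability-based reinforcement learning (in the spirit of Fisac et al., ICRA 2019), with a simple pessimistic shift standing in for the conservative value approximation described above. This is an illustration under stated assumptions, not the authors' implementation: the network shape, the signed safety margin l(s) supplied by the environment, and the pessimism coefficient are all hypothetical.

```python
# Illustrative sketch only (not the paper's code): a safety critic trained
# with a discounted safety Bellman backup from HJ-reachability RL, plus a
# hypothetical pessimistic shift as a stand-in for the conservative value
# approximation described in the abstract.
import torch
import torch.nn as nn

class SafetyCritic(nn.Module):
    """Q_safe(s, a): approximate safety value of taking action a in state s."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def safety_backup(margin: torch.Tensor, next_value: torch.Tensor,
                  gamma: float = 0.99, pessimism: float = 0.1) -> torch.Tensor:
    """Discounted safety Bellman target:
        (1 - gamma) * l(s) + gamma * min(l(s), V(s')),
    where l(s) is a signed safety margin (l(s) < 0 means the state is unsafe)
    and V(s') = max_a Q_safe(s', a) is supplied by the caller. The `pessimism`
    term (an assumption, not from the paper) lowers the target so the critic
    under-estimates safety, yielding a conservative approximation.
    """
    target = (1.0 - gamma) * margin + gamma * torch.minimum(margin, next_value)
    return target - pessimism * next_value.abs()
```

In a CTDE setting, such a critic would be trained centrally on joint observations and actions, while each agent's policy is executed from local observations alone.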



Code Availability

The code used in this paper is available from the corresponding author upon reasonable request.


Acknowledgements

We would like to acknowledge the support of the Scientific and Technological Innovation 2030 program under Grant 2021ZD0110900.

Funding

This research was funded by Scientific and Technological Innovation 2030 under Grant 2021ZD0110900.

Author information

Authors and Affiliations

  1. Department of Automation, Tsinghua University, Beijing, 100084, China

    Kai Zhu, Fengbo Lan, Wenbo Zhao & Tao Zhang

  2. Beijing National Research Center for Information Science and Technology, Beijing, 100084, China

    Tao Zhang


Contributions

Kai Zhu contributed to the design, implementation, and manuscript writing of the research; Fengbo Lan and Wenbo Zhao contributed to the discussion of the experiments and the revision of the manuscript; Tao Zhang supervised the work and reviewed and edited the final version of the paper.

Corresponding author

Correspondence to Tao Zhang.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflicts of interest.

Consent to participate

All authors of this research paper have consented to participate in the research study.

Consent for publication

All authors of this research paper have read and approved the submitted version.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Zhu, K., Lan, F., Zhao, W. et al. Safe Multi-Agent Reinforcement Learning via Approximate Hamilton-Jacobi Reachability. J Intell Robot Syst 111, 7 (2025). https://doi.org/10.1007/s10846-024-02156-6


  • Received: 01 November 2023

  • Accepted: 05 August 2024

  • Published: 30 December 2024

  • DOI: https://doi.org/10.1007/s10846-024-02156-6


Keywords

  • Multi-agent systems
  • Deep reinforcement learning
  • Safety satisfaction
  • Hamilton-Jacobi reachability