
Safe Multi-Agent Reinforcement Learning via Approximate Hamilton-Jacobi Reachability

  • Regular paper
  • Open access
  • Published: 30 December 2024
  • Volume 111, article number 7 (2025)
  • Kai Zhu1,
  • Fengbo Lan1,
  • Wenbo Zhao1 &
  • Tao Zhang1,2

Abstract

Multi-Agent Reinforcement Learning (MARL) promises to address the challenges of cooperation and competition among multiple agents, often in safety-critical scenarios. However, progress toward safe MARL remains limited. Current works extend single-agent safe learning approaches, employing shielding or backup policies to ensure that safety constraints are satisfied. These approaches, however, require good cooperation among the agents, and weakly distributed approaches with centralized shielding become infeasible when agents face complex situations such as non-cooperative agents and coordination failures. In this paper, we integrate Hamilton-Jacobi (HJ) reachability theory into a Centralized Training and Decentralized Execution (CTDE) framework for safe MARL. Our framework learns safety policies without requiring a system model or pre-training of a shielding layer. In addition, we enhance adaptability to varying levels of cooperation through a conservative approximation of the safety value function. Experimental results validate the efficacy of the proposed method, demonstrating that it ensures safety while achieving target tasks under cooperative conditions. Moreover, our approach remains robust to non-cooperative behaviors induced by complex disturbance factors.
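To make the mechanism concrete, below is a minimal sketch of the kind of discounted safety Bellman backup used in HJ-reachability-based reinforcement learning (in the spirit of Fisac et al., ICRA 2019), with a simple pessimistic shift standing in for the conservative value approximation described above. This is an illustration under stated assumptions, not the authors' implementation: the network shape, the signed safety margin l(s) supplied by the environment, and the pessimism coefficient are all hypothetical.

```python
# Illustrative sketch only (not the paper's code): a safety critic trained
# with a discounted safety Bellman backup from HJ-reachability RL, plus a
# hypothetical pessimistic shift as a stand-in for the conservative value
# approximation described in the abstract.
import torch
import torch.nn as nn

class SafetyCritic(nn.Module):
    """Q_safe(s, a): approximate safety value of taking action a in state s."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def safety_backup(margin: torch.Tensor, next_value: torch.Tensor,
                  gamma: float = 0.99, pessimism: float = 0.1) -> torch.Tensor:
    """Discounted safety Bellman target:
        (1 - gamma) * l(s) + gamma * min(l(s), V(s')),
    where l(s) is a signed safety margin (l(s) < 0 means the state is unsafe)
    and V(s') = max_a Q_safe(s', a) is supplied by the caller. The `pessimism`
    term (an assumption, not from the paper) lowers the target so the critic
    under-estimates safety, yielding a conservative approximation.
    """
    target = (1.0 - gamma) * margin + gamma * torch.minimum(margin, next_value)
    return target - pessimism * next_value.abs()
```

In a CTDE setting, such a critic would be trained centrally on joint observations and actions, while each agent's policy is executed from local observations alone.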



Code Availability

The code used in this paper is available from the corresponding author upon reasonable request.


Acknowledgements

We would like to acknowledge the support of the Scientific and Technological Innovation 2030 program under Grant 2021ZD0110900.

Funding

This research was funded by Scientific and Technological Innovation 2030 under Grant 2021ZD0110900.

Author information

Authors and Affiliations

  1. Department of Automation, Tsinghua University, Beijing, 100084, China

    Kai Zhu, Fengbo Lan, Wenbo Zhao & Tao Zhang

  2. Beijing National Research Center for Information Science and Technology, Beijing, 100084, China

    Tao Zhang


Contributions

Kai Zhu contributed to the design, implementation, and manuscript writing of the research; Fengbo Lan and Wenbo Zhao contributed to the discussion of the experiments and the revision of the manuscript; Tao Zhang supervised the work and reviewed and edited the final version of the paper.

Corresponding author

Correspondence to Tao Zhang.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflicts of interest.

Consent to participate

All authors of this research paper have consented to participate in the research study.

Consent for publication

All authors of this research paper have read and approved the submitted version.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Zhu, K., Lan, F., Zhao, W. et al. Safe Multi-Agent Reinforcement Learning via Approximate Hamilton-Jacobi Reachability. J Intell Robot Syst 111, 7 (2025). https://doi.org/10.1007/s10846-024-02156-6


  • Received: 01 November 2023

  • Accepted: 05 August 2024

  • Published: 30 December 2024

  • DOI: https://doi.org/10.1007/s10846-024-02156-6


Keywords

  • Multi-agent systems
  • Deep reinforcement learning
  • Safety satisfaction
  • Hamilton-Jacobi reachability