Two-Player Zero-Sum Differential Games With One-Sided Information
Figure 2: Hexner's game with a sample equilibrium trajectory. P1 starts to move to its target after tr.

[Figure: convergence comparison, ε̄_t and ε versus iterations, for CFR+, MMD, CFR-BR-Primal, DeepCFR (|A| = 9, 16), and CAMS (ours).]

Figure 4: Trajectories using strategies from CAMS and DeepCFR. Markers indicate initial position.

As shown in Fig. 2, it is a two-player game in which P1's goal is to get closer to the target Θ, which is unknown to P2, while keeping P2 away from it. We use the setting with τ = T and a fixed P1 initial state x0 to demonstrate that IIEFG algorithms suffer from increasing costs as |A| grows, while CAMS does not. We consider CFR+ (Tammelin 2014), MMD (Sokota et al. 2022), and a modified CFR-BR (Johanson et al. 2012) (dubbed CFR-BR-Primal, where we focus only on solving P1's optimal strategy) as baselines. Each player's state consists of 2D position and velocity. For the baselines, we discretize the action sets A1 and A2 with varying sizes |A|.
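For concreteness, the following is a minimal sketch of how such a discretized baseline setup could look, assuming double-integrator dynamics (2D position and velocity driven by acceleration inputs) and a uniform grid over each player's acceleration range; the bound a_max, the grid resolution, and the time step dt are hypothetical illustration choices, and only the 2D position-velocity state and the grid sizes |A| = 9 and |A| = 16 are taken from the text and figure legend.

import numpy as np

def discretize_actions(a_max, n_per_axis):
    # Uniform grid over 2D accelerations in [-a_max, a_max]^2.
    # n_per_axis = 3 gives |A| = 9; n_per_axis = 4 gives |A| = 16.
    axis = np.linspace(-a_max, a_max, n_per_axis)
    ax, ay = np.meshgrid(axis, axis)
    return np.stack([ax.ravel(), ay.ravel()], axis=1)  # shape (n_per_axis**2, 2)

def step(state, accel, dt):
    # One double-integrator step; state = [px, py, vx, vy], accel = [ax, ay].
    pos, vel = state[:2], state[2:]
    new_pos = pos + vel * dt + 0.5 * accel * dt ** 2
    new_vel = vel + accel * dt
    return np.concatenate([new_pos, new_vel])

# Example usage with hypothetical values a_max = 1.0 and dt = 0.1.
A1 = discretize_actions(a_max=1.0, n_per_axis=3)  # |A1| = 9
x1 = np.zeros(4)                                   # P1 at rest at the origin
x1_next = step(x1, A1[0], dt=0.1)                  # apply one discrete action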
Conclusion
This work highlights the need for a scalable algorithm for solving incomplete-information differential games, which are structurally similar to imperfect-information games such as poker. We demonstrated that SOTA IIEFG solvers are intractable when it comes to solving differential games. To the authors' best knowledge, this is the first method to provide a tractable solution for incomplete-information differential games with continuous action spaces without problem-specific abstraction and discretization.
Acknowledgment
This work is partially supported by NSF CNS 2304863, CNS 2339774, IIS 2332476, and ONR N00014-23-1-2505.

References
Abernethy, J.; Bartlett, P. L.; and Hazan, E. 2011. Blackwell approachability and no-regret learning are equivalent. In Proceedings of the 24th Annual Conference on Learning Theory, 27–46. JMLR Workshop and Conference Proceedings.
Aumann, R. J.; Maschler, M.; and Stearns, R. E. 1995. Repeated games with incomplete information. MIT Press.
Billings, D.; Burch, N.; Davidson, A.; Holte, R.; Schaeffer, J.; Schauenberg, T.; and Szafron, D. 2003. Approximating game-theoretic optimal strategies for full-scale poker. In IJCAI, volume 3, 661.
Blackwell, D. 1956. An analog of the minimax theorem for vector payoffs. Pacific Journal of Mathematics, 6(1): 1–8.
Brown, N.; Bakhtin, A.; Lerer, A.; and Gong, Q. 2020a. Combining deep reinforcement learning and search for imperfect-information games. Advances in Neural Information Processing Systems, 33: 17057–17069.
Brown, N.; Bakhtin, A.; Lerer, A.; and Gong, Q. 2020b. Combining deep reinforcement learning and search for imperfect-information games. Advances in Neural Information Processing Systems, 33: 17057–17069.
Brown, N.; Lerer, A.; Gross, S.; and Sandholm, T. 2019. Deep counterfactual regret minimization. In International Conference on Machine Learning, 793–802. PMLR.
Brown, N.; and Sandholm, T. 2019. Superhuman AI for multiplayer poker. Science, 365(6456): 885–890.
Burch, N.; Johanson, M.; and Bowling, M. 2014. Solving imperfect information games using decomposition. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 28.
Cardaliaguet, P. 2007. Differential games with asymmetric information. SIAM Journal on Control and Optimization, 46(3): 816–838.
Cardaliaguet, P. 2009. Numerical approximation and optimal strategies for differential games with lack of information on one side. Advances in Dynamic Games and Their Applications: Analytical and Numerical Developments, 1–18.
Cen, S.; Wei, Y.; and Chi, Y. 2021. Fast policy extragradient methods for competitive games with entropy regularization. Advances in Neural Information Processing Systems, 34: 27952–27964.
De Meyer, B. 1996. Repeated games, duality and the central limit theorem. Mathematics of Operations Research, 21(1): 237–251.
FAIR† (Meta Fundamental AI Research Diplomacy Team); Bakhtin, A.; Brown, N.; Dinan, E.; Farina, G.; Flaherty, C.; Fried, D.; Goff, A.; Gray, J.; Hu, H.; et al. 2022. Human-level play in the game of Diplomacy by combining language models with strategic reasoning. Science, 378(6624): 1067–1074.
Ghimire, M.; Zhang, L.; Xu, Z.; and Ren, Y. 2024. State-Constrained Zero-Sum Differential Games with One-Sided Information. In Salakhutdinov, R.; Kolter, Z.; Heller, K.; Weller, A.; Oliver, N.; Scarlett, J.; and Berkenkamp, F., eds., Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, 15512–15539. PMLR.
Gilpin, A.; Hoda, S.; Pena, J.; and Sandholm, T. 2007. Gradient-based algorithms for finding Nash equilibria in extensive form games. In Internet and Network Economics: Third International Workshop, WINE 2007, San Diego, CA, USA, December 12-14, 2007. Proceedings 3, 57–69. Springer.
Gilpin, A.; and Sandholm, T. 2006. Finding equilibria in large sequential games of imperfect information. In Proceedings of the 7th ACM Conference on Electronic Commerce, 160–169.
Harsanyi, J. C. 1967. Games with incomplete information played by “Bayesian” players, I–III. Part I. The basic model. Management Science, 14(3): 159–182.
Hexner, G. 1979. A differential game of incomplete information. Journal of Optimization Theory and Applications, 28: 213–232.
Johanson, M.; Bard, N.; Burch, N.; and Bowling, M. 2012. Finding optimal abstract strategies in extensive-form games. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 26, 1371–1379.
Koller, D.; and Megiddo, N. 1992. The complexity of two-person zero-sum games in extensive form. Games and Economic Behavior, 4(4): 528–552.
Lanctot, M.; Waugh, K.; Zinkevich, M.; and Bowling, M. 2009. Monte Carlo sampling for regret minimization in extensive games. Advances in Neural Information Processing Systems, 22.
Lanctot, M.; Zambaldi, V.; Gruslys, A.; Lazaridou, A.; Tuyls, K.; Pérolat, J.; Silver, D.; and Graepel, T. 2017. A unified game-theoretic approach to multiagent reinforcement learning. Advances in Neural Information Processing Systems, 30.
McMahan, B. 2011. Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization. In Gordon, G.; Dunson, D.; and Dudík, M., eds., Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, volume 15 of Proceedings of Machine Learning Research, 525–533. Fort Lauderdale, FL, USA: PMLR.
Moravčík, M.; Schmid, M.; Burch, N.; Lisý, V.; Morrill, D.; Bard, N.; Davis, T.; Waugh, K.; Johanson, M.; and Bowling, M. 2017. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science, 356(6337): 508–513.
Perolat, J.; De Vylder, B.; Hennes, D.; Tarassov, E.; Strub, F.; de Boer, V.; Muller, P.; Connor, J. T.; Burch, N.; Anthony, T.; et al. 2022. Mastering the game of Stratego with model-free multiagent reinforcement learning. Science, 378(6623): 990–996.
Perolat, J.; Munos, R.; Lespiau, J.-B.; Omidshafiei, S.; Rowland, M.; Ortega, P.; Burch, N.; Anthony, T.; Balduzzi, D.; De Vylder, B.; et al. 2021. From Poincaré recurrence to convergence in imperfect information games: Finding equilibrium via regularization. In International Conference on Machine Learning, 8525–8535. PMLR.
Sandholm, T. 2010. The state of solving large incomplete-information games, and application to poker. AI Magazine, 31(4): 13–32.
Schmid, M.; Moravčík, M.; Burch, N.; Kadlec, R.; Davidson, J.; Waugh, K.; Bard, N.; Timbers, F.; Lanctot, M.; Holland, G. Z.; et al. 2023. Student of Games: A unified learning algorithm for both perfect and imperfect information games. Science Advances, 9(46): eadg3256.
Silver, D.; Hubert, T.; Schrittwieser, J.; Antonoglou, I.; Lai, M.; Guez, A.; Lanctot, M.; Sifre, L.; Kumaran, D.; Graepel, T.; et al. 2017a. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv preprint arXiv:1712.01815.
Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; et al. 2017b. Mastering the game of Go without human knowledge. Nature, 550(7676): 354–359.
Sokota, S.; D'Orazio, R.; Kolter, J. Z.; Loizou, N.; Lanctot, M.; Mitliagkas, I.; Brown, N.; and Kroer, C. 2022. A unified approach to reinforcement learning, quantal response equilibria, and two-player zero-sum games. arXiv preprint arXiv:2206.05825.
Tammelin, O. 2014. Solving large imperfect information games using CFR+. arXiv preprint arXiv:1407.5042.
Vieillard, N.; Kozuno, T.; Scherrer, B.; Pietquin, O.; Munos, R.; and Geist, M. 2020. Leverage the average: an analysis of KL regularization in reinforcement learning. Advances in Neural Information Processing Systems, 33: 12163–12174.
Wang, Z.; Veličković, P.; Hennes, D.; Tomašev, N.; Prince, L.; Kaisers, M.; Bachrach, Y.; Elie, R.; Wenliang, L. K.; Piccinini, F.; et al. 2024. TacticAI: an AI assistant for football tactics. Nature Communications, 15(1): 1906.
Zheng, T.; Zhu, L.; So, A. M.-C.; Blanchet, J.; and Li, J. 2023. Universal gradient descent ascent method for nonconvex-nonconcave minimax optimization. Advances in Neural Information Processing Systems, 36: 54075–54110.
Zinkevich, M.; Johanson, M.; Bowling, M.; and Piccione, C. 2007. Regret minimization in games with incomplete information. Advances in Neural Information Processing Systems, 20.