Two-Player Zero-Sum Differential Games with One-Sided Information

Mukesh Ghimire1, Zhe Xu1, Yi Ren1


1 Arizona State University
{mghimire, xzhe1, yiren}@asu.edu
arXiv:2502.05314v2 [cs.GT] 13 Feb 2025

Abstract

Unlike Poker, where the action space A is discrete, differential games in the physical world often have continuous action spaces that are not amenable to discrete abstraction, rendering no-regret algorithms with O(|A|) complexity not scalable. To address this challenge within the scope of two-player zero-sum (2p0s) games with one-sided information, we show that (1) a computational complexity independent of |A| can be achieved by exploiting the convexification property of incomplete-information games and the Isaacs' condition that commonly holds for dynamical systems, and that (2) the computation of the two equilibrium strategies can be decoupled under one-sidedness of information. Leveraging these insights, we develop an algorithm that successfully approximates the optimal strategy in a homing game. Code is available on GitHub¹.

Multi-Agent AI in the Real World Workshop at AAAI 2025.
¹ https://github.com/ghimiremukesh/cams/tree/workshop

Introduction

The strength of game solvers has grown rapidly in the last decade, beating elite-level human players in Chess (Silver et al. 2017a), Go (Silver et al. 2017b), Poker (Brown and Sandholm 2019; Brown et al. 2020a), Diplomacy (FAIR† et al. 2022), and Stratego (Perolat et al. 2022), among others of increasing complexity. These successes motivated recent interest in solving differential games in continuous time and space, e.g., competitive sports (Wang et al. 2024; Ghimire et al. 2024), where critical strategic plays must be executed precisely within the continuous action space and at specific moments in time (e.g., consider set-piece scenarios in soccer). However, existing regret minimization algorithms, e.g., CFR+ (Tammelin 2014) and its variants (Burch, Johanson, and Bowling 2014; Moravčík et al. 2017; Brown et al. 2020a; Lanctot et al. 2009), and last-iterate online learning algorithms, e.g., variants of follow-the-regularized-leader (FTRL) (McMahan 2011; Perolat et al. 2021) and of mirror descent (Sokota et al. 2022; Cen, Wei, and Chi 2021; Vieillard et al. 2020), are designed for discrete actions and have computational complexities that increase with the size of the action space A. Applying these algorithms to differential games would therefore require either insightful action and time abstraction or enormous compute, neither of which is readily available.

As a step towards addressing this challenge, our study focuses on games with one-sided information, which represent a variety of attack-defence scenarios: Both players have common knowledge about the finite set of I possible payoff types and nature's distribution p0 over these types. At the beginning of the game, nature draws a type and informs Player 1 (P1) about the type but not P2. As the game progresses, the public belief about the chosen type is updated from p0 via Bayes' rule, based on the action sequence taken by P1. P1's goal is to minimize the expected cost over p0. This game is proved to have a value under Isaacs' condition (Cardaliaguet 2009). Due to the zero-sum nature, P1 may need to delay information release or manipulate P2's belief to take full advantage of the information asymmetry, while P2's strategy is to optimize the worst-case payoff. Real-world examples of the game include man-on-man matchups in sports, where the attacker has private information about which play is to be executed, and defense games where multiple potential targets are concerned.

The two differences between our game and commonly studied imperfect-information extensive-form games (IIEFGs) (Sandholm 2010; Perolat et al. 2022; FAIR† et al. 2022) are that: (1) IIEFGs often have belief spaces (e.g., belief about the opponent's cards in Poker) larger than their abstracted action spaces (e.g., betting categories in Poker), and (2) information asymmetry in our games is only one-sided. This paper investigates the potential computational advantages of exploiting these differences via the following insights: (1) At any infostate, P1's (resp. P2's) behavioral strategy is I (resp. I + 1)-atomic and convexifies the primal (resp. dual) value with respect to the public belief (Fig. 1). With this, we can reformulate the convex-concave minimax problem of size O(|A|) at each infostate into a nonconvex-nonconcave problem of size O(I²). When I² ≪ |A|, and in particular when |A| = ∞, the latter becomes more efficient to solve in practice. (2) Due to the one-sidedness of information, the equilibrium behavioral strategies of P1 and P2 can be solved separately through primal and dual formulations of the game, in each of which the opponent plays pure best responses. This decoupling avoids recurrent learning dynamics between the pair of strategies without regularization (Perolat et al. 2021).

Figure 1: SOTA algorithms like CFR require expanding over the entire action space (left), whereas our algorithm only requires expanding over at most I actions for P1 (I + 1 for P2) at each decision node (right).

To summarize, this work has two contributions: (1) familiarizing the broader AI community with the connections between computational game theory and differential game theory, and (2) providing the first algorithm with scalable convergence to the equilibrium of differential games with continuous action spaces and one-sided information.
FTRL variants & MMD (McMahan O ln(|A|) ε ln 1ε
Related Work 2011; Perolat et al. 2021; Sokota et al. to ε-QRE
2p0s games with incomplete information. (Harsanyi 2022)
1967) introduced a Bayesian game framework to solve Descent-ascent algorithms for nonconvex-nonconcave
incomplete-information normal-form games by transform- minimax problems. Existing developments in IIEFGs fo-
ing the game into an imperfect-information one involv- cused on convex-concave minimax problems due to the bi-
ing a chance mechanism. The seminal work of (Aumann, linear form of the expected payoff through the conversion of
Maschler, and Stearns 1995) extended this idea to re- games to their normal forms. This paper, on the other hand,
peated games and established the connection between value investigates the nonconvex-nonconcave minimax problems
convexification and belief manipulation. Within the same to be solved at every infostate when actions are considered
framework, Blackwell’s approachability theorem (Black- continuous. To this end, we use the doubly smoothed gra-
well 1956) naturally becomes the theoretical support for the dient descent ascent method (DS-GDA) which has a worst-
optimal strategy of the uninformed player (P2). Building on case complexity of O(ε−4 ) (Zheng et al. 2023).
top of (Aumann, Maschler, and Stearns 1995), (De Meyer
1996) introduced the concept of a dual game in which the be- 2p0s Differential Games w/ One-Sided Info.
havioral strategy of the uninformed player becomes Markov.
This concept later helped (Cardaliaguet 2007; Ghimire et al. Notations and preliminaries. We use ∆(I) as the sim-
2024) to establish the value existence proof for 2p0s differ- plex in RI , [K] := {1, ..., K} for K ∈ Z+ , a[i] as the ith
ential games with incomplete information. Unlike repeated element of vector a, ∂V as the subgradient of function V ,
games in which belief manipulation occurs only in the first and ⟨·, ·⟩ for vector product. Consider a time-invariant dy-
round of the game, differential games may have multiple namical system that defines the evolution of the joint state
critical collocation points in the joint space of time, state, x ∈ X ⊆ Rdx of P1 and P2 with control inputs u ∈ U and
and public belief where belief manipulations are necessary v ∈ V, respectively:
to achieve Nash equilibrium, depending on the specifications ẋ(t) = f (x(t), u, v). (1)
of system dynamics, payoffs, and state constraints (Ghimire The game starts at t0 ∈ [0, T ] from some initial state
et al. 2024). For this reason, scalable value and strategy ap- x(t0 ) = x0 . The initial belief p0 ∈ ∆(I) is set to nature’s
proximation for 2p0s differential games with incomplete in- distribution. P1 of type i accumulates a running cost li (u, v)
formation has not yet been achieved. during the game and receives a terminal cost gi (x(T )),
Imperfect information extensive-form games. IIEFGs where i ∼ p0 . The goal of P1 is to minimize the expected
represent the more general set of simultaneous or sequen- sum of the running and terminal costs, while P2 aims to
tial multi-agent decision-making problems with finite hori- maximize it. A behavioral strategy pair (η, ζ) is a Nash equi-
zons. Since any 2p0s IIEFG with finite action sets has a librium (NE) of a zero-sum game if and only if
normal-form formulation, a unique Nash equilibrium always Z T Z T
exists in the space of mixed strategies. Significant efforts inf sup Eη,ζ,i∼p0 li dt+gi = sup inf Eη,ζ,i∼p0 li dt+gi ,
η ζ 0 ζ η 0
have been taken to find equilibrium of large IIEFGs such as
(2)
poker (Koller and Megiddo 1992; Billings et al. 2003; Gilpin
and we call this common value the value of the game. A NE
and Sandholm 2006; Gilpin et al. 2007; Sandholm 2010;
is called pure if the strategies (η, ζ) are deterministic, spec-
Brown and Sandholm 2019), with a converging set of algo-
ifying a definite action for every decision point. It is called
rithms that are no-regret, average- or last-iterate converging,
mixed if the strategies are probabilistic, involving random-
and with sublinear or linear convergence rates (Zinkevich
ization over action spaces. When information is one-sided,
et al. 2007; Abernethy, Bartlett, and Hazan 2011; McMahan
η = {ηi }I since P1 prepares one strategy for each possible
2011; Tammelin 2014; Johanson et al. 2012; Lanctot et al.
game type. We introduce the following assumptions under
2009; Brown et al. 2019, 2020b; Perolat et al. 2021; Sokota
which mixed NE exists for Eq. (2) (Cardaliaguet 2009):
et al. 2022; Perolat et al. 2022; Schmid et al. 2023) (see sum-
mary in Tab. 1). These algorithms all have computational 1. U ⊆ Rdu and V ⊆ Rdv are compact and finite-
complexities increasing with |A|, provided that the equilib- dimensional sets.
rium behavioral strategy lies in the interior of the simplex 2. f : X × U × V → X is bounded, continuous, and uni-
∆(|A|). Critically, this assumption does not hold for differ- formly Lipschitz continuous with respect to x.
3. gi : X → R and li : U ×V → R are Lipschitz continuous backward induction for the dual value:
and bounded. Vτ∗ (T, x, p̂) = max{p̂i − gi (x(T ))}
i
4. Isaacs’ condition holds for the Hamiltonian H : X × 
Rdx → R: Vτ (t, x, p̂) = Vexp̂ min max Vτ∗ (t + τ, x + τ f (x, u, v), (7)

v∈V u∈U
H(x, ξ) := min max f (x, u, v)⊤ ξ − li (u, v)

u∈U v∈V p̂ − τ l(u, v)) ,
(3)
= max min f (x, u, v)⊤ ξ − li (u, v).
v∈V u∈U where, l = [l1 , ..., lI ]T . Then at any (t, x, p̂), P2 finds λ =
[λ1 , . . . , λI+1 ] and p̂k ∈ RI for k ∈ [I + 1] such that:
5. Both players have full knowledge about f , {gi }Ii=1 , I+1
{li }Ii=1 , p0 , and the NE of the game. Control inputs and
X  

Vτ (t, x, p̂) = λk min max Vτ∗ (t + τ, x + τ f (x, u, v),
states are fully observable and we assume perfect recall. k
v u

Critically, the Isaacs’ condition ensures that 2p0s differential  XI+1


games with complete information have pure NE. p̂ − τ l(u, v)) ; λk p̂k = p̂
k
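For concreteness, the one-step propagation x + τ f(x, u, v) that appears throughout the backward inductions below, and that is written ODE(x, τ, u, v; f) in the reformulated problems, is simply a forward-Euler step. The Python sketch below makes this explicit; the double-integrator dynamics are a hypothetical placeholder that loosely mirrors the 2D position-velocity states used in the experiments, not a system prescribed by the theory.

```python
import numpy as np

def euler_step(x, u, v, tau, f):
    """One forward-Euler step of Eq. (1): x_{t+tau} ~ x_t + tau * f(x_t, u, v).
    This is the ODE(x, tau, u, v; f) operator used in the reformulations below."""
    return np.asarray(x, dtype=float) + tau * np.asarray(f(x, u, v), dtype=float)

def f_double_integrator(x, u, v):
    # Hypothetical dynamics for illustration only: two decoupled 2D double
    # integrators, x = [pos1, vel1, pos2, vel2], with P1 applying acceleration u
    # and P2 applying acceleration v.
    pos1, vel1, pos2, vel2 = np.split(np.asarray(x, dtype=float), 4)
    return np.concatenate([vel1, u, vel2, v])

x0 = np.zeros(8)                                   # joint state of both players
x1 = euler_step(x0, u=np.array([1.0, 0.0]), v=np.array([0.0, -1.0]),
                tau=0.25, f=f_double_integrator)
```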
Behavioral strategy of P1. A behavioral strategy prescribes distributions over the action space at every subgame (t, x, p). In order to determine the strategy, it is necessary to first characterize the value function. From Cardaliaguet (2009), we obtain the following backward induction to approximate the value given a sufficiently fine time-discretization τ → 0⁺:

    Vτ(t, x, p) = Vex_p [ min_{u∈U} max_{v∈V} { Vτ(t + τ, x + τ f(x, u, v), p) + τ E_{i∼p}[li(u, v)] } ];  Vτ(T, x, p) = Σ_i p_i gi(x),    (4)

where Vex is the convexification operator (with respect to p). The behavioral strategy of P1 is computed as follows: P1 first finds λ = [λ_1, . . . , λ_I] ∈ ∆(I) and p^k ∈ ∆(I) for k ∈ [I] such that:

    Vτ(t, x, p) = Σ_{k∈[I]} λ_k [ min_{u∈U} max_{v∈V} { Vτ(t + τ, x + τ f(x, u, v), p^k) + τ E_{i∼p^k}[li(u, v)] } ];  Σ_{k∈[I]} λ_k p^k = p.    (5)

He then chooses u^k with Pr(u = u^k | i) = λ_k p^k[i] / p[i] if he is of type i, and updates the belief to p^k. This is famously known as the splitting mechanism in repeated games, and is a consequence of the "Cav u" theorem (Aumann, Maschler, and Stearns 1995; De Meyer 1996).
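As a minimal sketch of this splitting step, assuming the type-conditioned probabilities α_ki = Pr(u = u^k | i) introduced in the Methods section: the marginals λ_k and posteriors p^k follow directly from α and the current belief p, the posteriors average back to the prior (Σ_k λ_k p^k = p), and P1 of type i samples the atom k with probability α_ki. The names and the random α below are illustrative only.

```python
import numpy as np

def split_belief(alpha, p):
    """Splitting mechanism behind Eq. (5).
    alpha[k, i] = Pr(u = u^k | type i); each column of alpha sums to 1 over k.
    Returns lam[k]  = sum_i alpha[k, i] * p[i]      (marginal prob. of atom u^k)
    and     post[k] = (alpha[k, :] * p) / lam[k]    (posterior belief p^k)."""
    p = np.asarray(p, dtype=float)
    lam = alpha @ p
    safe = np.where(lam > 0.0, lam, 1.0)             # avoid dividing by an unused atom
    post = (alpha * p) / safe[:, None]
    return lam, post

rng = np.random.default_rng(0)
I = 3
alpha = rng.dirichlet(np.ones(I), size=I).T          # columns sum to 1 over k
p = np.array([0.5, 0.3, 0.2])
lam, post = split_belief(alpha, p)
assert np.allclose(lam @ post, p)                    # Cav-u splitting: sum_k lam_k p^k = p

# P1 of type i plays atom u^k with probability alpha[k, i] = lam_k p^k[i] / p[i],
# and the public belief then moves from p to post[k].
i = 1
k = rng.choice(I, p=alpha[:, i])
updated_belief = post[k]
```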
Behavioral strategy of P2. For P2, the idea is to reformulate the game so that we can compute the value using P2's behavioral strategies and P1's pure best responses. This can be achieved by introducing the Fenchel conjugate V* of V:

    V*(t0, x0, p̂) := max_p { p · p̂ − V(t0, x0, p) }
                   = inf_ζ sup_η max_{i∈{1,...,I}} { p̂_i − E_{η,ζ} [ gi(X_T^{t0,x0,η,ζ}) + ∫_{t0}^T li(η(s), ζ(s)) ds ] },    (6)

which describes a dual game with complete information, in which P2's goal is to minimize some worst-case dual payoff. It is proved that P2's equilibrium in the dual game starting from some (t0, x0, p̂) is also an equilibrium for the primal game if p̂ ∈ ∂_p V(t0, x0, p) (Cardaliaguet 2007).

P2's strategy can be obtained through the dual game using a procedure similar to that of P1's: We first obtain the backward induction for the dual value:

    Vτ*(T, x, p̂) = max_i { p̂_i − gi(x(T)) };
    Vτ*(t, x, p̂) = Vex_p̂ [ min_{v∈V} max_{u∈U} Vτ*(t + τ, x + τ f(x, u, v), p̂ − τ l(u, v)) ],    (7)

where l = [l1, ..., lI]ᵀ. Then, at any (t, x, p̂), P2 finds λ = [λ_1, . . . , λ_{I+1}] and p̂^k ∈ R^I for k ∈ [I + 1] such that:

    Vτ*(t, x, p̂) = Σ_{k=1}^{I+1} λ_k [ min_{v∈V} max_{u∈U} Vτ*(t + τ, x + τ f(x, u, v), p̂^k − τ l(u, v)) ];  Σ_{k=1}^{I+1} λ_k p̂^k = p̂,    (8)

where again l = [l1, ..., lI]ᵀ. P2's strategy is to compute the minimax solution v^k corresponding to p̂^k and choose v = v^k with probability λ_k.
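The first line of Eq. (6) can be evaluated numerically whenever an approximation of V(t0, x0, ·) is available. The sketch below does this by brute force over a belief grid for I = 2; the toy value function is a placeholder, not the paper's fitted model.

```python
import numpy as np

def fenchel_conjugate(V_of_p, p_hat, belief_grid):
    """Numerically evaluate the first line of Eq. (6):
    V*(t0, x0, p_hat) = max_p  <p, p_hat> - V(t0, x0, p),
    by brute force over a finite grid of beliefs in Delta(I)."""
    values = [float(np.dot(p, p_hat)) - V_of_p(p) for p in belief_grid]
    best = int(np.argmax(values))
    return values[best], belief_grid[best]

# Illustration with I = 2: beliefs p = (q, 1 - q) on a uniform grid, and a toy
# convex-in-p value function standing in for a fitted V(t0, x0, .).
belief_grid = [np.array([q, 1.0 - q]) for q in np.linspace(0.0, 1.0, 101)]
V_toy = lambda p: -4.0 * p[0] * p[1]                 # placeholder only
p_hat = np.array([0.3, -0.1])
v_star, p_star = fenchel_conjugate(V_toy, p_hat, belief_grid)
```

A maximizing belief p⋆ is one at which p̂ ∈ ∂_p V(t0, x0, p⋆), which is exactly the condition under which P2's dual-game equilibrium is also a primal equilibrium.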
Methods

Reformulation of the primal and dual games. To recap, at any (t, x), P1 computes actions u^k and their type-conditioned probabilities α_ki := Pr(u = u^k | i) such that Σ_{k=1}^I α_ki = 1 for i ∈ [I]. Then, λ_k = Σ_{i=1}^I α_ki p[i] and p^k[i] = α_ki p[i] / λ_k are both functions of α_ki. We can now reformulate (5) as follows:

    min_{{u^k},{α_ki}} max_{{v^k}}  Σ_{k=1}^I λ_k ( V(t + τ, x^k, p^k) + τ E_{i∼p^k}[li(u^k, v^k)] )
    s.t.  u^k ∈ U,  v^k ∈ V,  α_ki ∈ [0, 1],  x^k = ODE(x, τ, u^k, v^k; f),
          Σ_{k=1}^I α_ki = 1,  λ_k = Σ_{i=1}^I α_ki p[i],  p^k[i] = α_ki p[i] / λ_k,  ∀ i, k ∈ [I].    (P1)

(P1) is in general a nonconvex-nonconcave minimax problem of size (O(I(I + du)), O(I dv)) that needs to be solved at all sampled infostates (t, x, p) ∈ [0, T] × X × ∆(I). The resultant minimax objective is by definition the convexified value of the primal game.

P2, on the other hand, keeps track of the dual variable p̂ ∈ R^I instead of the public belief p during the dual game, and solves the following problem at all sampled infostates (t, x, p̂):

    min_{{v^k},{λ_k},{p̂^k}} max_{{u^k}}  Σ_{k=1}^{I+1} λ_k V*(t + τ, x^k, p̂^k − τ l(u^k, v^k))
    s.t.  u^k ∈ U,  v^k ∈ V,  x^k = ODE(x, τ, u^k, v^k; f),  λ_k ∈ [0, 1],
          Σ_{k=1}^{I+1} λ_k p̂^k = p̂,  Σ_{k=1}^{I+1} λ_k = 1,  k ∈ [I + 1].    (P2)

(P2) is in general nonconvex-nonconcave of size (O(I(I + dv)), O(I du)).
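The paper solves (P1) and (P2) with DS-GDA (Zheng et al. 2023). Purely as a structural illustration, and not as a reproduction of DS-GDA, the sketch below runs a plain projected simultaneous gradient descent-ascent loop on a toy nonconvex-nonconcave objective; the step size, iteration count, and box constraints are arbitrary.

```python
import numpy as np

def projected_gda(grad, theta, omega, lr, steps, theta_box, omega_box):
    """Plain projected simultaneous gradient descent-ascent.
    theta is the minimizing block (in (P1): the {u^k} and {alpha_ki});
    omega is the maximizing block (in (P1): the {v^k}).
    This loop is only a simplified stand-in for DS-GDA and makes no claim
    about matching its convergence guarantees."""
    for _ in range(steps):
        g_theta, g_omega = grad(theta, omega)
        theta = np.clip(theta - lr * g_theta, *theta_box)   # descent step, then project
        omega = np.clip(omega + lr * g_omega, *omega_box)   # ascent step, then project
    return theta, omega

# Toy objective phi(theta, omega) = sin(theta)cos(omega) + theta*omega/4, used only
# to make the sketch runnable; it is unrelated to the game's payoff.
def grad_phi(theta, omega):
    return (np.cos(theta) * np.cos(omega) + omega / 4.0,
            -np.sin(theta) * np.sin(omega) + theta / 4.0)

theta_star, omega_star = projected_gda(grad_phi, theta=1.0, omega=-1.0,
                                       lr=0.05, steps=500,
                                       theta_box=(-2.0, 2.0), omega_box=(-2.0, 2.0))
```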
Game solver. We propose a continuous-action mixed-strategy (CAMS) solver for 2p0s differential games with one-sided information. Our algorithm performs Bellman backup through (P1) (resp. (P2)) starting from the terminal condition in (4) (resp. (8)), at discretized time stamps t ∈ {T, T − τ, ..., 0} and (x, p) (resp. (x, p̂)) uniformly sampled in X × ∆(I) (resp. X × R^I). Specifically, at any t, with a value approximation model V̂_{t+1} : X × ∆(I) → R, we solve (P1) using DS-GDA at N collocation points (x, p) ∈ X × ∆(I) and collect a dataset D_t := {(x^(i), p^(i), Ṽ^(i))}_{i=1}^N, where Ṽ^(i) is the numerical approximation of the convexified value at (t, x^(i), p^(i)) for the minimax problem. Then we fit a model V̂_t(x, p) to D_t and go to t − τ. Alg. 1 summarizes the solver for the primal game. The dual game solver is defined similarly.

Algorithm 1: Continuous Action Mixed Strategy Solver (CAMS)
Require: τ, V(T, ·, ·), N, minimax solver O
1: Initialize {V̂_t}_{t=0}^{T−τ}, D ← ∅
2: S ← sample N states (x, p) ∈ X × ∆(I)
3: for t ∈ {T − τ, . . . , 0} do
4:   for (x, p) ∈ S do
5:     Append {(t, x, p), O(t, x, p)} to D
6:   end for
7:   Fit V̂_t to D
8: end for
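A minimal sketch of the outer loop of Alg. 1 for the primal game, assuming hypothetical callables solve_P1 (the minimax oracle O, e.g., DS-GDA applied to (P1)), sample_states, and fit (any regression routine for V̂_t); these are placeholders rather than the paper's implementation. The dual-game solver would follow the same loop with (P2) and dual states (x, p̂).

```python
import numpy as np

def cams_primal(T, tau, terminal_value, sample_states, solve_P1, fit, N=1024):
    """Sketch of the outer loop of Alg. 1 (primal game).
    terminal_value(x, p)      -> V(T, x, p) = sum_i p[i] * g_i(x)
    sample_states(N)          -> N collocation points (x, p) in X x Delta(I)
    solve_P1(t, x, p, V_next) -> numerical convexified value at (t, x, p),
                                 i.e. the minimax oracle O (e.g. DS-GDA on (P1))
    fit(dataset)              -> value model V_hat_t(x, p) fitted to D_t
    N is an arbitrary collocation count."""
    S = sample_states(N)                              # line 2 of Alg. 1
    V_next, V_hat = terminal_value, {}
    for t in np.arange(T - tau, -tau / 2.0, -tau):    # t in {T - tau, ..., 0}
        D_t = [(x, p, solve_P1(t, x, p, V_next)) for (x, p) in S]
        V_hat[round(float(t), 6)] = fit(D_t)          # fit V_hat_t to D_t
        V_next = V_hat[round(float(t), 6)]
    return V_hat
```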
Empirical Validation

We introduce Hexner's homing game (Hexner 1979) that has an analytical Nash equilibrium. We use variants of this game to compare CAMS with baselines (MMD, CFR+, and DeepCFR) on solution quality and computational cost. As shown in Fig. 2, it is a two-player game in which P1's goal is to get closer to the target Θ unknown to P2, while keeping P2 away and minimizing running costs. The cost to P1 is the expected value of the total cost:

    J = ∫_0^T ( uᵀ R1 u − vᵀ R2 v ) dt + [x1(T) − Θ]ᵀ K1 [x1(T) − Θ] − [x2(T) − Θ]ᵀ K2 [x2(T) − Θ],    (9)

where R1, R2 ≻ 0 and K1, K2 ⪰ 0 are control- and state-penalty matrices, respectively. Due to the quadratic cost and decoupled dynamics, this game can be solved analytically as done in Hexner (1979).
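A minimal sketch of evaluating the cost in Eq. (9) on a time-discretized trajectory; the matrices, horizon, and trajectories below are arbitrary placeholders rather than the parameters used in the experiments.

```python
import numpy as np

def hexner_cost(u_traj, v_traj, x1_T, x2_T, theta, R1, R2, K1, K2, tau):
    """Discretized evaluation of Eq. (9):
    J = int_0^T (u'R1 u - v'R2 v) dt + (x1(T)-theta)'K1(x1(T)-theta)
                                     - (x2(T)-theta)'K2(x2(T)-theta)."""
    running = sum(u @ R1 @ u - v @ R2 @ v for u, v in zip(u_traj, v_traj)) * tau
    e1, e2 = x1_T - theta, x2_T - theta
    return running + e1 @ K1 @ e1 - e2 @ K2 @ e2

# Placeholder data for illustration: 2D controls/positions, identity weights.
tau, steps = 0.25, 4
u_traj = [np.array([0.2, 0.0])] * steps
v_traj = [np.array([0.0, 0.1])] * steps
J = hexner_cost(u_traj, v_traj, x1_T=np.array([0.9, 0.0]), x2_T=np.array([-0.5, 0.2]),
                theta=np.array([1.0, 0.0]), R1=np.eye(2), R2=np.eye(2),
                K1=np.eye(2), K2=np.eye(2), tau=tau)
```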


Comparison on 1- and 4-stage Hexner's Games. We first use a normal-form Hexner's game with τ = T and a fixed P1 initial state x0 to demonstrate that IIEFG algorithms suffer from increasing costs along |A| while CAMS does not. We consider CFR+ (Tammelin 2014), MMD (Sokota et al. 2022), and a modified CFR-BR (Johanson et al. 2012) (dubbed CFR-BR-Primal, where we only focus on solving P1's optimal strategy) as baselines. Each player's state consists of 2D position and velocity. For baselines, we discretize the action sets A1 and A2 with sizes {16, 36, 64, 144}. All algorithms terminate when a threshold of NashConv (see Lanctot et al. (2017) for the definition) is met. For conciseness, we only consider solving P1's strategy and thus use P1's δ in NashConv. We then use DeepCFR as a baseline for a Hexner's game with 4 time-steps, where T = 1 and τ = 0.25. DeepCFRs were run for 1000 CFR iterations (resp. 100) with 10 (resp. 5) traversals for |A| = 9 (resp. 16). We compare the computational cost and the expected action error ε (and the average action error at each time-step, ε̄t, for the 4-stage game) from the ground-truth action of P1. Fig. 3 summarizes the comparisons. For the normal-form game, all baselines have complexities increasing with A, while CAMS is invariant. In the 4-stage game, CAMS achieves significantly better strategies than DeepCFR, as visualized in Fig. 4.

Figure 2: Hexner's game with a sample equilibrium trajectory. P1 starts to move to its target after tr.

Figure 3: Comparisons b/w CAMS and baseline algorithms: (a) wall time vs. |A|, (b) algorithm iterations vs. |A|, (c) action error ε vs. |A|, and (d) per-time-step action error ε̄t for the 4-stage game.

Figure 4: Trajectories using strategies from CAMS and DeepCFR (|A| = 9 and |A| = 16). Markers indicate initial position.

Conclusion

This work highlights the need for a scalable algorithm for solving incomplete-information differential games, which are structurally similar to imperfect-information games such as poker. We demonstrated that SOTA IIEFG solvers are intractable when it comes to solving differential games. To the authors' best knowledge, this is the first method to provide a tractable solution for incomplete-information differential games with continuous action spaces without problem-specific abstraction and discretization.
Acknowledgment

This work is partially supported by NSF CNS 2304863, CNS 2339774, IIS 2332476, and ONR N00014-23-1-2505.

References

Abernethy, J.; Bartlett, P. L.; and Hazan, E. 2011. Blackwell approachability and no-regret learning are equivalent. In Proceedings of the 24th Annual Conference on Learning Theory, 27–46. JMLR Workshop and Conference Proceedings.

Aumann, R. J.; Maschler, M.; and Stearns, R. E. 1995. Repeated games with incomplete information. MIT Press.

Billings, D.; Burch, N.; Davidson, A.; Holte, R.; Schaeffer, J.; Schauenberg, T.; and Szafron, D. 2003. Approximating game-theoretic optimal strategies for full-scale poker. In IJCAI, volume 3, 661.

Blackwell, D. 1956. An analog of the minimax theorem for vector payoffs.

Brown, N.; Bakhtin, A.; Lerer, A.; and Gong, Q. 2020a. Combining deep reinforcement learning and search for imperfect-information games. Advances in Neural Information Processing Systems, 33: 17057–17069.

Brown, N.; Bakhtin, A.; Lerer, A.; and Gong, Q. 2020b. Combining deep reinforcement learning and search for imperfect-information games. Advances in Neural Information Processing Systems, 33: 17057–17069.

Brown, N.; Lerer, A.; Gross, S.; and Sandholm, T. 2019. Deep counterfactual regret minimization. In International Conference on Machine Learning, 793–802. PMLR.

Brown, N.; and Sandholm, T. 2019. Superhuman AI for multiplayer poker. Science, 365(6456): 885–890.

Burch, N.; Johanson, M.; and Bowling, M. 2014. Solving imperfect information games using decomposition. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 28.

Cardaliaguet, P. 2007. Differential games with asymmetric information. SIAM Journal on Control and Optimization, 46(3): 816–838.

Cardaliaguet, P. 2009. Numerical approximation and optimal strategies for differential games with lack of information on one side. Advances in Dynamic Games and Their Applications: Analytical and Numerical Developments, 1–18.

Cen, S.; Wei, Y.; and Chi, Y. 2021. Fast policy extragradient methods for competitive games with entropy regularization. Advances in Neural Information Processing Systems, 34: 27952–27964.

De Meyer, B. 1996. Repeated games, duality and the central limit theorem. Mathematics of Operations Research, 21(1): 237–251.

FAIR†, M. F. A. R. D. T.; Bakhtin, A.; Brown, N.; Dinan, E.; Farina, G.; Flaherty, C.; Fried, D.; Goff, A.; Gray, J.; Hu, H.; et al. 2022. Human-level play in the game of Diplomacy by combining language models with strategic reasoning. Science, 378(6624): 1067–1074.

Ghimire, M.; Zhang, L.; Xu, Z.; and Ren, Y. 2024. State-Constrained Zero-Sum Differential Games with One-Sided Information. In Salakhutdinov, R.; Kolter, Z.; Heller, K.; Weller, A.; Oliver, N.; Scarlett, J.; and Berkenkamp, F., eds., Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, 15512–15539. PMLR.

Gilpin, A.; Hoda, S.; Pena, J.; and Sandholm, T. 2007. Gradient-based algorithms for finding Nash equilibria in extensive form games. In Internet and Network Economics: Third International Workshop, WINE 2007, San Diego, CA, USA, December 12-14, 2007. Proceedings 3, 57–69. Springer.

Gilpin, A.; and Sandholm, T. 2006. Finding equilibria in large sequential games of imperfect information. In Proceedings of the 7th ACM Conference on Electronic Commerce, 160–169.

Harsanyi, J. C. 1967. Games with incomplete information played by "Bayesian" players, I–III Part I. The basic model. Management Science, 14(3): 159–182.

Hexner, G. 1979. A differential game of incomplete information. Journal of Optimization Theory and Applications, 28: 213–232.

Johanson, M.; Bard, N.; Burch, N.; and Bowling, M. 2012. Finding optimal abstract strategies in extensive-form games. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 26, 1371–1379.

Koller, D.; and Megiddo, N. 1992. The complexity of two-person zero-sum games in extensive form. Games and Economic Behavior, 4(4): 528–552.

Lanctot, M.; Waugh, K.; Zinkevich, M.; and Bowling, M. 2009. Monte Carlo sampling for regret minimization in extensive games. Advances in Neural Information Processing Systems, 22.

Lanctot, M.; Zambaldi, V.; Gruslys, A.; Lazaridou, A.; Tuyls, K.; Pérolat, J.; Silver, D.; and Graepel, T. 2017. A unified game-theoretic approach to multiagent reinforcement learning. Advances in Neural Information Processing Systems, 30.

McMahan, B. 2011. Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization. In Gordon, G.; Dunson, D.; and Dudík, M., eds., Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, volume 15 of Proceedings of Machine Learning Research, 525–533. Fort Lauderdale, FL, USA: PMLR.

Moravčík, M.; Schmid, M.; Burch, N.; Lisý, V.; Morrill, D.; Bard, N.; Davis, T.; Waugh, K.; Johanson, M.; and Bowling, M. 2017. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science, 356(6337): 508–513.

Perolat, J.; De Vylder, B.; Hennes, D.; Tarassov, E.; Strub, F.; de Boer, V.; Muller, P.; Connor, J. T.; Burch, N.; Anthony, T.; et al. 2022. Mastering the game of Stratego with model-free multiagent reinforcement learning. Science, 378(6623): 990–996.

Perolat, J.; Munos, R.; Lespiau, J.-B.; Omidshafiei, S.; Rowland, M.; Ortega, P.; Burch, N.; Anthony, T.; Balduzzi, D.; De Vylder, B.; et al. 2021. From Poincaré recurrence to convergence in imperfect information games: Finding equilibrium via regularization. In International Conference on Machine Learning, 8525–8535. PMLR.

Sandholm, T. 2010. The state of solving large incomplete-information games, and application to poker. AI Magazine, 31(4): 13–32.

Schmid, M.; Moravčík, M.; Burch, N.; Kadlec, R.; Davidson, J.; Waugh, K.; Bard, N.; Timbers, F.; Lanctot, M.; Holland, G. Z.; et al. 2023. Student of Games: A unified learning algorithm for both perfect and imperfect information games. Science Advances, 9(46): eadg3256.

Silver, D.; Hubert, T.; Schrittwieser, J.; Antonoglou, I.; Lai, M.; Guez, A.; Lanctot, M.; Sifre, L.; Kumaran, D.; Graepel, T.; et al. 2017a. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv preprint arXiv:1712.01815.

Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; et al. 2017b. Mastering the game of Go without human knowledge. Nature, 550(7676): 354–359.

Sokota, S.; D'Orazio, R.; Kolter, J. Z.; Loizou, N.; Lanctot, M.; Mitliagkas, I.; Brown, N.; and Kroer, C. 2022. A unified approach to reinforcement learning, quantal response equilibria, and two-player zero-sum games. arXiv preprint arXiv:2206.05825.

Tammelin, O. 2014. Solving large imperfect information games using CFR+. arXiv preprint arXiv:1407.5042.

Vieillard, N.; Kozuno, T.; Scherrer, B.; Pietquin, O.; Munos, R.; and Geist, M. 2020. Leverage the average: an analysis of KL regularization in reinforcement learning. Advances in Neural Information Processing Systems, 33: 12163–12174.

Wang, Z.; Veličković, P.; Hennes, D.; Tomašev, N.; Prince, L.; Kaisers, M.; Bachrach, Y.; Elie, R.; Wenliang, L. K.; Piccinini, F.; et al. 2024. TacticAI: an AI assistant for football tactics. Nature Communications, 15(1): 1906.

Zheng, T.; Zhu, L.; So, A. M.-C.; Blanchet, J.; and Li, J. 2023. Universal gradient descent ascent method for nonconvex-nonconcave minimax optimization. Advances in Neural Information Processing Systems, 36: 54075–54110.

Zinkevich, M.; Johanson, M.; Bowling, M.; and Piccione, C. 2007. Regret minimization in games with incomplete information. Advances in Neural Information Processing Systems, 20.
