Randomness For Free
Abstract
We consider two-player zero-sum games on finite-state graphs. These games can be
classified on the basis of the information of the players and on the mode of interaction
between them. On the basis of information the classification is as follows: (a) partial-
observation (both players have partial view of the game); (b) one-sided complete-
observation (one player has complete observation); and (c) complete-observation (both
players have complete view of the game). On the basis of mode of interaction we
have the following classification: (a) concurrent (players interact simultaneously); and
(b) turn-based (players interact in turn). The two sources of randomness in these games
are randomness in the transition function and randomness in the strategies. In general,
randomized strategies are more powerful than deterministic strategies, and probabilistic
transitions give more general classes of games. We present a complete characterization of the classes of games where randomness is not helpful in: (a) the transition function (probabilistic transitions can be simulated by deterministic transitions); and (b) strategies (pure strategies are as powerful as randomized strategies). As a consequence of
our characterization we obtain new undecidability results for these games.
1. Introduction
✩ A preliminary version of this paper appeared in the Proceedings of the 35th International Symposium
on Mathematical Foundations of Computer Science (MFCS), Lecture Notes in Computer Science 6281,
Springer, 2010, pp. 246-257.
✩✩ This research was partly supported by Austrian Science Fund (FWF) Grant No P23499-N23, FWF NFN
Grant No S11407-N23 and S11402-N23 (RiSE), ERC Start grant (279307: Graph Games), Microsoft faculty
fellows award, the ERC Advanced Grant QUAREM (267989: Quantitative Reactive Modeling), European
project Cassting (FP7-601148), European project COMBEST, and the European Network of Excellence
ArtistDesign.
∗ Corresponding author: Laurent Doyen, LSV, CNRS UMR 8643 & ENS Cachan, 61 avenue du Président
randomness in transition function may be essential for modeling appropriate stochastic
reactive systems, but the analysis can focus on the deterministic subclass); (b) if for a
class of games it can be shown that randomness is for free in strategies, then all future
works related to correctness results can focus on the simpler class of pure strategies,
and the results would follow for the more general class of randomized strategies; and
(c) the characterization of randomness for free will allow hardness results obtained
for the more general class of games (such as games with randomness in the transition
function) to be carried over to simpler class of games (such as games with deterministic
transitions).
Contribution. The contributions of this paper are as follows:
1. Randomness for free in the transition function. We show that randomness in the
transition function can be obtained for free for complete-observation concurrent
games (and any class that subsumes complete-observation concurrent games)
and for one-sided complete-observation turn-based games (and any class that
subsumes this class). The reduction is polynomial for complete-observation con-
current games, and exponential for one-sided complete-observation turn-based
games. It is known that for complete-observation turn-based games, a probabilis-
tic transition function cannot be simulated by a deterministic transition function
(see discussion in Section 3.4 for details), and thus we present a complete char-
acterization when randomness can be obtained for free in the transition function.
2. Randomness for free in the strategies. We show that randomness in strate-
gies is free for complete-observation turn-based games, and for 1-player partial-
observation games (POMDPs). For all other classes of games randomized strate-
gies are more powerful than pure strategies. It follows from a result of Mar-
tin [20] that for 1-player complete-observation games with probabilistic tran-
sitions (MDPs) pure strategies are as powerful as randomized strategies. We
present a generalization of this result to the case of POMDPs. Our proof is entirely different from Martin's proof and is based on a new technique for derandomizing randomized strategies.
3. Concurrency for free in games. We show that concurrency is obtained for free
with partial-observation, both for one-sided complete-observation games as well
as for general partial-observation games (see Section 3.5). It follows that for
partial-observation games, future research can focus on the simpler model of
turn-based games, and concurrency does not add anything in the presence of
partial observation.
4. New undecidability results. As a consequence of our characterization of random-
ness for free, we obtain new undecidability results. In particular, using our results and results of Baier et al. [2], we show that for one-sided complete-observation deterministic games, the problems of almost-sure winning for coBüchi objectives and positive winning for Büchi objectives are undecidable. Thus we obtain
the first undecidability result for qualitative analysis (almost-sure and positive
winning) of one-sided complete-observation deterministic games with ω-regular
objectives.
Applications of our results. While we already show that our results allow us to obtain
new undecidability results, they have also been used to simplify proofs and analysis
of POMDPs and partial-observation games [6, 7, 8, 9, 16] (e.g. [7, Lemma 21] and [9,
Claim 2, Lemma 5.1]) as well as extended to other settings such as probabilistic au-
tomata [17].
2. Definitions
We sometimes relax the assumption that games have a finite state space, and we
allow the set S of states to be countable. This is useful in the context of game solving,
where we get a countable state space after fixing an arbitrary strategy for one of the
players in a game. In our results we explicitly mention when we consider countable
state space and when we consider finite state space.
Special cases. We consider the following special cases of partial-observation concur-
rent games, obtained either by restrictions in the observations, the mode of selection of
moves, the type of transition function, or the number of players:
• (Observation restriction). The games with one-sided complete-observation are
the special case of games where O1 = {{s} | s ∈ S} (i.e., player 1 has complete
observation) or O2 = {{s} | s ∈ S} (player 2 has complete observation). The
games of complete-observation are the special case of games where O1 = O2 = {{s} | s ∈ S}, i.e., every state is visible to each player and hence both players
have complete observation. If a player has complete observation we omit the
corresponding observation sets from the description of the game.
• (Mode of interaction restriction). A turn-based state is a state s such that either
(i) δ(s, a, b) = δ(s, a, b′ ) for all a ∈ A1 and all b, b′ ∈ A2 (i.e., the action of
player 1 determines the transition function and hence it can be interpreted as
player 1’s turn to play), we refer to s as a player-1 state, and we use the notation
δ(s, a, −); or (ii) δ(s, a, b) = δ(s, a′ , b) for all a, a′ ∈ A1 and all b ∈ A2 , we
refer to s as a player-2 state, and we use the notation δ(s, −, b). A state s which
is both a player-1 state and a player-2 state is called a probabilistic state (i.e.,
the transition function is independent of the actions of the players). We write
δ(s, −, −) to denote the transition function in s. The turn-based games are the
special case of games where all states are turn-based.
• (Transition function restriction). The deterministic games are the special case
of games where for all states s ∈ S and actions a ∈ A1 and b ∈ A2 , there
exists a state s′ ∈ S such that δ(s, a, b)(s′ ) = 1. We refer to such states s as
deterministic states. For deterministic games, it is often convenient to assume
that δ : S × A1 × A2 → S.
• (Player restriction). The 1 1/2-player games, also called partially observable
Markov decision processes (or POMDPs), are the special case of games where
the action set A1 or A2 is a singleton. Note that 1 1/2-player games are turn-based.
Games without player restriction are sometimes called 2 1/2-player games.
The 1 1/2-player games of complete-observation are Markov decision processes (or
MDPs), and MDPs with all states deterministic can be viewed as graphs (and are often
called 1-player games).
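The classification of states given above (player-1, player-2, probabilistic, or genuinely concurrent) can be sketched programmatically. The following is a minimal sketch, not from the paper: the dictionary encoding of δ and all function names are our own illustrative assumptions.

```python
from itertools import product

def classify_state(delta, s, A1, A2):
    """Classify s per the definitions above. Hypothetical encoding:
    delta[(s, a, b)] is a dict mapping successor states to probabilities."""
    # (i) player-1 state: the transition does not depend on player 2's action.
    p1 = all(delta[(s, a, b)] == delta[(s, a, bp)]
             for a in A1 for b, bp in product(A2, repeat=2))
    # (ii) player-2 state: the transition does not depend on player 1's action.
    p2 = all(delta[(s, a, b)] == delta[(s, ap, b)]
             for b in A2 for a, ap in product(A1, repeat=2))
    if p1 and p2:
        return "probabilistic"  # independent of both players' actions
    if p1:
        return "player-1"
    if p2:
        return "player-2"
    return "concurrent"  # s is not a turn-based state

# A game is turn-based iff no state is classified "concurrent";
# it is a POMDP iff A1 or A2 is a singleton.
```

A game can then be checked to be turn-based by classifying every state and verifying that none is "concurrent".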
Classes of game graphs. We use the following abbreviations (Table 1a): we write
Pa for partial-observation, Os for one-sided complete-observation, Co for complete-
observation, C for concurrent, and T for turn-based. For example, CoC will denote
complete-observation concurrent games, and OsT will denote one-sided complete-
observation turn-based games. For C ∈ {Pa, Os, Co} × {C, T}, we denote by GC
the set of all C games. Note the following strict inclusions (see also Figure 2): partial
observation (Pa) is more general than one-sided complete-observation (Os) and Os is
more general than complete-observation (Co), and concurrent (C) is more general than
turn-based (T). We will denote by GD the set of all games with deterministic transition
function. The results we establish in this article are summarized in Figure 3.
Plays. In concurrent games of partial observation, in each turn, player 1 chooses
an action a ∈ A1 , player 2 chooses an action b ∈ A2 , and the successor of the
current state s is chosen according to the probabilistic transition function δ(s, a, b).
A play in a game G is an infinite sequence ρ = s0 a0 b0 s1 a1 b1 s2 . . . such that
δ(si , ai , bi )(si+1 ) > 0 for all i ≥ 0. The prefix up to sn of the play ρ is denoted by
ρ(n). The set of plays in G is denoted Plays(G), and the set of corresponding fi-
nite prefixes (or histories) is denoted Prefs(G). The observation sequence of ρ for
(a) Classes of games:
  Pa  partial observation
  Os  one-sided complete observation
  Co  complete observation
  C   concurrent
  T   turn-based
  D   deterministic transition function

(b) Classes of strategies in game G:
  Σ_G    all player-1 strategies
  Σ_G^O  observation-based player-1 strategies
  Σ_G^P  pure player-1 strategies
  Π_G    all player-2 strategies
  Π_G^O  observation-based player-2 strategies
  Π_G^P  pure player-2 strategies

Table 1: Abbreviations.
[Figure 1: A game with one-sided complete observation, with states s1, s2, s′2, s3, s′3, s4, player-1 observations o1, o2, o3, o4, and transitions labeled by actions (a1, −), (a2, −), (−, b1), (−, b2), (−, −).]
⟨⟨1⟩⟩_val^G(ϕ)(s) = sup_{σ ∈ Σ_G^O} inf_{π ∈ Π_G^O} Pr_s^{σ,π}(ϕ).
Example 1 ([10]). Consider the game with one-sided complete observation (player 2
has complete information) shown in Figure 1. Consider the Büchi objective de-
fined by the state s4 (i.e., state s4 has priority 0 and other states have prior-
ity 1). Because player 1 has partial observation (given by the partition O1 =
{{s1 }, {s2 , s′2 }, {s3 , s′3 }, {s4 }}), she cannot distinguish between s2 and s′2 and there-
fore has to play the same actions with the same probabilities in s2 and s′2 (while it would be
easy to win by playing a2 in s2 and a1 in s′2 , this is not possible). In fact, player 1 can-
not win using a pure observation-based strategy. However, playing a1 and a2 uniformly
at random in all states is almost-sure winning. Every time the game visits observation
o2 , for any strategy of player 2, the game visits s3 and s′3 with probability 1/2 each, and hence also reaches s4 with probability 1/2. It follows that against all player-2 strategies the
play eventually reaches s4 with probability 1, and then stays there.
Theorem 1 ([20]). Let G be a CoT stochastic game (with countable state space S) with initial state s and an objective ϕ ⊆ S^ω. Then the following equalities hold:

⟨⟨1⟩⟩_val^G(ϕ)(s) = ⟨⟨2⟩⟩_val^G(ϕ)(s) = sup_{σ ∈ Σ_G^O ∩ Σ_G^P} inf_{π ∈ Π_G^O} Pr_s^{σ,π}(ϕ).
[Figure 3: Overview of the results established in this article: PaC ≡ PaT and OsC ≡ OsT (concurrency for free, Th. 6), and the reductions of Th. 4 and Th. 5 eliminating randomness from the transition function (randomness for free).]
turn-based) games, randomness in the transition function cannot be obtained for free,
and conclude with the concurrency for free result that OsT and PaT games can simu-
late OsC and PaC games respectively.
A reduction from a class G of games to a class G ′ is a mapping that, from a game
G ∈ G and an objective ϕ in G, returns a game G′ ∈ G ′ and an objective ϕ′ in G′ , and
such that the state space S of G is (injectively) mapped to the state space S ′ of G′ . In
all our reductions we have S ⊆ S ′ , and thus the state-space mapping is the identity (on
S). The mapping of objectives in our reductions is such that ϕ is the projection of ϕ′
on S ω . It follows that when ϕ is a parity objective defined with at most d priorities,
then so is ϕ′ (and in the sequel, we omit the definition of the priority function for ϕ′ ),
and when ϕ is an objective in the k-th level of the Borel hierarchy, then so is ϕ′ .
All our reductions are local: they consist of a gadget construction and replacement
locally at every state. Additional properties of interest for reductions are as follows:
• A reduction is almost-sure-preserving (resp., positive-preserving), if for all states
s ∈ S in G: player 1 is almost-sure winning (resp., positive winning) in G from s
if and only if player 1 is almost-sure winning (resp., positive winning) in G′
from s.
• A reduction is value-preserving if ⟨⟨1⟩⟩_val^G(ϕ)(s) = ⟨⟨1⟩⟩_val^{G′}(ϕ′)(s) for all s ∈ S, and threshold-preserving if for all η ∈ R, all states s ∈ S, and all ⊲⊳ ∈ {>, ≥}: there exists an observation-based strategy σ ∈ Σ_G^O for player 1 in G such that ∀π ∈ Π_G^O : Pr_s^{σ,π}(ϕ) ⊲⊳ η if and only if there exists an observation-based strategy σ′ ∈ Σ_{G′}^O for player 1 in G′ such that ∀π′ ∈ Π_{G′}^O : Pr_s^{σ′,π′}(ϕ′) ⊲⊳ η.
All reductions presented in this paper are threshold-preserving. Note that threshold-
preserving implies value-preserving, almost-sure-preserving (⊲⊳ = ≥, η = 1), and
positive-preserving (⊲⊳ = >, η = 0).
A reduction is restriction-preserving if when G is one-sided complete-observation,
then so is G′ , when G is complete-observation, then so is G′ , and when G is turn-
based, then so is G′ . We say that a reduction is computable in polynomial time (resp.,
in exponential time) if the game G′ can be constructed in polynomial time (resp., in
exponential time) from G (assuming a reasonable encoding of games, such as explicit
lists of binary-encoded states, observations, actions, and transitions, and rational prob-
abilities encoded in binary).
An overview of the class of games for which randomness is for free in the transition
function (which we establish in this section) is given in Figure 3.
[Figure 4: Example of interaction separation for δ(s, a1, b1)(s1) = 1/3 and δ(s, a1, b1)(s2) = 2/3.]
[Figure 5: An example showing why the uniform-binary reduction cannot be used with partial observation.]
Theorem 3. There exists a reduction from the class of PaC games to the class of
uniform-n-ary PaC games (where 1/n is the greatest common divisor of all proba-
bilities in the original game) such that this reduction is
1. threshold-preserving,
2. restriction-preserving, and
3. computable in exponential time (and in polynomial time for CoC games [28]).
Note that the above reduction is worst-case exponential because the inverse of the greatest common divisor of the transition probabilities can be exponential. This is necessary
to have the property that all probabilistic states in the game have the same number
of successors. This property is crucial because it determines the number of actions
available to player 1 in the reductions presented in Section 3.2 and 3.3, and the number
of available actions should not differ in states that have the same observation.
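For rational probabilities, the arity n (with 1/n the greatest common divisor of all probabilities) can be computed explicitly. The following sketch is our own illustration, not from the paper; it uses the identity gcd(a1/b1, . . . , ak/bk) = gcd(a1, . . . , ak)/lcm(b1, . . . , bk), valid for fractions in lowest terms.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm  # math.lcm requires Python >= 3.9

def uniform_arity(probs):
    """n such that 1/n is the gcd of the rational probabilities in probs."""
    g = Fraction(reduce(gcd, (p.numerator for p in probs)),
                 reduce(lcm, (p.denominator for p in probs)))
    return int(1 / g)

def uniformize(successors):
    """Turn a distribution {state: probability} into an n-tuple of equally
    likely successors (each state repeated p * n times), as required by the
    uniform-n-ary reduction."""
    n = uniform_arity(list(successors.values()))
    return [s for s, p in successors.items() for _ in range(int(p * n))]

# The example of Figure 7: probabilities 1/3 and 2/3 give n = 3 and the
# 3-tuple <s'0, s'1, s'1>.
assert uniformize({"s'0": Fraction(1, 3), "s'1": Fraction(2, 3)}) == ["s'0", "s'1", "s'1"]
```

The worst-case exponential blow-up discussed above shows up here as the least common multiple of the denominators.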
[Figure 6: Simulation of a uniform-binary probabilistic state s∗ with successors s1 and s2 (probability 1/2 each) by a concurrent deterministic state where (a1, b1) and (a2, b2) lead to s1, and (a1, b2) and (a2, b1) lead to s2.]
Theorem 4. Let a ∈ {Pa, Os, Co} and let C = aC. There exists a reduction from the class of games GC to the class of games GC ∩ GD (thus with deterministic transition function) such that this reduction is
1. threshold-preserving, and
2. computable in polynomial time if a = Co, and in exponential time if a = Pa or
a = Os.
Proof. To prove the result we show that a uniform-n-ary probabilistic state can be
simulated by a CoC deterministic gadget. For simplicity we present the details for the
case when n = 2, and the gadget for the general case is presented later. Our reduction
is as follows: we consider a uniform-binary CoC game such that there is only one
probabilistic state, and reduce it to a CoC deterministic game. For uniform-binary
CoC games with multiple probabilistic states the reduction can be applied to each
state one at a time and we would obtain the desired reduction from uniform-binary
CoC games to CoC deterministic games. It is easy to see that the reduction can be
computed in polynomial time from uniform-n-ary games. The complexity result (item
(2) of the theorem) then follows from Theorem 2 and Theorem 3.
The reduction is illustrated in Figure 6 and is defined as follows. Consider a
uniform-binary CoC game G with a single probabilistic state s∗ with two successors
s1 and s2 . Construct the CoC deterministic game G′ obtained from G by transform-
ing the state s∗ to a concurrent deterministic state as follows: the actions available
for player 1 at s∗ are a1 and a2 , and the actions available for player 2 at s∗ are b1
and b2 ; the transition function is as follows: δ(s∗ , a1 , b1 ) = δ(s∗ , a2 , b2 ) = s1 and
δ(s∗ , a1 , b2 ) = δ(s∗ , a2 , b1 ) = s2 . Note that the state space of G′ is the same as in
G, thus ϕ′ = ϕ. Then for all objectives ϕ, we show that the reduction is threshold-
preserving as follows.
and ⊲⊳ ∈ {>, ≥}, and consider the strategy σ for player 1 in G that plays like σ′ for all histories in G. Assume towards contradiction that against σ there exists a strategy π ∈ Π_G^O such that ¬(Pr_s^{σ,π}(ϕ) ⊲⊳ η). Then consider the strategy π′ in G′ that copies the strategy π for all histories other than when the current state is s∗, and if the current state is s∗, then the strategy π′ plays the actions b1 and b2 uniformly with probability 1/2. Given the strategy π′ in G′, if the current state is s∗, then for any probability distribution over player 1's actions a1 and a2, the successor states are s1 and s2 with probability 1/2 each (i.e., it plays exactly the role of state s∗ in G). It follows that Pr_s^{σ′,π′}(ϕ) = Pr_s^{σ,π}(ϕ) and thus ¬(Pr_s^{σ′,π′}(ϕ) ⊲⊳ η), in contradiction with the assumption on σ′. Therefore, such a strategy π cannot exist, and we have Pr_s^{σ,π}(ϕ) ⊲⊳ η for all π ∈ Π_G^O, which concludes the proof that the reduction is threshold-preserving.
Gadget for uniform-n-ary probability reduction. We now show how to simulate a prob-
abilistic state s∗ , with n successors s0 , s1 , . . . , sn−1 such that the transition probability
is 1/n to each of the successors, by a concurrent deterministic state. In the concurrent
deterministic state s∗ there are n actions a0 , a1 , . . . , an−1 available for player 1 and n
actions b0 , b1 , . . . , bn−1 available for player 2. The transition function is as follows:
for 0 ≤ i < n and 0 ≤ j < n we have δ(s∗ , ai , bj ) = s(i+j) mod n . Intuitively,
the transition function matrix is obtained as follows: the first row is filled with states
s0 , s1 , . . . , sn−1 , and from a row i, the row i + 1 is obtained by moving the state of
the first column of row i to the last column in row i + 1 and left-shifting all the other states by one position; the construction is illustrated by an example with n = 4
successors in (1). The construction ensures that in every row and every column each
state s0 , s1 , . . . , sn−1 appears exactly once. It follows that if player 1 plays all actions
uniformly at random, then against any probability distribution of player 2 the succes-
sor states are s0 , s1 , . . . , sn−1 with probability 1/n each; and a similar result holds if
player 2 plays all actions uniformly at random. The correctness of the reduction for
uniform-n-ary probabilistic state is then exactly as for the case of n = 2.
    s0 s1 s2 s3
    s1 s2 s3 s0
    s2 s3 s0 s1
    s3 s0 s1 s2        (1)
The desired result follows.
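The defining property of the circulant matrix (1) can be checked mechanically. The sketch below is our own encoding, not from the paper: it builds the table δ(s∗, ai, bj) = s_{(i+j) mod n} and verifies that uniform play by either player makes the successor distribution uniform against every distribution of the other player.

```python
from fractions import Fraction

def gadget(n):
    """Transition table: delta(s*, a_i, b_j) = s_{(i+j) mod n}."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]

def successor_distribution(n, p1, p2):
    """Distribution over s_0..s_{n-1} when player 1 plays a_i with
    probability p1[i] and player 2 plays b_j with probability p2[j]."""
    dist = [Fraction(0)] * n
    for i in range(n):
        for j in range(n):
            dist[(i + j) % n] += p1[i] * p2[j]
    return dist

# The table for n = 4 is exactly the matrix (1).
assert gadget(4) == [[0, 1, 2, 3], [1, 2, 3, 0], [2, 3, 0, 1], [3, 0, 1, 2]]

# Each state occurs once per row and once per column, so uniform play by
# either player makes the successors uniform, whatever the opponent does.
n = 4
uniform = [Fraction(1, n)] * n
biased = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
assert successor_distribution(n, uniform, biased) == uniform
assert successor_distribution(n, biased, uniform) == uniform
```

The two assertions at the end are exactly the two symmetric claims made in the proof above.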
player can unilaterally decide to simulate the probabilistic state, the value and proper-
ties of strategies of the game are preserved.
Theorem 5. Let a ∈ {Pa, Os, Co} and b ∈ {C, T}, and let a′ = Os if a = Co, and
a′ = a otherwise. Let C = ab and C ′ = a′ b. There exists a reduction from the class of
games GC to the class of games GC ′ ∩ GD (thus with deterministic transition function)
such that this reduction is
1. threshold-preserving, and
2. computable in polynomial time if a = Co, and in exponential time if a = Pa or
a = Os.
Proof. First, we present the proof for a ≠ Pa, assuming that player 2 has complete observation. A similar construction where player 1 instead of player 2 has complete observation is obtained symmetrically. Let G = ⟨SA ∪ SP, A1, A2, δ, O1⟩ and assume
w.l.o.g. (according to Theorem 2 and Theorem 3) that G satisfies interaction separation
(i.e., states in SA are deterministic states, and SP are probabilistic states) and G is
uniform-n-ary, i.e., all probabilities are equal to 1/n. For each probabilistic state s ∈ SP, let Succ(s) = ⟨s′0, . . . , s′n−1⟩ be the n-tuple of states such that δ(s, −, −)(s′i) = 1/n for each 0 ≤ i < n.
We present a reduction that replaces the probabilistic states in G by a gadget
with player-1 and player-2 turn-based states. From G, we construct the one-sided
complete-observation game G′ where player-2 has complete observation. The game
G′ = ⟨S′, A′1, A′2, δ′, O1′⟩ is defined as follows: S′ = S ∪ (S × [n]) ∪ {sink}, A′1 = A1 ∪ [n], A′2 = A2 ∪ [n], O1′ = {o ∪ (o × [n]) | o ∈ O1}, and δ′ is obtained from δ by applying the following transformation for each state s ∈ S: deterministic states keep their transitions from δ; each probabilistic state s ∈ SP with Succ(s) = ⟨s′0, . . . , s′n−1⟩ becomes a player-2 state with δ′(s, −, i) = (s, i) for each i ∈ [n], each state (s, i) is a player-1 state with δ′((s, i), j, −) = s′_{(i+j) mod n} for each j ∈ [n], and the remaining action combinations lead to the absorbing state sink.
Note that turn-based states in G remain turn-based in G′ and the states (s, i) are
player-1 states with the same observation as s. As usual, the objective ϕ′ is defined as
the set of plays in G′ whose projection on S ω belongs to ϕ.
Intuitively, each player in G′ has the possibility to ensure exact simulation of the
probabilistic states of G by playing actions in [n] uniformly at random. For instance,
if player 1 does so, then irrespective of the (possibly randomized) choice of player 2
among the states (s, 1), . . . , (s, n), the states in Succ(s) are reached with probability
1/n, as in G. The same property holds if player 2 plays the actions in [n] uniformly
at random, no matter what player 1 does. Therefore, by arguments similar to the proof
of Theorem 4, player 1 can ensure the objective ϕ′ in G′ is satisfied with the same
probability as ϕ in G, against any strategy of player 2, and the reduction is threshold-
preserving.
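This simulation claim can be illustrated on the example of Figure 7. The sketch below is our own encoding, assuming (consistently with Figure 7) that from the player-1 state (s, i) the action j ∈ [n] leads to Succ(s)[(i + j) mod n]; the state and function names are illustrative.

```python
from fractions import Fraction

# Succ(s) from Figure 7: the 3-tuple <s'0, s'1, s'1> (n = 3).
succ = ["s'0", "s'1", "s'1"]
n = len(succ)

def outcome(i, p1):
    """Distribution over states when player 2 moved to (s, i) and player 1
    plays j in [n] with probability p1[j], reaching succ[(i + j) % n]."""
    dist = {}
    for j in range(n):
        t = succ[(i + j) % n]
        dist[t] = dist.get(t, Fraction(0)) + p1[j]
    return dist

uniform = [Fraction(1, n)] * n
# Whatever state (s, i) player 2 selects, uniform play by player 1
# reproduces the original probabilities: 1/3 for s'0 and 2/3 for s'1.
for i in range(n):
    assert outcome(i, uniform) == {"s'0": Fraction(1, 3), "s'1": Fraction(2, 3)}
```

By symmetry of the (i + j) mod n scheme, the same uniformity holds when player 2 randomizes uniformly over i, whatever player 1 does.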
[Figure 7: For the probabilistic state s (on the left, with probability 1/3 to s′0 and 2/3 to s′1), we have Succ(s) = ⟨s′0, s′1, s′1⟩ and n = 3, where 1/n is the greatest common divisor of the probabilities. The reduction of Theorem 5 yields the turn-based gadget on the right, where s is a player-2 state and the states (s, 0), (s, 1), (s, 2) are player-1 states.]
The reduction can be easily adapted to the case a = Pa of games with partial in-
formation for both players. Since the construction of G′ is polynomial, the complexity
result (item (2) of the theorem) follows from Theorem 2 and Theorem 3.
              2 1/2-player                                 1 1/2-player
              complete        one-sided      partial         MDP            POMDP
turn-based    not (Rmk. 1)    free (Th. 5)   free (Th. 5)    not (Rmk. 1)   not (Rmk. 1)
concurrent    free (Th. 4)    free (Th. 4)   free (Th. 4)    (NA)           (NA)

Table 2: When randomness is for free in the transition function. In particular, probabilities can be eliminated in all classes of 2-player games except complete-observation turn-based games. In the table, Rmk. 1 refers to Remark 1, Th. 5 refers to Theorem 5, and Th. 4 refers to Theorem 4.
Theorem 6. There exists a reduction from OsC games to OsT games, and from PaC
games to PaT games, such that these reductions are
1. threshold-preserving, and
2. computable in polynomial time.
Proof. We present the reduction from OsC games to OsT games, for the case where
player 1 has complete information. The reduction for one-sided games where player 2
has complete information is symmetric. Finally, the reduction from PaC games to PaT
games is obtained analogously.
Let G = ⟨S, A1, A2, δ, O2⟩ be an OsC game where player 1 has complete information, and we construct an OsT game G′ = ⟨S′, A1, A2, δ′, O2′⟩ as follows:
1. S ′ = S ∪ (S × A1 ),
2. O2′ = {o ∪ (o × A1 ) | o ∈ O2 }, and
3. δ ′ is defined as follows, for each state s ∈ S and actions a ∈ A1 , b ∈ A2 :
δ ′ (s, a, −) = (s, a) and δ ′ ((s, a), −, b) = δ(s, a, b).
Hence the transition function δ ′ lets player 1 play first an action a, then player 2
plays an action b, and the successor state of s is chosen according to the tran-
sition relation δ(s, a, b) from the original game. As usual, the objective ϕ′ =
{s0 (s0 , a0 )s1 (s1 , a1 ) · · · | s0 s1 · · · ∈ ϕ ∧ ∀i ≥ 0 : ai ∈ A1 } in G′ requires that
the projection of a play on S ω satisfies ϕ. Since player 1 plays first in G′ , player 1
can achieve the objective ϕ′ in G′ with at most the same probability as for ϕ in G, and
since for all s ∈ S and actions a ∈ A1 , the states s and (s, a) are indistinguishable for
player 2, player 2 does not know the last action chosen by player 1 and therefore does
not gain any advantage in playing after player 1 rather than concurrently. Therefore the
reduction is threshold-preserving and since it is computable in polynomial time, the
result follows.
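The construction of G′ in the proof above can be sketched directly. This is an illustrative encoding of our own, not from the paper: None marks the irrelevant component in δ′(s, a, −) and δ′((s, a), −, b).

```python
def concurrent_to_turnbased(S, A1, A2, delta, O2):
    """Theorem 6 reduction: S' = S ∪ (S × A1); player 1 moves first at s,
    then player 2 moves at (s, a). delta[(s, a, b)] is a dict mapping
    successors to probabilities; O2 is player 2's observation partition,
    given as a list of sets of states."""
    S_prime = list(S) + [(s, a) for s in S for a in A1]
    delta_prime = {}
    for s in S:
        for a in A1:
            delta_prime[(s, a, None)] = {(s, a): 1}        # delta'(s, a, -)
            for b in A2:
                # delta'((s, a), -, b) = delta(s, a, b)
                delta_prime[((s, a), None, b)] = delta[(s, a, b)]
    # Player 2 cannot distinguish s from (s, a): o is merged with o × A1.
    O2_prime = [set(o) | {(s, a) for s in o for a in A1} for o in O2]
    return S_prime, delta_prime, O2_prime
```

The merged observations in O2_prime are exactly what keeps player 1's first move hidden, so that playing first gives player 2 no advantage.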
Role of concurrency in complete-observation games. We have shown that concur-
rency can be obtained for free in partial-observation games (OsT and PaT games).
In contrast, for complete-observation games, the value is irrational in general for con-
current games with deterministic transitions (CoC deterministic games) [11], while
the value is always rational in turn-based stochastic games with rational probabilities
(CoT stochastic games) [12]. This rules out any value-preserving reduction of CoC
(deterministic) games to CoT (stochastic) games with rational probabilities.
Intuitively, the sequence x fixes in advance the sequence of results of coin tosses used
for playing with σ. Note that if σ is observation-based, then for every sequence x the
strategy σx is both observation-based and pure.
To prove the lemma, we show that [0, 1]^ω can be equipped with a probability measure ν such that the mapping x ↦ Pr_{s∗}^{σx}(ϕ) from [0, 1]^ω to [0, 1] is measurable, and:

    Pr_{s∗}^σ(ϕ) = ∫_{x ∈ [0,1]^ω} Pr_{s∗}^{σx}(ϕ) dν(x) .    (2)
Suppose that (2) holds. Then there exists x ∈ [0, 1]^ω (actually many x's) such that Pr_{s∗}^σ(ϕ) ≤ Pr_{s∗}^{σx}(ϕ), and since the strategy σx is pure, this proves the lemma.
To complete the proof, it is thus enough to construct a probability measure ν on
[0, 1]ω such that (2) holds.
We start with the definition of the probability measure ν. The set [0, 1]ω is equipped
with the sigma-field generated by sequence-cylinders which are defined as follows.
For every finite sequence x = x0, x1, . . . , xn ∈ [0, 1]∗, the sequence-cylinder C(x) is the subset [0, x0] × [0, x1] × · · · × [0, xn] × [0, 1]^ω ⊆ [0, 1]^ω. According to Tulcea's theorem [4], there is a unique product probability measure ν on [0, 1]^ω such that ν(C(ε)) = 1 and for every sequence x0, . . . , xn, xn+1 in [0, 1],

    ν(C(x0, . . . , xn, xn+1)) = ν(C(x0, . . . , xn)) · xn+1 .
Now that ν is defined, it remains to prove that the mapping x ↦ Pr_{s∗}^{σx}(ϕ) from [0, 1]^ω to [0, 1] is measurable and that (2) holds. For that, we introduce the following mapping:
f_{s∗,σ} : [0, 1]^ω × [0, 1]^ω → (SA1)^ω ,
that associates with every pair of sequences ((xn )n∈N , (yn )n∈N ) the infinite history
h = s0 a1 s1 a2 . . . ∈ (SA1 )ω defined recursively as follows. First s0 = s∗ , and for
every n ∈ N:

    a_{n+1} = 0 if x_n ≤ σ(s0 a1 s1 · · · sn)(0), and a_{n+1} = 1 otherwise;

    s_{n+1} = L(sn, a_{n+1}) if y_n ≤ δ(sn, a_{n+1}, L(sn, a_{n+1})), and s_{n+1} = R(sn, a_{n+1}) otherwise.
Intuitively, (xn )n∈N fixes in advance the coin tosses used by the strategy, while
(yn )n∈N takes care of the coin tosses used by the probabilistic transitions, and fs∗ ,σ
produces the resulting description of the play. Thanks to the mapping f_{s∗,σ}, randomness related to the use of the randomized strategy σ is separated from randomness due to the transitions of the game, which allows us to represent the randomized strategy σ by means of a probability measure over the set of pure strategies {σx | x ∈ [0, 1]^ω}.
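The two ingredients of f_{s∗,σ} (the strategy tosses x_n and the transition tosses y_n) can be illustrated by the following sketch. It is our own simplification of the lemma's binary-action encoding: sigma(history) returns the probability of action 0 (standing for σ(h)(0)), and all names are illustrative.

```python
def f_play(s_star, sigma, delta, L, R, x, y, steps):
    """Build the finite prefix s0 a1 s1 ... of f_{s*,sigma}((x_n), (y_n)):
    x_n resolves the strategy's coin toss, y_n the transition's coin toss.
    L(s, a) and R(s, a) are the two possible successors, and delta(s, a, t)
    is the probability of the left one."""
    h = [s_star]
    for n in range(steps):
        s_n = h[-1]
        a = 0 if x[n] <= sigma(tuple(h)) else 1            # a_{n+1}
        h.append(a)
        left = L(s_n, a)
        t = left if y[n] <= delta(s_n, a, left) else R(s_n, a)  # s_{n+1}
        h.append(t)
    return h

# Fixing x alone yields the pure strategy sigma_x; integrating over x
# recovers the behaviour of the randomized strategy sigma.
sigma = lambda h: 0.5                    # always toss a fair coin
delta = lambda s, a, t: 0.5              # uniform branching
L = lambda s, a: (s, a, "L")
R = lambda s, a: (s, a, "R")
play = f_play("s*", sigma, delta, L, R, x=[0.3, 0.7], y=[0.2, 0.9], steps=2)
assert play[1] == 0 and play[3] == 1     # 0.3 <= 0.5 < 0.7
assert play[2][2] == "L" and play[4][2] == "R"
```

Note that for a fixed x, the only randomness left in f_{s∗,σ}(x, ·) comes from y, i.e., from the transitions, which is precisely the separation exploited in the proof.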
We equip both sets (SA1 )ω and [0, 1]ω × [0, 1]ω with sigma-fields that make fs∗ ,σ
measurable. First, (SA1 )ω is equipped with the sigma-field generated by cylinders,
defined as follows. An action-cylinder is a subset C(h) ⊆ (SA1 )ω such that C(h) =
h(SA1 )ω for some h ∈ (SA1 )∗ . A state-cylinder is a subset C(h) ⊆ (SA1 )ω such
that C(h) = h(A1 S)ω for some h ∈ (SA1 )∗ S. The set of cylinders is the union of
the sets of action-cylinders and state-cylinders. Second, [0, 1]ω × [0, 1]ω is equipped
with the sigma-field generated by products of sequence-cylinders. Checking that fs∗ ,σ
is measurable is an elementary exercise.
Now we define two probability measures µ and µ′ on (SA1 )ω and prove that they
coincide. On one hand, the measurable mapping fs∗ ,σ : [0, 1]ω × [0, 1]ω → (SA1 )ω
defines naturally a probability measure µ′ on (SA1 )ω . Equip the set [0, 1]ω × [0, 1]ω
with the product measure ν × ν. Then for every measurable subset B ⊆ (SA1 )ω ,
µ′(B) = (ν × ν)(f_{s∗,σ}^{−1}(B)) .
On the other hand, the strategy σ and the initial state s∗ naturally define another proba-
bility measure µ on (SA1 )ω . According to Tulcea’s theorem [4], there exists a unique
product probability measure µ on (SA1 )ω such that µ(C(s∗ )) = 1, µ(C(s)) = 0 for
s ∈ S \ {s∗ }, and for h = s0 a1 s1 a2 · · · sn ∈ (SA1 )∗ S and (a, t) ∈ A1 × S,
µ(C(ha)) = µ(C(h)) · σ(h)(a)
µ(C(hat)) = µ(C(ha)) · δ(sn , a, t).
To prove that µ and µ′ coincide, it is enough to prove that µ and µ′ coincide on the
set of cylinders, that is for every cylinder C(h) ⊆ (SA1 )ω ,
µ(C(h)) = (ν × ν)(f_{s∗,σ}^{−1}(C(h))) .    (3)
This is obvious for h = s∗ and h = s ∈ S \ {s∗ }. The general case goes by induction.
Let h = s0 a1 s1 a2 · · · sn ∈ (SA1)∗S and (a, t) ∈ A1 × S. Let I = [0, 1]. Let I_a = [0, σ(h)(0)] if a = 0 and I_a = [σ(h)(0), 1] if a = 1. Let I_t = [0, δ(sn, a, L(sn, a))] if t = L(sn, a) and I_t = [δ(sn, a, L(sn, a)), 1] if t = R(sn, a). Then:

    µ(C(ha) | C(h)) = σ(h)(a)
                    = (ν × ν)((I × I)^n (I_a × I)(I × I)^ω)
                    = (ν × ν)(f_{s∗,σ}^{−1}(C(ha)) | f_{s∗,σ}^{−1}(C(h)))

    µ(C(hat) | C(ha)) = δ(sn, a, t)
                      = (ν × ν)((I × I)^n (I × I_t)(I × I)^ω)
                      = (ν × ν)(f_{s∗,σ}^{−1}(C(hat)) | f_{s∗,σ}^{−1}(C(ha))) ,
which proves that (3) holds for every cylinder C(h).
Now all the tools needed to prove (2) have been introduced, and we can state the main relation between f_{s∗,σ} and Pr_{s∗}^σ(ϕ). Let ϕ′ ⊆ (SA1)^ω be the set of histories s0 a1 s1 a2 . . . such that s0 s1 · · · ∈ ϕ, and let 1_ϕ and 1_{ϕ′} be the indicator functions of ϕ and ϕ′. Then:

    Pr_{s∗}^σ(ϕ) = ∫_{p ∈ S^ω} 1_ϕ(p) dPr_{s∗}^σ(p) = ∫_{p ∈ (SA1)^ω} 1_{ϕ′}(p) dµ(p) = ∫_{p ∈ (SA1)^ω} 1_{ϕ′}(p) dµ′(p)
               = ∫_{(x,y) ∈ [0,1]^ω × [0,1]^ω} 1_{ϕ′}(f_{s∗,σ}(x, y)) d(ν × ν)(x, y)
               = ∫_{x ∈ [0,1]^ω} ( ∫_{y ∈ [0,1]^ω} 1_{ϕ′}(f_{s∗,σ}(x, y)) dν(y) ) dν(x) ,    (4)
              2 1/2-player                                  1 1/2-player
              complete         one-sided      partial         MDP             POMDP
turn-based    ε > 0 (Th. 1)    not (Rmk. 2)   not (Rmk. 2)    ε ≥ 0 (Th. 7)   ε ≥ 0 (Th. 7)
concurrent    not (Rmk. 2)     not (Rmk. 2)   not (Rmk. 2)    (NA)            (NA)

Table 3: When randomness is for free in strategies.
where the first and second equalities are by definition of Prσs∗ (ϕ), the third equality
holds because µ = µ′ , the fourth equality is a basic property of image measures, and
the last equality holds by Fubini's theorem [4], which we can use since 1_{ϕ′} ◦ f_{s∗,σ} is nonnegative.
To complete the proof, we show that for every x ∈ [0, 1]^ω,

    ∫_{y ∈ [0,1]^ω} 1_{ϕ′}(f_{s∗,σ}(x, y)) dν(y) = Pr_{s∗}^{σx}(ϕ) .    (5)
Equation (4) holds for every observation-based strategy σ, hence in particular for strat-
egy σx . But strategy σx has the following property: for every x′ ∈ ]0, 1[ω and every
y ∈ [0, 1]ω , fs∗ ,σx (x′ , y) = fs∗ ,σ (x, y). Together with (4), this gives (5). This com-
pletes the proof, since (4) and (5) immediately give (2).
We obtain the following result as a consequence of Lemma 1.
Theorem 7 shows that the result of Theorem 1 can be generalized to POMDPs, and
a stronger result (item (2) of Theorem 7) can be proved for POMDPs (and MDPs as a
special case). It remains open whether a result similar to item (2) of Theorem 7 can be
proved for CoT stochastic games. Note that it was already shown in [13, Example 1]
that in CoT stochastic games with Borel objectives optimal strategies need not exist.
The results summarizing when randomness can be obtained for free in strategies are
shown in Table 3.
Undecidability result for POMDPs. The results of [2] show that the emptiness problem
for finite-state probabilistic coBüchi (resp., Büchi) automata under the almost-sure
(resp., positive) semantics [2] is undecidable. As a consequence it follows that
for finite-state POMDPs the problem of deciding if there is a pure observation-based
almost-sure (resp., positive) winning strategy for coBüchi (resp., Büchi) objectives is
undecidable, and as a consequence of Theorem 7 we obtain an analogous undecidability
result for randomized strategies. The undecidability result holds even if the coBüchi
(resp., Büchi) objective is visible.
Corollary 1. Let G be a finite-state POMDP with initial state s∗ and let T ⊆ S be a
subset of states (or union of observations). Whether there exists a pure or randomized
almost-sure winning strategy for player 1 from s∗ in G for the objective coBuchi(T) is
undecidable; and whether there exists a pure or randomized positive winning strategy
for player 1 from s∗ in G for the objective Buchi(T) is undecidable.
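For concreteness, the Buchi(T) and coBuchi(T) objectives in Corollary 1 can be evaluated on ultimately periodic plays, where the states visited infinitely often are exactly the states on the repeated cycle. A small sketch (names are ours, not the paper's):

```python
def inf_states(prefix, cycle):
    """States occurring infinitely often in the play prefix . cycle^omega:
    exactly the states appearing on the cycle."""
    return set(cycle)

def buchi(T, prefix, cycle):
    """Buchi(T): some state of T is visited infinitely often."""
    return bool(inf_states(prefix, cycle) & set(T))

def cobuchi(T, prefix, cycle):
    """coBuchi(T): from some point on only states of T are visited,
    i.e. every infinitely repeated state lies in T."""
    return inf_states(prefix, cycle) <= set(T)

T = {"s1", "s2"}
print(buchi(T, ["s0"], ["s1", "s3"]))    # True: s1 in T recurs forever
print(cobuchi(T, ["s0"], ["s1", "s3"]))  # False: s3 not in T recurs forever
```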
Undecidability result for one-sided complete-observation turn-based games. The
undecidability results of Corollary 1 also hold for finite-state OsT stochastic games
(as they subsume finite-state POMDPs as a special case). It follows from Theorem 5
that finite-state OsT stochastic games can be reduced to finite-state OsT deterministic
games. The reduction holds for randomized strategies and thus we obtain the first
undecidability result for finite-state OsT deterministic games (Corollary 2), solving
the open question of [10]. Note that for pure strategies, OsT deterministic games with
a parity objective are EXPTIME-complete [25, 10].
Corollary 2. Let G be a finite-state OsT deterministic game with initial state s∗ and
let T ⊆ S be a subset of states (or union of observations). Whether there exists
a randomized almost-sure winning strategy for player 1 from s∗ in G for the objective
coBuchi(T) is undecidable; and whether there exists a randomized positive winning
strategy for player 1 from s∗ in G for the objective Buchi(T) is undecidable.
5. Conclusion
In this work we have presented a precise characterization for classes of games
where randomization can be obtained for free in transition functions and in strategies.
As a consequence of our characterization we obtain new undecidability results.
The other impact of our characterization is as follows: for the class of games where
randomization is free in the transition function, future algorithmic and complexity
analysis can focus on the simpler class of deterministic games; and for the class of
games where randomization is free in strategies, future analysis of such games can
focus on the simpler class of pure strategies. Thus our results will be useful tools for
simpler analysis techniques in the study of games, as already demonstrated in
[6, 7, 8, 9, 16, 17].
Finally, note that one cannot expect randomness to be for free in both the transition
function and the strategies at once. Indeed, the results of this paper show that the
classes of games in which randomness is for free in the transition function (Table 2)
are exactly those in which randomized strategies are more powerful than pure strategies
(Table 3); i.e., randomness is not for free in strategies when it is for free in the
transition function.
References
[1] R. Alur, T. A. Henzinger, and O. Kupferman. Alternating-time temporal logic.
Journal of the ACM, 49:672–713, 2002.
[2] C. Baier, N. Bertrand, and M. Größer. On decision problems for probabilistic
Büchi automata. In FoSSaCS, LNCS 4962, pages 287–301. Springer, 2008.
[3] N. Bertrand, B. Genest, and H. Gimbert. Qualitative determinacy and decidability
of stochastic games with signals. In Proc. of LICS, pages 319–328. IEEE
Computer Society, 2009.
[4] P. Billingsley. Probability and Measure. Wiley-Interscience, 1995.
[5] J. R. Büchi and L. H. Landweber. Solving sequential conditions by finite-state
strategies. Transactions of the AMS, 138:295–311, 1969.
[6] P. Cerný, K. Chatterjee, T. A. Henzinger, A. Radhakrishna, and R. Singh.
Quantitative synthesis for concurrent programs. In CAV, pages 243–259, 2011.
[7] K. Chatterjee and M. Chmelik. POMDPs under probabilistic semantics. CoRR,
abs/1408.2058, 2014 (Conference version: UAI, 2013).
[8] K. Chatterjee, M. Chmelik, and M. Tracol. What is decidable about partially
observable Markov decision processes with omega-regular objectives. In CSL,
pages 165–180, 2013.
[9] K. Chatterjee and L. Doyen. Partial-observation stochastic games: How to win
when belief fails. ACM Trans. Comput. Log., 15(2):16, 2014.
[10] K. Chatterjee, L. Doyen, T. A. Henzinger, and J.-F. Raskin. Algorithms for
omega-regular games of incomplete information. Logical Methods in Computer
Science, 3(3:4), 2007.
[11] K. Chatterjee and T. A. Henzinger. Semiperfect-information games. In
FSTTCS'05, LNCS 3821. Springer, 2005.
[12] K. Chatterjee, M. Jurdziński, and T. A. Henzinger. Quantitative stochastic parity
games. In SODA’04, pages 121–130. SIAM, 2004.
[13] K. Chatterjee, R. Majumdar, and M. Jurdziński. On Nash equilibria in stochastic
games. In CSL’04, pages 26–40. LNCS 3210, Springer, 2004.
[14] L. de Alfaro and T. A. Henzinger. Interface theories for component-based design.
In EMSOFT’01, LNCS 2211, pages 148–165. Springer, 2001.
[15] H. Everett. Recursive games. In Contributions to the Theory of Games III,
volume 39 of Annals of Mathematical Studies, pages 47–78, 1957.
[16] H. Gimbert and Y. Oualhadj. Deciding the value 1 problem for ♯-acyclic partially
observable Markov decision processes. In SOFSEM, pages 281–292, 2014.
[17] J. Goubault-Larrecq and R. Segala. Random measurable selections. In Horizons
of the Mind, pages 343–362, 2014.
[18] T. A. Henzinger, O. Kupferman, and S. Rajamani. Fair simulation. Information
and Computation, 173:64–81, 2002.
[19] A. Kechris. Classical Descriptive Set Theory. Springer, 1995.
[20] D. A. Martin. The determinacy of Blackwell games. The Journal of Symbolic
Logic, 63(4):1565–1581, 1998.
[21] R. McNaughton. Infinite games played on finite graphs. Annals of Pure and
Applied Logic, 65:149–184, 1993.
[22] J.-F. Mertens, S. Sorin, and S. Zamir. Repeated games. Core Discussion Papers,
9422, 1994.
[23] A. Pnueli and R. Rosner. On the synthesis of a reactive module. In POPL’89,
pages 179–190. ACM Press, 1989.
[24] P. J. Ramadge and W. M. Wonham. Supervisory control of a class of discrete-
event processes. SIAM Journal of Control and Optimization, 25(1):206–230,
1987.
[25] J. H. Reif. The complexity of two-player games of incomplete information.
Journal of Computer and System Sciences, 29(2):274–301, 1984.
[26] W. Thomas. Languages, automata, and logic. In Handbook of Formal Languages,
volume 3, Beyond Words, chapter 7, pages 389–455. Springer, 1997.
[27] M. Y. Vardi. Automatic verification of probabilistic concurrent finite-state
systems. In FOCS, pages 327–338. IEEE Computer Society Press, 1985.
[28] U. Zwick and M. Paterson. The complexity of mean payoff games on graphs.
Theoretical Computer Science, 158:343–359, 1996.