Analysing Causal Structures With Entropy
Department of Mathematics, University of York, Heslington, York, YO10 5DD, UK
to quantum cryptography, where it can be crucial to eliminate the possibility of classical causes. We
discuss the achievements and limitations of the entropic approach in comparison to other techniques
and point out the main open problems.
I. INTRODUCTION
Deciding whether a causal explanation is compatible with given statistical data and exploring whether it is the most
suitable explanation for the data at hand are central scientific tasks. Sometimes the most reasonable explanation of
a set of observations involves unobserved common causes. In the case where the common causes are classical, the
well-developed machinery of Bayesian networks can be used [1, 2]. In principle, such networks are well-understood
and it is known how to check whether observed correlations are compatible with a given network [3]. In practice,
however, testing compatibility for networks that involve unobserved systems is only computationally tractable for
small cases [4, 5]. Furthermore, the methodology has to be adapted whenever non-classical common causes are
permitted.
Finding good heuristics to help identify correlations that are (in)compatible with a causal structure is currently an
active area of research [6–20] and the use of entropy measures is common to many of these [6–13, 17, 18, 20]. Such
methods are important in the quantum context, where recent cryptographic protocols rely on the lack of a classical
causal explanation for certain quantum correlations in specified causal structures [21–29], an idea that lies behind
Bell’s theorem [30] (see also [31]).
In Section II of this article, we review the entropic characterisation of the correlations compatible with causal
structures in classical, quantum and more general non-signalling theories. We detail refinements of the approach
based on post-selection in Section III. Together, these sections show the current capabilities of entropic techniques,
also establishing and clarifying connections between different contributions. Our review is illustrated with several
examples to assist its understanding and to make it easily accessible for applications. In Section IV we outline and
compare further approaches to the problem, before concluding in Section V with some open questions.
Characterising the joint distributions of a set of random variables or alternatively considering a multi-party quantum
state in terms of its entropy (and of those of its marginals) has a tradition in information theory, dating back to
Shannon [32–34]. However, only recently has this approach been extended to account for causal structure [6, 8]. In Sections II A and II B respectively, we review this approach without and with imposing causal constraints. All our considerations are concerned with discrete random variables; for extensions of the approach to continuous random variables (and its limitations) we refer to [8, 35].
∗ Electronic address: [email protected]
† Electronic address: [email protected]
The entropy cone for a joint distribution of n random variables was introduced in [33]. It is defined in terms of the Shannon entropy [32], which for a discrete random variable X taking values x ∈ X with probability distribution PX is defined by H(X) ∶= −∑x∈X PX(x) log₂ PX(x)¹.
We use H(P) ∈ ℝ^{2ⁿ−1}_{≥0} to denote the vector of entropies H(XS) of all non-empty subsets XS of the variables Ω = {X1, . . . , Xn}², corresponding to a particular distribution P ∈ Pn. The set of all such vectors is Γ∗n ∶= {v ∈ ℝ^{2ⁿ−1}_{≥0} ∣ ∃P ∈ Pn s.t. v = H(P)}. Its closure Γ̄∗n includes vectors v for which there exists a sequence of distributions Pk ∈ Pn such that H(Pk) tends to v as k → ∞. It is known that the closure Γ̄∗n is a convex cone for any n ∈ ℕ [36]. As such, its boundary may be characterised in terms of (potentially infinitely many) linear inequalities. Because Γ∗n is difficult to characterise, we will in the following consider various approximations.
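To make the entropy vector concrete: the following Python sketch (ours, not from the original article; the subset ordering is an illustrative convention) computes H(P) from a joint distribution given as an n-dimensional probability array.

```python
import itertools
import numpy as np

def entropy_vector(p):
    """Entropy vector of a joint distribution p (n-dimensional array of
    probabilities): one Shannon entropy per non-empty subset of variables,
    ordered by subset size, then lexicographically (illustrative convention)."""
    n = p.ndim
    subsets = [s for k in range(1, n + 1)
               for s in itertools.combinations(range(n), k)]
    v = []
    for s in subsets:
        marg = p.sum(axis=tuple(i for i in range(n) if i not in s)).flatten()
        marg = marg[marg > 0]  # convention: 0 log 0 = 0
        v.append(-np.sum(marg * np.log2(marg)))
    return np.array(v)

# Two perfectly correlated uniform bits: H(X) = H(Y) = H(XY) = 1.
p = np.array([[0.5, 0.0], [0.0, 0.5]])
print(entropy_vector(p))  # [1. 1. 1.]
```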
The standard outer approximation to Γ∗n is the polyhedral cone constrained by the Shannon inequalities listed in the following³:
• Monotonicity: For all XS, XT ⊆ Ω, H(XS ∖ XT) ≤ H(XS).
• Submodularity: For all XS , XT ⊆ Ω, H(XS ∩ XT ) + H(XS ∪ XT ) ≤ H(XS ) + H(XT ).
These inequalities are always obeyed by the entropies of a set of jointly distributed random variables.
They may be concisely rewritten in terms of the following information measures: the conditional entropy
of two jointly distributed random variables X and Y , H(X∣Y ) ∶= H(XY ) − H(Y ), their mutual information,
I(X ∶ Y ) ∶= H(X) + H(Y ) − H(XY ), and the conditional mutual information between two jointly distributed ran-
dom variables X and Y given a third, Z, denoted I(X ∶ Y ∣Z) ∶= H(XZ) + H(Y Z) − H(Z) − H(XY Z). Hence, the
monotonicity constraints correspond to positivity of conditional entropy, H(XS ∩XT ∣XS ∖XT ) ≥ 0, and submodularity
is equivalent to positivity of the conditional mutual information, I(XS ∖ XT ∶ XT ∖ XS ∣XS ∩ XT ) ≥ 0. The monotonic-
ity and submodularity constraints can all be generated from a minimal set of n + n(n−1)2ⁿ⁻³ inequalities [33]: for the monotonicity constraints it is sufficient to consider the n constraints with XS = Ω and XT = Xi for some Xi ∈ Ω; for the submodularity constraints it is sufficient to consider those with XS ∖ XT = Xi and XT ∖ XS = Xj with i < j and where XU ∶= XS ∩ XT is any subset of Ω not containing Xi or Xj, i.e., submodularity constraints of the form I(Xi ∶ Xj ∣XU) ≥ 0.
These n + n(n−1)2ⁿ⁻³ independent Shannon inequalities can be expressed in terms of an (n + n(n−1)2ⁿ⁻³) × (2ⁿ − 1) dimensional matrix, which we call Mⁿ_SH, such that for any v ∈ Γ∗n the conditions Mⁿ_SH ⋅ v ≥ 0 hold⁴. More generally, for v ∈ ℝ^{2ⁿ−1}_{≥0}, a violation of Mⁿ_SH ⋅ v ≥ 0 certifies that there is no distribution P ∈ Pn such that v = H(P). It follows that the Shannon cone, Γn ∶= {v ∈ ℝ^{2ⁿ−1}_{≥0} ∣ Mⁿ_SH ⋅ v ≥ 0}, is an outer approximation of the set of achievable entropy vectors, Γ∗n [33].
¹ Note limp→0⁺ p log₂ p = 0.
² Since the empty set always has zero entropy, we choose to omit it from the entropy vector in this work.
³ It is a matter of convention whether H({}) = 0 is included as a Shannon inequality; we keep this implicit.
⁴ This condition is to be interpreted as the requirement that each component of Mⁿ_SH ⋅ v is non-negative.
For example, for n = 3 random variables X, Y and Z, with coordinates ordered as (H(X), H(Y ), H(Z), H(XY ), H(XZ), H(Y Z), H(XY Z)), the minimal set of Shannon inequalities is encoded in

        ⎛  0  0  0  0  0 −1  1 ⎞
        ⎜  0  0  0  0 −1  0  1 ⎟
        ⎜  0  0  0 −1  0  0  1 ⎟
        ⎜  1  1  0 −1  0  0  0 ⎟
M³_SH = ⎜  1  0  1  0 −1  0  0 ⎟ .
        ⎜  0  1  1  0  0 −1  0 ⎟
        ⎜ −1  0  0  1  1  0 −1 ⎟
        ⎜  0 −1  0  1  0  1 −1 ⎟
        ⎝  0  0 −1  0  1  1 −1 ⎠
The first three rows are monotonicity constraints, the remaining six ensure submodularity.
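This minimal set is easy to generate programmatically. The sketch below (our illustration; the function name and subset ordering are ours) builds the monotonicity and submodularity rows for a given n and recovers the 9 × 7 system above for n = 3, consistent with the count n + n(n−1)2ⁿ⁻³.

```python
import itertools

def shannon_matrix(n):
    """Rows encode the minimal Shannon inequalities as coefficients on the
    entropies of the non-empty subsets of {0, ..., n-1}."""
    subsets = [frozenset(s) for k in range(1, n + 1)
               for s in itertools.combinations(range(n), k)]
    col = {s: j for j, s in enumerate(subsets)}
    omega = frozenset(range(n))
    rows = []
    # Monotonicity: H(Omega) - H(Omega \ {i}) >= 0.
    for i in range(n):
        row = [0] * len(subsets)
        row[col[omega]] = 1
        if omega - {i}:
            row[col[omega - {i}]] = -1
        rows.append(row)
    # Submodularity: I(X_i : X_j | X_U) >= 0, i.e.
    # H(U+{i}) + H(U+{j}) - H(U) - H(U+{i,j}) >= 0.
    for i, j in itertools.combinations(range(n), 2):
        rest = [k for k in range(n) if k not in (i, j)]
        for k in range(len(rest) + 1):
            for u in itertools.combinations(rest, k):
                u = frozenset(u)
                row = [0] * len(subsets)
                row[col[u | {i}]] += 1
                row[col[u | {j}]] += 1
                row[col[u | {i, j}]] -= 1
                if u:
                    row[col[u]] -= 1
                rows.append(row)
    return rows

m = shannon_matrix(3)
print(len(m), len(m[0]))  # 9 7, matching n + n(n-1)2^(n-3) for n = 3
```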
For two variables the Shannon cone coincides with the actual entropy cone, Γ2 = Γ∗2, while for three random variables this holds only for the closure of the entropy cone, i.e., Γ3 = Γ̄∗3 but Γ3 ≠ Γ∗3 [36, 37]. For n ≥ 4 further independent constraints on the set of entropy vectors are needed to fully characterise Γ∗n, the first of which was discovered in [38].
Proposition 1 (Zhang & Yeung). For any four discrete random variables T, U, V and W the following inequality holds: −H(T) − H(U) − (1/2)H(V) + (3/2)H(TU) + (3/2)H(TV) + (1/2)H(TW) + (3/2)H(UV) + (1/2)H(UW) − (1/2)H(VW) − 2H(TUV) − (1/2)H(TUW) ≥ 0.
For n ≥ 4 the convex cone Γ̄∗n ⊊ Γn is not polyhedral, i.e., it cannot be characterised by finitely many linear inequalities [39]. Nonetheless, many linear entropic inequalities have been discovered [36, 38, 40, 41]. Recently,
systematic searches for new entropic inequalities for n = 4 have been conducted [42, 43], which recover most of the
previously known inequalities; in particular the inequality of Proposition 1 is re-derived and shown to be implied by
tighter ones [43]. The systematic search in [43] is based on considering additional random variables that obey certain
constraints and then deriving four variable inequalities from the known identities for five or more random variables
(see also [38, 39]), an idea that is captured by a so-called copy lemma [38, 43, 44]. In the same article, several rules
to generate families of inequalities have been suggested, in the style of techniques introduced by Matúš [39].
For more than four variables, a few additional inequalities are known [38, 40, 45]. Curiously, to our knowledge, in the case of four variables all known relevant non-Shannon inequalities (i.e., the ones found in [38–43] that are not yet superseded by tighter ones) can be written as a positive linear combination of the Ingleton quantity, I(T ∶ U ∣V ) + I(T ∶ U ∣W ) + I(V ∶ W ) − I(T ∶ U ), and conditional mutual information terms (see also [43]).
3. Inner approximations
For the four variable entropy cone, Γ∗4, an inner approximation is given by the region constrained by the Shannon inequalities and the six permutations of the Ingleton inequality [46], I(T ∶ U ∣V ) + I(T ∶ U ∣W ) + I(V ∶ W ) − I(T ∶ U ) ≥ 0, for random variables T, U, V and W. These inequalities can be concisely written as a matrix MI ∈ ℝ^{6×15}. The constrained region is called the Ingleton cone, ΓI ∶= {v ∈ ℝ¹⁵_{≥0} ∣ M⁴_SH ⋅ v ≥ 0 and MI ⋅ v ≥ 0}, and it has the property that v ∈ ΓI implies v ∈ Γ̄∗4 [47]. In contrast, there are entropy vectors that violate the Ingleton inequalities, as the following example shows.
Example 2. Let T, U, V and W be four jointly distributed random variables: let V and W be uniform random bits and let T = AND(¬V, ¬W) and U = AND(V, W). This distribution [39] leads to the entropy vector v ≈ (0.81, 0.81, 1, 1, 1.50, 1.50, 1.50, 1.50, 1.50, 2, 2, 2, 2, 2, 2), for which I(T ∶ U ∣V ) + I(T ∶ U ∣W ) + I(V ∶ W ) − I(T ∶ U ) ≈ −0.12, in violation of the Ingleton inequality.
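The violation can be verified directly; a small self-contained sketch (ours) reproduces the value ≈ −0.12:

```python
import numpy as np

# Joint distribution over (T, U, V, W): V, W uniform bits,
# T = AND(not V, not W), U = AND(V, W).
p = np.zeros((2, 2, 2, 2))
for v in (0, 1):
    for w in (0, 1):
        t, u = int(v == 0 and w == 0), int(v == 1 and w == 1)
        p[t, u, v, w] = 0.25

def H(*axes):
    marg = p.sum(axis=tuple(a for a in range(4) if a not in axes)).flatten()
    marg = marg[marg > 0]
    return -np.sum(marg * np.log2(marg))

def I(a, b, cond=()):
    if cond:
        return H(a, *cond) + H(b, *cond) - H(*cond) - H(a, b, *cond)
    return H(a) + H(b) - H(a, b)

T, U, V, W = 0, 1, 2, 3
ingleton = I(T, U, (V,)) + I(T, U, (W,)) + I(V, W) - I(T, U)
print(round(ingleton, 3))  # approx -0.123 < 0: Ingleton is violated
```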
For five random variables an inner approximation in terms of Shannon, Ingleton and 24 additional inequalities and
their permutations is known (including partial extensions to more variables) [48, 49].
Causal relations among a set of variables impose constraints on their possible joint distributions, which can be
conveniently represented with a causal structure.
Figure 1: (a) Pearl’s instrumental scenario. The nodes X, Y and Z are observed, A is unobserved. In the classical case this
can be understood in the following way: A random variable X and an unobserved A are used to generate another random
variable Z. Then Y is generated from A and the observed output of node Z. In particular, note that no other information
can be forwarded from X through the node Z to Y . In the quantum case, the source A shares a quantum system ρA ∈ S(HA ),
where HA ≅ HAZ ⊗ HAY . The subsystem AZ is measured to produce Z and likewise for Y . The subsystems AZ and AY are
both considered to be parents of Z (and Y ). (b) Bell scenario. The observed variables A and B together with an unobserved
system C are used to generate outputs X and Y respectively. In the classical case, C is modelled as a random variable, in
the quantum case it is a quantum state on a Hilbert space HC ≅ HCX ⊗ HCY . (c) Triangle causal structure. Three observed
random variables X, Y and Z share pairwise common causes A, B and C, which in the classical case are modelled by random
variables. Some of the valid inequalities such as 2I(X ∶ Y ∣ Z) + I(X ∶ Z ∣ Y ) + I(Y ∶ Z ∣ X) − I(X ∶ Y ) ≥ 0 can only be recovered
using non-Shannon entropic inequalities [51].
Definition 2. A causal structure is a set of variables arranged in a directed acyclic graph (DAG), in which a subset
of the nodes is assigned as observed.
The directed edges of the graph are intended to represent causation, perhaps by propagation of some influence,
and cycles are excluded to avoid the well-known paradoxes associated with causal loops. We will interpret causal
structures in different ways depending on the supposed physics of whatever is mediating the causal influence.
One of the simplest causal structures that leads to interesting insights and one of the most thoroughly analysed
ones is Pearl’s instrumental causal structure, IC [50]. It is displayed in Figure 1(a) and will be used as an example
throughout this review.
In the classical case, the causal relations among a set of random variables can be explored by means of the theory
of Bayesian networks (see for instance [1, 2] for a complete presentation of this theory).
Definition 3. A classical causal structure, C^C, is a causal structure in which each node of the DAG has an associated random variable.

It is common to use the same label for the node and its associated random variable. The DAG encodes which joint distributions of the involved variables are allowed in a causal structure C^C. To explain this we need a little more terminology.
Definition 4. Let XS, XT, XU be three disjoint sets of jointly distributed random variables. Then XS and XT are said to be conditionally independent given XU if and only if their joint distribution PXS XT XU can be written as PXS XT XU = PXS∣XU PXT∣XU PXU. Conditional independence of XS and XT given XU is denoted as XS ⊧ XT ∣ XU.

Two variables XS and XT are (unconditionally) independent if PXS XT = PXS PXT, concisely written XS ⊧ XT. With reference to a DAG with a subset of nodes, X, we will use X↓ to denote the ancestors of X and X↑ to denote the descendants of X. The parents of X are represented by X↓1 and the non-descendants are X↑̸.
Definition 5. Let C^C be a classical causal structure with nodes {X1, X2, . . . , Xn}. A probability distribution PX1 X2 ...Xn ∈ Pn is (Markov) compatible with C^C if it can be decomposed as PX1 X2 ...Xn = ∏i PXi∣Xi↓1.
The compatibility constraint encodes all conditional independences of the random variables in the causal structure C^C. Nonetheless, whether a particular set of variables is conditionally independent of another is more easily read from the DAG, as explained in the following.
Definition 6. Let X, Y and Z be three pairwise disjoint sets of nodes in a DAG G. The sets X and Y are said to be d-separated⁵ by Z if Z blocks every path from any node in X to any node in Y. A path is blocked by Z if it contains i → z → j or i ← z → j for some nodes i, j and a node z ∈ Z on that path, or if it contains i → k ← j, where neither k nor any of its descendants is in Z.

⁵ The d in d-separation stands for directional.

Figure 2: While in the left causal structure X ⊧ Y, the other three networks share the conditional independence relation X ⊧ Y ∣ Z. This illustrates that the conditional independences are not sufficient to characterise the causal links among a set of random variables.
The d-separation of the nodes in a causal structure is directly related to the conditional independence of its variables.
The following proposition corresponds to Theorem 1.2.5 from [1], previously introduced in [52, 53]. It justifies the
application of d-separation as a means to identify independent variables.
Proposition 7 (Verma & Pearl). Let C^C be a classical causal structure and let X, Y and Z be pairwise disjoint subsets of nodes in C^C. If a probability distribution P is compatible with C^C, then the d-separation of X and Y by Z implies the conditional independence X ⊧ Y ∣ Z. Conversely, if for every distribution P compatible with C^C the conditional independence X ⊧ Y ∣ Z holds, then X is d-separated from Y by Z.
The compatibility of probability distributions with a classical causal structure is conveniently determined with the following proposition, which has also been called the parental or local Markov condition (Theorem 1.2.7 in [1]).

Proposition 8 (Pearl). Let C^C be a classical causal structure. A probability distribution P is compatible with C^C if and only if every variable in C^C is independent of its non-descendants, conditioned on its parents.
Hence, to establish whether a probability distribution is compatible with a certain classical causal structure, it is enough to check that every variable X is independent of its non-descendants X↑̸ given its parents X↓1, concisely written as X ⊧ X↑̸ ∣ X↓1, i.e., to check one constraint for each variable. In particular, it is not necessary to explicitly check for all possible sets of nodes whether they obey the independence relations implied by d-separation. Each such constraint can be conveniently expressed as⁶

I(X ∶ X↑̸ ∣X↓1) = 0.   (1)
While the conditional independence relations capture some features of the causal structure, they are insufficient
to completely capture the causal relations between variables, as illustrated in Figure 2. In this case, the probability
distributions themselves are unable to capture the difference between these causal structures: correlations are insuf-
ficient to determine causal links between random variables. External interventions allow for the exploration of causal
links beyond the conditional independences [1]. However, we do not consider these here.
Let C^C be a classical causal structure involving n random variables {X1, X2, . . . , Xn}. The restricted set of distributions that are compatible with the causal structure C^C is P(C^C) ∶= {P ∈ Pn ∣ P = ∏i PXi∣Xi↓1}.
Example 3 (Allowed distributions in the instrumental scenario). The classical instrumental scenario of Figure 1 allows for any four variable distribution in the set P(IC^C) = {PAXYZ ∈ P4 ∣ PAXYZ = PY∣AZ PZ∣AX PX PA}.
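As a sanity check (our addition), one can build a distribution of this form from arbitrary conditional distributions and confirm the two conditional independences that Proposition 8 implies for IC^C, namely I(A ∶ X) = 0 and I(X ∶ Y ∣AZ) = 0 (cf. Example 4 below):

```python
import numpy as np
rng = np.random.default_rng(0)

def rand_dist(*shape):
    """Random conditional distribution, normalised over the last axis."""
    d = rng.random(shape)
    return d / d.sum(axis=-1, keepdims=True)

pA, pX = rand_dist(2), rand_dist(2)
pZ_AX = rand_dist(2, 2, 2)  # indices: a, x -> z
pY_AZ = rand_dist(2, 2, 2)  # indices: a, z -> y

# Joint P(a, x, y, z) of the form P_A P_X P_{Z|AX} P_{Y|AZ}.
p = np.einsum('a,x,axz,azy->axyz', pA, pX, pZ_AX, pY_AZ)

def H(p, axes):
    m = p.sum(axis=tuple(i for i in range(p.ndim) if i not in axes)).flatten()
    m = m[m > 0]
    return -np.sum(m * np.log2(m))

A, X, Y, Z = 0, 1, 2, 3
I_AX = H(p, (A,)) + H(p, (X,)) - H(p, (A, X))
I_XY_AZ = (H(p, (X, A, Z)) + H(p, (Y, A, Z))
           - H(p, (A, Z)) - H(p, (X, Y, A, Z)))
print(abs(I_AX) < 1e-12, abs(I_XY_AZ) < 1e-12)  # True True
```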
The restrictions on the allowed distributions also restrict the corresponding entropy cones. Due to Proposition 8 there are at most n independent conditional independence equalities (1) in a causal structure C^C. Their coefficients can be concisely written in terms of a matrix MCI(C^C), where CI stands for conditional independence. For a causal structure C^C, we define the two sets Γ∗(C^C) ∶= {v ∈ Γ∗n ∣ MCI(C^C) ⋅ v = 0} and Γ(C^C) ∶= {v ∈ Γn ∣ MCI(C^C) ⋅ v = 0}, where Γ∗(C^C) ⊆ Γ(C^C). The following lemma justifies the notation we use for Γ∗(C^C); it is the set of achievable entropy vectors in C^C.
Lemma 9. For a causal structure C^C, Γ∗(C^C) = {v ∈ ℝ^{2ⁿ−1}_{≥0} ∣ ∃P = ∏i PXi∣Xi↓1 ∈ Pn s.t. v = H(P)}. Furthermore, its closure Γ̄∗(C^C) is a convex cone.

⁶ This follows because the relative entropy D(P∥Q) ∶= ∑x PX(x) log(PX(x)/QX(x)) satisfies D(P∥Q) = 0 ⇔ P = Q, and because I(X ∶ X↑̸ ∣X↓1) = D(PXX↑̸X↓1 ∥ PX∣X↓1 PX↑̸∣X↓1 PX↓1).
Proof. For the causal structure C^C, let E(C^C) ∶= {v ∈ ℝ^{2ⁿ−1}_{≥0} ∣ ∃P = ∏i PXi∣Xi↓1 ∈ Pn s.t. v = H(P)} be the set of all entropy vectors. Since (1) holds for each variable Xi if and only if P = ∏i PXi∣Xi↓1 (cf. Proposition 8), E(C^C) = {v ∈ Γ∗n ∣ ∀i ∈ {1, . . . , n}, I(Xi ∶ Xi↑̸ ∣Xi↓1) = 0}. Applying the definition of MCI(C^C) yields E(C^C) = Γ∗(C^C). Now, let us consider the set F(C^C) ∶= {v ∈ Γ̄∗n ∣ MCI(C^C) ⋅ v = 0} ⊆ ℝ^{2ⁿ−1}. This is a closed convex set, since Γ̄∗n is known to be closed and convex and since restricting the closed convex cone Γ̄∗n with linear equality constraints retains these properties. More precisely, the set of solutions to the matrix equality MCI(C^C) ⋅ v = 0 is also closed and convex, and, being the intersection of two closed convex sets, the set F(C^C) is also closed and convex. From this we conclude that Γ̄∗(C^C) is convex because Γ̄∗(C^C) = {v ∈ Γ̄∗n ∣ MCI(C^C) ⋅ v = 0} equals F(C^C). (Because F(C^C) is closed, any element w ∈ F(C^C), in particular any element on its boundary, is the limit of a sequence of elements {wk}k for k → ∞, where the wk lie in the interior of F(C^C) for all k. Hence w ∈ Γ̄∗(C^C).)
The convexity of Γ̄∗(C^C) is crucial for the considerations of the following sections. Note that in spite of this convexity, the set P(C^C) is generally not convex. This reflects the fact that significant information about the achievable correlations among the random variables is lost in the mapping from P(C^C) to the corresponding entropic cone Γ∗(C^C).
Example 4 (Entropic outer approximation for the instrumental scenario). The instrumental scenario has at most 4 independent conditional independence equalities (1). We find that there are only two, I(A ∶ X) = 0 and I(Y ∶ X ∣ AZ) = 0. This yields Γ∗(IC^C) = {v ∈ Γ∗4 ∣ MCI(IC^C) ⋅ v = 0} with

MCI(IC^C) = ( −1 −1  0  0  1  0  0  0  0  0  0  0  0  0  0
               0  0  0  0  0  0  1  0  0  0  0 −1 −1  0  1 ),

where the coordinates are ordered as (H(A), H(X), H(Y ), H(Z), H(AX), H(AY ), H(AZ), H(XY ), H(XZ), H(Y Z), H(AXY ), H(AXZ), H(AY Z), H(XY Z), H(AXY Z)). An outer approximation is given by Γ(IC^C) = {v ∈ Γ4 ∣ MCI(IC^C) ⋅ v = 0}.
In general, the outer approximation to Γ∗(C^C) can be further tightened by taking non-Shannon inequalities into account. These have led to the derivation of numerous new entropic inequalities for various causal structures [51] (see e.g. the triangle causal structure of Figure 1(c)). For the instrumental scenario, however, such additional inequalities are irrelevant. This can for instance be seen by constructing the following inner approximation to the cone.
Example 5 (Entropic inner approximation for the instrumental scenario [51]). For the instrumental scenario an inner approximation is given in terms of the Ingleton cone and the conditional independence constraints from the previous example, ΓI(IC^C) = {v ∈ ΓI ∣ MCI(IC^C) ⋅ v = 0}. For this causal structure the Ingleton constraints are implied by the Shannon inequalities and the conditional independence constraints and hence inner and outer approximations coincide. Consequently, they also coincide with the actual entropy cone, i.e., ΓI(IC^C) = Γ(IC^C) = Γ̄∗(IC^C). In particular, non-Shannon entropic inequalities cannot improve the outer approximation in this example.
Inner approximations have been considered in [51]. They are particularly useful in cases where inner and outer approximations coincide, since then they identify the actual boundary of the entropy cone. In other cases they can allow parts of the actual boundary to be identified or give clues on how to find better outer approximations.
Arguably all interesting scenarios (such as the previous example) involve unobserved variables that are suspected to cause some of the correlations between the variables we observe. These unobserved variables may yield constraints on the possible joint distributions of the observed variables, a well-known example being a Bell inequality [30]⁷. More generally we would like to infer constraints on the observed variables that follow from the presence of unobserved variables.

⁷ For a detailed discussion of the significance of Bell inequality violation on classical causal structures see [31].
For a causal structure on n random variables {X1, X2, . . . , Xn}, the restriction to the set of observed variables is called its marginal scenario, denoted M. Here, we assume w.l.o.g. that the first k ≤ n variables are observed and the remaining n − k are not. We are thus interested in the correlations among the first k variables that can be obtained as the marginal of some distribution over all n variables. Without any causal restrictions the set of all probability distributions of the k observed variables is PM ∶= {P ∈ Pk ∣ P = ∑Xk+1,...,Xn PX1 X2 ...Xn}, i.e., PM = Pk. For a classical causal structure C^C on the set of variables {X1, X2, . . . , Xn}, marginalising all distributions P ∈ P(C^C) over the n − k unobserved variables leads to the set PM(C^C) ∶= {P ∈ Pk ∣ P = ∑Xk+1,...,Xn ∏i PXi∣Xi↓1}. In contrast to the unrestricted case, this set of distributions is in general not recovered by considering a causal structure that involves only the k observed random variables, as can be seen in the following example.
Example 6 (Observed distributions in the instrumental scenario). For the instrumental scenario the observed variables are X, Y and Z and their joint distribution is of the form PM(IC^C) = {PXYZ ∈ P3 ∣ PXYZ = ∑A PY∣AZ PZ∣AX PX PA}.
The first entropic inequalities for a marginal scenario were derived in [12], where certificates for the existence of
common ancestors of a subset of the observed random variables of at least a certain size were given. One such scenario
is the triangle causal structure of Figure 1(c). The systematic entropy vector approach was devised for classical causal
structures in [6, 8, 10]. An outer approximation to the entropic cones of a variety of causal structures was given
in [11]. In the following we give the details of this approach.
In the entropic picture, marginalisation is performed by eliminating from the vectors those coordinates that represent entropies of sets of variables containing at least one unobserved variable. This corresponds to a projection of a cone in ℝ^{2ⁿ−1} to its marginal cone in ℝ^{2ᵏ−1} [9]. We will denote this projection πM ∶ ℝ^{2ⁿ−1} → ℝ^{2ᵏ−1}. It gives all entropy vectors w of the observed sets of variables, i.e., of the marginal scenario M, for which there exists at least one entropy vector v in the original scenario with matching entropies on the observed variables.
Starting from the set of all entropy vectors, Γ∗(C^C), those relevant for the marginal scenario can be obtained by discarding the appropriate components. For a finitely generated cone such as Γ(C^C), the projection can be determined more efficiently from the projection of its extremal rays. In the dual description of the entropic cone in terms of its facets (i.e., its inequality description), the transition to the marginal scenario can be made computationally by eliminating all entropies of sets of variables not contained in M from the system of inequalities. The standard algorithm that achieves this is Fourier–Motzkin elimination [54], which has been used in this context in [6, 8, 9].
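For intuition, each Fourier–Motzkin step removes one coordinate by combining every inequality in which it appears with a positive coefficient with every one in which it appears with a negative coefficient. A bare-bones sketch (ours; practical computations use exact arithmetic plus redundancy removal in dedicated polyhedral software):

```python
from fractions import Fraction

def fm_eliminate(rows, j):
    """One step of Fourier-Motzkin elimination: remove coordinate j from a
    system of inequalities, each row r encoding sum_i r[i] * v[i] >= 0."""
    pos = [r for r in rows if r[j] > 0]
    neg = [r for r in rows if r[j] < 0]
    out = [r for r in rows if r[j] == 0]
    for rp in pos:
        for rn in neg:
            # Combine so that the coefficients of v_j cancel.
            out.append([rp[i] * (-rn[j]) + rn[i] * rp[j]
                        for i in range(len(rp))])
    return [r[:j] + r[j + 1:] for r in out]  # drop the eliminated column

# Eliminate v1 from {v1 - v0 >= 0, v0 - v1 + v2 >= 0, v1 >= 0};
# the projection is {v2 >= 0, v0 + v2 >= 0}.
rows = [[Fraction(c) for c in r] for r in [[-1, 1, 0], [1, -1, 1], [0, 1, 0]]]
print(fm_eliminate(rows, 1))
```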
Without any causal restrictions, the entropy cone Γ∗n is projected to the marginal cone Γ∗M ∶= {w ∈ ℝ^{2ᵏ−1}_{≥0} ∣ ∃v ∈ Γ∗n s.t. w = πM(v)}. Note that if we marginalise Γ∗n over n − k variables we recover the entropy cone for k random variables, i.e., Γ∗M = Γ∗k. The same applies to the outer approximations: the n variable Shannon cone Γn is projected to the k variable Shannon cone with the mapping πM, i.e., ΓM ∶= {w ∈ ℝ^{2ᵏ−1}_{≥0} ∣ ∃v ∈ Γn s.t. w = πM(v)} = Γk. This follows because the n variable Shannon constraints contain the corresponding k variable constraints as a subset, and since any vector in Γk can be extended to a vector in Γn, for instance by taking H(Xk+1) = H(Xk+2) = ⋯ = H(Xn) = 0 and H(XS ∪ XT) = H(XS) for any XT ⊆ {Xk+1, Xk+2, . . . , Xn}.
For a classical causal structure C^C, we will be interested in the set Γ∗M(C^C) ∶= {w ∈ ℝ^{2ᵏ−1}_{≥0} ∣ ∃v ∈ Γ∗(C^C) s.t. w = πM(v)}, which is by construction a convex cone, since projection preserves convexity. The following lemma confirms that this is the entropy cone of the marginal scenario M and thus also formally justifies the method of projecting the sets directly.

Lemma 10. Γ∗M(C^C) is equal to the set of entropy vectors compatible with the marginal scenario of the classical causal structure C^C, i.e., Γ∗M(C^C) = {w ∈ ℝ^{2ᵏ−1}_{≥0} ∣ ∃P ∈ PM(C^C) s.t. w = H(P)}.
Proof. Let F denote the set on the rhs of the statement in the lemma. Note that w ∈ Γ∗M(C^C) implies that there exists v ∈ Γ∗(C^C) s.t. w = πM(v). Using Lemma 9, we have v = H(P) for some P = ∏i PXi∣Xi↓1. If we take P′ = ∑Xk+1,...,Xn P then w = H(P′) and hence w ∈ F. Conversely, w ∈ F implies that there exists P′ ∈ PM(C^C) s.t. w = H(P′) and hence there exists P ∈ P(C^C) such that P′ = ∑Xk+1,...,Xn P. If we take v = H(P), then w = πM(v) and hence w ∈ Γ∗M(C^C). Taking the topological closure of both sets concludes the proof.
As mentioned previously, non-Shannon inequalities cannot give any new entropic constraints for IC, as the Shannon
approximation is already tight. However, in many causal structures they do. For instance in the triangle scenario of
Figure 1(c), non-Shannon inequalities still lead to new entropic constraints, even after marginalisation to the three
observed variables [51].
A quantum causal structure differs from its classical counterpart in that unobserved systems correspond to shared quantum states.

Definition 11. A quantum causal structure, C^Q, is a causal structure where each observed node has a corresponding random variable, and each unobserved node has an associated quantum system.
In a classical causal structure the edges of the DAG represent the propagation of classical information, and, at a node with incoming edges, the random variable there can be generated by applying an arbitrary function to its parents. We are hence implicitly assuming that all the information about the parents is transmitted to its children (otherwise the set of allowed functions would be restricted). This does not pose a problem since classical information can be copied. In the quantum case, on the other hand, the no-cloning theorem means that the children of a node cannot (in general) all have access to the same information as is present at that node. Furthermore, the analogue of performing arbitrary functions in the classical case is replaced by arbitrary quantum operations. A quantum framework that allows for an analysis with entropy vectors was introduced in [13]. In the following we outline this approach. However, for unity of description, our account of quantum causal structures is based upon the viewpoint that is taken for generalised causal structures in [11], which we review in the next section⁸.

⁸ The difference is as follows: in [13] nodes correspond to quantum systems. All outgoing edges of a node together define a completely positive trace preserving (CPTP) map with output states corresponding to the joint state associated with its child nodes. Similarly, the CPTP map associated to the input edges of a node must map the states of the parent nodes to the node in question. In [11], on the other hand, edges correspond to states whereas the transformations occur at the nodes.
Let C^Q be a quantum causal structure. Nodes without input edges correspond to the preparation of a quantum state described by a density operator on a Hilbert space, e.g., ρA ∈ S(HA) for a node A, where for observed nodes this state is required to be classical⁹. For each directed edge in the graph there is a corresponding subsystem with Hilbert space labelled by the edge's input and output nodes. For instance, if Y and Z are the only children of A then there are associated spaces HAY and HAZ such that HA = HAY ⊗ HAZ¹⁰. At an unobserved node, a CPTP map from the joint state of all its input edges to the joint state of its output edges is performed. A node is labelled by its output state. For an observed node the latter is classical. Hence, it corresponds to a random variable that represents the output statistics obtained by applying a positive operator valued measure (POVM) to the input states¹¹. If all input edges are classical this can be interpreted as a stochastic map between random variables.

⁹ S(H) denotes the set of all density operators on a Hilbert space H.
¹⁰ Note that in the classical case these subsystems may all be taken to be copies of the system itself.
¹¹ Note that preparation and measurement can also be seen as CPTP maps with classical input and output systems respectively, thus allowing for a unified formulation.
A distribution, P, over the observed nodes of a causal structure C^Q is compatible with C^Q if there exists a quantum state labelling each unobserved node (with subsystems for each unobserved edge) and transformations, i.e., preparations and CPTP maps for each unobserved node as well as POVMs for each observed node, that allow for the generation of P by means of the Born rule. We denote the set of all compatible distributions PM(C^Q).
Example 8 (Compatible distributions in the quantum instrumental scenario). For the quantum instrumental scenario (Figure 1(a)), PM(IC^Q) = {PXY Z ∈ P3 ∣ PXY Z = tr((E^Z_X ⊗ F^Y_Z) ρA) PX} is the set of compatible distributions. A state ρA ∈ S(HAZ ⊗ HAY) is prepared. Depending on the random variable X, a POVM {E^Z_X}Z on HAZ is applied to generate the output distribution of the observed variable Z. Depending on the latter, another POVM {F^Y_Z}Y is applied to generate the distribution of Y.
The set of entropy vectors of compatible probability distributions over the observed nodes, PM(C^Q), is Γ∗M(C^Q) ∶= {w ∈ ℝ^{2ᵏ−1}_{≥0} ∣ ∃P ∈ PM(C^Q) s.t. w = H(P)}. Outer approximations ΓM(C^Q) were first derived in [13], a procedure that we outline in the following. For their construction, an entropy is associated to each random variable and to each subsystem of a quantum state (equivalently each edge originating at a quantum node), corresponding to the von Neumann entropy of the respective system. For convenience of exposition, edges and their associated systems share the same label. The von Neumann entropy of a density operator ρX ∈ S(HX) is defined as H(X) ∶= −tr(ρX log₂ ρX).
The quantum conditional entropy, mutual information and conditional mutual information are defined as in the classical case, with the von Neumann entropy replacing the Shannon entropy.
Because of the impossibility of cloning, the outcomes and the quantum systems that led to them do not exist simultaneously. Therefore there is in general no joint multi-party quantum state for all subsystems, and it does not make sense to talk about the joint entropy of the states and outcomes. More concretely, if a system A is measured to produce Z, then ρAZ is not defined and hence neither is H(AZ)¹².

¹² Attempts to circumvent this have been made, see for example [55].
Definition 12. Two subsystems in a quantum causal structure C Q coexist if neither of them is a quantum ancestor
of the other. A set of subsystems that mutually coexist is termed coexisting.
A quantum causal structure may have several maximal coexisting subsets. Only within such subsets is there a well
defined joint quantum state and joint entropy.
Example 9 (Coexisting sets in the quantum instrumental scenario). Consider the quantum version of the instrumental
scenario, as illustrated in Figure 1(a). There are three observed variables as well as two edges originating at unobserved
(quantum) nodes, hence 5 variables to consider. More precisely, the quantum node A has two associated subsystems
AZ and AY . The correlations seen at the two observed nodes Z and Y are formed by measurement on the respective
subsystems AZ and AY . The coexisting sets in this causal structure are {AY , AZ , X}, {AY , X, Z} and {X, Y, Z}
and their (non-empty) proper subsets.
Note that without loss of generality we can assume that any initial, i.e., parentless quantum states such as ρA above,
are pure. This is because any mixed state can be purified, and if the transformations and measurement operators
are then taken to act trivially on the purifying systems the same statistics are observed. In the causal structure of
Example 9, this implies that ρA can be considered to be pure and thus H(AY AZ ) = 0. The Schmidt decomposition
then implies that H(AY ) = H(AZ ). This is computationally useful as it reduces the number of free parameters in the
entropic description of the scenario. Furthermore, by Stinespring’s theorem [56], whenever a CPTP map is applied
at a node that has at least one quantum child, then one can instead consider an isometry to a larger output system.
The additional system that is required for this can be taken to be part of the unobserved quantum output (or one of
them in case of several quantum output nodes). Each such case allows for the reduction of the number of variables
by one, since the joint entropy of all inputs to such a node must be equal to that of all its outputs.
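The equality H(AY) = H(AZ) for pure ρA is a standard consequence of the Schmidt decomposition and can be checked numerically; a small sketch (ours), using a random pure state on two qubits:

```python
import numpy as np
rng = np.random.default_rng(1)

# Random pure state on H_{A_Z} (x) H_{A_Y}, dimensions 2 x 2.
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj()).reshape(2, 2, 2, 2)  # indices (z, y, z', y')

def vn_entropy(r):
    """von Neumann entropy H(rho) = -tr(rho log2 rho)."""
    ev = np.linalg.eigvalsh(r)
    ev = ev[ev > 1e-12]
    return -np.sum(ev * np.log2(ev))

rho_AZ = np.trace(rho, axis1=1, axis2=3)  # trace out A_Y
rho_AY = np.trace(rho, axis1=0, axis2=2)  # trace out A_Z
print(np.isclose(vn_entropy(rho_AZ), vn_entropy(rho_AY)))  # True
```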
Quantum states are known to obey submodularity [57] and also obey the following condition.

• Weak monotonicity [57]: H(XS ∖ XT) + H(XT ∖ XS) ≤ H(XS) + H(XT), for all XS, XT ⊆ Ω (recall H({}) = 0).

This is the dual of submodularity in the sense that the two inequalities can be derived from each other by considering purifications of the corresponding quantum states [58].
Within the context of causal structures, these relations can always be applied between variables in the same
coexisting set. In addition, whenever it is impossible for there to be entanglement between the subsystems XS ∩ XT
and XS ∖XT — for instance if these subsystems are in a cq-state — the monotonicity constraint H(XS ∖XT ) ≤ H(XS )
holds. If it is also impossible for there to be entanglement between XS ∩ XT and XT ∖ XS , then the monotonicity
relation H(XT ∖ XS ) ≤ H(XT ) holds rendering the weak monotonicity relation stated above redundant.
Altogether, these considerations lead to a set of basic inequalities containing some Shannon and some weak-monotonicity inequalities, which are conveniently expressed in a matrix MB(C^Q). This way of approximating the entropic cone in the quantum case is inspired by work on the entropic cone for multi-party quantum states [34]. Note also that no further inequalities for the von Neumann entropy are known to date (contrary to the classical case where a variety of non-Shannon inequalities is known), except under additional constraints [59–64].
The conditional independence constraints in C^Q cannot be identified by Proposition 8, because variables do not coexist with any quantum parents and hence conditioning a variable on a quantum parent is not meaningful. Nonetheless, among the variables in a coexisting set the conditional independences that are valid for C^C also hold in C^Q. This can be seen as follows. First, any constraints that involve only observed variables (which are always part of a coexisting set) hold by Proposition 16 below. Secondly, for unobserved systems only their classical ancestors
and none of their descendants can be part of the same coexisting set. An unobserved system is hence independent of
any subset of the same coexisting set with which it shares no ancestors. Note that each of the subsystems associated
with a quantum node is considered to be a parent of all of the node’s children (see Figure 1 for an example).
In addition, suppose XS and XT are disjoint subsets of a coexisting set, Ξ, and that the unobserved system A is also in Ξ. Then I(A ∶ XS ∣XT) = 0 if XT d-separates A from XS (in the full graph including quantum nodes)¹³. The same considerations can be made for sets of unobserved systems. These independence constraints may be assembled in a matrix MQCI(C^Q).

¹³ This follows because any quantum states generated from the classical separating variables may be obtained by first producing random variables from the latter (for which the usual d-separation rules hold) and then using these to generate the quantum states in question (potentially after generating other variables in the network), hence retaining conditional independence.
Among the variables that do not coexist, some are obtained from others by means of quantum operations. These variables are thus related by data processing inequalities (DPIs) [65].

Proposition 13 (DPI). Let ρXS XT ∈ S(HXS ⊗ HXT) and let E be a completely positive trace preserving (CPTP) map¹⁴ on S(HXT)¹⁵ leading to a state ρ′XS XT. Then I(XS ∶ XT)ρ′XS XT ≤ I(XS ∶ XT)ρXS XT.

¹⁴ Note that the map from a quantum state to the diagonal state with entries equal to the outcome probabilities of a measurement is a CPTP map and hence also obeys the DPI.
¹⁵ In general E can be a map between operators on different Hilbert spaces, i.e., E ∶ S(H′XT) → S(H″XT). However, as we can consider these operators to act on the same larger Hilbert space, we can w.l.o.g. take E to be a map on this larger space, which we call S(HXT).
The data processing inequalities provide an additional set of entropic constraints, which can be expressed in terms of a matrix inequality MDPI(C^Q) ⋅ v ≥ 0¹⁶. In general, there are a large number of variables for which data processing inequalities hold. It is thus beneficial to derive rules that specify which of the inequalities are needed. First, note that whenever a concatenation of two CPTP maps E1 and E2, E = E2 ∘ E1, is applied to a state, any DPIs for the inputs and outputs of E are implied by the DPIs for E1 and E2¹⁷. Hence, the DPIs for composed maps E never have to be considered as separate constraints.

¹⁶ There are also DPIs for conditional mutual information, e.g., I(A ∶ B∣C)ρ′ABC ≤ I(A ∶ B∣C)ρABC for ρ′ABC = (I ⊗ E ⊗ I)(ρABC), but these are implied by Proposition 13, so they need not be treated separately here.
¹⁷ This follows by deriving the DPIs for the input and output states of E1 and E2 respectively and combining the two.
Secondly, whenever a state ρXS XT XR ∈ S(HXS ⊗ HXT ⊗ HXR) can be decomposed as ρXS XT XR = ρXS XT ⊗ ρXR and a CPTP map E acts on S(HXS), any DPIs for ρXS XT XR are implied by the DPIs for ρXS XT¹⁸.

¹⁸ This follows from I(XS ∶ XT XR) = I(XS ∶ XT), I(XS XR ∶ XT) = I(XS ∶ XT) and I(XS XT ∶ XR) = 0.
Furthermore, whenever a node has classical and quantum inputs, the CPTP map generating its output state can be extended to a CPTP map that simultaneously retains the classical inputs. This is the content of the following lemma, which also shows that retaining a copy of the classical inputs leads to tighter entropic inequalities.
Lemma 14. Let Y be a node with classical and quantum inputs XC and XQ and let E be a CPTP map that acts at this node, i.e., E is a map from S(HXC ⊗ HXQ) to S(HY). Then E can be extended to a map E′ ∶ S(HXC ⊗ HXQ) → S(HXC ⊗ HY) such that E′ ∶ ρXC XQ ↦ ρ′XC Y with the property that ρ′XC Y is classical on HXC and ρ′XC = ρXC. Furthermore, the DPIs for E′ imply those for E.

Proof. The first part of the lemma follows because classical information can be copied, and hence E′ can be decomposed into first copying XC and then performing E¹⁹. Suppose E ∶ ρXC XQ ↦ ρ′Y. The second part follows because if I(XC XQ XS ∶ XT)ρ ≥ I(Y XS ∶ XT)ρ′ is a valid DPI for E then I(XC XQ XS ∶ XT)ρ ≥ I(XC Y XS ∶ XT)ρ′ is valid for E′. The second of these implies the first by the submodularity relation I(XC Y XS ∶ XT)ρ′ ≥ I(Y XS ∶ XT)ρ′.

¹⁹ Alternatively, we can think of E as the concatenation of E′ with a partial trace; this allows us to use the same output state ρ′ for both maps in the argument above.
All the above (in)equalities are necessary conditions for a vector to be an entropy vector compatible with the causal structure C^Q. They constrain a polyhedral cone in ℝ^m_{≥0}, where m is the total number of coexisting sets of C^Q; we denote this cone Γ(C^Q).
Example 10 (Entropic constraints for the quantum instrumental scenario). The cone Γ(IC^Q) = {v ∈ ℝ¹⁵_{≥0} ∣ MB(IC^Q) ⋅ v ≥ 0, MQCI(IC^Q) ⋅ v = 0 and MDPI(IC^Q) ⋅ v ≥ 0} involves the matrix MB(IC^Q), which features 29 (independent) inequalities²⁰. In this case a single independence constraint encodes that X is independent of AY AZ:

MQCI(IC^Q) = ( 0 0 −1 0 0 −1 0 0 0 0 0 0 1 0 0 ).

Two data processing inequalities are required (cf. Lemma 14), I(AZ X ∶ AY) ≥ I(XZ ∶ AY) and I(AY Z ∶ X) ≥ I(Y Z ∶ X), which yield the matrix

MDPI(IC^Q) = ( 0 0 0 0 0 0 0 0 1 0 −1 0 −1 1 0
               0 0 0 0 0 0 0 1 0 0 0 −1 0 −1 1 ).

The above matrices are all expressed in terms of coefficients of (H(AY), H(AZ), H(X), H(Y ), H(Z), H(AY AZ), H(AY X), H(AY Z), H(AZ X), H(XY ), H(XZ), H(Y Z), H(AY AZ X), H(AY XZ), H(XY Z)). Although the notation suppresses the different states, there is no ambiguity because, e.g., the entropy of X is the same for all states with subsystem X. The full list of inequalities is provided in the appendix.

²⁰ Note that the only weak monotonicity relations that are not made redundant by other basic inequalities are H(AY∣AZ X) + H(AY) ≥ 0, H(AZ∣AY X) + H(AZ) ≥ 0, H(AY∣AZ) + H(AY∣X) ≥ 0 and H(AZ∣AY) + H(AZ∣X) ≥ 0.
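The coefficient rows of MDPI can be generated mechanically from the definition of the mutual information, which guards against sign errors; a sketch (ours) rebuilds the two rows above in the stated coordinate ordering:

```python
coords = ['AY', 'AZ', 'X', 'Y', 'Z', 'AY AZ', 'AY X', 'AY Z', 'AZ X',
          'X Y', 'X Z', 'Y Z', 'AY AZ X', 'AY X Z', 'X Y Z']
col = {frozenset(c.split()): i for i, c in enumerate(coords)}

def add_H(row, systems, sign):
    row[col[frozenset(systems)]] += sign

def mutual_info_row(s, t, sign):
    """Coefficients of sign * I(s : t) = sign * (H(s) + H(t) - H(st))."""
    row = [0] * len(coords)
    add_H(row, s, sign)
    add_H(row, t, sign)
    add_H(row, s | t, -sign)
    return row

# I(AZ X : AY) - I(X Z : AY) >= 0
r1 = [a + b for a, b in zip(mutual_info_row({'AZ', 'X'}, {'AY'}, +1),
                            mutual_info_row({'X', 'Z'}, {'AY'}, -1))]
# I(AY Z : X) - I(Y Z : X) >= 0
r2 = [a + b for a, b in zip(mutual_info_row({'AY', 'Z'}, {'X'}, +1),
                            mutual_info_row({'Y', 'Z'}, {'X'}, -1))]
print(r1)  # [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, -1, 0, -1, 1, 0]
print(r2)  # [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, -1, 0, -1, 1]
```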
From Γ(C^Q), an outer approximation to the set of compatible entropy vectors Γ∗M(C^Q) of the observed scenario M of C^Q can be obtained using Fourier–Motzkin elimination. This leads to ΓM(C^Q) ∶= {w ∈ ℝ^{2ᵏ−1}_{≥0} ∣ ∃v ∈ Γ(C^Q) s.t. w = πM(v)}, which can be written as ΓM(C^Q) = {w ∈ ΓM ∣ MM(C^Q) ⋅ w ≥ 0}. The matrix MM(C^Q) collects the (in)equalities that constrain the marginal scenario (except for the Shannon inequalities, which are already included in ΓM). Note that ΓM(C^C) ⊆ ΓM(C^Q) ⊆ ΓM, where the first relation holds because all inequalities relevant for quantum states hold in the classical case as well²¹.

²¹ This can be seen by thinking of a classical source as made up of two or more (perfectly correlated) random variables as its subsystems, which are sent to its children and processed there. The Shannon inequalities hold among all of these variables (and also imply any weak monotonicity constraints). The classical independence relations include the quantum ones but may add constraints that involve conditioning on any of the variables' ancestors. These (in)equalities are tighter than the DPIs, which are hence not explicitly considered in the classical case.
Example 11 (Entropic outer approximation for the quantum instrumental scenario). The projection of Γ(IC^Q) leads to the entropic cone ΓM(IC^Q) = {w ∈ Γ3 ∣ MM(IC^Q) ⋅ w ≥ 0}, for which MM(IC^Q) equals MM(IC^C) from Example 7, thus corresponding to the constraint I(X ∶ Y Z) ≤ H(Z). Hence, Γ∗M(IC^Q) coincides with Γ∗M(IC^C) [51].
This method has been applied to find an outer approximation to the entropy cone of the triangle causal structure
in the quantum case (cf. Figure 1(c)) [13]. This approximation did not coincide with the outer approximation to the
classical triangle scenario obtained from Shannon inequalities and independence constraints. Whether there are more
as yet unknown inequalities in the quantum case remains an open question (as opposed to the classical case where
even better outer approximations have already been found [51]). In [13], the method was furthermore combined with
the approach reviewed in Section III B, where it was applied to a scenario related to IC (cf. Example 17 below).
The concept of a generalised causal structure was introduced in [11], the idea being to have one framework in which classical, quantum and even more general systems, for instance non-local boxes [66, 67], can be shared by unobserved nodes, and in which theory-independent features of networks and corresponding bounds on our observations may be identified.
Definition 15. A generalised causal structure, C^G, is a causal structure which for each observed node has an associated random variable and for each unobserved node has a corresponding non-signalling resource allowed by a generalised probabilistic theory.
Classical and quantum causal structures can be viewed as special cases of generalised causal structures [11, 68].
Generalised probabilistic theories may be conveniently described in the operational-probabilistic framework of [69].
Circuit elements correspond to so-called tests that are connected by wires, which represent propagating systems. In
general, such a test has an input system, and two outputs: an output system and an outcome. In the case of a system
with trivial input this corresponds to a preparation test, and in the case of trivial output to an observation-test. In
the causal structure framework, a test is associated to each node. However, each such test has only one output: for
unobserved nodes this is a general resource state; for observed nodes it is a random variable. Furthermore, resource
states do not allow for signalling from the future to the past, i.e., we are considering so-called causal operational-
probabilistic theories. This is important for the interpretation of generalised causal structures.
A distribution P over the observed nodes of a generalised causal structure C^G is compatible with C^G if there exists a causal operational-probabilistic theory, a resource for each unobserved edge in that theory and transformations for each node that allow for the generation of P. We denote the set of all compatible distributions PM(C^G). As in the quantum case, there is no notion of a joint state of all nodes in the causal structure, nor of conditioning on an unobserved system. Moreover, there is no consensus on the representation of states and their dynamics in general non-signalling theories. To circumvent this, the classical notion of d-separation has been reformulated [11], which enables the following proposition.
Proposition 16 (Henson, Lal & Pusey). Let C^G be a generalised causal structure and let X, Y and Z be pairwise disjoint subsets of observed nodes in C^G. If a probability distribution P is compatible with C^G, then the d-separation of X and Y by Z implies the conditional independence X ⊧ Y ∣ Z. Conversely, if for every distribution P compatible with C^G the conditional independence X ⊧ Y ∣ Z holds, then X is d-separated from Y by Z in C^G.
This allows for the derivation of conditional independence relations among observed variables that hold in any
generalised probabilistic theory, which hence restrict a general entropic cone. Furthermore, it rigorously justifies
retaining the independence constraints among the (observed) variables in coexisting sets in quantum causal structures
(cf. Section II B 2), which can be seen as special cases of generalised causal structures.
In [11], sufficient conditions were derived for identifying causal structures C for which, in the classical case C^C, there are no restrictions on the distribution over observed variables other than those that follow from the d-separation of these variables. Since, by Proposition 16, these conditions also hold in C^Q and C^G, this implies PM(C^C) = PM(C^Q) = PM(C^G). For causal structures with up to six nodes, there are 21 cases (and some that can be reduced to these 21) where such equivalence does not hold and where further relations among the observed variables have to be taken into account [11, 18].
Outer approximations to the entropic cones of generalised causal structures C^G, based on the observed variables and their independences only, were derived in [11]. Moreover, a few additional constraints for certain generalised causal structures were derived there. For example, the entropic constraint I(X ∶ Y) + I(X ∶ Z) ≤ H(X) for the triangle causal structure of Figure 1(c) (which had previously been established in the classical case [70]) was found. This constraint does not follow from the observed independences, but nonetheless holds for the triangle causal structure in generalised probabilistic theories.
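One can convince oneself numerically that this constraint holds for distributions generated in the classical triangle causal structure; a sketch (ours), assuming the labelling of Figure 1(c) in which A is the common cause of Y and Z, B of X and Z, and C of X and Y:

```python
import numpy as np
rng = np.random.default_rng(2)

def rand_dist(*shape):
    d = rng.random(shape)
    return d / d.sum(axis=-1, keepdims=True)

# Sources A, B, C and stochastic response functions X(B, C), Y(A, C),
# Z(A, B), all over binary variables.
pA, pB, pC = rand_dist(2), rand_dist(2), rand_dist(2)
pX, pY, pZ = rand_dist(2, 2, 2), rand_dist(2, 2, 2), rand_dist(2, 2, 2)
p = np.einsum('a,b,c,bcx,acy,abz->xyz', pA, pB, pC, pX, pY, pZ)

def H(p, axes):
    m = p.sum(axis=tuple(i for i in range(p.ndim) if i not in axes)).flatten()
    m = m[m > 0]
    return -np.sum(m * np.log2(m))

lhs = (H(p, (0,)) + H(p, (1,)) - H(p, (0, 1))    # I(X : Y)
       + H(p, (0,)) + H(p, (2,)) - H(p, (0, 2)))  # I(X : Z)
print(lhs <= H(p, (0,)) + 1e-12)  # True: I(X:Y) + I(X:Z) <= H(X)
```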
In spite of this, a systematic entropic procedure, in which the unobserved variables are explicitly modelled and then
eliminated from the description, is not available for generalised causal structures. The issue is that we are lacking a
generalisation of the Shannon and von Neumann entropy to generalised probabilistic theories that obeys submodularity
and for which the conditional entropy can be written as the difference of unconditional entropies [71, 72].
One possible generalised entropy is the measurement entropy, which is positive and obeys some of the submodularity
constraints (those with XS ∩ XT = {}) but not all [71, 72]. Using this, Ref. [73] considered the set of possible entropy
vectors for a bipartite state in box world, a generalised probabilistic theory that permits all bipartite correlations that
are non-signalling [74]. They found no further constraints on the set of possible entropy vectors in this setting (hence,
contrary to the quantum case, measurement entropy vectors of separable states in box world can violate monotonicity).
Other generalised probabilistic theories and multi-party states have, to our knowledge, not been similarly analysed.
The approaches to quantum and generalised causal structures above are based on adaptations of the theory of Bayesian networks to the respective settings and on retaining the features that remain valid, for instance the relation between d-separation and independence for observed variables [11] (cf. Section II B 3). Other approaches to generalising classical networks to the quantum realm have been pursued [55], where a definition of conditional quantum states, analogous to conditional probability distributions, was formulated.
Recent articles have proposed generalisations of Reichenbach’s principle [75] to the quantum realm [16, 76, 77]. In
Ref. [16] a graph separation rule, q-separation, was introduced, whereas [76, 77] rely on a formulation of quantum
networks in terms of quantum channels and their Choi states.
An active area of research is the exploration of frameworks that allow for indefinite causal structures [78–80]. There are several approaches achieving this, such as the process matrix formalism [81], which has led to the derivation of so-called causal inequalities and the identification of signalling correlations that are achievable in this framework, however not with any predefined causal structure [81, 82]. Another framework that is able to describe such scenarios is the theory of quantum combs [83], illustrated by a quantum switch, a quantum bit controlling the circuit structure in a quantum computation. A recent framework aimed at modelling cryptographic protocols is also available [84]. Some initial results on the analysis of indefinite causal structures with entropy have recently appeared [85].
In the classical, quantum and generalised causal structures considered above, only the observed classical information can be transmitted via a link between two observed variables; in particular, no additional unobserved system can be passed along such a link. This understanding of the causal links encodes a Markov condition. In other situations, it can be convenient for the links in the graph to represent a notion of future instead of direct causation, see e.g. [86, 87].
A technique that leads to additional, more fine-grained inequalities is based on post-selecting on the values of
parentless classical variables. This technique was pioneered by Braunstein and Caves [88] and has been used to
systematically derive numerous entropic inequalities [6–8, 17, 18].
In the following we denote a random variable X post-selected on the event of another random variable, Y , taking
a particular value, Y = y, as X∣Y =y . The same notation is used for a set of random variables S = {X1 , X2 , . . . , Xn },
whose joint distribution is conditioned on Y = y, S∣Y=y = {X1∣Y=y , X2∣Y=y , . . . , Xn∣Y=y }. The following lemma can
be understood as a generalisation of (a part of) Fine’s theorem [89, 90].
Lemma 17. Let C^C be a classical causal structure with a parentless observed node X that takes values X = 1, 2, . . . , n and let P be a joint distribution over all random variables Ω = X ∪ X↑ ∪ X↑̸ in C^C (with P compatible with C^C). Then there exists a joint distribution Q over the n ⋅ ∣X↑∣ + ∣X↑̸∣ random variables Ω∣X ∶= X↑∣X=1 ∪ X↑∣X=2 ∪ ⋯ ∪ X↑∣X=n ∪ X↑̸ such that Q(X↑∣X=x X↑̸) = P(X↑ X↑̸ ∣ X = x) for all x ∈ {1, . . . , n}.

Proof. The joint distribution over the random variables X↑ ∪ X↑̸ in C^C can be written as P(X↑ X↑̸) = ∑ⁿₓ₌₁ P(X↑ ∣ X↑̸, X = x) P(X↑̸) P(X = x). Now take Q(X↑∣X=1 ⋯ X↑∣X=n X↑̸) = [∏ⁿₓ₌₁ P(X↑ ∣ X↑̸, X = x)] P(X↑̸). As required, this distribution has marginals Q(X↑∣X=x X↑̸) = P(X↑ ∣ X↑̸, X = x) P(X↑̸).
It is perhaps easiest to think about this lemma in terms of a new causal structure C^C_X on Ω∣X that is related to the original. Roughly speaking, the new causal structure is formed by removing X and replacing the descendants of X with several copies, each of which has the same causal relations as in the original causal structure (with no mixing between copies). More precisely, if X is a parentless node in C^C we can form a post-selected causal structure on Ω∣X (post-selecting on X) as follows: (1) For each pair of nodes A, B ∈ X↑̸ in C^C, make A a parent of B in C^C_X iff A is a parent of B in C^C. (2) For each node B ∈ X↑̸ in C^C and for each node A∣X=x, make B a parent of A∣X=x in C^C_X iff B is a parent of A in C^C. (3) For each pair of nodes, A∣X=x and B∣X=x, make B∣X=x a parent of A∣X=x in C^C_X iff B is a parent of A in C^C. (Note that there is no mixing between different values of X = x.) See Figures 3 and 5 and Example 12 for illustrations. This view gives us the following corollary of Lemma 17, which is an alternative generalisation of Fine's theorem.
Lemma 18. Let C^C be a classical causal structure with a parentless observed node X that takes values X = 1, 2, . . . , n and let P be a joint distribution over all random variables X ∪ X↑ ∪ X↑̸ in C^C (with P compatible with C^C). Then there exists a joint distribution Q compatible with the post-selected causal structure C^C_X such that Q(X↑∣X=x X↑̸) = P(X↑ X↑̸ ∣ X = x) for all x ∈ {1, . . . , n}.

The distributions of interest in this new causal structure are the marginals Q(X↑∣X=x X↑̸) for all x (and their interrelations), as they correspond to distributions in the original scenario. Any constraints on these distributions derived in the post-selected scenario are by construction valid for the (post-selected) distributions compatible with the original causal structure.
Example 12 (Post-selection in the instrumental scenario). Consider the classical instrumental causal structure IC^C, in which the parentless variable X takes values 0 or 1. For any P compatible with IC^C, there exists a distribution Q compatible with the post-selected causal structure (Figure 3(a)) such that Q(Z∣X=0 Y∣X=0 A) = P(ZY ∣ A, X = 0) P(A) and Q(Z∣X=1 Y∣X=1 A) = P(ZY ∣ A, X = 1) P(A). These marginals and their relations are of interest for the original scenario.
Figure 3: (a) Pearl's instrumental scenario post-selected on binary X. The causal structure is obtained from IC by removing X and replacing Y and Z with copies, each of which has the same causal relations as in the original causal structure. (b) The post-selected Bell scenario with binary inputs A and B.
Note that the above reasoning may be applied recursively. Indeed, the causal structure with variables Ω∣X may be
post-selected on the values of one of its parentless nodes. The joint distributions of the nodes Ω∣X and the associated
causal structure may be analysed in terms of entropies, as illustrated with the following example.
Example 13 (Entropic constraints for the post-selected Bell scenario [88]). In the Bell scenario with binary inputs A and B (Figure 1(b)), Lemma 18 may be applied first to post-select on the values of A and then on those of B. This leads to a distribution Q compatible with the post-selected causal structure (on A and B) shown in Figure 3(b), for which Q(X∣A=a Y∣B=b) = P(XY ∣ A = a, B = b) for a, b ∈ {0, 1}.^{22} Applying the entropy vector method to the post-selected causal structure and marginalising to vectors of the form (H(X∣A=0), H(X∣A=1), H(Y∣B=0), H(Y∣B=1), H(X∣A=0 Y∣B=0), H(X∣A=0 Y∣B=1), H(X∣A=1 Y∣B=0), H(X∣A=1 Y∣B=1)) yields the inequality H(Y1 ∣ X1) + H(X1 ∣ Y0) + H(X0 ∣ Y1) − H(X0 ∣ Y0) ≥ 0 and its permutations, where Xa = X∣A=a and Yb = Y∣B=b [6, 88].^{23}
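For concreteness, this inequality can be evaluated directly from a distribution. The following sketch (our own; the randomly sampled local-hidden-variable model is an assumption chosen purely for illustration) confirms that a classical strategy satisfies the entropic Braunstein-Caves expression.

```python
# A small check (our own toy model) that the entropic Braunstein-Caves
# inequality holds for local-hidden-variable strategies, writing Xa = X|A=a
# and Yb = Y|B=b as above.
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 1e-12]
    return float(-(p * np.log2(p)).sum())

def cond_entropy(joint):
    # H(first | second) for a joint distribution joint[first, second]
    return entropy(joint) - entropy(joint.sum(axis=0))

rng = np.random.default_rng(1)
n_lam = 8                                 # hidden-variable alphabet size
w = rng.dirichlet(np.ones(n_lam))         # distribution of the hidden variable
fX = rng.integers(0, 2, size=(n_lam, 2))  # deterministic response X = fX[lam, a]
fY = rng.integers(0, 2, size=(n_lam, 2))  # deterministic response Y = fY[lam, b]

def joint(a, b):                          # joint distribution of (X|A=a, Y|B=b)
    J = np.zeros((2, 2))
    for lam in range(n_lam):
        J[fX[lam, a], fY[lam, b]] += w[lam]
    return J

lhs = (cond_entropy(joint(1, 1).T)        # H(Y1 | X1)
       + cond_entropy(joint(1, 0))        # H(X1 | Y0)
       + cond_entropy(joint(0, 1))        # H(X0 | Y1)
       - cond_entropy(joint(0, 0)))       # H(X0 | Y0)
assert lhs >= -1e-9
print("Braunstein-Caves expression:", lhs)
```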
The extension of Fine's theorem to more general Bell scenarios [91, 92], i.e., to scenarios involving a number of spacelike separated parties that each choose input values and produce some output random variable (and scenarios that can be reduced to the latter), has been combined with the entropy vector method in [6, 8].
Entropic constraints that are derived in this way provide novel and non-trivial entropic inequalities for the distributions compatible with the original classical causal structure. Ref. [8] introduced this idea and analysed the so-called n-cycle scenario, which is of particular interest in the context of non-contextuality and includes the Bell scenario (with binary inputs and outputs) as a special case.^{24}
In Ref. [6], new entropic inequalities were derived for the bilocality scenario, which is relevant for entanglement swapping [94, 95], as well as quantum violations of the classical constraints on the 4- and 5-cycle scenarios. For the n-cycle scenario, the (polynomial number of) entropic inequalities are sufficient for the detection of any non-local distribution [7].^{25} In the following we illustrate the method of [6, 8] with a continuation of Example 12.
Example 14 (Entropic approximation for the post-selected instrumental scenario). The entropy vector method from Section II is applied to the 5-variable causal structure of Figure 3(a). The marginalisation is performed so as to retain all marginals that correspond to distributions in the original causal structure (Figure 1(a)), i.e., any marginals of P(Y Z ∣ X = 0) and P(Y Z ∣ X = 1). Hence, the 5-variable entropic cone is projected to a cone that restricts vectors of the form (H(Y∣X=0), H(Y∣X=1), H(Z∣X=0), H(Z∣X=1), H(Y∣X=0 Z∣X=0), H(Y∣X=1 Z∣X=1)). Note that entropies of unobserved marginals such as H(Y∣X=0 Z∣X=1) are not included. With this technique, the Shannon constraints for the three components (H(Y∣X=0), H(Z∣X=0), H(Y∣X=0 Z∣X=0)) are recovered (the same holds for X = 1); no additional constraints arise here.
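The projection step in this example, and in the entropy vector method more generally, amounts to Fourier-Motzkin elimination of the unwanted coordinates. The following is a generic sketch of one elimination step (a textbook routine, not the authors' code), with a toy two-variable Shannon cone as a usage example.

```python
# A generic Fourier-Motzkin elimination step (a textbook routine, not the
# authors' code): each inequality is a coefficient list a, meaning a . v >= 0.
from fractions import Fraction

def eliminate(ineqs, k):
    """Project the cone {v : a . v >= 0 for all a in ineqs} onto the
    coordinates other than k."""
    pos = [a for a in ineqs if a[k] > 0]
    neg = [a for a in ineqs if a[k] < 0]
    keep = [a for a in ineqs if a[k] == 0]
    for p in pos:
        for q in neg:
            # positive combination whose k-th coefficient vanishes
            keep.append([p[k] * qj - q[k] * pj for pj, qj in zip(p, q)])
    return keep

# toy usage: the two-variable Shannon cone over (H(A), H(B), H(AB)),
# projected onto (H(A), H(B)) by eliminating coordinate 2 = H(AB)
shannon = [[-1, 0, 1],   # H(AB) - H(A) >= 0   (monotonicity)
           [0, -1, 1],   # H(AB) - H(B) >= 0
           [1, 1, -1]]   # H(A) + H(B) - H(AB) >= 0   (I(A:B) >= 0)
print(eliminate([[Fraction(c) for c in row] for row in shannon], 2))
# the output corresponds to H(B) >= 0 and H(A) >= 0; in larger examples
# redundant inequalities must additionally be removed
```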
It is interesting to compare this to the Bell scenario considered in Example 13. In both causal structures any 4-variable distributions, PZ∣X=0 Z∣X=1 Y∣X=0 Y∣X=1 and PX∣A=0 X∣A=1 Y∣B=0 Y∣B=1 respectively, are achievable.^{26} However, the marginal entropy vector in the Bell scenario has more components, leading to additional constraints on the observed variables [6, 88].
In some cases two different causal structures, C1 and C2, can yield the same set of distributions after marginalising, a fact that has been further explored in [96]. When this occurs, either causal structure can be imposed when identifying the set of achievable marginal distributions in either scenario.
^{22} In this case the joint distribution is already known to exist by Fine's theorem [89, 90].
^{23} Whenever the input nodes take more than two values, the latter may be partitioned into two sets, guaranteeing applicability of these inequalities. Furthermore, [7] showed that these inequalities are sufficient for detecting any behaviour that is not classically reproducible in the Bell scenario where the two parties perform measurements with binary outputs.
^{24} A full probabilistic characterisation of the n-cycle scenario was given in [93].
^{25} This is also true of the exponential number of inequalities in the probabilistic case [93].
^{26} The additional causal links in Figure 3(b) do not affect the set of compatible distributions.
Figure 4: Variations of the instrumental scenario (a), (b) and (c). The causal structure (c) is relevant for the derivation of the information causality inequality, where S takes n possible values. (d) and (e) are the causal structures that are effectively analysed when post-selecting on a ternary S in (a) and on a binary S in (c) respectively.
If the constraints implied by the causal structure C1 are a subset of those implied by C2, then those of C2 can be used to compute improved outer approximations on the entropic cone for C1. Furthermore, valid independence constraints may speed up computations even if they do not lead to any new relations for the observed variables.^{27} Similar considerations also yield a criterion for indistinguishability of causal structures in certain marginal scenarios: if C1 and C2 yield the same set of distributions after marginalising, then they cannot be distinguished in that marginal scenario.
In examples like the above, where no new constraints follow from post-selection, it may be possible to introduce additional input variables in order to certify the presence of quantum nodes in a network. The new parentless nodes can then be used to apply Lemma 17 and the above entropic techniques. Mathematically, introducing further nodes to a causal structure is always possible. However, this is only interesting if experimentally feasible, e.g. if an experimenter has control over certain observed nodes and is able to devise an experiment in which their inputs can be changed. In the instrumental scenario this may be of interest.
Example 15 (Variations of the instrumental scenario). In this scenario (Figure 1(a)), a measurement on system AZ is performed depending on X (where in the classical case AZ can w.l.o.g. be taken to be a copy of the unobserved random variable A). Its outcome Z (in the classical case a function of A) is used to choose another measurement, to be performed on AY, which generates Y (classically another copy of A). It may often be straightforward for an experimenter to choose between several measurements. In the causal structure this corresponds to introducing an additional observed input S to the second measurement (with the values of S corresponding to different measurements on AY). Such an adaptation is displayed in Figure 4(a).^{28}
Alternatively, it may be possible that the first measurement (on AZ) is chosen depending on a combination of different, independent factors, each corresponding to a random variable Xi. For two variables X1 and X2 the corresponding causal structure is displayed in Figure 4(b).^{29}
Taken together, these two adaptations yield the causal structure of Figure 4(c), which is relevant in the context of the principle of information causality [97] (see also Example 17 below).
A second approach that relies on very similar ideas (also justified by Lemma 17) is taken in [18]. For a causal structure C^C with nodes Ω = X ∪ X↑ ∪ X↑̸, where X is a parentless node, conditioning the joint distribution over all nodes on a particular X = x retains the independences of C^C. In particular, the conditioning does not affect the distribution of the X↑̸, i.e., P(X↑̸ ∣ X = x) = P(X↑̸) for all x. The corresponding entropic constraints can be used to derive entropic inequalities without the detour over computing large entropic cones, which may be useful where the latter computations are infeasible. The constraints that are used in [18] are, however, a (diligently but somewhat arbitrarily chosen) subset of the constraints that would go into the entropic technique detailed earlier in this section for the full causal structure.
^{27} Note that some care has to be taken when identifying valid constraints for scenarios with causal structure [96].
^{28} Note that for ternary S the outer approximation of the post-selected causal structure of Figure 4(d) with Shannon inequalities does not lead to any interesting constraints (as opposed to the structure of Figure 4(e), which is analysed further in Example 17).
^{29} This is an example of a causal structure where non-Shannon inequalities among classical variables lead to a strictly tighter outer approximation in the classical and quantum case than the approximations derived using only Shannon and weak-monotonicity constraints (also if there is a causal link from X1 to X2) [51].
Figure 5: Causal structures from Example 16. Post-selecting on a binary observed variable C leads to the causal structure (d) in the case of structure (a), whereas both (b) and (c) lead to structure (e). In particular, this shows that the conditional techniques may yield the same results for different causal structures.
Indeed, when the computations are feasible, applying the full entropy vector method to the corresponding post-selected causal structure gives a systematic way to derive constraints, which are in general strictly tighter (cf. Example 16).
So far, the restricted technique has been used in [18] to derive the entropic inequality (2), which is valid for all the classical causal structures of Figure 5 (previously considered in [11]). The inequality was used to certify the existence of classical distributions that respect the conditional independence constraints among the observed variables but that are not achievable in the respective causal structures.^{30} In the following we look at these three causal structures in more detail and illustrate the relation between the two techniques.
Example 16. Applying the post-selection technique for a binary random variable C to the causal structure of Figure 5(a) yields the effective causal structure of Figure 5(d). The latter can be analysed with the above entropy vector method, which leads to a cone that is characterised by 14 extremal rays, or equivalently in terms of 22 inequalities, both available in the appendix. The inequalities I(Z ∶ X∣C=1) ≥ 0, I(Z ∶ Y∣C=0) ≥ 0, I(X∣C=1 ∶ Y∣C=1 ∣ Z) ≥ 0 and H(Z ∣ X∣C=0) ≥ I(X∣C=1 Z ∶ Y∣C=1), which are part of this description, imply (2) above. We are not aware of any quantum violations of these inequalities.
Structures (b) and (c) both lead to the causal structure (e) upon post-selecting on a binary C. The latter causal structure turns out to be computationally harder to analyse with the entropy vector method and we have not been able to perform the corresponding marginalisation when taking all Shannon and independence constraints into account.^{31} Hence, the method outlined in [18] is a useful alternative here.
In causal structures with quantum and more general non-signalling nodes, Lemma 17 is not valid. For instance, Bell's theorem can be recast as the statement that there are distributions compatible with the quantum Bell scenario for which there is no joint distribution of X∣A=0, X∣A=1, Y∣B=0 and Y∣B=1 in the post-selected causal structure (on A and B) that has the required marginals (in the sense of Lemma 18).
Nonetheless, the post-selection technique has been generalised to such scenarios [13, 17], i.e., it is still possible to post-select on parentless observed (and therefore classical) nodes taking specific values. In such scenarios the observed variables can be thought of as obtained from the unobserved resources by means of measurements or tests. If a descendant of the variable that is post-selected on has quantum or general non-signalling nodes as parents, then the different instances of that descendant and of all its descendants do not coexist (even if they are observed, hence classical). This is because such observed variables are generated by measuring a quantum or other non-signalling system. Such a system is altered (or destroyed) in a measurement, and hence does not allow for the simultaneous generation of different instances of its children, due to the impossibility of cloning.
In the quantum case, this is reflected in the identification of the coexisting sets in the post-selected causal structure,^{32} as is illustrated with the following example.
^{30} These causal structures may thus also allow for quantum correlations that are not classically achievable.
^{31} We were working with conventional variable elimination software on a desktop computer.
^{32} Different instances of a variable after post-selection have to be seen as alternatives and not as simultaneous descendants of their parent node, as the representation of the post-selected causal structure might suggest.
Example 17 (Information causality scenario in the quantum case [13]). The communication scenario used to derive the principle of information causality [97] is based on the variation of the instrumental scenario displayed in Figure 4(c). It has been analysed with the entropy vector method in Ref. [13], an analysis that is presented in the following.
Conditioning on values of the variable S is possible in the classical and quantum cases. However, whereas in the classical case the variables Y∣S=s for different S share a joint distribution (cf. Lemma 17), they do not coexist in the quantum case. For binary S, the coexisting sets are {X1, X2, AZ, AY}, {X1, X2, Z, AY}, {X1, X2, Z, Y∣S=1} and {X1, X2, Z, Y∣S=2}. The only independence constraints in the quantum case are that X1, X2 and AY AZ are mutually independent. Marginalising until only entropies of {X1, Y∣S=1}, {X2, Y∣S=2}, {Z} and their subsets remain yields only one non-trivial inequality, ∑_{s=1}^n I(Xs ∶ Y∣S=s) ≤ H(Z), with n = 2.^{33} The same inequality was previously derived by Pawlowski et al. for general n [97], where the choice of marginals was inspired by the communication task considered. Subsequently, Ref. [13] considered another marginal scenario, the one with coexisting sets {X1, X2, Z, Y∣S=1}, {X1, X2, Z, Y∣S=2} and all of their subsets, which led to additional inequalities.
Similar considerations were applied in [17] to causal structures C^G allowing for general non-signalling resources. Let Ω_O = X_O↑ ∪ X_O↑̸ ∪ X be the disjoint union of its observed nodes, where X_O↑ are the observed descendants and X_O↑̸ the observed non-descendants of X. If the variable X takes values x ∈ {1, 2, . . . , n}, this leads to a joint distribution of X_O↑ ∪ X_O↑̸ for each X = x, i.e., there is a joint distribution P(X_O↑ X_O↑̸ ∣ X = x) = P(X_O↑ ∣ X_O↑̸, X = x) P(X_O↑̸) for each x, denoted P(X_O↑∣X=x X_O↑̸). Because X does not affect the distribution of the independent variables X_O↑̸, the distributions P(X_O↑∣X=x X_O↑̸) have coinciding marginals on X_O↑̸, i.e., P(X_O↑̸) = ∑_s P(X_O↑∣X=x = s, X_O↑̸) for all x, where s runs over the alphabet of X_O↑. This encodes no-signalling constraints.^{34}
In terms of entropy, there are n entropic cones, one for each P(X_O↑∣X=x X_O↑̸) (which each encode the independences among the observed variables). According to the above, they are required to coincide on the entropies of X_O↑̸ and of all of its subsets. These constraints define a convex polyhedral cone that is an outer approximation to the set of all entropy vectors achievable in the causal structure. Whenever the distributions P(X_O↑∣X=x X_O↑̸) involve at most three variables, and assuming that all constraints implied by the causal structure and no-signalling have been taken into account,^{35} this approximation is tight because Γ*_3 = Γ_3.
Several examples of the use of this technique can be found in Ref. [17], including the original information causality scenario (which we discuss in Example 18) and an entropic analogue of monogamy relations for Bell inequality violations [98, 99].
Example 18 (Information causality scenario in general non-signalling theories). This is related to Example 17 above and reproduces an analysis from [17]. In this marginal scenario we consider the Shannon cones for the three sets {X1, Y∣S=1}, {X2, Y∣S=2} and {Z}, as well as the constraints I(X1 ∶ Y∣S=1) ≤ H(Z) and I(X2 ∶ Y∣S=2) ≤ H(Z), which are conjectured to hold [17]. (This conjecture is based on an argument in [100] that covers a special case; we are not aware of a general proof.) These conditions constrain a polyhedral cone of vectors (H(X1), H(X2), H(Z), H(Y∣S=1), H(Y∣S=2), H(X1 Y∣S=1), H(X2 Y∣S=2)) with 8 extremal rays, all of which are achievable using PR-boxes [66, 67]. Importantly, the stronger constraint I(X1 ∶ Y∣S=1) + I(X2 ∶ Y∣S=2) ≤ H(Z), which holds in the quantum case (cf. Example 17), does not hold here.
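This failure can be made explicit with the standard PR-box strategy for this communication task (the construction below is ours, given for illustration): a single PR-box achieves I(X1 ∶ Y∣S=1) = I(X2 ∶ Y∣S=2) = 1 while H(Z) = 1, so the quantum bound is exceeded.

```python
# An explicit check (our construction, following the standard PR-box strategy
# for this task) that one PR-box gives I(X1 : Y|S=1) + I(X2 : Y|S=2) = 2
# while H(Z) = 1, exceeding the quantum bound of Example 17.
import itertools
from collections import Counter
from math import log2

def mutual_info(pairs):
    """I(U : V) for the uniform distribution over the listed (u, v) pairs."""
    n = len(pairs)
    pu, pv, puv = Counter(), Counter(), Counter()
    for u, v in pairs:
        pu[u] += 1
        pv[v] += 1
        puv[(u, v)] += 1
    return sum((c / n) * log2(c * n / (pu[u] * pv[v]))
               for (u, v), c in puv.items())

pairs_s1, pairs_s2, zs = [], [], []
for x1, x2, lam in itertools.product((0, 1), repeat=3):  # lam: box randomness
    a = x1 ^ x2                    # Alice's PR-box input
    a_out = lam                    # outputs satisfy a_out XOR b_out = a AND b
    z = x1 ^ a_out                 # the single transmitted bit
    zs.append(z)
    for s, b in ((1, 0), (2, 1)):  # Bob's PR-box input b = s - 1
        y = z ^ (lam ^ (a & b))    # Bob's guess Y = Z XOR b_out
        (pairs_s1 if s == 1 else pairs_s2).append((x1 if s == 1 else x2, y))

h_z = mutual_info([(z, z) for z in zs])                        # H(Z) = I(Z : Z)
print(mutual_info(pairs_s1) + mutual_info(pairs_s2), ">", h_z)  # 2.0 > 1.0
```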
Instead of relaxing the problem of characterising the set of probability distributions compatible with a causal structure by considering entropy vectors, other computational techniques are currently being developed. In the following, we give a brief overview of these methods.
^{33} Note that this is slightly adapted from [13], where I(X1 ∶ Y∣S=1) + I(X2 ∶ Y∣S=2) ≤ H(Z) + I(X1 ∶ X2) was found, as X1 and X2 were not assumed independent there. Furthermore, this is also the only inequality found in the classical case when restricting to this same marginal scenario [17].
^{34} Note that there may be other constraints that arise from no-signalling. For instance, Example 18 suggests that further constraints for each P(X_O↑∣X=x X_O↑̸) are implied by requiring non-signalling resources. The latter have to be found and added to the description separately.
^{35} Note that it may not always be obvious how to identify all relevant constraints (cf. the conjectured constraints in Example 18).
In this context, note also that there are methods that allow certification that the only restrictions implied by a causal structure are the conditional independence constraints among the observed variables [11], as well as procedures to show that the opposite is the case [101, 102]. Such methods may (when applicable) indicate whether a causal structure should be analysed further (corresponding techniques are reviewed in [18]).
Entropy vectors may be computed in terms of other entropy measures, for instance in terms of the α-Rényi entropies [103]. For a quantum state ρ_X, the α-Rényi entropy is H_α(X) ∶= (1/(1−α)) log tr(ρ_X^α) for α ∈ (0, ∞) ∖ {1}; the cases α = 0, 1, ∞ are defined via the relevant limits (note that H_1(X) = H(X)).^{36}
One may expect that useful constraints on the compatible distributions can be derived from such entropy vectors. For 0 < α < 1 and α > 1 such constraints were analysed in [104]. In the classical case, positivity and monotonicity are the only linear constraints on the corresponding entropy vectors for any α ≠ 0, 1. For multi-party quantum states monotonicity does not hold for any α, just as it fails for the von Neumann entropy. For 0 < α < 1, there are no constraints on the allowed entropy vectors except for positivity, whereas for α > 1 there are constraints, but these are non-linear. The lack of further linear inequalities that hold in general limits the usefulness of α-Rényi entropy vectors for analysing causal structures. To our knowledge it is not known how or whether non-linear inequalities for Rényi entropies may be employed for this task. The case α = 0, where H_0(X) = log rank ρ_X, has been considered separately in [73], where it was shown that further linear inequalities hold. However, only bi-partitions of the parties were considered and the generalisation to full entropy vectors is still to be explored.
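As a concrete illustration of the failure of monotonicity in the quantum case, consider a maximally entangled two-qubit state: the global Rényi entropy vanishes while each marginal entropy is one bit, for every α. The following sketch (ours) verifies this numerically.

```python
# A numerical illustration (ours) of the failure of monotonicity: for a
# maximally entangled two-qubit state, H_alpha(AB) = 0 < 1 = H_alpha(A) for
# every alpha.
import numpy as np

def renyi(rho, alpha):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    if abs(alpha - 1.0) < 1e-9:                      # von Neumann limit
        return float(-(ev * np.log2(ev)).sum())
    return float(np.log2((ev ** alpha).sum()) / (1 - alpha))

phi = np.zeros(4)
phi[0] = phi[3] = 1 / np.sqrt(2)                     # (|00> + |11>) / sqrt(2)
rho_ab = np.outer(phi, phi)
rho_a = np.trace(rho_ab.reshape(2, 2, 2, 2), axis1=1, axis2=3)  # trace out B
for alpha in (0.5, 1.0, 2.0):
    print(alpha, renyi(rho_ab, alpha), renyi(rho_a, alpha))     # 0.0 and 1.0
```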
The above considerations do not mention conditional entropy and hence could be taken with the definition H_α(X ∣ Y) ∶= H_α(XY) − H_α(Y). Alternatively, one may consider a definition of the conditional Rényi entropy for which H_α(X ∣ YZ) ≤ H_α(X ∣ Y) [105–109]. With the latter definition, the conditional Rényi entropy cannot be expressed as a difference of unconditional entropies, and so to use entropy vectors we would need to consider the conditional entropies as separate components. Along these lines, one may also think about combining Rényi entropies for different values of α and using appropriate chain rules [110]. Because of the large increase in the number of variables compared to the number of constraints, it is not clear that this would yield useful new conditions.
The probabilistic characterisation of causal structures depends (in general) on the dimensionality of the observed variables. Computational hardness results suggest that a full characterisation is unlikely to be feasible, except in small cases [111, 112]. Recent progress has been made with the development of procedures to construct polynomial Bell inequalities. A method that resorts to linear programming techniques [15] has led to the derivation of new inequalities for the bilocality scenario (as well as a related four-party scenario). Another, iterative procedure allows for enlarging networks by adding a party to a network in a particular way.^{37} This allows for the construction of non-linear inequalities for the enlarged network from inequalities that are valid for the original one [14].
Furthermore, a recent approach relies on considering enlarged networks, so-called inflations, and inferring causal constraints from those [19, 113]. Inflated networks may contain several copies of a variable that each have the same dependencies on their ancestors (the latter may also exist in several instances) and that share the same distributions with their originals. Such inflations allow for the derivation of probabilistic inequalities that restrict the set of compatible distributions. These ideas bear some resemblance to the procedures in [20], in the sense that they employ the idea that certain marginal distributions may be obtained from different networks; they are, however, much more focused on causal structures featuring interesting independence constraints. Inflations allowed the authors of [19] to refute certain distributions as incompatible with the triangle causal structure of Figure 1(c), in particular the so-called W-distribution, which could not be proven incompatible either entropically or with the covariance matrix approach discussed below.
^{36} Classical α-Rényi entropies are included in this definition when considering diagonal states.
^{37} Here, adding a party means adding one observed input and one observed output node as well as an unobserved parent for the output; the latter may causally influence one other output random variable in the network.
One may also look for mappings of the distribution of a set of observed variables that encode causal structure beyond entropies. For causal structures with two generations, i.e., one generation of unobserved variables as ancestors of one generation of observed nodes, a technique based on covariance matrices has been developed [20]. Each observed variable is mapped to a vector-valued random variable and the covariance matrix of the direct sum of these variables is considered. Due to the law of total expectation, this matrix allows for a certain decomposition depending on the causal structure. For a particular observed distribution and its covariance matrix, the existence of such a decomposition may be tested via semidefinite programming. The relation of this technique to the entropy vector method is not yet well understood. A partial analysis considering several examples is given in Section X of [20].
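As a rough sketch of how such a test can look in practice (our own toy version, not the implementation of [20]; the use of the cvxpy modelling package and the restriction to the triangle scenario with scalar observed variables are assumptions made for simplicity), the decomposition can be posed as a semidefinite feasibility problem:

```python
# A rough sketch (ours, not the implementation of [20]) of a semidefinite
# feasibility test: does an observed covariance matrix M decompose as a sum
# of PSD matrices, each supported on the observed children of one latent
# ancestor? The triangle scenario with scalar observed variables is assumed.
import cvxpy as cp
import numpy as np

def decomposition_feasible(M, blocks):
    n = M.shape[0]
    parts = [cp.Variable((n, n), PSD=True) for _ in blocks]
    cons = []
    for part, blk in zip(parts, blocks):
        for i in range(n):
            for j in range(n):
                if i not in blk or j not in blk:
                    cons.append(part[i, j] == 0)  # support only on the block
    cons.append(sum(parts) == M)
    prob = cp.Problem(cp.Minimize(0), cons)       # pure feasibility problem
    prob.solve()
    return prob.status == cp.OPTIMAL

# triangle scenario: three observed variables, each pair sharing one latent
blocks = [{0, 1}, {1, 2}, {0, 2}]
M = np.array([[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
print(decomposition_feasible(M, blocks))          # True for this M
```

Local noise on a single observed variable is absorbed into any block containing that variable, so no separate noise terms are needed in this toy version.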
V. OPEN PROBLEMS
The entropy vector approach has led to many certificates for the incompatibility of correlations with causal struc-
tures. However, we are still lacking a general understanding of how well entropic relations can approximate the set of
achievable correlations. Firstly, the non-injective mapping from probabilities to entropies is not sufficiently understood
and secondly, the current methods employ further approximations, e.g. by restricting the number of non-Shannon
inequalities that can be considered at a time. It is as yet unknown whether the entropy vector method (without
post-selection) can ever distinguish correlations that arise from classical, quantum and more general non-signalling
resources. Such insights may also inform the question of whether there exist novel inequalities for the von Neumann
entropy of multi-party quantum states.
The post-selection technique allows for the derivation of additional constraints that may distinguish quantum from
classically achievable correlations in the Bell scenario and possibly in other examples. However, the method relies
on the causal structure featuring parentless observed nodes, hence it is not always applicable (see e.g. the triangle
scenario). In such situations, one may try to combine the entropic techniques reviewed here with the inflation
method [19], which might allow for further entropic analysis of several causal structures, e.g., of the triangle scenario.
Criteria to certify whether a set of entropic constraints is able to detect non-classical correlations are currently not available. For many of the established entropic constraints on classical causal structures it is unknown whether or not they are also valid for the corresponding quantum structure. In the case of the Bell scenario this problem has been overcome: the known entropic constraints have been shown to be sufficient for detecting any non-classical correlations [7]. However, since the proof is specific to that scenario, finding a systematic tool to analyse the scope of the entropic techniques remains open.
Acknowledgments
We thank Rafael Chaves and Costantino Budroni for confirming details of [17]. RC is supported by the EPSRC’s
Quantum Communications Hub (grant no. EP/M013472/1) and by an EPSRC First Grant (grant no. EP/P016588/1).
[1] Pearl, J. Causality: Models, Reasoning, and Inference (Cambridge Univ. Press, 2009), 2nd edn.
[2] Spirtes, P., Glymour, C. N. & Scheines, R. Causation, Prediction, and Search (MIT Press, 2000).
[3] Geiger, D. & Meek, C. Quantifier elimination for statistical problems. In Proceedings of the Fifteenth conference on
Uncertainty in artificial intelligence, 226–235 (Morgan Kaufmann Publishers Inc., 1999).
[4] Garcia, L. D., Stillman, M. & Sturmfels, B. Algebraic geometry of Bayesian networks. Journal of Symbolic Computation
39, 331–355 (2005).
[5] Lee, C. M. & Spekkens, R. W. Causal inference via algebraic geometry: Feasibility tests for functional causal structures
with two binary observed variables. J. Causal Inference 5 (2017).
[6] Chaves, R. & Fritz, T. Entropic approach to local realism and noncontextuality. Physical Review A 85, 032113 (2012).
[7] Chaves, R. Entropic inequalities as a necessary and sufficient condition to noncontextuality and locality. Physical Review
A 87, 022102 (2013).
[8] Fritz, T. & Chaves, R. Entropic inequalities and marginal problems. IEEE Transactions on Information Theory 59,
803–817 (2013).
[9] Chaves, R., Luft, L. & Gross, D. Causal structures from entropic information: geometry and novel scenarios. New Journal
of Physics 16, 043001 (2014).
20
[10] Chaves, R. et al. Inferring latent structures via information inequalities. In Proceedings of the 30th Conference on
Uncertainty in Artificial Intelligence, 112–121 (AUAI Press, Corvallis, Oregon, 2014).
[11] Henson, J., Lal, R. & Pusey, M. F. Theory-independent limits on correlations from generalized Bayesian networks. New
Journal of Physics 16, 113043 (2014).
[12] Steudel, B. & Ay, N. Information-theoretic inference of common ancestors. Entropy 17, 2304–2327 (2015).
[13] Chaves, R., Majenz, C. & Gross, D. Information-theoretic implications of quantum causal structures. Nature communi-
cations 6, 5766 (2015).
[14] Rosset, D. et al. Nonlinear Bell inequalities tailored for quantum networks. Physical Review Letters 116, 010403 (2016).
[15] Chaves, R. Polynomial Bell inequalities. Physical Review Letters 116, 010402 (2016).
[16] Pienaar, J. & Brukner, Č. A graph-separation theorem for quantum causal models. New Journal of Physics 17, 073020
(2015).
[17] Chaves, R. & Budroni, C. Entropic nonsignaling correlations. Phys. Rev. Lett. 116, 240501 (2016).
[18] Pienaar, J. Which causal structures might support a quantum–classical gap? New Journal of Physics 19, 043021 (2017).
[19] Wolfe, E., Spekkens, R. W. & Fritz, T. The inflation technique for causal inference with latent variables. e-print
arXiv:1609.00672 (2016).
[20] Kela, A., von Prillwitz, K., Aberg, J., Chaves, R. & Gross, D. Semidefinite tests for latent causal structures. e-print arXiv:1701.00652 (2017).
[21] Ekert, A. K. Quantum cryptography based on Bell’s theorem. Physical Review Letters 67, 661–663 (1991).
[22] Mayers, D. & Yao, A. Quantum cryptography with imperfect apparatus. In Proceedings of the 39th Annual Symposium
on Foundations of Computer Science (FOCS-98), 503–509 (IEEE Computer Society, Los Alamitos, CA, USA, 1998).
[23] Barrett, J., Hardy, L. & Kent, A. No signaling and quantum key distribution. Physical Review Letters 95, 010503 (2005).
[24] Acín, A., Gisin, N. & Masanes, L. From Bell's theorem to secure quantum key distribution. Physical Review Letters 97,
120405 (2006).
[25] Colbeck, R. Quantum and Relativistic Protocols For Secure Multi-Party Computation. Ph.D. thesis, University of Cam-
bridge (2007). Also available as arXiv:0911.3814.
[26] Colbeck, R. & Kent, A. Private randomness expansion with untrusted devices. Journal of Physics A: Mathematical and
Theoretical 44, 095305 (2011).
[27] Pironio, S. et al. Random numbers certified by Bell’s theorem. Nature 464, 1021–1024 (2010).
[28] Vazirani, U. & Vidick, T. Fully device-independent quantum key distribution. Physical Review Letters 113, 140501
(2014).
[29] Miller, C. A. & Shi, Y. Universal security for randomness expansion from the spot-checking protocol. SIAM Journal on
Computing 46, 1304–1335 (2017).
[30] Bell, J. S. On the Einstein-Podolsky-Rosen paradox. Physics 1, 195–200 (1964).
[31] Wood, C. J. & Spekkens, R. W. The lesson of causal discovery algorithms for quantum correlations: causal explanations
of Bell-inequality violations require fine-tuning. New Journal of Physics 17, 033002 (2015).
[32] Shannon, C. E. A mathematical theory of communication. Bell System Technical Journal 27, 379–423 (1948).
[33] Yeung, R. A framework for linear information inequalities. IEEE Transactions on Information Theory 43, 1924–1934
(1997).
[34] Pippenger, N. The inequalities of quantum information theory. IEEE Transactions on Information Theory 49, 773–789
(2003).
[35] Chan, T. H. Balanced information inequalities. IEEE Trans. Inf. Theor. 49, 3261–3267 (2003).
[36] Zhang, Z. & Yeung, R. W. A non-Shannon-type conditional inequality of information quantities. IEEE Transactions on
Information Theory 43, 1982–1986 (1997).
[37] Han, T.-S. A uniqueness of Shannon’s information distance and related non-negativity problems. Journal of Combina-
torics, Information & System Sciences 6, 320–321 (1981).
[38] Zhang, Z. & Yeung, R. W. On characterization of entropy function via information inequalities. IEEE Transactions on Information Theory 44, 1440–1452 (1998).
[39] Matus, F. Infinitely many information inequalities. In 2007 IEEE International Symposium on Information Theory,
41–44 (IEEE, 2007).
[40] Makarychev, K., Makarychev, Y., Romashchenko, A. & Vereshchagin, N. A new class of non-Shannon-type inequalities
for entropies. Communications in Information and Systems 2, 147–166 (2002).
[41] Dougherty, R., Freiling, C. & Zeger, K. Six new non-Shannon information inequalities. In 2006 IEEE International
Symposium on Information Theory, 233–236 (IEEE, 2006).
[42] Xu, W., Wang, J. & Sun, J. A projection method for derivation of non-Shannon-type information inequalities. In 2008 IEEE International Symposium on Information Theory, 2116–2120 (2008).
[43] Dougherty, R., Freiling, C. & Zeger, K. Non-Shannon information inequalities in four random variables. e-print
arXiv:1104.3602 (2011).
[44] Kaced, T. Equivalence of two proof techniques for non-Shannon-type inequalities. In 2013 IEEE International Symposium
on Information Theory, 236–240 (IEEE, 2013).
[45] Zhang, Z. On a new non-Shannon type information inequality. Communications in Information and Systems 3, 47–60
(2003).
[46] Ingleton, A. W. Representation of matroids. In Welsh, D. J. A. (ed.) Combinatorial Mathematics and its applications,
149–167 (Academic Press, 1971).
[47] Hammer, D., Romashchenko, A., Shen, A. & Vereshchagin, N. Inequalities for Shannon entropy and Kolmogorov complexity. Journal of Computer and System Sciences 60, 442–464 (2000).
[84] Portmann, C., Matt, C., Maurer, U., Renner, R. & Tackmann, B. Causal boxes: quantum information-processing systems
closed under composition. IEEE Transactions on Information Theory 63, 3277–3305 (2017).
[85] Miklin, N., Abbott, A. A., Branciard, C., Chaves, R. & Budroni, C. The entropic approach to causal correlations. e-print
arXiv:1706.10270 (2017).
[86] Colbeck, R. & Renner, R. A system’s wave function is uniquely determined by its underlying physical state. New Journal
of Physics 19, 013016 (2017).
[87] Colbeck, R. & Renner, R. The completeness of quantum theory for predicting measurement outcomes. In Chiribella, G.
& Spekkens, R. W. (eds.) Quantum Theory: Informational Foundations and Foils, 497–528 (Springer, 2016).
[88] Braunstein, S. L. & Caves, C. M. Information-theoretic Bell inequalities. Physical Review Letters 61, 662–665 (1988).
[89] Fine, A. Joint distributions, quantum correlations, and commuting observables. Journal of Mathematical Physics 23,
1306 (1982).
[90] Fine, A. Hidden variables, joint probability, and the Bell inequalities. Physical Review Letters 48, 291–295 (1982).
[91] Liang, Y.-C., Spekkens, R. W. & Wiseman, H. M. Specker’s parable of the overprotective seer: A road to contextuality,
nonlocality and complementarity. Physics Reports 506, 1 – 39 (2011).
[92] Abramsky, S. & Brandenburger, A. The sheaf-theoretic structure of non-locality and contextuality. New Journal of
Physics 13, 113036 (2011).
[93] Araújo, M., Quintino, M. T., Budroni, C., Cunha, M. T. & Cabello, A. All noncontextuality inequalities for the n-cycle
scenario. Phys. Rev. A 88, 022118 (2013).
[94] Branciard, C., Gisin, N. & Pironio, S. Characterizing the nonlocal correlations created via entanglement swapping.
Physical Review Letters 104, 170401 (2010).
[95] Branciard, C., Rosset, D., Gisin, N. & Pironio, S. Bilocal versus nonbilocal correlations in entanglement-swapping
experiments. Physical Review A 85, 032119 (2012).
[96] Budroni, C., Miklin, N. & Chaves, R. Indistinguishability of causal relations from limited marginals. Phys. Rev. A 94,
042127 (2016).
[97] Pawlowski, M. et al. Information causality as a physical principle. Nature 461, 1101–1104 (2009).
[98] Masanes, L., Acin, A. & Gisin, N. General properties of nonsignaling theories. Phys. Rev. A 73, 012112 (2006).
[99] Pawlowski, M. & Brukner, Č. Monogamy of Bell’s inequality violations in nonsignaling theories. Phys. Rev. Lett. 102,
030403 (2009).
[100] Popescu, S. Nonlocality beyond quantum mechanics. Nature Physics 10, 264–270 (2014).
[101] Evans, R. J. Graphical methods for inequality constraints in marginalized DAGs. In Machine Learning for Signal
Processing (MLSP), 2012 IEEE International Workshop on, 1–6 (IEEE, 2012).
[102] Evans, R. J. Graphs for margins of Bayesian networks. Scandinavian Journal of Statistics (2015).
[103] Rényi, A. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical
Statistics and Probability, Vol. 1: Contributions to the Theory of Statistics, 547–561 (1960).
[104] Linden, N., Mosonyi, M. & Winter, A. The structure of Rényi entropic inequalities. Proceedings of the Royal Society A:
Mathematical, Physical and Engineering Sciences 469, 20120737–20120737 (2013).
[105] Petz, D. Quasi-entropies for finite quantum systems. Reports on Mathematical Physics 23, 57–65 (1986).
[106] Tomamichel, M., Colbeck, R. & Renner, R. A fully quantum asymptotic equipartition property. IEEE Transactions on
information theory 55, 5840–5847 (2009).
[107] Müller-Lennert, M., Dupuis, F., Szehr, O., Fehr, S. & Tomamichel, M. On quantum Rényi entropies: A new generalization
and some properties. Journal of Mathematical Physics 54, 122203 (2013).
[108] Frank, R. L. & Lieb, E. H. Monotonicity of a relative Rényi entropy. Journal of Mathematical Physics 54, 122201 (2013).
[109] Beigi, S. Sandwiched Rényi divergence satisfies data processing inequality. Journal of Mathematical Physics 54, 122202
(2013).
[110] Dupuis, F. Chain rules for quantum Rényi entropies. Journal of Mathematical Physics 56, 022203 (2015).
[111] Pitowsky, I. Correlation polytopes: Their geometry and complexity. Mathematical Programming 50, 395–414 (1991).
[112] Avis, D., Imai, H., Ito, T. & Sasaki, Y. Deriving tight Bell inequalities for 2 parties with many 2-valued observables from
facets of cut polytopes. e-print arXiv:quant-ph/0404014 (2004).
[113] Navascues, M. & Wolfe, E. The inflation technique solves completely the classical inference problem. e-print
arXiv:1707.06476 (2017).
In the following we provide the basic inequalities for the quantum instrumental scenario, IC^Q, i.e., the constraints making up the matrix M_B(IC^Q).
I(AY ∶ AZ ) ≥ 0 (A1)
I(AY ∶ X) ≥ 0 (A2)
I(AZ ∶ X) ≥ 0 (A3)
I(AY ∶ AZ ∣X) ≥ 0 (A4)
I(AY ∶ X∣AZ ) ≥ 0 (A5)
I(AZ ∶ X∣AY ) ≥ 0 (A6)
I(AY ∶ Z) ≥ 0 (A7)
I(X ∶ Z) ≥ 0 (A8)
I(X ∶ Z∣AY ) ≥ 0 (A9)
I(AY ∶ Z∣X) ≥ 0 (A10)
I(AY ∶ X∣Z) ≥ 0 (A11)
I(X ∶ Y ) ≥ 0 (A12)
I(Y ∶ Z) ≥ 0 (A13)
I(Y ∶ Z∣X) ≥ 0 (A14)
I(X ∶ Z∣Y ) ≥ 0 (A15)
I(X ∶ Y ∣Z) ≥ 0 (A16)
H(AZ ∣X) ≥ 0 (A17)
H(AY AZ ∣X) ≥ 0 (A18)
H(X∣AY AZ ) ≥ 0 (A19)
H(AY ∣XZ) ≥ 0 (A20)
H(X∣AY Z) ≥ 0 (A21)
H(Z∣AY X) ≥ 0 (A22)
H(X∣Y Z) ≥ 0 (A23)
H(Y ∣XZ) ≥ 0 (A24)
H(Z∣XY ) ≥ 0 (A25)
H(AZ ∣AY ) + H(AZ ∣X) ≥ 0 (A26)
H(AY ∣AZ ) + H(AY ∣X) ≥ 0 (A27)
H(AZ ∣AY X) + H(AZ ) ≥ 0 (A28)
H(AY ∣AZ X) + H(AY ) ≥ 0 (A29)
Independence constraints and data processing inequalities are provided in the main text. If we include these and remove redundant inequalities we obtain the following set of constraints, which for convenience we give in matrix form (such that Γ(IC^Q) = {v ∈ R^{15}_{≥0} ∣ M ⋅ v ≥ 0}):
Figure 6: Post-selecting (a) on a binary observed variable C leads to the causal structure (b).
M =
⎛  0  0  0  0  0  0  0  1  0  0  0 −1  0 −1  1 ⎞
⎜  0  0  0  0  0  0  0  0  1  0 −1  0 −1  1  0 ⎟
⎜  0  0  0  0  0  0  0  0  0  0  0  1  0 −1 −1 ⎟
⎜  0 −1  0  0  0 −1  0  0  0  0  0  1  1  0  0 ⎟
⎜  0 −1  0  0 −1  0  0  0  0  0  1  0  1  0  0 ⎟
⎜  0 −1  0 −1  0  0  0  0  0  0  1  1  0  0  0 ⎟
⎜  0  0  0  0  1  1  0  0  0  0  0  0 −1  0  0 ⎟
⎜  0  0  0  1  0  1  0  0  0  0  0  0  0  0 −1 ⎟
⎜  0  0  0  1  1  0  0  0  0  0 −1  0  0  0  0 ⎟
⎜  1  0  0  0  0  1  0  0 −1  0  0  0  0  0  0 ⎟
⎜  1  0  1  0  0  0 −1  0  0  0  0  0  0  0  0 ⎟
⎜  0  0 −1  0  0  0  1  0  0  1  0  0  0 −1  0 ⎟
⎜ −1  0  0  0  0  0  1  1  0  0  0  0  0  0 −1 ⎟
⎜  0  0  1  1  0  0  0  0  0 −1  0  0  0  0  0 ⎟
⎜  0  1  0  0  0  0  0  0  0  0  0 −1  0  0  0 ⎟
⎜  0  1  0  0  0  0  0  0  0  0  0  0 −1  0  0 ⎟
⎜  0  0  0  0  0  0  0 −1  0  0  0  0  0  0  1 ⎟
⎜  0  0  0  0  0  0  0  0  0  0  0  0  0  1 −1 ⎟
⎜  0  0  0  0  0  0  0  0  0  0  0 −1  0  0  1 ⎟
⎜  1  0  0  0  0  0  0  0 −1  0  0  0  1  0  0 ⎟
⎝  0  1  0  0  0  0 −1  0  0  0  0  0  1  0  0 ⎠
The causal structure of Figure 6, previously considered in [11, 18], is analysed by means of the entropic post-selection technique. The outer approximation to the entropic cone of the causal structure of Figure 6 is computed and marginalised to vectors
(H(Z), H(X∣C=0 ), H(X∣C=1 ), H(Y∣C=0 ), H(Y∣C=1 ), H(X∣C=0 Z), H(X∣C=1 Z), H(Y∣C=0 Z),
H(Y∣C=1 Z), H(X∣C=0 Y∣C=0 ), H(X∣C=1 Y∣C=1 ), H(X∣C=0 Y∣C=0 Z), H(X∣C=1 Y∣C=1 Z)) .
The following 14 extremal rays are obtained from this computation, where each ray is represented by one particular
vector on it. The tip of this pointed polyhedral cone is the zero-vector.
(1) 0000100010101
(2) 0001000101010
(3) 0010001000101
(4) 0100010001010
(5) 1111122222222
(6) 1010112120212
(7) 1101021212021
(8) 1000011110011
(9) 1000111110111
(10) 1001011111011
(11) 1010011110111
(12) 1100011111011
(13) 1001111111111
(14) 1110011111111
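As a consistency check (ours, not from the source), one can verify that each of these rays satisfies the four inequalities quoted in Example 16, with components ordered as in the marginal vector given before the list:

```python
# A consistency check (ours): each of the 14 extremal rays above satisfies
# the four inequalities quoted in Example 16. Component order follows the
# marginal vector given before the list:
# index 0: H(Z); 1: H(X|C=0); 2: H(X|C=1); 3: H(Y|C=0); 4: H(Y|C=1);
# 5: H(X|C=0 Z); 6: H(X|C=1 Z); 7: H(Y|C=0 Z); 8: H(Y|C=1 Z);
# 9: H(X|C=0 Y|C=0); 10: H(X|C=1 Y|C=1); 11: H(X|C=0 Y|C=0 Z);
# 12: H(X|C=1 Y|C=1 Z)
rays = [
    "0000100010101", "0001000101010", "0010001000101", "0100010001010",
    "1111122222222", "1010112120212", "1101021212021", "1000011110011",
    "1000111110111", "1001011111011", "1010011110111", "1100011111011",
    "1001111111111", "1110011111111",
]

def satisfies_example_16(v):
    return (v[0] + v[2] - v[6] >= 0                          # I(Z : X|C=1) >= 0
            and v[0] + v[3] - v[7] >= 0                      # I(Z : Y|C=0) >= 0
            and v[6] + v[8] - v[12] - v[0] >= 0              # I(X|C=1 : Y|C=1 | Z) >= 0
            and (v[5] - v[1]) - (v[6] + v[4] - v[12]) >= 0)  # H(Z | X|C=0) >= I(X|C=1 Z : Y|C=1)

print(all(satisfies_example_16([int(c) for c in ray]) for ray in rays))  # True
```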
The corresponding inequality description is given by the 2 equalities and 18 inequalities (or equivalently 22 inequal-
ities if each equality is written as two inequalities).