Advanced Topics in Quantum Information Theory
Advanced Topics in Quantum Information Theory
John Watrous
School of Computer Science and Institute for Quantum Computing
University of Waterloo
November 8, 2020
Lectures
1 Conic Programming 1
1.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Feasible solutions, optimal values, and weak duality . . . . . . . . . . 4
1.4 Minimization and maximization . . . . . . . . . . . . . . . . . . . . . 4
1.5 Slater’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.6 Example: conic program for optimal measurements . . . . . . . . . . 8
Conic Programming
The first topic we will discuss in the course is conic programming, which is a valu-
able tool for the study of quantum information. In particular, semidefinite programs,
which are a specific type of conic program, have proven to be particularly useful
in the theory of quantum information and computation—and you may already be
familiar with some of their applications. There is, however, much to be gained in
considering conic programs in greater generality. A few points in support of this
claim follow.
One must appreciate that semidefinite programs do have a very special prop-
erty that contributes enormously to their utility, which is that one can generally
solve a given semidefinite program with reasonable efficiency and precision us-
ing a (classical!) computer. Conic programs, in contrast, are in general hard to
1
solve with a computer. For example, maximizing a linear function over the cone
SepD(Cn : Cn ) of bipartite separable density operators with local dimension n is an
NP-hard optimization problem, even to approximate with a modest degree of pre-
cision.
Having sufficient motivation (I presume) for a study of conic programming, we
will now proceed to such a study.
1.1 Preliminaries
This preliminary subsection defines various notions and discuss a few known facts
(without proofs) that will be needed for a proper treatment of conic programming,
mostly relating to convex analysis.
Let V be a finite-dimensional real inner product space, with the inner product
of any two vectors u, v ∈ V being denoted hu, vi. Note that this inner product is
necessarily symmetric in its arguments, given that V is a vector space over the real
numbers R, which will be our ground field throughout this entire discussion of
conic programming.
A subset C ⊆ V is convex if, for all u, v ∈ C and λ ∈ [0, 1], one has
λu + (1 − λ)v ∈ C. (1.1)
Similarly, the Cartesian product of any two cones is a cone: if K and L are cones,
and (u, v) ∈ K × L and λ ≥ 0, then
2
Theorem 1.1 (Separating hyperplane theorem). Let V be a finite-dimensional real
inner-product space and let C and D be nonempty, disjoint, convex subsets of V. There
exists a nonzero vector w ∈ V and a real number γ ∈ R such that
for every u ∈ C and v ∈ D. If either of C or D is a cone, then there must exist a nonzero
vector w ∈ V as above for which the inequality (1.4) is true, for all u ∈ C and v ∈ D,
when γ = 0.
The fact that A∗ is indeed a cone, and is also closed and convex, irrespective of the
choice of the set A, can be verified. For any two cones K, L ⊆ V, it is the case that
(K × L ) ∗ = K∗ × L∗ . (1.6)
1.2 Definitions
Now suppose that a finite-dimensional real inner-product space V and a closed,
convex cone K ⊆ V have been fixed. In addition, let W be a finite-dimensional real
inner product space, let φ : V → W be a linear map, and let a ∈ V and b ∈ W be
vectors. These choices of objects define a conic program, with which the following
optimization problem is associated.
3
1.3 Feasible solutions, optimal values, and weak
duality
It is convenient to associate two sets of vectors with Optimization Problem 1.2:
A = x ∈ K : φ( x ) = b and B = y ∈ W : φ∗ (y) − a ∈ K∗
(1.7)
are the sets of primal feasible and dual feasible vectors for that conic program. We also
define the primal optimal and dual optimal values of this conic program as
Proposition 1.3 (Weak duality for conic programs). Let V and W be finite-dimensional
real inner product spaces, let K ⊆ V be a closed, convex cone, let φ : V → W be a linear
map, and let a ∈ V and b ∈ W be vectors. For α, β ∈ R ∪ {−∞, ∞} as defined in (1.8)
above, it is the case that α ≤ β.
Proof. If either of the sets A and B defined in (1.7) are empty, then the proposition
is vacuously true: either −∞ ≤ β or α ≤ ∞. It therefore suffices to consider the
case in which A and B are nonempty.
Suppose x ∈ A and y ∈ B are chosen arbitrarily. The set A is a subset of K, so
x ∈ K, and because y ∈ B it is the case that φ∗ (y) − a ∈ K∗ , and therefore
This inequality is maintained as one takes the supremum over all x ∈ A and infi-
mum over y ∈ B, and therefore α ≤ β, as required.
4
(1 + ε) x − εy
x
C y
Notice, in particular, that the dual constraint a − φ∗ (y) ∈ K∗ has replaced the
constraint φ∗ (y) − a ∈ K∗ in Optimization Problem 1.2. The reason for this substi-
tution is that Optimization Problem 1.4 is equivalent, up to a negation of sign, to
an instance of Optimization Problem 1.2 in which a, b, and φ have been replaced
with − a, −b, and −φ, respectively.
With this equivalence in mind, we shall feel free to minimize or maximize in
the primal problem of any conic program as we see fit—naturally selecting the
corresponding dual problem formulation—but at the same time we shall lose no
generality by adopting Optimization Problem 1.2 as the standard form for conic
programs.
5
Theorem 1.5 (Slater’s theorem for conic programs). Let V and W be finite-dimensional
real inner-product spaces, let K ⊆ V be a closed, convex cone, let φ : V → W be a linear
map, and let a ∈ V and b ∈ W be vectors. With respect to the notations A, B, α, and β
defined in Subsection 1.3, the following two statements are true.
1. If B is nonempty and there exists x ∈ relint(K) such that φ( x ) = b, then there must
exist y ∈ B such that hb, yi = α.
2. If A is nonempty and there exists y ∈ W for which φ∗ (y) − a ∈ relint(K∗ ), then
there must exist x ∈ A such that h a, x i = β.
Both statements imply the equality α = β.
Proof. We will prove just the first statement—the second statement can be proved
through a similar technique, or one may conclude that the second statement is true
given the first by formulating the dual problem of Optimization Problem 1.2 as the
primal problem of a (different but equivalent) conic program. We will also make
the simplifying assumptions that V = span(K) and W = im(φ), both of which
cause no loss of generality. Note that α and β are both necessarily finite, as the
assumptions of the first statement imply that A and B are nonempty.
Define two subsets of W ⊕ V ⊕ R as follows:
C = (b − φ( x ), z, h a, x i) : x, z ∈ V, x − z ∈ K ,
(1.12)
D = (0, 0, η ) : η > α .
Both of these sets are evidently convex, and they are disjoint by the definition of α,
so they are separated by a hyperplane. That is, there must exist a nonzero vector
(y, u, λ) ∈ W ⊕ V ⊕ R such that
(y, u, λ), (b − φ( x ), z, h a, x i) ≤ (y, u, λ), (0, 0, η ) , (1.13)
or equivalently
hy, b − φ( x )i + hu, zi + λh a, x i ≤ λη, (1.14)
for all x, z ∈ V for which x − z ∈ K and all η > α. Let us observe that there is no
loss of generality in assuming λ ∈ {−1, 0, 1}, as the inequality (1.14) remains true
when the vector (y, u, λ) is rescaled (i.e., multiplied by any positive real number).
We will now draw several conclusions from the fact that (1.14) holds for all
x, z ∈ V for which x − z ∈ K and all η > α.
1. The inequality (1.14) must be true when x = 0 and z = 0, and therefore
hy, bi ≤ λη (1.15)
for all η > α. This implies that λ ≥ 0, for otherwise the right-hand side of the
inequality tends to −∞ as η becomes large while the left-hand side remains
fixed. Thus, λ = −1 is impossible, so λ ∈ {0, 1}.
6
2. For any choice of x ∈ A and z ∈ −K, it is the case that x − z ∈ K. Substituting
these vectors into the inequality (1.14) and rearranging yields
for every η > α. We conclude that u ∈ K∗ , for otherwise the left-hand side
of the above inequality can be made to approach ∞ while the right-hand side
remains bounded, for any fixed η > α, through an appropriate selection of
z ∈ −K.
3. Assume toward contradiction that λ = 0. Fix any choice of x ∈ A ∩ relint(K),
which is possible by the assumption of the statement being proved. The in-
equality (1.14) simplifies to
hu, zi ≤ 0 (1.17)
for every choice of z ∈ V for which x − z ∈ K.
Setting z = x, we have that x − z ∈ K, and therefore
hu, x i ≤ 0. (1.18)
As it was was for x, we find that hu, vi = 0. Seeing that this is true for all v ∈ K,
and recognizing that u ∈ V = span(K), we conclude that u = 0.
But if λ = 0 and u = 0, then we may free the vector x to range over all of V
and set z = x to conclude from (1.14) that
hy, b − φ( x )i ≤ 0 (1.21)
7
The steps just describe have allowed us to conclude that there exist vectors
y ∈ W and u ∈ K∗ such that
for all x, z ∈ V for which x − z ∈ K and all η > α. Setting z = 0 and flipping sign,
we find that
hφ∗ (y) − a, x i ≥ hy, bi − η (1.23)
for all x ∈ K and η > α. This implies
φ ∗ ( y ) − a ∈ K∗ , (1.24)
for otherwise the left-hand side of the previous inequality can be made to ap-
proach −∞ while the right-hand side remains fixed. The vector y is therefore a
dual-feasible point: y ∈ B. Finally, again considering the possibility that x = 0
and z = 0, we conclude that hb, yi ≤ η for all η > α. It is therefore the case that
hb, yi ≤ α, and hence hb, yi = α by weak duality. This concludes the proof (of the
first statement).
maximize: h H1 , X1 i + · · · + h Hn , Xn i
subject to: X 1 + · · · + X n = 1X
X1 , . . . , X n ∈ K
8
We can set this problem up as a conic program as follows. First, observe that
Kn is a closed, convex cone. The problem may therefore be expressed as the primal
problem of a conic program:
maximize: ( H1 , . . . , Hn ), ( X1 , . . . , Xn )
subject to: φ ( X 1 , . . . , X n ) = X 1 + · · · + X n = 1X
( X1 , . . . , X n ) ∈ K n
The symbol = indicates that this is the definition of the function φ, whereas the
ordinary equal sign represents a constraint.
Here is the dual problem, as is dictated by Optimization Problem 1.2:
minimize: 1X , Y
∗
subject to: φ∗ (Y ) − ( H1 , . . . , Hn ) ∈ Kn
Y ∈ Herm(X)
minimize: Tr(Y )
subject to: Y − H1 ∈ K∗
..
.
Y − Hn ∈ K∗
Y ∈ Herm(X)
In summary, the following conic program, expressed in a simplified form, has been
obtained.
9
Now let us consider a specific instance of this conic program. Define four pure
states | ψ1 i, . . . , | ψ4 i ∈ C4 ⊗ C4 as follows:
1 1
| ψ1 i = | 0 i| 0 i + | 1 i| 1 i + | 2 i| 2 i + | 3 i| 3 i = vec(1 ⊗ 1),
2 2
1 1
| ψ2 i = | 0 i| 3 i + | 1 i| 2 i + | 2 i| 1 i + | 3 i| 0 i = vec(σx ⊗ σx ),
2 2
1 i
| ψ3 i = | 0 i| 3 i + | 1 i| 2 i − | 2 i| 1 i − | 3 i| 0 i = vec(σy ⊗ σx ),
2 2
1 1
| ψ4 i = | 0 i| 1 i + | 1 i| 0 i − | 2 i| 3 i − | 3 i| 2 i = vec(σz ⊗ σx ).
2 2
These states were identified by Yu, Duan, and Ying (2012), who proved that they
cannot be perfectly discriminated by a PPT measurement. Cosentino (2013) subse-
quently showed that the optimal PPT discrimination probability is 7/8, by solving
the associated semidefinite program.
We will prove that no separable measurement can discriminate these states with
probability greater than 3/4, assuming that one of the four states is selected uni-
formly at random. This is easily achievable by coarse-graining a measurement with
respect to the standard basis, so this is in fact the optimal probability to correctly
discriminate the states by a separable measurement. Here is a precise description
of the corresponding conic program.
Dual problem
minimize: Tr(Y )
1
subject to: Y − | ψk ih ψk | ∈ Sep(C4 : C4 )∗ (1 ≤ k ≤ 4)
4
Y ∈ Herm(C4 ⊗ C4 )
Our goal will be to describe a dual-feasible solution having objective value 3/4,
for this will then be an upper-bound on the probability of a correct discrimination
by weak duality.
10
Note that the dual-feasibility of a given Y ∈ Herm(C4 ⊗ C4 ) is equivalent to
1
Y− vec(Uk ) vec(Uk )∗ ∈ Sep(C4 : C4 )∗ (1.25)
16
for
U1 = 1 ⊗ 1, U2 = σx ⊗ σx , U3 = iσy ⊗ σx , U4 = σz ⊗ σx . (1.26)
In order to prove the dual-feasibility of a specific choice for Y that will be specified
shortly, we will make use of the following lemma.
is contained in Sep(Cn : Cn )∗ .
This follows from the observation that the vectors Uz and Vz must be orthogonal
unit vectors:
Vz, Uz = z∗ V T Uz = zzT , V T U
(1.29)
= (zzT )T , (V T U )T = − zzT , V T U = 0.
It follows that
hyy∗ ⊗ zz∗ , Z i = (y ⊗ z)∗ Z (y ⊗ z) ≥ 0 (1.30)
for every y ∈ Cn . The required containment follows by convexity.
1
14 ⊗ 14 − (T ⊗ 1)(vec(V ) vec(V )∗ )
Y= (1.31)
16
is dual-feasible. Given that Tr(Y ) = (16 − 4)/16 = 3/4, we have obtained the
claimed upper-bound on correctly discriminating these states by a separable mea-
surement.
11
Lecture 2
In this lecture we will first define the max-relative entropy and observe some of its
properties. We will then define the conditional min-entropy in terms of the quantum
max-relative entropy, derive an alternative characterization of this quantity, and
consider the conditional min-entropy of a few example classes of states.
Before proceeding to the definition of the max-relative entropy, it will be help-
ful to consider the ordinary quantum relative entropy and its relationship to the
conditional quantum entropy as a source of inspiration. Recall that the quantum
relative entropy is defined as follows for all density operators ρ and all positive
semidefinite operators Q acting on the same complex Euclidean space:
(
Tr(ρ log ρ) − Tr(ρ log Q) im(ρ) ⊆ im( Q)
D( ρ k Q ) = (2.1)
∞ im(ρ) 6⊆ im( Q).
We can define this function more generally for any positive semidefinite operator P
in place of the density operator ρ, but our focus will be on the case where the first
argument is a density operator.
One way to think about the quantum relative entropy is that it represents the
loss of efficiency, measured in bits, that is incurred when one plans ahead for Q but
receives ρ instead. This is highly informal, and should not be taken too seriously,
but we will allow this intuitive description to suggest some useful terminology: we
will refer to the second argument Q in the quantum relative entropy as the model,
and to the first argument ρ as the actual state, for the sake of convenience.
Irrespective of how we choose to interpret the quantum relative entropy func-
tion, there is no denying its enormous utility as a “helper function,” through which
fundamental entropic quantities may be defined and analyzed. In particular, the
conditional quantum entropy and the quantum mutual information are defined in
13
terms of the quantum relative entropy as follows:
for all ρ ∈ D(X ⊗ Y). Then, though properties of the quantum relative entropy,
one may establish important properties of the conditional quantum entropy and
quantum mutual information. For example, through the joint convexity of quantum
relative entropy,
one may prove the critically important strong subadditivity property of von Neu-
mann entropy, which may be expressed as
14
Remark 2.2. The same definition may be used when ρ is any positive semidefinite
operator, and not necessarily a density operator. It is common, in particular, that ρ
is taken to be a sub-normalized state, meaning ρ ≥ 0 and Tr(ρ) ≤ 1. In this course,
however, we will focus on the case that ρ is normalized.
1. Dmax (ρ k Q) < ∞
2. im(ρ) ⊆ im( Q) (or, equivalently, ker( Q) ⊆ ker(ρ))
3. D(ρ k Q) < ∞
In particular, the max-relative entropy is finite if and only if the ordinary quantum
relative entropy is finite.
We may also observe that the max-relative entropy can be expressed through a
semidefinite program. More specifically, the max-relative entropy is the logarithm
of the optimal value of the following semidefinite program.
Notice that all four of the problems just suggested are strictly feasible when
im(ρ) ⊆ im( Q). Slater’s theorem therefore implies that strong duality holds under
this assumption for both semidefinite programs, with optimal values always being
achieved in all four problems. Strong duality also holds when im(ρ) 6⊆ im( Q); in
this case the optimal value of both the primal and dual forms in Optimization
Problem 2.3 is positive infinity, while the optimal value of both the primal and
dual forms in Optimization Problem 2.4 is zero.
15
Two alternative characterizations of max-relative entropy
We will now take moment to observe two alternative characterizations of the max-
relative entropy. For the first, observe that if im(ρ) ⊆ im( Q), then the condition
ρ ≤ 2λ Q is equivalent to
p p
Q + ρ Q + ≤ 2λ . (2.6)
Therefore, we have
p p
log Q+ ρ Q+ im(ρ) ⊆ im( Q)
Dmax (ρ k Q) = (2.7)
∞ im(ρ) 6⊆ im( Q).
The largest that the value η can be, assuming we are free to choose R however we
wish, is precisely 2− Dmax (ρkQ) .
If Q = σ is itself a density operator, then necessarily η ∈ [0, 1], and we may
think of this value as being a probability. The simple fact that η ≤ 1 immediately
yields a variant of Klein’s inequality for the max-relative entropy:
Dmax (ρ k σ) ≥ 0, (2.11)
16
with equality if and only if ρ = σ. If, on the other hand, σ is “highly dissimilar” to ρ,
then any convex combination involving ρ and yielding σ must take the probability
η associated with ρ to be small, so Dmax (ρ k σ) must be large. In the extreme case
that im(ρ) 6⊆ im( Q), then any expression of Q taking the form (2.10) must have
η = 0, which is consistent with Dmax (ρ k σ) = ∞.
for all ρ ∈ D(X), Q ∈ Pos(X), and Φ ∈ C(X, Y). In fact, complete positivity is not
required; the inequality (2.12) holds for all Φ positive and trace preserving.
Before we prove that the max-relative entropy is monotonic in the sense just
described, let us noted that we cannot follow a similar route to this fact that we
followed when proving the analogous fact for the ordinary quantum relative en-
tropy in CS 766/QIC 820—which was through the joint convexity of quantum rel-
ative entropy. This is because the max-relative entropy is not jointly convex—and this
is a sense in which it differs from the ordinary quantum relative entropy. The max-
relative entropy is, however, jointly quasi-convex:
!
n n
Dmax ∑ pk ρk ∑ pk Qk ≤ max Dmax (ρk k Qk ).
k∈{1,...,n}
(2.13)
k =1 k =1
The fact that the max-relative entropy is monotonic with respect to the action
of channels, however, is not only true but is almost immediate from the definition
of the max-relative entropy. Specifically, if we have ρ ≤ 2λ Q for some choice of λ,
then Φ(ρ) ≤ 2λ Φ( Q) by the positivity of Φ, from which (2.12) follows. The as-
sumption that Φ preserves trace implies that Tr(Φ(ρ)) = 1, so that it is a suitable
first argument to the max-relative entropy—but this assumption can be dropped
altogether, provided that we’re willing to allow Φ(ρ) as a first argument to the
max-relative entropy function.
17
for all density operators ρ and all positive semidefinite operators Q.
One way to prove this is to use the fact that the logarithm is an operator monotone
function: for all positive definite operators P and Q with P ≤ Q, it is the case that
log( P) ≤ log( Q). This is not a trivial fact to prove, but it is well-known, and you
should have no trouble finding a proof if you search for one.
Now, suppose that λ satisfies ρ ≤ 2λ Q, or equivalently 2−λ ρ ≤ Q. We then
have
D(ρ k Q) = Tr(ρ log ρ) − Tr(ρ log Q) ≤ Tr(ρ log ρ) − Tr ρ log 2−λ ρ = λ, (2.15)
offers an easy route to a proof of this fact. Observe in particular that this implies
that, for every choice of ρ, Q, and a positive integer n, we have
Dmax ρ⊗n Q⊗n = n Dmax (ρ k Q).
(2.18)
The max-relative entropy also obeys the following identity, for any choice of
density operator ρ1 , . . . , ρn , positive semidefinite operators Q1 , . . . , Qn , and a prob-
ability vector ( p1 , . . . , pn ):
!
n n
Dmax ∑ pk | k ih k | ⊗ ρk ∑ pk | k ih k | ⊗ Qk = max Dmax (ρk k Qk ).
k∈{1,...,n}
(2.19)
k =1 k =1
Using the formula Dmax (ρ k Q) = Dmax (ρ kλQ) − log(η ), we obtain this formula
for the situation in which the probabilities p1 , . . . , pn are not included in the blocks
of the second operator:
!
n n
Dmax ∑ pk | k ih k | ⊗ ρk ∑ | k ih k | ⊗ Qk
k =1 k =1 (2.20)
= max Dmax (ρk k Qk ) + log( pk ) .
k∈{1,...,n}
18
In contrast, the ordinary quantum relative entropy obeys this equation:
!
n n n
D ∑ pk | k ih k | ⊗ ρk ∑ pk | k ih k | ⊗ Qk = ∑ p k D( ρ k k Q k ). (2.21)
k =1 k =1 k =1
Using the equation D(ρ k Q) = D(ρ kλQ) − log(η ), we then conclude that
!
n n n
D ∑ pk | k ih k | ⊗ ρk ∑ | k ih k | ⊗ Qk = ∑ p k D( ρ k k Q k ) − H( p ). (2.22)
k =1 k =1 k =1
the infimum value is always obtained when σ = ρ[Y ]. With this fact in mind, we
define the conditional min-entropy as follows.
Definition 2.5. Let X and Y be registers and let ρ ∈ D(X ⊗ Y) be a state of these
registers. The conditional min-entropy of X given Y for the state ρ is defined as
Hmin (X | Y )ρ = − inf Dmax (ρ k1X ⊗ σ). (2.25)
σ ∈D(Y)
Remark 2.6. It is not, in general, the case that the infimum in (2.25) is achieved
when σ = ρ[Y ].
By expanding the definition of the max-relative entropy, one may alternatively
express the conditional min-entropy in the following way:
19
Semidefinite program for conditional min-entropy
It is evident from (2.26) that the conditional min-entropy can be expressed as a
semidefinite program. In particular, the quantity Hmin (X|Y )ρ is the negative loga-
rithm of the optimal value of the following semidefinite program.
The dual problem is clearly consistent with the expression (2.26), whereas the
primal problem corresponds (essentially) to an optimization of a linear function
(represented by the state ρ) over all channels Φ ∈ C(Y, X). There is a useful and
intuitive way to think about this optimization, but first we will take a moment to
introduce a useful concept, the transpose of a channel.
Definition 2.8. Let X and Y be complex Euclidean spaces and let Φ ∈ T(Y, X). The
transpose of Φ is the unique map ΦT ∈ T(X, Y) satisfying the equation
Here is a short list of facts concerning this notion, all of which are straightforward
to prove.
1. (ΦT )T = Φ.
2. The map Φ 7→ ΦT from T(Y, X) to T(X, Y), is linear, one-to-one, and onto.
3. ΦT ∈ CP(X, Y) if and only if Φ ∈ CP(Y, X).
4. ΦT is unital if and only if Φ preserves trace.
Finally, one may observe that ΦT is (as you might have guessed) the map that
is obtained by taking any Kraus representation of Φ and transposing the Kraus
operators.
20
Returning to Optimization Problem 2.7, let us consider the set A of primal fea-
sible operators, which can be expressed in multiple ways based on the facts about
the transpose of a map just listed:
A = { X ∈ Pos(X ⊗ Y) : TrX ( X ) = 1Y }
= Φ ⊗ 1L(Y) vec(1Y ) vec(1Y )∗ : Φ ∈ C(Y, X)
(2.30)
= 1L(X) ⊗ ΦT vec(1X ) vec(1X )∗ : Φ ∈ C(Y, X)
where
1 n
n a,b∑
τ= | a ih b | ⊗ | a ih b | and n = dim(X). (2.33)
=1
2.3 Examples
We will conclude the lecture by considering the conditional min-entropy of a few
classes of states.
Example 2.9. Suppose Y is trivial (i.e., one-dimensional), so that ρ ∈ D(X). We
then find that
Hmin (X | Y )ρ = − inf Dmax (ρ k1X ⊗ σ)
σ ∈D(Y)
= − Dmax (ρ k 1X ) (2.34)
= − log λ1 (ρ).
Naturally, we omit the register Y from this notation when it is trivial:
Hmin (X)ρ = Hmin (ρ) = − log λ1 (ρ). (2.35)
21
Example 2.10. Suppose ρ = σ ⊗ ξ for σ ∈ D(X) and ξ ∈ D(Y). Through a similar
calculation to the previous example, we find that
= − Dmax (σ k 1X ) (2.36)
= Hmin (X)σ .
This is natural: if the registers X and Y are completely uncorrelated, the conditional
min-entropy of X given Y is simply the min-entropy of X.
Example 2.11. Next, suppose that we have a separable state: ρ ∈ SepD(X : Y).
Then, for any channel Ξ ∈ C(Y, X) we have
applying a channel locally to one part of a separable state always results in a sepa-
rable state. The inner-product between any separable state and the canonical max-
imally entangled state τ is at most 1/n (as we proved in CS 766/QIC 820), and
therefore
1
2− Hmin (X|Y)ρ = n · 1L(X) ⊗ Ξ (ρ), τ ≤ n · = 1.
sup (2.38)
Ξ∈C(Y,X) n
Example 2.12. Suppose that the τ can be recovered perfectly by applying a channel
locally to Y for the state ρ ∈ D(X ⊗ Y). This is equivalent to ρ taking the form
for some choice of a density operator ξ ∈ D(Z) and an isometry V ∈ U(X ⊗ Z, Y).
Then we have
Hmin (X|Y )ρ = − log(n), (2.40)
which is the minimum possible value for the conditional min-entropy.
22
We find that
2− Hmin (X|Y)ρ = n · 1L ( X ) ⊗ Ξ ( ρ ) , τ
sup
Ξ∈C(Y,X)
n (2.42)
= sup ∑ p(a)h a | Ξ(ξ a )| a i,
Ξ∈C(Y,X) a=1
with the simplification to the second line being possible because ρ is a classical-
quantum state. This has the following intuitive meaning: Hmin (X|Y )ρ is the neg-
ative logarithm of the optimal correctness probability to identify a state chosen
randomly according to the ensemble corresponding to ρ.
23
Lecture 3
Recall the definition of the max-relative entropy, which was introduced in the pre-
vious lecture:
Dmax (ρ k Q) = inf λ ∈ R : ρ ≤ 2λ Q .
(3.1)
In this lecture we will consider what happens when we minimize this function
over various choices of ρ and Q. Two common situations in which this is done are
as follows:
ε
Dmax (ρ k Q) = inf Dmax (ξ k Q), (3.2)
ξ ∈Bε ( ρ )
where Bε (ρ) denotes the set of states that are ε-close to ρ with respect to some
notion of distance.
2. Optimizing over models. For a given ρ and a convex set C of possible choices of
models Q, we may consider the quantity
which measures in a certain sense which Q ∈ C incurs the least loss of effi-
ciency when serving as a model for the state ρ.
25
3.1 Smoothed max-relative entropy
Let us begin with the smoothed max-relative entropy, where one takes the min-
imum value of the max-relative entropy over all choices of actual states that are
close to a given state, as suggested above. The idea is that the smoothed max-
relative entropy reflects a tolerance for small errors, which we often have or would
like to express when analyzing operationally defined notions. Without smoothing,
the max-relative entropy can sometimes, in certain settings at least, have unwanted
hyper-sensitivities that smoothing eliminates.
Definition
As it turns out, there is not a single agreed upon definition for the smoothed max-
relative entropy; different authors sometimes choose different notions of distance
with respect to which the smoothing is done, which translates to different choices
for the set Bε (ρ) in (3.2). In addition, the operator ξ is sometimes allowed to range
not only over density operators, but also over sub-normalized density operators,
and in this case the definition of the max-relative entropy is extended in the most
straightforward way to accommodate such operators.
It is typically the case, however, that the notions of distance with respect to
which the smoothed max-relative entropy is defined are based on either the trace
distance or the fidelity function. Through the Fuchs–van de Graaf inequalities, one
finds that the resulting definitions of smoothed max-relative entropy are roughly
equivalent, and are certainly quite similar in a qualitative sense.
For the sake of concreteness, we will define the smoothed max-relative entropy
in terms of the trace distance, as the following definition makes precise.
where
Bε ( ρ ) = ξ ∈ D (X ) : 1
2 k ρ − ξ k1 ≤ε . (3.5)
26
Optimizing over arbitrary closed and convex sets of states
To better understand the smoothed max-relative entropy, it is helpful to consider
a more general set-up. Suppose that C ⊆ D(X) is any closed and compact set of
density operators, let Q ∈ Pos(X) be given, and consider the problem of minimiz-
ing η over all choices of ξ ∈ C and η ∈ R that satisfy ξ ≤ ηQ. This problem can be
expressed as a conic problem as will now be described.
First, define a set K ⊂ R ⊕ Herm(X) ⊕ R ⊕ Herm(X) as follows:
This set is a closed and convex cone. One may think of K as being the Cartesian
product of three sets: the first is
L = (λ, λξ ) : λ ≥ 0, ξ ∈ C ,
(3.8)
the second is the set of all nonnegative real numbers η ≥ 0, and the third is Pos(X).
All three of these sets are closed and convex cones; in the case of L this follows
from the assumption that C is a compact and convex set. It should be noted that
the construction of a convex cone L from a convex set C like this is both common
and useful.
Now, the optimization problem suggested above is evidently equivalent to the
problem of minimizing the inner-product
over all (λ, λξ, η, P) ∈ K, subject to the affine linear constraints that
The optimization problem being considered is therefore the primal form of a conic
program, which is stated below (together with a simplified expression of its dual
form) as Optimization Problem 3.2.
The dual form of this optimization problem is given by the maximization of the
objective function
(1, 0), (µ, Z ) (3.13)
27
subject to the constraint
where (µ, Z ) ranges over the space R ⊕ Herm(X) (as is dictated by Optimization
Problem 1.4 in Lecture 1). To simplify this problem, we must compute the adjoint
map φ∗ and try to understand what K∗ looks like. The adjoint of φ may simply be
computed, and one obtains
It can be shown, through the use of Slater’s theorem, that strong duality always
holds for Optimization Problem 3.2.
28
Conic program for smoothed max-relative entropy
At this point, one may substitute the set Bε (ρ) for C in Optimization Problem 3.2 to
obtain a conic program for the smoothed max-relative entropy. To be precise, the
smoothed max-relative entropy Dmax ε
(ρk Q) is the logarithm of the optimal value
of this conic program. One may also note that if C = {ρ}, then the semidefinite
program for the ordinary (non-smoothed) max-relative entropy is recovered.
Naturally, other notions of smoothing can be considered by making alternative
choices for the set C.
29
Example 3.4. We have, in fact, already seen an example in which we minimize
over models drawn from a convex set: the conditional min-entropy. The conditional
min-entropy of X given Y, for the state ρ ∈ D(X ⊗ Y), is given by
which is consistent with the primal problem in our semidefinite program for the
conditional min-entropy from Lecture 2.
(where it should be stressed that it is the ordinary quantum relative entropy, not
the max-relative entropy, that appears in this equation) as a measure of distance
(or divergence) of ρ from C. For example, the relative entropy of entanglement of a
state ρ ∈ D(Y ⊗ Z) is given by
We may consider a similar notion for the max-relative entropy in place of the
ordinary relative entropy:
To better understand this quantity, let us expand the definition of the max-relative
entropy, so that we obtain
Dmax (ρ k C) = inf λ ∈ R : ρ ≤ 2λ σ, σ ∈ C .
(3.29)
30
2− λ ρ + (1 − 2− λ ) ξ ξ
C
ρ D(X)
Figure 3.1: Dmax (ρ k C) is the minimum value of λ for which 2−λ ρ + (1 − 2−λ )ξ is
in C, for some choice of a density operator ξ.
As ρ and σ are density operators, the slack variable P must have trace equal to
2λ − 1 in order for the equality ρ + P = 2λ σ to hold. We may therefore replace P
by (2λ − 1)ξ for a density operator ξ, and we obtain
which is equivalent to
This expression reveals that the quantity Dmax (ρ k C) has the simple and intuitive
interpretation suggested by Figure 3.1. For λ = Dmax (ρ k C), the quantity 2λ − 1 is
sometimes called the (global or generalized) robustness of ρ with respect to C.
This value is the logarithm of the optimal value of the following conic program.
31
Optimization Problem 3.5
32
Lecture 4
In this lecture we will prove an important theorem concerning the smoothed max-
relative entropy, which is that by regularizing the smoothed max-relative entropy
we obtain the ordinary quantum relative entropy:
ρ⊗n σ ⊗n
ε
Dmax
lim = D( ρ k σ ) (4.1)
n→∞ n
for all density operators ρ, σ ∈ D(X) and all ε ∈ (0, 1). For the sake of clarity, recall
that we define the smoothed max-relative entropy with respect to trace-distance
smoothing:
ε
Dmax (ρ kσ) = inf Dmax (ξ kσ) (4.2)
ξ ∈Bε ( ρ )
where
Bε ( ρ ) = ξ ∈ D (X ) : 1
2 k ρ − ξ k1 ≤ε . (4.3)
Bibliographic remarks
Lemma 4.4, which in some sense is the engine that drives the proof we will discuss,
is due to Bjelaković and Siegmund-Schultze (arXiv:quant-ph/0307170), who used
it to prove the so-called quantum Stein lemma, and through it obtained an alternative
proof of the monotonicity of quantum relative entropy.
The more direct route from Bjelaković and Siegmund-Schultze’s lemma to the
regularization (4.1) to be followed in this lecture appears in the following as-of-yet
unpublished manuscript:
33
4.1 Strong typicality
The general notion of typicality is fundamentally important in information theory,
and there is a sense in which it goes hand-in-hand with the concept of entropy. We
will begin the lecture with a brief and directed summary of strong typicality, which
is a particular formulation of typicality that is convenient for the proof.
First let us introduce some notation. Supposing that Σ is an alphabet, for every
string a1 · · · an ∈ Σn and symbol a ∈ Σ, we write
N ( a | a1 · · · an ) = {k ∈ {1, . . . , n} : ak = a} , (4.4)
which is simply the number of times the symbol a occurs in the string a1 · · · an .
With respect to that notation, strong typicality is defined as follows.
N ( a | a1 · · · a n )
− p( a) ≤ p( a)δ (4.5)
n
for every a ∈ Σ. The set of all δ-strongly typical strings of length n with respect
to p is denoted Sn,δ ( p).
What the definition expresses is that the proportion of each symbol in a strongly
typical string is approximately what one would expect if the individual symbols
were chosen independently at random according to the probability vector p. No-
tice that because it is the quantity p( a)δ, as opposed to δ, that appears on the right-
hand side of the inequality in the definition, we have that the error tolerance for the
frequency with which each symbol appears shrinks proportionately as the proba-
bility for that symbol to appear shrinks—and if p( a) = 0 for some a ∈ Σ, then a
strongly typical string cannot include the symbol a at all.
Next we will prove two basic facts concerning the notion of strong typicality.
The two facts are stated as the lemmas that follow.
Lemma 4.2. Let Σ be an alphabet, let p ∈ P(Σ) be a probability vector, let n be a positive
integer, and let δ > 0 be a positive real number. It is the case that
34
Proof. Suppose first that a ∈ Σ is fixed, and consider the probability that a string
a1 · · · an ∈ Σn , where each symbol is selected independently at random according
to the probability vector p, satisfies
N ( a | a1 · · · a n )
− p( a) > p( a) δ. (4.7)
n
To upper-bound this probability, one may define X1 , . . . , Xn to be independent and
identically distributed random variables, taking value 1 with probability p( a) and
value 0 otherwise, so that the probability of the event (4.7) is equal to
X1 + · · · + X n
Pr − p( a) > p( a) δ . (4.8)
n
If it is the case that p( a) > 0, then Hoeffding’s inequality implies that
X1 + · · · + X n
− p( a) > p( a) δ ≤ 2 exp −2nδ2 p( a)2 ,
Pr (4.9)
n
while it is the case that
X1 + · · · + X n
Pr − p( a) > p( a) δ = 0 (4.10)
n
in case p( a) = 0. The lemma follows from the union bound.
Lemma 4.3. Let Σ be an alphabet, let p ∈ P(Σ) be a probability vector, let n be a positive
integer, let δ > 0 be a positive real number, let a1 · · · an ∈ Sn,δ ( p) be a δ-strongly typical
string with respect to p, and let φ : Σ → [0, ∞) be a nonnegative real-valued function.
The following inequality is satisfied:
φ ( a1 ) + · · · + φ ( a n )
− ∑ p ( a ) φ ( a ) ≤ δ ∑ p ( a ) φ ( a ). (4.11)
n a∈Σ a∈Σ
Proof. The inequality (4.11) follows from the definition of strong typicality together
with the triangle inequality:
φ ( a1 ) + · · · + φ ( a n )
− ∑ p( a)φ( a)
n a∈Σ
N ( a | a1 · · · a n )
= ∑ − p( a) φ( a) (4.12)
a∈Σ
n
N ( a | a1 · · · a n )
≤ ∑ n
− p ( a ) φ ( a ) ≤ δ ∑ p ( a ) φ ( a ),
a∈Σ a∈Σ
as required.
35
4.2 Lemmas
Next we will prove two lemmas that are needed for the proof of the main theo-
rem to which this lecture is devoted. The first of these lemmas is the one due to
Bjelaković and Siegmund-Schultze mentioned at the start of the lecture.
Lemma 4.4. Let ρ, σ ∈ D(X) be density operators for which im(ρ) ⊆ im(σ ) and let
δ > 0 be a positive real number. There exist positive real numbers K and µ such that, for
every positive integer n, there exists a projection operator Πn acting on X⊗n satisfying
Πn , σ⊗n = 0,
Toward a verification that K and µ satisfying the requirements of the lemma, let
Πn = ∑ x a1 x ∗a1 ⊗ · · · ⊗ x an x ∗an , (4.18)
a1 ··· an ∈Sn,δ ( p)
36
follows directly from Lemma 4.2.
It remains to prove the inequalities in (4.14). As
The next lemma is just a simple technical fact concerning the inner-product of
a product of projection operators with a density operator.
Lemma 4.5. Let ρ ∈ D(X) be a density operator, let ε > 0 be a positive real number, and
let Π and ∆ be projection operators on X that satisfy hΠ, ρi ≥ 1 − ε and h∆, ρi ≥ 1 − ε.
It is the case that
h∆Π∆, ρi ≥ 1 − 6ε. (4.23)
and therefore
h(1 − ∆)(1 − Π), ρi ≥ −ε. (4.28)
37
Consequently,
h∆Π, ρi ≥ 1 − 3ε, (4.29)
and therefore
∆Π∆, ρ ≥ (1 − 3ε)2 ≥ 1 − 6ε, (4.30)
as required.
Remark 4.6. If ∆ commutes with ρ, then the bound established by the previous
lemma can be improved to
h∆Π∆, ρi ≥ 1 − 2ε. (4.31)
To see that this is so, note that
and therefore
Remark 4.7. The proof of the lemma above can be extended to the assumptions
hΠ, ρi ≥ 1 − ε and h∆, ρi ≥ 1 − δ to obtain
√ 2
h∆Π∆, ρi ≥ 1 − ε − δ − εδ . (4.35)
We’re also going to make use of Winter’s gentle measurement lemma, which is
a very useful and well-known fact.
Proof. Observe that for any two positive semidefinite operators Q and R, it is nec-
essarily the case that F( R, QRQ) = h R, Qi. Indeed,
q√ √ q √ √ 2 √ √
RQRQ R = RQ R = RQ R, (4.37)
38
and therefore
q√ √ √ √
F( R, QRQ) = Tr RQRQ R = Tr RQ R = h R, Qi. (4.38)
By this formula, along with the square root scaling of the fidelity function, one
finds that
√ √ ! √
Pρ P 1 √ √ P, ρ
F ρ, =p F ρ, Pρ P = p . (4.39)
h P, ρi h P, ρi h P, ρi
√
under the assumption 0 ≤ P ≤ 1, it is the case that
Finally, √ P ≥ P, and
therefore P, ρ ≥ h P, ρi, from which the lemma follows.
Theorem 4.10. Let ρ, σ ∈ D(X) be density operators. For every ε ∈ (0, 1), the following
equality holds.
ρ⊗n σ ⊗n
ε
Dmax
lim = D( ρ k σ ). (4.40)
n→∞ n
Proof. Let us first consider the case that im(ρ) 6⊆ im(σ). In this case the right-hand
side of (4.40) is infinite, and so we must prove the same for the left-hand side.
ε
This follows from the fact that Dmax (ρ⊗n σ⊗n ) is infinite for all but finitely many
positive integers n. Indeed, let Λ be the projection onto im(σ), so that σ = ΛσΛ
and hΛ, ρi < 1. Now, for a given choice of n, one has that if ξ n ∈ D(X⊗n ) satisfies
im(ξ n ) ⊆ im(σ⊗n ), then
⊗n ⊗n ⊗n ⊗n
q n
F ξn, ρ ) = F ξn, Λ ρ Λ ≤ Tr Λ⊗n ρ⊗n Λ⊗n < hΛ, ρi 2 ,
(4.41)
and therefore
1 n
ξ n − ρ⊗n 1 > 1 − hΛ, ρi 2 (4.42)
2
by one of the Fuchs–van de Graaf inequalities. The right-hand side of this inequal-
ity must exceed ε for all but finitely many positive integers n, by the fact that
hΛ, ρi < 1. It follows that for all but finitely many positive integers n, there are
no elements of Bε (ρ⊗n ) whose images are contained in im(σ⊗n ), which implies for
ε
any such n that Dmax (ρ⊗n kσ⊗n ) = ∞.
39
For the remainder of the proof it will be assumed that im(ρ) ⊆ im(σ ). Let
δ > 0 be chosen arbitrarily, and for each positive integer n, let Πn be the projection
whose existence is guaranteed by Lemma 4.4 for ρ, σ, δ, and n, and also let ∆n be
the projection whose existence is guaranteed by Lemma 4.4, again for ρ, δ, and n,
but where σ is replaced by ρ. Thus, we have [Πn , σ⊗n ] = 0 and [∆n , ρ⊗n ] = 0, and
the following inequalities are satisfied:
ρ⊗n σ ⊗n
ε
Dmax
lim ≤ D( ρ k σ ). (4.47)
n→∞ n
by Remark 4.6. It is therefore the case that h∆n , ρ⊗n i, hΠn , ρ⊗n i, and h∆n Πn ∆n , ρ⊗n i
are all positive for all but finitely many n, and we will restrict our attention to those
n for which these values are all positive.
Define a density operator
Πn ∆n ρ⊗n ∆n Πn
ξn = (4.49)
h∆n Πn ∆n , ρ⊗n i
for all values of n under consideration. We will begin by proving a bound on the
trace distance between ξ n and ρ⊗n . To this end, observe that
1 1 1
ξ n − ρ⊗n 1
≤ ξ n − τn 1
+ τn − ρ⊗n 1
(4.50)
2 2 2
for
∆n ρ⊗n ∆n
τn = , (4.51)
∆n , ρ⊗n
and notice that
Πn τn Πn
ξn = . (4.52)
hΠn , τn i
40
By Winter’s gentle measurement lemma and one of the Fuchs–van de Graaf in-
equalities, we find that
1 1 Πn τn Πn
q
ξ n − τn = − τn ≤ 1 − hΠn , τn i, (4.53)
2 1 2 hΠn , τn i 1
and because
h∆n Πn ∆n , ρ⊗n i
hΠn , τn i = ≥ h ∆ n Π n ∆ n , ρ ⊗ n i, (4.54)
h∆n , ρ⊗n i
we obtain
1
q q
ξ n − τn 1
≤ 1 − h ∆n Πn ∆n , ρ⊗n i ≤ 2K exp(−µn). (4.55)
2
Along similar (although simpler) lines, we find that
1
q q
⊗n
τn − ρ 1
≤ 1 − h∆n , ρ i ≤ K exp(−µn).
⊗ n (4.56)
2
These upper bounds are decreasing to 0 (exponentially quickly, as it happens), and
therefore
1
ξ n − ρ⊗n 1 ≤ ε, (4.57)
2
or equivalently ξ n ∈ Bε ρ⊗n , implying
for all but finitely many n. Let us further restrict our attention to these values of n.
Next, we will use the inequalities (4.43) and (4.44) to obtain an upper-bound on
Dmax ξ n σ ⊗ n . First, by (4.44), together with ∆n ≤ 1 and Πn = Πn , we find that
2
Second, by (4.44), together with the fact that [Πn , σ⊗n ] = 0, we have
Accounting for the normalization of ξ n and making use of (4.48), we find that
41
At this point we may conclude that
ρ⊗n σ ⊗n
ε
Dmax
lim ≤ D(ρkσ) + δ(H(ρ) − Tr(ρ log(σ))), (4.63)
n→∞ n
and as δ was an arbitrarily chosen positive real number, we obtain the required
inequality (4.47).
Now we will prove the reverse inequality
ρ⊗n σ ⊗n
ε
Dmax
lim ≥ D( ρ k σ ). (4.64)
n→∞ n
Let δ > 0 again be chosen arbitrarily. For every positive integer n it is the case that
by (4.43), as well as
For every ξ n ∈ Bε (ρ⊗n ) we have, by virtue of the fact that ρ⊗n − ξ n is traceless and
0 ≤ Πn ∆n Πn ≤ 1, that
1 ⊗n
Πn ∆n Πn , ρ⊗n − ξ n ≤ ρ − ξn 1
≤ε (4.69)
2
and therefore
hΠn ∆n Πn , ξ n i = hΠn ∆n Πn , ρ⊗n i + hΠn ∆n Πn , ξ n − ρ⊗n i
(4.70)
≥ 1 − 6K exp(−µn) − ε
by Lemma 4.5. Consequently,
42
Given the assumption ε ∈ (0, 1), one concludes that log 1 − 6K exp(−µn) − ε
converges to a constant value as n goes to infinity. It follows that
ρ⊗n σ ⊗n
ε
Dmax
lim ≥ D(ρkσ) − δ(H(ρ) − Tr(ρ log(σ))). (4.72)
n→∞ n
Once again, as δ was an arbitrarily chosen positive real number, the required in-
equality (4.64) follows.
Remark 4.11. An alternative way to argue the closeness of ξ n to ρ⊗n is to use a
different known equality concerning the fidelity function, which is that
F AA∗ , BB∗ = A∗ B 1
(4.73)
for any choice of operators A, B ∈ L(X, Y). (This fact is closely connected with
Uhlmann’s theorem, and can be found as Lemma 3.21 in my book Theory of Quan-
tum Information.) In the present case, we obtain
q q
⊗n ⊗n
F Πn ∆n ρ ∆n Πn , ρ ρ ∆n Πn ρ⊗n
⊗
= n
1
q q (4.74)
⊗n
≥ Tr ρ ∆n Πn ρ
⊗ n ⊗ n = ∆n Πn ∆n , ρ ,
where the last equality makes use of [∆n , ρ⊗n ] = 0. A suitable bound on the trace
distance between ξ n and ρ⊗n is obtained through the Fuchs–van de Graaf inequal-
ities.
And we will conclude with two corollaries.
It is not important that σ is a density operator in Theorem 4.10—it is true for
arbitrary positive semidefinite models. The only part of the proof that depends on
the scaling of σ occurs in the proof of Lemma 4.4, where q ∈ P(Σ) implies that
φ( a) = − log(q( a)) is nonnegative. Although it would not be difficult to modify
this portion of the proof slightly to handle arbitrary positive semidefinite models,
it is perhaps simpler to observe it as a fairly straightforward corollary of Theo-
rem 4.10.
Corollary 4.12. If ρ is a density operator and Q is any positive semidefinite operator, we
have
ρ⊗n Q⊗n
ε
Dmax
lim = D( ρ k Q ). (4.75)
n→∞ n
Proof. Let σ = Q/ Tr( Q). Then
ρ⊗n Q⊗n = Dmax ρ⊗n Tr( Q)n σ⊗n
ε
ε
Dmax
ρ⊗n σ⊗n − n log(Tr( Q))
ε
= Dmax
43
so
ρ⊗n Q⊗n ρ⊗n σ ⊗n
ε
ε
Dmax Dmax
lim = lim − log(Tr( Q))
n→∞ n n→∞ n
= D(ρkσ) − log(Tr( Q))
= D( ρ k Q ),
as required.
Φ (ρ )⊗n Φ (σ )⊗n
ε
Dmax
D(Φ(ρ)k Φ(σ )) = lim
n→∞ n
Dmax Φ (ρ⊗n ) Φ⊗n (σ⊗n )
⊗ n
ε
= lim
n→∞ n (4.77)
ε
D (ρ ⊗ n ⊗
σ n)
≤ lim max
n→∞ n
= D( ρ k σ ),
as required.
44
Lecture 5
where
√ p
F(ρ, Q) = ρ Q (5.2)
1
Remark 5.2. In the case that F(ρ, Q) = 0, which is equivalent to im(ρ) ⊥ im( Q),
one is to interpret that Dmin (ρk Q) = ∞.
45
Elementary observations
Here are a couple of relevant properties of the min-relative entropy that follow
directly from known properties of the fidelity function.
1. A variant of Klein’s inequality holds for the min-relative entropy. That is, if
ρ, σ ∈ D(X) are density operators, then Dmin (ρkσ) ≥ 0, with equality if and
only if ρ = σ.
2. The min-relative entropy is monotonic with respect to the action of channels.
That is, for every choice of ρ ∈ D(X), Q ∈ Pos(X), and Φ ∈ C(X, Y), it is the
case that
Dmin (Φ(ρ)kΦ( Q)) ≤ Dmin (ρk Q).
This is true, in fact, for all positive and trace-preserving maps Φ ∈ T(X, Y).
One can identify additional properties of the min-relative entropy through its very
direct connection to the fidelity function, which we know to have many interesting
and remarkable properties.
Theorem 5.3. Let ρ ∈ D(X) be a density operator and let Q ∈ Pos(X) be a positive
semidefinite operator, for X a complex Euclidean space. It is the case that
Proof. The theorem is trivial in the case im(ρ) 6⊆ im( Q), as the right-hand side of
(5.3) is infinite in this case, so the remainder of the proof is focused on the case
im(ρ) ⊆ im( Q). There is no loss of generality in assuming that Q is positive defi-
nite in this case, as the values of Dmin (ρk Q) and D(ρk Q) then do not change if X
is replaced by im( Q).
Define a function φ : (−1, 1) → R as
We are using the natural logarithm because it will simplify the calculus that will
soon be considered. It is not really important to the proof that this function is de-
fined on the entire interval (−1, 1), we only require that the function is defined on
46
the interval [0, 1/2] and is differentiable at α = 0. But, in any case, φ is differen-
tiable at every point α ∈ (−1, 1), with the derivative being given by
1−α Qα (ln( ρ ) − ln( Q ))
Tr ρ
φ0 (α) = . (5.5)
Tr ρ1−α Qα
1
φ0 (0) = Tr(ρ ln(ρ)) − Tr(ρ ln( Q)) = D( ρ k Q ). (5.6)
log(e)
Finally, noting that φ(0) = 0, we see that the theorem will follow from a demon-
stration that
φ(1/2) − φ(0)
φ 0 (0) ≥ . (5.9)
1/2
This in turn will follow from a demonstration that φ is a concave function.
To prove that φ is concave, it suffices to compute its second derivative and
observe that its value is non-positive. To make this as simple as possible, and to
avoid a messy calculation, let us use the spectral theorem to write
φ0 (α) = ∑
r a,b (α) ln( p( a)) − ln(q(b)) . (5.12)
( a,b)∈Σ×Γ
47
We may then express the second derivative of φ as
!2
φ00 (α) = ∑
r a,b (α) ln( p( a)) − ln(q(b))
( a,b)∈Σ×Γ (5.13)
∑
2
− r a,b (α) ln( p( a) − ln(q(b)) .
( a,b)∈Σ×Γ
for every α ∈ (−1, 1), we find that φ00 (α) is non-positive by Jensen’s inequality,
which completes the proof.
Equivalently,
Hmax (X|Y )ρ = sup log F(ρ, 1X ⊗ σ )2 .
(5.16)
σ ∈D(Y)
Writing n = dim(X) and ω = 1X /n, we see from the definition of the conditional
max-entropy that
48
Semidefinite program for conditional max-entropy
One way to compute the value Hmax (X|Y )ρ is to use the semidefinite program for
the fidelity function that was discussed in CS 766/QIC 820, obtaining the following
semidefinite program.
Remark 5.6. The dual problem may be simplified to obtain the expression
hρ, Z i k TrX ( Z −1 )k
α = inf + (5.21)
Z >0 2 2
for the optimal value of Optimization Problem 5.5. Using the arithmetic-geometric
mean inequality, one may conclude that
Examples
We may again consider a few examples of classes of states, to gain some intuition
on the conditional max-entropy.
Example 5.7. For any choice of σ ∈ D(X) and ξ ∈ D(Y), it is the case that
√
Hmax (X|Y )σ⊗ξ = 2 log Tr σ. (5.23)
49
Example 5.8. Calculating the conditional max-entropy Hmax (X|Y )ρ for a classical-
quantum state ρ of the form
n
ρ= ∑ pk | k ih k | ⊗ ξ k (5.24)
k =1
yields
!2
n
√
Hmax (X|Y )ρ = log sup ∑ p k F( ξ k , σ ) . (5.25)
σ ∈D(Y) k =1
1 n
n a,b∑
τ= | a ih b | ⊗ | a ih b | (5.26)
=1
for some choice of a density operator ξ ∈ D(Z) and an isometry V ∈ U(X ⊗ Z, Y).
Then we have
Hmax (X|Y )ρ = − log(n), (5.28)
just like the conditional min-entropy and conditional quantum entropy.
Theorem 5.10. Let X, Y, and Z be registers and assume the triple (X, Y, Z) is in a pure
state uu∗ , for u ∈ X ⊗ Y ⊗ Z a unit vector. It is the case that
50
Among the many nice properties that the fidelity function possesses is the fact that
if Q0 , Q1 ∈ Pos(U) and P0 ∈ Pos(U ⊗ V) satisfies TrV ( P0 ) = Q0 , then
for some state σ ∈ D(Z), as operators of the form vec(1X ) vec(1X )∗ ⊗ σ are the
only operators that leave vec(1X ) vec(1X )∗ when Z is traced out. The fidelity is
nondecreasing under the partial trace on the second tensor factor of X, and there-
fore 2
2− Hmin (X|Y) ≤ F TrY (uu∗ ), 1X ⊗ σ ≤ 2Hmax (X|Z) . (5.35)
Now let us prove the reverse inequality
for some choice of a channel Ξ ∈ C(Y, Z ⊗ X). Because the fidelity is nondecreasing
under the partial trace on both copies of Z, we obtain
∗ 2
2Hmax (X|Z) ≤ F 1L(X) ⊗ Φ (TrZ (uu∗ )), vec 1X vec 1X
(5.41)
as required.
51
5.3 Hypothesis-testing relative entropy
We will define the hypothesis-testing relative entropy as follows.
Definition 5.11. Let ρ ∈ D(X), Q ∈ Pos(X), and ε ∈ [0, 1]. The ε-hypothesis-testing
relative entropy of ρ with respect to Q is defined as
Elementary observations
Before we try to understand the intuitive meaning of this quantity, let us note a
few simple things about it.
ε
First, we see that DH (ρk Q) = ∞ is possible:
1. D0H (ρk Q) = ∞ if and only if im(ρ) 6⊆ im( Q).
2. For ε ∈ (0, 1) we have DH
ε
(ρk Q) = ∞ if and only if
Πker(Q) , ρ ≥ ε. (5.44)
This quantity has also been called the min-relative entropy by some, but obviously
we will not use this name given that we have already used it for something else—
the name 1-hypothesis-testing relative entropy will do just fine.
52
Semidefinite programming characterization
The definition of the hypothesis-testing relative entropy immediately suggests a
ε
semidefinite programming characterization. Specifically, the value DH (ρk Q) is the
negative logarithm of the optimal value of the following semidefinite program.
The primal problem is strictly feasible provided that ε ∈ [0, 1). In particular,
( (1+ ε )1
2ε if ε ∈ (0, 1)
X= (5.47)
21 if ε = 0
is strictly primal feasible. Strict primal feasibility is, on the other hand, impossible
when ε = 1. The dual problem is strictly feasible when ε ∈ (0, 1], by λ = 1 and
Y = 21/ε for instance. In the case ε = 0, the dual problem is strictly feasible if and
only if im(ρ) ⊆ im( Q).
Therefore, strong duality holds and the optimal values are achieved for all
choices of ρ and Q when ε ∈ (0, 1), by Slater’s theorem. We also have strong duality
and an optimal value achieved in the primal problem when ε = 1, again by Slater’s
theorem, as X = 1 is primal feasible (although not strictly so); and strong duality
also holds in the case ε = 0, as our examination of Optimization Problem 2.4 has
already revealed.
Interpretation
One way to interpret the ε-hypothesis-testing relative entropy, at least in the case
ε > 0, begins with observation that
h Q, Pi
ε
DH (ρk Q) = − inf log : 0 ≤ P ≤ 1, hρ, Pi ≥ ε . (5.48)
ε
For the sake of the discussion that follows, let us consider the case that Q = σ is a
density operator.
53
We can now consider a test being performed that aims to distinguish between
the states ρ and σ. Think of σ as representing an idealized model for a state, corre-
sponding to the null-hypothesis of the test being performed, whereas ρ is the actual
state of the system being tested. The operator P may be associated with a mea-
surement operator, the outcome corresponding to which is to be seen as a signal
supporting an alternative hypothesis. If we were to measure ρ with respect to such
a measurement, the alternative hypothesis may not be signaled with high prob-
ability, but the probability is at least ε. Think of the probability ε as representing
how small of a signal one is willing to tolerate in support of an alternative hypoth-
esis. The value 2− DH (ρ k Q) is then equal to the smallest possible value for h Q, Pi/ε
ε
δ
X = (1 − δ )1 + uu∗ (5.49)
huu∗ , ρi
in the primal problem, for δ ∈ (0, 1). The operator X is clearly positive semidefi-
nite, and it is the case that hσ, X i < hρ, X i = 1. The constraint εX ≤ 1 is satisfied
so long as
δ 1
1−δ+ ∗
≤ , (5.50)
huu , ρi ε
which is so for all sufficiently small δ, as 1/ε is strictly larger than 1 by the as-
sumption ε < 1. By selecting any such δ, one obtains a primal feasible X having
objective value strictly smaller than 1, which implies that DH ε
(ρkσ) is positive. In
1
the case ε = 1, one has DH (ρkσ) = 0 if and only if im(σ ) ⊆ im(ρ).
54
and therefore (λ, Φ(Y )) is dual-feasible for the instance of this problem corre-
sponding to DHε
(Φ(ρ)kΦ( Q)), with the same objective value being achieved by
the assumption that Φ preserves trace. It follows that
2− DH (Φ(ρ)kΦ(Q)) ≥ 2− DH (ρkQ) ,
ε ε
(5.53)
or, equivalently,
ε
DH (Φ(ρ)kΦ( Q)) ≤ DH
ε
( ρ k Q ). (5.54)
Theorem 5.14. Let ρ ∈ D(X) and Q ∈ Pos(X) be operators satisfying im(ρ) ⊆ im( Q)
and let ε ∈ (0, 1). It is the case that
√
ε
1
Dmax (ρk Q) ≤ ε
DH (ρk Q) + log . (5.57)
ε (1 − ε )
55
be a spectral decomposition of Z. We will use the basis {z1 , . . . , zn } to construct a
family of feasible solutions X to the primal form of Optimization Problem 5.12.
In particular, for each real number λ ∈ R, define a subset Sλ ⊆ {1, . . . , n} as
and define
Πλ = ∑ zk z∗k . (5.60)
k ∈ Sλ
Observe that
Π λ , ρ ≥ 2λ Π λ , Q , (5.61)
with the equality being strict so long as Πλ is nonzero. The operator
Πλ
X= (5.62)
ε
is therefore feasible for primal form of Optimization Problem 5.12 provided that
hΠλ , ρi ≥ ε, and with this observation in mind it is informative to consider the
supremum over all such values of λ:
γ = sup λ ∈ R : hΠλ , ρi ≥ ε .
(5.63)
Πγ−δ
X= (5.64)
ε
is primal feasible, and so we obtain the inequalities
h Q, Πγ−δ i 2− γ + δ 2− γ + δ
2 − DH ( ρ k Q ) ≤
ε
≤ hρ, Πγ−δ i ≤ . (5.65)
ε ε ε
As these inequalities hold for every δ > 0, it follows that
ε
DH (ρk Q) ≥ γ + log(ε). (5.66)
On the other hand, for any choice of δ > 0, it must be that hΠγ+δ , ρi < ε. The
density operator
(1 − Π γ + δ ) ρ (1 − Π γ + δ )
ξ= (5.67)
h1 − Π γ + δ , ρ i
therefore satisfies √
F(ρ, ξ ) ≥ 1−ε (5.68)
56
by Winter’s gentle measurement lemma, which implies that
1 √
k ρ − ξ k1 ≤ ε (5.69)
2
by one of the Fuchs–van de Graaf inequalities. Consequently,
√
1
∑ λk ( Z )hzk z∗k , ρi
ε
ψρ ( Z ) ≤ hξ, Z i =
h1 − Π γ + δ , ρ i k 6 ∈ S
γ+δ
(5.70)
2γ + δ 2γ + δ 2γ + δ
≤ ∑
1 − ε k6∈S
λk ( Z )hzk z∗k , Qi ≤
1−ε
h Z, Qi ≤
1−ε
.
γ+δ
and therefore √
ε
1
log ψρ ( Z ) ≤ ε
DH (ρk Q) + log . (5.72)
ε (1 − ε )
By optimizing over all operators Z ∈ Pos(X) that satisfy h Q, Z i ≤ 1 we obtain the
required inequality.
Theorem 5.15. Let ρ ∈ D(X) and Q ∈ Pos(X) be operators satisfying im(ρ) ⊆ im( Q),
let ε ∈ (0, 1), and let δ ∈ (0, 1 − ε). It is the case that
δ
ε+δ
DH (ρk Q) ≤ Dmax ε
(ρk Q) − log . (5.73)
ε+δ
Proof. Suppose that ξ ∈ Bε (ρ) satisfies
ξ ≤ 2Dmax (ρ k Q) Q.
ε
(5.74)
ρ≤ξ+R (5.75)
2− Dmax (ρ k Q) R
ε
− Dmax
ε
(ρ k Q)
λ=2 and Y = (5.76)
ε+δ
in the dual form of Optimization Problem 5.12 (with ε being replaced by ε + δ).
These choices represent a feasible solution to this conic program, and the objective
value is at least
− Dmax
ε
(ρ k Q) ε
2 1− (5.77)
ε+δ
57
It follows that δ
ε+δ
DH (ρ k Q) ≤ Dmax
ε
(ρ k Q) − log , (5.78)
ε+δ
which completes the proof.
Corollary 5.16. Let ρ ∈ D(X) and Q ∈ Pos(X) be operators. For every ε ∈ (0, 1), we
have
ρ⊗n Q⊗n
ε
DH
lim = D( ρ k Q ). (5.79)
n→∞ n
58
Lecture 6
In this lecture we will discuss nonlocal games, which offer a model through which
the phenomenon of nonlocality is commonly studied. We will then narrow our
focus to XOR games, which are a highly restricted form of nonlocal games that
can, perhaps surprisingly, be analyzed through semidefinite programming. This is
made possible by Tsirelson’s theorem, which we will prove in this lecture.
In this definition, the sets X and Y are the sets of questions, and A and B are
the sets of answers, for Alice and Bob, respectively. The probability vector π de-
termines the probability with which each pair of questions ( x, y) ∈ X × Y is se-
lected by the referee, and V determines whether or not a pair of answers ( a, b)
wins or loses for a given pair of questions ( x, y). For a given pair of questions
59
( x, y) ∈ X × Y and a pair of answers ( a, b) ∈ A × B, we write the value of the pred-
icate as V ( a, b | x, y), because that’s the way Ben Toner prefers it to be written—as it
helps to stress the idea that ( a, b) either wins or loses given that the question pair
( x, y) was selected.
Example 6.2 (The CHSH game). The CHSH game (named after Clauser, Horn,
Shimony, and Holt) is a nonlocal game in which the questions and answers cor-
respond to binary values, X = Y = A = B = {0, 1}, the probability vector π is
uniform,
π(0, 0) = π(0, 1) = π(1, 0) = π(1, 1) = 1/4,   (6.1)
and the predicate V is defined as
V(a, b | x, y) = 1 if a ⊕ b = x ∧ y, and V(a, b | x, y) = 0 if a ⊕ b ≠ x ∧ y,   (6.2)
where a ⊕ b denotes the XOR of a and b, and x ∧ y denotes the AND of x and y.
Intuitively speaking, if the referee selects any of the question pairs (0, 0), (0, 1),
or (1, 0), then Alice and Bob must provide a pair of answers ( a, b) for which a = b
in order to win, while if the referee selects the question pair (1, 1), the answer ( a, b)
wins when a ≠ b.
Example 6.3 (The FFL game). The FFL game (named after Fortnow, Feige, and
Lovász) is a nonlocal game in which the questions and answers correspond to
binary values, X = Y = A = B = {0, 1}, the probability vector π is given by
π(0, 0) = π(0, 1) = π(1, 0) = 1/3,  π(1, 1) = 0,   (6.3)
Example 6.4 (Graph coloring games). Suppose that H = (V, E) is an undirected
graph and k is a positive integer. Let us also define n = |V | and m = | E|, and
assume m ≥ 1. We may form a nonlocal game in the following way. The question
sets are both equal to the set of vertices, X = Y = {1, . . . , n}, and the answer sets
are given by A = B = {1, . . . , k}, which we may intuitively think about as colors.
The probability vector π is defined as follows:
π(x, y) = 1/(2n) if x = y,  1/(4m) if {x, y} ∈ E,  and 0 otherwise.   (6.5)
In words, the referee flips a fair coin, and if the outcome is heads, it randomly
selects a vertex and sends it to both players, and if the outcome is tails, it randomly
selects an edge and then sends the two incident vertices to the two players (again
at random). The predicate is defined as
V(a, b | x, y) = 1 if x = y and a = b,  1 if x ≠ y and a ≠ b,  and 0 otherwise.   (6.6)
The idea is that if Alice and Bob receive the same vertex, they should answer with
the same color, while if they receive different (adjacent) vertices, they should an-
swer with different colors.
Strategies
The definition of a nonlocal game does not, in itself, specify or restrict the sorts
of strategies that Alice and Bob might employ when playing. There are, in fact,
different types of strategies that are of interest. Let us start with a short summary
of the strategy types that are of interest for this lecture.
2. Randomized strategies. Rather than choosing their answers deterministically, Al-
ice and Bob could choose to make use of randomness when selecting their
answers. The randomness could be in the form of local randomness, where
Alice and Bob individually generate random numbers to assist in the selection
of their answers, or it could be in the form of shared randomness, which one
might view as having been generated by Alice and Bob at some point in the
past.
As it turns out, randomized strategies are not helpful to Alice and Bob, as-
suming their goal is to maximize the probability that they win. This is because
randomized strategies can simply be viewed as the random selection of a de-
terministic strategy, and Alice and Bob might as well just select the optimal
deterministic strategy—the average winning probability obviously cannot be
larger than the maximum winning probability over all deterministic strategies.
3. Entangled strategies. An entangled strategy is one in which Alice and Bob make
use of a shared quantum state when playing a nonlocal game. That is, Alice
holds a register A and Bob holds a register B, where (A, B) is in a joint state
ρ ∈ D(A ⊗ B), prior to the referee sending the questions. Upon receiving a
question x ∈ X, Alice measures the register A with respect to a measurement
described by a collection of measurement operators
{P_a^x : a ∈ A} ⊂ Pos(A),   (6.7)
There are other types of strategies that are often considered in the study of non-
local games, including commuting operator strategies and no-signaling strategies—we
will discuss commuting operator strategies in the lecture following the next one.
One could also consider global strategies, in which there is no implicit assumption
that Alice and Bob are separated, so that ( a, b) can depend arbitrarily on ( x, y), but
this class of strategies is not very interesting in a setting in which the nonlocality
of Alice and Bob is relevant.
Values of games
When we speak of the value of a nonlocal game, we’re referring to the supremum
probability with which Alice and Bob can win the game, with respect to whatever
class of strategies we might wish to consider. For this lecture we will focus on two
values: the classical value and the entangled value.
Definition 6.5 (Classical value of a nonlocal game). The classical value of a nonlocal
game G = ( X, Y, A, B, π, V ), which is denoted ω ( G ), is given by a maximization
of the winning probability over all deterministic strategies:
ω(G) = max_{f, g} ∑_{(x,y) ∈ X×Y} π(x, y) V(f(x), g(y) | x, y),   (6.10)
where the maximum is over all functions f : X → A and g : Y → B.
Remark 6.6. As was already discussed, there is no need to differentiate between deterministic and randomized values of nonlocal games, because they are the same—and so the name classical value is justified.
Definition 6.7 (Entangled value of a nonlocal game). The entangled value of a non-
local game G = ( X, Y, A, B, π, V ), which is denoted ω ∗ ( G ), is the supremum of the
winning probabilities
∑_{(x,y) ∈ X×Y} ∑_{(a,b) ∈ A×B} π(x, y) V(a, b | x, y) ⟨P_a^x ⊗ Q_b^y, ρ⟩,   (6.11)
over all choices of complex Euclidean spaces A and B, states ρ ∈ D(A ⊗ B), and sets of measurements
{P_a^x : a ∈ A}_{x∈X} ⊂ Pos(A)  and  {Q_b^y : b ∈ B}_{y∈Y} ⊂ Pos(B).   (6.12)
That is, the entangled value is the supremum winning probability over all entan-
gled strategies.
Remark 6.8. There are nonlocal games for which the supremum winning probability is never achieved by any entangled strategy, so it is necessary to use the supremum in this definition. The principal
issue is that the dimensions of the spaces A and B are not bounded as one ranges
over all entangled strategies.
Example 6.9 (CHSH game values). Letting G denote the CHSH game, we have that
the classical value of this game is ω ( G ) = 3/4. This may be verified by checking
that the winning probability of each of the 16 possible deterministic strategies is at
most 3/4, and of course that some of those strategies win with probability 3/4.
The entangled value of the CHSH game is ω ∗ ( G ) = cos2 (π/8) ≈ 0.85. The fact
that this is so will emerge as a simple corollary to Tsirelson’s theorem—which is
fitting given that the inequality ω ∗ ( G ) ≤ cos2 (π/8) is a rephrasing of an inequality
known as Tsirelson’s bound.
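The check just described is easy to mechanize. Here is a minimal sketch (Python, standard library only; the representation of strategies as tuples is my own choice) that enumerates all 16 deterministic strategies for the CHSH game and reports the maximum winning probability.

from itertools import product

# A deterministic strategy is a pair of functions f, g : {0,1} -> {0,1},
# represented as tuples (f(0), f(1)) and (g(0), g(1)).
functions = list(product([0, 1], repeat=2))

def chsh_win_probability(f, g):
    # Questions are uniform over {0,1} x {0,1}; answers win when
    # a XOR b equals x AND y.
    wins = sum(1 for x, y in product([0, 1], repeat=2) if f[x] ^ g[y] == x & y)
    return wins / 4

values = [chsh_win_probability(f, g) for f in functions for g in functions]
print(max(values))    # 0.75, the classical value of the CHSH game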
Example 6.10 (FFL game values). If we let G denote the FFL game, then we have that its classical value and entangled value agree: ω(G) = ω*(G) = 2/3. The fact that ω(G) = 2/3 is easily established by testing all deterministic strategies. I will ask you to prove that ω*(G) = 2/3 as a homework problem. One way to do this is to prove that even the so-called no-signaling value of the FFL game, which upper-bounds the entangled value, is 2/3. The no-signaling value can be computed through linear programming.
Example 6.11 (Graph coloring game values). If G is the graph coloring game determined by a graph H and an integer k, then we see that ω(G) = 1 if and only
if the chromatic number of H is at most k. That is, given any perfect deterministic
strategy, meaning one that wins with certainty, it is possible to recover a k-coloring
of H, meaning an assignment of colors {1, . . . , k } to the vertices of H such that no
two adjacent vertices share the same color.
There are known examples of graphs H and choices of k for which the associ-
ated nonlocal game G satisfies ω ( G ) < 1 but ω ∗ ( G ) = 1.
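For a concrete check of the first claim in this example, the sketch below (Python, standard library only; the triangle graph and the brute-force search are my own illustrative choices) computes the classical value of the coloring game for the triangle with k = 2 and k = 3 colors, using the distribution (6.5) and the predicate (6.6).

from itertools import product

def coloring_game_value(vertices, edges, k):
    # Classical value of the graph coloring game by brute force over all
    # deterministic strategies f, g : vertices -> {0, ..., k-1}.
    n, m = len(vertices), len(edges)
    # Question distribution (6.5): weight 1/(2n) on (x, x) and 1/(4m) on
    # each ordered pair (x, y) with {x, y} an edge.
    questions = [((x, x), 1 / (2 * n)) for x in vertices]
    for (x, y) in edges:
        questions.append(((x, y), 1 / (4 * m)))
        questions.append(((y, x), 1 / (4 * m)))

    def win_prob(f, g):
        total = 0.0
        for (x, y), p in questions:
            a, b = f[x], g[y]
            # Predicate (6.6): same color on equal vertices, different
            # colors on adjacent vertices.
            if (x == y and a == b) or (x != y and a != b):
                total += p
        return total

    assignments = list(product(range(k), repeat=len(vertices)))
    return max(win_prob(dict(zip(vertices, f)), dict(zip(vertices, g)))
               for f in assignments for g in assignments)

triangle_vertices = [1, 2, 3]
triangle_edges = [(1, 2), (2, 3), (1, 3)]
print(coloring_game_value(triangle_vertices, triangle_edges, 2))  # strictly less than 1
print(coloring_game_value(triangle_vertices, triangle_edges, 3))  # 1.0, the triangle is 3-colorable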
for some choice of a function f : X × Y → {0, 1}. Intuitively speaking, the func-
tion f specifies whether a and b should agree or disagree in order to be a winning
answer, for each question pair ( x, y). Notice that exactly one of the two possibili-
ties, meaning the possibilities that a and b agree or disagree, always wins for each
question pair, while the other possibility loses.
As every XOR game is uniquely determined by the sets X and Y, the probability
vector π ∈ P( X × Y ), and the function f : X × Y → {0, 1}, we will identify the
corresponding game G with the quadruple ( X, Y, π, f ) when it is convenient to do
that. For example, the CHSH game is an example of an XOR game, corresponding
to the quadruple ({0, 1}, {0, 1}, π, f), for π the uniform probability vector and
f ( x, y) = x ∧ y being the AND function.
ε( G ) = 2ω ( G ) − 1 and ε∗ ( G ) = 2ω ∗ ( G ) − 1, (6.14)
or, alternatively,
ω(G) = 1/2 + ε(G)/2  and  ω*(G) = 1/2 + ε*(G)/2.   (6.15)
∑_{(x,y) ∈ X×Y} π(x, y) (−1)^{f(x,y)} ⟨(P_0^x − P_1^x) ⊗ (Q_0^y − Q_1^y), ρ⟩   (6.17)
for a few moments, we find that it agrees with the bias of the strategy just described. By defining A_x = P_0^x − P_1^x for each x ∈ X and B_y = Q_0^y − Q_1^y for each y ∈ Y, we may express this quantity as
∑_{(x,y) ∈ X×Y} π(x, y) (−1)^{f(x,y)} ⟨A_x ⊗ B_y, ρ⟩.   (6.18)
Theorem 6.12 (Tsirelson’s theorem). For every choice of finite and nonempty sets X and
Y and an operator M ∈ L(RY , RX ), the following statements are equivalent.
1. There exist complex Euclidean spaces A and B, a density operator ρ ∈ D(A ⊗ B), and
two collections { A x : x ∈ X } ⊂ Herm(A) and { By : y ∈ Y } ⊂ Herm(B) of
operators such that ‖A_x‖ ≤ 1, ‖B_y‖ ≤ 1, and
M(x, y) = ⟨A_x ⊗ B_y, ρ⟩   (6.19)
for every x ∈ X and y ∈ Y.
2. There exist operators R ∈ Pos(C^X) and S ∈ Pos(C^Y) such that R(x, x) = 1 for every x ∈ X, S(y, y) = 1 for every y ∈ Y, and
[ R, M ; M^*, S ] ∈ Pos(C^X ⊕ C^Y).   (6.20)
Remark 6.13. The second statement in the theorem is equivalent to one in which
the requirement that R and S have real number entries is added. In particular, if R0
and S0 satisfy the conditions listed in the second statement of the theorem, then so
too will
R = (R_0 + R_0^T) / 2  and  S = (S_0 + S_0^T) / 2,   (6.21)
by virtue of the fact that M has real-number entries and
[ R, M ; M^*, S ] = (1/2) [ R_0, M ; M^*, S_0 ] + (1/2) [ R_0, M ; M^*, S_0 ]^T   (6.22)
is a positive semidefinite operator whose diagonal entries are all equal to 1.
The first statement of the theorem says that the operator M, which is best
viewed as a matrix indexed by pairs ( x, y) ∈ X × Y in this case, describes exactly
the values in the expression (6.18) that depend upon the strategy under consider-
ation. The second statement of the theorem is a surprisingly simple condition on M—and it may come as no surprise to learn that it will be used to define semidefinite programs to calculate XOR game biases. The fact that these two statements are equivalent is a remarkable thing of beauty.
Weyl–Brauer operators
The proof of Tsirelson’s theorem will make use of a collection of unitary and Her-
mitian operators known as Weyl–Brauer operators.
Definition 6.14. Let N be a positive integer and let Z = C2 . The Weyl–Brauer oper-
ators of order N are the operators V_1, ..., V_{2N+1} ∈ L(Z^{⊗N}) defined as
V_{2k−1} = σ_z^{⊗(k−1)} ⊗ σ_x ⊗ 1^{⊗(N−k)},
V_{2k} = σ_z^{⊗(k−1)} ⊗ σ_y ⊗ 1^{⊗(N−k)},   (6.23)
for all k ∈ {1, . . . , N}, as well as
V_{2N+1} = σ_z^{⊗N},   (6.24)
where 1, σ_x, σ_y, and σ_z denote the Pauli operators:
1 = [ 1, 0 ; 0, 1 ],  σ_x = [ 0, 1 ; 1, 0 ],  σ_y = [ 0, −i ; i, 0 ],  σ_z = [ 1, 0 ; 0, −1 ].   (6.25)
Example 6.15. In the case N = 3, the Weyl–Brauer operators V1 , . . . , V7 are
V1 = σx ⊗ 1 ⊗ 1
V2 = σy ⊗ 1 ⊗ 1
V3 = σz ⊗ σx ⊗ 1
V4 = σz ⊗ σy ⊗ 1 (6.26)
V5 = σz ⊗ σz ⊗ σx
V6 = σz ⊗ σz ⊗ σy
V7 = σz ⊗ σz ⊗ σz .
A proposition summarizing the properties of the Weyl–Brauer operators that
are relevant to the proof of Tsirelson’s theorem follows.
Proposition 6.16. Let N be a positive integer and let V_1, ..., V_{2N+1} denote the Weyl–Brauer operators of order N. For every unit vector u ∈ R^{2N+1}, the operator
∑_{k=1}^{2N+1} u(k) V_k   (6.27)
is both unitary and Hermitian, and for any two vectors u, v ∈ R^{2N+1}, it holds that
(1/2^N) ⟨ ∑_{j=1}^{2N+1} u(j) V_j, ∑_{k=1}^{2N+1} v(k) V_k ⟩ = ⟨u, v⟩.   (6.28)
Proof. Each operator Vk is Hermitian, and therefore the operator (6.27) is Hermitian
as well.
The Pauli operators anti-commute in pairs:
σx σy = −σy σx , σx σz = −σz σx , and σy σz = −σz σy . (6.29)
By an inspection of the definition of the Weyl–Brauer operators, it follows that
V1 , . . . , V2N +1 also anti-commute in pairs:
Vj Vk = −Vk Vj (6.30)
for distinct choices of j, k ∈ {1, . . . , 2N + 1}. Moreover, each Vk is unitary (as well
as being Hermitian), and therefore V_k^2 = 1^{⊗N}. It follows that
( ∑_{k=1}^{2N+1} u(k) V_k )^2 = ∑_{k=1}^{2N+1} u(k)^2 V_k^2 + ∑_{1≤j<k≤2N+1} u(j) u(k) (V_j V_k + V_k V_j) = ∑_{k=1}^{2N+1} u(k)^2 1^{⊗N} = 1^{⊗N},   (6.31)
and therefore (6.27) is unitary.
Next, observe that
⟨V_j, V_k⟩ = 2^N if j = k, and ⟨V_j, V_k⟩ = 0 if j ≠ k.   (6.32)
Therefore, one has
(1/2^N) ⟨ ∑_{j=1}^{2N+1} u(j) V_j, ∑_{k=1}^{2N+1} v(k) V_k ⟩ = (1/2^N) ∑_{j=1}^{2N+1} ∑_{k=1}^{2N+1} u(j) v(k) ⟨V_j, V_k⟩ = ∑_{k=1}^{2N+1} u(k) v(k) = ⟨u, v⟩,   (6.33)
as required.
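A numerical sanity check of Definition 6.14 and Proposition 6.16 may be reassuring. The sketch below (Python with numpy; the choice N = 3, the random seed, and the test vectors are mine) constructs V_1, ..., V_7, verifies pairwise anticommutation, and tests the unitarity of (6.27) and the inner product identity (6.28).

import numpy as np

I2 = np.eye(2)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def kron_all(ops):
    out = np.eye(1)
    for op in ops:
        out = np.kron(out, op)
    return out

def weyl_brauer(N):
    # Weyl-Brauer operators of order N, as in Definition 6.14.
    V = []
    for k in range(1, N + 1):
        V.append(kron_all([sz] * (k - 1) + [sx] + [I2] * (N - k)))
        V.append(kron_all([sz] * (k - 1) + [sy] + [I2] * (N - k)))
    V.append(kron_all([sz] * N))
    return V

N = 3
V = weyl_brauer(N)
d = 2 ** N

# Pairwise anticommutation (6.30) and V_k^2 = identity.
assert all(np.allclose(V[j] @ V[k], -V[k] @ V[j])
           for j in range(2 * N + 1) for k in range(2 * N + 1) if j != k)
assert all(np.allclose(Vk @ Vk, np.eye(d)) for Vk in V)

# For a random unit vector u, the operator sum_k u(k) V_k is unitary (6.27).
rng = np.random.default_rng(0)
u = rng.normal(size=2 * N + 1)
u /= np.linalg.norm(u)
A = sum(u[k] * V[k] for k in range(2 * N + 1))
assert np.allclose(A @ A.conj().T, np.eye(d))

# The inner product identity (6.28) for arbitrary u and v.
v = rng.normal(size=2 * N + 1)
B = sum(v[k] * V[k] for k in range(2 * N + 1))
lhs = np.trace(A.conj().T @ B) / d
print(np.allclose(lhs, np.dot(u, v)))   # True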
Proof of Tsirelson’s theorem
Proof of Theorem 6.12. For the sake of simplifying notation, we will make the as-
sumption that X = {1, . . . , n} and Y = {1, . . . , m}.
Assume that the first statement is true, and define an operator K ∈ L(A ⊗ B ⊗ A ⊗ B, C^n ⊕ C^m) whose rows are given, in order, by the vectors
vec((A_1 ⊗ 1)√ρ)^*, ..., vec((A_n ⊗ 1)√ρ)^*, vec((1 ⊗ B_1)√ρ)^*, ..., vec((1 ⊗ B_m)√ρ)^*.   (6.34)
It is then the case that
KK^* = [ P, M ; M^*, Q ]   (6.35)
for P ∈ Pos(C^n) and Q ∈ Pos(C^m); the fact that the off-diagonal blocks are as claimed follows from the calculation
⟨(A_j ⊗ 1)√ρ, (1 ⊗ B_k)√ρ⟩ = ⟨A_j ⊗ B_k, ρ⟩ = M(j, k).   (6.36)
Each diagonal entry of P takes the form
P(j, j) = ⟨A_j^2 ⊗ 1, ρ⟩,   (6.37)
which is necessarily a nonnegative real number in the interval [0, 1]; and through a similar calculation, one finds that Q(k, k) is also a nonnegative real number in the interval [0, 1] for each k ∈ {1, . . . , m}. A nonnegative real number may therefore be added to each diagonal entry of this operator, bringing every diagonal entry up to 1 while preserving positive semidefiniteness, so one has that statement 2 holds.
Next, let us assume statement 2 holds. As was explained in Remark 6.13, we
are free to assume that all of the entries of R and S are real numbers.
Now, a matrix with real number entries is positive semidefinite if and only if
it is the Gram matrix of a collection of real vectors, and therefore there must exist
real vectors {u_1, ..., u_n, v_1, ..., v_m} such that
⟨u_j, v_k⟩ = M(j, k)   (6.38)
for all j ∈ {1, ..., n} and k ∈ {1, ..., m}, as well as
⟨u_{j_0}, u_{j_1}⟩ = R(j_0, j_1)  and  ⟨v_{k_0}, v_{k_1}⟩ = S(k_0, k_1)   (6.39)
for all j_0, j_1 ∈ {1, . . . , n} and k_0, k_1 ∈ {1, . . . , m}. There are n + m of these vectors,
and therefore they span a real vector space of dimension at most n + m, so there is
no loss of generality in assuming u1 , . . . , un , v1 , . . . , vm ∈ Rn+m . Observe that these
vectors are all unit vectors, as the diagonal entries of R and S represent their norm
squared.
Choose N so that 2N + 1 ≥ n + m and let Z = C2 . Define operators A1 , . . . , An
and B_1, ..., B_m, all acting on Z^{⊗N}, as
A_j = ∑_{i=1}^{n+m} u_j(i) V_i  and  B_k = ∑_{i=1}^{n+m} v_k(i) V_i^T   (6.40)
for each j ∈ {1, . . . , n} and k ∈ {1, . . . , m}, where V1 , . . . , Vn+m are the first n + m
Weyl–Brauer operators of order N. By Proposition 6.16, each of these operators
is both unitary and Hermitian, and therefore each of these operators has spectral
norm equal to 1.
Finally, define
ρ = (1/2^N) vec(1^{⊗N}) vec(1^{⊗N})^* ∈ D(Z^{⊗N} ⊗ Z^{⊗N}).   (6.41)
Applying Proposition 6.16 again gives
⟨A_j ⊗ B_k, ρ⟩ = (1/2^N) ⟨A_j, B_k^T⟩ = ⟨u_j, v_k⟩ = M(j, k),   (6.42)
for each j ∈ {1, . . . , n} and k ∈ {1, . . . , m}.
We have proved that statement 2 implies statement 1, for the spaces A = Z⊗ N
and B = Z⊗ N , and so the proof is complete.
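The construction in the second half of the proof can also be tested numerically. In the sketch below (Python with numpy; the unit vectors u_j and v_k are my own choice, picked so that M(j, k) = ⟨u_j, v_k⟩ is the CHSH-type operator (1/√2)[ 1, 1 ; 1, −1 ]), the operators A_j, B_k, and ρ are built as in (6.40) and (6.41), and the equality (6.42) is checked.

import numpy as np

I2 = np.eye(2)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def kron_all(ops):
    out = np.eye(1)
    for op in ops:
        out = np.kron(out, op)
    return out

def weyl_brauer(N):
    V = []
    for k in range(1, N + 1):
        V.append(kron_all([sz] * (k - 1) + [sx] + [I2] * (N - k)))
        V.append(kron_all([sz] * (k - 1) + [sy] + [I2] * (N - k)))
    V.append(kron_all([sz] * N))
    return V

# Unit vectors whose inner products form the target matrix M(j, k).
u = [np.array([1.0, 0, 0, 0]), np.array([0, 1.0, 0, 0])]
v = [np.array([1, 1, 0, 0]) / np.sqrt(2), np.array([1, -1, 0, 0]) / np.sqrt(2)]

N = 2                      # 2N + 1 = 5 >= n + m = 4
V = weyl_brauer(N)
d = 2 ** N

A = [sum(u[j][i] * V[i] for i in range(4)) for j in range(2)]
B = [sum(v[k][i] * V[i].T for i in range(4)) for k in range(2)]

# rho as in (6.41): vec of the identity gives the maximally entangled state.
vec1 = np.eye(d).reshape(d * d)          # sum_a e_a (x) e_a
rho = np.outer(vec1, vec1.conj()) / d

for j in range(2):
    for k in range(2):
        val = np.trace(np.kron(A[j], B[k]).conj().T @ rho)
        print(np.isclose(val.real, np.dot(u[j], v[k])), val.real)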
Lecture 7
taken over all choices for complex Euclidean spaces A and B, a state ρ ∈ D(A ⊗ B),
and Hermitian contractions
taken over all M ∈ L(RY , RX ) for which there exist R ∈ Pos(CX ) and S ∈ Pos(CY )
for which R( x, x ) = 1 for all x ∈ X, S(y, y) = 1 for all y ∈ Y, and
[ R, M ; M^*, S ] ∈ Pos(C^X ⊕ C^Y).   (7.4)
With this fact in mind, let us consider the following semidefinite program.
First, let us write X = CX and Y = CY for brevity, and let ∆ ∈ C(X ⊕ Y) de-
note the completely dephasing channel acting on X ⊕ Y, which zeros out all of the
off-diagonal entries of its input and leaves the diagonal entries alone. Define an
operator D ∈ L(Y, X) as D(x, y) = π(x, y)(−1)^{f(x,y)} for every x ∈ X and y ∈ Y.
owing to the fact that D and M have real number entries. Noting that
⟨D, M⟩ = ∑_{(x,y) ∈ X×Y} π(x, y)(−1)^{f(x,y)} M(x, y),   (7.9)
we find that the optimal value of the semidefinite program is at least the entangled
bias of G.
To see that the optimal value of the semidefinite program is no greater than the
entangled bias of G, consider any Z ∈ Pos(X ⊕ Y), which can be expressed as
Z = [ R, K ; K^*, S ]   (7.10)
for some choice of R ∈ Pos(X), S ∈ Pos(Y), and K ∈ L(Y, X). Again the constraint
∆( Z ) = 1X⊕Y is equivalent to R and S having diagonal entries equal to one, and
the only issue remaining is that K might not have real number entries. However,
by expressing the objective function in terms of the block structure of H and Z, we
find that
⟨H, Z⟩ = (1/2) ⟨D, K⟩ + (1/2) ⟨D^*, K^*⟩ = ⟨D, M⟩   (7.11)
for
M = (K + K̄) / 2,   (7.12)
where K̄ denotes the entrywise complex conjugate of K, following from the fact that D has real entries:
⟨D^*, K^*⟩ = conj(⟨D, K⟩) = ⟨D̄, K̄⟩ = ⟨D, K̄⟩.   (7.13)
Proceeding in much the same way as in the previous lecture, we see that
(1/2) [ R, K ; K^*, S ] + (1/2) [ R, K ; K^*, S ]^T = [ (1/2)R + (1/2)R^T, M ; M^*, (1/2)S + (1/2)S^T ]   (7.14)
for vectors u ∈ RX and v ∈ RY . (Here we’re including the factor of 1/2 for the
sake of convenience—we’re free to scale the vectors u and v as we choose.) The
objective function then becomes
Tr(W) = (1/2) ∑_{x∈X} u(x) + (1/2) ∑_{y∈Y} v(y),   (7.17)
∑_{x∈X} u(x) = ∑_{y∈Y} v(y).   (7.19)
The reason is that for any choice of u and v, and for any λ > 0, the operator
[ Diag(u), −D ; −D^*, Diag(v) ]   (7.20)
is positive semidefinite if and only if the operator
[ λ Diag(u), −D ; −D^*, λ^{−1} Diag(v) ]   (7.21)
is positive semidefinite. The dual objective value obtained by the operator (7.21), assuming it is positive semidefinite, is equal to
(λ/2) ∑_{x∈X} u(x) + (1/(2λ)) ∑_{y∈Y} v(y).   (7.22)
x∈X y ∈Y
Assuming that D is nonzero, which is always the case when it arises from an XOR
game G, it must be the case that ∑ x∈X u( x ) and ∑y∈Y v(y) are strictly positive, and
in this case the minimum value for (7.22) occurs when
λ = √( ∑_{y∈Y} v(y) / ∑_{x∈X} u(x) ),   (7.23)
and this is the unique choice of λ for which the minimum is obtained. Thus, under
the assumption that u and v are optimal, it must be the case that λ = 1, which is
equivalent to (7.19).
We can now verify that the entangled value of the CHSH game is cos2 (π/8), as
claimed in the previous lecture.
Example 7.3 (CHSH game entangled bias/value). Recall that the CHSH game is
the XOR game G = ( X, Y, π, f ) with
X = Y = {0, 1},
π(0, 0) = π(0, 1) = π(1, 0) = π(1, 1) = 1/4,   (7.24)
f(x, y) = x ∧ y.
We will verify that the optimal value of Optimization Problem 7.2 for the game G is ε*(G) = 1/√2.
First choose
M = (1/√2) [ 1, 1 ; 1, −1 ],   (7.26)
which has spectral norm equal to 1—it is a unitary operator, representing the Hadamard transform—and observe that for R = S = 1 we have that
Z = [ R, M ; M^*, S ] ≥ 0.   (7.27)
The diagonal entries of R and S are equal to one, so Z is primal feasible, and it achieves the objective value ⟨D, M⟩ = 1/√2.
In the dual problem, choosing
u = (1/(2√2), 1/(2√2))  and  v = (1/(2√2), 1/(2√2))   (7.28)
yields a feasible solution. The objective value is the same value just achieved in the primal:
(1/2) ∑_{x∈{0,1}} u(x) + (1/2) ∑_{y∈{0,1}} v(y) = 1/√2.   (7.29)
Having obtained the same value in the primal and dual, we have verified that the optimal value, which is the entangled bias, is
ε*(G) = 1/√2.   (7.30)
This implies that the entangled value of the CHSH game is
ω*(G) = 1/2 + 1/(2√2) = cos²(π/8).   (7.31)
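The primal problem just verified by hand can also be handed to a solver. Below is a minimal sketch (Python, assuming the cvxpy package and one of its default SDP solvers are installed; the variable names are mine) of the optimization suggested by (7.4) and (7.9) for the CHSH game: maximize ⟨D, M⟩ over M for which some [ R, M ; M^*, S ] with unit diagonal is positive semidefinite. By Remark 6.13, a real symmetric variable suffices.

import numpy as np
import cvxpy as cp

# D(x, y) = pi(x, y) (-1)^{f(x, y)} for the CHSH game: pi uniform, f = AND.
D = 0.25 * np.array([[1, 1], [1, -1]])

Z = cp.Variable((4, 4), symmetric=True)   # blocks [ R, M ; M^T, S ]
M = Z[0:2, 2:4]

constraints = [Z >> 0, cp.diag(Z) == 1]
problem = cp.Problem(cp.Maximize(cp.sum(cp.multiply(D, M))), constraints)
problem.solve()

print(problem.value)            # approximately 1/sqrt(2) = 0.7071...
print(0.5 + problem.value / 2)  # approximately cos^2(pi/8) = 0.8536...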
π (( x1 , . . . , xn ), (y1 , . . . , yn )) = π1 ( x1 , y1 ) · · · πn ( xn , yn ) (7.34)
and the predicate
Example 7.4 (Parallel repetition of the FFL game). The classical value of the FFL
game, introduced in the previous lecture, is 2/3. Remarkably, the classical value
of the two-fold repetition FFL ∧ FFL of this game with itself is also 2/3. A deter-
ministic strategy that achieves this winning probability is that Alice and Bob both
respond to their question pairs by simply swapping the two binary values. That
is, Alice’s answers are determined by the function f : X × X → A × A and Bob’s
answers are determined by the function g : Y × Y → B × B, where
f ( x1 , x2 ) = ( x2 , x1 ) and g ( y1 , y2 ) = ( y2 , y1 ). (7.36)
For this strategy, the winning condition in both games is the same:
x_1 ∨ x_2 ≠ y_1 ∨ y_2.   (7.37)
That is, they either win both games or lose both games, never winning just one of
them. If ( x1 , y1 ) and ( x2 , y2 ) are independently and uniformly generated from the
set {(0, 0), (0, 1), (1, 0)} then the above condition fails only when
((x_1, y_1), (x_2, y_2)) ∈ {((0, 0), (0, 0)), ((1, 0), (0, 1)), ((0, 1), (1, 0))}.   (7.38)
Such question pairs are selected with probability 3/9 = 1/3, so the winning proba-
bility is 2/3, as claimed.
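The claim that the classical value of FFL ∧ FFL is 2/3 can be confirmed by brute force. The sketch below (Python, standard library only) assumes the standard FFL winning predicate, namely that answers (a, b) to questions (x, y) win when a ∨ x ≠ b ∨ y, which is consistent with the condition (7.37); the enumeration itself is my own.

from itertools import product

questions = [(0, 0), (0, 1), (1, 0)]   # each selected with probability 1/3

def ffl_wins(a, b, x, y):
    # Assumed FFL predicate: the answers win when (a OR x) != (b OR y).
    return (a | x) != (b | y)

def ffl_and_ffl_classical_value():
    # Deterministic strategies map question pairs to answer pairs; there
    # are 4^4 = 256 such functions for each of Alice and Bob.
    domain = list(product([0, 1], repeat=2))
    best = 0.0
    for f in product(domain, repeat=4):        # Alice: (x1, x2) -> (a1, a2)
        fa = dict(zip(domain, f))
        for g in product(domain, repeat=4):    # Bob: (y1, y2) -> (b1, b2)
            gb = dict(zip(domain, g))
            wins = 0
            for (x1, y1), (x2, y2) in product(questions, repeat=2):
                a1, a2 = fa[(x1, x2)]
                b1, b2 = gb[(y1, y2)]
                if ffl_wins(a1, b1, x1, y1) and ffl_wins(a2, b2, x2, y2):
                    wins += 1
            best = max(best, wins / 9)
    return best

print(ffl_and_ffl_classical_value())   # 0.666..., in line with the claim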
The example of the FFL game illustrates that the classical value of a nonlo-
cal game G = G1 ∧ · · · ∧ Gn is not always equal to the product of the values of
G1 , . . . , Gn , which is what one obtains when Alice and Bob play the games inde-
pendently. That is, one always has
ω ( G1 ∧ · · · ∧ Gn ) ≥ ω ( G1 ) · · · ω ( Gn ), (7.39)
but in some cases the inequality is strict. A similar phenomenon occurs for the
entangled value—although we did not prove it, the entangled value of the FFL
game agrees with the classical value, and therefore
ω*(FFL ∧ FFL) ≥ ω(FFL ∧ FFL) = 2/3 = ω*(FFL) > ω*(FFL)².   (7.40)
We will prove that the entangled value of XOR games forbids this type of ad-
vantage. That is, if G1 , . . . , Gn are XOR games, then
ω*(G_1 ∧ · · · ∧ G_n) = ω*(G_1) · · · ω*(G_n).   (7.41)
In particular, for the n-fold repetition
G^{∧n} = G ∧ · · · ∧ G (n times),   (7.42)
of an XOR game G with itself, it holds that
ω*(G^{∧n}) = ω*(G)^n.   (7.43)
The first step in proving that (7.41) holds for XOR games G1 , . . . , Gn is to define
the XOR of two (or more) XOR games. Suppose that G1 and G2 are XOR games,
specified by probability distributions
The XOR G_1 ⊕ G_2 of these two games is the XOR game defined by the product distribution π : (X_1 × X_2) × (Y_1 × Y_2) → [0, 1], given by π((x_1, x_2), (y_1, y_2)) = π_1(x_1, y_1) π_2(x_2, y_2), together with the function
f((x_1, x_2), (y_1, y_2)) = f_1(x_1, y_1) ⊕ f_2(x_2, y_2).   (7.47)
In words, the question sets for the game G1 ⊕ G2 are X1 × X2 and Y1 × Y2 , and
if Alice receives ( x1 , x2 ) and Bob receives (y1 , y2 ), they are expected to provide
answers a, b ∈ {0, 1} that are consistent with the equations
a = a1 ⊕ a2 and b = b1 ⊕ b2 (7.48)
ε∗ ( G1 ⊕ G2 ) ≥ ε∗ ( G1 )ε∗ ( G2 ). (7.49)
The reason is that if Alice and Bob play the game G1 ⊕ G2 by playing G1 and G2
independently and optimally, and then answer according to the XOR of their an-
swers in G1 and G2 , then they will achieve the bias ε∗ ( G1 )ε∗ ( G2 ). We will prove
that this is, in fact, the best they can do. That is, we will prove
ε∗ ( G1 ⊕ G2 ) = ε∗ ( G1 )ε∗ ( G2 ). (7.50)
∑_{x_1} u_1(x_1) = ∑_{y_1} v_1(y_1)  and  ∑_{x_2} u_2(x_2) = ∑_{y_2} v_2(y_2).   (7.51)
The dual form of Optimization Problem 7.2 for the entangled bias of the XOR game G_1 ⊕ G_2 has the following form:
minimize:   (1/2) ∑_{x ∈ X_1×X_2} u(x) + (1/2) ∑_{y ∈ Y_1×Y_2} v(y)
subject to: [ Diag(u), −D_1 ⊗ D_2 ; −D_1^* ⊗ D_2^*, Diag(v) ] ≥ 0,
            u ∈ R^{X_1×X_2}, v ∈ R^{Y_1×Y_2},
where
D_1(x_1, y_1) = π_1(x_1, y_1)(−1)^{f_1(x_1,y_1)}  and  D_2(x_2, y_2) = π_2(x_2, y_2)(−1)^{f_2(x_2,y_2)}.   (7.52)
Next we observe that u = u1 ⊗ u2 and v = v1 ⊗ v2 provide a dual feasible
solution to Optimization Problem 7.2 for ε∗ ( G1 ⊕ G2 ). To prove that this is so, it is
helpful to observe that if
[ P_1, X_1 ; X_1^*, Q_1 ] ≥ 0  and  [ P_2, X_2 ; X_2^*, Q_2 ] ≥ 0   (7.53)
then
[ P_1 ⊗ P_2, X_1 ⊗ X_2 ; X_1^* ⊗ X_2^*, Q_1 ⊗ Q_2 ] ≥ 0.   (7.54)
This is so because, after some simultaneous re-ordering of rows and columns, the matrix above is a principal submatrix of the positive semidefinite matrix
[ P_1, X_1 ; X_1^*, Q_1 ] ⊗ [ P_2, X_2 ; X_2^*, Q_2 ].   (7.55)
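This principal-submatrix argument is easy to spot-check numerically. The sketch below (Python with numpy; the random test blocks are mine) draws random positive semidefinite block operators, forms the block operator appearing in (7.54), and confirms that its smallest eigenvalue is nonnegative up to numerical error.

import numpy as np

rng = np.random.default_rng(1)

def random_psd_blocks(p, q):
    # A random PSD matrix of size p + q, sliced into blocks [ P, X ; X*, Q ].
    G = rng.normal(size=(p + q, p + q)) + 1j * rng.normal(size=(p + q, p + q))
    W = G @ G.conj().T
    return W[:p, :p], W[:p, p:], W[p:, p:]

P1, X1, Q1 = random_psd_blocks(2, 3)
P2, X2, Q2 = random_psd_blocks(3, 2)

big = np.block([
    [np.kron(P1, P2), np.kron(X1, X2)],
    [np.kron(X1, X2).conj().T, np.kron(Q1, Q2)],
])

print(np.linalg.eigvalsh(big).min() >= -1e-8)   # True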
where we have used the equalities (7.51). It remains to prove the analogous statement for the AND of XOR games, namely that
ω*(G_1 ∧ · · · ∧ G_n) = ω*(G_1) · · · ω*(G_n).   (7.62)
The inequality
ω*(G_1 ∧ · · · ∧ G_n) ≥ ω*(G_1) · · · ω*(G_n)   (7.63)
is immediate, as Alice and Bob may always play the n games independently and optimally, so it suffices to prove that
ω*(G_1 ∧ · · · ∧ G_n) ≤ ω*(G_1) · · · ω*(G_n).   (7.64)
Assume hereafter that XOR games G1 , . . . , Gn have been fixed, and consider an
arbitrary strategy for Alice and Bob in the game G1 ∧ · · · ∧ Gn , through which Alice
and Bob answer question tuples ( x1 , . . . , xn ) and (y1 , . . . , yn ) with answer tuples
( a1 , . . . , an ) and (b1 , . . . , bn ), respectively. We will consider how well this strategy
performs for the XOR game
Gk1 ⊕ · · · ⊕ Gkm , (7.65)
for various choices of a subset S = {k_1, ..., k_m} ⊆ {1, ..., n}, provided that we define Alice and Bob's answers as
a = a_{k_1} ⊕ · · · ⊕ a_{k_m}  and  b = b_{k_1} ⊕ · · · ⊕ b_{k_m},   (7.66)
and where we assume that they have chosen to share randomly generated question pairs (x_k, y_k) for those choices of k ∉ S.
To do this we will define binary-valued random variables Z1 , . . . , Zn as
Zk = ak ⊕ bk ⊕ f k ( xk , yk ), (7.67)
81
and therefore the probability of winning minus the probability of losing the game G_k is equal to the expectation
E[(−1)^{Z_k}].   (7.69)
More generally, if Alice and Bob’s strategy is transformed into a strategy for the
XOR game Gk1 ⊕ · · · ⊕ Gkm as suggested above, we find that
Z_{k_1} ⊕ · · · ⊕ Z_{k_m} = 0 if Alice and Bob win G_{k_1} ⊕ · · · ⊕ G_{k_m}, and 1 if they lose,   (7.70)
and therefore the probability they win minus the probability they lose in this XOR game is
E[(−1)^{Z_{k_1} + · · · + Z_{k_m}}].   (7.71)
Now, we know that the probability of winning minus the probability of losing in the game G_{k_1} ⊕ · · · ⊕ G_{k_m} is upper-bounded by its bias:
E[(−1)^{Z_{k_1} + · · · + Z_{k_m}}] ≤ ε*(G_{k_1} ⊕ · · · ⊕ G_{k_m}) = ε*(G_{k_1}) · · · ε*(G_{k_m}).   (7.72)
Here we have used the fact concerning XOR game biases proved above, which is
the key to making the entire argument work. The probability that Alice and Bob’s
strategy wins G1 ∧ · · · ∧ Gn is therefore bounded as follows:
Pr(Z_1 = 0, ..., Z_n = 0) = E[ ((1 + (−1)^{Z_1})/2) · · · ((1 + (−1)^{Z_n})/2) ]
  = (1/2^n) ∑_{S ⊆ {1,...,n}} E[(−1)^{∑_{k∈S} Z_k}]
  ≤ (1/2^n) ∑_{S ⊆ {1,...,n}} ∏_{k∈S} ε*(G_k)   (7.73)
  = ((1 + ε*(G_1))/2) · · · ((1 + ε*(G_n))/2)
  = ω*(G_1) · · · ω*(G_n).
Maximizing over all possible entangled strategies for Alice and Bob yields
ω ∗ ( G1 ∧ · · · ∧ Gn ) ≤ ω ∗ ( G1 ) · · · ω ∗ ( Gn ), (7.74)
as required.
Lecture 8
The hierarchy of
Navascués, Pironio, and Acín
In this lecture, we will define and study a class of strategies for nonlocal games
known as commuting measurement strategies, or alternatively as commuting operator
strategies. These strategies include all entangled strategies, in a sense that will be
made more precise momentarily—and it was not long ago that this inclusion was
proved to be proper by Slofstra [arXiv:1606.03140]. A proof that the values defined
by the classes of entangled and commuting measurement strategies are different,
where we take the supremum winning probability over the two classes of strate-
gies, has only recently been announced by Ji, Natarajan, Vidick, Wright, and Yuen
[arXiv:2001.04383]. But be warned—the paper is over 200 pages long. This refutes
the famous Connes’ embedding conjecture from the subject of von Neumann alge-
bras, so it is worth every page it needs.
We will then analyze the semidefinite programming hierarchy of Navascués,
Pironio, and Acín, better known as the NPA hierarchy, which provides us with a
uniform family of semidefinite programs that converges to the commuting measure-
ment value of any nonlocal game. This result is, in fact, a necessary ingredient in Ji,
Natarajan, Vidick, Wright, and Yuen’s proof.
M ∈ L(RX ⊗ RY , R A ⊗ RB ), (8.1)
or equivalently as matrices whose columns are indexed by question pairs and
whose rows are indexed by answer pairs. To be precise, the value M( a, b | x, y) rep-
resents the probability that Alice and Bob answer the questions ( x, y) with the
answer (a, b).
This representation is nice because not only does M store the probabilities with
which Alice and Bob respond to each question pair ( x, y) with answers ( a, b), but
also the action of M as a linear operator has meaning. For example, assuming the
referee selects question pairs according to a probability vector π, we have that
Alice and Bob’s answers are distributed according to the vector Mπ. Perhaps more
useful is to consider the vector
v = ∑_{(x,y) ∈ X×Y} π(x, y) |x⟩ ⊗ |y⟩ ⊗ |x⟩ ⊗ |y⟩   (8.2)
∑_{(x,y) ∈ X×Y} π(x, y) ∑_{(a,b) ∈ A×B} V(a, b | x, y) M(a, b | x, y),   (8.4)
which one may alternatively express as the inner product ⟨K, M⟩, where the oper-
ator K ∈ L(RX ⊗ RY , R A ⊗ RB ) is defined as
K ( a, b | x, y) = π ( x, y)V ( a, b | x, y) (8.5)
S ⊂ L(RX ⊗ RY , R A ⊗ RB ) (8.6)
Figure 8.1: The 16 deterministic strategies for binary question and answer pairs.
It is now perhaps clearer, by an inspection of this matrix together with those appearing in Figure 8.1, that the classical value of the CHSH game is 3/4; if M ranges over the matrices representing deterministic strategies, it is possible to make three of the 1s in any of these matrices, but never all four, overlap the nonzero entries of K.
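The same conclusion can be reached programmatically in the matrix representation used in this lecture. The following sketch (Python with numpy; the ordering of the row and column indices is my own choice) builds the operator K of (8.5) for the CHSH game along with the 16 deterministic strategy matrices M, and reports the largest inner product ⟨K, M⟩.

import numpy as np
from itertools import product

# Rows indexed by answer pairs (a, b), columns by question pairs (x, y),
# both in the order 00, 01, 10, 11.
pairs = list(product([0, 1], repeat=2))

# K(a, b | x, y) = pi(x, y) V(a, b | x, y) for the CHSH game.
K = np.array([[0.25 * ((a ^ b) == (x & y)) for (x, y) in pairs]
              for (a, b) in pairs])

best = 0.0
for f in product([0, 1], repeat=2):          # Alice: a = f[x]
    for g in product([0, 1], repeat=2):      # Bob:   b = g[y]
        M = np.array([[1.0 * (a == f[x] and b == g[y]) for (x, y) in pairs]
                      for (a, b) in pairs])
        best = max(best, np.sum(K * M))

print(best)   # 0.75, the classical value of the CHSH game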
Now, if we were to represent the set of all (probabilistic) classical strategies by
P ⊂ L(RX ⊗ RY , R A ⊗ RB ) (8.9)
E ⊂ L(RX ⊗ RY , R A ⊗ RB ), (8.10)
Definition 8.2. For a given choice of question sets X and Y and answer sets A
and B, we say that an operator M ∈ L(RX ⊗ RY , R A ⊗ RB ) represents a commuting
measurement strategy if there exists a complex Hilbert space H, a unit vector u ∈ H,
and projection operators
{P_a^x : x ∈ X, a ∈ A}  and  {Q_b^y : y ∈ Y, b ∈ B}   (8.12)
acting on H, such that the following properties are satisfied. The projections repre-
sent measurements, in the sense that
∑ Pax = 1H ∑ Q b = 1H
y
and (8.13)
a∈ A b∈ B
2 Ifwe restrict the definition to finite-dimensional H, then we obtain precisely the entangled
strategies. It is not the case, in contrast, that allowing the spaces A and B appearing in the defi-
nition of entangled strategies to be infinite dimensional makes entangled strategies equivalent to
commuting measurement strategies.
86
for all x ∈ X and y ∈ Y, the two collections of projections commute in pairs,
y
meaning that Pax , Qb = 0 for all x ∈ X and y ∈ Y, and we have
y
M ( a, b | x, y) = u, Pax Qb u (8.14)
We will write
C ⊂ L(R^X ⊗ R^Y, R^A ⊗ R^B)   (8.15)
to denote the set of all commuting measurement strategies for the question and answer sets X, Y, A, and B. The commuting measurement value of G, which is denoted ω_c(G), is the supremum value of the winning probability for G taken over all commuting measurement strategies for Alice and Bob:
ω_c(G) = sup{⟨K, M⟩ : M ∈ C}.   (8.16)
Proving that C is convex and includes E is a good exercise that I will leave to you.
The fact that C is compact happens to follow from what we will prove later in the
lecture (as does convexity). The containment E ⊆ C is proper, as was mentioned at
the start of the lecture.
Basics of strings
When defining the NPA hierarchy in precise terms, it is helpful to make use of
some elementary concepts concerning strings of symbols. In particular, we will be
discussing matrices and vectors whose entries are indexed by strings of varying
lengths. These concepts are elementary and quite familiar within theoretical com-
puter science, and it will take just a moment to become familiar with them in case
you are not.
Suppose that an alphabet Σ, which is a finite and nonempty set whose ele-
ments are viewed as symbols, has been fixed. A string over Σ is any finite, ordered
sequence of symbols from Σ. (Infinite sequences of symbols are not to be consid-
ered as strings.) The length of a string is the total number of symbols, counting all
repetitions, appearing in that string.
For example, if Σ = {0, 1}, then 0, 0110, and 1000100100011 are examples of
strings over Σ; the string 0 has length 1, the string 0110 has length 4, and the string
1000100100011 has length 13. There is a special string that has length 0, and this
string is called the empty string. We use the Greek letter ε to denote this string.
For every nonnegative integer n, we write Σ≤n to denote the set of all strings
having length at most n and we write Σ∗ to denote the set of all strings, of any
(finite) length, over Σ. For every alphabet Σ, the set Σ∗ is countably infinite.
Lastly, given any string s ∈ Σ∗ , we let sR denote the string obtained by reversing
the ordering of the symbols in s. For example, if s = 00010, then sR = 01000.
Among the entries of this matrix, one finds all of the values
⟨P_a^x u, Q_b^y u⟩ = ⟨u, P_a^x Q_b^y u⟩ = M(a, b | x, y)   (8.19)
for appropriate choices of x, y, z, a, b, and c.
We may now think about the various properties that must hold for such a Gram
matrix, but before we do this let us discuss how we will index the entries of such
matrices. Assuming that question and answer sets X, Y, A, and B have been fixed,
we will introduce three alphabets:
Σ_A = X × A,  Σ_B = Y × B,  and  Σ = Σ_A ⊔ Σ_B.   (8.21)
Here, ⊔ denotes the disjoint union, meaning that Σ_A and Σ_B are to be treated as
disjoint sets when forming Σ. In words, there are | X × A| + |Y × B| symbols in the
alphabet Σ, one symbol for each pair ( x, a) ∈ X × A and a separate symbol for
each pair (y, b) ∈ Y × B. The collection of vectors (8.18) may naturally be labeled
by the set
{ε} ∪ Σ = Σ^{≤1},   (8.22)
and so we may consider that the Gram matrix of these vectors has rows and
columns indexed by this set.
Now, supposing that
R ∈ Pos(C^{Σ^{≤1}})   (8.23)
is such a Gram matrix—and naturally this is a positive semidefinite matrix, as all
Gram matrices are—there are various things one may say, based on the conditions
that commuting measurement strategies must meet. In particular, we observe these
conditions:
1. R(ε, ε) = 1, given that u is a unit vector.
2. For every x ∈ X we must have
∑_{a∈A} R((x, a), s) = R(ε, s)  and  ∑_{a∈A} R(s, (x, a)) = R(s, ε),   (8.24)
and likewise for every y ∈ Y we must have
∑_{b∈B} R((y, b), s) = R(ε, s)  and  ∑_{b∈B} R(s, (y, b)) = R(s, ε),   (8.25)
for every s ∈ Σ≤1 (in all four equalities). These conditions reflect the fact that
summing over all operators in any given measurement yields the identity op-
erator, as in (8.13).
3. For every x ∈ X and a, c ∈ A satisfying a ≠ c, we have
R((x, a), (x, c)) = 0,   (8.26)
because P_a^x and P_c^x must be orthogonal projection operators. Similarly, for every y ∈ Y and b, d ∈ B satisfying b ≠ d, we have
R((y, b), (y, d)) = 0.   (8.27)
4. For every (z, c) ∈ Σ, we have the equality
R((z, c), (z, c)) = R(ε, (z, c)) = R((z, c), ε);   (8.28)
each P_a^x and Q_b^y is a projection operator, and thus squares to itself.
5. For every (x, a) ∈ X × A and (y, b) ∈ Y × B we have
R((x, a), (y, b)) = R((y, b), (x, a)),   (8.29)
because Alice's and Bob's projections commute.
Remark 8.3. We can actually do a bit better in the case of the fifth item, and say
that every entry R(( x, a), (y, b)) must be a nonnegative real number, given that
R((x, a), (y, b)) = ⟨P_a^x u, Q_b^y u⟩ = ‖P_a^x Q_b^y u‖²,   (8.30)
and likewise for R((y, b), ( x, a)). This is a stronger condition than (8.29), as the en-
tries R(( x, a), (y, b)) and R((y, b), ( x, a)) must be equal if they are real given that R
is Hermitian. This stronger claim is not really needed though, and does not appear
in the formal description of the NPA hierarchy coming up.
The inclusion C ⊆ C1 is proper in general, but we can still use this relationship
to obtain upper bounds on the commuting operator value (and therefore on the
entangled value) of nonlocal games.
Now, notice that items 1 through 5 in the list above are all affine linear constraints
on R. (With the exception of R(ε, ε) = 1 they are all linear constraints.) If we define
a Hermitian operator H ∈ Herm(C^{Σ^{≤1}}) as
H((x, a), (y, b)) = H((y, b), (x, a)) = (1/2) π(x, y) V(a, b | x, y)   (8.33)
for all x ∈ X, y ∈ Y, a ∈ A, and b ∈ B, and with all other entries equal to zero,
we see that ⟨K, M⟩ = ⟨H, R⟩. Thus, the optimization of ⟨K, M⟩ over all M ∈ C1 is
represented by a semidefinite program, where we optimize h H, Ri over all positive
semidefinite R satisfying the affine linear constraints imposed by items 1 through
5 in the list above.
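A minimal sketch of this first-level semidefinite program for the CHSH game follows (Python, assuming the cvxpy package and a default SDP solver; the indexing of Σ^{≤1} and the use of a real symmetric variable, which suffices here, are my own choices). The constraints implement items 1 through 5, and the optimal value should come out to approximately cos²(π/8), matching Tsirelson's bound.

import numpy as np
import cvxpy as cp
from itertools import product

X = Y = A = B = [0, 1]
idx = {'eps': 0}
for i, (x, a) in enumerate(product(X, A)):
    idx[('A', x, a)] = 1 + i
for i, (y, b) in enumerate(product(Y, B)):
    idx[('B', y, b)] = 5 + i
n = 9

R = cp.Variable((n, n), symmetric=True)
cons = [R >> 0, R[0, 0] == 1]          # item 1

for s in range(n):
    for x in X:                         # item 2 for Alice (columns follow by symmetry)
        cons.append(sum(R[idx[('A', x, a)], s] for a in A) == R[0, s])
    for y in Y:                         # item 2 for Bob
        cons.append(sum(R[idx[('B', y, b)], s] for b in B) == R[0, s])

for x in X:                             # item 3: orthogonal outcomes
    cons.append(R[idx[('A', x, 0)], idx[('A', x, 1)]] == 0)
for y in Y:
    cons.append(R[idx[('B', y, 0)], idx[('B', y, 1)]] == 0)

for key in idx:                         # item 4: projections square to themselves
    if key != 'eps':
        cons.append(R[idx[key], idx[key]] == R[0, idx[key]])

# Item 5 (equality of the Alice/Bob entries) holds automatically for symmetric R.

# Objective <H, R> with H as in (8.33): pi uniform, V the CHSH predicate.
obj = 0
for x, y, a, b in product(X, Y, A, B):
    if (a ^ b) == (x & y):
        obj = obj + 0.25 * R[idx[('A', x, a)], idx[('B', y, b)]]

prob = cp.Problem(cp.Maximize(obj), cons)
prob.solve()
print(prob.value)   # approximately cos^2(pi/8) = 0.8536...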
The semidefinite program just suggested is the first level of the NPA hierarchy.
The idea behind subsequent levels is to consider not just the Gram matrix of the
vectors (8.18), but of larger sets of vectors that include ones such as
P_a^x Q_b^y u,  P_a^x Q_b^y P_c^z u,  etc.   (8.34)
This will lead us to consider operators R whose rows and columns are indexed not
by Σ≤1 , but by Σ≤k for larger choices of k. (Larger choices for k yield higher levels
in the hierarchy.) Although the inner products between most pairs of these vectors
are not informative to the task of determining how well such a strategy performs
in a given nonlocal game, the benefit comes from the introduction of additional
constraints that reflect the same properties through which the five affine linear con-
straints above were derived. Indeed, the sequence of optimal values obtained from
this hierarchy always converges to ω c ( G ), as we will prove in the next section.
That is, two strings are equivalent with respect to the relation ∼ if and only if one
can be obtained from the other by any number of applications of the above rules.
These equivalences reflect the fact that projections square to themselves (the first
rule) and Alice’s measurements commute with Bob’s measurements (the second
rule).
Next, notice that the values that appear in any Gram matrix R of the form suggested above always take the form
⟨u, Π_{c_1}^{z_1} · · · Π_{c_n}^{z_n} u⟩
for some finite sequence of projection operators Π_{c_1}^{z_1}, ..., Π_{c_n}^{z_n} selected from Alice and Bob's projections.
With this observation in mind, we say that a function of the form φ : Σ∗ → C is
admissible if it satisfies these properties:
1. φ(ε) = 1.
2. For all strings s, t ∈ Σ∗ we have
We will also consider functions of the form
φ : Σ^{≤k} → C,   (8.38)
which we call admissible if and only if the same conditions listed above hold for
those strings s and t that are sufficiently short so that φ is defined on the arguments
indicated within each condition.
Thus, if φ is a function that is defined from an actual commuting measurement strategy as
φ((z_1, c_1) · · · (z_n, c_n)) = ⟨u, Π_{c_1}^{z_1} · · · Π_{c_n}^{z_n} u⟩   (8.39)
for every string (z_1, c_1) · · · (z_n, c_n), where each Π_c^z denotes P_c^z or Q_c^z depending on whether (z, c) ∈ Σ_A or (z, c) ∈ Σ_B, respectively, then φ is necessarily admissible.
This is true for functions of both forms φ : Σ∗ → C and φ : Σ≤k → C.
Finally, a positive semidefinite operator
R ∈ Pos(C^{Σ^{≤k}})   (8.40)
Observe that for each positive integer k, the condition that a positive semidef-
inite operator is k-th order admissible is an affine linear constraint, as there are a
finite number of affine linear constraints imposed by the equation (8.41) and the
condition that φ is admissible. Thus, the optimization over all k-th order admissi-
ble operators can be represented by a semidefinite program. This is the NPA hier-
archy, where different choices of k correspond to different levels of the hierarchy.
(Any linear objective function may naturally be considered, but often the objective
function reflects the probability to win a given nonlocal game. Alternatively, one
can set up a semidefinite program that tests whether a given M agrees with some
k-th order admissible operator.)
The sets C_k obtained in this way are nested,
C_1 ⊇ C_2 ⊇ C_3 ⊇ · · ·   (8.42)
as any (k + 1)-st order admissible operator must yield a k-th order admissible oper-
ator when its rows and columns corresponding to strings longer than k are deleted.
It is also the case that C ⊆ Ck for every positive integer k; just like in the k = 1 case,
an actual commuting measurement strategy defines a Gram matrix R that is k-th
order admissible for every choice of k.
The remainder of the lecture is devoted to proving that the sequence (8.42) con-
verges to C in the sense made precise by the following theorem.
Theorem 8.4. Let X, Y, A, and B be finite and nonempty sets and let C and Ck , for every
positive integer k, be as defined above. It is the case that
C = ⋂_{k=1}^{∞} C_k.   (8.43)
The easier implication
We have already observed the implication that statement 1 implies statement 2.
In more detail, under the assumption that statement 1 holds, it must be that M is
defined by a commuting measurement strategy in which Alice and Bob’s projective
measurements are described by {P_a^x : a ∈ A} for Alice and {Q_b^y : b ∈ B} for Bob, with all of these projections acting on a Hilbert space H, along with a unit vector u ∈ H. As before, let Π_c^z denote P_c^z if z ∈ X and c ∈ A, or Q_c^z if z ∈ Y and c ∈ B.
With respect to this notation, one may consider the k-th order admissible operator
R defined by
R(s, t) = φ(s^R t),   (8.44)
where the function φ is defined as
φ((z_1, c_1) · · · (z_j, c_j)) = ⟨u, Π_{c_1}^{z_1} · · · Π_{c_j}^{z_j} u⟩   (8.45)
|R(s, t)| ≤ 1   (8.46)
for every s, t ∈ Σ^{≤k}. To see that this is so, observe first that
|R(s, t)| ≤ √(R(s, s)) √(R(t, t))   (8.47)
The bound (8.49) may be proved by induction on the length of s. For the base
case, one has that R(ε, ε) = 1. For the general case, one has that for any string
s ∈ Σ∗ and any choice of (z, c) ∈ Σ, it holds that
R((z, c)s, (z, c)s) ≤ ∑_d R((z, d)s, (z, d)s) = ∑_d φ(s^R (z, d)(z, d)s) = ∑_d φ(s^R (z, d)s) = φ(s^R s) = R(s, s),   (8.50)
R : Σ∗ × Σ∗ → C (8.53)
Note here that we must draw a distinction between an infinite matrix of the form
(8.53) and a linear operator, as these concepts are no longer equivalent in infinite
dimensions.
Such an infinite matrix R can, in fact, be obtained in a fairly straightforward fashion. Observe first that for every string s ∈ Σ^*, and every infinite, strictly increasing sequence of positive integers (k_1, k_2, k_3, ...), the sequence
(φ_{k_1}(s), φ_{k_2}(s), φ_{k_3}(s), ...)
must have at least one limit point. This is because each value φ_k(s) agrees with an entry of R_k, and the entries of each operator R_k are bounded by 1 in absolute value.
Beginning with the string s = ε and the sequence (k1 , k2 , . . .) = (1, 2, . . .), we
consider the function φ : Σ∗ → C defined by the following process:
Here, when we say “increment s,” we are referring to any fixed total ordering of
Σ∗ for which ε is the first string. For example, the strings in Σ∗ may be ordered
according to their length, with strings of equal length being ordered according to
the natural “dictionary ordering” induced by a fixed ordering of Σ (which is the so-
called lexicographic ordering of Σ∗ ). It must be that any function φ obtained through
this process is admissible, by virtue of the fact that every φk is admissible.
Now we may define
R(s, t) = φ(s^R t).   (8.55)
The three properties required of R follow, either directly or from the recognition
that every finite submatrix of R is equal to the limit of the corresponding subma-
trices of some convergent subsequence of the sequence
( R1 , R2 , R3 , . . . ) (8.56)
together with the fact that the positive semidefinite cone is closed.
for every s, t ∈ Σ∗ . (The inner product is, naturally, the inner product on K.)
Now let us take H to be the closure of the span of the set {us : s ∈ Σ∗ }, which
is a Hilbert space having countable dimension (and is therefore a separable Hilbert
space). We will define a commuting measurement strategy for Alice and Bob, with
H being the Hilbert space for this strategy. The unit vector associated with this
strategy will be uε ∈ H, which is indeed a unit vector given that
{u_{(y,d)s} : d ∈ B\{b}, s ∈ Σ^*}.   (8.61)
from which it follows that
u_s = ∑_{a∈A} u_{(x,a)s}.   (8.67)
Similarly, for any choice of y ∈ Y and s ∈ Σ^*,
u_s = ∑_{b∈B} u_{(y,b)s}.   (8.68)
Consequently,
P_a^x u_s = ∑_{c∈A} P_a^x u_{(x,c)s} = P_a^x u_{(x,a)s} = u_{(x,a)s},   (8.69)
and by similar reasoning
Q_b^y u_s = u_{(y,b)s},   (8.70)
as claimed.
With the formulas (8.62) in hand, we can verify the required properties of H, u_ε, {P_a^x}, and {Q_b^y}. First, see that
⟨u_s, P_a^x Q_b^y u_t⟩ = ⟨u_s, u_{(x,a)(y,b)t}⟩ = φ(s^R (x, a)(y, b)t) = φ(s^R (y, b)(x, a)t) = ⟨u_s, u_{(y,b)(x,a)t}⟩ = ⟨u_s, Q_b^y P_a^x u_t⟩   (8.71)
for every s, t ∈ Σ^*. This implies that P_a^x and Q_b^y commute on the span of the vectors {u_s : s ∈ Σ^*}, and it follows that they commute on all of H by continuity. Second, for every x ∈ X and s ∈ Σ^* we find that
∑_{a∈A} P_a^x u_s = ∑_{a∈A} u_{(x,a)s} = u_s,   (8.72)
and therefore (again by continuity)
∑_{a∈A} P_a^x = 1_H   (8.73)
and similarly
∑_{b∈B} Q_b^y = 1_H.   (8.74)
To complete the proof, it remains to observe that the strategy represented by the unit vector u_ε and the projections {P_a^x} and {Q_b^y} yields the strategy M. This is also evident from the formulas (8.62), as one has
R((x, a), (y, b)) = ⟨u_{(x,a)}, u_{(y,b)}⟩ = ⟨u_ε, P_a^x Q_b^y u_ε⟩   (8.75)
and therefore
M(a, b | x, y) = ⟨u_ε, P_a^x Q_b^y u_ε⟩   (8.76)
for every choice of x ∈ X, y ∈ Y, a ∈ A, and b ∈ B.
Two implications
We will conclude the lecture by briefly observing two facts that follow from Theo-
rem 8.4. The first is that the set C of commuting measurement strategies is closed,
as it is the intersection of the closed sets C1 , C2 , . . . (and, by similar reasoning, it
is convex). The second fact is that there is no loss of generality in restricting one’s
attention to separable Hilbert spaces in the definition of commuting measurement
strategies, for this is so for the strategy constructed in the proof.