
Quantum information theory (MAT4430) Spring 2021

Lecture 9: Schumacher’s compression theorem and quantum entropy


Lecturer: Alexander Müller-Hermes

Recall Shannon’s source coding theorem from Lecture 1. There, we considered a discrete
memoryless source of information, i.e., a sequence of independent and identically distributed
random variables on a finite alphabet, and showed that it can be compressed with compression
rates arbitrarily close to the Shannon entropy of the source. We will now discuss the
quantum analogue of this result.

1 Compression of quantum data


Definition 1.1 (Quantum source). A discrete memoryless source of quantum information
is a sequence $(\rho^{\otimes n})_{n\in\mathbb{N}}$ of tensor powers of a quantum state $\rho \in \mathcal{D}(\mathcal{H})$.
An interpretation of this concept goes as follows: Consider an ensemble $\{p_n, |\psi_n\rangle\langle\psi_n|\}_{n=1}^N$
of quantum states. It can be thought of as describing a probabilistic process that produces
the pure state $|\psi_n\rangle\langle\psi_n|$ with probability $p_n$. As we have seen in Lecture 1, we can associate the
quantum state
$$\rho = \sum_{n=1}^N p_n |\psi_n\rangle\langle\psi_n|$$
to the outcome of this process. This takes into account that we do not know which of the states $|\psi_n\rangle\langle\psi_n|$ has been prepared. Executing this process
multiple times, independently of each other, can then be described by the tensor powers $\rho^{\otimes n}$.
Note that this is completely analogous to the classical situation. There, a random variable
$X$ distributed according to a distribution $p \in \mathcal{P}(\{1,\dots,N\})$ can be thought of as producing the
outcome $x_n$ with probability $p_n$.
There is, however, one major difference between the quantum case and the classical case:
entanglement! The quantum state ρ could be the reduced density matrix of a quantum
system ‘A’ embedded in a larger quantum system ‘AR’. The other quantum system
‘R’ plays the role of an environment, but in this context it is usually called a reference system.
In this case, it is important that the protocol compressing the system ‘A’
preserves the entanglement with the reference ‘R’. For example, imagine a situation
where the researcher Alice wants to send part of an entangled quantum state to the researcher
Bob while keeping her share of the state. Using a compression protocol she can make
the transmission more efficient. Of course, it is important that in the end Bob
can retrieve the entanglement with the system that Alice has kept. To define quantum
compression schemes that preserve this entanglement, we will use the so-called channel fidelity
as a distance measure:
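Definition 1.2 (Channel fidelity). For a quantum channel $T : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$ we define the
channel fidelity of $T$ with respect to a quantum state $\rho \in \mathcal{D}(\mathcal{H})$ as
$$F(T, \rho) = F\big((\mathrm{id}_E \otimes T)(|\psi_{EA}\rangle\langle\psi_{EA}|),\ |\psi_{EA}\rangle\langle\psi_{EA}|\big),$$
where $|\psi_{EA}\rangle = \operatorname{vec}(\sqrt{\rho})$.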
The channel fidelity F (T, ρ) quantifies how close the quantum channel T is to the identity
channel, when applied to part of a purification of the quantum state ρ. We will need the
following elementary property of the channel fidelity:
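Lemma 1.3. Consider a quantum channel $T : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$ given by its Kraus decomposition
$$T = \sum_{n=1}^N \mathrm{Ad}_{K_n},$$
with $K_n \in \mathcal{B}(\mathcal{H})$. For any quantum state $\rho \in \mathcal{D}(\mathcal{H})$ we have
$$F(T, \rho) = \sqrt{\sum_{n=1}^N |\langle \rho, K_n\rangle_{HS}|^2}.$$
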
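Proof. In the exercises, we have seen that
$$F(\sigma, |\phi\rangle\langle\phi|) = \sqrt{\langle\phi|\sigma|\phi\rangle},$$
for any $\sigma \in \mathcal{D}(\mathcal{H}')$ and any pure state $|\phi\rangle\langle\phi| \in \mathcal{D}(\mathcal{H}')$. Applying this result, we find that
$$F(T, \rho) = \sqrt{\langle\psi_{EA}|(\mathrm{id}_E \otimes T)(|\psi_{EA}\rangle\langle\psi_{EA}|)|\psi_{EA}\rangle},$$
where $|\psi_{EA}\rangle = \operatorname{vec}(\sqrt{\rho})$. Finally, note that
$$\langle\psi_{EA}|(\mathrm{id}_E \otimes \mathrm{Ad}_K)(|\psi_{EA}\rangle\langle\psi_{EA}|)|\psi_{EA}\rangle = |\operatorname{vec}(\sqrt{\rho})^\dagger \operatorname{vec}(K\sqrt{\rho})|^2 = |\langle \rho, K\rangle_{HS}|^2.$$
The lemma follows from the Kraus representation
$$T = \sum_{n=1}^N \mathrm{Ad}_{K_n}$$
by linearity.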

Now, we define the compression task as follows:

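Definition 1.4 (Quantum compression schemes). Let $\mathcal{H}$ denote a complex Euclidean space
and $\rho \in \mathcal{D}(\mathcal{H})$ a quantum state. An $(n, m, \delta)$-compression scheme for $\rho$ is a pair of quantum
channels
$$E : \mathcal{B}(\mathcal{H}^{\otimes n}) \to \mathcal{B}((\mathbb{C}^2)^{\otimes m}) \quad\text{and}\quad D : \mathcal{B}((\mathbb{C}^2)^{\otimes m}) \to \mathcal{B}(\mathcal{H}^{\otimes n}),$$
such that
$$F(D \circ E, \rho^{\otimes n}) \geq 1 - \delta.$$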

Note that we used the channel fidelity to quantify the final error of the compression
scheme, which takes entanglement with a reference system into account. Indeed, suppose we only asked for the output
state to be close to $\rho^{\otimes n}$. Then it would be trivial (and not very interesting) to construct compression schemes, since
$$\rho^{\otimes n} = (D_{triv} \circ E_{triv})(\rho^{\otimes n})$$
for the quantum channels
$$E_{triv} = \mathrm{Tr}[\,\cdot\,] \quad\text{and}\quad D_{triv}(y) = y\,\rho^{\otimes n},$$
which compress to $m = 0$ qubits. Such a compression scheme would of course be useless in practice,
since it would destroy all correlations the quantum system might have with some other
system. Note that a similar trivial example was excluded in the classical case by definition,
since we only considered deterministic compression schemes. Had we allowed
probabilistic schemes, we would have needed to be more careful with the definition.
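The failure of the trivial scheme can be made quantitative. $D_{triv} \circ E_{triv}$ is the replacement channel $X \mapsto \mathrm{Tr}[X]\rho^{\otimes n}$. Write $\rho^{\otimes n} = \sum_i \mu_i |w_i\rangle\langle w_i|$; then the channel has Kraus operators $\sqrt{\mu_i}\,|w_i\rangle\langle j|$. Plugging these into Lemma 1.3 gives $F(D_{triv}\circ E_{triv}, \rho^{\otimes n})^2 = \mathrm{Tr}[(\rho^{\otimes n})^3] = \mathrm{Tr}[\rho^3]^n$. So the channel fidelity decays exponentially in $n$ unless $\rho$ is pure.

The sketch below checks this numerically for small $n$. NumPy is assumed, the helper names are hypothetical, and the example state is arbitrary.

```python
import numpy as np
from functools import reduce


def tensor_power(rho, n):
    """rho^{otimes n} as a dense matrix."""
    return reduce(np.kron, [rho] * n)


def replacement_kraus(sigma):
    """Kraus operators sqrt(mu_i)|w_i><j| of the channel X -> Tr[X] sigma."""
    mu, w = np.linalg.eigh(sigma)
    dim = sigma.shape[0]
    ops = []
    for i in range(dim):
        for j in range(dim):
            K = np.zeros((dim, dim), dtype=complex)
            K[:, j] = np.sqrt(max(mu[i], 0.0)) * w[:, i]  # column j holds sqrt(mu_i) w_i
            ops.append(K)
    return ops


def channel_fidelity(kraus, rho):
    """Lemma 1.3: F(T, rho) = sqrt(sum_n |Tr[rho K_n]|^2) for Hermitian rho."""
    return float(np.sqrt(sum(abs(np.trace(rho @ K)) ** 2 for K in kraus)))


rho = np.array([[0.8, 0.1], [0.1, 0.2]])
for n in range(1, 4):
    rho_n = tensor_power(rho, n)
    kraus = replacement_kraus(rho_n)
    # The scheme reproduces rho^{otimes n} exactly ...
    output = sum(K @ rho_n @ K.conj().T for K in kraus)
    assert np.allclose(output, rho_n)
    # ... but its channel fidelity equals Tr[rho^3]^(n/2).
    print(n, channel_fidelity(kraus, rho_n), np.trace(rho @ rho @ rho).real ** (n / 2))
```

For this particular $\rho$ one has $\mathrm{Tr}[\rho^3] = 0.55$.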
As in the classical case, we will also define achievable compression rates as follows:

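Definition 1.5 (Achievable compression rates). We call $R \in \mathbb{R}_+$ an achievable compression
rate for $\rho \in \mathcal{D}(\mathcal{H})$ if for every $n \in \mathbb{N}$ there exists an $(n, m_n, \delta_n)$-compression scheme such
that
$$R = \lim_{n\to\infty} \frac{m_n}{n} \quad\text{and}\quad \lim_{n\to\infty} \delta_n = 0.$$
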
2 The von Neumann entropy and Schumacher compression
The von Neumann entropy is the proper quantum generalization of the Shannon entropy:
Definition 2.1 (von Neumann entropy). For a quantum state ρ ∈ D(H) we define the von
Neumann entropy as
H(ρ) = − Tr [ρ log(ρ)] ,
where the logarithm is taken in base 2.
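Since $H(\rho)$ depends only on the spectrum of $\rho$, it can be computed from the eigenvalues with the convention $0\log 0 = 0$. The sketch below does this; NumPy is assumed and the function name is hypothetical.

The example state $\rho = \frac{1}{2}(|0\rangle\langle 0| + |+\rangle\langle +|)$ is prepared by an ensemble whose labels carry 1 bit of Shannon entropy. Its eigenvalues are $(1\pm 1/\sqrt{2})/2$, so $H(\rho) \approx 0.601$.

```python
import numpy as np


def von_neumann_entropy(rho, tol=1e-12):
    """H(rho) = -Tr[rho log2 rho], computed from the eigenvalues of rho."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > tol]  # drop zero eigenvalues: 0 * log(0) = 0
    return float(np.sum(evals * np.log2(1.0 / evals)))


ket0 = np.array([1.0, 0.0])
ket_plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho = 0.5 * np.outer(ket0, ket0) + 0.5 * np.outer(ket_plus, ket_plus)

print(von_neumann_entropy(rho))                   # roughly 0.6009
print(von_neumann_entropy(np.eye(2) / 2))         # maximally mixed qubit: 1.0
print(von_neumann_entropy(np.outer(ket0, ket0)))  # pure state: 0.0
```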
We will prove the following theorem:
Theorem 2.2 (Schumacher’s compression theorem). Let H denote a complex Euclidean
space and ρ ∈ D(H) a quantum state.
1. Any number $R > H(\rho)$ is an achievable compression rate for $(\rho^{\otimes n})_{n\in\mathbb{N}}$.

2. If there is a sequence of $(n_k, m_k, \delta_k)$-compression schemes for $(\rho^{\otimes n})_{n\in\mathbb{N}}$ satisfying
$$\lim_{k\to\infty} n_k = \infty \quad\text{and}\quad \lim_{k\to\infty} \frac{m_k}{n_k} = R < H(\rho),$$
then we have $\lim_{k\to\infty} \delta_k = 1$, i.e., the channel fidelity of the compression schemes goes
to zero in the limit $k \to \infty$.
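Proof. For brevity set $d = \dim(\mathcal{H})$. We start with the direct part: By the spectral theorem,
we have
$$\rho = \sum_{i=1}^d p_i |\psi_i\rangle\langle\psi_i|,$$
for a probability distribution $p \in \mathcal{P}(\{1,\dots,d\})$ and an orthonormal basis $\{|\psi_1\rangle, \dots, |\psi_d\rangle\} \subset \mathcal{H}$.
Note that $H(\rho) = H(p)$, and recall the set of $\epsilon$-typical strings $T_{n,\epsilon}(p)$ of length $n$. For each
$\epsilon > 0$ and $n \in \mathbb{N}$, we define an $\epsilon$-typical projector $\Pi_{n,\epsilon} \in \mathrm{Proj}(\mathcal{H}^{\otimes n})$ by
$$\Pi_{n,\epsilon} = \sum_{(i_1,\dots,i_n) \in T_{n,\epsilon}(p)} |\psi_{i_1}\rangle\langle\psi_{i_1}| \otimes \cdots \otimes |\psi_{i_n}\rangle\langle\psi_{i_n}|.$$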

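Expressing the basic properties of typical sequences (see Lecture 1) in terms of the projector
$\Pi_{n,\epsilon}$ shows the following:
• For any $n \in \mathbb{N}$ and any $\epsilon > 0$ we have
$$2^{-n(H(\rho)+\epsilon)}\,\Pi_{n,\epsilon} \leq \Pi_{n,\epsilon}\,\rho^{\otimes n}\,\Pi_{n,\epsilon} \leq 2^{-n(H(\rho)-\epsilon)}\,\Pi_{n,\epsilon}. \qquad (1)$$
• For any $n \in \mathbb{N}$ and $\epsilon > 0$ we have
$$\mathrm{Tr}[\Pi_{n,\epsilon}] = |T_{n,\epsilon}(p)| \leq 2^{n(H(\rho)+\epsilon)}. \qquad (2)$$
• For any $\epsilon > 0$ we have
$$\lim_{n\to\infty} \mathrm{Tr}\big[\Pi_{n,\epsilon}\,\rho^{\otimes n}\big] = 1. \qquad (3)$$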
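For a qubit state with eigenvalues $(1-p, p)$, the quantities in (2) and (3) depend only on how often the eigenvector with eigenvalue $p$ occurs in an index string. They can therefore be computed exactly with binomial coefficients.

The sketch below assumes the weak-typicality definition $T_{n,\epsilon}(p) = \{x : |-\frac{1}{n}\log_2 p(x) - H(p)| \leq \epsilon\}$. This is an assumption about the convention of Lecture 1. The sketch uses only the Python standard library, and the function names are hypothetical. It returns $\mathrm{Tr}[\Pi_{n,\epsilon}]$ and $\mathrm{Tr}[\Pi_{n,\epsilon}\rho^{\otimes n}]$.

```python
import math


def binary_entropy(p):
    """Shannon entropy (in bits) of the distribution (1 - p, p)."""
    return -((1 - p) * math.log2(1 - p) + p * math.log2(p))


def typical_projector_stats(p, n, eps):
    """Return (Tr[Pi_{n,eps}], Tr[Pi_{n,eps} rho^{otimes n}]) for eigenvalues (1 - p, p)."""
    H = binary_entropy(p)
    rank, weight = 0, 0.0
    for k in range(n + 1):  # k = number of tensor factors carrying the eigenvalue p
        # -(1/n) log2 of the product eigenvalue (1 - p)^(n - k) p^k
        rate = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
        if abs(rate - H) <= eps:
            count = math.comb(n, k)
            rank += count
            # count * 2^(-n * rate), evaluated in log space to avoid underflow
            weight += 2.0 ** (math.log2(count) - n * rate)
    return rank, weight


p, eps = 0.1, 0.1
H = binary_entropy(p)
for n in (10, 100, 1000, 2000):
    rank, weight = typical_projector_stats(p, n, eps)
    # (2): log2 Tr[Pi] / n <= H + eps;  (3): Tr[Pi rho^n] approaches 1 as n grows
    print(n, math.log2(rank) / n, H + eps, weight)
```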

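For $\epsilon > 0$ and any $n \in \mathbb{N}$, we will now construct an $(n, \lceil n(H(\rho)+\epsilon)\rceil, \delta_n)$-compression scheme
for $\rho$ such that $\delta_n \to 0$ as $n \to \infty$. This shows that $H(\rho) + \epsilon$ is an achievable rate. Consider
$m = \lceil n(H(\rho)+\epsilon)\rceil$. For every typical sequence $(i_1, \dots, i_n) \in T_{n,\epsilon}(p)$, choose a bit string
$b(i_1, \dots, i_n) \in \{0,1\}^m$ such that distinct typical sequences receive distinct bit strings. This is possible by (2), since $|T_{n,\epsilon}(p)| \leq 2^m$.
Let us denote by $S_{n,m} \subset (\mathbb{C}^2)^{\otimes m}$ the span of the orthonormal vectors
$|b(i_1, \dots, i_n)\rangle$ for all $(i_1, \dots, i_n) \in T_{n,\epsilon}(p)$. We define an isometry $V_{n,\epsilon} : S_{n,m} \to \mathcal{H}^{\otimes n}$ by
$$V_{n,\epsilon} = \sum_{(i_1,\dots,i_n) \in T_{n,\epsilon}(p)} \big(|\psi_{i_1}\rangle \otimes \cdots \otimes |\psi_{i_n}\rangle\big)\langle b(i_1, \dots, i_n)|.$$
It is easy to see that
$$\Pi_{n,\epsilon} = V_{n,\epsilon} V_{n,\epsilon}^\dagger.$$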
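Next, we define two quantum channels $E : \mathcal{B}(\mathcal{H}^{\otimes n}) \to \mathcal{B}((\mathbb{C}^2)^{\otimes m})$ and $D : \mathcal{B}((\mathbb{C}^2)^{\otimes m}) \to \mathcal{B}(\mathcal{H}^{\otimes n})$ by
$$E(X) = V_{n,\epsilon}^\dagger X V_{n,\epsilon} + \mathrm{Tr}\big[(\mathbb{1}_{\mathcal{H}}^{\otimes n} - \Pi_{n,\epsilon})X\big]\,|0\rangle\langle 0|$$
and
$$D(Y) = V_{n,\epsilon} Y V_{n,\epsilon}^\dagger + \mathrm{Tr}\big[(\mathbb{1}_{(\mathbb{C}^2)^{\otimes m}} - \Pi_{S_{n,m}})Y\big]\,|0\rangle\langle 0|.$$
Here, the notation is as follows:
• $\Pi_{S_{n,m}}$ denotes the projection onto $S_{n,m}$.
• We extend $V_{n,\epsilon}$ to all of $(\mathbb{C}^2)^{\otimes m}$ by setting it to zero on the orthogonal complement of $S_{n,m}$.
• $|0\rangle$ denotes a fixed unit vector, in $(\mathbb{C}^2)^{\otimes m}$ and in $\mathcal{H}^{\otimes n}$, respectively.

Finally, we can compute that
$$(D \circ E)(X) = \Pi_{n,\epsilon} X \Pi_{n,\epsilon} + F_{n,\epsilon}(X),$$
for some completely positive map $F_{n,\epsilon} : \mathcal{B}(\mathcal{H}^{\otimes n}) \to \mathcal{B}(\mathcal{H}^{\otimes n})$ that we do not need to know
exactly. In particular, $D \circ E$ has a Kraus decomposition containing $\Pi_{n,\epsilon}$ as one of its Kraus operators. We use Lemma 1.3, drop all other (non-negative) terms of the sum, and apply (3). This gives
$$F(D \circ E, \rho^{\otimes n}) \geq \mathrm{Tr}\big[\Pi_{n,\epsilon}\,\rho^{\otimes n}\big] \to 1 \quad\text{as } n \to \infty.$$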
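The lower bound $F(D\circ E,\rho^{\otimes n}) \geq \mathrm{Tr}[\Pi_{n,\epsilon}\rho^{\otimes n}]$ uses only that $\Pi_{n,\epsilon}$ appears among the Kraus operators. The sketch below checks it for a simplified stand-in for $D \circ E$, namely the channel
$$X \mapsto \Pi_{n,\epsilon}X\Pi_{n,\epsilon} + \mathrm{Tr}[(\mathbb{1} - \Pi_{n,\epsilon})X]\,|0\rangle\langle 0|.$$
This channel has the form $\Pi_{n,\epsilon}X\Pi_{n,\epsilon} + F_{n,\epsilon}(X)$ from the proof. The following are assumptions made for illustration:

- this concrete choice of channel,
- the example state and the parameters $n = 6$, $\epsilon = 0.25$,
- the weak-typicality convention,
- the use of NumPy and the hypothetical helper names.

The typical projector is built in the eigenbasis of $\rho$, following the definition of $\Pi_{n,\epsilon}$.

```python
import numpy as np
from functools import reduce
from itertools import product


def typical_projector(rho, n, eps):
    """Pi_{n,eps}: sum of the product projectors over weakly eps-typical index strings."""
    p, vecs = np.linalg.eigh(rho)
    H = -sum(x * np.log2(x) for x in p if x > 0)
    dim = len(p) ** n
    Pi = np.zeros((dim, dim), dtype=complex)
    for idx in product(range(len(p)), repeat=n):
        prob = np.prod([p[i] for i in idx])
        if prob > 0 and abs(-np.log2(prob) / n - H) <= eps:
            v = reduce(np.kron, [vecs[:, i] for i in idx])
            Pi += np.outer(v, v.conj())
    return Pi


def toy_scheme_kraus(Pi):
    """Kraus operators of X -> Pi X Pi + Tr[(1 - Pi) X] |0><0|."""
    dim = Pi.shape[0]
    comp = np.eye(dim) - Pi
    kraus = [Pi]
    for j in range(dim):
        K = np.zeros((dim, dim), dtype=complex)
        K[0, :] = comp[j, :]  # the operator |0><j|(1 - Pi)
        kraus.append(K)
    return kraus


rho = np.array([[0.8, 0.1], [0.1, 0.2]])
n, eps = 6, 0.25
rho_n = reduce(np.kron, [rho] * n)
Pi = typical_projector(rho, n, eps)
kraus = toy_scheme_kraus(Pi)

# Lemma 1.3 applied to all Kraus operators of the stand-in channel
fidelity = np.sqrt(sum(abs(np.trace(rho_n @ K)) ** 2 for K in kraus))
print(fidelity, np.trace(Pi @ rho_n).real)  # the first number is at least the second
```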


 

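For the reverse direction, consider a sequence of $(n_k, m_k, \delta_k)$-compression schemes for
$\rho \in \mathcal{D}(\mathcal{H})$ given by quantum channels
$$E_k : \mathcal{B}(\mathcal{H}^{\otimes n_k}) \to \mathcal{B}((\mathbb{C}^2)^{\otimes m_k}) \quad\text{and}\quad D_k : \mathcal{B}((\mathbb{C}^2)^{\otimes m_k}) \to \mathcal{B}(\mathcal{H}^{\otimes n_k}),$$
such that $\lim_{k\to\infty} n_k = \infty$ and
$$R = \lim_{k\to\infty} \frac{m_k}{n_k} < H(\rho).$$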

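Let $\{A_l^{(k)}\}_{l=1}^{L_k}$ denote the Kraus operators of the quantum channel $D_k \circ E_k$ obtained by composing
the Kraus operators of $D_k$ with those of $E_k$. Since each of these factors through $(\mathbb{C}^2)^{\otimes m_k}$, we have
$\mathrm{rk}(A_l^{(k)}) \leq 2^{m_k}$. For each $k$ and $l$ we denote by $\Pi_l^{(k)}$ the projection onto the
image of $A_l^{(k)}$, such that $\mathrm{rk}(\Pi_l^{(k)}) \leq 2^{m_k}$ and $\Pi_l^{(k)} A_l^{(k)} = A_l^{(k)}$. By Lemma 1.3 we have
$$F(D_k \circ E_k, \rho^{\otimes n_k}) = \sqrt{\sum_{l=1}^{L_k} |\langle \rho^{\otimes n_k}, A_l^{(k)}\rangle_{HS}|^2}.$$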

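We use $\Pi_l^{(k)} A_l^{(k)} = A_l^{(k)}$, cyclicity of the trace, and the Cauchy-Schwarz inequality. This gives
$$|\langle \rho^{\otimes n_k}, A_l^{(k)}\rangle_{HS}|^2 = \Big|\Big\langle \Pi_l^{(k)}\sqrt{\rho^{\otimes n_k}},\ A_l^{(k)}\sqrt{\rho^{\otimes n_k}}\Big\rangle_{HS}\Big|^2 \leq \mathrm{Tr}\big[\Pi_l^{(k)}\rho^{\otimes n_k}\big]\,\mathrm{Tr}\big[\mathrm{Ad}_{A_l^{(k)}}(\rho^{\otimes n_k})\big],$$
which implies that
$$F(D_k \circ E_k, \rho^{\otimes n_k}) \leq \sqrt{\sum_{l=1}^{L_k} q_l^{(k)}\,\mathrm{Tr}\big[\Pi_l^{(k)}\rho^{\otimes n_k}\big]}, \qquad (4)$$
where we set
$$q_l^{(k)} = \mathrm{Tr}\big[\mathrm{Ad}_{A_l^{(k)}}(\rho^{\otimes n_k})\big] \geq 0.$$
Since $D_k \circ E_k$ is a quantum channel, we find that
$$\sum_{l=1}^{L_k} q_l^{(k)} = \sum_{l=1}^{L_k} \mathrm{Tr}\big[\mathrm{Ad}_{A_l^{(k)}}(\rho^{\otimes n_k})\big] = 1.$$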

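Finally, choosing $\epsilon > 0$ such that $R + \epsilon < H(\rho)$, we compute
$$\begin{aligned}
\mathrm{Tr}\big[\Pi_l^{(k)}\rho^{\otimes n_k}\big] &= \mathrm{Tr}\big[\Pi_l^{(k)}\Pi_{n_k,\epsilon}\rho^{\otimes n_k}\Pi_{n_k,\epsilon}\big] + \mathrm{Tr}\big[\Pi_l^{(k)}\rho^{\otimes n_k}(\mathbb{1}_{\mathcal{H}}^{\otimes n_k} - \Pi_{n_k,\epsilon})\big] \\
&\leq 2^{-n_k(H(\rho)-\epsilon)}\,\mathrm{Tr}\big[\Pi_l^{(k)}\Pi_{n_k,\epsilon}\big] + \mathrm{Tr}\big[\rho^{\otimes n_k}(\mathbb{1}_{\mathcal{H}}^{\otimes n_k} - \Pi_{n_k,\epsilon})\big] \\
&\leq 2^{n_k\left(\frac{m_k}{n_k} - H(\rho) + \epsilon\right)} + \mathrm{Tr}\big[\rho^{\otimes n_k}(\mathbb{1}_{\mathcal{H}}^{\otimes n_k} - \Pi_{n_k,\epsilon})\big] \longrightarrow 0,
\end{aligned}$$
as $k \to \infty$, uniformly in $l$. Here we used
• $[\rho^{\otimes n_k}, \Pi_{n_k,\epsilon}] = 0$ in the first line,
• (1) and $\Pi_l^{(k)} \leq \mathbb{1}_{\mathcal{H}}^{\otimes n_k}$ in the second line,
• $\mathrm{Tr}[\Pi_l^{(k)}\Pi_{n_k,\epsilon}] \leq \mathrm{Tr}[\Pi_l^{(k)}] \leq 2^{m_k}$ and (3) in the final line.

Combining this estimate with (4) and $\sum_l q_l^{(k)} = 1$ shows that $F(D_k \circ E_k, \rho^{\otimes n_k}) \to 0$ as $k \to \infty$.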
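The converse hinges on the fact that a projector of rank at most $2^{m}$ cannot capture much of $\rho^{\otimes n}$ when $m/n < H(\rho)$. Ky Fan's maximum principle is a standard fact not proved in this lecture. By it, the largest possible value of $\mathrm{Tr}[\Pi\rho^{\otimes n}]$ over projectors $\Pi$ of rank at most $r$ is the sum of the $r$ largest eigenvalues of $\rho^{\otimes n}$.

The standard-library sketch below evaluates this quantity for a qubit with eigenvalues $(1-p, p)$, where $p < 1/2$; the example values and function name are hypothetical. The captured weight tends to 0 for a rate below $H(\rho)$ and to 1 for a rate above it.

```python
import math


def max_weight_rank_r(p, n, r):
    """Sum of the r largest eigenvalues of rho^{otimes n}, rho with eigenvalues (1 - p, p), p < 1/2."""
    total, remaining = 0.0, r
    for k in range(n + 1):  # eigenvalue (1 - p)^(n - k) p^k decreases with k since p < 1 - p
        if remaining <= 0:
            break
        take = min(math.comb(n, k), remaining)  # how many copies of this eigenvalue we use
        total += 2.0 ** (math.log2(take) + (n - k) * math.log2(1 - p) + k * math.log2(p))
        remaining -= take
    return total


p = 0.1
H = -((1 - p) * math.log2(1 - p) + p * math.log2(p))  # about 0.469
for rate in (0.3, 0.6):  # one rate below H(rho), one above
    for n in (100, 500, 1000):
        m = math.ceil(rate * n)
        print(rate, n, max_weight_rank_r(p, n, 2 ** m))
```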

Schumacher’s compression theorem gives an operational interpretation of the von Neumann
entropy. We will now spend some time proving properties of the von Neumann entropy.
As for the Shannon entropy, it will be convenient to use the quantum relative entropy for this purpose.

3 The quantum relative entropy


Most properties of the von Neumann entropy will follow from the properties of the quantum
relative entropy and in particular from its data-processing inequality. We start by defining
this quantity:
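Definition 3.1 (Quantum relative entropy). Let $\mathcal{H}$ denote a complex Euclidean space. For
any pair of quantum states $\rho, \sigma \in \mathcal{D}(\mathcal{H})$ we define the relative entropy as
$$D(\rho\|\sigma) = \begin{cases} \mathrm{Tr}[\rho(\log(\rho) - \log(\sigma))], & \text{if } \ker(\sigma) \subseteq \ker(\rho), \\ +\infty, & \text{otherwise.} \end{cases}$$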
Usually, the operator $\log(\rho)$ is only defined for positive definite operators $\rho$. However,
even if $\rho \in \mathcal{D}(\mathcal{H})$ has a non-trivial kernel, we can define the operator $\rho\log(\rho)$ by using the
convention that $0 \cdot \log(0) = 0$. Specifically, we set
$$\rho\log(\rho) = \sum_{i=1}^n \lambda_i \log(\lambda_i)|v_i\rangle\langle v_i|,$$
where $\rho = \sum_{i=1}^n \lambda_i |v_i\rangle\langle v_i|$ is the spectral decomposition of $\rho$ with $\lambda_i > 0$ for any $i \in \{1, \dots, n\}$
and $n = \mathrm{rk}(\rho) \leq \dim(\mathcal{H})$. In a similar way we can make sense of the operator $\rho\log(\sigma)$ under
the condition that $\ker(\sigma) \subseteq \ker(\rho)$, by setting
$$\rho\log(\sigma) = \sum_{j=1}^m \log(\mu_j)\,\rho|w_j\rangle\langle w_j|,$$
where $\sigma = \sum_{j=1}^m \mu_j |w_j\rangle\langle w_j|$ is the spectral decomposition of $\sigma$ with $\mu_j > 0$ for any
$j \in \{1, \dots, m\}$ and $m = \mathrm{rk}(\sigma) \leq \dim(\mathcal{H})$. Using the spectral decompositions of both operators,
we obtain the formula
$$D(\rho\|\sigma) = \sum_{i=1}^n \sum_{j=1}^m |\langle v_i|w_j\rangle|^2 \lambda_i\big(\log(\lambda_i) - \log(\mu_j)\big), \qquad (5)$$
whenever $\ker(\sigma) \subseteq \ker(\rho)$. This expression has a simple but useful consequence: it allows us
to write the relative entropy as a limit of slightly simpler trace functionals:
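Lemma 3.2. For any $\rho, \sigma \in \mathcal{D}(\mathcal{H})$ we have
$$D(\rho\|\sigma) = \frac{1}{\ln(2)}\lim_{\epsilon\searrow 0}\frac{1 - \mathrm{Tr}[\rho^{1-\epsilon}\sigma^{\epsilon}]}{\epsilon}.$$
Proof. If $\ker(\sigma) \not\subseteq \ker(\rho)$, then, using $\sigma^{\epsilon} \to \mathbb{1}_{\mathcal{H}} - \Pi_{\ker(\sigma)}$ as $\epsilon \searrow 0$, we have
$$\lim_{\epsilon\searrow 0}\big(1 - \mathrm{Tr}[\rho^{1-\epsilon}\sigma^{\epsilon}]\big) = 1 - \mathrm{Tr}\big[\rho(\mathbb{1}_{\mathcal{H}} - \Pi_{\ker(\sigma)})\big] = \mathrm{Tr}\big[\rho\,\Pi_{\ker(\sigma)}\big] > 0.$$
In this case, we conclude that the limit in the statement diverges, as it should.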
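Assume now that $\ker(\sigma) \subseteq \ker(\rho)$ and consider the spectral decompositions
$$\rho = \sum_{i=1}^n \lambda_i |v_i\rangle\langle v_i| \quad\text{and}\quad \sigma = \sum_{j=1}^m \mu_j |w_j\rangle\langle w_j|,$$
with $n = \mathrm{rk}(\rho)$ and $m = \mathrm{rk}(\sigma)$. Then, we define a function $f : \mathbb{R} \to \mathbb{R}$ by
$$f(\alpha) = \mathrm{Tr}[\rho^{1-\alpha}\sigma^{\alpha}] = \sum_{i=1}^n \sum_{j=1}^m |\langle v_i|w_j\rangle|^2 \lambda_i^{1-\alpha}\mu_j^{\alpha}.$$
The function $f$ is differentiable at every point $\alpha \in \mathbb{R}$ and it is easy to compute that
$$f'(\alpha) = -\sum_{i=1}^n \sum_{j=1}^m |\langle v_i|w_j\rangle|^2 \lambda_i^{1-\alpha}\mu_j^{\alpha}\big(\ln(\lambda_i) - \ln(\mu_j)\big).$$
Finally, we observe that $f(0) = \mathrm{Tr}[\rho] = 1$ and
$$\begin{aligned}
-f'(0) &= \lim_{\epsilon\searrow 0}\frac{1 - \mathrm{Tr}[\rho^{1-\epsilon}\sigma^{\epsilon}]}{\epsilon} \\
&= \sum_{i=1}^n \sum_{j=1}^m |\langle v_i|w_j\rangle|^2 \lambda_i\big(\ln(\lambda_i) - \ln(\mu_j)\big) \\
&= \ln(2)\,D(\rho\|\sigma).
\end{aligned}$$
This finishes the proof.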
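The NumPy sketch below computes $D(\rho\|\sigma)$ for full-rank states in three ways:

1. directly from $\mathrm{Tr}[\rho(\log\rho - \log\sigma)]$,
2. from formula (5),
3. from the difference quotient of Lemma 3.2 at a small $\epsilon$.

The helper names and the random example states are hypothetical. For full-rank states the kernel condition holds automatically. The function implementing (5) also returns $+\infty$ when it detects a violation of $\ker(\sigma) \subseteq \ker(\rho)$, up to a numerical tolerance.

```python
import numpy as np


def mat_power(a, t):
    """a^t for a positive definite matrix a, via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * w ** t) @ v.conj().T


def mat_log2(a):
    """log2(a) for a positive definite matrix a."""
    w, v = np.linalg.eigh(a)
    return (v * np.log2(w)) @ v.conj().T


def rel_entropy_trace(rho, sigma):
    """Tr[rho (log2 rho - log2 sigma)] for positive definite rho and sigma."""
    return float(np.trace(rho @ (mat_log2(rho) - mat_log2(sigma))).real)


def rel_entropy_formula5(rho, sigma, tol=1e-12):
    """Formula (5); returns +inf if ker(sigma) is not contained in ker(rho)."""
    lam, v = np.linalg.eigh(rho)
    mu, w = np.linalg.eigh(sigma)
    overlap = np.abs(v.conj().T @ w) ** 2  # overlap[i, j] = |<v_i|w_j>|^2
    total = 0.0
    for i in range(len(lam)):
        if lam[i] <= tol:
            continue  # convention 0 log 0 = 0
        for j in range(len(mu)):
            if overlap[i, j] <= tol:
                continue
            if mu[j] <= tol:
                return np.inf  # rho has weight on ker(sigma)
            total += overlap[i, j] * lam[i] * (np.log2(lam[i]) - np.log2(mu[j]))
    return float(total)


def rel_entropy_lemma32(rho, sigma, eps):
    """Difference quotient (1 - Tr[rho^(1-eps) sigma^eps]) / (eps ln 2) from Lemma 3.2."""
    f = np.trace(mat_power(rho, 1 - eps) @ mat_power(sigma, eps)).real
    return (1 - f) / (eps * np.log(2))


def random_state(dim, rng):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    a = g @ g.conj().T
    return a / np.trace(a).real


rng = np.random.default_rng(7)
rho, sigma = random_state(3, rng), random_state(3, rng)
print(rel_entropy_trace(rho, sigma), rel_entropy_formula5(rho, sigma))
for eps in (1e-2, 1e-4, 1e-6):
    print(eps, rel_entropy_lemma32(rho, sigma, eps))
```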

The previous lemma is quite useful when proving the data-processing inequality for the
relative entropy.

4 The data-processing inequality of the relative entropy


There are at least three different ways in the literature to establish the data-processing
inequality for the relative entropy. The most common one derives it from joint convexity
using a technique we have seen in the exercises. Here, we take a different
approach, which establishes the data-processing inequality directly. It uses Lemma 3.2 and
proves that
$$\mathrm{Tr}\big[T(\rho)^{1-\epsilon}T(\sigma)^{\epsilon}\big] \geq \mathrm{Tr}\big[\rho^{1-\epsilon}\sigma^{\epsilon}\big],$$
for any $\rho, \sigma \in \mathcal{D}(\mathcal{H})$, all $\epsilon \in (0,1)$ and all quantum channels $T : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H}')$. Taking the limit as in
Lemma 3.2 then yields the data-processing inequality of the relative entropy.

4.1 Monotonicity of certain Hilbert-Schmidt operators
Let H denote a complex Euclidean space. For positive operators A, B ∈ B(H)+ , we define
the linear maps LA : B(H) → B(H) and RB : B(H) → B(H) by

LA (X) = AX, and RB (X) = XB.

We will now think of these maps as operators acting on the Hilbert-Schmidt inner product
space B(H). The following properties are easy to show:

• For any $A, B \in \mathcal{B}(\mathcal{H})^+$ the operators $L_A$ and $R_B$ are positive¹ semidefinite, since, e.g.,
$L_A = L_A^*$ and $L_A = L_{\sqrt{A}} \circ L_{\sqrt{A}}$, and the same argument works for $R_B$.

• For any $A, B \in \mathcal{B}(\mathcal{H})^+$ the operators $L_A$ and $R_B$ commute.

• For any $A, B \in \mathcal{B}(\mathcal{H})^{++}$, the composition $R_B \circ L_A^{-1}$ is a positive semidefinite operator
as well.

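It is easy to simultaneously diagonalize the operators $L_A$ and $R_B$. We use the orthonormal
basis $\{|v_i\rangle\langle w_j|\}_{i,j} \subset \mathcal{B}(\mathcal{H})$ built from the eigenbasis $\{|v_i\rangle\}_i \subset \mathcal{H}$ of $A$ and the eigenbasis
$\{|w_j\rangle\}_j \subset \mathcal{H}$ of $B$. If $A|v_i\rangle = a_i|v_i\rangle$ and $B|w_j\rangle = b_j|w_j\rangle$, then $L_A(|v_i\rangle\langle w_j|) = a_i|v_i\rangle\langle w_j|$ and
$R_B(|v_i\rangle\langle w_j|) = b_j|v_i\rangle\langle w_j|$. In particular, the eigenvalues of $L_A$ and $R_B$ are non-negative, and
strictly positive when $A$ and $B$ are invertible.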
We will now use the functional calculus for normal operators to apply functions to these
operators. We start with a definition:

Definition 4.1. For a complex Euclidean space $\mathcal{H}$ and a function $f : (0, \infty) \to (0, \infty)$, we
define the operator
$$G_f(A, B) = f\big(R_B \circ L_A^{-1}\big) \circ L_A,$$
for any pair of positive invertible operators $A, B \in \mathcal{B}(\mathcal{H})^{++}$.

By the spectral decomposition of $R_B \circ L_A^{-1}$ in the basis $\{|v_i\rangle\langle w_j|\}_{i,j} \subset \mathcal{B}(\mathcal{H})$, it is easy
to show that the operator $G_f(A, B)$ is positive semidefinite for any $A, B \in \mathcal{B}(\mathcal{H})^{++}$: its eigenvalues are
$f(b_j/a_i)\,a_i > 0$. Let us consider the operator $G_f(A, B)$ for the function $f(x) = x^{\alpha}$ where $\alpha \in (0, 1)$. Then, it is
easy to show that
$$G_{x^{\alpha}}(A, B) = R_{B^{\alpha}} \circ L_{A^{1-\alpha}}.$$
A consequence of this is the following lemma, whose proof is immediate:

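Lemma 4.2. For invertible quantum states $\rho, \sigma \in \mathcal{D}(\mathcal{H})$ we have
$$\langle \mathbb{1}_{\mathcal{H}}, G_{x^{\alpha}}(\rho, \sigma)\mathbb{1}_{\mathcal{H}}\rangle_{HS} = \mathrm{Tr}\big[\rho^{1-\alpha}\sigma^{\alpha}\big].$$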
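To make $L_A$, $R_B$ and $G_f$ concrete, one can represent them as matrices acting on row-major vectorizations. In that convention $\operatorname{vec}(AXB) = (A\otimes B^T)\operatorname{vec}(X)$, so $L_A = A \otimes \mathbb{1}$ and $R_B = \mathbb{1}\otimes B^T$.

The NumPy sketch below does three things:

1. It builds $G_{x^\alpha}(\rho,\sigma) = f(R_\sigma L_\rho^{-1})L_\rho$ via the functional calculus.
2. It compares the result with $R_{\sigma^\alpha}L_{\rho^{1-\alpha}}$.
3. It checks Lemma 4.2.

The vectorization convention, the helper names and the random states are assumptions made for illustration.

```python
import numpy as np


def mat_fun(a, f):
    """f(a) for a Hermitian matrix a, via the spectral decomposition."""
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.conj().T


def random_state(dim, rng):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    a = g @ g.conj().T + 0.1 * np.eye(dim)  # the shift keeps the state well-conditioned
    return a / np.trace(a).real


rng = np.random.default_rng(1)
d, alpha = 3, 0.4
rho, sigma = random_state(d, rng), random_state(d, rng)
I = np.eye(d)

L_rho = np.kron(rho, I)             # L_rho(X) = rho X
R_sigma = np.kron(I, sigma.T)       # R_sigma(X) = X sigma
M = R_sigma @ np.linalg.inv(L_rho)  # R_sigma o L_rho^{-1}, Hermitian and positive definite

G = mat_fun(M, lambda x: x ** alpha) @ L_rho  # Definition 4.1 with f(x) = x^alpha
G_direct = np.kron(I, mat_fun(sigma, lambda x: x ** alpha).T) @ np.kron(
    mat_fun(rho, lambda x: x ** (1 - alpha)), I)  # R_{sigma^alpha} o L_{rho^(1-alpha)}

vec_one = I.reshape(-1)              # vec of the identity matrix
lhs = vec_one.conj() @ G @ vec_one   # <1, G_{x^alpha}(rho, sigma) 1>_HS
rhs = np.trace(mat_fun(rho, lambda x: x ** (1 - alpha)) @ mat_fun(sigma, lambda x: x ** alpha))
print(np.allclose(G, G_direct), lhs, rhs)
```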


 

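To prove the data-processing inequality we will use the following integral representation
of the function $f(x) = x^{\alpha}$, valid for $x > 0$ and $\alpha \in (0, 1)$:
$$x^{\alpha} = \frac{\sin(\pi\alpha)}{\pi}\int_0^{\infty}\frac{x}{\lambda + x}\,\lambda^{\alpha-1}\,d\lambda.$$
As the operators $G_f(A, B)$ are linear in $f$, we conclude that
$$\langle \mathbb{1}_{\mathcal{H}}, G_{x^{\alpha}}(\rho, \sigma)\mathbb{1}_{\mathcal{H}}\rangle_{HS} = \frac{\sin(\pi\alpha)}{\pi}\int_0^{\infty}\langle \mathbb{1}_{\mathcal{H}}, G_{\frac{x}{\lambda+x}}(\rho, \sigma)\mathbb{1}_{\mathcal{H}}\rangle_{HS}\,\lambda^{\alpha-1}\,d\lambda.$$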
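The scalar identity behind this step can be checked numerically with the standard library alone. The sketch below substitutes $\lambda = e^t$, which turns the integral into $\int_{-\infty}^{\infty} \frac{x}{e^t + x}e^{\alpha t}\,dt$ with exponentially decaying tails. It then applies the trapezoidal rule on a truncated interval.

The truncation assumes that $\alpha$ is not too close to $0$ or $1$. The function name and the test points are hypothetical.

```python
import math


def x_pow_alpha_via_integral(x, alpha, t_min=-80.0, t_max=80.0, steps=3200):
    """Approximate sin(pi a)/pi * int_0^inf x/(lam + x) lam^(a - 1) dlam.

    With lam = e^t we get lam^(a - 1) dlam = e^(a t) dt, so the integrand
    becomes x / (e^t + x) * e^(a t), which decays like e^(a t) as t -> -inf
    and like e^((a - 1) t) as t -> +inf.
    """
    h = (t_max - t_min) / steps
    total = 0.0
    for k in range(steps + 1):
        t = t_min + k * h
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoidal rule
        total += weight * x / (math.exp(t) + x) * math.exp(alpha * t)
    return math.sin(math.pi * alpha) / math.pi * total * h


for alpha in (0.3, 0.5, 0.7):
    for x in (0.5, 2.0, 7.0):
        print(alpha, x, x_pow_alpha_via_integral(x, alpha), x ** alpha)
```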

We will show the following theorem:


¹ Do not confuse this positivity with them being positive maps. Clearly, these operators are not positive
maps since they are not Hermiticity preserving.

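Theorem 4.3. For invertible quantum states $\rho, \sigma \in \mathcal{D}(\mathcal{H})$ and any quantum channel $T :
\mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H}')$ for which $T(\rho)$ and $T(\sigma)$ are invertible we have
$$G_{\frac{x}{\lambda+x}}(T(\rho), T(\sigma)) \geq T \circ G_{\frac{x}{\lambda+x}}(\rho, \sigma) \circ T^*,$$
for any $\lambda > 0$.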


Before we prove this theorem, we will need two lemmata:
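Lemma 4.4 (Yet another operator inequality). For any quantum channel $T : \mathcal{B}(\mathcal{H}) \to
\mathcal{B}(\mathcal{H}')$, any $X \in \mathcal{B}(\mathcal{H})$ and any invertible quantum state $\sigma \in \mathcal{D}(\mathcal{H})$ such that $T(\sigma)$ is invertible, we have
$$T(X)\,T(\sigma)^{-1}\,T(X)^\dagger \leq T(X\sigma^{-1}X^\dagger).$$
Proof. Using Schur complements (see exercises) we have
$$\begin{pmatrix} \sigma & X^\dagger \\ X & X\sigma^{-1}X^\dagger \end{pmatrix} \in \mathcal{B}(\mathcal{H} \oplus \mathcal{H})^+.$$
By complete positivity (and since $T(X^\dagger) = T(X)^\dagger$) we have
$$\begin{pmatrix} T(\sigma) & T(X)^\dagger \\ T(X) & T(X\sigma^{-1}X^\dagger) \end{pmatrix} \in \mathcal{B}(\mathcal{H}' \oplus \mathcal{H}')^+.$$
By taking Schur complements again, this is equivalent to the desired operator inequality.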

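Lemma 4.5. Let $\mathcal{H}$ denote a complex Euclidean space. For positive invertible operators
$A, B \in \mathcal{B}(\mathcal{H})^{++}$ and $X \in \mathcal{B}(\mathcal{H})$ we have
$$XA^{-1}X^\dagger \leq B^{-1} \quad\text{if and only if}\quad X^\dagger BX \leq A.$$
Proof. By symmetry (replace $(X, A, B)$ by $(X^\dagger, B^{-1}, A^{-1})$), it is enough to show one direction of the equivalence. If $XA^{-1}X^\dagger \leq B^{-1}$,
then we have $B^{1/2}XA^{-1}X^\dagger B^{1/2} \leq \mathbb{1}_{\mathcal{H}}$. This final condition is equivalent to $\|A^{-1/2}X^\dagger B^{1/2}\|_\infty \leq 1$.
This in turn implies $(A^{-1/2}X^\dagger B^{1/2})(B^{1/2}XA^{-1/2}) \leq \mathbb{1}_{\mathcal{H}}$. Multiplying by $A^{1/2}$ on both sides
shows that $X^\dagger BX \leq A$.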
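Both lemmata are easy to test numerically with the NumPy sketch below; the random instances and helper names are hypothetical.

For Lemma 4.4, the sketch draws a random quantum channel from a random isometry $V$. Stacking the Kraus operators as blocks of $V$ gives $\sum_k K_k^\dagger K_k = V^\dagger V = \mathbb{1}$. It then reports the smallest eigenvalue of $T(X\sigma^{-1}X^\dagger) - T(X)T(\sigma)^{-1}T(X)^\dagger$.

For Lemma 4.5, it rescales a random matrix so that the premise $YA^{-1}Y^\dagger \leq B^{-1}$ holds, and then tests the conclusion $Y^\dagger BY \leq A$.

```python
import numpy as np


def random_channel_kraus(d_in, d_out, n_kraus, rng):
    """Kraus operators cut from a random isometry V: C^d_in -> C^(n_kraus * d_out)."""
    g = rng.normal(size=(n_kraus * d_out, d_in)) + 1j * rng.normal(size=(n_kraus * d_out, d_in))
    V, _ = np.linalg.qr(g)  # reduced QR: V has orthonormal columns
    return [V[k * d_out:(k + 1) * d_out, :] for k in range(n_kraus)]


def apply(kraus, X):
    return sum(K @ X @ K.conj().T for K in kraus)


def random_pd(dim, rng):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return g @ g.conj().T + 0.1 * np.eye(dim)


def psd_power(a, t):
    w, v = np.linalg.eigh(a)
    return (v * w ** t) @ v.conj().T


def min_eig(a):
    return np.linalg.eigvalsh((a + a.conj().T) / 2).min()


rng = np.random.default_rng(3)

# Lemma 4.4: T(X) T(sigma)^{-1} T(X)^dagger <= T(X sigma^{-1} X^dagger).
kraus = random_channel_kraus(3, 2, 3, rng)
sigma = random_pd(3, rng)
sigma /= np.trace(sigma).real
X = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
TX = apply(kraus, X)
gap_44 = (apply(kraus, X @ np.linalg.inv(sigma) @ X.conj().T)
          - TX @ np.linalg.inv(apply(kraus, sigma)) @ TX.conj().T)
print("Lemma 4.4, smallest eigenvalue of the gap:", min_eig(gap_44))

# Lemma 4.5: rescale Y so that Y A^{-1} Y^dagger <= B^{-1}, then test Y^dagger B Y <= A.
A, B = random_pd(3, rng), random_pd(3, rng)
Y = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
Y *= 0.9 / np.linalg.norm(psd_power(B, 0.5) @ Y @ psd_power(A, -0.5), 2)
premise = np.linalg.inv(B) - Y @ np.linalg.inv(A) @ Y.conj().T
conclusion = A - Y.conj().T @ B @ Y
print("Lemma 4.5:", min_eig(premise), min_eig(conclusion))
```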
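Proof of Theorem 4.3. Fix $\lambda > 0$. By Lemma 4.5 (applied to operators on the Hilbert-Schmidt spaces) it is sufficient to show that
$$G_{\frac{x}{\lambda+x}}(\rho, \sigma)^{-1} \geq T^* \circ G_{\frac{x}{\lambda+x}}(T(\rho), T(\sigma))^{-1} \circ T.$$
Note that, since $L_\rho$ and $R_\sigma$ commute,
$$G_{\frac{x}{\lambda+x}}(\rho, \sigma)^{-1} = \Big(\big(\lambda + R_\sigma \circ L_\rho^{-1}\big)^{-1} \circ R_\sigma \circ L_\rho^{-1} \circ L_\rho\Big)^{-1} = \lambda R_\sigma^{-1} + L_\rho^{-1},$$
and a similar expression holds for the operator $G_{\frac{x}{\lambda+x}}(T(\rho), T(\sigma))^{-1}$. With this we have
$$\langle X, G_{\frac{x}{\lambda+x}}(\rho, \sigma)^{-1}(X)\rangle_{HS} = \mathrm{Tr}\big[X^\dagger\big(\lambda R_\sigma^{-1} + L_\rho^{-1}\big)(X)\big] = \lambda\,\mathrm{Tr}\big[X\sigma^{-1}X^\dagger\big] + \mathrm{Tr}\big[X^\dagger\rho^{-1}X\big]$$
and
$$\begin{aligned}
\langle X, T^* \circ G_{\frac{x}{\lambda+x}}(T(\rho), T(\sigma))^{-1} \circ T(X)\rangle_{HS} &= \mathrm{Tr}\big[T(X)^\dagger\big(\lambda R_{T(\sigma)}^{-1} + L_{T(\rho)}^{-1}\big)(T(X))\big] \\
&= \lambda\,\mathrm{Tr}\big[T(X)T(\sigma)^{-1}T(X)^\dagger\big] + \mathrm{Tr}\big[T(X)^\dagger T(\rho)^{-1}T(X)\big].
\end{aligned}$$
By Lemma 4.4 (applied to $X$ and to $X^\dagger$, respectively), and since $T$ is trace preserving, we have
$$\mathrm{Tr}\big[T(X)T(\sigma)^{-1}T(X)^\dagger\big] \leq \mathrm{Tr}\big[X\sigma^{-1}X^\dagger\big] \quad\text{and}\quad \mathrm{Tr}\big[T(X)^\dagger T(\rho)^{-1}T(X)\big] \leq \mathrm{Tr}\big[X^\dagger\rho^{-1}X\big],$$
for all invertible $\rho, \sigma \in \mathcal{D}(\mathcal{H})$ and all $X \in \mathcal{B}(\mathcal{H})$. Since $\lambda > 0$, this proves the desired operator inequality.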
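Theorem 4.3 can also be probed numerically by writing every map as a matrix acting on row-major vectorizations:

- $T$ becomes $\sum_k K_k \otimes \bar{K}_k$, and $T^*$ becomes its conjugate transpose.
- $L_A = A\otimes\mathbb{1}$ and $R_B = \mathbb{1}\otimes B^T$.
- By the computation in the proof, $G_{\frac{x}{\lambda+x}}(A,B) = (\lambda R_B^{-1} + L_A^{-1})^{-1}$.

The NumPy sketch below reports the smallest eigenvalue of $G_{\frac{x}{\lambda+x}}(T(\rho),T(\sigma)) - T\circ G_{\frac{x}{\lambda+x}}(\rho,\sigma)\circ T^*$ for a few values of $\lambda$. The theorem predicts that it is non-negative. The random channel and states and the helper names are hypothetical.

```python
import numpy as np


def random_channel_kraus(d_in, d_out, n_kraus, rng):
    g = rng.normal(size=(n_kraus * d_out, d_in)) + 1j * rng.normal(size=(n_kraus * d_out, d_in))
    V, _ = np.linalg.qr(g)  # isometry, so sum_k K_k^dagger K_k = 1
    return [V[k * d_out:(k + 1) * d_out, :] for k in range(n_kraus)]


def random_state(dim, rng):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    a = g @ g.conj().T + 0.1 * np.eye(dim)
    return a / np.trace(a).real


def G_lambda(A, B, lam):
    """G_{x/(lam+x)}(A, B) = (lam R_B^{-1} + L_A^{-1})^{-1} as a matrix acting on vec(X)."""
    I = np.eye(A.shape[0])
    L_inv = np.kron(np.linalg.inv(A), I)
    R_inv = np.kron(I, np.linalg.inv(B).T)
    return np.linalg.inv(lam * R_inv + L_inv)


rng = np.random.default_rng(5)
kraus = random_channel_kraus(3, 2, 3, rng)
T_mat = sum(np.kron(K, K.conj()) for K in kraus)  # vec(K X K^dagger) = (K kron conj(K)) vec(X)


def apply_channel(X):
    return sum(K @ X @ K.conj().T for K in kraus)


rho, sigma = random_state(3, rng), random_state(3, rng)
for lam in (0.1, 1.0, 10.0):
    gap = G_lambda(apply_channel(rho), apply_channel(sigma), lam) - T_mat @ G_lambda(rho, sigma, lam) @ T_mat.conj().T
    print(lam, np.linalg.eigvalsh((gap + gap.conj().T) / 2).min())
```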

4.2 Proving the data-processing inequality
Now, we are ready to prove the main result of this lecture:

Theorem 4.6 (Data-processing inequality). For any quantum channel $T : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H}')$,
we have
$$D(T(\rho)\|T(\sigma)) \leq D(\rho\|\sigma),$$
for all quantum states $\rho, \sigma \in \mathcal{D}(\mathcal{H})$.

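Proof. Assume first that $\rho, \sigma \in \mathcal{D}(\mathcal{H})$ and $T(\rho), T(\sigma) \in \mathcal{D}(\mathcal{H}')$ are invertible. Then, we have
$$\mathrm{Tr}\big[\rho^{1-\alpha}\sigma^{\alpha}\big] = \langle \mathbb{1}_{\mathcal{H}}, G_{x^{\alpha}}(\rho, \sigma)\mathbb{1}_{\mathcal{H}}\rangle_{HS} = \frac{\sin(\pi\alpha)}{\pi}\int_0^{\infty}\langle \mathbb{1}_{\mathcal{H}}, G_{\frac{x}{\lambda+x}}(\rho, \sigma)\mathbb{1}_{\mathcal{H}}\rangle_{HS}\,\lambda^{\alpha-1}\,d\lambda,$$
for any $\alpha \in (0, 1)$. By Theorem 4.3, and since $T^*(\mathbb{1}_{\mathcal{H}'}) = \mathbb{1}_{\mathcal{H}}$ because $T$ is trace preserving, we have
$$\begin{aligned}
\langle \mathbb{1}_{\mathcal{H}'}, G_{\frac{x}{\lambda+x}}(T(\rho), T(\sigma))\mathbb{1}_{\mathcal{H}'}\rangle_{HS} &\geq \langle \mathbb{1}_{\mathcal{H}'}, T \circ G_{\frac{x}{\lambda+x}}(\rho, \sigma) \circ T^*(\mathbb{1}_{\mathcal{H}'})\rangle_{HS} \\
&= \langle \mathbb{1}_{\mathcal{H}}, G_{\frac{x}{\lambda+x}}(\rho, \sigma)\mathbb{1}_{\mathcal{H}}\rangle_{HS},
\end{aligned}$$
for every $\lambda > 0$. Using that $\sin(\pi\alpha) > 0$ for all $\alpha \in (0, 1)$, we conclude that
$$\mathrm{Tr}\big[T(\rho)^{1-\alpha}T(\sigma)^{\alpha}\big] \geq \mathrm{Tr}\big[\rho^{1-\alpha}\sigma^{\alpha}\big],$$
for any $\alpha \in (0, 1)$.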


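Next, consider general $\rho, \sigma \in \mathcal{D}(\mathcal{H})$ and a quantum channel $T : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H}')$. Consider the
quantum states
$$\rho_\epsilon = (1-\epsilon)\rho + \epsilon\frac{\mathbb{1}_{\mathcal{H}}}{\dim(\mathcal{H})} \quad\text{and}\quad \sigma_\epsilon = (1-\epsilon)\sigma + \epsilon\frac{\mathbb{1}_{\mathcal{H}}}{\dim(\mathcal{H})},$$
for any $\epsilon \in (0, 1)$, and the quantum channels
$$T_\delta = (1-\delta)T + \delta\,\mathrm{Tr}[\,\cdot\,]\frac{\mathbb{1}_{\mathcal{H}'}}{\dim(\mathcal{H}')},$$
for any $\delta \in (0, 1)$. Since $\rho_\epsilon$, $\sigma_\epsilon$, $T_\delta(\rho_\epsilon)$ and $T_\delta(\sigma_\epsilon)$ are invertible, the computation from above gives
$$\mathrm{Tr}\big[T_\delta(\rho_\epsilon)^{1-\alpha}T_\delta(\sigma_\epsilon)^{\alpha}\big] \geq \mathrm{Tr}\big[\rho_\epsilon^{1-\alpha}\sigma_\epsilon^{\alpha}\big],$$
for any $\alpha \in (0, 1)$ and any $\epsilon, \delta \in (0, 1)$. The function $(\rho, \sigma) \mapsto \mathrm{Tr}[\rho^{1-\alpha}\sigma^{\alpha}]$ is
continuous for every $\alpha \in (0, 1)$. Hence we conclude that
$$\mathrm{Tr}\big[T(\rho)^{1-\alpha}T(\sigma)^{\alpha}\big] = \lim_{\epsilon,\delta\searrow 0}\mathrm{Tr}\big[T_\delta(\rho_\epsilon)^{1-\alpha}T_\delta(\sigma_\epsilon)^{\alpha}\big] \geq \lim_{\epsilon\searrow 0}\mathrm{Tr}\big[\rho_\epsilon^{1-\alpha}\sigma_\epsilon^{\alpha}\big] = \mathrm{Tr}\big[\rho^{1-\alpha}\sigma^{\alpha}\big].$$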

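Finally, we can use Lemma 3.2 and conclude that
$$\begin{aligned}
D(T(\rho)\|T(\sigma)) &= \frac{1}{\ln(2)}\lim_{\epsilon\searrow 0}\frac{1 - \mathrm{Tr}[T(\rho)^{1-\epsilon}T(\sigma)^{\epsilon}]}{\epsilon} \\
&\leq \frac{1}{\ln(2)}\lim_{\epsilon\searrow 0}\frac{1 - \mathrm{Tr}[\rho^{1-\epsilon}\sigma^{\epsilon}]}{\epsilon} \\
&= D(\rho\|\sigma).
\end{aligned}$$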
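As an end-to-end sanity check of Theorem 4.6, the NumPy sketch below samples random channels (again built from random isometries) and random full-rank states and compares both sides. It also checks the intermediate inequality for $\mathrm{Tr}[\rho^{1-\alpha}\sigma^\alpha]$ at $\alpha = 1/2$. The sampling scheme, dimensions, seed and helper names are arbitrary choices made for illustration.

```python
import numpy as np


def random_channel_kraus(d_in, d_out, n_kraus, rng):
    g = rng.normal(size=(n_kraus * d_out, d_in)) + 1j * rng.normal(size=(n_kraus * d_out, d_in))
    V, _ = np.linalg.qr(g)
    return [V[k * d_out:(k + 1) * d_out, :] for k in range(n_kraus)]


def apply_channel(kraus, X):
    return sum(K @ X @ K.conj().T for K in kraus)


def random_state(dim, rng):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    a = g @ g.conj().T + 0.05 * np.eye(dim)
    return a / np.trace(a).real


def mat_fun(a, f):
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.conj().T


def rel_entropy(rho, sigma):
    """D(rho||sigma) in bits for positive definite rho and sigma."""
    return float(np.trace(rho @ (mat_fun(rho, np.log2) - mat_fun(sigma, np.log2))).real)


def tr_power_product(rho, sigma, alpha):
    """Tr[rho^(1 - alpha) sigma^alpha]."""
    return float(np.trace(mat_fun(rho, lambda x: x ** (1 - alpha))
                          @ mat_fun(sigma, lambda x: x ** alpha)).real)


rng = np.random.default_rng(11)
worst = []
for trial in range(20):
    d_in, d_out = 4, int(rng.integers(2, 5))
    kraus = random_channel_kraus(d_in, d_out, 3, rng)
    rho, sigma = random_state(d_in, rng), random_state(d_in, rng)
    T_rho, T_sigma = apply_channel(kraus, rho), apply_channel(kraus, sigma)
    dpi_gap = rel_entropy(rho, sigma) - rel_entropy(T_rho, T_sigma)
    alpha_gap = tr_power_product(T_rho, T_sigma, 0.5) - tr_power_product(rho, sigma, 0.5)
    worst.append(min(dpi_gap, alpha_gap))
print(min(worst))  # non-negative up to rounding, as Theorem 4.6 predicts
```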

4.3 Generalizing the data-processing inequality
If you carefully read the proof given in the previous sections, you might realize that we did
not use the full strength of the assumption that the linear map $T : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H}')$ is a quantum channel.
As we have stated it, the proof only used that $T$ is a trace-preserving
2-positive map. By a slight modification of the proof, we can actually make it work for duals
of so-called unital Schwarz maps.
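Definition 4.7. A linear map $P : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H}')$ is called a unital Schwarz map if it is
unital and satisfies the Schwarz inequality
$$P(X)^\dagger P(X) \leq P(X^\dagger X),$$
for any $X \in \mathcal{B}(\mathcal{H})$.
Note that the Schwarz inequality for a unital map $P : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H}')$ is equivalent to
$$\begin{pmatrix} \mathbb{1}_{\mathcal{H}'} & P(X) \\ P(X)^\dagger & P(X^\dagger X) \end{pmatrix} \geq 0,$$
for any $X \in \mathcal{B}(\mathcal{H})$. The following lemma is therefore immediate:
Lemma 4.8. If $T : \mathcal{B}(\mathcal{H}') \to \mathcal{B}(\mathcal{H})$ is a quantum channel, then $T^*$ is a unital Schwarz map.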
There are many examples of unital Schwarz maps that do not arise as adjoints of quantum
channels, or even of 2-positive maps. The most prominent example is the map $P : \mathcal{B}(\mathbb{C}^2) \to
\mathcal{B}(\mathbb{C}^2)$ given by
$$P(X) = \frac{1}{2}\mathrm{Tr}[X]\frac{\mathbb{1}_{\mathbb{C}^2}}{2} + \frac{1}{2}X^T.$$
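The claims about this map are easy to probe numerically. The NumPy sketch below checks three things:

1. $P$ is unital.
2. $P(X^\dagger X) - P(X)^\dagger P(X)$ has no negative eigenvalues for random $X$.
3. The Choi matrix $\sum_{i,j}|i\rangle\langle j|\otimes P(|i\rangle\langle j|)$ has a negative eigenvalue.

For maps on $\mathcal{B}(\mathbb{C}^2)$, positivity of the Choi matrix is equivalent to complete positivity, and hence to 2-positivity. So a negative Choi eigenvalue shows that $P$ is not 2-positive. Here the Choi matrix equals $\frac{1}{4}\mathbb{1}\otimes\mathbb{1} + \frac{1}{2}F$, where $F$ is the swap operator, and its eigenvalues are $\frac{3}{4}$ and $-\frac{1}{4}$.

The helper names and the random sampling are illustrative choices.

```python
import numpy as np


def P(X):
    """P(X) = (1/2) Tr[X] 1/2 + (1/2) X^T on 2x2 matrices."""
    return 0.25 * np.trace(X) * np.eye(2) + 0.5 * X.T


def schwarz_gap(X):
    """Smallest eigenvalue of P(X^dagger X) - P(X)^dagger P(X)."""
    gap = P(X.conj().T @ X) - P(X).conj().T @ P(X)
    return np.linalg.eigvalsh((gap + gap.conj().T) / 2).min()


def choi_matrix(Phi, d=2):
    """sum_{i,j} |i><j| (x) Phi(|i><j|)."""
    C = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            E = np.zeros((d, d), dtype=complex)
            E[i, j] = 1.0
            C += np.kron(E, Phi(E))
    return C


rng = np.random.default_rng(0)
gaps = [schwarz_gap(rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))) for _ in range(1000)]
print(np.allclose(P(np.eye(2)), np.eye(2)))  # unital
print(min(gaps))                              # non-negative up to rounding
print(np.linalg.eigvalsh(choi_matrix(P)))     # contains -0.25, so P is not 2-positive
```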
It turns out that Theorem 4.3 also holds for linear maps $T$ that are adjoints of unital Schwarz
maps. The proof is almost the same, but in its final lines we use the following
inequality instead of Lemma 4.4:
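Theorem 4.9 (A tracial inequality). Let $P : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H}')$ denote a unital Schwarz map.
For any $C \in \mathcal{B}(\mathcal{H}')^+$ and any $X \in \mathcal{B}(\mathcal{H}')$ satisfying $\ker(C) \subseteq \ker(X^\dagger)$ we have
$$\mathrm{Tr}\big[P^*(X)^\dagger P^*(C)^{-1}P^*(X)\big] \leq \mathrm{Tr}\big[X^\dagger C^{-1}X\big],$$
where we used the Moore-Penrose pseudoinverse.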


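Proof. Setting $A = P^*(C)^{-1}P^*(X)$ with the Moore-Penrose pseudoinverse, we find that
$$\begin{pmatrix} AA^\dagger & -A \\ -A^\dagger & \mathbb{1}_{\mathcal{H}} \end{pmatrix}\begin{pmatrix} P^*(C) & P^*(X) \\ P^*(X)^\dagger & P^*(X^\dagger C^{-1}X) \end{pmatrix} = \begin{pmatrix} Z & \star \\ \star & D \end{pmatrix},$$
with
$$D = P^*(X^\dagger C^{-1}X) - P^*(X)^\dagger P^*(C)^{-1}P^*(X),$$
and
$$Z = P^*(C)^{-1}P^*(X)P^*(X)^\dagger P^*(C)^{-1}P^*(C) - P^*(C)^{-1}P^*(X)P^*(X)^\dagger.$$
For the Moore-Penrose pseudoinverse we have $P^*(C)^{-1}P^*(C)P^*(C)^{-1} = P^*(C)^{-1}$. Hence $\mathrm{Tr}[Z] = 0$, and we conclude
$$\mathrm{Tr}\left[\begin{pmatrix} AA^\dagger & -A \\ -A^\dagger & \mathbb{1}_{\mathcal{H}} \end{pmatrix}\begin{pmatrix} P^*(C) & P^*(X) \\ P^*(X)^\dagger & P^*(X^\dagger C^{-1}X) \end{pmatrix}\right] = \mathrm{Tr}\big[P^*(X^\dagger C^{-1}X) - P^*(X)^\dagger P^*(C)^{-1}P^*(X)\big].$$
This is non-negative since
$$\mathrm{Tr}\left[\begin{pmatrix} AA^\dagger & -A \\ -A^\dagger & \mathbb{1}_{\mathcal{H}} \end{pmatrix}\begin{pmatrix} P^*(C) & P^*(X) \\ P^*(X)^\dagger & P^*(X^\dagger C^{-1}X) \end{pmatrix}\right] = \mathrm{Tr}\left[\begin{pmatrix} P(AA^\dagger) & -P(A) \\ -P(A)^\dagger & \mathbb{1}_{\mathcal{H}'} \end{pmatrix}\begin{pmatrix} C & X \\ X^\dagger & X^\dagger C^{-1}X \end{pmatrix}\right]$$
is the Hilbert-Schmidt inner product of two positive semidefinite operators. The first is positive by the Schwarz inequality (applied to $-A^\dagger$, after swapping the two blocks). The second is positive by Schur complements. Finally, note that
$P^*$ is trace-preserving, and we obtain the desired inequality.
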
The data-processing inequality for the quantum relative entropy therefore holds for all
trace-preserving maps that are adjoints of unital Schwarz maps. Recently, a different technique
for proving the data-processing inequality was discovered. Unfortunately, it goes beyond
the scope of this course, but for completeness we state the most general form of the
data-processing inequality:

Theorem 4.10 (General data-processing inequality). For any positive and trace-preserving
map $P : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H}')$, we have
$$D(P(\rho)\|P(\sigma)) \leq D(\rho\|\sigma),$$
for all quantum states $\rho, \sigma \in \mathcal{D}(\mathcal{H})$.

Whether this theorem can be proven using the techniques used above is an open question!
