Resources of The Quantum World: A Modern Textbook On Quantum Resource Theories
Gilad Gour
February 9, 2024
List of Symbols 13
1 Introductory Material 17
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.2 About This Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.3 The Structure of the Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.4 Resurrection of Quantum Entanglement: The Birth of a Fundamental Resource 24
1.5 Resource Analysis and Reversibility . . . . . . . . . . . . . . . . . . . . . . . 30
1.6 Notes and References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
I Preliminaries 37
VI Appendices 821
A Elements of Convex Analysis 823
A.1 The Hyperplane Separation Theorem . . . . . . . . . . . . . . . . . . . . . . 823
A.2 Convex Hulls, Faces, and Polytopes . . . . . . . . . . . . . . . . . . . . . . . 826
A.3 Extreme Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 828
A.4 Polyhedrons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 831
A.5 Affine Subspaces and the Birkhoff Polytope . . . . . . . . . . . . . . . . . . 832
A.6 Polarity and Half Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 834
A.7 Support Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 837
A.8 Convex Cones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 838
A.9 Conic Linear Programming and Semidefinite Programming . . . . . . . . . . 839
A.10 Fixed-Point Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 843
A.11 Notes and References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 844
D Miscellany 903
D.1 The Divided Difference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 903
D.2 The Maximal f -Divergence: Singular Case . . . . . . . . . . . . . . . . . . . 907
D.3 Smoothing with the Second Variable of Dmax . . . . . . . . . . . . . . . . . . 912
D.4 Two Proofs of the Classical Stein’s Lemma . . . . . . . . . . . . . . . . . . . 914
D.5 Alternative (direct) proofs of Theorem 12.6.1 and Theorem 12.6.2 . . . . . . 918
D.6 Beyond States that are G-Regular . . . . . . . . . . . . . . . . . . . . . . . . 920
D.7 Proof of Theorem 17.3.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 922
D.8 Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 928
D.9 Alternative Proof of Blackwell Theorem . . . . . . . . . . . . . . . . . . . . . 929
D.10 Symmetric Purification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 936
I extend my deepest gratitude to the numerous colleagues who have profoundly influenced my
understanding of quantum information and quantum resource theories. Through countless
discussions, their insights have enriched my perspective and deepened my knowledge. I am
particularly indebted to Fernando G.S.L. Brandão, S. Brandsen, Francesco Buscemi, Giulio
Chiribella, Eric Chitambar, Nilanjana Datta, Julio De Vicente, Runyao Duan, Kun Fang,
Shmuel Friedland, Yu Guo, Aram Harrow, Michal Horodecki, David Jennings, Amir Kalev,
Barbara Kraus, Ludovico Lami, Iman Marvian, David A. Meyer, Markus P. Müller, Varun
Narasimhachar, Jonathan Oppenheim, Carlo Maria Scandolo, Bartosz Regula, Robert W.
Spekkens, Marco Tomamichel, Nolan Wallach, Xin Wang, Mark M. Wilde, Andreas Winter,
and Nicole Yunger Halpern for their invaluable contributions.
My sincere appreciation goes to Julio Inigo De Vicente Majua for his meticulous review of
the chapter on multipartite entanglement, offering numerous improvements and corrections.
Special thanks to Thomas Theurer, whose relentless feedback on various drafts has been
instrumental in refining this work. Additionally, I am grateful to Mark M. Wilde for his
advice on enhancing the notations and clarity of presentation.
I owe a debt of gratitude to the many students I’ve had the privilege of interacting with
over the years. Their keen observations and identification of errors and typos in various
drafts have been crucial in shaping the final content of this book. I thank John Burniston,
Nuiok Decaire, Raz Firanko, Kimberly Golubeva, Michael Grabowecky, Alexander Hickey,
Takla Nateeboon, Gaurav Saxena, Kuntal Sengupta, Guy Shemesh, Samuel Steakley, Goni
Yoeli, and Elia Zanoni for their contributions.
Lastly, but most importantly, I wish to express my heartfelt appreciation to my family.
To my parents, Iris and Gideon Gour, whose unwavering support and belief in my pursuits
have been the bedrock of my resilience and determination. Their love and encouragement
have been a constant source of strength throughout this journey. To my children, Sophia
and Elijah Gour, who have been my greatest source of inspiration. Their curiosity and
enthusiasm for life remind me daily of the joy of discovery and the importance of sharing
knowledge. And to my life partner, Eve Zhang, whose endless support and understanding
have been nothing short of miraculous. Her presence and encouragement have been my
guiding light, helping me navigate the challenges of this endeavor and crossing the finish
line of completing this book. My journey is as much theirs as it is mine, and I am eternally
grateful for their love, patience, and sacrifice.
First letters of the English alphabet, such as A, B, and C, are used to denote both quantum
physical systems and their corresponding Hilbert spaces. The letter R is used to denote a
quantum reference physical system (and its corresponding Hilbert space), and sometimes the
letter E is used to denote the environment system. The last letters of the English alphabet,
such as X, Y , and Z are used to denote classical systems or classical registers. The dimension
of a Hilbert space is denoted with vertical lines; e.g. the dimension of systems A, B, and
X, are denoted respectively as |A|, |B|, and |X|. The tilde symbol above a system always
represents a replica of the system. For example, Ã represents another copy of system A and
in particular |A| = |Ã|.
We use |Ω^{AÃ}⟩ to denote the unnormalized maximally entangled state Σ_{x∈[m]} |xx⟩, and ψ, ϕ, Ω, Φ to denote, respectively, the rank-one pure states |ψ⟩⟨ψ|, |ϕ⟩⟨ϕ|, |Ω⟩⟨Ω|, |Φ⟩⟨Φ|.
List of Symbols
Pure(A) The set of all pure (i.e. rank one) density matrices in D(A)
Herm(A) The set of all Hermitian operators in L(A)
L(A → B) The set of all linear operators from L(A) to L(B)
Herm(A → B) The subset {E ∈ L(A → B) : E(ρ) ∈ Herm(B) ∀ ρ ∈ Herm(A)}
CP(A → B) The set of all completely positive maps in L(A → B)
CPTP(A → B) The set of all quantum channels in L(A → B)
Pos(A → B) The set of all positive maps in L(A → B)
idA The identity element (channel) of L(A → A)
#f The Kubo-Ando Operator Mean (Definition B.5.1)
Irr(π) The set of all irreps (up to equivalency) appearing in the
decomposition of π
In The n × n identity matrix
IA The identity operator in L(A)
uA The maximally mixed state in L(A)
u(n) The uniform probability vector in Prob(n)
1n The column vector (1, . . . 1)T in Rn
|Ω^{AÃ}⟩ The unnormalized maximally entangled state Σ_{x∈[m]} |xx⟩
|Φ^{AÃ}⟩ The normalized maximally entangled state (1/√|A|) Σ_{x∈[m]} |xx⟩
Eff(A) The set of all effects in Pos(A); i.e. Λ ∈ Eff(A) if and only if 0 ⩽ Λ ⩽ I A .
Im(T ) The image of T ∈ L(A, B).
Ker(T ) The kernel of T ∈ L(A, B).
supp(T ) The support subspace of T ∈ L(A, B).
supp(p) The set {x ∈ [n] : px > 0}, where p = (p1 , . . . , pn )T ∈ Prob(n).
ρ≪σ Inclusion of supports; supp(ρ) ⊆ supp(σ) for ρ, σ ∈ D(A).
spec(H) The set of all distinct eigenvalues of a Hermitian operator H ∈ Herm(A).
Introductory Material
1.1 Introduction
A recurring theme in the field of physics is the endeavor to unify a variety of distinct physical
phenomena into a comprehensive framework that can offer both descriptions and explana-
tions for each of them. One of the most astounding achievements in this endeavor is the
unification of fundamental forces. When physicists realized that the forces of electricity and
magnetism could be elegantly described using a single framework, it not only substantially
enhanced our comprehension of these forces but also gave birth to the expansive domain of
electromagnetism.
The remarkable success in unifying forces serves as a testament to the fact that seemingly
unrelated phenomena can often be traced back to a common origin. This approach extends
beyond the realm of forces and finds resonance in the burgeoning field of quantum information
science. Within this field, a novel discipline has emerged, which seeks to identify shared
characteristics among seemingly disparate quantum phenomena. The overarching theme
of this approach lies in the recognition that various attributes of physical systems can be
defined as “resources.” This recognition not only alters our perspective on these phenomena
but also seamlessly integrates them within a comprehensive framework known as “quantum
resource theories.”
For example, take the case of quantum entanglement. In the 1990s, it was transformed
from a topic of philosophical debates and discussions into a valuable resource. This trans-
formative shift revolutionized our perception of entanglement; it evolved from being an
intriguing and non-intuitive phenomenon into the essential driving force behind numerous
quantum information tasks. This new perspective on entanglement opened up a vast array
of possibilities and applications, starting with its utilization in quantum teleportation and
superdense coding. Today, entanglement stands as a fundamental resource in fields such as
quantum communication, quantum cryptography, and quantum computing.
Given the success of entanglement theory, it is only natural to explore other physical
phenomena that can also be recognized as valuable resources. Currently, there are several
quantum phenomena that have been identified as such. These encompass areas such as
quantum and classical communication, athermality (within the realm of quantum thermo-
dynamics), asymmetry, magic (in the context of quantum computation), quantum coherence,
Bell non-locality, quantum contextuality, quantum steering, incompatibility of quantum mea-
surements, and many more. The recognition of all these phenomena as resources enables us
to unify them under the umbrella of quantum resource theories.
Resource theories serve as a crucial framework for addressing complex questions. They
aim to unravel puzzles such as determining which sets of resources can be transformed into
one another and the methods by which such conversions can occur. Additionally, they explore
how to measure and detect different resources. If a direct transformation between particular
resources is not feasible, resource theories examine the possibility of non-deterministic con-
versions and the computation of their associated probabilities. The introduction of catalysts
into the equation further deepens the inquiry.
This investigative approach often yields profound insights into the underlying nature
of the physical or information-theoretic phenomena under scrutiny—such as entanglement,
asymmetry, athermality, and more. Furthermore, this perspective provides a structured
framework for organizing theoretical findings pertaining to these phenomena. As demon-
strated by the evolution of entanglement theory, the resource-theoretic perspective possesses
the potential to revolutionize our understanding of familiar subjects.
In this context, chemistry exemplifies this framework, elucidating how abundant collec-
tions of chemicals can be converted into more valuable products. Similarly, thermodynamics
fits this mold by addressing inquiries about the conversion of various types of nonequilibrium
states—thermal, mechanical, chemical, and more—into one another, including the extraction
of useful work from heat baths at differing temperatures.
Within the realm of quantum resource theories, a fundamental challenge arises in identi-
fying equivalence classes of quantum systems that can be reversibly interconverted (or simulate
each other) when considering an abundance of resource copies, and determining the rates at
which these interconversions occur. The relative entropy of a resource plays a pivotal role
in such reversible transformations, gauging the resourcefulness of a system by quantifying
its deviation from the set of free (non-resourceful) systems. Remarkably, this function uni-
fies essential (pseudo) metrics across seemingly disparate scientific domains. For instance,
the relative entropy of a resource manifests as free energy in thermodynamics, entangle-
ment entropy in pure state entanglement theory, and the entanglement-assisted capacity of
a quantum channel in quantum communication; see Fig. 1.1.
In recent years the field has developed rapidly, resulting in a proliferation of publications and the development of new tools and mathematical methods that firmly underpin this area of study.
In light of the extensive literature in the field of quantum information science, one might
understandably question the need for yet another book on quantum resources. Isn’t this
territory already covered in existing quantum information textbooks? For instance, quantum
Shannon theory can be seen as a theory of interconversions among different types of resources,
and Wilde [232] and Watrous [230] have produced outstanding books delving into these
topics. Additionally, detailed treatments of subjects covered in this book, such as quantum
divergences and Rényi entropies, can be found in Tomamichel's noteworthy work [208].
While it is accurate to say that many of the topics covered in this book are available
elsewhere, what distinguishes this book is its unique approach. It explores well-trodden
subjects like entropy, uncertainty, divergences, non-locality, entanglement, and energy from
a fresh perspective rooted in resource theories. Specifically, the book adopts an axiomatic
approach to rigorously introduce these concepts, providing illustrative examples. Only then
does it transition to operational aspects that involve the examples discussed.
Take, for instance, the topic of conditional entropy, a subject widely covered in numerous
textbooks in both classical and quantum information theory. This book, however, offers a
distinctive approach by presenting this concept from three distinct perspectives: axiomatic,
constructive, and operational. Notably, all three perspectives converge to the same notion of
conditional entropy. This approach not only provides the reader with a deeper understanding
of the concept but also underscores its robust foundation.
The primary goal of this book is pedagogical in nature, with the hope of providing readers
with a contemporary perspective on quantum resource theories. It aspires to equip readers
with the necessary physical principles and advanced mathematical techniques required to
comprehend recent advancements in this field. Upon completing this book, readers should
have the ability to explore open problems and research directions within the field, some of
which will be highlighted in the text.
In anticipation of a diverse readership, this book is designed to be inclusive, targeting
both graduate students and senior undergraduate students who possess a foundational un-
derstanding of linear algebra. It aims to provide them with a comprehensive resource for
delving into this fascinating field. Simultaneously, the book serves as a reference, offering
fresh insights and innovative approaches that researchers in the early stages of their careers
may find valuable. With numerous examples and exercises, it aims to serve as a textbook
for courses on the subject, enhancing the learning experience for students.
While the primary audience for this textbook consists of entry-level graduate students
interested in pursuing research at the master’s or Ph.D. level in quantum resource theories,
encompassing quantum information science, it may also prove valuable to researchers in fields
influenced by quantum information and resource theories, such as quantum thermodynamics
and condensed matter physics. They may find this book to be a useful and accessible
reference source.
Although we have endeavored to make the book self-contained, a basic understanding of
linear algebra is essential. The goal was to create a resource accessible to graduate students
from diverse backgrounds in mathematics, physics, and computer science. As a result, the
book includes preliminary chapters and several appendices that fill potential knowledge gaps,
given the interdisciplinary nature of the subject matter.
Quantum resource theories constitute a vast research area, with new properties of physical
systems continually being recognized as resources. Consequently, the aim of this book is
not to exhaustively cover all resource theories but rather to select those that illustrate the
techniques used in quantum resource theories effectively. On the technical front, we have
chosen to begin with the modern single-shot approach and employ it to derive asymptotic
rates. Historically, asymptotic rates were studied first, but from a pedagogical standpoint,
it is more intuitive to start with the single-shot regime.
To the best of our knowledge, there are currently no dedicated books specifically focused
on quantum resource theories. With this book, we hope to contribute to the field by providing
a comprehensive overview and integrating both new and existing results within a unified
framework. While we do not claim this book to be the ultimate authority, we believe it can
serve as a valuable reference that consolidates ideas scattered across various journal articles,
addressing the need for a centralized resource in the field of quantum resource theories.
Part I: The opening section of this book is thoughtfully designed to cater to readers who may
not possess prior knowledge of quantum mechanics or quantum information. Within
this segment, we embark on a rigorous mathematical journey through quantum the-
ory, emphasizing precise definitions and mathematical proofs of fundamental physical
theorems. Key subjects covered in this section encompass quantum states, general-
ized quantum measurements, quantum channels, POVMs, and more. Moreover, this
section extends its reach beyond the boundaries of quantum theory, delving into top-
ics such as Ky-Fan norms, the Størmer-Woronowicz theorem, the Pinching Inequality,
the Reverse Hölder Inequality, certain hidden variable models, and other subjects that
may not commonly cross the paths of graduate students in physics, mathematics, or
computer science. Therefore, even those well-versed in these topics may find it bene-
ficial to skim through this chapter briefly, as it has the potential to reveal previously
undiscovered insights.
Part II: The second section of this book delves deep into the methodologies and tools employed
within the realm of quantum resource theories and quantum information. While it
explores numerous quantum information concepts, it distinguishes itself from conven-
tional quantum information theory textbooks. The introductory chapter of this section
provides an all-encompassing mathematical review of majorization theory, encapsulat-
ing recent groundbreaking discoveries, such as relative majorization, conditional ma-
jorization, and the intersection of probability theory with this field.
Subsequent chapters in this section adopt a distinctive approach to elucidate concepts
associated with metrics, divergences, and entropies. These notions are introduced
and dissected using techniques and insights drawn from the framework of quantum
resource theories. For instance, entropy, conditional entropy, relative entropies, and
other divergences are introduced as additive functions that adhere to monotonicity
under the set of free operations, a foundational concept in quantum resource theories.
The final chapter in this part of the book is dedicated to the asymptotic regime,
focusing on the consequences of the “law of large numbers” in quantum information
and quantum resource theories. This chapter introduces concepts such as weak and
strong typicality, the method of types, classical and quantum hypothesis testing, and
the symmetric subspace. These tools prove particularly valuable in the asymptotic
domain of quantum resource theories when exploring inter-conversion rates among
infinitely many resources.
In summary, although the contents of this second section share some commonalities
with conventional quantum information theory textbooks, they diverge significantly
by presenting concepts and tools in a unique manner. Rather than employing Venn
diagrams to define key concepts like entropy, this part of the book aims to provide a
comprehensive and rigorous approach to precisely define these concepts by employing
axiomatic, constructive, and operational approaches. Leveraging the framework of
quantum resource theories, this section offers a fresh and innovative perspective on
these familiar topics.
Part III: In the third section of the book, we delve into the fundamental framework of quantum
resource theories. Our journey begins with a meticulous mathematical elucidation of
a quantum resource theory. We proceed to examine its foundational principles, in-
cluding but not limited to the golden rule of free operations, resource non-generating
operations, physically implementable operations, convex and affine resource theories,
state-based resource theories, as well as resource witnesses and their associated prop-
erties.
Next, we delve into the quantification of quantum resources. In this context, we in-
troduce a plethora of resource measures and resource monotones, delving deep into
their properties, which include additivity, sub-additivity, convexity, strong monotonic-
ity, and asymptotic continuity. These concepts form the bedrock of quantum resource
theories, and understanding them is pivotal.
Resource monotones and resource measures offer a valuable means of quantifying re-
sources. Our emphasis is on divergence-based resource measures, such as the relative
entropy of a resource, given their operational interpretations across various resource
theories. We also explore techniques for computing these measures, including semidefi-
nite programming, and delve into a practical approach for “smoothing” these measures,
a technique commonly employed in single-shot quantum information science.
Concluding this section of the book, we introduce a rich array of resource intercon-
version scenarios. These encompass exact interconversions, stochastic (probabilistic)
interconversions, approximate interconversions, and asymptotic interconversions. We
delve into essential tools intricately linked to resource interconversions, such as the
conversion distance within the single-shot regime, the asymptotic equipartition prop-
erty, and the quantum Stein’s lemma within the asymptotic domain. Additionally, we
explore the uniqueness of the Umegaki relative entropy within the context of quantum
resource theories. Our investigation extends to the evaluation of both the cost and
distillation of resources, examining these processes within both the single-shot and
asymptotic regimes. We have encapsulated the essence of this section of the book in
Figure 1.2.
Part IV: The fourth section of this book is dedicated to the quintessential exemplar of quantum
resource theories, often referred to as the “poster child” – entanglement theory. This
section comprises three chapters, each focusing on distinct facets of entanglement.
The first chapter delves into the realm of pure bipartite entanglement, followed by the
second chapter, which explores mixed bipartite entanglement. The third chapter, in
turn, delves into the intricacies of multipartite entanglement.
Within these chapters, we leverage the techniques and concepts developed in parts II
and III to delve into the theory of entanglement. This enables us to furnish a precise
definition of quantum entanglement and undertake a comprehensive examination of its
detection, manipulation, and quantification. Notably, the first of these three chapters
serves as the cornerstone, offering an in-depth exploration of pure bipartite entangle-
ment, which forms the foundational knowledge upon which the subsequent chapters on
mixed and multipartite entanglement build.
Part V: The fifth part comprises three chapters, with the first two chapters focusing on asym-
metry and non-uniformity, laying the groundwork for the third chapter on quantum
thermodynamics. In this part of the book, we reveal that athermality, the resource
essential for thermodynamic tasks, consists of two components: time-translation asym-
metry and non-uniformity.
The first chapter explores the resource theory of asymmetry, introducing an operational
framework that arises from practical constraints when multiple parties lack a common
shared reference frame. This theory has found numerous applications in quantum
information and beyond.
The second chapter delves into the resource theory of non-uniformity. In this theory,
maximally mixed states are considered free, while all other states are regarded as
valuable resources. This theory can be seen as a unique variant of thermodynamics,
involving completely degenerate Hamiltonians. Indeed, we introduce this chapter to
serve as a gentle introduction to the world of quantum thermodynamics.
Finally, in the third chapter of this section, we dive into quantum thermodynamics.
Throughout the book, whenever we introduce a new quantum resource theory, we
adhere to the structured framework outlined in Figure 1.2.
Part VI: The final section of the book serves as a comprehensive resource aimed at ensuring
the self-containment of the entire text. It exclusively includes material that directly
complements the core content of the book.
In the initial three chapters, we delve into key subjects: convex analysis, operator
monotonicity, and representation theory. It’s important to note that each of these
topics is vast in its own right, with numerous dedicated books solely focused on repre-
sentation theory or convex analysis, for example. In this section, we have thoughtfully
curated and presented the aspects of these topics that are pertinent to our book’s
core themes. Our approach emphasizes utilizing quantum notations and placing a
strong emphasis on furnishing all the essential elements needed to ensure the book’s
self-contained nature.
    |Ψ_-^{AB}⟩ = (1/√2)(|01⟩ − |10⟩) .    (1.2)
Furthermore, let’s consider the scenario where Alice possesses an additional electron in
her system, characterized by a quantum state |ψ Ã ⟩ = a|0⟩+b|1⟩. Importantly, both Alice and
Bob lack knowledge regarding the spin state of this electron, which means they are unaware
of the specific values of a and b. According to the principles of quantum mechanics, the
collective quantum state of these three electrons—two under Alice’s control and one under
Bob’s—is described by the tensor product:
    |ψ^Ã⟩ ⊗ |Ψ_-^{AB}⟩ = (1/√2)(a|0⟩ + b|1⟩) ⊗ (|01⟩ − |10⟩)
                       = (1/√2)( a|001⟩ + b|101⟩ − a|010⟩ − b|110⟩ ) ,    (1.3)
where the second equality follows by opening the parentheses.
It’s noteworthy that in our description above, we represented the state |ψ Ã ⟩⊗|ΨAB− ⟩ using
the computational basis of the vector space ÃAB. However, we can achieve a more insightful
representation by substituting the computational basis |00⟩, |01⟩, |10⟩, |11⟩ of system ÃA with
the Bell basis consisting of |ΦÃA √1 (|00⟩ ± |11⟩) and |ΨÃA ⟩ = √1 (|01⟩ ± |10⟩). This
± ⟩ = 2 ± 2
substitution allows us to express the state as follows:
    |ψ⟩^Ã ⊗ |Ψ_-^{AB}⟩ = (1/2)[ a(|Φ_+^{ÃA}⟩ + |Φ_-^{ÃA}⟩)|1⟩ + b(|Ψ_+^{ÃA}⟩ − |Ψ_-^{ÃA}⟩)|1⟩
                                − a(|Ψ_+^{ÃA}⟩ + |Ψ_-^{ÃA}⟩)|0⟩ − b(|Φ_+^{ÃA}⟩ − |Φ_-^{ÃA}⟩)|0⟩ ]
                       = (1/2)[ |Φ_+^{ÃA}⟩(a|1⟩ − b|0⟩) + |Φ_-^{ÃA}⟩(a|1⟩ + b|0⟩)
                                + |Ψ_+^{ÃA}⟩(b|1⟩ − a|0⟩) − |Ψ_-^{ÃA}⟩(a|0⟩ + b|1⟩) ] ,    (1.4)
where the second equality follows by collecting terms.
Therefore, if Alice performs the Bell measurement on her two qubits ÃA, i.e. the basis
(projective) measurement
    { P_0^{ÃA} = |Ψ_-^{ÃA}⟩⟨Ψ_-^{ÃA}| ,  P_1^{ÃA} = |Φ_-^{ÃA}⟩⟨Φ_-^{ÃA}| ,  P_2^{ÃA} = |Φ_+^{ÃA}⟩⟨Φ_+^{ÃA}| ,  P_3^{ÃA} = |Ψ_+^{ÃA}⟩⟨Ψ_+^{ÃA}| } ,    (1.5)
she will get with equal probability four possible outcomes (denoted x = 0, 1, 2, 3, and global
phase is ignored):
Outcome    Post-Measurement State              Simplification (up to a global phase)
x = 0      |Ψ_-^{ÃA}⟩ ⊗ (a|0⟩ + b|1⟩)          |Ψ_-^{ÃA}⟩ ⊗ |ψ⟩
x = 1      |Φ_-^{ÃA}⟩ ⊗ (a|1⟩ + b|0⟩)          |Φ_-^{ÃA}⟩ ⊗ σ_1|ψ⟩
x = 2      |Φ_+^{ÃA}⟩ ⊗ (a|1⟩ − b|0⟩)          |Φ_+^{ÃA}⟩ ⊗ σ_2|ψ⟩
x = 3      |Ψ_+^{ÃA}⟩ ⊗ (b|1⟩ − a|0⟩)          |Ψ_+^{ÃA}⟩ ⊗ σ_3|ψ⟩
where we denoted by {σ_x}_{x=0,1,2,3} the identity matrix σ_0 = I_2 and the three Pauli matrices σ_1, σ_2, σ_3. Hence, up to a global phase, Bob's state after outcome x occurred is σ_x|ψ⟩. After Alice sends (via a classical communication channel) the measurement outcome x to Bob, Bob can then perform the unitary operation U_x = σ_x to obtain the state
    U_x σ_x|ψ⟩ = σ_x σ_x|ψ⟩ = |ψ⟩ .    (1.6)
Therefore, by using shared entanglement, and after transmitting two classical bits (cbits),
Alice was able to transfer her unknown qubit state |ψ⟩ to Bob’s side.
If Bob did not receive the classical message from Alice, then his state is one of the four
states {σx |ψ⟩}x=0,1,2,3 . Since he does not know x, from his perspective his state is (see
Exercise 1.4.1)
    ρ = (1/4) Σ_{x=0}^{3} σ_x |ψ⟩⟨ψ| σ_x = (1/2) I .    (1.7)
That is, without the knowledge of x, Bob’s resulting state is the maximally mixed state, and
contains no information about |ψ⟩.
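The following NumPy snippet (not part of the original derivation) gives a quick numerical check of Eq. (1.7): averaging σ_x|ψ⟩⟨ψ|σ_x over the four Pauli matrices returns the maximally mixed state for an arbitrary qubit state; the amplitudes used below are purely illustrative.

```python
import numpy as np

# Pauli matrices (sigma_0 is the identity)
I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# an arbitrary normalized qubit state a|0> + b|1>
psi = np.array([0.6, 0.8j], dtype=complex)
rho_psi = np.outer(psi, psi.conj())

# Eq. (1.7): (1/4) * sum_x sigma_x |psi><psi| sigma_x
rho = sum(s @ rho_psi @ s.conj().T for s in (I2, sx, sy, sz)) / 4
print(np.allclose(rho, I2 / 2))   # True: Bob's average state is maximally mixed
```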
Figure 1.3: Quantum teleportation. Single-line arrows correspond to quantum systems. Double
line arrows correspond to classical systems.
Exercise 1.4.1. Prove Eq. (1.7). Hint: Prove first that the left-hand side of the equation above is invariant under conjugation by σ_x.
Exercise 1.4.2. Show that if instead of the singlet state |Ψ_-^{AB}⟩ above, Alice and Bob share another maximally entangled state |Φ^{AB}⟩ (i.e. the reduced density matrix of |Φ^{AB}⟩ is the maximally mixed state), then, by slightly modifying the protocol, they can still teleport an unknown quantum state from Alice to Bob.
The protocol above can be generalized in several different ways. First, in Exercise 1.4.3
you will generalize it to d-dimensions. Moreover, in general, if Alice and Bob do not share
the singlet state, but instead their particles are prepared in some other non-separable state (i.e. an entangled state that is not maximally entangled) ρ^{AB} ∈ D(AB), then typically perfect/faithful teleportation will not be possible. Still, in this case one can design a protocol achieving quantum teleportation with probability less than one (see Exercise 1.4.4), and/or at the end of the protocol the state in Bob's lab is not exactly equal to Alice's original state |ψ⟩^Ã but only close to it up to some threshold. Thus, the protocol described above is called faithful teleportation, since it teleports |ψ⟩ perfectly from Alice to Bob with a 100% success rate.
Exercise 1.4.3. Let |Φ^{AB}⟩ := (1/√d) Σ_{z∈[d]} |zz⟩ be a 2-qudit (normalized) maximally entangled state.
1. Show that {|ψ_xy^{AB}⟩}_{x,y∈[d]} is an orthonormal basis of AB.
2. Show that the reduced density matrix of |ψ_xy^{AB}⟩ is the maximally mixed state for all x, y ∈ [d].
3. Find a protocol for faithful teleportation of a qudit from Alice's lab to Bob's lab. Assume that the joint measurement that Alice performs on her two qudits is a basis measurement in the basis {|ψ_xy^{AB}⟩}_{x,y∈[d]}. What are the unitary operators performed by Bob? How many classical bits (cbits) does Alice transmit to Bob?
Exercise 1.4.4. Suppose Alice and Bob share the state |ψ^{AB}⟩ = (1/2)|00⟩ + (√3/2)|11⟩. Show that there exists a 2-outcome (basis) measurement that Alice can perform, such that with some probability greater than zero, the state of Alice and Bob after the measurement becomes the maximally entangled state |Φ_+^{AB}⟩ = (1/√2)(|00⟩ + |11⟩).
So far we assumed that the teleported state is a pure state. However, the exact same
protocol works even if the unknown state |ψ⟩ is replaced with a mixed state ρ. This is
because we can view any mixed state as some ensemble of pure states {px , |ψx ⟩} in which
the parameter x is unknown. Irrespective of the value of x, the protocol above will teleport
|ψ_x⟩ from Alice to Bob, and thereby, given that the value of x is unknown, Alice effectively teleported to Bob the mixed state ρ := Σ_x p_x |ψ_x⟩⟨ψ_x|. Alternatively, note that the quantum
teleportation protocol in Fig. 1.3 can be described as a realization of the identity quantum
channel id ∈ CPTP(A → B) (with |A| = |B| := d) given by
    id^{A→B}(ρ^A) = Σ_{x∈[d²]} Tr_{AÃ}[ (P_x^{ÃA} ⊗ U_x^B) (ρ^A ⊗ Φ^{ÃB}) (P_x^{ÃA} ⊗ U_x^B)* ] ,    (1.10)
where {P_x^{ÃA}}_{x∈[d²]} corresponds to the measurement on systems Ã and A in the maximally entangled basis, U_x is the unitary performed by Bob after he receives the value x from Alice, and Φ^{ÃB} is the maximally entangled state on system ÃB. The quantum teleportation protocol states that there exist {P_x^{ÃA}} and {U_x} such that the quantum channel id^{A→B} above is indeed the identity channel. Although in the protocol above we proved this only for pure input states |ψ⟩⟨ψ|, from the linearity of the quantum channel id^{A→B} it follows that id^{A→B} is the identity quantum channel on all mixed states.
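As a sanity check, the short NumPy sketch below (not from the book) verifies numerically that the Bell-measure-and-correct construction of Eq. (1.10) acts as the identity channel for d = 2. For convenience the input state is placed on the first tensor factor and the shared maximally entangled state on the last two factors (an equivalent relabeling of the systems), and the input density matrix is an arbitrary illustrative choice.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [I2, X, Y, Z]

phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)      # |Phi_+>
Phi = np.outer(phi_plus, phi_plus.conj())

# Bell-basis projectors on the first two qubits: |beta_x> = (sigma_x ⊗ I)|Phi_+>
bells = [np.kron(s, I2) @ phi_plus for s in paulis]
projectors = [np.outer(b, b.conj()) for b in bells]

def trace_out_first_two(M):
    """Partial trace over the first two qubits of an 8x8 operator."""
    return np.einsum('ijik->jk', M.reshape(4, 2, 4, 2))

# an arbitrary input density matrix (Hermitian, positive, unit trace)
rho = np.array([[0.7, 0.2 + 0.1j], [0.2 - 0.1j, 0.3]], dtype=complex)

out = sum(
    trace_out_first_two(np.kron(P, U) @ np.kron(rho, Phi) @ np.kron(P, U).conj().T)
    for P, U in zip(projectors, paulis)
)
print(np.allclose(out, rho))   # True: the output equals the input state
```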
Exercise 1.4.5. [Entanglement Swapping] Consider four qubit systems A, B, C, and
D, in the double-singlet state |Ψ_-^{AB}⟩ ⊗ |Ψ_-^{CD}⟩.
Taking U_x = σ_x to be the four Pauli matrices (with σ_0 = I_2), we get that the four states {|ψ_x^{AB}⟩}_{x=0}^{3} are orthonormal and form a basis of C² ⊗ C². In fact, this is the Bell basis we encountered in the previous subsection. In the next step of the protocol, Alice sends her electron (over a noiseless quantum communication channel) to Bob. Upon receiving Alice's electron, Bob has in his lab two electrons in the state |ψ_x^{AB}⟩. Given that the states {|ψ_x^{AB}⟩}_{x=0}^{3} form an orthonormal basis, in the last step of the protocol Bob performs a joint basis measurement on his two electrons in the basis {|ψ_x^{AB}⟩}_{x=0}^{3}, and thereby learns the outcome x. The outcome x is the message that Alice intended to send Bob.
Exercise 1.4.6. Show that the set of states {|ψ_x^{AB}⟩}_{x=0}^{3} is an orthonormal basis of C² ⊗ C².
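A quick numerical check of Exercise 1.4.6 is sketched below, assuming the encoding |ψ_x^{AB}⟩ = (σ_x ⊗ I)|Φ_+^{AB}⟩ (the standard superdense-coding encoding): the Gram matrix of the four encoded states should be the 4 × 4 identity.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
states = [np.kron(s, I2) @ phi_plus for s in (I2, X, Y, Z)]   # the four encoded states

gram = np.array([[u.conj() @ v for v in states] for u in states])
print(np.allclose(gram, np.eye(4)))   # True: the encoded states form an orthonormal basis
```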
Exercise 1.4.7. Let |Φ^{AB}⟩ := (1/√d) Σ_{z∈[d]} |zz⟩ be a maximally entangled state in C^d ⊗ C^d. Show that Alice can use it to transmit to Bob 2 log₂(d) cbits.
Figure 1.5: Superdense Coding. Double lines correspond to classical systems, and single lines to quantum systems.
Note that if entanglement is not considered as a resource, that is, the parties are supplied
with unlimited singlet states, then we can remove the ebit cost [qq] in (1.12) and (1.13) and
get that for teleportation 2[c → c] ⩾ [q → q] and for superdense coding [q → q] ⩾ 2[c → c].
This makes teleportation and superdense coding the dual protocols of each other, and in this
case we can say that [q → q] = 2[c → c].
However, in almost all practical scenarios, entanglement is an expensive resource that can
be difficult to generate over long distances and that is also highly sensitive to decoherence and
noise. In particular, pure maximal entanglement is scarce and must be treated as a resource. The question then becomes whether it is possible to slightly modify the protocols of teleportation and superdense coding, making them more symmetric, in the sense that the
two resource inequalities in (1.12) and (1.13) merge into a single resource equality. This is
indeed possible if we replace 2[c → c] in the right-hand side of (1.13) with two uses of an
isometry channel known as the coherent bit channel.
We will denote by V_Z the coherent bit (cobit) channel
    V_Z(ρ) := V ρ V* ,    ∀ ρ ∈ L(A) ,    (1.15)
where V is the isometry defined by V|x⟩^A = |x⟩^A|x⟩^B for x ∈ {0, 1}, and the subscript Z indicates that the basis {|0⟩, |1⟩} is an eigenbasis of the third Pauli
operator (i.e., eigenvectors of the spin observable in the z-direction). One can define V with
respect to other bases. For example, we will denote by VX (·) = U (·)U ∗ the coherent bit
channel with respect to the basis {|+⟩, |−⟩}, where U is the isometry defined by U |±⟩A =
|±⟩A |±⟩B .
How is this resource related to other resources? First note that with such a resource Alice
can transmit a classical bit to Bob. Indeed, Alice can encode a cbit x ∈ {0, 1} in the state
|x⟩A and send it over the channel VZ . Then, Bob receives |x⟩B on his system and performs
a basis measurement to learn x. We therefore have
[q → qq] ⩾ [c → c] . (1.16)
The exercise below shows that we also have [q → qq] ⩾ [qq]. Among other things, this also implies that [c → c] ̸⩾ [q → qq]; in other words, [c → c] is strictly less resourceful than [q → qq].
Exercise 1.5.1. Show that V_Z(|+⟩⟨+|^A) = |Φ_+^{AB}⟩⟨Φ_+^{AB}|.
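The snippet below gives a quick numerical check of Exercise 1.5.1, assuming the cobit isometry V|x⟩^A = |x⟩^A|x⟩^B for x ∈ {0, 1} (consistent with the X-basis definition quoted above).

```python
import numpy as np

# V : C^2 -> C^2 ⊗ C^2 with V|0> = |00> and V|1> = |11>
V = np.zeros((4, 2), dtype=complex)
V[0, 0] = 1.0   # |00><0|
V[3, 1] = 1.0   # |11><1|

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)               # |+>
rho_out = V @ np.outer(plus, plus.conj()) @ V.conj().T            # V_Z(|+><+|)

phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)     # |Phi_+>
print(np.allclose(rho_out, np.outer(phi_plus, phi_plus.conj())))  # True
```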
Figure 1.6: Coherent Superdense Coding. One ebit plus one use of a noiseless qubit channel are
implemented to realize two uses of the cobit channel.
The coherent superdense coding protocol (see Fig. 1.6) consists of several steps. Initially, Alice and Bob share the maximally entangled state |Φ_+^{AB}⟩. Alice then prepares an input state |x⟩^{A1}|y⟩^{A2}, so that Alice and Bob's initial state (time t_0 in the figure) is |x⟩^{A1}|y⟩^{A2} ⊗ |Φ_+^{AB}⟩. Alice then performs a sequence of two controlled unitary gates: a controlled-X gate on systems A_2 and A, followed by a controlled-Z gate on systems A_1 and A. The resulting state at time t_1 is |x⟩^{A1}|y⟩^{A2} ⊗ |ϕ_xy^{AB}⟩, with |ϕ_xy^{AB}⟩ := (Z^x X^y ⊗ I^B)|Φ_+^{AB}⟩,
where Z^x equals the identity matrix for x = 0 and the third Pauli matrix for x = 1 (X^y is defined similarly). A key observation is that {|ϕ_xy^{AB}⟩}_{x,y∈{0,1}} is precisely the Bell basis, and therefore forms an orthonormal basis of C² ⊗ C². Note also that this encoding (x, y) → |ϕ_xy^{AB}⟩ is done by Alice alone, and is therefore essentially identical to the superdense coding protocol we encountered earlier.
In the next step Alice uses a noiseless qubit channel to transmit system A to Bob. Therefore, at time t_2 the state of the system is |x⟩^{A1}|y⟩^{A2}|ϕ_xy^{B1B2}⟩, where as before,
    |ϕ_xy^{B1B2}⟩ := (Z^x X^y ⊗ I^{B2}) |Φ_+^{B1B2}⟩ .    (1.20)
In the final step, Bob applies a decoding unitary U^{B1B2} that satisfies
    |xy⟩^{B1B2} = U^{B1B2} |ϕ_xy^{B1B2}⟩    ∀ x, y ∈ {0, 1} .    (1.21)
It turns out that the unitary U^{B1B2} defined above can be expressed as a CNOT gate followed by a Hadamard gate on system B_1 (see Bob's side in Fig. 1.6 between time steps t_2 and t_3). Explicitly,
    U^{B1B2} = H|0⟩⟨0|^{B1} ⊗ I^{B2} + H|1⟩⟨1|^{B1} ⊗ X^{B2}
             = |+⟩⟨0|^{B1} ⊗ I^{B2} + |−⟩⟨1|^{B1} ⊗ X^{B2} .    (1.22)
Hence, after the application of the unitary U^{B1B2} on Bob's system, Alice and Bob's state is |x⟩^{A1}|y⟩^{A2}|x⟩^{B1}|y⟩^{B2}. That is, the quantum circuit in Fig. 1.6 simulates the linear transformation
    |x⟩^{A1}|y⟩^{A2} → |x⟩^{A1}|x⟩^{B1} ⊗ |y⟩^{A2}|y⟩^{B2} ,    (1.23)
which is equivalent to two coherent channels. The resources we used to simulate these two
coherent channels are precisely the same ones used in superdense coding to simulate two
noiseless classical channels.
Exercise 1.5.2. Show that the unitary matrix U B1 B2 above satisfies (1.21).
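A brief numerical sanity check of Eq. (1.21), complementing Exercise 1.5.2, is sketched below: the decoder of Eq. (1.22) maps each encoded Bell state |ϕ_xy⟩ to the computational basis state |xy⟩.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

# Eq. (1.22): U = H|0><0| ⊗ I + H|1><1| ⊗ X  (CNOT followed by Hadamard on B1)
P0, P1 = np.diag([1, 0]).astype(complex), np.diag([0, 1]).astype(complex)
U = np.kron(H @ P0, I2) + np.kron(H @ P1, X)

ok = True
for x in (0, 1):
    for y in (0, 1):
        # |phi_xy> = (Z^x X^y ⊗ I)|Phi_+>, as in Eq. (1.20)
        enc = np.linalg.matrix_power(Z, x) @ np.linalg.matrix_power(X, y)
        phi_xy = np.kron(enc, I2) @ phi_plus
        target = np.zeros(4, dtype=complex)
        target[2 * x + y] = 1.0          # the computational basis state |xy>
        ok = ok and np.allclose(U @ phi_xy, target)
print(ok)   # True: U satisfies Eq. (1.21)
```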
Exercise 1.5.3. Suppose the initial ebit shared between Alice and Bob was given in the singlet state |Ψ_-^{AB}⟩ instead of |Φ_+^{AB}⟩, and consider the exact same protocol as in Fig. 1.6, until time step t_2. Revise the unitary matrix U^{B1B2} after time step t_2 so that the protocol still simulates 2 coherent channels.
Figure 1.7: Coherent Quantum Teleportation. Two cobit channels produce one ebit plus one use
of a noiseless qubit channel.
In the next step system A goes through the second cobit channel, V_X, yielding the state
    (1/√2)[ |+⟩^A|+⟩^{B2}(a|0⟩^{B1} + b|1⟩^{B1}) + |−⟩^A|−⟩^{B2}(a|0⟩^{B1} − b|1⟩^{B1}) ] .    (1.26)
Finally, in the last step Bob sends his systems through a CNOT gate. Note that X|+⟩ = |+⟩, so that the CNOT gate only changes |−⟩^{B2}|1⟩^{B1} to −|−⟩^{B2}|1⟩^{B1} while keeping all the other terms intact. Hence, after Bob's CNOT gate, Alice and Bob share the state
    (1/√2)[ |+⟩^A|+⟩^{B2}(a|0⟩^{B1} + b|1⟩^{B1}) + |−⟩^A|−⟩^{B2}(a|0⟩^{B1} + b|1⟩^{B1}) ] = |Φ_+^{AB2}⟩|ψ^{B1}⟩ .    (1.27)
That is, at the end of the protocol Alice has teleported her quantum state |ψ⟩ to Bob's system B_1, and also shares with Bob's system B_2 the maximally entangled state Φ_+^{AB2}.
Coherent quantum teleportation and coherent superdense coding demonstrate that two
cobit channels have the same resource value as one ebit and one use of a qubit channel.
This means that coherent teleportation is the reversal process of coherent superdense coding
and vice versa.
exotic tasks such as quantum teleportation. Moreover, these protocols are considered as the
unit protocols (see e.g. [232]), since they form the building blocks with which one studies
the capabilities of noisy quantum channels to transmit information in asymptotic settings
involving many uses of the channels.
The resource analysis using notations such as [q → q] was first introduced in [62], where the rules of this ‘resource calculus’ were developed. The coherent bit, coherent teleportation, and coherent superdense coding are due to [112]. More details on coherent communication
can be found in the book of Wilde [232].
Part I
Preliminaries

CHAPTER 2
Elements of Quantum Mechanics I: Closed Systems
Quantum mechanics, which was discovered during the first quarter of the twentieth century,
has profoundly transformed our understanding of the world around us. The theory pre-
dicts a plethora of non-intuitive phenomena, including entanglement, quantum non-locality,
wave-particle duality, coherence, the uncertainty principle, quantum contextuality, quantum
steering, and no-cloning, to name a few. This remarkable departure from classical physics
has led to numerous thought-provoking papers and interpretations of quantum mechanics.
To date, there is no consensus on which view of quantum mechanics is the most natural one
to adopt. Additionally, sub-fields of science, such as quantum logic, have arisen from these phenomena, particularly from the inconsistency between classical logic and the uncertainty principle.
The development of quantum mechanics was a gradual process that involved many trials
and errors. It began in 1900 with Max Planck’s discretization of energy values used to
solve the black-body radiation problem. The process continued with Einstein's 1905 paper on the
correspondence between energy and frequency, which provided a quantum explanation for
the photoelectric effect. The process ultimately ended with the formalism that was developed
in the mid-1920s by Erwin Schrödinger, Werner Heisenberg, Max Born, and others.
In this book, we will not review quantum mechanics from a historical or traditional per-
spective. Instead, we will study it in the context of the modern field of quantum information
science, which emerged and developed in the early 1990s. We will discuss the theory’s basic
postulates, its corresponding mathematical structure, and its many consequences and appli-
cations, particularly to information theory, resource theories, and more broadly to physics
and science.
The interplay between the concept of ‘information’ and the field of quantum science is a
complex and multifaceted one. The fundamental principles of quantum mechanics, such as
the superposition of states and entanglement, have led to the development of quantum infor-
mation theory, which studies the processing and transmission of information using quantum
systems. Moreover, the very act of observing a quantum system can alter its state, and
this observation is itself a form of information. This has profound implications for our understanding of the nature of reality and the limits of our ability to measure it. Thus, the
relationship between ‘information’ and quantum science is a rich and nuanced one, encom-
passing a broad range of topics from the foundations of quantum mechanics to the practical
applications of quantum technologies.
In Shannon’s terms, information is defined as “that which can distinguish one thing from
another.” For instance, a coin has two sides, “head” and “tail,” and the ability to differentiate
between the two implies that the coin can store information. Typically, when referring to
the distinguishability of two elements, we denote the options as 0 and 1, and information is
then measured in bits, where the number of bits represents the number of distinguishable
elements. For example, two bits correspond to four possible elements.
The definition of information is abstract and detached from any specific implementation
or labeling. For instance, one bit may correspond to the head or tail of a coin, or the 5-
volt versus 0-volt of an electrical circuit. All information processing, such as communication,
computation, and manipulation of information, can be performed with either coins or electri-
cal circuits. While storing information in coins is impractical (especially for large numbers
of bits), from an information-theoretic perspective, any object can be used to implement
classical bits, and we say that information is fungible.
However, what happens when we attempt to encode information in the spin of an elec-
tron? The electron, being an elementary particle, is uniquely determined by its mass and
spin. The magnitude of its spin can only take one value, ℏ/2 (where ℏ is a unit of angular momentum), and its spin can point in any direction. The Stern-Gerlach experiment, discussed below, demonstrates that it is possible to differentiate between “up” and “down” along the z-direction of the spin of an electron (see Fig. 2.1). This implies that information
can be encoded in the spin of an electron as well.
Are there any advantages to encoding information in the spins of quantum particles, such
as electrons, as opposed to larger classical systems like coins or electrical circuits? If informa-
tion is fungible, why should encoding information in quantum particles make any difference?
Before addressing these questions, we will introduce the Stern-Gerlach experiment and the
necessary elements from linear algebra for the study of quantum physics.
The SG experiment involves emitting electrons or atoms from an oven and passing them
through a non-homogeneous magnetic field. Their spin orientation determines whether they
hit the screen above or below the horizontal line, with the distance from the line reflecting
the magnitude of their spin. The experiment can be performed on any physical system, and
always yields a discrete spectrum for the angular momentum, given by nℏ/2 where n is an
integer. Regions on the screen in Fig. 2.2 where no particles hit indicate this quantization,
which is true regardless of the type of particle used in the experiment.
The SG experiment has two noteworthy implications. Firstly, electrons always hit the
screen at the same distance from the horizontal line, indicating a consistent magnitude of ℏ/2 for their spin. Secondly, all electrons hit the screen in the same two areas, irrespective of
their initial spin direction. Even if an electron’s initial spin is pointing in the x-direction, it
should have zero spin in the z-direction and therefore should not be deflected by the non-zero
z-gradient of the magnetic field. However, the fact that electrons still hit the screen in the
same two areas shows that measuring the spin in one direction affects its value in another
direction.
To verify that an electron’s spin remains intact after it is deflected by the SG experiment,
we can concatenate two SG experiments in a variety of ways. For example, suppose we want
to confirm that an electron deflected upwards in the first SG box has a spin pointing in the
upward z-direction. We can set up the experiment as shown in Fig. 2.3 (a). After passing
through the first SG box, only electrons deflected upwards will continue on to the second
SG box, while those deflected downward are blocked. After passing through the second SG
box, the electrons hit the screen. We observe that no electrons hit the screen below the
horizontal line, indicating that all electrons deflected upwards in the first SG box also have a
spin pointing in the upward z-direction. This implies that after an electron’s spin has been
measured, it remains intact by further measurements in the same direction.
Another interesting feature of the SG experiment is that it allows us to measure the spin
of particles in different directions. For instance, we can measure the spin in the x-direction
by using a modified version of the experiment. In this case, the electrons are deflected to
the left or to the right, depending on their spin in the x-direction. Figure 2.3 (b) shows a
schematic of this experiment, where the first SG box measures the spin in the x-direction,
and the second SG box measures the spin in the z-direction.
The results of this experiment are surprising from a classical point of view. According
to classical physics, any physical system with its angular momentum pointing in the x-
direction should have zero angular momentum pointing in the z-direction. However, in the
SG experiment, we find that 50% of the electrons are deflected upward and 50% are deflected
downward after passing through the second SG box. This shows that the spin in different
directions of a quantum particle is not fully determined by its spin in one particular direction,
and that measurements of spin along different axes can yield nontrivial and unexpected
results.
What happens to a particle with spin in the x-direction when it passes through an SG-
experiment in the z-direction, followed by an SG-experiment in the x-direction? Will its
resulting spin in the x-direction remain the same? Figure 2.4 illustrates such a scenario
with three SG-boxes. The first box filters only particles with spins in the positive (right) x-
direction. The second box measures their spin in the z-direction, and the third box measures
their spin in the x-direction. Remarkably, there is a 50% chance that the resulting particle
will have spin in the negative x-direction. This implies that the measurement in the z-
direction erases the information about the spin in the x-direction completely.
This phenomenon, known as quantum mechanical complementarity, is a fundamental
feature of quantum mechanics and is one of the most profound concepts in physics. It
implies that it is impossible to simultaneously measure certain pairs of physical quantities
with arbitrary precision, such as position and momentum, or spin in different directions. Any
measurement of one quantity necessarily disturbs the other, and the uncertainty principle
sets a fundamental limit on the precision with which they can be measured simultaneously.
Remark. We adopt the notation ⟨ | ⟩ instead of ⟨ , ⟩, as it conforms with the Dirac notation
that we will define shortly. Additionally, while most mathematics textbooks define the
linearity property with respect to the first argument, we will use the aforementioned notation
to suitably align with the Dirac notation. We use the letters A, B, and C to denote Hilbert
spaces since, in quantum physics, Hilbert spaces correspond to physical systems that are
operated on by parties such as Alice, Bob, Charlie, etc.
The inner product induces a norm defined by
    ∥ψ∥_2 := ⟨ψ|ψ⟩^{1/2}    ∀ ψ ∈ A ,    (2.2)
and a metric
d(ψ, ϕ) := ∥ψ − ϕ∥2 ∀ ψ, ϕ ∈ A . (2.3)
A Norm
Definition 2.2.2. A norm is a real-valued function ∥ · ∥ defined on the vector space, A, that has the following properties:
1. Non-negativity: ∥ψ∥ ⩾ 0 for all ψ ∈ A, with equality if and only if ψ = 0.
2. Homogeneity: ∥cψ∥ = |c| ∥ψ∥ for all ψ ∈ A and all scalars c.
3. The triangle inequality: ∥ψ + ϕ∥ ⩽ ∥ψ∥ + ∥ϕ∥ for all ψ, ϕ ∈ A.
Exercise 2.2.1. Show that the norm defined by the inner product in equation (2.2) indeed
satisfies the three fundamental properties of a norm.
Exercise 2.2.2. Let A be a normed space, and let ψ, ϕ ∈ A with ∥ϕ∥ = 1. Show that
    ∥ (1/∥ψ∥) ψ − ϕ ∥ ⩽ 2 ∥ψ − ϕ∥ .    (2.4)
Hint: Write (1/∥ψ∥) ψ − ϕ = ( (1/∥ψ∥) ψ − ψ ) + ( ψ − ϕ ) and use the norm properties.
It is important to note that not all norms are derived from an inner product, and as a
result, not every metric necessarily originates from either an inner product or a norm.
Exercise 2.2.3 (The p-Norms and The Hölder Inequality). The normed space ℓ_p(Cⁿ) (with p ∈ [1, ∞]) is the vector space Cⁿ equipped with the p-norm, ∥ · ∥_p, defined on all ψ = (a_1, . . . , a_n)^T ∈ Cⁿ as:
    ∥ψ∥_p := (|a_1|^p + · · · + |a_n|^p)^{1/p} .    (2.5)
The Hölder inequality states that for any p, q ∈ [1, ∞] with 1/p + 1/q = 1 and any ψ = (a_1, . . . , a_n)^T ∈ Cⁿ and ϕ = (b_1, . . . , b_n)^T ∈ Cⁿ we have
    Σ_{x∈[n]} |a_x b_x| ⩽ ∥ψ∥_p ∥ϕ∥_q .    (2.6)
Use this to show that ∥ · ∥p is a norm, and that for p ̸= 2 this norm is not induced from an
inner product.
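The following short NumPy illustration (with randomly chosen vectors and a few illustrative Hölder pairs (p, q)) shows the p-norms of Eq. (2.5) and the inequality (2.6) in action.

```python
import numpy as np

def p_norm(v, p):
    """The p-norm of Eq. (2.5), including the limiting case p = infinity."""
    if np.isinf(p):
        return np.max(np.abs(v))
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

rng = np.random.default_rng(0)
psi = rng.normal(size=5) + 1j * rng.normal(size=5)
phi = rng.normal(size=5) + 1j * rng.normal(size=5)

for p, q in [(1, np.inf), (2, 2), (3, 1.5), (4, 4 / 3)]:   # pairs with 1/p + 1/q = 1
    lhs = np.sum(np.abs(psi * phi))
    rhs = p_norm(psi, p) * p_norm(phi, q)
    print(p, q, bool(lhs <= rhs + 1e-12))   # True in every case
```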
The p-norms above have numerous applications in many fields of science. Of particular importance are the extreme cases of p = 1 and p = ∞:
    ∥ψ∥_1 = Σ_{x∈[n]} |a_x|   and   ∥ψ∥_∞ = max_{x∈[n]} |a_x| ,    (2.7)
where throughout the book we use the notation [n] := {1, . . . , n} for every integer n ∈ N. For these cases, the norms above behave monotonically under stochastic matrices, as we show now.
Let S = (s_xy) ∈ STOCH(m, n) be an m × n column stochastic matrix; i.e. a matrix whose components are non-negative real numbers and whose columns each sum to one. Then, the 1-norm behaves monotonically under such matrices; i.e.
    ∥Sv∥_1 ⩽ ∥v∥_1    ∀ v ∈ Cⁿ .    (2.8)
Indeed, by definition,
    ∥Sv∥_1 = Σ_{x∈[m]} | Σ_{y∈[n]} s_xy v_y | ⩽ Σ_{x∈[m]} Σ_{y∈[n]} s_xy |v_y| = Σ_{y∈[n]} |v_y| = ∥v∥_1 .    (2.9)
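The monotonicity of the 1-norm under column stochastic matrices (Eqs. (2.8)–(2.9)) is easy to test numerically; the sketch below draws a random column-stochastic matrix and a random complex vector and confirms that the 1-norm does not increase.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 6
S = rng.random((m, n))
S /= S.sum(axis=0, keepdims=True)     # normalize columns: S is column stochastic

v = rng.normal(size=n) + 1j * rng.normal(size=n)
one_norm = lambda w: np.sum(np.abs(w))
print(bool(one_norm(S @ v) <= one_norm(v) + 1e-12))   # True, as in Eq. (2.8)
```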
Exercise 2.2.5. Show that L2 (R) satisfies all the axioms of a Hilbert space.
To distinguish more clearly between Hilbert spaces and inner product spaces, consider
the space C([a, b]) of continuous complex-valued functions. This infinite-dimensional vector
space is equipped with an inner product given by:
    ⟨g|f⟩ := ∫_a^b ḡ(t) f(t) dt    ∀ f, g ∈ C([a, b]) .    (2.14)
However, this space is not a Hilbert space since it is not complete with respect to the metric
induced by this inner product. For example, consider the following sequence of continuous
functions in C([−1, 1]):
    f_k(t) := { 0 if t ∈ [−1, 0) ;   kt if t ∈ [0, 1/k) ;   1 if t ∈ [1/k, 1] } .    (2.15)
Note that while the sequence is Cauchy, its limit does not exist in C([−1, 1]) as fk cannot
converge to a continuous function. However, this book mostly considers finite-dimensional
Hilbert spaces, in which all inner product spaces are complete and isomorphic to Cn (or Rn ).
Thus, such examples are not relevant for our purposes, and we will use the terms ‘Hilbert
space’ and ‘inner product space’ interchangeably. Similarly, in finite dimensions, all normed
spaces are Banach spaces (i.e., complete normed spaces).
⟨M |N ⟩ := Tr [M ∗ N ] (2.16)
where M ∗ denotes the adjoint of M . This specific inner product is referred to as the ‘Hilbert-
Schmidt’ inner product, and it is occasionally denoted with a subscript as ⟨ | ⟩HS .
Exercise 2.2.6. Consider the space Cm×n .
1. Show that the definition in (2.16) satisfies the three axioms of an inner product.
2. Find an isometric isomorphism vec : C^{m×n} → C^{mn} such that for all M, N ∈ C^{m×n},
    ⟨M|N⟩_HS = ⟨vec(M)|vec(N)⟩_2 ,
where ⟨ | ⟩_HS is the Hilbert-Schmidt inner product of C^{m×n}, and ⟨ | ⟩_2 is the standard inner product of C^{mn}.
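One natural choice for the isomorphism asked for in the exercise is the column-stacking map; the snippet below (a sketch, with randomly generated matrices) confirms that it matches the Hilbert-Schmidt inner product with the standard inner product of C^{mn}.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 4
M = rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))
N = rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))

vec = lambda A: A.reshape(-1, order='F')    # stack the columns of A into a vector
hs = np.trace(M.conj().T @ N)               # Hilbert-Schmidt inner product <M|N>
std = vec(M).conj() @ vec(N)                # standard inner product of the vectorizations
print(np.allclose(hs, std))                 # True
```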
2. Pythagorean theorem: If ψ1 , . . . , ψn are orthogonal vectors, that is, ⟨ψx |ψy ⟩ = 0 for
distinct indices x, y ∈ [n], then
    ∥ Σ_{x∈[n]} ψ_x ∥² = Σ_{x∈[n]} ∥ψ_x∥² .    (2.19)
The dual space, A∗ , of a Hilbert space, A, is defined as the set of all linear functionals
on A. A linear functional is a function f : A → F with the property that for all |ψ⟩, |ϕ⟩ ∈ A
and a, b ∈ F we have f (a|ψ⟩ + b|ϕ⟩) = af (|ψ⟩) + bf (|ϕ⟩). For a fixed vector |χ⟩ ∈ A, the
function fχ : A → F defined by fχ (|ψ⟩) := ⟨χ|ψ⟩ is a linear functional, and every linear
functional has this form. It is therefore convenient to denote the linear functionals with a
‘bra’ notation. That is, instead of fχ , we denote this functional simply by ⟨χ|, so that its
action on an element |ψ⟩ is given by the inner product ⟨χ|ψ⟩. Hence, A∗ consists of bra
vectors.
As an example, the dual space of C² is spanned by the standard bra basis {⟨0|, ⟨1|}. Note that there is a one-to-one correspondence between A = Cⁿ and its dual A*, via the bijective mapping
    |ψ⟩ = Σ_{x∈[n]} c_x |x⟩   ↦   ⟨ψ| = Σ_{x∈[n]} c̄_x ⟨x| ,    (2.22)
where cx ∈ C and c̄x denotes the complex conjugate of cx . Moreover, we will denote
⟨ψ|ρ|ϕ⟩ := ⟨ψ|ρϕ⟩, where ρ is a linear transformation (see below).
Identifying an element |ψ⟩ ∈ A with an element (ψ, 0) ∈ C and an element |ϕ⟩ ∈ B with an
element (0, ϕ) ∈ C, we conclude that C = A ⊕ B. For example, R3 can be decomposed as
R2 ⊕ R or as R ⊕ R ⊕ R.
Let A and B be two finite dimensional Hilbert spaces with dimensions |A| and |B|,
respectively. We define a bilinear function ⊗ that takes two vectors |ψ⟩ ∈ A and |ϕ⟩ ∈ B
and returns an element of the form |ψ⟩ ⊗ |ϕ⟩. The bilinearity of ⊗ means that for all c ∈ F,
and all vectors |ψ1 ⟩, |ψ2 ⟩ ∈ A, and |ϕ1 ⟩, |ϕ2 ⟩ ∈ B,
1. (|ψ_1⟩ + |ψ_2⟩) ⊗ |ϕ⟩ = |ψ_1⟩ ⊗ |ϕ⟩ + |ψ_2⟩ ⊗ |ϕ⟩ ,
2. |ψ⟩ ⊗ (|ϕ_1⟩ + |ϕ_2⟩) = |ψ⟩ ⊗ |ϕ_1⟩ + |ψ⟩ ⊗ |ϕ_2⟩ ,
3. c(|ψ⟩ ⊗ |ϕ⟩) = (c|ψ⟩) ⊗ |ϕ⟩ = |ψ⟩ ⊗ (c|ϕ⟩) .
From its definition above, it follows that A ⊗ B is a vector space with an orthonormal basis
{|x⟩A ⊗|y⟩B }. In particular, note that |A⊗B| = |AB| := |A||B| (we will therefore sometimes
use the notation AB to mean A ⊗ B). The inner product between two elements |ψ1 ⟩ ⊗ |ϕ1 ⟩
and |ψ2 ⟩ ⊗ |ϕ2 ⟩ is simply given by the product of the inner products; i.e. ⟨ψ2 |ψ1 ⟩⟨ϕ2 |ϕ1 ⟩.
More generally, given two states
    |ψ^{AB}⟩ = Σ_{x∈[m]} Σ_{y∈[n]} µ_xy |x⟩^A ⊗ |y⟩^B   and   |ϕ^{AB}⟩ = Σ_{x∈[m]} Σ_{y∈[n]} ν_xy |x⟩^A ⊗ |y⟩^B ,    (2.26)
their inner product is given by
    ⟨ψ^{AB}|ϕ^{AB}⟩ = Tr[M* N] ,    (2.27)
where the matrices M := (µxy ) and N := (νxy ). Using these definitions, the set A ⊗ B forms
a Hilbert space. Notably, the inner product defined in Equation (2.27) is the same as the
one defined in Equation (2.16). This is because each element |ψ⟩ ∈ A ⊗ B can be represented
as a matrix M = (µxy ). It can be easily demonstrated that this mapping between bipartite
vectors and matrices is an isometric isomorphism.
Exercise 2.2.8. Show that Cm×n ≅ Cm ⊗ Cn.
Hence, the vector |ψ^A⟩|ϕ^B⟩ = Σ_{x∈[m]} Σ_{y∈[n]} a_x b_y |xy⟩^{AB} corresponds to the vector a ⊗ b given by
a ⊗ b := (a_1 b, a_2 b, . . . , a_m b)^T ∈ C^{mn} .    (2.28)
The above definition of a tensor product between vectors in Cm and Cn is the Kronecker
product which is also denoted with the symbol ⊗. The Kronecker product is defined on
arbitrary matrices as follows. Let M = (µxy ) ∈ Ck×ℓ and N ∈ Cp×q . The Kronecker product
M ⊗ N is a matrix in Ckp×ℓq defined by
M ⊗ N := [ μ_{11}N   ⋯   μ_{1ℓ}N
             ⋮        ⋱     ⋮
           μ_{k1}N   ⋯   μ_{kℓ}N ] .    (2.29)
It is simple to check that the tensor product above is bilinear and associative, however, it
is not commutative. Also, note that for A = Cn×m and B = Cp×q the tensor product given
in (2.25) is equivalent to the Kronecker product. We will therefore use the terms tensor
product and Kronecker product interchangeably.
Exercise 2.2.9. Let M be an m × n matrix and N be an n × k matrix. Find all the values
of m, n, k for which M ⊗ N = M N .
Exercise 2.2.10. Show that the Kronecker product is not commutative, and that there always exist permutation matrices P and Q of appropriate dimensions such that M ⊗ N = P (N ⊗ M) Q.
Exercise 2.2.11. Prove the following properties. For any matrices K, L, M, N in appropri-
ate dimensions (in some cases square matrices):
1. (K ⊗ L)(M ⊗ N ) = KM ⊗ LN .
2. K ⊗ L is invertible if and only if K and L are invertible and in this case (K ⊗ L)−1 =
K −1 ⊗ L−1 .
6. Rank(K ⊗ L) = Rank(K)Rank(L).
Kronecker also defined a direct sum that is closely related to the definition above. Given
two matrices M ∈ Cm×m and N ∈ Cn×n their Kronecker sum is defined by
M ⊕ N := M ⊗ In + Im ⊗ N . (2.30)
The sum appears naturally in physics, typically when describing the Hamiltonian of a com-
posite system consisting of non-interacting subsystems. A celebrated result connecting the
Kronecker product and Kronecker sum is given in the following exponential relation:
eM ⊕N = eM ⊗ eN . (2.31)
Exercise 2.2.12. Prove the above equality. Hint: Use the formula e^M = Σ_{n=0}^∞ M^n/n! and the commutativity of M ⊗ I_n and I_m ⊗ N.
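As a quick numerical sanity check of (2.31), purely illustrative and with randomly generated matrices, one can compare the two sides directly:

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
m, n = 3, 2
M = rng.normal(size=(m, m))
N = rng.normal(size=(n, n))

kron_sum = np.kron(M, np.eye(n)) + np.kron(np.eye(m), N)   # M ⊕ N
lhs = expm(kron_sum)                                       # e^{M ⊕ N}
rhs = np.kron(expm(M), expm(N))                            # e^M ⊗ e^N
print(np.allclose(lhs, rhs))                               # True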
For improved clarity in our exposition, we have omitted the superscripts A and B from |x⟩A
and |y⟩B . This use of the Dirac notations has the advantage that the action of M on a vector
|ψ⟩ ∈ A becomes
M|ψ⟩ = Σ_{y∈[n]} Σ_{x∈[m]} μ_{yx} |y⟩⟨x|ψ⟩ = Σ_{y∈[n]} Σ_{x∈[m]} μ_{yx} ⟨x|ψ⟩ |y⟩ .    (2.34)
Note that the numbers μ_{yx} form a matrix which is known as the matrix representation of M. Sometimes we will identify the matrix (μ_{yx}) with the operator M and write M = (μ_{yx}). However, note that different choices of orthonormal bases {|x⟩^A}_{x∈[m]} and {|y⟩^B}_{y∈[n]} correspond to different matrix representations (μ_{yx}) of the same linear operator M (see Exercise 2.3.3).
Exercise 2.3.1. Show that any linear operator M : A → B can be expressed as
M = Σ_{z∈[k]} λ_z |v_z^B⟩⟨u_z^A| ,    (2.35)
Exercise 2.3.3. Let M be a linear operator as in (2.33), and denote by M̃ the matrix whose components are μ_{yx} := ⟨y|M|x⟩ (i.e. M is an operator whereas M̃ is a matrix). Let {|a_x⟩} and {|b_y⟩} be two orthonormal bases of A and B, respectively. Show that there exist two unitary matrices U and V (not necessarily of the same size) such that
M = Σ_{x∈[m]} Σ_{y∈[n]} ν_{yx} |b_y⟩⟨a_x| ,    (2.39)
For any linear operator T : A → B its kernel, denoted Ker(T), is the subspace of A consisting of all vectors |ψ⟩ ∈ A such that T|ψ⟩ = 0. The image of T, denoted by Im(T), is the set of vectors {T|ψ⟩} over all vectors |ψ⟩ ∈ A. Finally, the support of T, denoted supp(T), is also a subspace of A, consisting of all the vectors that are orthogonal to all the elements in Ker(T). In particular, for any non-zero vector |ψ⟩ ∈ supp(T) we have T|ψ⟩ ≠ 0.
Exercise 2.3.4. Let A and B be two Hilbert spaces, and let T : A → B be a linear trans-
formation.
where k := |supp(V)| ⩽ min{|A|, |B|}, {|a_z⟩}_{z∈[k]} is an orthonormal set of vectors in A, and {|b_z⟩}_{z∈[k]} is an orthonormal set of vectors in B. In other words, a linear operator V : A → B is a partial isometry if and only if there exist two sets of orthonormal vectors, {|a_z⟩}_{z∈[k]} ⊂ A and {|b_z⟩}_{z∈[k]} ⊂ B, such that (2.43) holds.
Exercise 2.3.5. Show that if Eq. (2.41) holds then V ∗ V = I A .
Exercise 2.3.6. Use (2.35) to show (2.43).
Exercise 2.3.7. A linear operator Π : A → A is called an orthogonal projection if and only
if Π2 = Π = Π∗ .
1. Show that Π : A → A is an orthogonal projection if and only if Π∗ Π = Π.
2. Show that V : A → B is a partial isometry if and only if V ∗ V is an orthogonal
projection in A, and V V ∗ is an orthogonal projection in B.
Exercise 2.3.8. Let A ⊆ B be a subspace of B, and let V : A → B be an isometry satisfying
V ∗ V = Π, where Π : B → A is the projection onto the subspace A. Show that there exists a
unitary matrix U : B → B such that
UΠ = V . (2.44)
where {|v_x⟩}_{x∈[m]} is an orthonormal basis of A. The coefficients {λ_x}_{x∈[m]} are the eigenvalues of H. We denote by Tr[H] the trace of a Hermitian operator H : A → A. That is,
Tr[H] := Σ_{x∈[m]} ⟨x|H|x⟩ .    (2.46)
Note that the definition above is independent of the choice of the orthonormal basis {|x⟩} of A.
We say that a linear operator ρ : A → A is positive semidefinite, and write ρ ⩾ 0, if and
only if
⟨ψ|ρ|ψ⟩ ⩾ 0 ∀ |ψ⟩ ∈ A . (2.47)
If the above inequality is strict for all non-zero |ψ⟩ ∈ A then we say that ρ is positive definite
and write ρ > 0. We will also write ρ ⩾ σ to mean ρ − σ ⩾ 0, and will use the Greek
letters such as ρ and σ to denote linear operators that are positive semidefinite. The set of
all positive semidefinite operators acting on Hilbert space A will be denoted by Pos(A).
Every positive linear operator ρ : A → A is necessarily Hermitian. To see why, observe
that the positivity property above implies
Now, observe that the operator N := ρ − ρ∗ satisfies N ∗ N = N N ∗ . Such operators are called
normal operators and are known to be diagonalizable. Therefore, taking |ψ⟩ above to be
an eigenvector of N we conclude that all the eigenvalues of N are zero. Hence, N = 0 or
equivalently ρ = ρ∗ .
Exercise 2.3.9. Let ρ : A → A be a linear operator. Show that the following are equivalent:
1. ρ ⩾ 0
Exercise 2.3.11. Show that for any two vectors |ψ⟩, |ϕ⟩ ∈ A
Note that the identity operator I^A : A → A is a positive operator (i.e. an operator with all eigenvalues strictly greater than zero) given by
I^A = Σ_{x∈[m]} |x⟩⟨x| .    (2.51)
where {|ϕx ⟩}x∈[n] is an orthonormal basis, and the eigenvalues {λx }x∈[n] are all real. There-
fore, it is possible to decompose H as
H = H+ − H− (2.54)
where
H_+ := Σ_{x: λ_x ⩾ 0} λ_x |ϕ_x⟩⟨ϕ_x| ⩾ 0   and   H_− := Σ_{x: λ_x < 0} |λ_x| |ϕ_x⟩⟨ϕ_x| ⩾ 0 .    (2.55)
By definition H_+, H_− ⩾ 0 and H_+H_− = H_−H_+ = 0. Further, denote by Π_− := Σ_{x: λ_x < 0} |ϕ_x⟩⟨ϕ_x| the projection to the negative eigenspace of H, and by Π_+ = I − Π_− the projection to the
non-negative eigenspace of H. Then, H± = HΠ± = Π± H.
H_± = (|H| ± H)/2 ,    (2.56)
where H± are the positive and negative parts of H as defined in (2.55).
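The decomposition above is easy to reproduce numerically; the following NumPy sketch (our own illustration) computes H_± from the spectral decomposition of a random Hermitian matrix and verifies (2.56).

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (X + X.conj().T) / 2                         # a random Hermitian matrix

evals, evecs = np.linalg.eigh(H)
H_plus  = evecs @ np.diag(np.maximum(evals, 0))  @ evecs.conj().T
H_minus = evecs @ np.diag(np.maximum(-evals, 0)) @ evecs.conj().T
abs_H   = evecs @ np.diag(np.abs(evals))         @ evecs.conj().T   # |H|

print(np.allclose(H, H_plus - H_minus))          # H = H_+ - H_-
print(np.allclose(H_plus,  (abs_H + H) / 2))     # (2.56) with the + sign
print(np.allclose(H_minus, (abs_H - H) / 2))     # (2.56) with the - sign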
Note that the above pure state decomposition of ρ is not necessarily the spectral decom-
position since the pure states |ψx ⟩ are not necessarily orthogonal. In fact, any quantum
state corresponds to infinitely many ensembles of quantum states. For example, consider a
quantum state ρ : C2 → C2 defined by:
ρ = (1/4)|0⟩⟨0| + (3/4)|1⟩⟨1| .    (2.59)
Clearly, this is the spectral decomposition of ρ. Now, it is simple to check that ρ can also
be expressed as
ρ = (1/2)|u⟩⟨u| + (1/2)|v⟩⟨v| ,    (2.60)
where
|u⟩ := √(1/4)|0⟩ + √(3/4)|1⟩ ,   |v⟩ := √(1/4)|0⟩ − √(3/4)|1⟩ .    (2.61)
Note that |u⟩ and |v⟩ are not orthogonal, and both ensembles in (2.59) and (2.60) correspond to the same quantum state ρ.
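A quick numerical check of this example (illustrative only):

import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
u = np.sqrt(1/4) * ket0 + np.sqrt(3/4) * ket1
v = np.sqrt(1/4) * ket0 - np.sqrt(3/4) * ket1

rho1 = 0.25 * np.outer(ket0, ket0) + 0.75 * np.outer(ket1, ket1)   # (2.59)
rho2 = 0.5 * np.outer(u, u) + 0.5 * np.outer(v, v)                 # (2.60)
print(np.allclose(rho1, rho2))                                     # True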
Exercise 2.3.15. Let {|ψx ⟩, px }x∈[m] and {|ϕy ⟩, qy }y∈[n] be two ensembles of quantum states
in A with m ⩾ n. Show that they correspond to the same density matrix
ρ := Σ_{x∈[m]} p_x |ψ_x⟩⟨ψ_x| = Σ_{y∈[n]} q_y |ϕ_y⟩⟨ϕ_y|    (2.62)
if and only if there exists an m × n isometry matrix V = (vxy ) (i.e. V ∗ V = In ) such that
√p_x |ψ_x⟩ = Σ_{y∈[n]} v_{xy} √q_y |ϕ_y⟩   ∀ x ∈ [m] .    (2.63)
The case p = 1 is often called the trace norm and we will discuss it in detail in Sec. 5.4.1. The case p = ∞ is understood in terms of the limit p → ∞. It is often called the operator norm and is given by
∥M ∥∞ = λmax (|M |) , (2.69)
where λmax (|M |) is the largest eigenvalue of |M |, or equivalently, the largest singular value
of M. The Schatten norms appear quite often in quantum Shannon theory due to their relation to the Rényi entropies that we will study later on. We leave it as an exercise for the reader to prove some of their key properties.
Exercise 2.3.20. Let A and B be two Hilbert spaces, M, N ∈ L(A, B), and p, q ∈ [1, ∞] such that 1/p + 1/q = 1. Show that the p-Schatten norm is indeed a norm satisfying the following properties:
1. Invariance. For any two Hilbert spaces A′ , B ′ with |A′ | ⩾ |A| and |B ′ | ⩾ |B|, and
any isometries V ∈ L(B, B ′ ) and U ∈ L(A, A′ )
∥V M U ∗ ∥p = ∥M ∥p . (2.70)
2. Hölder Inequality.
∥M N ∥1 ⩽ ∥M ∥p ∥N ∥q . (2.71)
3. Sub-Multiplicativity.
∥M N ∥p ⩽ ∥M ∥p ∥N ∥p . (2.72)
4. Monotonicity. If p ⩽ q
∥M ∥1 ⩾ ∥M ∥p ⩾ ∥M ∥q ⩾ ∥M ∥∞ . (2.73)
5. Duality.
∥M∥_p = sup { |Tr[M*L]| : ∥L∥_q = 1 , L ∈ L(A, B) } .    (2.74)
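Numerically, the p-Schatten norm can be computed from the singular values; the following sketch (our own illustration) does this and checks the Hölder inequality (2.71) on a random pair of matrices.

import numpy as np

def schatten(M, p):
    s = np.linalg.svd(M, compute_uv=False)       # singular values of M
    if np.isinf(p):
        return s.max()
    return (s ** p).sum() ** (1.0 / p)

rng = np.random.default_rng(3)
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
N = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
p, q = 3.0, 1.5                                  # 1/p + 1/q = 1
print(schatten(M @ N, 1) <= schatten(M, p) * schatten(N, q) + 1e-12)   # True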
Exercise 2.3.21 (Young’s Inequality). Let A and B be two Hilbert spaces, M, N ∈ Pos(A), and p, q ∈ [1, ∞) such that 1/p + 1/q = 1. Use the Hölder inequality of the Schatten norm to show that
Tr[MN] ⩽ (1/p) Tr[M^p] + (1/q) Tr[N^q] ,    (2.75)
with equality if and only if M^p = N^q. Hint: Take the logarithm on both sides of the Hölder inequality and use the concavity property of the logarithm.
Exercise 2.3.22. Show that for any M ∈ Herm(A), the operator norm and the trace norm
can be expressed as
∥M∥_1 = max_{η∈Herm(A), ∥η∥_∞ ⩽ 1} Tr[ηM]   and   ∥M∥_∞ = max_{η∈Herm(A), ∥η∥_1 ⩽ 1} Tr[ηM] .    (2.76)
∥M∥_{(k)} := s_1 + s_2 + · · · + s_k ,    (2.77)
where s_1 ⩾ s_2 ⩾ · · · ⩾ s_k denote the k largest singular values of M.
Remark. When restricting M to be a real diagonal matrix we get the following definition of the Ky Fan norm on Rn: For any r ∈ Rn and k ∈ [n] the kth Ky Fan norm of r is defined as
∥r∥_{(k)} := Σ_{x∈[k]} |r_x^↓| ,    (2.78)
where {r_x^↓}_{x∈[n]} are the components of r arranged such that |r_1^↓| ⩾ |r_2^↓| ⩾ · · · ⩾ |r_n^↓|.
The Ky Fan norms play an important role in the resource theory of entanglement. We leave it as an exercise to prove that the Ky Fan norms are indeed norms.
Exercise 2.3.23. Show that the Ky Fan norms are indeed norms that have the following
invariance property. Using the same notations as in the definition above, show that for any
two Hilbert spaces A′ , B ′ with |A′ | ⩾ |A| and |B ′ | ⩾ |B|, and any isometries V ∈ L(B, B ′ )
and U ∈ L(A, A′ )
∥V M U ∗ ∥(k) = ∥M ∥(k) . (2.79)
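Numerically, the Ky Fan norms can be computed directly from the singular values; the sketch below (illustrative only) also checks the invariance property (2.79) using randomly generated isometries.

import numpy as np

def ky_fan(M, k):
    s = np.linalg.svd(M, compute_uv=False)
    return np.sort(s)[::-1][:k].sum()

rng = np.random.default_rng(4)
M = rng.normal(size=(3, 4)) + 1j * rng.normal(size=(3, 4))

# Random isometries V: C^3 -> C^5 and U: C^4 -> C^6 (orthonormal columns).
V = np.linalg.qr(rng.normal(size=(5, 3)) + 1j * rng.normal(size=(5, 3)))[0]
U = np.linalg.qr(rng.normal(size=(6, 4)) + 1j * rng.normal(size=(6, 4)))[0]

for k in range(1, 4):
    print(np.isclose(ky_fan(V @ M @ U.conj().T, k), ky_fan(M, k)))   # True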
where the supremum is over all orthogonal projections Π with rank no greater than k.
Exercise 2.3.25 (The Ky Fan norms on Rn ). Consider the variant of the Ky Fan norm as
defined in (13.110).
3. Show that for any two probability vectors p, q ∈ Prob(n) and any k ∈ [n] we have
∥p − q∥_{(k)} ⩽ (1/2) ∥p − q∥_1    (2.82)
and conclude that
∥p∥_{(k)} − ∥q∥_{(k)} ⩽ (1/2) ∥p − q∥_1 .    (2.83)
Hint: For the first inequality use the fact that (1/2)∥p − q∥_1 = Σ_{x∈[n]} (p_x − q_x)_+, and for the second inequality use the properties of a norm. For any r ∈ R the symbol (r)_+ := r if r ⩾ 0 and otherwise (r)_+ := 0.
where μ_{xy} ∈ C. Let M_ψ be a linear map from B̃ to A defined below by its action on the basis elements {|y⟩^B̃} of B̃:
M_ψ |y⟩^B̃ := Σ_{x∈[m]} μ_{xy} |x⟩^A .    (2.85)
Denoting by
|Ω^{B̃B}⟩ := Σ_{y∈[n]} |yy⟩^{B̃B}    (2.87)
we conclude that
|ψ^{AB}⟩ = (M_ψ ⊗ I^B) |Ω^{B̃B}⟩ .    (2.88)
Therefore, for any bipartite vector |ψ^{AB}⟩ there is a corresponding linear map M_ψ : B̃ → A and vice versa. In other words, the mapping
|ψ^{AB}⟩ ↦ M_ψ    (2.89)
is an isometric isomorphism between the space AB and the space Cm×n.
|ΩB̃B ⟩ has many interesting properties, and later on we will see that, physically, its normalized
version corresponds to a composite system of two maximally entangled subsystems.
Exercise 2.3.26. Prove the following properties of |ΩAÃ ⟩:
1. For any matrix N ∈ L(Ã)
Mψ = LMφ RT . (2.93)
Exercise 2.3.27. Show that the two reduced density matrices above, ρ_ψ^A and ρ_ψ^B, have the same non-zero eigenvalues.
The Partial Trace
Exercise 2.3.28. Let TrB : L(AB) → L(A) be a linear map defined by its action on
the basis elements of L(AB) as:
It is well known that the trace remains invariant under cyclic permutations of a product of matrices. The following exercise states that this remains true also for the partial trace.
Exercise 2.3.29. Show that for any ρ ∈ L(AB) and any two matrices η, ζ ∈ L(B)
Tr_B[(I^A ⊗ η^B) ρ^{AB} (I^A ⊗ ζ^B)] = Tr_B[(I^A ⊗ ζ^B η^B) ρ^{AB}] .    (2.97)
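Although the general definition of the partial trace is developed in the exercises above, it may help to see one concrete numerical realization; the following NumPy sketch (our own index-bookkeeping convention) implements Tr_B by reshaping and verifies the cyclicity property (2.97).

import numpy as np

def partial_trace_B(rho, dA, dB):
    # rho acts on A ⊗ B; returns Tr_B[rho] acting on A
    return np.trace(rho.reshape(dA, dB, dA, dB), axis1=1, axis2=3)

rng = np.random.default_rng(5)
dA, dB = 2, 3
X = rng.normal(size=(dA * dB, dA * dB)) + 1j * rng.normal(size=(dA * dB, dA * dB))
rho = X @ X.conj().T
rho = rho / np.trace(rho)                       # a random density matrix on AB

eta  = rng.normal(size=(dB, dB)) + 1j * rng.normal(size=(dB, dB))
zeta = rng.normal(size=(dB, dB)) + 1j * rng.normal(size=(dB, dB))
IA = np.eye(dA)

lhs = partial_trace_B(np.kron(IA, eta) @ rho @ np.kron(IA, zeta), dA, dB)
rhs = partial_trace_B(np.kron(IA, zeta @ eta) @ rho, dA, dB)
print(np.allclose(lhs, rhs))                    # (2.97) holds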
For any pure bipartite state as in (2.88) there is a unique reduced density matrix ρ_ψ^A. On the other hand, for any density matrix ρ ∈ D(A) there are many bipartite pure states |ψ^{AB}⟩ with the same reduced density matrix ρ (see the exercise below).
Exercise 2.3.30. Let ρ ∈ L(AB). Show that if for all η ∈ L(B) the matrix
Tr_B[(I^A ⊗ η^B) ρ^{AB}]    (2.98)
is proportional to the identity matrix, then ρ^{AB} = u^A ⊗ ρ^B, where u^A := (1/|A|) I^A is the uniform density matrix, also known as the maximally mixed state. Hint: Use Part 3 of Exercise 2.3.18.
Exercise 2.3.31. Let A, B, A′, B′ be four Hilbert spaces and let Λ ∈ L(AB, A′B′). Show that if
Tr_B[(I^{A′} ⊗ T) Λ] = 0   ∀ T ∈ L(B′, B)    (2.99)
then Λ = 0. Observe that the operator (I^{A′} ⊗ T)Λ belongs to L(AB, A′B), so the partial trace above over B is well defined. Hint: Let {N_x} be an orthonormal basis (w.r.t. the Hilbert-Schmidt inner product) of L(B, B′) and write Λ = Σ_x M_x ⊗ N_x, where {M_x} are some matrices in L(A, A′). Then show that by taking T above to be N_y^* you get M_y = 0.
3. Use Part 2 to provide an alternative (simpler!) proof of the claim in Exercise 2.3.15.
Exercise 2.3.33. Operator Schmidt Decomposition: Let A and B be two Hilbert spaces of dimensions m := |A| and n := |B| and denote by k := min{m², n²}. Show that for every ρ ∈ Herm(AB) there exist k non-negative real numbers {λ_z}_{z∈[k]}, and two orthonormal sets of Hermitian matrices (w.r.t. the Hilbert-Schmidt inner product) {η_z}_{z∈[k]} ⊂ Herm(A) and {ζ_z}_{z∈[k]} ⊂ Herm(B), such that
ρ^{AB} = Σ_{z∈[k]} λ_z η_z^A ⊗ ζ_z^B .    (2.101)
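For concreteness, the following NumPy sketch (our own illustration) obtains an operator Schmidt expansion of a random Hermitian ρ^{AB} from an SVD after regrouping indices; the factors produced this way need not be Hermitian, so it only verifies the expansion itself, not the additional Hermiticity claim of the exercise.

import numpy as np

rng = np.random.default_rng(6)
dA, dB = 2, 3
X = rng.normal(size=(dA * dB, dA * dB)) + 1j * rng.normal(size=(dA * dB, dA * dB))
rho = (X + X.conj().T) / 2                      # a random Hermitian operator on AB

# Regroup rho[(a,b),(a',b')] into R[(a,a'),(b,b')] and take an SVD.
R = rho.reshape(dA, dB, dA, dB).transpose(0, 2, 1, 3).reshape(dA * dA, dB * dB)
U, s, Vh = np.linalg.svd(R)

rebuilt = np.zeros_like(rho)
for z in range(len(s)):
    eta  = U[:, z].reshape(dA, dA)              # factor on A (not necessarily Hermitian)
    zeta = Vh[z, :].reshape(dB, dB)             # factor on B
    rebuilt += s[z] * np.kron(eta, zeta)
print(np.allclose(rebuilt, rho))                # True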
the previous arguments, we conclude that the state T_{θ_2}^{(n_2)} T_{θ_1}^{(n_1)} |ψ⟩ is the quantum state that corresponds to the spin in the direction R_{θ_2}^{(n_2)} R_{θ_1}^{(n_1)} m. Combining everything, we conclude that the mapping
R_θ^{(n)} ↦ T_θ^{(n)}   where   T_θ^{(n)}(ρ) := T_θ^{(n)} ρ (T_θ^{(n)})^*   ∀ ρ ∈ L(A) ,    (2.102)
is a group representation of SO(3) on the Hilbert space L(A → A) (i.e. the space of linear operators from L(A) to L(A)).
It will be more convenient to work with a unitary representation on the space L(A) itself rather than the Hilbert space L(A → A). For this purpose we need to eliminate the freedom in the choice of the phase so that R_θ^{(n)} is mapped to a unique T_θ^{(n)}. We therefore assume without loss of generality that det T_θ^{(n)} = 1, so that T_θ^{(n)} ∈ SU(2). This almost completely eliminates the ambiguity in the phase, although note that if T_θ^{(n)} ∈ SU(2) then also −T_θ^{(n)} ∈ SU(2). This would mean that both ±T_θ^{(n)} correspond to the same R_θ^{(n)}. To summarize, up to a sign factor, the collection of matrices {T_θ^{(n)}}_{n,θ} forms a group representation of SO(3). Such a 2 : 1 and onto homomorphism h : SU(2) → SO(3) with the property that h(T) = h(−T) for any T ∈ SU(2) was found by Cornwell in 1984 (see Exercise C.1.3). We now discuss the explicit form of T_θ^{(n)}.
In Appendix C we show that the most general unitary matrix in SU(2) has the form e^{−i(θ/2)(n·σ)} (see (C.15)), where the factor 1/2 implies that under a 2π addition to θ we get e^{−i((θ+2π)/2)(n·σ)} = −e^{−i(θ/2)(n·σ)}. This property will be consistent with the identification T_θ^{(n)} = e^{−i(θ/2)(n·σ)} which we motivate below (recall that T_θ^{(n)} ∈ SU(2)), since a rotation by θ or by θ + 2π along any axis n should have the same effect on any qubit state; that is,
e^{−i((θ+2π)/2)(n·σ)} |ψ⟩⟨ψ| e^{i((θ+2π)/2)(n·σ)} = e^{−i(θ/2)(n·σ)} |ψ⟩⟨ψ| e^{i(θ/2)(n·σ)} .    (2.103)
On the other hand, if we did not include the factor 1/2, then the identification T_θ^{(n)} = e^{−iθ(n·σ)} would imply the undesired property that a rotation by θ or by θ + π (along any fixed axis n) would have the same effect on a qubit |ψ⟩⟨ψ|.
To justify the identification T_θ^{(n)} = e^{−i(θ/2)(n·σ)}, recall that any rotation around the z-axis should not change the state |0⟩⟨0|, as it represents spin in the z-direction. Taking n = z we get
e^{−i(θ/2)(z·σ)} |0⟩ = (cos(θ/2) I − i sin(θ/2) σ_3) |0⟩ = e^{−iθ/2} |0⟩ ,    (2.104)
where we used (C.14). Recall that the vector e^{−iθ/2}|0⟩ corresponds to the same quantum state |0⟩⟨0|. Therefore, although there are many possible representations for SO(3) in C² (such as e^{i0.7θ(n·σ)} for example), the representation T_θ^{(n)} = e^{−i(θ/2)(n·σ)} is the only one that has the following essential properties:
1. The mapping T_θ^{(n)} ↦ R_θ^{(n)} is an onto homomorphism between SU(2) and SO(3).
2. For any |ψ⟩ ∈ C² we have T_{θ+2π}^{(n)} |ψ⟩⟨ψ| (T_{θ+2π}^{(n)})^* = T_θ^{(n)} |ψ⟩⟨ψ| (T_θ^{(n)})^* .
With this representation at hand, we are ready to identify spins in different directions.
We start with a few examples. The spin in the negative z-direction can be obtained by
rotating |0⟩ by 180◦ along the x (or y) axis. It is therefore given by
T_π^{(x)} |0⟩ = (cos(π/2) I − i sin(π/2) x·σ) |0⟩ = −i|1⟩ .    (2.105)
Therefore, the quantum state |1⟩⟨1| corresponds to the negative z-direction. Recall from
the previous section that using the SG experiment one can determine with certainty if an
electron was prepared in the positive z-direction or negative z-direction. This ability to
distinguish between the two possible spins of the electron is reflected mathematically by the
orthogonality of the vectors |0⟩ and |1⟩. This is a general property of quantum mechanics
that any two distinguishable states of a physical system are described mathematically by
orthogonal vectors. Other examples include:
In general, rotations along the n-axis do not change a spin that points in the positive or
negative n-direction. We can use this physical property to compute the qubit representing a
spin in the n-direction. Specifically, a quantum state |ψ⟩ represents an electron with spin in the positive or negative n-direction if and only if T_θ^{(n)}|ψ⟩ = e^{iα}|ψ⟩ for some phase e^{iα}. Now, since T_θ^{(n)} = cos(θ/2)I − i sin(θ/2) n·σ, we get that |ψ⟩ is an eigenvector of the matrix T_θ^{(n)} if and only if it is an eigenvector of the spin matrix S_n := (1/2) n·σ.
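The following short NumPy sketch (illustrative only) constructs S_n for a direction given by spherical angles and confirms that its eigenvalues are ±1/2, with the +1/2 eigenvector playing the role of |↑_n⟩ in the Bloch picture.

import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

alpha, beta = 0.7, 1.2                                   # arbitrary spherical angles
n = np.array([np.sin(alpha) * np.cos(beta),
              np.sin(alpha) * np.sin(beta),
              np.cos(alpha)])
Sn = 0.5 * (n[0] * sx + n[1] * sy + n[2] * sz)

evals, evecs = np.linalg.eigh(Sn)
print(np.allclose(np.sort(evals), [-0.5, 0.5]))          # eigenvalues are ±1/2
up_n = evecs[:, np.argmax(evals)]                        # eigenvector for +1/2, i.e. |↑_n⟩
print(np.allclose(Sn @ up_n, 0.5 * up_n))                # True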
Exercise 2.4.1. Let Sn be the spin matrix in direction n = (sin(α) cos(β), sin(α) sin(β), cos(α))T ,
with α and β being its spherical coordinates.
From the exercise above it follows that any qubit is characterized as in (2.110) and
corresponds to spin in the positive direction of n = (sin(α) cos(β), sin(α) sin(β), cos(α))T .
This correspondence between the point on the sphere and a qubit is known in the community
as the Bloch representation of a qubit. In Fig. 2.5 we show some of the popular qubit states
and their location on the Bloch sphere.
Note that although we focused here on the spin of an electron, the qubit corresponds to any two-level quantum system. For example, one can implement a qubit with a photon, using, say, |0⟩ to correspond to positive (or left) circular polarization, and |1⟩ to correspond to negative (or right) circular polarization. Any linear combination of |0⟩ and |1⟩ will then correspond to different types of polarizations. Other examples are atoms, molecules, and nuclei with two energy levels (excited state vs ground state). All these examples demonstrate that the qubit can be implemented in many different ways and in this sense, we can claim that quantum information is fungible!
the “spin matrix” Sn whose eigenvectors are the basis elements, and its eigenvalues give the
spins (i.e. ±1/2) associated with the two basis elements.
In a similar way, any orthonormal basis of A ≅ C^d corresponds to d possible outcomes that can be, at least in principle, observed in some experiment. Moreover, the second postulate of quantum mechanics states that any observable (a dynamic variable that can be measured, like position, momentum, spin, energy, etc.) is represented by a Hermitian
operator whose eigenvalues correspond to the values of the observable. Recall that for any
Hermitian operator, H, there exists an orthonormal basis of A consisting only of eigenvectors
of H. This basis corresponds to the possible outcomes in the measurement of the observable
H.
For example, in the qubit case, the spin matrix S_n is an observable corresponding to the measurement of spin with the SG experiment. In physical systems of d energy levels, the Hamiltonian H = Σ_{x∈[d]} E_x |x⟩⟨x| is an observable corresponding to the measurement of energy. This particular observable states that the values for the energy of the system (i.e. E_x) are discrete, and that the eigenvectors {|x⟩}_{x∈[d]} correspond to these energy levels.
corresponds to three spin particles (e.g. electrons) with spin A pointing in the positive z-
direction, spin B pointing in the negative x-direction, and spin C pointing in the positive
y-direction. Of course, not all states have the same tensor product form as the state above.
For example, the Greenberger-Horne-Zeilinger (GHZ) state of three qubits
|GHZ⟩ := (1/√2) (|0⟩^A|0⟩^B|0⟩^C + |1⟩^A|1⟩^B|1⟩^C)    (2.112)
cannot be written as a tensor product of three vectors. States like this will be called entan-
gled.
Exercise 2.4.2. Show that the GHZ state above cannot be written as a tensor product of
three vectors; i.e.
|GHZ⟩ ≠ |ψ A ⟩|ϕB ⟩|χC ⟩ (2.113)
for any three qubit states |ψ A ⟩, |ϕB ⟩, and |χC ⟩.
Exercise 2.4.3. Show that for any unit vector n ∈ R³ the singlet state |Ψ_−^{AB}⟩ := (|01⟩ − |10⟩)/√2 can be expressed as
|Ψ_−^{AB}⟩ = (1/√2) (|↑_n⟩|↓_n⟩ − |↓_n⟩|↑_n⟩) ,    (2.114)
where |↑_n⟩ and |↓_n⟩ are the eigenvectors of the spin matrix S_n. In other words, for any 2 × 2 unitary matrix U we have U ⊗ U |Ψ_−⟩⟨Ψ_−| U^* ⊗ U^* = |Ψ_−⟩⟨Ψ_−|.
3. Show that [J 2 , Jz ] = 0.
4. Show that each of the following four 2-qubit states are eigenvectors of both J 2 and Jz :
|00⟩ ,  |11⟩ ,  and  |Ψ_±⟩ := (1/√2)(|01⟩ ± |10⟩)    (2.115)
1. Show that the commutator [Sn , Sm ] = iSr , where r ∈ R3 is a unit vector. What is the
direction of r?
2. Calculate
⟨Ψ_−| S_n ⊗ S_m |Ψ_−⟩   where   |Ψ_−⟩ = (1/√2)(|0⟩ ⊗ |1⟩ − |1⟩ ⊗ |0⟩) .    (2.116)
Show that
B² = (1/4) I − [S_n, S_{n′}] ⊗ [S_m, S_{m′}] ,    (2.118)
and use it to prove that
|⟨ψ|B|ψ⟩| ⩽ 1/√2 ,    (2.119)
for any state |ψ⟩ ∈ C² ⊗ C².
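A numerical illustration of the bound (2.119): since Exercise 2.4.5 is not reproduced here, we assume the standard CHSH combination B = S_n⊗S_m + S_n⊗S_{m′} + S_{n′}⊗S_m − S_{n′}⊗S_{m′}, which is consistent with (2.118); for the usual choice of directions the largest eigenvalue of B is indeed 1/√2.

import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def S(direction):                  # spin matrix (1/2) n·σ for a direction in the x-z plane
    return 0.5 * (direction[0] * sx + direction[1] * sz)

n, n2 = np.array([0, 1.0]), np.array([1.0, 0])                    # z and x
m, m2 = np.array([1, 1.0]) / np.sqrt(2), np.array([-1.0, 1]) / np.sqrt(2)

B = (np.kron(S(n), S(m)) + np.kron(S(n), S(m2))
     + np.kron(S(n2), S(m)) - np.kron(S(n2), S(m2)))
print(np.abs(np.linalg.eigvalsh(B)).max())     # ≈ 0.7071 = 1/√2 (Tsirelson bound)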
“Once upon a time there was a centipede that was amazingly good at dancing
with all hundred legs. All the creatures of the forest gathered to watch every
time the centipede danced, and they were all duly impressed by the exquisite
dance. But there was one creature that didn’t like watching the centipede dance
- that was a tortoise.
How can I get the centipede to stop dancing? thought the tortoise. He couldn’t
just say he didn’t like the dance. Neither could he say he danced better himself,
that would obviously be untrue. So he devised a fiendish plan.
He sat down and wrote a letter to the centipede. ‘O incomparable centipede,’
he wrote, ‘I am a devoted admirer of your exquisite dancing. I must know how
you go about it when you dance. Is it that you lift your left leg number 28 and
then your right leg number 39? Or do you begin by lifting your right leg number
17 before you lift your left leg number 44? I await your answer in breathless
anticipation. Yours truly, Tortoise.’ ”
1. If the SG experiment yields an outcome in the upward direction along n, the state
evolves to | ↑n ⟩.
2. If the SG experiment yields an outcome in the downward direction along n, the state
evolves to | ↓n ⟩.
It’s crucial to emphasize that this transformation is independent of the specific form of |ψ⟩.
What does vary with |ψ⟩ is the probability associated with each possible outcome. For
instance, consider the case where n = z and |ψ⟩ is initially prepared as | ↑x ⟩. In this case,
both the upward and downward outcomes are equally probable, each with a 50% chance.
The general rule governing the probability of obtaining a particular outcome in the mea-
surement is known as Born’s rule. According to Born’s rule, the probability, denoted as
Pr(ψ, n), of observing the outcome ↑n (i.e., the electron’s spin aligned with the positive
direction of n) in an SG experiment along the n direction, when the electron is initially
prepared in the state |ψ⟩, is given by:
This fundamental principle provides a mathematical framework for determining the likeli-
hood of various outcomes in quantum measurements, and it plays a central role in quan-
tum mechanics. For example, suppose an electron in the state |ψ⟩ = a|0⟩ + b|1⟩ is sent
through an SG experiment in the z-direction. Then, using Born’s rule (2.120) we get that |⟨ψ|↑_z⟩|² = |a|² is the probability to obtain spin up (in the z-direction), and |⟨ψ|↓_z⟩|² = |b|² is the probability to obtain spin down.
Similarly, we can extend Born’s rule to any qudit |ψ⟩ ∈ A ≅ C^m, and any quantum measurement that corresponds to an orthonormal basis {|ϕ_x⟩}_{x∈[m]} of A. The probability to obtain an outcome x is given by
Note that the above assignment of probability to each ϕx is indeed a probability; that is,
Σ_{x∈[m]} Pr(ψ, ϕ_x) = Σ_{x∈[m]} |⟨ψ|ϕ_x⟩|² = Σ_{x∈[m]} ⟨ψ|ϕ_x⟩⟨ϕ_x|ψ⟩ = ⟨ψ| (Σ_{x∈[m]} |ϕ_x⟩⟨ϕ_x|) |ψ⟩ = ⟨ψ|ψ⟩ = 1 .    (2.122)
We call every such measurement that corresponds to an orthonormal basis a basis measurement.
To establish a connection between a basis measurement and a physical observable, let’s
consider the energy of a physical system. Energy is a fundamental observable in quantum
mechanics and therefore can be measured. As previously discussed, any observable in quan-
tum mechanics is represented by a Hermitian operator acting on the Hilbert space A. We denote the energy operator, often referred to as the Hamiltonian, as:
H = Σ_{x∈[m]} a_x |ϕ_x⟩⟨ϕ_x| ,    (2.123)
where {|ϕx ⟩}x∈[m] is an orthonormal basis of A. Therefore, in order to measure the energy,
one has to perform a basis measurement corresponding to the orthonormal basis {|ϕx ⟩}x∈[m] ,
since the energy a_x is determined by the value of x. However, this system of m energy levels can be degenerate, as happens quite often in many physical systems. In this case, not all of the energy values {a_x}_{x∈[m]} are distinct. Suppose for example that a_1 = a_2 < a_3 < · · · < a_m; i.e. the state with minimum energy (the ground state) is degenerate. In this case, both outcomes 1 and 2 correspond to the same ground state, so that the probability that the energy equals a_1 = a_2 := b_1 is given by
Pr(ψ, ϕ_1) + Pr(ψ, ϕ_2) = ⟨ψ| (|ϕ_1⟩⟨ϕ_1| + |ϕ_2⟩⟨ϕ_2|) |ψ⟩ := ⟨ψ|Π|ψ⟩ ,    (2.124)
where Π := |ϕ1 ⟩⟨ϕ1 | + |ϕ2 ⟩⟨ϕ2 |. More generally, if we have degeneracy in other energy levels,
we can always express the observable H as
H = Σ_{y∈[r]} b_y Π_y ,    (2.125)
where b1 < b2 < · · · < br , and each Πy is a sum of rank one projections from {|ϕx ⟩⟨ϕx |} that
correspond to the same energy level by . With this at hand, the probability to measure an
energy of value by is given by
Pr (ψ, Πy ) = ⟨ψ|Πy |ψ⟩ . (2.126)
Therefore, the basis measurement that we considered so far can be extended to a projective von-Neumann measurement, which is defined as follows.
Πx Πy = δxy Πx . (2.127)
Historically, Born’s rule above (see (2.126)) was determined essentially from consistency with experiments. That is, one can perform many experiments, like the SG experiment for example, collect the data, and find a rule that is consistent with the data. Later on, however, Gleason came up with a theorem showing how to calculate probabilities in quantum mechanics, and, loosely speaking, derived Born’s rule above from a few fundamental principles involving measures on a Hilbert space. Gleason’s theorem is applicable to general (separable) Hilbert spaces in any dimension, but for us, only the finite dimensional case, i.e. the qudit, will be relevant. We postpone the discussion of Gleason’s theorem to the next chapter, after we discuss other types of quantum measurements, in order to prove a slightly more general version of Gleason’s theorem that will be applicable to all types of measurements (not only to projective von-Neumann measurements).
Exercise 2.5.1. Let Π be a projection on a Hilbert space A. Show that {Π, I − Π} is a
two-outcome von-Neumann projective measurement.
Exercise 2.5.2. Let {Πx }x∈[r] be a projective von-Neumann measurement on a finite dimen-
sional Hilbert space A. Show that the collection of all the linearly independent normalized
eigenvectors, of all the projections {Πx }x∈[r] , form an orthonormal basis of A.
Exercise 2.5.4. Let A be a d-dimensional Hilbert space, and let |ψ⟩, |ϕ⟩ ∈ A be two quantum
states.
1. Show that if |ψ⟩ and |ϕ⟩ are orthogonal, then there exists a projective measurement
that distinguishes them. That is, there exists a two-outcome projective measurement
{Π0 , Π1 } such that
Pr(ψ, Π0 ) = 1 and Pr(ϕ, Π1 ) = 1. (2.130)
2. Show that if |ψ⟩ and |ϕ⟩ are not orthogonal, then there is no projective measurement
that distinguishes them.
For any such hidden variable model, there is an inherent assumption that the values of the hidden variables are fixed, predetermined, and correspond to elements of reality. It is just the observer’s lack of knowledge about these elements of reality that leads to statistical behaviour. Historically, hidden variable theories were promoted by some physicists who argued that the formulation of quantum mechanics (as we will discuss in the rest of this book) does not provide a complete description of the system. Along with Albert Einstein, they argued that quantum mechanics is ultimately incomplete, and that a complete theory would avoid any indeterminism. Indeed, hidden variable models such as the one described above for the spin of one electron cannot be ruled out, although, as we discuss now, local hidden variable models can!
Consider two friends, Alice and Bob, who are located far from each other, and each of whom possesses an electron in their lab. How can we describe the spins of the two electrons? Following the same line of thought as above, we denote by A_n the random variable associated with the spin of Alice’s electron in the n-direction, and by B_m the random variable associated with the spin of Bob’s electron in the m-direction. We denote by p(ab|nm), with a, b ∈ {0, 1}, the joint probability that the two SG experiments in Alice’s lab and Bob’s lab will yield respectively A_n = a and B_m = b.
Since it is possible that the spins are correlated in some way, we are not assuming that
p(ab|nm) has the form pA (a|n)pB (b|m), where pA (a|n) is the probability that Alice will get
the value a in a SG experiment in the n-direction (and pB (b|m) is defined similarly). Instead,
since in general An and Bm can be correlated, there exists a parameter λ (λ can describe a
collection of variables) and a probability distribution qλ over it, such that
p(ab|nm) = ∫ dλ q_λ p_λ^A(a|n) p_λ^B(b|m) ,    (2.131)
where p_λ^A(a|n) and p_λ^B(b|m) are probability distributions that depend on the correlating
parameter λ. The parameter λ can be either continuous or discrete, and for the latter the
integral above is replaced with a sum. Note that the distribution above is more general than
the form pA (a|n)pB (b|m) as it allows for correlations between Alice’s and Bob’s spins. Yet,
it is a local probability distribution depending only on the local variables An and Bm . We
now discuss a crucial consequence of this local hidden variable model for the spin of two
electrons.
Exercise 2.6.1. Let n, n′ , m, and m′ , be four unit vectors in R3 , and use the tilde symbol
over a random variable X to mean X̃ = 2X − 1 (i.e. X̃ takes values ±1 while X takes values
0, 1). Show that
Ãn B̃m + Ãn′ B̃m + Ãn B̃m′ − Ãn′ B̃m′ ⩽ 2 . (2.132)
The inequality in the exercise above is called the CHSH inequality after Clauser, Horne,
Shimony and Holt, and it generalizes a similar inequality that was proved in a seminal paper
by John Bell from 1964. As we will see in the exercise below, not all probability distributions
p(ab|nm) satisfy this inequality. One obvious property of the local distribution (2.131) is that if we sum over a the dependence on n disappears, and similarly if we sum over b the dependence on m disappears. This property is called “no-signalling” since by choosing different directions of n, Alice cannot signal to Bob, as the marginal distribution on his side remains intact. The no-signalling property can be stated as follows:
Σ_a p(ab|nm) = Σ_a p(ab|n′m) := p^B(b|m)   ∀ b, n, n′, m
Σ_b p(ab|nm) = Σ_b p(ab|nm′) := p^A(a|n)   ∀ a, n, m, m′ .    (2.135)
The following exercise shows that there exists a probability distribution that on one hand,
is non-signalling, and on the other hand, is violating the CHSH inequality (2.134).
Exercise 2.6.3. Denote the two directions on Alice’s side by n_0 := n and n_1 := n′, and the two direction vectors on Bob’s side by m_0 := m and m_1 := m′. Denote also p(ab|xy) := p(ab|n_x m_y) with x, y ∈ {0, 1}. Consider the probability distribution given by
p(ab|xy) = 1/2 if a ⊕ b = xy, and p(ab|xy) = 0 otherwise .    (2.136)
2. Show that p(ab|xy) is non-local by showing that it violates the CHSH inequality (2.134).
3. Show that no other probability distribution (even a signalling distribution) can provide
a higher violation than the one achieved by the distribution (2.136).
To summarize, any local hidden variable model has two main assumptions. The first one is called the realism assumption, corresponding to our assumption that the spins of the electrons in all directions have definite values which exist independently of observation. The second assumption is called the locality assumption, corresponding to our implicit assumption that if, say, Alice performs a measurement on her electron, it does not influence the result of Bob’s measurement (on the spin of the electron in his lab). The following violation of the CHSH inequality demonstrates that local realism does not hold!
(with b = 0, 1) are the eigenvectors of the spin matrix S_m. The corresponding eigenvalues are given by 1/2 − a for |ϕ_a^A⟩ and by 1/2 − b for |φ_b^B⟩. Note the relation with the previous notations; for example, |ϕ_0^A⟩ = |↑_n⟩ and |ϕ_1^A⟩ = |↓_n⟩.
Recall that Ã_n and B̃_m in (2.134) are random variables taking the values ±1, whereas the eigenvalues of S_n and S_m are ±1/2. Keeping this in mind, from the exercise above we conclude that the probability distribution p_ψ(ab|nm) as given in (2.137) violates the CHSH inequality (2.134) if
|⟨ψ|B|ψ⟩| > 1/2 ,    (2.139)
where B is the Bell/CHSH operator from Exercise 2.4.5. From Exercise 2.4.5 it follows that for any state |ψ⟩ ∈ C² ⊗ C², we have |⟨ψ|B|ψ⟩| ⩽ 1/√2. From the next exercise it follows that there exist directions n, m, n′, m′ such that this bound is saturated, thereby violating the CHSH inequality since 1/√2 > 1/2. This bound is called the Tsirelson bound.
|⟨Ψ_−|B|Ψ_−⟩| = 1/√2 .    (2.140)
Note that the violation of the CHSH inequality implies that the quantum probability
distribution pψ (ab|nm) is in general not local; i.e. not of the form (2.131). Such non-local
probability distributions have other non-intuitive consequences as we discuss below.
p_same(n_1, n_2) := |⟨Ψ_−^{AB}| ↑_{n_1}^A ↓_{n_2}^B⟩|² + |⟨Ψ_−^{AB}| ↓_{n_1}^A ↑_{n_2}^B⟩|² = (1/2)(1 + cos(θ)) ,    (2.141)
where θ is the angle between the unit vectors n_1 and n_2.
Consider now three unit vectors n_1, n_2, n_3 ∈ R³ with an angle of 120° between any two; see Fig. 2.7. From the exercise above we get that p_same(n_1, n_2) = p_same(n_1, n_3) = p_same(n_2, n_3) = 1/4 since cos(120°) = −1/2. Therefore,
p_same(n_1, n_2) + p_same(n_1, n_3) + p_same(n_2, n_3) = 3/4 < 1 .    (2.142)
On the other hand, suppose it was possible to describe Alice’s electron n1 -spin, n2 -spin,
and n3 -spin, with three random variables X1 , X2 , and X3 (with some underlying probability
distribution over the three variables). Each of the three random variables can take the values
±1/2, determining whether the spin is pointing in the positive or negative direction. Then, irrespective
of the underlying probability distribution, the probabilities Pr(Xj = Xk ) (with j ̸= k and
j, k ∈ {1, 2, 3}) must satisfy
Pr(X1 = X2 ) + Pr(X1 = X3 ) + Pr(X2 = X3 ) ⩾ 1 . (2.143)
This problem is analogous to the problem of flipping 3 coins and asking what is the probability
that at least two of them are the same (either two heads or two tails). Clearly, flipping three
coins will always yield two that show the same symbol (either head or tail). Eq. (2.142)
shows that this is not the case for quantum coins (i.e. spins of an electron).
Figure 2.7: Three directions with an angle of 120◦ between any two.
So far we have seen a contradiction between quantum mechanics and local realism through
the violation of the CHSH inequality (2.134), and the inequality in (2.142). The next two
paradoxes show that this inconsistency between quantum mechanics and local realism can
be expressed without inequalities. In the literature they are referred to as “Bell non-locality
without inequalities”.
Exercise 2.6.7. Show that if p(ab|xy) is local and satisfies (2.145) then p(00|11) = 0 .
We now show that the logical implication of the exercise above does not hold for quantum
mechanics. Unlike the use of the singlet in the previous examples, here we consider a bipartite
state |ψθAB ⟩ that has the form:
|ψ_θ^{AB}⟩ = [tan(θ)/√(1 + 2 tan²(θ))] (|01⟩ + |10⟩) − [1/√(1 + 2 tan²(θ))] |11⟩ ,    (2.146)
with θ ∈ [0, 2π] being some angle. Note that the state above is normalized for all θ.
Suppose that Alice and Bob perform the same measurements, and in particular n0 =
m0 = z corresponds to a measurement in the computational basis, while n1 = m1 cor-
responds to a measurement in the orthonormal basis |u0 ⟩ := cos(θ)|0⟩ + sin(θ)|1⟩ and
|u1 ⟩ := sin(θ)|0⟩ − cos(θ)|1⟩.
Exercise 2.6.8. Verify that the above choices satisfy:
p_ψ(00|00) = |⟨ψ^{AB}|0⟩|0⟩|² = 0
p_ψ(01|10) = |⟨ψ^{AB}|u_0⟩|1⟩|² = 0    (2.147)
p_ψ(10|01) = |⟨ψ^{AB}|1⟩|u_0⟩|² = 0
while
p_Hardy(θ) := p_ψ(00|11) = |⟨ψ^{AB}|u_0⟩|u_0⟩|² = sin⁴(θ)/(1 + 2 tan²(θ)) .    (2.148)
We therefore see that for this example p_Hardy(θ) > 0 for all 0 < θ < π/2. Interestingly, p_Hardy > 0 for all non-product states in {|ψ_θ^{AB}⟩}_θ except for the maximally entangled state |ψ_{θ=π/2}^{AB}⟩ = (|01⟩ + |10⟩)/√2, for which p_Hardy(π/2) = 0. The maximum value of the function p_Hardy(θ) can easily be computed to give
max_{θ∈[0,2π]} p_Hardy(θ) = (1/2)(5√5 − 11) ≈ 0.09 .    (2.149)
Exercise 2.6.9. Show that the GHZ state as defined in (2.150) can be expressed in the yyx-basis as
|GHZ⟩ = (1/2) [ (|↑_y^A ↑_y^B⟩ + |↓_y^A ↓_y^B⟩) ⊗ |↓_x^C⟩ + (|↑_y^A ↓_y^B⟩ + |↓_y^A ↑_y^B⟩) ⊗ |↑_x^C⟩ ]    (2.151)
and in the xxx-basis as
|GHZ⟩ = (1/2) [ (|↑_x^A ↑_x^B⟩ + |↓_x^A ↓_x^B⟩) ⊗ |↑_x^C⟩ + (|↑_x^A ↓_x^B⟩ + |↓_x^A ↑_x^B⟩) ⊗ |↓_x^C⟩ ] .    (2.152)
Denote by Ax (and similarly Ay ) the random variables that take the value +1 if the spin
of the first electron in the x-direction is positive, and take the value −1 if it is in the negative
x-direction. The random variables Bx , By , Cx , and Cy , are defined similarly.
Now, according to (2.152), if Alice, Bob, and Charlie perform the xxx-measurement, the
results of their measurements, given by Ax , Bx , and Cx , must satisfy
Ax Bx Cx = 1 . (2.153)
A_y B_x C_y = −1   and   A_x B_y C_y = −1 ,    (2.155)
where we used A_x² = B_y² = C_y² = 1, since these variables can only take the two values ±1.
To summarize, according to quantum mechanics, an xxx-measurement can only yield one of
the four possible outcomes:
|↑_x^A ↑_x^B ↑_x^C⟩ ,  |↓_x^A ↓_x^B ↑_x^C⟩ ,  |↑_x^A ↓_x^B ↓_x^C⟩ ,  |↓_x^A ↑_x^B ↓_x^C⟩ .    (2.157)
On the other hand, local realism predicts that an xxx-measurement yields the four possible outcomes:
|↑_x^A ↑_x^B ↓_x^C⟩ ,  |↓_x^A ↓_x^B ↓_x^C⟩ ,  |↑_x^A ↓_x^B ↑_x^C⟩ ,  |↓_x^A ↑_x^B ↑_x^C⟩ ,    (2.158)
in maximal contrast with quantum mechanics. One may argue that we used quantum me-
chanics to express the GHZ state in the form (2.151), but this does not affect the conclusion
that local realism cannot co-exist with the quantum mechanical formalism.
addition modulo 2. The following table summarizes the desired value of a ⊕ b for each of the values of x and y:
x y a⊕b
0 0 0
0 1 0
1 0 0
1 1 1
Clearly, from the table above it is obvious that if Alice and Bob always choose a = b = 0 (no matter what the values of x and y are) then they will win the game 3/4 of the time. Can they do better?
Exercise 2.6.10. Show that Alice and Bob cannot win more than 3/4 of the time even if they use some randomness (i.e. they share some correlated random variable).
Suppose now that Alice and Bob share quantum correlations; in particular, suppose they each possess an electron in their lab, and that the two electrons are prepared in some bipartite state |ψ^{AB}⟩. With this state at hand, they use the following strategy. Based on the bits x and y that they receive from the referee, they choose to perform spin measurements in the direction n_x for Alice, and in the direction m_y for Bob. They then send to the referee the outcomes of their corresponding measurements. The probability that Alice and Bob win this CHSH game is given by
p_win := (1/4) Σ_{x,y,a,b} p_ψ(ab|xy) δ_{xy, a⊕b} ,    (2.159)
where the factor 1/4 represents the (uniform) probability that the referee sends x to Alice and y to Bob. From the following exercise it follows that for appropriate choices of n_x, m_y, and ψ^{AB}, Alice and Bob can win the game with a probability greater than 3/4.
Exercise 2.6.11. Consider the quantum strategy described above, and denote by plose =
1 − pwin the probability that Alice and Bob lose the game. Recall the Bell operator B as
defined in Exercise 2.4.5 with n := n0 , n′ := n1 , m := m0 , and m′ := m1 .
1. Show that
⟨ψ|B|ψ⟩ = pwin − plose . (2.160)
2. Use Part 1 together with the Tsirelson bound to show that there exists a quantum
strategy (i.e. directions nx , my and a quantum state |ψ AB ⟩) such that
p_win = 1/2 + 1/(2√2) > 3/4 ,    (2.161)
where 0 < c ∈ R and p is the vector whose components are the conditional probabilities
p(ab|xy) and s is any real vector with the same dimension as p. Note that for any real vector
s one can take c in (2.162) to be
c = max_{p∈L(n)} s · p ,    (2.163)
where L(n) ⊂ R^n is the set of all vectors p = {p(ab|xy)} whose components have the form (cf. (2.131))
p(ab|xy) = ∫ dλ q_λ p_λ^A(a|x) p_λ^B(b|y) .    (2.164)
We can therefore identify any Bell inequality with a single real vector s (since the constant c
is determined from above). In the general case, x = 1, . . . , |X|, y = 1, . . . , |Y |, a = 1, . . . , |A|,
and b = 1, . . . , |B|, can take more than two values. We also denoted by n := |X| · |Y | · |A| · |B|
the dimension of the vectors p and s. This corresponds to higher dimensional systems, and in
the quantum case a corresponds to the outcome of a projective von-Neumann measurement
that is labeled by x on Alice’s subsystem, and similarly b corresponds to the outcome of a
projective measurement that is labeled by y on Bob’s subsystem. Note that the definition
of a local distribution as in (2.164) remains unchanged in higher dimensions. Therefore, there are many Bell inequalities, and in recent years much effort has been made to characterize and understand the structure of all of them.
The Bell inequalities that we consider here are those that can be used to test if a given
distribution vector p is local (i.e. has the form (2.164)). If a given distribution vector p
violates a Bell inequality s (i.e. a Bell inequality of the form (2.162)) then we learn from it
that p is non-local. However, if a probability distribution does not violate a particular Bell
inequality, s, this alone does not mean that the distribution is local.
Given a probability vector p, how can we decide if it is local (i.e. can be written in
the form (2.164))? To answer this question, we first discuss the convexity property of local
distributions.
Exercise 2.6.12. Denote by P(n) ⊂ Rn the space of all real vectors in dimension n =
|ABXY | whose components are given in terms of conditional probabilities {p(ab|xy)}, and
let L(n) ⊂ P(n) be the set of all local vectors as in (2.164). Show that L(n) is a convex set.
Exercise 2.6.13. Show that P(n) is a polytope in Rn . Hint: Recall the definition of a
polytope in Sec. A.2.
Consider now a vector p ∈ P(n), and define the set {p} consisting of exactly one vector.
As such, it is (trivially) a convex set in Rn . Suppose now that p ̸∈ L(n). This means that
{p} ∩ L(n) = ∅, or in other words, {p} and L(n) are two disjoint convex sets. Therefore,
from the hyperplane separation theorem (see Theorem A.2) it follows that there exists a
vector s ∈ Rn and a real number r such that
The above equation can be interpreted as follows. If p ̸∈ L(n) then there exists a Bell
inequality s that it violates. We summarize it in the following theorem.
Theorem 2.6.1. Let L(n) ⊂ P(n) be the set of all local probability vectors as
in (2.164) with fixed cardinalities |X|, |Y |, |A|, |B|, and n = |ABXY |. Then,
p ̸∈ L(n) if and only if it violates at least one Bell inequality.
Exercise 2.6.14. Show that L(n) is a polytope in Rn (i.e. a convex hull of a finite number
of points). Hint: Show first that the set of vectors pA , whose components are any conditional
probabilities {p(a|x)}, is itself a polytope (i.e. find its extreme points and show that there are
a finite number of them).
From Theorem A.6.3, the polytope L(n) can be represented as an intersection of finitely
many half-spaces. Denoting by s(j) (with j = 1, . . . , m) the normal vectors to these half
spaces, we therefore conclude that p ∈ L(n) if and only if
s^{(j)} · p ⩽ c_j   ∀ j = 1, . . . , m ,    (2.166)
where c_j := max_{q∈L(n)} s^{(j)} · q. In other words, there exist finitely many Bell inequalities that can determine whether a vector p is in L(n).
This analysis may give the impression that deciding if p is local is easy. Therefore, it is
important to note first that the computation of s^{(j)} may be hard, and that the number m may grow exponentially with the cardinalities |X|, |Y|, |A|, and |B|. In particular, already for the case that |A| = |B| = 2 with arbitrarily large |X| = |Y|, it was shown that the decision problem of whether p is local is NP-complete [11].
case in which |A| = |B| = |X| = |Y | = 2 was fully characterized in [81], and independently
by [78], and, in particular, it was shown that the only non-trivial Bell inequality is the CHSH
inequality. That is, for bits x, y, a, b ∈ {0, 1}, the 16-dimensional vector p = (p(ab|xy)) is
local if and only if it does not violate any of the CHSH inequalities.
We emphasize here that U(t) (with t > 0) does not depend on the initial state (i.e. on the preparation of the system at time t = 0). It is important also to note that the formalism of quantum mechanics does not propose which unitary family U(t) one should choose to describe a particular evolution of a quantum system. It just states that the evolution (whatever its specific causes) is described by a unitary matrix. We saw earlier something similar about quantum states of the spin of an electron. The first postulate of quantum mechanics did not tell us which states to assign to a specific system. It only stated that all the information about the system is encoded in a quantum state. We then used, as in the example of the spin of an electron, further symmetry properties to assign the physical interpretation of any qubit state like |0⟩, |1⟩, |+⟩, and |−i⟩.
One can view the unitary evolution postulate as a principle of distinguishability preserv-
ing. Recall from Exercise 2.5.4 that if two quantum states are orthogonal then they can
be perfectly distinguished by a suitable projective measurement. The principle of distin-
guishability preserving asserts that if a closed system is prepared in one out of two or more
distinguishable states, then the ability to distinguish between them remains intact through-
out the evolution, unless some type of external noise is pumped into the system. Therefore,
one can view a unitary evolution as a distinguishability preserving map. Alternatively, since
information quantifies the ability to distinguish between one thing from another, the unitary
evolution postulate of quantum mechanics, loosely speaking, is the statement that closed
systems don’t loose information (i.e. the ability to distinguish) if they don’t interact with
the external world.
We now discuss the form of the parametrized family of the unitaries U (t) given in (2.167).
We will assume here that the function t 7→ U (t) is continuous, and even differentiable.
Moreover, U (0) = I is the identity matrix so that we can express for a small t = ε > 0
where H is some Hermitian matrix. Note that H must be Hermitian since otherwise U (ε)
will not be a unitary matrix; i.e.
where we assumed that H is Hermitian. Therefore, taking the derivative on both sides
of (2.167) and setting t = 0 gives:
(d/dt)|ψ(t)⟩ |_{t=0} = −iH|ψ(0)⟩ .    (2.170)
Now, since the system is isolated, the state |ψ(t)⟩ must evolve according to the same rule as
the state |ψ(0)⟩. Hence, this homogeneity assumption implies that for all t > 0
(d/dt)|ψ(t)⟩ = −iH|ψ(t)⟩ .    (2.171)
Exercise 2.7.1. Show that from the equation above it follows that:
where Planck’s constant ℏ = h/2π has the units of energy×time so that both sides of the equation have the same dimensions. Since the Hamiltonian H is a Hermitian matrix it can be diagonalized as H = Σ_x E_x |φ_x⟩⟨φ_x|, where {E_x} are the energy levels of the system, and {|φ_x⟩} are the corresponding eigenstates. The eigenstate |φ_x⟩ that corresponds to the lowest energy level is called the ground state of the system.
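For a closed system the solution of (2.171) is |ψ(t)⟩ = e^{−iHt}|ψ(0)⟩ (in units where ℏ = 1); the following NumPy/SciPy sketch (illustrative only, with a randomly generated Hamiltonian) checks that e^{−iHt} is unitary and satisfies the differential equation numerically.

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(7)
X = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
H = (X + X.conj().T) / 2                        # a random Hamiltonian (Hermitian)
psi0 = np.array([1.0, 0, 0], dtype=complex)

t, eps = 0.8, 1e-6
U = expm(-1j * H * t)
print(np.allclose(U.conj().T @ U, np.eye(3)))   # U(t) is unitary

psi_t     = expm(-1j * H * t) @ psi0
psi_t_eps = expm(-1j * H * (t + eps)) @ psi0
derivative = (psi_t_eps - psi_t) / eps
print(np.allclose(derivative, -1j * H @ psi_t, atol=1e-4))   # d|ψ(t)⟩/dt = -iH|ψ(t)⟩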
Finally, we assumed above that the system is closed, i.e. does not interact with the envi-
ronment in any way. This led us to assume a continuous uniform evolution. However, many
physical systems are not closed, and even us, the experimenters, can change the Hamiltonian
by changing parameters in the lab at different times. We leave this discussion to the next
chapter that covers evolution of open systems.
the experiment, the measuring device was given in some “ready” state. That is, according
to the first postulate of quantum mechanics there exists a vector |ready⟩ ∈ E containing
all the information about the measuring device prior to the measurement. Now, suppose
first that the state of the system was |0⟩A . Then, the initial state of the system+device is
|0⟩A |ready⟩E . After the measurement, the joint system evolves to
where the equality follows from the fact that the measurement is performed in the z-direction,
so the system state |0⟩A must remain intact, while the vector |ready⟩E of the measuring device
is transformed to another vector |output “0”⟩E in E, containing the information that the
output was 0. Similarly, if the initial state of the system was |1⟩A , then the initial state
of the system+device is |1⟩A |ready⟩E , and after the measurement, the joint system would
evolve to
|1⟩A |ready⟩E → U AE |1⟩A |ready⟩E = |1⟩A |output “1”⟩E . (2.175)
Now, let us consider the case in which the initial state of the system is |ψ⟩^A = a|0⟩ + b|1⟩. In this case, as before, the initial state of the system+device is |ψ⟩^A|ready⟩^E. However, after the measurement, the system evolves unitarily to the state
Therefore,
⟨ψ|ϕ⟩ = ⟨ψ|V*V|ϕ⟩ = ⟨ψ|ϕ⟩² .    (2.178)
But any complex number that satisfies c = c² must be equal to 0 or 1. Hence, ⟨ψ|ϕ⟩ ∈ {0, 1},
which means that either |ψ⟩ = |ϕ⟩ or that |ψ⟩ is orthogonal to |ϕ⟩. Therefore, there is no
quantum machine that is capable of generating two copies of an arbitrary unknown quantum
state. This result is known by the term the no-cloning theorem.
Exercise 2.7.2. Verify that U AB in the equation above is indeed a unitary matrix.
Figure 2.10: (a) Controlled unitary gate. (b) Controlled NOT (CNOT) gate.
In quantum circuits, the controlled unitary is depicted as in Fig. 2.10a. The CNOT gate
is the controlled unitary map
where σ1 is the first Pauli (unitary) matrix. The CNOT gate is depicted in Fig. 2.10b.
Exercise 2.7.3. Show that the CNOT gate can generate the maximally entangled state (|00⟩ + |11⟩)/√2 from a tensor product of two vectors (i.e. from a product state of the form |ψ⟩^A|ϕ⟩^B).
Exercise 2.7.4. Show the equivalence of the two circuits in Fig. 2.11, where
H := (1/√2) [ 1   1
              1  −1 ] .    (2.181)
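The following NumPy sketch is illustrative only; the specific circuit, a Hadamard on the first qubit followed by a CNOT, is one standard way to solve Exercise 2.7.3, mapping the product state |0⟩|0⟩ to the maximally entangled state.

import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)          # Hadamard gate (2.181)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)          # control = first qubit
ket00 = np.array([1.0, 0, 0, 0])

out = CNOT @ np.kron(H, np.eye(2)) @ ket00
target = np.array([1.0, 0, 0, 1.0]) / np.sqrt(2)      # (|00⟩ + |11⟩)/√2
print(np.allclose(out, target))                       # True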
Open physical systems are systems that have interactions with other external systems. These external systems, which we will refer to as ‘the environment’, can either be correlated with the system, and/or exchange information, energy, or matter with it. Consequently, the description and evolution of such systems can be very different from those we discussed for isolated systems. Yet, there is no need to introduce new postulates in order to develop the theory of open quantum systems. Instead, we will see that all the postulates of quantum mechanics on isolated systems are sufficient to determine the evolution, the measurements, and the description of open systems.
|ψ_x⟩ = (1/√p_x) M_x|ψ⟩   and   p_x = ⟨ψ|M_x* M_x|ψ⟩ .    (3.1)
yields an even more general type of evolution known by the name generalized measurement
as it generalizes the von-Neumann projective measurement.
In Fig. 3.1 we describe the following evolution of a quantum state |ψ⟩ ∈ A. In the first
step of the evolution, an ancillary system is introduced which is prepared in some state
|1⟩ ∈ R. Consequently, the state of the joint system is |1⟩R |ψ A ⟩. Next, a joint unitary
evolution, U RA is applied to the joint state |1⟩|ψ⟩ yielding the bipartite state U RA |1⟩|ψ⟩.
Finally, a basis measurement {|y⟩⟨y|R }y∈[n] is applied on the reference system R.
We discuss now how the output state |ψ_y⟩ is related to the input state |ψ⟩, and what is the probability to obtain an outcome y. Denote by m := |A| and n := |R|. As an operator in the vector space L(R ⊗ A), the unitary matrix U^{RA} can be expressed as
U^{RA} = Σ_{y,y′∈[n]} |y⟩⟨y′| ⊗ Λ_{yy′}   where   Λ_{yy′} ∈ L(A) .    (3.2)
Note that any operator in L(R ⊗ A) has the above form, but since U^{RA} is unitary we have the equivalence
U*U = I^{RA}   ⟺   Σ_{y∈[n]} Λ_{yz}* Λ_{yx} = δ_{xz} I^A   ∀ x, z ∈ [n] .    (3.3)
|ψ_y⟩ = (1/√p_y) M_y|ψ⟩   with   p_y := ⟨ψ|M_y* M_y|ψ⟩ .    (3.5)
Moreover, from (3.3) it follows that Σ_{y∈[n]} M_y* M_y = I^A, so that Σ_{y∈[n]} p_y = 1, where p_y is the probability to obtain an outcome y. Note that the post-measurement state |ψ_y⟩, with its associated probability p_y, has a form very similar to that of |ψ_x⟩ and p_x in (3.9). However, unlike the form P_x U of M_x in (3.9), the only condition on {M_y := Λ_{y1}} is that they can be extended to a family of matrices {Λ_{yy′}} that satisfies (3.3). This will ensure that U^{RA} is unitary. We now show that any set of complex matrices {M_y}_{y∈[n]} with the property Σ_{y∈[n]} M_y* M_y = I^A can be completed to a full family of matrices {Λ_{yy′}} that satisfies (3.3).
To see this, observe that the matrix U^{RA} can be expressed in the following block form
U^{RA} = [ Λ_{11}  Λ_{12}  ⋯  Λ_{1n}
           Λ_{21}  Λ_{22}  ⋯  Λ_{2n}
             ⋮       ⋮     ⋱    ⋮
           Λ_{n1}  Λ_{n2}  ⋯  Λ_{nn} ] ,    (3.6)
with the matrices {M_y := Λ_{y1}}_{y∈[n]} appearing in the first block column. Moreover, this first column satisfies
[ M_1*  M_2*  ⋯  M_n* ] [ M_1
                          M_2
                           ⋮
                          M_n ] = Σ_{y∈[n]} M_y* M_y = I^A .    (3.7)
Therefore, the first column block in (3.6) consists of m := |A| orthonormal vectors. Any
such set of m orthonormal vectors in Cmn can be completed to a full orthonormal basis
of Cmn (for example, by the Gram-Schmidt process). Therefore, it is always possible to
construct a unitary matrix U^{RA} as above from a set of matrices {Λ_{y1}}_{y∈[n]} that satisfy Σ_{y∈[n]} Λ_{y1}* Λ_{y1} = I^A.
Generalized Measurement
Definition 3.1.1. A generalized measurement is a collection of m ∈ N complex
matrices {Mx }x∈[m] ⊂ L(A) with the property that
Σ_{x∈[m]} M_x* M_x = I^A .    (3.8)
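As a minimal illustration of Definition 3.1.1 (with a toy two-outcome measurement of our own choosing), the following sketch verifies the completeness relation (3.8) and applies the measurement to a pure state according to (3.1).

import numpy as np

M0 = np.sqrt(0.3) * np.eye(2)
M1 = np.sqrt(0.7) * np.array([[0, 1], [1, 0]], dtype=float)    # √0.7 · σ_1

print(np.allclose(M0.conj().T @ M0 + M1.conj().T @ M1, np.eye(2)))   # (3.8) holds

psi = np.array([1.0, 1.0]) / np.sqrt(2)
for x, M in enumerate([M0, M1]):
    p = np.vdot(psi, M.conj().T @ M @ psi).real       # p_x = <ψ|M_x* M_x|ψ>
    post = M @ psi / np.sqrt(p)                       # |ψ_x⟩ = M_x|ψ⟩ / √p_x
    print(x, p, post)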
quantum mechanics, any evolution can be decomposed into a sequence of these two types of processes. Since both unitary evolution and projective measurements are themselves generalized measurements, we conclude that the most general measurement on a quantum system can be described as a sequence of generalized measurements. In the following exercise
it is argued that any such sequence of generalized measurements can be simulated by a single
generalized measurement. Hence, the generalized measurement described above is indeed
general enough to describe the most general measurement in quantum mechanics.
Exercise 3.1.1. Show that if {Mx }x∈[m] and {Ny }y∈[n] are two generalized measurements
then {Mx Ny } is also a generalized measurement. Use this to show that a sequence of gener-
alized measurements can be simulated by a single generalized measurement.
Exercise 3.1.2. Show that the matrices (operators) Mx do not have to be square. That
is, show that any collection of m operators {Mx }x∈[m] ⊂ L(A, B) that satisfy (3.8) can also
be realized as a generalized measurement as depicted in Fig. 3.1. Hint: Consider a unitary
operator U : RA → R′ B where the reference systems R and R′ are such that |RA| = |R′ B|.
Exercise 3.1.3. Let A = C2 , and let M0 = a|+⟩⟨0| and M1 = b|0⟩⟨+| be two operators
in L(A) with a, b ∈ C. Find the precise conditions on a and b for the existence of a third
operator M2 ∈ L(A) such that {M0 , M1 , M2 } form a generalized measurement.
Exercise 3.1.4. Consider d (rank-one) operators {Mx = |ψx ⟩⟨ϕx |}x∈[d] in L(Cd ), where
{|ψx ⟩}x∈[d] and {|ϕx ⟩}x∈[d] are some normalized states in Cd . Show that {Mx }x∈[d] is a gen-
eralized measurement if and only if {|ϕx ⟩}x∈[d] is an orthonormal basis of Cd .
dice, do not have to be unbiased). Now, suppose that Alice forgot the value of x. Then, Alice
knows that her state is one out of the m states in the ensemble of states {|ψx ⟩⟨ψx |, px }x∈[m] .
How should we characterize the ensemble {|ψ_x⟩⟨ψ_x|, p_x}_{x∈[m]}? We will see that there exist
many other ensembles of states that contain the exact same information as the ensemble
{|ψx ⟩⟨ψx |, px }x∈[m] . Therefore, instead of characterizing the information with a particular
ensemble (such as {|ψx ⟩⟨ψx |, px }x∈[m] ), we will characterize it with a mathematical object
that remains invariant under exchanges of such equivalent ensembles.
To gather information about her system, Alice can execute a generalized measurement,
denoted as {My }y∈[n] , on her system, characterized by the ensemble {|ψx ⟩⟨ψx |, px }x∈[m] .
This measurement results in an outcome y with a corresponding probability denoted as
qy . Furthermore, following the occurrence of outcome y, there emerges a post-measurement
ensemble that describes the state of Alice’s system. We will now delve into these details
to demonstrate that the dependencies of these quantities rely solely on a density matrix
associated with the ensemble {|ψx ⟩⟨ψx |, px }x∈[m] .
If the pre-measurement state was |ψx ⟩ then the post-measurement state after outcome y
occurred is
    |ϕ_{xy}⟩ := \frac{1}{\sqrt{p_{y|x}}} M_y |ψ_x⟩    (3.10)

with probability

    p_{y|x} := ⟨ψ_x|M_y^* M_y|ψ_x⟩ = Tr[M_y^* M_y |ψ_x⟩⟨ψ_x|] .    (3.11)
However, since Alice does not know the value of x, if she performs a measurement {My }y∈[n]
on her system, she will get the outcome y with probability
    q_y := \sum_{x∈[m]} p_{y|x} p_x = \sum_{x∈[m]} p_x Tr[M_y^* M_y |ψ_x⟩⟨ψ_x|] = Tr[M_y^* M_y ρ]    (3.12)
where
    ρ := \sum_{x∈[m]} p_x |ψ_x⟩⟨ψ_x| ,    (3.13)
is the density matrix associated with the ensemble {|ψx ⟩⟨ψx |, px }x∈[m] .
Note that py|x px is the probability that both the pre-measurement state is |ψx ⟩ and that
the outcome of its measurement is y. Therefore, using the Bayesian rule of probabilities, the
probability that the pre-measurement state is |ψx ⟩ given that the measurement outcome is y,
can be expressed as qx|y := py|x px /qy . Consequently, after outcome y occurred the ensemble
{px , |ψx ⟩⟨ψx |} changes to
{qx|y , |ϕxy ⟩⟨ϕxy |}x∈[m] . (3.14)
Note that the density operator, σy , that is associated with the above ensemble is given by
    σ_y := \sum_{x∈[m]} q_{x|y} |ϕ_{xy}⟩⟨ϕ_{xy}|

    q_{x|y} := p_{y|x} p_x / q_y →  = \frac{1}{q_y} \sum_{x∈[m]} p_x p_{y|x} |ϕ_{xy}⟩⟨ϕ_{xy}|
                                                                                          (3.15)
    (3.10) →  = \frac{1}{q_y} \sum_{x∈[m]} p_x M_y |ψ_x⟩⟨ψ_x| M_y^*

              = \frac{1}{q_y} M_y ρ M_y^* .
To summarize, the outcome y, of any generalized measurement {M_y}_{y∈[n]}, occurs with
probability q_y = Tr[M_y^* M_y ρ], when applied to an ensemble {p_x, |ψ_x⟩⟨ψ_x|}_{x∈[m]}. Recall
from Exercise 2.3.15 that aside from the ensemble {p_x, |ψ_x⟩⟨ψ_x|}_{x∈[m]}, there are infinitely
many other ensembles that also correspond to the same density operator ρ. Therefore, the
from Exercise 2.3.15 that aside from the ensemble {px , |ψx ⟩⟨ψx |}x∈[m] , there are infinitely
many other ensembles that also correspond to the same density operator ρ. Therefore, the
dependence of q_y only on ρ demonstrates that the statistics of any measurement outcome
depends only on the density operator, and not on the particular ensemble that realizes it. To
clarify, suppose {px , |ψx ⟩⟨ψx |}x∈[m] and {rz , |φz ⟩⟨φz |}z∈[k] are two ensembles that correspond
to the same density operator ρ. Then, the probability to obtain an outcome y is the same
for both ensembles, and therefore there is no way to distinguish between the two ensembles.
One may argue that maybe there is a way to distinguish between the post-measurement
ensembles, however, as can be seen in the above equation, any post-measurement ensemble
is also associated with a unique density operator q1y My ρMy∗ that depends only on ρ and not
on the particular ensemble {px , |ψx ⟩⟨ψx |}x∈[m] or {rz , |φz ⟩⟨φz |}z∈[k] that realizes ρ.
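This ensemble-independence is easy to verify numerically. The following sketch (ours; the two ensembles and the measurement operators are hypothetical choices) checks that two different ensembles realizing the same ρ produce identical outcome probabilities q_y = Tr[M_y^* M_y ρ].

```python
# A small numerical check (assuming NumPy): two ensembles with the same density operator
# give the same statistics for any generalized measurement.
import numpy as np

ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
ketp, ketm = (ket0 + ket1)/np.sqrt(2), (ket0 - ket1)/np.sqrt(2)

# Ensemble 1: {1/2, |0>}, {1/2, |1>};  Ensemble 2: {1/2, |+>}, {1/2, |->}.
rho1 = 0.5*np.outer(ket0, ket0) + 0.5*np.outer(ket1, ket1)
rho2 = 0.5*np.outer(ketp, ketp) + 0.5*np.outer(ketm, ketm)
assert np.allclose(rho1, rho2)                      # both realize the maximally mixed state

# A (hypothetical) two-outcome generalized measurement {M_0, M_1}.
M0 = np.diag([np.sqrt(0.9), np.sqrt(0.2)])
M1 = np.diag([np.sqrt(0.1), np.sqrt(0.8)])
assert np.allclose(M0.conj().T @ M0 + M1.conj().T @ M1, np.eye(2))

for rho in (rho1, rho2):
    print([np.trace(M.conj().T @ M @ rho).real for M in (M0, M1)])   # identical probabilities
```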
Given that the formalism of quantum mechanics lacks the means to differentiate between
ensembles of states corresponding to the same density operator, all the information about
the physical system accessible to us as observers is encapsulated within the density operator.
Consequently, instead of characterizing physical systems using ensembles of states, we shall
henceforth employ density operators for their descriptions.
As an example, consider a qubit state ρ ∈ D(C2 ). That is, ρ ⩾ 0 and Tr[ρ] = 1. Any
such qubit state can be expressed as a linear combination of the Pauli basis of Herm(C2 )
ρ = r0 σ0 + r1 σ1 + r2 σ2 + r3 σ3 , (3.18)
where σ0 := I2 . Now, since the Pauli matrices σ1 , σ2 , and σ3 , are traceless, the condition
Tr[ρ] = 1 gives r0 = 1/2. What are the conditions on r := (r1 , r2 , r3 )T ∈ R3 that ensure that
ρ ⩾ 0 ? Since ρ has two eigenvalues, say λ and 1 − λ, it follows that ρ ⩾ 0 if and only if
0 ⩽ λ ⩽ 1. This condition is equivalent to Tr[ρ2 ] = λ2 + (1 − λ)2 ⩽ 1. Therefore, ρ ⩾ 0 if
and only if
    1 ⩾ Tr[ρ^2] = \frac{1}{2} + \sum_{j,k} r_j r_k Tr[σ_j σ_k] = \frac{1}{2} + 2∥r∥_2^2 .    (3.19)
That is, ∥r∥_2 ⩽ 1/2. Therefore, after the renaming r → \frac{1}{2}r we conclude that every qubit
quantum state has the form

    ρ = \frac{1}{2}(I_2 + r · σ)    (3.20)
with ∥r∥2 ⩽ 1. Moreover, since Tr[ρ2 ] = 1 if and only if ∥r∥2 = 1 we get that ρ above is
pure if and only if ∥r∥2 = 1. Hence, a qubit can be represented by the Bloch Sphere (see
Fig. 2.5) with the pure states represented on the boundary of the sphere and mixed states
in the interior of the sphere. Note that the center of the sphere, i.e. r = 0, corresponds to
the state ρ = \frac{1}{2}I, which is called the maximally mixed state.
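The Bloch parametrization (3.20) is convenient for quick calculations; the following sketch (ours, assuming NumPy) converts between a qubit density matrix and its Bloch vector and illustrates the purity condition ∥r∥_2 = 1.

```python
import numpy as np

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),      # σ_1
         np.array([[0, -1j], [1j, 0]]),                   # σ_2
         np.array([[1, 0], [0, -1]], dtype=complex)]      # σ_3

def to_bloch(rho):
    return np.real([np.trace(rho @ s) for s in sigma])    # r_j = Tr[ρ σ_j]

def from_bloch(r):
    return 0.5*(np.eye(2) + sum(rj*s for rj, s in zip(r, sigma)))

rho = from_bloch([0.3, 0.4, 0.5])                 # ||r|| < 1: a mixed state
print(to_bloch(rho), np.trace(rho @ rho).real)    # recovers r; Tr[ρ²] = (1 + ||r||²)/2 < 1
```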
Exercise 3.2.1. Show that for r = (sin(α) cos(β), sin(α) sin(β), cos(α))T , ρ in (3.20) is given
by the state ρ = |ψ⟩⟨ψ| with |ψ⟩ as in (2.110).
Exercise 3.2.2. Consider a density operator for a qutrit; that is, ρ ∈ D(C3 ), ρ ⩾ 0,
and Tr[ρ] = 1. Let λ = (λ1 , λ2 , . . . , λ8 ) be a vector of matrices with {λj }j∈[8] being some
Hermitian traceless 3 × 3 matrices satisfying the condition Tr(λi λj ) = 2δij (note that also
the Pauli matrices satisfy this orthogonality condition).
4. Is it true that for every t with ∥t∥_2 ⩽ \frac{1}{\sqrt{3}}, ρ above corresponds to a density matrix? If
yes prove it, otherwise give a counter example.
    p_x := ⟨ψ^{AB}|N_x^* N_x ⊗ I^B|ψ^{AB}⟩

    (3.22) →  = ⟨Ω^{AÃ}| \sqrt{ρ}\, N_x^* N_x \sqrt{ρ} ⊗ V^*V |Ω^{AÃ}⟩
                                                                            (3.23)
    V^*V = I^{Ã} →  = ⟨Ω^{AÃ}| \sqrt{ρ}\, N_x^* N_x \sqrt{ρ} ⊗ I^{Ã} |Ω^{AÃ}⟩

    Part 1 of Exercise 2.3.26 →  = Tr[N_x^* N_x ρ^A] .
Thus, the outcome probability px depends only on the reduced density matrix ρA and not
(directly) on the bipartite state |ψ AB ⟩. Moreover, the post measurement state after outcome
x occurred is given by
    |ψ_x^{AB}⟩ = \frac{1}{\sqrt{p_x}} N_x ⊗ I^B |ψ^{AB}⟩
                                                              (3.24)
    (3.22) →  = \frac{1}{\sqrt{p_x}} (N_x\sqrt{ρ} ⊗ V) |Ω^{AÃ}⟩ .
Therefore, the reduced density matrix σxA of ψxAB = |ψxAB ⟩⟨ψxAB | is given by
    σ_x^A := Tr_B[ψ_x^{AB}] = \frac{1}{p_x} N_x \sqrt{ρ}\, Tr_B\big[(I^A ⊗ V) Ω^{AÃ} (I^A ⊗ V^*)\big] \sqrt{ρ}\, N_x^* ,    (3.25)
where we substitute the expression in (3.24) for ψxAB . Now, from the cyclic property of the
partial trace (see Exercise 2.3.29) we have that
    Tr_B\big[(I^A ⊗ V) Ω^{AÃ} (I^A ⊗ V^*)\big] = Tr_{Ã}\big[(I^A ⊗ V^*V) Ω^{AÃ}\big] = Tr_{Ã}\big[Ω^{AÃ}\big] = I^A .    (3.26)
This demonstrates that the reduced density matrix ρA along with the measurement operators
{Nx }x∈[m] determine the post-measurement reduced density matrices in the exact same way
as we saw in the previous section. For the same reasons as before, we conclude that all
the information that can be extracted from Alice’s subsystem (via quantum generalized
measurements) is encoded in the marginal state ρ^A. Therefore, if Alice has no access to
Bob’s subsystem, then from her perspective, the state of her subsystem can be characterized
by the marginal density operator ρA , and the fact that her subsystem is entangled with Bob’s
can be ignored.
To establish the equivalence between ρ^{XA} and the ensemble {p_x, |ψ_x⟩⟨ψ_x|}_{x∈[m]},
we demonstrate that it is possible to transform ρXA into {px , |ψx ⟩⟨ψx |}x∈[m] and vice versa.
Firstly, consider performing a measurement in the |x⟩ basis on system X of a composite
system XA in the cq-state ρXA . This measurement yields the state |ψx ⟩ with a probability
of px . Consequently, this process reconstructs the ensemble {px , |ψx ⟩⟨ψx |}x∈[m] from ρXA .
Conversely, imagine that we have a state |ψ_x⟩ randomly selected from the ensemble
{p_x, |ψ_x⟩⟨ψ_x|}_{x∈[m]}. If Alice possesses knowledge of which state was selected (i.e., she knows
the value of x), she can encode this information by introducing |x⟩⟨x|X , resulting in her
state transitioning to |x⟩⟨x|X ⊗ |ψx ⟩⟨ψx |A . When Alice opts to forget the specific value of x,
her quantum state becomes identical to ρ^{XA}. Furthermore, it is worth noting that when we
only have access to the marginal state ρ^A = Tr_X[ρ^{XA}] = \sum_{x∈[m]} p_x|ψ_x⟩⟨ψ_x|, it is generally
impossible to perfectly recover the value of x.
Cq-states play a pivotal role in quantum information science, particularly when describing
the outcomes of quantum measurements. Let’s consider a physical system characterized by
the density operator ρ ∈ D(A) and a generalized quantum measurement {Mx }x∈[m] . As
previously discussed, the application of this generalized measurement to the state ρA results
in the state σxA , as outlined in (3.17), with the associated probability px as defined in (3.16).
Because we know the outcome x, we have the option to record it within a classical system
denoted as X. In this context, we can perceive the measurement's effect as a transformation.
We will see later on that in this case, the measurement acts as a quantum channel, converting
one density operator, ρA , to another, σ A (see Fig. 3.3c below).
Note that the role of the referee above is to provide Alice and Bob with a shared random-
ness. Therefore, any separable state as in (3.31) can be prepared by local operations assisted
with shared randomness. Bipartite density matrices that do not have this form are called
entangled and we will discuss them in details in the following chapters on entanglement
theory.
Exercise 3.2.3. Show that the maximally entangled state Φ^{AB} := |Φ^{AB}⟩⟨Φ^{AB}| ∈ D(A ⊗ B)
is not separable.
Exercise 3.2.4. Show that if σ ∈ D(A ⊗ B) is separable then there exists an integer k ∈ N,
a probability distribution {qz }z∈[k] , a set of k pure states {ψz }z∈[k] in Alice’s Hilbert space
(i.e. each ψz ∈ Pure(A)), and a set of k pure states {ϕz }z∈[k] on Bob’s Hilbert space B, such
that

    σ^{AB} = \sum_{z∈[k]} q_z ψ_z^A ⊗ ϕ_z^B .    (3.32)
Recall that Born's rule (adapted to density matrices and generalized measurements)
states that the probability to obtain an outcome x, when a measurement {Mx }x∈[m] is per-
formed on a system described by a density operator ρ, is given by px = Tr [Mx∗ Mx ρ]. There-
fore, to describe a POVM we only need to consider the operators, {Λx := Mx∗ Mx }x∈[m] , since
we are only interested in the statistics of the measurement and not the post-measurement
state. The POVM operators Λx , are called effects, and have the following two properties:
    Λ_x ⩾ 0    and    \sum_{x∈[m]} Λ_x = I^A .    (3.33)
To every generalized measurement there exists a unique POVM that corresponds to it via
the relation Λx = Mx∗ Mx . However, for every POVM there are many quantum measurements
corresponding to it.
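The freedom in realizing a POVM by a generalized measurement (made precise in Exercise 3.3.2 below) can be illustrated numerically. In the following sketch (ours; the three effects are a hypothetical example, and SciPy's sqrtm supplies the matrix square root), the canonical choice M_x = √Λ_x is verified to be a generalized measurement reproducing the Born-rule statistics Tr[Λ_x ρ].

```python
import numpy as np
from scipy.linalg import sqrtm

# A (hypothetical) three-outcome qubit POVM.
L0 = 0.5*np.array([[1, 0], [0, 0]], dtype=complex)
L1 = 0.25*np.array([[1, 1], [1, 1]], dtype=complex)
L2 = np.eye(2) - L0 - L1
assert min(np.linalg.eigvalsh(L).min() for L in (L0, L1, L2)) >= -1e-12   # all effects ⩾ 0

Ms = [sqrtm(L) for L in (L0, L1, L2)]                    # one measurement realizing the POVM
assert np.allclose(sum(M.conj().T @ M for M in Ms), np.eye(2))

rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
print([np.trace(L @ rho).real for L in (L0, L1, L2)])    # p_x = Tr[Λ_x ρ]
```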
1. Show that for any n × n complex matrix A there exists an n × n unitary matrix U such
that
    A = U|A|    where    |A| := \sqrt{A^*A} .    (3.34)
Exercise 3.3.2. Let {Λx }x∈[m] be a POVM in Pos(A). Show that a generalized measurement
{Mx }x∈[m] ⊂ L(A) corresponds to the POVM {Λx }x∈[m] if and only if there exists m unitary
matrices, {Ux }x∈[m] , in L(A) such that
    M_x = U_x \sqrt{Λ_x} .    (3.36)
1. Find all the possible values of a and b for which the set {Λ1 , Λ2 , Λ3 } is a POVM.
2. Which values of a and b that you found in part 1 correspond to a rank 1 POVM (i.e.
all the POVM elements have rank 1)?
Exercise 3.3.4. Suppose Alice and Bob share a composite quantum system in the state ρAB .
Alice performs a measurement on her system described by a POVM {Λx }x∈[m] , and record the
outcome x in a classical system X. Show that the post-measurement state can be expressed
as a cq-state of the form
    σ^{XB} = \sum_{x∈[m]} p_x |x⟩⟨x|^X ⊗ σ_x^B .    (3.38)
Express the probabilities px , and density matrices σxB , in terms of Λx and ρAB .
The span above is with respect to the real numbers since Herm(A) is a real vector space.
By definition, if a POVM {Λx }x∈[m] is informationally complete then m ⩾ d2 , where d := |A|.
Moreover, if m = d^2 then the elements of the informationally complete POVM form a basis of Herm(A).
Clearly, the basis is not orthonormal since Λx ⩾ 0 for all x ∈ [d2 ]. A theorem from linear
algebra states that for any basis of a vector space there exists a dual basis. That is, if
{Λ1 , Λ2 , . . . , Λd2 } is a basis of Herm(A) then there exists another basis {Γ1 , Γ2 , . . . , Γd2 } of
Herm(A), such that the Hilbert Schmidt inner products
3. Set Λ := \sum_{x=0}^{3} Λ_x and show that it is invertible, and that the operators {Λ̃_0, Λ̃_1, Λ̃_2, Λ̃_3},
with Λ̃_x := Λ^{−1/2} Λ_x Λ^{−1/2}, form a rank 1 informationally complete POVM.
Exercise 3.3.6.
However, unlike a basis, in general a frame does not satisfy the relation (3.40) with its dual.
Consider now the case that m = d^2 so that {Λ_x}_{x∈[m]} is a basis of Herm(A). Since its
dual {Γ_y}_{y∈[m]} also spans Herm(A), it follows that the density matrix ρ can be written in
terms of the linear combination

    ρ = \sum_{y∈[m]} p_y Γ_y .    (3.44)
That is, the coefficients py in (3.44) are given by py = Tr[Λy ρ] ⩾ 0, and can be interpreted
as the probability to obtain an outcome y. The significance of (3.44) with py = Tr[Λy ρ] ⩾ 0,
is that by repeating the POVM {Λx }x∈[m] on many copies of ρ, one can estimate from the
measurement outcomes the values of the py s, and thereby learn ρ due to the relation (3.44).
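The reconstruction formula (3.44) is the basis of quantum state tomography. The sketch below (ours, assuming NumPy; the tetrahedral POVM used here is a hypothetical example of an informationally complete POVM, cf. Exercise 3.3.8 below) builds the dual frame {Γ_y} by inverting the coordinate matrix of {Λ_x} in an orthonormal operator basis and then recovers ρ from the probabilities p_y = Tr[Λ_y ρ].

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
onb = [P/np.sqrt(2) for P in (I2, sx, sy, sz)]            # orthonormal basis of Herm(C^2)

# A tetrahedral informationally complete POVM: Λ_x = (I + r_x·σ)/4.
rs = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)
povm = [(I2 + r[0]*sx + r[1]*sy + r[2]*sz)/4 for r in rs]
assert np.allclose(sum(povm), I2)

coords = lambda M: np.real([np.trace(E @ M) for E in onb])
B = np.array([coords(L) for L in povm])                   # {Λ_x} as a basis of Herm(C^2)
dual = [sum(c*E for c, E in zip(row, onb)) for row in np.linalg.inv(B).T]   # dual frame {Γ_y}

rho = np.array([[0.8, 0.1 - 0.2j], [0.1 + 0.2j, 0.2]])
p = [np.trace(L @ rho).real for L in povm]                # measurement statistics
assert np.allclose(sum(py*G for py, G in zip(p, dual)), rho)   # ρ = Σ_y p_y Γ_y
```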
Exercise 3.3.7. Let {Λx }x∈[m] be an informationally complete POVM in Herm(A), and let
its dual frame be {Γy }y∈[m] .
1. Show that at least one of the matrices in the dual frame is not positive semidefinite.
That is, there exists y ∈ [m] such that Γy ̸⩾ 0.
2. Show that Tr[Γy ] = 1 for all y ∈ [m].
Exercise 3.3.8. [Symmetric Informationally Complete (SIC) POVM]
Let d := |A|, m := d2 , and {Λx }x∈[m] be an informationally complete POVM in Herm(A),
with the following properties:
4. Show that
    Λ_1 = \frac{1}{12\sqrt{3}} \begin{pmatrix} 3\sqrt{3}+1 & -5+i \\ -5-i & 3\sqrt{3}-1 \end{pmatrix} ,    Λ_2 = \frac{1}{12\sqrt{3}} \begin{pmatrix} 3\sqrt{3}+1 & 1-5i \\ 1+5i & 3\sqrt{3}-1 \end{pmatrix} ,

    Λ_3 = \frac{1}{12\sqrt{3}} \begin{pmatrix} 3\sqrt{3}-5 & 1+i \\ 1-i & 3\sqrt{3}+5 \end{pmatrix} ,    Λ_4 = \frac{1}{4\sqrt{3}} \begin{pmatrix} \sqrt{3}+1 & 1+i \\ 1-i & \sqrt{3}-1 \end{pmatrix} .    (3.48)

form a SIC POVM in Herm(C^2). Moreover, show that the four pure states {2Λ_x}_{x∈[4]}
are the vertices of a tetrahedron in the Bloch Sphere.
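The following numerical check (ours, assuming NumPy) verifies the claims of the exercise: the four effects above sum to the identity, each 2Λ_x is a pure state, Tr[Λ_x Λ_y] = 1/12 for x ≠ y, and the Bloch vectors of {2Λ_x} form a regular tetrahedron.

```python
import numpy as np

s3 = np.sqrt(3)
L = [np.array([[3*s3 + 1, -5 + 1j], [-5 - 1j, 3*s3 - 1]]) / (12*s3),
     np.array([[3*s3 + 1, 1 - 5j], [1 + 5j, 3*s3 - 1]]) / (12*s3),
     np.array([[3*s3 - 5, 1 + 1j], [1 - 1j, 3*s3 + 5]]) / (12*s3),
     np.array([[s3 + 1, 1 + 1j], [1 - 1j, s3 - 1]]) / (4*s3)]

assert np.allclose(sum(L), np.eye(2))                            # completeness
for x in range(4):
    assert np.isclose(sorted(np.linalg.eigvalsh(L[x]))[0], 0)    # 2Λ_x is rank one (pure)
    for y in range(x + 1, 4):
        assert np.isclose(np.trace(L[x] @ L[y]).real, 1/12)      # symmetric overlaps

paulis = [np.array([[0, 1], [1, 0]]), np.array([[0, -1j], [1j, 0]]), np.array([[1, 0], [0, -1]])]
bloch = np.array([[np.trace(2*Lx @ s).real for s in paulis] for Lx in L])
print(np.round(bloch @ bloch.T, 6))     # Gram matrix: 1 on the diagonal, -1/3 off-diagonal
```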
We now show that any function µ with these properties must have the form µ(Λ) = Tr[ρΛ]
for some fixed density operator ρ ∈ D(A). This means that there is no way to assign
probabilities to effects other than via Born's rule. Originally, this remarkable result was
proved by Gleason for the case of projective measurements, i.e. the set Eff(A) was replaced
with the set of all projections on A, and consequently the requirement on µ was much weaker,
assuming only that (3.50) holds for orthogonal projections. Nonetheless, Gleason was able
to derive the probability formula for systems of dimension d ⩾ 3, and in the qubit case he
showed that there are counter examples. Gleason also considered in his proof the infinite
dimensional case, and derived similar results. Gleason’s proof for projective measurements
goes beyond the scope of this book, and we will follow here a much simpler proof of the
above generalization of Gleason theorem (see the section on Notes and References for more
details).
The idea of the proof is as follows. First we will show that for any r ∈ [0, 1] µ(rΛ) =
rµ(Λ), and use it to show that µ can be extended to a linear functional on the space of
Hermitian operators Herm(A). Then, as a linear functional, it can be expressed as µ(Λ) =
Tr[Λρ], and we end by showing that ρ must be a density operator.
Proof. Note that for any effect Λ ∈ Eff(A) and any integer n, we have Λ = \frac{1}{n}Λ + ⋯ + \frac{1}{n}Λ ⩽ I^A,
where the sum contains n terms. Therefore, from (3.50), µ(Λ) = nµ(\frac{1}{n}Λ). Multiplying this
equation by an integer m ⩽ n and dividing by n gives \frac{m}{n}µ(Λ) = mµ(\frac{1}{n}Λ) = µ(\frac{m}{n}Λ), where
we used the third property of a measure µ as defined above. So far we showed that for any
rational number p ∈ [0, 1], we must have µ(pΛ) = pµ(Λ). Let r ∈ [0, 1] be a real number and
let {pj } and {qk } be two sequences of rational numbers in [0, 1] that converge to r and have
the property that pj ⩽ r ⩽ qk for all j and k. We therefore have
pj µ(Λ) = µ(pj Λ) ⩽ µ(rΛ) ⩽ µ(qk Λ) = qk µ(Λ) , (3.52)
where the inequalities above follows from the fact that if two effects satisfy Λ ⩾ Γ (i.e.
Λ − Γ ⩾ 0) then
µ(Λ) = µ Γ + (Λ − Γ) = µ(Γ) + µ(Λ − Γ) ⩾ µ(Γ) . (3.53)
Taking the limit j, k → ∞ gives µ(rΛ) = rµ(Λ) for all r ∈ [0, 1].
We now extend the definition of µ to any element in Herm(A). First, for any positive
semidefinite matrix P ⩾ 0 that is not in Eff(A) there always exists r > 1 such that \frac{1}{r}P ∈
Eff(A). Define µ(P) := rµ(\frac{1}{r}P). To show that µ(P) is well defined, let r' > 1 be another
number such that \frac{1}{r'}P ∈ Eff(A) and assume without loss of generality that r' > r, so that
\frac{r}{r'} < 1. Then,

    rµ\Big(\frac{1}{r}P\Big) = r'µ\Big(\frac{r}{r'}\cdot\frac{1}{r}P\Big) = r'µ\Big(\frac{1}{r'}P\Big) .    (3.54)
Note that this extension of the domain of µ to any element of Pos(A) preserves the
linearity of µ; i.e. for any two matrices M, N ∈ Pos(A) and large enough r such that \frac{1}{r}(M + N) ∈ Eff(A),

    µ(M + N) = rµ\Big(\frac{1}{r}(M + N)\Big) = rµ\Big(\frac{1}{r}M\Big) + rµ\Big(\frac{1}{r}N\Big) = µ(M) + µ(N) .    (3.55)
Finally, we extend the definition of µ to include in its domain any matrix L ∈ Herm(A). Any
such matrix can be expressed as L = M − N, with M, N ∈ Pos(A). We therefore define µ(L) := µ(M) − µ(N).
To show that µ(L) is well defined, we need to show that for any other decomposition of
L = M ′ − N ′ with M ′ , N ′ ∈ Pos(A), we have
map followed by a projective measurement. However, Naimark’s theorem deals directly with
POVMs and makes the connection between POVMs and projective measurements more
transparent. Moreover, Naimark’s dilation theorem is applicable to infinite dimensional
systems, although we will prove it here only in the finite dimensional case.
Naimark’s Theorem
Theorem 3.3.2. Let A be a Hilbert space and {Λx }x∈[m] ⊂ Eff(A) be a POVM.
Then, there exists an extended Hilbert space B (i.e. |B| ⩾ |A|), an isometry
V : A → B, and a von-Neumann projective measurement {Px }x∈[m] ⊂ Eff(B), such
that Λx = V ∗ Px V for all x ∈ [m].
Proof. Every POVM element Λx can be expressed as Λx = Mx∗ Mx where {Mx } is a general-
ized measurement. In Sec. 3.1 we saw that for every generalized measurement there exists
an ancillary system R, and a unitary matrix U^{RA}, such that (cf. (3.2))

    M_x^A = ⟨x|^R U^{RA} |1⟩^R    (3.59)
where |1⟩R is some fixed state in R. Define B := RA and define the operator V : A → B by
V := U^{RA}|1⟩^R. That is, for any |ψ⟩ ∈ A

    V|ψ⟩^A := U^{RA} |1⟩^R |ψ⟩^A ∈ B .    (3.60)
Moreover, the set of rank one matrices {ϕ_{xy}} also forms a POVM (i.e. \sum_{x,y} ϕ_{xy} = I^A).
Therefore, one can implement the POVM {Λx }x∈[m] by first implementing the rank one
POVM {ϕxy }, with corresponding (x, y) outcomes, and then forgetting/ignoring the outcome
y.
Exercise 3.3.10. Show that if {Λx }x∈[m] is a rank 1 POVM in Eff(A), where m := |A|, then
{Λx }x∈[m] is a basis measurement (i.e. rank one von-Neumann projective measurement).
1. Show that the function above is well defined in the sense that it is independent on the
choice of the orthonormal basis {ηj }j∈[m2 ] of L(A).
2. Show that the function above is an inner product in the vector space L(A → B). Hence,
L(A → B) is a Hilbert space.
We are now ready to introduce the axiomatic approach describing a physical evolution
from system A into B. Below each axiom we provide the physical justification.
Consider the scenario where Alice rolls a dice to obtain a classical variable x ∈ [m], with
associated probabilities {px }x∈[m] . If she obtains the value x, she prepares her system in the
state ρx . After a quantum evolution takes place, if the initial state was ρx , it will evolve to
E(ρ_x). Now, if Alice forgets which state she prepared (i.e., she forgets the outcome of the
dice roll), from her new perspective, the input state is given by \sum_{x∈[m]} p_x ρ_x. In the same
vein, the post-evolution state of the system becomes \sum_{x∈[m]} p_x E(ρ_x). Consequently, we must
have:

    E\Big(\sum_{x∈[m]} p_x ρ_x\Big) = \sum_{x∈[m]} p_x E(ρ_x) .    (3.65)
The above equation signifies that E is convex-linear (i.e., linear under convex combinations)
on the set of density matrices and can be extended linearly to act on the entire space L(A)
(not limited to D(A) and not limited to convex combinations). Therefore, we infer that
every map describing a physical evolution is an element of L(A → B).
so that E preserves the trace of Hermitian matrices. Moreover, if M ∈ L(A) is not Hermitian
it can still be expressed as M = η_0 + iη_1, where both η_0 := (M + M^*)/2 and η_1 := (M − M^*)/(2i)
are Hermitian matrices, so that
We therefore conclude that any physical evolution E is a trace preserving (TP) linear map.
Since a physical evolution E takes density matrices to density matrices it also preserves
positivity. That is, if ρ ∈ Pos(A) is positive semidefinite matrix then also E(ρ) is a positive
semidefinite matrix in Pos(B). We call such linear maps positive maps. There is yet one
more property that E has to satisfy if it describes an evolution of a physical system.
Consider a composite system consisting of two subsystems A and B. Such a system is
described by a bipartite density operator ρAB ∈ D(A ⊗ B). If the subsystem B undergoes
a physical evolution described by a linear map E ∈ L(B → B ′ ), while system A does not
evolve and remains intact, then the state ρ^{AB} will evolve to the state

    σ^{AB'} := id^A ⊗ E^{B→B'}(ρ^{AB}) .    (3.68)

Therefore, if E represents a physical evolution, then both E and id^A ⊗ E must take density
matrices to density matrices. In particular, the linear map id ⊗ E ∈ L(AB → AB') must
also be a positive map for any system A. It turns out that there are linear maps E that are
positive while idA ⊗ E is not positive. One such example is the transposition map.
Consider the linear map T ∈ L(A → A) defined by
The transpose map preserves the eigenvalues and therefore is a trace-preserving positive
map. Now, take A = C2 and consider the matrix ΩAÃ := |ΩAÃ ⟩⟨ΩAÃ | ∈ L(C2 ⊗ C2 ). The
matrix ΩAÃ is a rank one, positive semidefinite matrix that can be expressed as
    Ω^{AÃ} = |00⟩⟨00| + |00⟩⟨11| + |11⟩⟨00| + |11⟩⟨11| = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix} .    (3.70)
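The failure of complete positivity for the transpose map can be seen numerically: applying the transpose to one half of Ω^{AÃ} (the partial transpose) produces a matrix with a negative eigenvalue. The following is a quick sketch (ours, assuming NumPy).

```python
import numpy as np

Omega = np.zeros((4, 4))
Omega[0, 0] = Omega[0, 3] = Omega[3, 0] = Omega[3, 3] = 1.0   # the matrix in (3.70)

# Partial transpose: transpose the second-qubit indices only.
PT = Omega.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)
print(np.linalg.eigvalsh(PT))        # eigenvalues -1, 1, 1, 1: (id ⊗ T)(Ω) is not ⩾ 0
```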
Complete Positivity
Definition 3.4.1. A linear map E ∈ L(A → B) is called k-positive if idk ⊗ E is
positive, where idk ∈ L(Ck → Ck ) is the identity map. Furthermore, E is called
completely positive (CP) if it is k-positive for all k ∈ N.
From Exercise 3.4.15 in the next few subsections it follows that E (k) is k-positive, and yet, if
k < d then E (k) is not (k + 1)-positive. If k ⩾ d then this map is completely positive. More
generally, we will see later that a map is d-positive if and only if it is completely positive.
In conclusion, any evolution of a physical system has to be (1) linear, (2) trace-preserving
(TP), and (3) completely positive (CP). Such a linear CPTP map is called a quantum
channel. The set of all quantum channels in L(A → B) will be denoted by CPTP(A →
B). In the next subsections we discuss several representations of quantum channels, and
along the way show that any quantum channel has a physical realization; that is, it can
be implemented by physical processes. Therefore, the axiomatic approach above led us to
the precise conditions on the evolution of a physical system that are both necessary and
sufficient for the existence of its physical realization.
Exercise 3.4.3. Let Λ ∈ L(A) and define a map FΛ : L(A) → L(A) via
Show that F_Λ is a completely positive linear map. Hint: Prove first that F_Λ^{A→A}(ψ^{RA}) ⩾ 0
for a pure state |ψ^{RA}⟩ = M ⊗ I^A |Ω^{ÃA}⟩.
Definition 3.4.2. Let E ∈ L(A → B) be a linear map. Its dual or adjoint map is
the map E ∗ ∈ L(B → A) that satisfies
The definition above of the dual map is analogous to the definition of a dual map in
L(A, B). Recall that the dual of a matrix M ∈ L(A, B) is defined via the relation
⟨ϕ|M ψ⟩ = ⟨M ∗ ϕ|ψ⟩ ∀ |ψ⟩ ∈ A, |ϕ⟩ ∈ B . (3.75)
Similarly, the definition of E ∗ in (3.74) can be expressed as
⟨σ, E(ρ)⟩HS = ⟨E ∗ (σ), ρ⟩HS ∀ ρ ∈ L(A), σ ∈ L(B) , (3.76)
where ⟨·, ·⟩HS is the Hilbert-Schmidt inner product.
Exercise 3.4.4. Let E ∈ L(A → B) be a linear map.
1. Show that E is trace preserving if and only if its dual E ∗ is unital; i.e. E ∗ (I B ) = I A .
2. Show that E is trace non-increasing (i.e., Tr[E(η)] ⩽ Tr[η] for all η ∈ Pos(A)) if and
only if its dual E ∗ is sub-unital; i.e. E ∗ (I B ) ⩽ I A .
3. Show that E is positive if and only if E ∗ is positive.
4. Show that E is completely positive if and only if E ∗ is completely positive.
Exercise 3.4.5. Show that a unitary evolution is a CPTP map. Specifically, show that a
unitary map U ∈ L(A → A) defined by
U(ρ) := U ρU ∗ ∀ ρ ∈ L(A) (3.77)
where U ∈ U(A) is a unitary operator, is a quantum channel.
Exercise 3.4.6. The replacement map is a map E ∈ L(A → B) defined by
E(ρ) := Tr[ρ] σ ∀ ρ ∈ L(A) (3.78)
where σ ∈ D(B) is some fixed density matrix.
1. Show that E is a quantum channel.
2. Show that for any two quantum states ρ ∈ D(A) and σ ∈ D(B) there exists a quantum
channel E such that E(ρ) = σ.
Exercise 3.4.7. Let A and B be two finite dimensional Hilbert spaces, and denote the set
of positive maps in L(A → B) by
n o
Pos(A → B) := E ∈ L(A → B) : E(ρ) ∈ Pos(B) ∀ ρ ∈ Pos(A) (3.79)
Denote also by R ∈ CPTP(A → B) the replacement channel R(ρ^A) := Tr[ρ^A] \frac{1}{|B|} I^B for all
ρ ∈ L(A).
1. Show that Pos(A → B) is a convex cone in the Hilbert space L(A → B).
2. Prove the equivalence of the following properties of a map E ∈ Pos(A → B):
(a) E belongs to the interior of the cone Pos(A → B).
(b) E = (1 − t)F + tR for some t ∈ (0, 1] and some F ∈ Pos(A → B).
(c) E ∗ belongs to the interior of the cone Pos(A → B).
(d) For any non-zero ρ ∈ Pos(A) we have E(ρ) > 0.
with r_ρ := (r_1, . . . , r_{m^2})^T ∈ C^{m^2}. Observe that the mapping ρ ↦ r_ρ defines an isomorphism
between L(A) and C^{m^2}. Similarly, for any σ = \sum_{y∈[n^2]} s_y Γ_y ∈ L(B) we define
s_σ := (s_1, . . . , s_{n^2})^T. Finally, for any linear map E ∈ L(A → B) we define the n^2 × m^2 matrix
M_E through the relation

    σ = E(ρ)  ⟺  s_σ = M_E r_ρ .    (3.82)
element). With these choices, an operator ρ ∈ L(A) has trace one if and only if r_ρ has
the form (\frac{1}{\sqrt{m}}, r_2, . . . , r_{m^2})^T. We therefore conclude that the linear map E is both trace
preserving and Hermitian preserving if and only if its matrix representation has the form

    M_E = \begin{pmatrix} \sqrt{m/n} & 0 \\ t & N_E \end{pmatrix}    where t ∈ R^{n^2−1} and N_E ∈ R^{(n^2−1)×(m^2−1)} .    (3.84)
What are the conditions on the matrix NE and the vector t that correspond to the condition
that E is completely positive? These conditions can be very complicated since, in general,
even the set of all vectors rρ for which ρ ⩾ 0 doesn’t have a simple characterization.
Exercise 3.4.8. Let E ∈ L(A → B) be a linear map.
1. Show that
    M_{E^*} = M_E^* .    (3.85)
From the relation above, E is a positive map if the matrices t and NE are such that whenever
|r| ⩽ 1, |s| ⩽ 1 also holds. It’s important to note, however, that this criterion pertains
solely to the positivity of E and does not address its complete positivity. In the specific
scenario of qubits, it is feasible to articulate the conditions governing NE and t for E to be
completely positive. Nevertheless, these conditions tend to be rather intricate, and we direct
the interested reader to the pertinent literature found in the Notes and References section at
the end of this chapter. In the exercises below, you will demonstrate that these conditions
become more straightforward in the case of doubly-stochastic maps.
Doubly stochastic maps encompass mappings that possess two key properties: trace-
preservation and unitality, meaning they preserve the identity operator. One of the simplest
examples of such maps is the unitary map U(ρ) := UρU^*, where U is a unitary matrix.
Another example includes convex combinations of unitary maps of the form \sum_{x∈[m]} p_x U_x,
where {px }x∈[m] forms a probability distribution, and each Ux represents a unitary quantum
channel. These instances exemplify completely positive maps that also qualify as doubly
stochastic.
Conversely, the transpose map T (ρ) = ρT or its combination with a unitary map, denoted
as U ◦ T , serve as examples of doubly stochastic maps that are positive but not completely
positive. In the subsequent set of problems, we will see that for the qubit case, all positive
doubly stochastic maps can be expressed as convex combinations of such maps.
Exercise 3.4.9. Let E ∈ Pos (C2 → C2 ) be a positive linear map.
1. Show that E is both trace-preserving and unital (i.e. doubly-stochastic) if and only if
    M_E = \begin{pmatrix} 1 & 0 \\ 0 & N_E \end{pmatrix}    (3.87)
where NE ∈ R3×3 has the property that ∥NE r∥2 ⩽ 1 for all r ∈ R3 with ∥r∥2 = 1 (in
particular, the absolute value of the eigenvalues of NE cannot exceed one).
2. Suppose E = U is the doubly stochastic unitary map given by U(ρ) = U ρU ∗ , where
U ∈ SU (2) can be expressed as U = wI2 + i(xσ1 + yσ2 + zσ3 ) with w, x, y, z ∈ R and
w2 + x2 + y 2 + z 2 = 1 (cf. (C.10)). Show that the matrix NU of (3.87) is an orthogonal
matrix in SO(3) given by
    N_U = \begin{pmatrix} 1−2y^2−2z^2 & 2xy+2zw & 2xz−2yw \\ 2xy−2zw & 1−2x^2−2z^2 & 2yz+2xw \\ 2xz+2yw & 2yz−2xw & 1−2x^2−2y^2 \end{pmatrix}    (3.88)
(cf. (C.7)). Hint: Calculate directly the components \frac{1}{2} Tr[σ_i U σ_j U^*] for i, j ∈ {1, 2, 3}.
3. Use the previous parts to show that for any map of the form U(ρ) = U ρU ∗ , we have
that U ∈ U (2) if and only if the matrix NU ∈ SO(3). Hint: Every unitary U ∈ U (2)
can be written as U = exp(iθ)Ũ , where Ũ ∈ SU (2) and θ ∈ [0, 2π).
Exercise 3.4.10. Let T ∈ L (C2 → C2 ) be the transpose map defined by T (ρ) = ρT for all
ρ ∈ L(C2 ).
1. Show that the matrix representation of the transpose map with respect to the Pauli basis
of L(C2 ) is given by
    M_T = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & −1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}    (3.89)
2. Show that if E ∈ L(C2 → C2 ) is a positive doubly stochastic linear map with NE ∈ O(3)
and with det(NE ) = −1 then E = T ◦ U for some unitary map U. Hint: Use the fact
that any 3 × 3 orthogonal matrix can be expressed as a matrix product of an element
in SO(3) with NT .
Exercise 3.4.11. Use the exercises above in conjunction with Exercise A.3.3 to conclude
that any doubly stochastic positive map E ∈ Pos(C2 → C2 ) can be expressed as
where t ∈ [0, 1] and both N_1 and N_2 are mixtures of unitary maps; i.e. maps of the form
\sum_{j∈[m]} p_j U_j with each U_j being a unitary map and {p_j}_{j∈[m]} is a probability distribution.
Hint: Use Exercise A.3.3 to show that NE can be expressed as a finite convex combination
of orthogonal matrices.
Størmer-Woronowicz Theorem
Theorem 3.4.1. Let E ∈ Pos(A → B) be a positive linear map. If |A| = 2 and
|B| ⩽ 3 then there exists two CP maps N1 , N2 ∈ CP(A → B) such that
E = N1 + T ◦ N2 (3.91)
Remark. The case |B| = 2 was proven by Størmer, and the case |B| = 3 was proven by
Woronowicz. Here, we will only prove Størmer theorem (i.e. |A| = |B| = 2) and refer the
reader to the section ‘Notes and References’ (at the end of this chapter) for more details.
Proof. We prove the theorem for the case that E is in the interior of Pos(A → A) (the
more general case will then follow from a continuity argument; see Exercise 3.4.13). From
Exercise 3.4.7 it follows that also E ∗ is in the interior of Pos(A → A), and furthermore,
E(ρ) > 0 for any non-zero ρ ∈ Pos(A).
The key idea of the proof is to find two positive definite operators Λ, Γ > 0 with the
property that the channel
D := FΛ ◦ E ◦ FΓ (3.92)
is doubly stochastic, where
From Exercise 3.4.3 (see also the section on operator sum representation below) it follows
that the above maps are completely positive. A priori it is not clear whether such positive definite
matrices Λ and Γ exist, but if they do then from (3.92) we have E = F_{Λ^{−1}} ◦ D ◦ F_{Γ^{−1}}, and
since all doubly stochastic positive maps have the form (3.91) (see Exercise 3.4.11) it follows
that also E has the form (3.91). It is therefore left to show that such Λ and Γ do exist.
By definition, the channel D is doubly stochastic if and only if both D and its dual D∗
are unital channels. Since the dual of D is given by D∗ = FΓ ◦ E ∗ ◦ FΛ (see Exercise 3.4.12)
we conclude that D is doubly stochastic if and only if the matrices Λ and Γ satisfy
By conjugating with the inverses of Λ and Γ, the two equations above can be expressed as
It is therefore left to show that there exists ρ := Λ−2 > 0 and σ := Γ2 > 0 such that
Observe that if ρ and σ satisfy the two equations above then for any s > 0 also sρ and sσ
satisfy the two equations. Hence, without loss of generality we can assume that if there exist
ρ and σ that satisfy the equations above then σ is normalized, and since E is trace preserving
this implies that also ρ := E(σ) is normalized.
The equation ρ = E(σ) can be taken to be the definition of ρ. Substituting this ρ into
the second equality of (3.96) implies that
    σ^{−1} = E^*\big(E(σ)^{−1}\big) .    (3.97)
To show that such a σ exists, define the function f : D(A) → Pos(A) via
    f(ω) := \Big(E^*\big(E(ω)^{−1}\big)\Big)^{−1}    ∀ ω ∈ D(A) ,    (3.98)
and observe that (3.97) is equivalent to f (σ) = σ. We also define the normalized version of
f , the function g : D(A) → D(A), as
    g(ω) := \frac{f(ω)}{Tr[f(ω)]}    ∀ ω ∈ D(A) .    (3.99)
Then, from Brouwer’s fixed-point theorem (see Theorem A.10.1) there exists a density matrix
σ ∈ D(A) such that g(σ) = σ. Denoting by t := Tr[f (σ)] > 0 this is equivalent to
f (σ) = tσ . (3.100)
It is therefore left to show that t = 1. For this purpose, observe first that with the definition
ρ := E(σ) we can express the above equation as
    (tσ)^{−1} = E^*(ρ^{−1}) ,    (3.101)

so that

    t^{−1} I = σ^{1/2} E^*(ρ^{−1}) σ^{1/2} = D^*(I) ,    (3.102)
where D is defined in (3.92) with Λ = ρ−1/2 and Γ = σ 1/2 . On the other hand, the relation
ρ = E(σ) can be written as
    I = ρ^{−1/2} E(σ) ρ^{−1/2} = D(I)    (3.103)
which implies that the linear map D is unital. Since the dual of a unital map is trace
preserving (see the first part of Exercise 3.4.4) we conclude that D∗ is trace preserving.
Hence, by taking the trace on both sides of (3.102) and using the fact that D∗ is trace
preserving we conclude that t = 1. This completes the proof.
    D^* = F_Γ ◦ E^* ◦ F_Λ .    (3.104)
Exercise 3.4.13. Use a continuity argument to prove that if all maps in the interior of
Pos(A → A) have the form (3.91) then all the maps in Pos(A → A) have this form.
One of the key properties of the Choi matrix is that it satisfies the relation

    ⋯ = \sum_{x,y∈[m]} r_{xy} E(|x⟩⟨y|)
                                                  (3.108)
      = E\Big(\sum_{x,y∈[m]} r_{xy} |x⟩⟨y|\Big) = E(ρ) .
The two relations (3.106,3.107) demonstrate that the mapping E 7→ JE is a linear bijection
(i.e. an isomorphism). This isomorphism is between the vector space L(A → B) of linear
operators from L(A) to L(B), and the space of bipartite matrices/operators L(AB). In the
following exercise you show that the mapping E ↦ J_E is in fact an isometric isomorphism
between these two spaces.
Exercise 3.4.14. Let A and B be two finite dimensional Hilbert spaces and consider the
Hilbert space L(A → B) equipped with the inner product defined in (3.64). Show that this
inner product can be expressed as follows. For all E, F ∈ L(A → B) we have
where on the right-hand side we have the Hilbert-Schmidt inner product between the two Choi
matrices of E and F.
Proof. If E is completely positive then by definition JEAB := E Ã→B (ΩAÃ ) ⩾ 0. Suppose now
that JEAB ⩾ 0. Let k ∈ N, and |ψ RA ⟩ ∈ Ck ⊗ Cd , where R is a k-dimensional (reference)
system. Recall that any bipartite vector |ψ⟩RA can be expressed as
Exercise 2.3.10→ ⩾ 0 .
since each term in the sum is positive semidefinite. This completes the proof.
Theorem 3.4.3. A linear map E ∈ L(A → B) is trace preserving if and only if the
marginal state JEA := TrB JEAB = I A .
Proof. Suppose E is trace preserving and set m := |A|. Then, from (3.106)
    Tr_B[J_E^{AB}] = \sum_{x,y∈[m]} |x⟩⟨y|\, Tr[E(|x⟩⟨y|)]

    E is trace-preserving →  = \sum_{x,y∈[m]} |x⟩⟨y|\, Tr[|x⟩⟨y|]    (3.113)

                             = \sum_{x,y∈[m]} |x⟩⟨y|\, δ_{xy} = I^A .
We therefore conclude that a linear map E is a quantum channel if and only if its Choi
matrix JEAB ⩾ 0, and its marginal JEA = I A . In particular, the Choi matrix has trace |A| so
1
that |A| JEAB ∈ D(A ⊗ B). Hence, the Choi representation reveals that quantum channels can
be represented with bipartite quantum states. This equivalence between quantum channels
and bipartite quantum states is used very often in quantum information science.
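The two conditions just derived (J_E^{AB} ⩾ 0 and Tr_B[J_E^{AB}] = I^A) give a simple numerical test for whether a given map is a quantum channel. Below is a sketch (ours, assuming NumPy; the helper names and the example channels are hypothetical) that builds the Choi matrix of a map given as a function on matrices and checks both conditions.

```python
import numpy as np

def choi(channel, dA):
    """Choi matrix J = sum_{x,y} |x><y| ⊗ E(|x><y|) of a map given as a matrix function."""
    blocks = [[channel(np.outer(np.eye(dA)[x], np.eye(dA)[y])) for y in range(dA)]
              for x in range(dA)]
    return np.block(blocks)

def is_channel(channel, dA, dB, tol=1e-10):
    J = choi(channel, dA)
    positive = np.linalg.eigvalsh(J).min() >= -tol                     # J_E ⩾ 0
    marginal = np.einsum('ikjk->ij', J.reshape(dA, dB, dA, dB))        # Tr_B[J_E]
    return positive and np.allclose(marginal, np.eye(dA))

# Example: a qubit channel mixing the identity with a bit flip (CPTP), and the transpose (not CP).
X = np.array([[0, 1], [1, 0]], dtype=complex)
E = lambda rho: 0.7*rho + 0.3*(X @ rho @ X)
print(is_channel(E, 2, 2))                    # True
print(is_channel(lambda rho: rho.T, 2, 2))    # False
```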
Exercise 3.4.15. Show that the linear map, E (k) , defined in (3.72), with k < d, is k-positive
but not (k + 1)-positive.
Exercise 3.4.16. Let E ∈ L(A → A) be a quantum channel with the property that E(U ρU ∗ ) =
U E(ρ)U ∗ for any unitary matrix U ∈ U(A). Show that its Choi matrix JEAÃ must satisfy
    (Ū ⊗ U)\, J_E^{AÃ}\, (Ū ⊗ U)^* = J_E^{AÃ}    (3.115)
for all unitary matrices U .
Exercise 3.4.17. Show that any density matrix ρ ∈ D(AB) can be expressed as
    ρ^{AB} = E^{Ã→B}(ψ^{AÃ})    (3.116)
We now show that the mapping ρ ↦ \sum_{x∈[m]} M_x ρ M_x^* is a quantum channel and that every
quantum channel can be realized in this way. This representation of a quantum channel is
called the operator sum representation, and the elements {M_x}_{x∈[m]} (with \sum_{x∈[m]} M_x^* M_x = I^A)
are called the Kraus operators of the channel.
Proof. Suppose E is a quantum channel. Since the Choi matrix of a quantum channel is
positive semidefinite we can always express it as
    J_E^{AB} = \sum_{x∈[m]} |ψ_x^{AB}⟩⟨ψ_x^{AB}|    (3.121)
for some integer m and some (possibly unnormalized) vectors |ψxAB ⟩ ∈ A ⊗ B. Recall that
any bipartite state |ψxAB ⟩ can be expressed as
where Mx ∈ L(A, B) is a linear operator. Moreover, since the marginal Choi matrix JEA = I A
we get
    I^A = Tr_B[J_E^{AB}] = \sum_{x∈[m]} Tr_B\big[(M_x^T ⊗ I^B)\, Ω^{B̃B}\, ((M_x^*)^T ⊗ I^B)\big]
                                                                                                (3.123)
        = \sum_{x∈[m]} M_x^T\, Tr_B[Ω^{B̃B}]\, (M_x^*)^T = \sum_{x∈[m]} M_x^T (M_x^*)^T
To prove the converse, suppose that E has the form (3.120). Then, clearly the Choi matrix
J_E^{AB} := E^{Ã→B}(Ω^{AÃ}) has the form (3.121) with |ψ_x^{AB}⟩ := (I^A ⊗ M_x)|Ω^{AÃ}⟩. Hence, J_E^{AB} ⩾ 0 and
J_E^A = I^A since \sum_{x∈[m]} M_x^* M_x = I^A. This completes the proof.
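The construction in the proof can be run numerically: decomposing J_E into (unnormalized) pure states and reshaping each vector yields a Kraus representation. The following sketch (ours, assuming NumPy; function names hypothetical) uses the eigendecomposition of the Choi matrix, which additionally produces Hilbert–Schmidt orthogonal Kraus operators.

```python
import numpy as np

def kraus_from_choi(J, dA, dB, tol=1e-10):
    vals, vecs = np.linalg.eigh(J)
    kraus = []
    for lam, v in zip(vals, vecs.T):
        if lam > tol:
            # |ψ> = (I^A ⊗ M)|Ω^{AÃ}> has components ψ_{x j} = M_{j x}, so reshape and transpose.
            kraus.append(np.sqrt(lam) * v.reshape(dA, dB).T)
    return kraus

# Round trip on a simple qubit channel mixing the identity with a bit flip.
X = np.array([[0, 1], [1, 0]], dtype=complex)
E = lambda rho: 0.7*rho + 0.3*(X @ rho @ X)
J = np.block([[E(np.outer(np.eye(2)[x], np.eye(2)[y])) for y in range(2)] for x in range(2)])
Ms = kraus_from_choi(J, 2, 2)
rho = np.array([[0.6, 0.2], [0.2, 0.4]], dtype=complex)
assert np.allclose(sum(M @ rho @ M.conj().T for M in Ms), E(rho))
```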
1. Show that there exist two sets of matrices {M_x}_{x∈[m]} and {N_x}_{x∈[m]} such that

    E(ρ) = \sum_{x∈[m]} M_x ρ N_x^* .    (3.127)

Hint: Start by showing that it is possible to express the complex Choi matrix as J_E^{AB} = \sum_{x∈[m]} |ψ_x⟩⟨ϕ_x|, and then follow similar lines as in the proof above.
3. Show that E ∈ L(A → B) is completely positive if and only if its dual map E ∗ ∈ L(B →
A) is completely positive.
1. The sets {Mx }x∈[m] and {Ny }y∈[n] constitute two operator sum representations
of the same quantum channel.
so that {Ny }y∈[n] is a generalized measurement. Similarly, for any ρ ∈ L(A) we have
    \sum_{y∈[n]} N_y ρ N_y^* = \sum_{x,x'∈[m]} \sum_{y∈[n]} v_{yx} v̄_{yx'} M_x ρ M_{x'}^*
                                                                                        (3.131)
    V^*V = I_m →  = \sum_{x,x'∈[m]} δ_{xx'} M_x ρ M_{x'}^* = \sum_{x∈[m]} M_x ρ M_x^* .
Hence, the sets {Mx }x∈[m] and {Ny }y∈[n] are two operator sum representations of the same
quantum channel.
Next, we prove the implication 1 ⇒ 2. From the assumption we have in particular that
for every ψ ∈ Pure(A) we have
    ρ := \sum_{x∈[m]} M_x|ψ⟩⟨ψ|M_x^* = \sum_{y∈[n]} N_y|ψ⟩⟨ψ|N_y^* .    (3.132)
Since both {Mx |ψ⟩}x∈[m] and {Ny |ψ⟩}y∈[n] form an unnormalized pure-state decomposition
of the same density matrix ρ, we get from Exercise 2.3.15 that there exists an n×m isometry
matrix V = (vyx ) such that for all y ∈ [n]
    N_y|ψ⟩ = \sum_{x∈[m]} v_{yx} M_x|ψ⟩ .    (3.133)
Since the above equality holds for all ψ ∈ Pure(A) the relation (3.129) must hold. This
completes the proof.
Exercise 3.4.20. Show that for every quantum channel E ∈ CPTP(A → B) there exists an
operator-sum representation with no more than |AB| elements.
    0 = ⟨ψ_x^{AB}|ψ_{x'}^{AB}⟩ = ⟨Ω^{AÃ}|M_x^* M_{x'} ⊗ I^{Ã}|Ω^{AÃ}⟩ = Tr[M_x^* M_{x'}] .    (3.134)
That is, the Kraus operators are also orthogonal in the Hilbert-Schmidt inner product. We
therefore arrived at the following corollary.
In particular, there are always operator sum representations with no more than |AB| Kraus
operators.
Exercise 3.4.21. Let {Mx }x∈[m] be a canonical Kraus decomposition of E ∈ CPTP(A → B).
Show that for any m × m unitary matrix U = (uyx ), also {Ny }y∈[n] with
    N_y := \sum_{x∈[m]} u_{yx} M_x ,    (3.136)
    U(ρ^A ⊗ |0⟩⟨0|^E)U^* ;    U^*U = I^{AE} .    (3.137)
Finally, the environment system is traced out yielding the final state
We show now that every quantum channel can be realized in this way, giving a new
interpretation for quantum channels as joint unitary evolutions on the system plus environ-
ment. Recall that typically, the degrees of freedom of the environment are not accessible
and therefore they are traced out at the end of the process.
    E(ρ^A) := Tr_E[V ρ^A V^*]    ∀ ρ ∈ L(A) .    (3.139)
Remark. The theorem above is an adaptation of Stinespring Dilation Theorem to the finite
dimensional case.
Proof. Suppose E has the form (3.139). To show that E is a quantum channel we denote by
{|ϕ_z^E⟩}_{z∈[k]} an orthonormal basis of E, where k := |E|, and by M_z := ⟨ϕ_z^E|V. By definition,
for every z ∈ [k], M_z : A → B, and from (3.139) we get

    E(ρ^A) = \sum_{z∈[k]} ⟨ϕ_z^E|V ρ^A V^*|ϕ_z^E⟩ = \sum_{z∈[k]} M_z ρ M_z^* .    (3.140)
Hence, there exists an isometry V : A → BE such that (3.139) holds. This completes the
proof.
Exercise 3.4.22. Consider the isometry V : A → BE as expressed in (3.142), where
each Mz : A → B. Let {|x⟩A }x∈[m] be an orthonormal basis of A, and {|ψxBE ⟩}x∈[m] be an
orthonormal set of vectors in BE, such that
    V = \sum_{x∈[m]} |ψ_x^{BE}⟩⟨x|^A .    (3.145)

    M_z = \sum_{x∈[m]} |ϕ_{z|x}^B⟩⟨x|^A .    (3.147)
Exercise 3.4.23. Show that a linear map E ∈ L(A → A) is a quantum channel if and only
if there exists an environment system E and a unitary matrix U : AE → AE such that E
has the form (3.138). Hint: Complete the isometry V in the above theorem into a unitary
operator.
Note that in the proof above we defined the isometry V in (3.142) using the Kraus
operators. Therefore, Eq. (3.142) provides a direct relationship between the Stinespring
representation and the operator sum representation. Moreover, the operator sum represen-
tation is directly related to the Choi representation via the relationship in Eqs.(3.121,3.122).
Therefore, together with (3.142) we can establish a direct relationship among all three rep-
resentations. We will use these relationships quite often in the next sections.
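These relationships are easy to exercise numerically. The sketch below (ours, assuming NumPy; helper names hypothetical) assembles the Stinespring isometry V = Σ_z M_z ⊗ |z⟩^E of (3.142) from a Kraus representation and checks that Tr_E[VρV^*] reproduces the operator-sum action.

```python
import numpy as np

def stinespring(kraus):
    dB, dA = kraus[0].shape
    k = len(kraus)                                   # environment dimension |E|
    V = np.zeros((dB*k, dA), dtype=complex)
    for z, M in enumerate(kraus):
        V[z::k, :] = M                               # row (j, z) ↦ j*k + z, i.e. B ⊗ E ordering
    return V

def apply_channel(kraus, rho):
    return sum(M @ rho @ M.conj().T for M in kraus)

# Example: the completely dephasing qubit channel, M_1 = |0><0|, M_2 = |1><1|.
kraus = [np.diag([1.0, 0.0]).astype(complex), np.diag([0.0, 1.0]).astype(complex)]
V = stinespring(kraus)
assert np.allclose(V.conj().T @ V, np.eye(2))        # V is an isometry
rho = np.array([[0.6, 0.1], [0.1, 0.4]], dtype=complex)
out = V @ rho @ V.conj().T                           # state on B ⊗ E
dB, k = 2, len(kraus)
traced = np.einsum('ikjk->ij', out.reshape(dB, k, dB, k))      # Tr_E
assert np.allclose(traced, apply_channel(kraus, rho))
```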
    W = (I^B ⊗ U^E) V .    (3.148)
Therefore, {Nz }z∈[k] also forms an operator sum representation of E. Observe in particular
that the property that E is trace preserving implies that
that the property that E is trace preserving implies that

    I^A = \sum_{z∈[k]} N_z^* N_z

    N_z := ⟨ϕ_z^E|W →  = \sum_{z∈[k]} W^* |ϕ_z^E⟩⟨ϕ_z^E| W    (3.151)

    \sum_{z∈[k]} ϕ_z^E = I^E →  = W^* W .
Thus, W is an isometry. Given that both {M_z}_{z∈[k]} and {N_z}_{z∈[k]} are operator-sum repre-
sentations of E, Theorem 3.4.5 implies the existence of a k × k unitary matrix U = (u_{zw}),
such that for every z ∈ [k]

    N_z = \sum_{w∈[k]} u_{zw} M_w .    (3.152)
Thus, we conclude
    W = (I^B ⊗ U^E) \sum_{w∈[k]} M_w ⊗ |ϕ_w^E⟩
                                                  (3.155)
    (3.142) →  = (I^B ⊗ U^E) V .
This concludes the proof.
Note that this channel has the unique property that for any unitary matrix U ∈ L(C2 )
Exercise 3.5.1. Show that the depolarizing channel is indeed a quantum channel, by showing
that it has an operator sum representation that is given in terms of the following four Kraus
operators:

    M_0 = \sqrt{1 − \frac{3p}{4}}\, I    and for j ∈ [3] ,    M_j = \frac{\sqrt{p}}{2} σ_j ,    (3.160)
where σ1 , σ2 , and σ3 , are the three Pauli matrices.
From the exercise above it also follows that the normalized version of the Choi matrix of
the depolarizing channel has the form
    \frac{1}{2} J_E^{AB} = E^{Ã→B}(Φ_+^{AÃ}) = \Big(1 − \frac{3p}{4}\Big) Φ_+^{AB} + \frac{p}{4}\Big(Φ_−^{AB} + Ψ_+^{AB} + Ψ_−^{AB}\Big) .    (3.161)
The state above is known (up to local unitary) as the 2-qubit isotropic state and is used
quite often in quantum information as it has several interesting properties. We will discuss
it in more details later on.
The Stinespring isometry of the depolarizing channel can also be computed from (3.142)
and the exercise above; it is given by
    V = \sum_{j=0}^{3} M_j ⊗ |j⟩^E = \sqrt{1 − \frac{3p}{4}}\, I ⊗ |0⟩ + \frac{\sqrt{p}}{2} \sum_{j∈[3]} σ_j ⊗ |j⟩    (3.162)
where σ1 , σ2 , and σ3 , are the three Pauli matrices. Interestingly, the following exercise shows
that the Bloch representation is in some sense the simplest representation of the depolarizing
channel.
Exercise 3.5.2. Let ρ = 12 (I + r · σ) and ρ′ = 12 (I + r′ · σ) be two Bloch representations of
two quantum states, and let E be the depolarizing channel (3.158). Show that if ρ′ = E(ρ)
then r′ = (1 − p)r.
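This contraction of the Bloch vector is simple to confirm numerically. The sketch below (ours, assuming NumPy, and assuming the standard form E(ρ) = (1 − p)ρ + p Tr[ρ] I/2 of the depolarizing channel in (3.158)) checks that r′ = (1 − p)r.

```python
import numpy as np

p = 0.37
paulis = [np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]]),
          np.array([[1, 0], [0, -1]], dtype=complex)]

def depolarize(rho, p):
    return (1 - p)*rho + p*np.trace(rho)*np.eye(2)/2

r = np.array([0.2, -0.5, 0.7])
rho = 0.5*(np.eye(2) + sum(rj*s for rj, s in zip(r, paulis)))
r_out = np.real([np.trace(depolarize(rho, p) @ s) for s in paulis])
assert np.allclose(r_out, (1 - p)*r)
```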
From the above representation of ∆, it is clear that the completely dephasing map is idem-
potent, that is, it satisfies
∆2 := ∆ ◦ ∆ = ∆ . (3.164)
Its Choi matrix is

    J_∆ = ∆^{Ã→Ã}(Ω^{AÃ}) = \sum_{x∈[m]} |x⟩⟨x| ⊗ |x⟩⟨x| .    (3.165)
where {|ϕ_x^E⟩}_{x∈[m]} are some normalized vectors in E (note that if {|ϕ_x^E⟩} is an orthonormal set then N = ∆).
3. Show that
∆◦N =N ◦∆=∆. (3.169)
∆B ◦ E ◦ ∆A = E , (3.171)
In other words, we get that q = T p as in (3.170), with p and q being the vectors whose
components are the diagonal elements of ρ and σ, respectively, and T = (ty|x ) being the
column stochastic matrix whose components are ⟨y|E(|x⟩⟨x|)|y⟩.
Exercise 3.5.4. Let {ρz }z∈[k] and {σz }z∈[k] be two sets of k diagonal density operators with
respect to two fixed bases {|x⟩}x∈[m] and {|y⟩}y∈[n] of A and B, respectively. Show that there
exists a quantum channel E ∈ CPTP(A → B) such that σ_z = E(ρ_z) for all z ∈ [k] if and only
if there exists a classical channel with the same property.
Note that
⟨y|E(ρ)|y⟩ = Tr [|y⟩⟨y|E(ρ)] = Tr [E ∗ (|y⟩⟨y|)ρ] := Tr [Λy ρ] (3.176)
where {Λy := E ∗ (|y⟩⟨y|)}y∈[n] are positive semidefinite matrices in Pos(A) satisfying
    \sum_{y∈[n]} Λ_y = \sum_{y∈[n]} E^*(|y⟩⟨y|) = E^*(I^B) = I^A ,    (3.177)
since the dual map of any CPTP map is unital (see Exercise 3.4.4). We therefore get that a
POVM channel has the form
    E^{A→B}(ρ^A) = \sum_{y∈[n]} Tr[Λ_y ρ^A]\, |y⟩⟨y|^B .    (3.178)
where m := |A|, px := ⟨x|ρ|x⟩ for all x ∈ [m], and each σx := E(|x⟩⟨x|) is a fixed density
matrix in D(B). We can therefore view E as the mapping x 7→ σx . The Choi matrix of a
cq-channel has the form
    J_E^{AB} = E^{Ã→B}(Ω^{AÃ}) = E^{Ã→B} ◦ ∆^{Ã→Ã}(Ω^{AÃ})

             = E^{Ã→B}\Big(\sum_{x∈[m]} |xx⟩⟨xx|^{AÃ}\Big)    (3.185)

             = \sum_{x∈[m]} |x⟩⟨x|^A ⊗ σ_x^B .
Therefore, E ∈ CPTP(A → B) is a cq-channel if and only if its Choi matrix JEAB is a cq-state.
Exercise 3.5.6. Let E ∈ CPTP(A → B) be a cq-channel as above, and set m := |A| and
n := |B|.
where {Λ_z}_{z∈[k]} ⊂ Pos(A) is a POVM with k outcomes, and {σ_z^B}_{z∈[k]} are k quantum states
in D(B). Clearly, the map E above is linear and trace preserving. Moreover, setting m := |A|
we get that its Choi matrix is given by

    J_E^{AB} = E^{Ã→B}(Ω^{AÃ}) = \sum_{x,x'∈[m]} |x⟩⟨x'| ⊗ E(|x⟩⟨x'|)

             = \sum_{z∈[k]} \sum_{x,x'∈[m]} Tr[Λ_z |x⟩⟨x'|]\, |x⟩⟨x'| ⊗ σ_z^B    (3.190)

             = \sum_{z∈[k]} Λ_z^T ⊗ σ_z^B ⩾ 0 .
Therefore, E, in (3.189) is a quantum channel. Observe that the Choi matrix above is
separable.
for some POVM {Λz }z∈[k] in Pos(A), and a classical system Z of dimension |Z| = k. Let
P ∈ CPTP(Z → B) be a preparation channel given as in (3.184) via
where σz ∈ D(B) are some density matrices. Then, the measurement-prepare channel E
given in (3.189) can be expressed as
Exercise 3.5.7. Let A, B and R, be three Hilbert spaces and let ∆ ∈ CPTP(R → R) be
the completely dephasing map with respect to some fixed basis of R. Show that for any two
quantum channels E ∈ CPTP(A → R) and F ∈ CPTP(R → B) the channel
is separable. Hint: Use (3.193) and consider the cq-state τ ZB := MA→Z (ρAB ).
The above exercise demonstrates that a measurement-prepare channel breaks the entan-
glement when applied to a subsystem of a composite bipartite system. Channels with this
property are called entanglement breaking channels. Therefore, measurement-prepare chan-
nels are entanglement breaking. It turns out that the converse is also true! That is, any
entanglement breaking channel can be represented as a measurement-prepare channel. For
this reason, we will use, depending on the context, both terms interchangeably.
Exercise 3.5.9. Show that any entanglement breaking channel is a measurement-prepare
channel. Hint: start by observing that the Choi matrix of any entanglement breaking channel
must be separable.
Exercise 3.5.10. Let E A→B be the quantum channel defined in (3.189). Show that its dual
is given by

    E^{*B→A}(η^B) = \sum_{z∈[k]} Tr[η^B σ_z^B]\, Λ_z^A .    (3.196)
Exercise 3.5.11. Let A and B be two Hilbert spaces, and let {|y⟩B }y∈[n] be an orthonormal
basis of B (with n := |B|). Show that the set of operators {My }y∈[n] ⊂ L(AB, A) given by
    M_y = I^A ⊗ ⟨y|^B ,    (3.199)

    V(ρ^A) = V ρ^A V^*    ∀ ρ ∈ L(A) .    (3.200)
Such an isometry channel can be viewed as an embedding of system A into B. Note that
like unitary channels, isometry channels have an operator sum representation with a single
Kraus operator.
Interestingly, isometry channels have inverses. Specifically, for any τ ∈ D(A) define

    V_τ^{−1}(σ^B) := V^* σ^B V + Tr\big[(I^B − VV^*) σ^B\big]\, τ^A    ∀ σ ∈ L(B) .    (3.201)
The linear map above is a quantum channel in CPTP(B → A) and it is an inverse of the
isometry channel V above (see the following Exercise).
Exercise 3.5.12. Show that for all τ ∈ D(A) the linear map Vτ−1 as defined above is a
channel in CPTP(B → A) that satisfies
Let D = (d_{yx}) be the m × m matrix whose components are d_{yx} := ⟨ϕ_y|E(|ψ_x⟩⟨ψ_x|)|ϕ_y⟩, and
note that d_{yx} ⩾ 0 for all x and y. Moreover,

    \sum_{x∈[m]} d_{yx} = ⟨ϕ_y|E(I)|ϕ_y⟩ = ⟨ϕ_y|I|ϕ_y⟩ = 1    and
                                                                    (3.205)
    \sum_{y∈[m]} d_{yx} = Tr[E(|ψ_x⟩⟨ψ_x|)] = Tr[|ψ_x⟩⟨ψ_x|] = 1 .
Hence, D is a doubly-stochastic matrix and (3.204) becomes q = Dp, where p and q are
the probability vectors consisting of the eigenvalues of ρ and σ, respectively.
where {Uw }w∈[k] is a set of k unitary matrices, and {tw }w∈[k] is a probability distribution.
This is the quantum version of a convex combination of permutation matrices. One can
implement such a quantum channel, for example, by rolling a dice and based on the outcome
w of the dice apply the evolution ρ 7→ Uw ρUw∗ . After forgetting the value of w, such a process
can be described by the equation above.
Exercise 3.5.14. Find the operator sum representation of the mixed-unitary channel (3.207).
One may wonder whether all unital channels can be expressed as mixed-unitary channels.
To answer this question, consider the following example given by Peter Shor (2010). Let
A = C3 and E ∈ CPTP(A → A) be the quantum channel
E(ω) = M1 ωM1∗ + M2 ωM2∗ + M3 ωM3∗ ∀ ω ∈ D(A) , (3.208)
where the Kraus operators
    M_z := \frac{|z⟩⟨z+1| + |z+1⟩⟨z|}{\sqrt{2}}    ∀ z ∈ [3] ,    (3.209)
with |4⟩ := |1⟩ (i.e. the summation in |z + 1⟩ is modulo 3).
Exercise 3.5.15. Show that the quantum channel E ∈ CPTP(A → A) as defined in (3.208)
and (3.209) is unital.
The unital channel E as defined in (3.208) and (3.209) is not a mixed-unitary channel.
To see√this, suppose by contradiction that E can be expressed as in (3.207). Then, {Mz }z∈[3]
and {\sqrt{t_w}\, U_w}_{w∈[k]} are operator sum representations of the same channel E. Therefore, there
exists a k × 3 isometry matrix V = (v_{wz}) such that

    \sqrt{t_w}\, U_w = \sum_{z∈[3]} v_{wz} M_z .    (3.210)
Now, taking all summations to be modulo 3, we have by definition that Mz Mz+1 = |z⟩⟨z + 2|
and Mz Mz+2 = |z + 1⟩⟨z + 2| (by definition Mz = Mz∗ ). Hence, the equation above implies
that
    v̄_{wz} v_{w(z+1)} = v̄_{wz} v_{w(z+2)} = 0    ∀ z ∈ [3] .    (3.212)
In other words, we must have v_{wz'} v_{wz} = 0 for all z ≠ z' ∈ [3]. This, in turn, implies that
v_{wz} = 0 for at least two values of z ∈ [3]. Hence, the relation (3.210) implies that U_w is
proportional to one of the three matrices {M_z}_{z∈[3]}, in contradiction with the fact that all
{M_z}_{z∈[3]} have rank two, whereas U_w has full rank. Therefore, the channel E is not a
mixed-unitary channel.
From the example above it follows that there is no quantum analogue of Birkhoff's theorem,
as there are unital channels that are not mixed-unitary channels. What is the distinction
between unital channels and mixed-unitary channels in the Choi representation? Consider a
unital quantum channel E ∈ CPTP(A → A), and let
where m := |A|, and U is some unitary matrix. Now, observe that the Choi matrix of the
mixed-unitary channel (3.207) is given by

    J_E^{AÃ} = E^{Ã→Ã}(Ω^{AÃ}) = \sum_{w∈[k]} t_w (I^A ⊗ U_w) Ω^{AÃ} (I^A ⊗ U_w^*) = m \sum_{w∈[k]} t_w |ϕ_w^{AÃ}⟩⟨ϕ_w^{AÃ}| ,    (3.216)

where each

    |ϕ_w^{AÃ}⟩ := \frac{1}{\sqrt{m}} (I^A ⊗ U_w^{Ã}) |Ω^{AÃ}⟩ ,    (3.217)

is a maximally entangled state.
2. Show that every mixed-unitary channel, such as in (3.207), with rational probabilities
{tw }w∈[k] , can be expressed as in (3.218). That is, there exists a system B and a joint
unitary matrix U AB such that the expression for E(ρ) in (3.218) becomes (3.207).
3. Determine if the Shor example above can be expressed in the form (3.218).
where ∆ ∈ CPTP(X → X) is the completely dephasing channel with respect to the classical
basis of X (see Fig. 3.220).
    N^{A→XB}(ω^A) = \sum_{x∈[m]} |x⟩⟨x|^X ⊗ N_x^{A→B}(ω^A) ,    (3.220)
where for each x ∈ [m] we define the linear map Nx ∈ L(A → B) via
Observe further that N^{A→B} is a quantum channel since it can be expressed as a composition
of the two quantum channels Tr_X and N^{A→XB}. Moreover, each N_x^{A→B} is a CP map. Indeed,
let J_N^{AXB} = N^{Ã→XB}(Ω^{AÃ}) be the Choi matrix of the quantum instrument N. Then the Choi
matrix of N_x is given by

    J_{N_x}^{AB} = Tr_X\big[(I^A ⊗ |x⟩⟨x|^X ⊗ I^B)\, J_N^{AXB}\big] ,    (3.223)

which is positive semidefinite since J_N^{AXB} ⩾ 0. We therefore conclude that {N_x^{A→B}}_{x∈[m]} are
trace non-increasing CP maps that sum up to a CPTP map.
1. Show that any operator sum representation {M_x}_{x∈[m]} of E satisfies \sum_{x∈[m]} M_x^* M_x ⩽ I.
2. Show that the marginal of the Choi matrix JEAB satisfies JEA := TrB JEAB ⩽ I A .
Exercise 3.5.18. Find an operator sum representation of the quantum instrument N A→XB
discussed above.
As an example, let N^{A→B} be the qubit depolarizing channel with |A| = |B| = 2. From
its Stinespring isometry in (3.162) we can get its complementary channel. For this purpose,
we denote by t_0 := \sqrt{1 − \frac{3p}{4}} and t_1 = t_2 = t_3 = \frac{\sqrt{p}}{2}, and by {σ_j}_{j=0}^{3} the four Pauli matrices
(including σ_0 := I). We therefore get

    N_c^{A→E}(ρ^A) = Tr_B[V ρ^A V^*] = \sum_{j,k=0}^{3} t_j t_k Tr[σ_j σ_k ρ]\, |j⟩⟨k|^E    ∀ ρ ∈ L(A) .    (3.226)
where for each x ∈ [m], P_x is the projector onto the eigenspace of λ_x. Note that P_H is indeed
a quantum channel since {Px }x∈[m] form an orthogonal set of projectors that sum to the
identity. Moreover, if m = |A| (i.e. H has |A| distinct eigenvalues) then PH = ∆, where ∆
is the completely dephasing quantum channel in the basis comprising of the eigenvectors of
H.
Exercise 3.5.20. Let H ∈ Herm(A) and ρ ∈ D(A). Show that ρ = PH (ρ) if and only if
[ρ, H] = 0.
Exercise 3.5.21. Let A be a quantum system and ρ, σ ∈ D(A) be two density matrices, with
σ = \sum_{y∈[n]} λ_y Π_y, where {Π_y}_{y∈[n]} form an orthogonal projective von-Neumann measurement
on system A, and {λy }y∈[n] is the set of distinct eigenvalues of σ. For each y ∈ [n], let
my := Tr[Πy ] be the multiplicity of the eigenvalue λy .
2. Show that there exists an orthonormal basis of A, denoted by {|ϕxy ⟩}x,y , with the prop-
erty that

    Π_y = \sum_{x∈[m_y]} ϕ_{xy}    and    P_σ(ρ) = \sum_{y∈[n]} \sum_{x∈[m_y]} r_{xy} ϕ_{xy}    (3.229)

for some r_{xy} ⩾ 0 with \sum_{y∈[n]} \sum_{x∈[m_y]} r_{xy} = 1.
4. Let ∆ ∈ CPTP(A → A) be the completely dephasing channel in the basis {|ϕxy ⟩}x,y .
Show that
∆(ρ) = Pσ (ρ) and ∆(σ) = σ . (3.231)
Exercise 3.5.22. Let A be a quantum system and ρ, σ ∈ D(A) be two density matrices
satisfying ρ ̸≪ σ (i.e. supp(ρ) ̸⊆ supp(σ)). Show that also
Pσ (ρ) ̸≪ σ . (3.232)
The following exercise demonstrates that the pinching channel is a special type of mixture of
unitaries.
Exercise 3.5.23. Let H ∈ Herm(A) be an observable with spec(H) = {λ_1, . . . , λ_m} as above,
and let P_H be its associated pinching channel as given in (3.227). Show that for any ρ ∈ L(A)

    P_H(ρ) = \frac{1}{m} \sum_{y∈[m]} U_y ρ U_y^*    where    U_y := \sum_{x∈[m]} e^{i\frac{2πxy}{m}} P_x .    (3.233)
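A quick numerical check (ours, assuming NumPy) of the identity in the exercise above: the pinching channel P_H(ρ) = Σ_x P_x ρ P_x coincides with the uniform mixture of the unitaries U_y.

```python
import numpy as np

H = np.diag([1.0, 1.0, 2.0, 5.0])       # an observable with m = 3 distinct eigenvalues
projs = [np.diag((np.diag(H) == lam).astype(float)) for lam in np.unique(np.diag(H))]
m = len(projs)

rho = np.random.rand(4, 4) + 1j*np.random.rand(4, 4)
rho = rho @ rho.conj().T
rho /= np.trace(rho)

pinch = sum(P @ rho @ P for P in projs)                 # P_H(ρ)
Us = [sum(np.exp(2j*np.pi*x*y/m)*P for x, P in enumerate(projs, start=1)) for y in range(1, m + 1)]
mix = sum(U @ rho @ U.conj().T for U in Us) / m
assert np.allclose(pinch, mix)                          # the identity (3.233)
```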
The above exercise demonstrates that the pinching channel is a mixed-unitary channel. Observe
also that for y = m we have U_m = I so that

    P_H(ρ) = \frac{1}{m} ρ + \frac{1}{m} \sum_{y∈[m−1]} U_y ρ U_y^* ⩾ \frac{1}{m} ρ .    (3.234)
Since m := |spec(H)| we get the following inequality known as the pinching inequality.
Since we got the inequality above by removing m−1 terms, it may give the impression that
this inequality is not very useful since it is never saturated and is not tight. However, we will
see that in applications, this inequality can be used to provide a good enough approximation
to PH (ρ) when we consider the asymptotic case in which H = σ ⊗n where σ is some quantum
state and n is a very large integer. In particular, we will see in Chapter 8 (particularly
Sec. 8.4.1) that in this case m = |spec(σ ⊗n )| grows polynomially with n. The fact that it is
not an exponential growth with n is one of the key reasons why the pinching inequality is
quite useful.
The pinching map can also be used to prove the reverse Hölder inequality with p ∈ (0, 1).
In this case, we still use the notation ∥M∥_p := (Tr[|M|^p])^{1/p} for any M ∈ L(A). However,
one has to be careful since for p ∈ (0, 1), ∥·∥_p is not a norm.

    Tr[ωτ] ⩾ \frac{∥τ∥_p}{∥ω^{−1}∥_{\frac{p}{1−p}}}    (3.236)
Proof. We first prove the theorem for the case that τ and ω commute. In this case,

    ∥τ∥_p^p = Tr[τ^p] = Tr[τ^p ω^p ω^{−p}] ⩽ ∥τ^p ω^p∥_{\frac{1}{p}}\, ∥ω^{−p}∥_{\frac{1}{1−p}} = (Tr[τω])^p\, ∥ω^{−p}∥_{\frac{1}{1−p}} .

Taking both sides to the power 1/p, and using ∥ω^{−p}∥_{\frac{1}{1−p}}^{1/p} = ∥ω^{−1}∥_{\frac{p}{1−p}}, gives (3.236).
This proves the theorem for the case that ω and τ commute. On the other hand, from
Exercise 3.5.21 we get that P_ω(τ) and ω commute, so that

    Tr[τω] = Tr[P_ω(τ)\, ω] ⩾ \frac{∥P_ω(τ)∥_p}{∥ω^{−1}∥_{\frac{p}{1−p}}}    (3.238)
Finally, since t ↦ t^p is operator concave for p ∈ (0, 1) (see Table B.1) we conclude that
∥Pω (τ )∥pp = Tr [(Pω (τ ))p ] ⩾ Tr [Pω (τ p )] = Tr[τ p ] . (3.239)
That is, ∥Pω (τ )∥p ⩾ ∥τ ∥p . Substituting this into (3.238) completes the proof.
Exercise 3.5.24 (Reverse Young's Inequality). Let A and B be two Hilbert spaces, M, N ∈ Pos(A), p ∈ (0, 1), and q defined via 1/p + 1/q = 1 (hence q < 0). Use the reverse Hölder inequality of the Schatten norm to show that
Tr[MN] ⩾ (1/p) Tr[M^p] + (1/q) Tr[N^q] ,   (3.240)
with equality if and only if M^p = N^q. Hint: Take the logarithm on both sides of the reverse Hölder inequality and use the concavity property of the logarithm.
where dU is the Haar measure of the unitary group U(A). In other words, ρ is “twirled”
over all unitary matrices U ∈ U(A).
Exercise 3.5.25. Consider ρ ∈ L(A) and σ := G(ρ), where G is the channel defined
in (3.241). Demonstrate that the commutator [U, σ] = 0. Hint: Utilize the properties of
the Haar measure described in Sec. C.4 of the appendix.
The exercise above illustrates that the channel defined in (3.241) is not unfamiliar to us.
In fact, it can be represented as the replacement channel
G(ρ) := ∫_{U(A)} dU UρU* = Tr[ρ] u^A ,   (3.242)
where uA = I A /|A| is the maximally mixed state. To understand why, remember from the
previous exercise that σ := G(ρ) commutes with all unitary matrices in U(A) and, thus,
must be proportional to the identity matrix. We now consider a more complex example with
numerous applications in quantum information science.
Let B be a replica of A, and set m := |A| = |B|. Consider the twirling map G ∈
CPTP(AB → AB), defined for all ρ ∈ L(AB) as:
G(ρ^{AB}) := ∫_{U(m)} dU (U ⊗ U)ρ^{AB}(U ⊗ U)* .   (3.243)
As in the previous example, for every ρ ∈ L(AB), the matrix σ AB := G(ρAB ) commutes with
U ⊗ U for all U ∈ U(A). The ensuing question is: which matrices σ AB commute with all
matrices of the form U ⊗ U , where U is a unitary matrix? Clearly, any matrix proportional
to the identity matrix satisfies this criterion. However, there exists another type of operator
that fulfills this property, known as the swap (or flip) operator:
F^{AB} := Σ_{x,y∈[m]} |x⟩⟨y|^A ⊗ |y⟩⟨x|^B .   (3.244)
Indeed, one can verify by direct computation that
[U ⊗ U, F^{AB}] = 0   ∀ U ∈ U(m) .   (3.245)
Our analysis in Sec. C.10.2 indicates that any operator commuting with all matrices in
the set {U ⊗ U }U ∈U(m) can be expressed as a linear combination of the identity and swap
operators. Consequently, G(ρAB ) = aI AB + bF AB for some a, b ∈ R, determinable from
the requirement that G is CPTP (see the following exercise). Furthermore, in Sec. C.10.2 of
Appendix C, we demonstrate that the representation U 7→ U ⊗U decomposes into two irreps,
corresponding to the symmetric and antisymmetric subspaces, each with a multiplicity of
one. Hence, from Theorem C.3.3 in Appendix C, it follows that for all ρ ∈ L(AB),
G(ρ) = Tr[ρΠ_Sym] Π_Sym/Tr[Π_Sym] + Tr[ρΠ_Asy] Π_Asy/Tr[Π_Asy] ,   (3.246)
where Π_Sym := ½(I^{AB} + F^{AB}) and Π_Asy := ½(I^{AB} − F^{AB}) are the projections onto the symmetric and antisymmetric subspaces of AB (see (C.186) and (C.188)). When ρ ∈ D(AB)
is a density matrix, the right-hand side above represents a density matrix known as the
Werner quantum state.
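As a numerical illustration (a sketch assuming numpy; the Haar sampling via a QR decomposition and the sample size are implementation choices, not from the text), one can compare a Monte-Carlo estimate of the twirl (3.243) with the closed form (3.246):

```python
import numpy as np

def haar_unitary(m, rng):
    """Sample a Haar-random m x m unitary via QR of a Ginibre matrix (phases fixed)."""
    Z = (rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

m = 2
rng = np.random.default_rng(1)
X = rng.normal(size=(m*m, m*m)) + 1j * rng.normal(size=(m*m, m*m))
rho = X @ X.conj().T
rho /= np.trace(rho)

# swap operator F (3.244) and the symmetric/antisymmetric projectors
F = np.zeros((m*m, m*m))
for x in range(m):
    for y in range(m):
        F += np.kron(np.outer(np.eye(m)[x], np.eye(m)[y]),
                     np.outer(np.eye(m)[y], np.eye(m)[x]))
P_sym, P_asy = (np.eye(m*m) + F) / 2, (np.eye(m*m) - F) / 2

# closed form (3.246): the Werner state
werner = (np.trace(rho @ P_sym) * P_sym / np.trace(P_sym)
          + np.trace(rho @ P_asy) * P_asy / np.trace(P_asy))

# Monte-Carlo estimate of the twirl (3.243); error shrinks like 1/sqrt(#samples)
samples = [np.kron(U, U) @ rho @ np.kron(U, U).conj().T
           for U in (haar_unitary(m, rng) for _ in range(20000))]
print(np.max(np.abs(np.mean(samples, axis=0) - werner)))   # small statistical error
```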
Exercise 3.5.27. Let F AB be the swap operator, with m := |A| = |B|.
1. Starting with G(ρAB ) = aI AB + bF AB for some a, b ∈ R, prove that G is CPTP if and
only if for all ρ ∈ L(AB),
G(ρ^{AB}) = (mTr[ρ^{AB}] − Tr[ρ^{AB}F^{AB}])/(m(m² − 1)) I^{AB} + (mTr[ρ^{AB}F^{AB}] − Tr[ρ^{AB}])/(m(m² − 1)) F^{AB} .   (3.247)
2. Derive (3.247) from (3.246) by expressing the projections onto the symmetric and an-
tisymmetric subspaces in terms of the swap operator.
Exercise 3.5.28. Show that for all M ∈ L(A) and B ≅ A we have
Tr[M²] = Tr[(M ⊗ M)F^{AB}] .   (3.248)
where Ū := (U*)^T is the complex conjugate of U.
1. Show that
G(ΦAÃ ) = ΦAÃ , (3.252)
where Φ ∈ D(AÃ) is the maximally entangled state.
τ^{AB} := (I^{AB} − Φ^{AB})/(m² − 1) .   (3.254)
1. Show that for all ρ ∈ D(AB)
2. Show that E = E ◦ G if and only if there exists ω1 , ω2 ∈ Eff(AB) such that for all
ρ ∈ D(AB)
3. Show that E = G ◦E if and only if there exists Λ ∈ Eff(AB) such that for all ρ ∈ D(AB)
Exercise 3.5.32. Let A be a Hilbert space of dimension m := |A|, and consider the twirling channel E ∈ CPTP(A → A) given in (3.241). Show that for all ρ ∈ L(A)
E(ρ) = (1/m²) Σ_{p,q∈[m]} W_{p,q} ρ W*_{p,q} ,   (3.258)
where W_{p,q} are the Heisenberg-Weyl operators defined in (C.35).
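A small sketch of this exercise (assuming numpy, and assuming the common shift-and-clock convention W_{p,q} = X^p Z^q for the Heisenberg-Weyl operators; (C.35) may use a different phase convention, which does not affect the resulting channel): averaging a state over all m² Weyl conjugations produces the maximally mixed state, exactly as the Haar twirl does.

```python
import numpy as np

m = 3
omega = np.exp(2j * np.pi / m)
X = np.roll(np.eye(m), 1, axis=0)          # shift operator: X|x> = |x+1 mod m>
Z = np.diag(omega ** np.arange(m))         # clock operator: Z|x> = omega^x |x>

rng = np.random.default_rng(2)
G_ = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))
rho = G_ @ G_.conj().T
rho /= np.trace(rho)

W = [np.linalg.matrix_power(X, p) @ np.linalg.matrix_power(Z, q)
     for p in range(m) for q in range(m)]
twirl = sum(w @ rho @ w.conj().T for w in W) / m**2
assert np.allclose(twirl, np.eye(m) / m)   # equals Tr[rho] times the maximally mixed state
print("Heisenberg-Weyl twirl gives the maximally mixed state")
```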
intensively and more details and references about them can be found in the review article
by [186].
Gleason's theorem was originally proved for projective von-Neumann measurements in [85]. The proof in that case holds only in dimension greater than two, and there are counterexamples in the qubit case. The version of Gleason's theorem that we considered here is due to [41]. In this case the proof is much simpler (and holds in all dimensions) since effects replace orthogonal projections (and effects are simpler to work with). Gleason's theorem is of particular importance in the foundations of quantum physics as well as in the field of quantum logic, in its effort to minimize the number of axioms needed to formulate quantum mechanics. It also bridges the gap between some of the axioms of quantum mechanics and Born's rule. More details, references, and the history of Gleason's theorem can be found in the Wikipedia article entitled "Gleason's theorem".
Naimark's and Stinespring's dilation theorems are results from operator theory and are valid also in infinite-dimensional Hilbert spaces. Here we only studied their adaptation to the finite-dimensional case. More details on their infinite-dimensional versions can be found in many books on operator theory; e.g. the book by [176].
The Størmer-Woronowicz theorem was first proved for the qubit-to-qubit case by [205] and later on for the qubit-to-qutrit case by [236]. Counterexamples exist in higher dimensions, so these dimensions are optimal. Both proofs involve somewhat complicated calculations, and the simplified proof of the Størmer case presented here is due to [7]. It is an open problem to find a simpler proof of the Woronowicz theorem.
More information on the pinching channel and its properties can be found in the book
by [208].
CHAPTER 4
Majorization
A pre-order is a binary relation between objects that is reflexive and transitive. For example,
consider the inclusion relation ⊇ between subsets of [n] := {1, . . . , n} for some n ∈ N. Then,
⊇ is reflexive since for any subset A of [n] we have A ⊇ A. The relation ⊇ is transitive
since for any three subsets A, B, C ⊆ [n] with A ⊇ B and B ⊇ C it follows that A ⊇ C.
Furthermore, the relation ⊇ has yet another property known as antisymmetry. That is, if A, B ⊆ [n] satisfy both A ⊇ B and B ⊇ A then necessarily A = B. A pre-order that satisfies this additional antisymmetry property is called a partial order.
Partial orders play a fundamental role in quantum resource theories. They typically
stem from a set of restrictions imposed on quantum operations. For example, we saw in
quantum teleportation that Alice and Bob are restricted to act locally and cannot commu-
nicate quantum particles. We will see in Chapter 12 that this restriction imposes a partial
order between two entangled states, determining if one entangled state can be converted
to another under operations that are restricted to be local. It turns out that there is one
partial order with variants that appear in many resource theories. This partial order, known
as majorization, has been studied extensively in quantum information and other fields, par-
ticularly in the field of matrix analysis, and there are several books on the topic (see section
‘Notes and References’ at the end of this chapter).
with probability p↓₁ + p↓₂, where we denote by p↓ = (p↓₁, . . . , p↓ₙ)^T the vector obtained from p by rearranging its components in non-increasing order. We call a game in which the player is allowed to provide a set of k numbers as guesses a k-gambling game. Note that the highest probability to win a k-gambling game is given by Σ_{x∈[k]} p↓ₓ.
Suppose now that at the beginning of each game, the player is allowed to choose between
two dice with corresponding probabilities p and q. Clearly, the player will choose the dice
that has better odds to win the game. For a k-game the player will choose the p-dice if
Σ_{x∈[k]} p↓ₓ ⩾ Σ_{x∈[k]} q↓ₓ .   (4.1)
If the relation above holds for all k ∈ [n], then the player will choose the p-dice for any
k-gambling game. In this case we say that p majorizes q and write p ≻ q.
Majorization
Definition 4.1.1. Let p, q ∈ Rn . We say that p majorizes q and write p ≻ q
if (4.1) holds for all k ∈ [n] with equality for k = n.
Remark. Note that in the definition above we did not assume that p and q are probability
vectors, however, in the applications we consider in this book, p and q will always be
probability vectors.
Majorization is a pre-order. That is, given three real vectors p, q, r ∈ Rn we have p ≻ p
(reflexivity), and if p ≻ q and q ≻ r then p ≻ r (transitivity). Moreover, if both p ≻ q
and q ≻ p then p and q are related by a permutation matrix. Therefore, using the notation
Prob↓ (n) to denote the subset of Prob(n) consisting of all vectors p ∈ Prob(n) with the
property that p = p↓ , we get that the majorization relation ≻ is a partial order on the set
Prob↓ (n).
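Definition 4.1.1 translates directly into a short numerical test (a sketch assuming numpy; the function name `majorizes` is ours): sort both vectors in non-increasing order, compare all partial sums as in (4.1), and require that the totals agree.

```python
import numpy as np

def majorizes(p, q, atol=1e-12):
    """Return True if p majorizes q in the sense of Definition 4.1.1."""
    p, q = np.sort(p)[::-1], np.sort(q)[::-1]
    if not np.isclose(p.sum(), q.sum(), atol=atol):
        return False                                  # equality for k = n fails
    return bool(np.all(np.cumsum(p) >= np.cumsum(q) - atol))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.35, 0.25])
print(majorizes(p, q), majorizes(q, p))   # True False: p majorizes q but not vice versa
```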
Exercise 4.1.1. Show that for any n-dimensional probability vector p we have
(1, 0, . . . , 0)T ≻ p ≻ (1/n, . . . , 1/n)T . (4.2)
Exercise 4.1.2. Let p ∈ Prob(n), u(n) := (1/n, . . . , 1/n)T ∈ Prob(n) be the uniform proba-
bility vector, and t ∈ [0, 1]. Show that
p ≻ tp + (1 − t)u(n) . (4.3)
Exercise 4.1.3. Find an example of two vectors p, q ∈ Prob(3) such that p does not majorize
q, and q does not majorize p. Vectors with such a property that p ̸≻ q and q ̸≻ p are said
to be incomparable.
Exercise 4.1.4. Let n ∈ N and p ∈ Prob(n). Show that for sufficiently large m ∈ N we have
p ≻ (u^(2))^{⊗m} ,   (4.4)
where u^(2) := ½(1, 1)^T is the 2-dimensional uniform distribution.
1. Show that p ≻ q if and only if Lp↓ ⩾ Lq↓ , where the inequality is entrywise.
Exercise 4.1.7. Let p, q ∈ Prob↓ (n) and suppose p ̸= q and p ≻ q. Show that the largest
integer z ∈ [n] for which pz ̸= qz must satisfy qz > pz .
Exercise 4.1.8. Let p, q ∈ Prob↓(n) and k, m ∈ [n − 1] be such that k ⩽ m. Suppose that q has the form
q = (a, . . . , a, q_{k+1}, . . . , q_m, b, . . . , b)^T ,   (4.7)
where the component a appears k times and the component b appears n − m times.
(i.e. there is no need to consider the cases ℓ < k or ℓ > m). Hint: Use the fact that p = p↓ and q = q↓.
outcomes obtained by rolling the q-dice are more uncertain than those obtained by rolling
the p-dice.
To make this notion of uncertainty precise, consider a game of chance in which the player
is allowed to permute the symbols on the dice (for example, permute the stickers on the dice).
Clearly, such a permutation cannot change the odds in any k-gambling game. This relabeling
of the outcomes, is equivalently described by a permutation matrix P that is acting on p; i.e.
after the relabeling (permutation) of the symbols on the p-dice, the new probability vector
is given by P p.
Consider now a (somewhat unrealistic) scenario in which the player chooses to perform
the relabeling at random. For example, the player can flip an unbiased coin and if the
outcome is “head” the player does nothing to the p-dice whereas if the outcome is a “tail”
the player performs the relabeling described by the permutation matrix P. Moreover, suppose also that the player forgets the outcome of the coin flip. Hence, the player has changed his odds of winning the game, since with probability 1/2 he did nothing to the p-dice and with probability 1/2 he changed the order of the stickers on the dice. This means that now, effectively, the player holds a q-dice with
q := ½p + ½Pp .   (4.9)
Since by “forgetting” the outcome of the coin flip the player cannot decrease the uncertainty
associated with the p-dice, we must conclude that the new q-dice is more uncertain than
the initial p-dice.
The relation above between p and q can be expressed as
q = Dp ,   (4.10)
where D := ½Iₙ + ½P. More generally, if instead of an unbiased coin the player uses a random
device that produces the outcome j ∈ [m] with probability tj and a relabeling corresponding
to a permutation matrix Pj , then the matrix D can be expressed as
D = Σ_{j∈[m]} t_j P_j .   (4.11)
We therefore conclude that for any such matrix and any probability vector p, the vector
q := Dp corresponds to more uncertainty than p. The matrix D above has the property
that all its components are non-negative and each row and column sums to one. Such
matrices are called doubly stochastic (see Appendix A.5).
Exercise 4.1.9. Show that the matrix D in (4.11) is doubly stochastic.
The converse of the statement of the exercise above is equally valid. Owing to Birkhoff's theorem (see Theorem A.5.1), we know that any n × n doubly stochastic matrix can be expressed as a convex combination of no more than (n − 1)² + 1 permutation matrices (see Exercise A.5.3). By integrating this theorem with our prior analysis, we deduce that q is more uncertain than p if and only if q = Dp for some doubly stochastic matrix D. Shortly, we will demonstrate that the relationship q = Dp corresponds to majorization. However, before we present this, it's essential to introduce the concept of a T-transform.
T -Transform
A T-transform is a special kind of linear transformation from Rⁿ to itself. The matrix representation of a T-transform is an n × n matrix of the form
T := tIₙ + (1 − t)P ,   (4.12)
where t ∈ [0, 1] and P is a permutation matrix that just exchanges two components of the vector it acts upon. Therefore, for T as above there exist x, y ∈ [n] with x < y such that for every p = (p₁, . . . , pₙ)^T ∈ Prob(n) and every z ∈ [n] the z-component of the vector r := Tp is given by
r_z = p_z for z ∉ {x, y} ,   r_x = tp_x + (1 − t)p_y ,   r_y = tp_y + (1 − t)p_x .   (4.13)
q = T1 · · · Tm p . (4.14)
Proof. If p and q are related by a permutation matrix then the lemma follows from the
fact that any permutation matrix on n elements is a product of transposition matrices (i.e.
matrices that only exchange two elements and keep the rest unchanged). We therefore
assume now that p is not a permutation of q, and without loss of generality assume that
p = p↓ and q = q↓ .
The main idea of the proof is to construct a T -transform of the form given in (4.12) such
that the vector r := T p as given in (4.13) has the following three properties:
1. p ≻ r ≻ q.
2. rx ̸= px and ry ̸= py .
3. rx = qx or ry = qy .
From the third property at least one of the components of r is equal to one of the components
of q. Therefore, if such a T -transform exists, by a repetition of the above process, q can be
obtained from p by a finite number of such T -transforms. It is therefore left to show that a
T -transform with the above three properties exists.
Given that both p and q are probability vectors satisfying p ̸= q, it is impossible for
their components to satisfy px ⩾ qx for all x ∈ [n]. If this were the case, it would lead to
a contradiction, as the sum of the components of both p and q must equal one, implying
px = qx for all x ∈ [n]. Consequently, we define x ∈ [n] as the largest integer for which
px > qx .
Similar arguments to those discussed above imply that the reverse scenario, where px ⩽ qx
for all x ∈ [n], is also not feasible. Moreover, since p ̸= q and p ≻ q, it follows from
Exercise 4.1.7 that the largest integer z ∈ [n] for which pz ̸= qz must satisfy qz > pz .
Therefore, there exists an integer y ∈ [n] with the property that y > x and qy > py . We take
y to be the smallest integer that satisfies these two criteria.
Since x ∈ [n] is the largest integer for which px > qx we get that pw ⩽ qw for all w > x.
Similarly, since y is the smallest integer that satisfy y > x and qy > py we get that for all
x < w < y we have pw ⩽ qw . Combining these two observations we conclude that
pw = q w ∀ w ∈ [n] such that x < w < y . (4.15)
Moreover, from the definitions of x and y, along with the fact that x < y and p = p↓ and
q = q↓ , we deduce the following inequality:
px > qx ⩾ qy > py . (4.16)
Utilizing Equation (4.13), we find that r_x = tp_x + (1 − t)p_y and r_y = tp_y + (1 − t)p_x. By choosing t ∈ (0, 1), which means t is strictly between zero and one, we ensure that the second condition, r_x ≠ p_x and r_y ≠ p_y, is satisfied, given that p_x ≠ p_y. It is important to note that p ≻ r for any T-transform (as per Exercise 4.1.10). Consequently, our remaining task is to demonstrate the existence of a t ∈ (0, 1) such that r ≻ q, and either r_x = q_x or r_y = q_y is true.
Set ε := min{p_x − q_x, q_y − p_y} > 0, and define t := 1 − ε/(p_x − p_y). By definition, 0 < t < 1 (due to (4.16)). If ε = p_x − q_x then t = (q_x − p_y)/(p_x − p_y) and
r_x = tp_x + (1 − t)p_y = p_x − ε = q_x ,
whereas if ε = q_y − p_y then ε ⩽ p_x − q_x and consequently
r_x = p_x − ε ⩾ p_x − (p_x − q_x) = q_x ,
and in addition r_y = p_y + ε = q_y. We therefore conclude that for both options r_x ⩾ q_x.
Finally, to show that r ≻ q we show that ∥r∥(k) ⩾ ∥q∥(k) for all k ∈ [n]. We show it in
three cases:
1. For 1 ⩽ k < x we have ∥r∥(k) = ∥p∥(k) ⩾ ∥q∥(k) since p ≻ q and rw = pw for w ∈ [k].
2. For the case x ⩽ k < y we have
∥r∥(k) ⩾ Σ_{w∈[k]} r_w = ∥p∥(x−1) + r_x + Σ_{w=x+1}^{k} p_w .   (4.20)
The first term on the right-hand side, ∥p∥(x−1), satisfies ∥p∥(x−1) ⩾ ∥q∥(x−1) since p ≻ q. For the second term, we have already established that r_x ⩾ q_x, and for the third term we get from (4.15) that Σ_{w=x+1}^{k} p_w = Σ_{w=x+1}^{k} q_w. Incorporating these three relations into (4.20) yields ∥r∥(k) ⩾ ∥q∥(k).
Exercise 4.1.11. Prove the converse of the statement presented in Lemma 4.1.1. Specif-
ically, demonstrate that for any vector p ∈ Rn and a sequence of m n × n T -transforms,
denoted as T1 , . . . , Tm , the resulting vector q := T1 · · · Tm p fulfills the condition p ≻ q. Hint:
Refer to Exercise 4.1.10 for guidance.
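The constructive proof of Lemma 4.1.1 is easy to turn into an algorithm. The sketch below (assuming numpy; indices are 0-based, unlike the 1-based convention of the text, and the function name is ours) follows the proof verbatim: it repeatedly picks x, y, ε and t as above and applies the corresponding T-transform until p is mapped to q. At most n − 1 such transforms are needed, since each step fixes at least one more component.

```python
import numpy as np

def t_transform_chain(p, q, tol=1e-12):
    """Given sorted p, q with p majorizing q, return (x, y, t) triples describing
    T-transforms whose composition maps p to q (Lemma 4.1.1, constructive proof)."""
    p, q = np.array(p, float), np.array(q, float)
    steps = []
    while not np.allclose(p, q, atol=tol):
        x = max(i for i in range(len(p)) if p[i] > q[i] + tol)            # largest x: p_x > q_x
        y = min(i for i in range(len(p)) if i > x and q[i] > p[i] + tol)  # smallest y > x: q_y > p_y
        eps = min(p[x] - q[x], q[y] - p[y])
        t = 1 - eps / (p[x] - p[y])
        p[x], p[y] = t * p[x] + (1 - t) * p[y], t * p[y] + (1 - t) * p[x]
        steps.append((x, y, t))
    return steps

p = np.array([0.6, 0.25, 0.15])
q = np.array([0.45, 0.35, 0.20])
print(t_transform_chain(p, q))   # two T-transforms suffice for this example
```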
Characterization
Theorem 4.1.1. Let p, q ∈ Rⁿ. The following are equivalent:
1. p ≻ q.
∥p − tu^(n)∥₁ ⩾ ∥q − tu^(n)∥₁ .   (4.22)
where for all r ∈ R the notation (r)₊ = r if r ⩾ 0 and otherwise (r)₊ = 0. To see the equivalence note that (r)₊ = ½(|r| + r) and "absorb" the factor 1/n into t.
∥q − tu^(n)∥₁ = ∥Dp − tu^(n)∥₁ = ∥Dp − tDu^(n)∥₁ ⩽ ∥p − tu^(n)∥₁ ,   (4.24)
where the second equality follows from Du^(n) = u^(n), and the inequality follows from the property (2.8) of the norm ∥·∥₁, in conjunction with the fact that the doubly stochastic matrix D is in particular column stochastic.
The implication 3 ⇒ 1: Suppose (4.23) holds for all t ∈ R. Without loss of generality
suppose that p = p↓ and q = q↓ . Fix k ∈ [n − 1] and observe that for t = pk+1 the left-hand
side of (4.23) can be expressed as:
Σ_{x∈[n]} (p_x − t)₊ = Σ_{x∈[n]} (p_x − p_{k+1})₊ = ∥p∥(k) − kp_{k+1} ,   (4.25)
where the last equality follows from p = p↓.
Hence, the combination of the two equations above with our assumption that (4.23) holds
for all t ∈ R, and in particular for t = pk+1 , gives ∥p∥(k) ⩾ ∥q∥(k) . Since k ∈ [n − 1] was
arbitrary we conclude that p ≻ q. This completes the proof.
Exercise 4.1.12. Show that the product of two n × n doubly stochastic matrices is itself a
doubly stochastic matrix.
q = Mp . (4.27)
The pivotal question then becomes how to define these mixing operations. Conceptually,
mixing operations are processes that increase the uncertainty of system X. In this context,
we propose that the mixing operation M can be conceptualized in three distinct manners:
1. The Axiomatic Approach: Since mixing operations can only increase the uncertainty
of system X, the uniform distribution remains invariant under mixing operations, as
its uncertainty cannot be increased further. Hence, M ∈ STOCH(n, n) can be defined
as a mixing operation if
M uX = uX . (4.28)
That is, M = D is doubly stochastic.
2. The Constructive Approach: In this approach the mixing operations are defined in-
tuitively as a convex combination of permutation matrices. Indeed, mixing a pack of
cards literally corresponds to the action of a random permutation. Therefore, in this
approach M ∈ STOCH(n, n) corresponds to a mixing operation if there exist k ∈ N and n × n permutation matrices {P_j}_{j∈[k]} such that
M = Σ_{j∈[k]} s_j P_j ,   (4.29)
As we proved in the preceding subsections, all the three approaches above are equivalent,
leading to the same pre-order given in (4.1). Furthermore, the established equivalence of
these approaches solidifies the conceptual foundation of uncertainty. This, in turn, vali-
dates functions that exhibit monotonic behavior under majorization as reliable quantifiers
of uncertainty. Such measures of uncertainty are known as Schur concave functions.
As an example, consider the Shannon entropy, defined for any probability vector p ∈ Prob(n)
as
H(p) := −Σ_{x∈[n]} p_x log₂(p_x) .   (4.34)
This function is clearly symmetric under any permutation of the components of p, and it is
also concave. Therefore, the Shannon entropy is an example of a Schur concave function.
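A quick numerical illustration of Schur concavity (a sketch assuming numpy; the sampling is ours): mixing p by a random convex combination of permutations produces q = Dp with p ≻ q, and the Shannon entropy can only increase under such mixing.

```python
import numpy as np

def shannon(p):
    """Shannon entropy H(p) in bits, ignoring zero components."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(3)
p = rng.dirichlet(np.ones(5))
# q = Dp for a random convex combination D of permutation matrices, so p majorizes q
perms = [rng.permutation(5) for _ in range(4)]
weights = rng.dirichlet(np.ones(4))
q = sum(w * p[perm] for w, perm in zip(weights, perms))
print(shannon(p) <= shannon(q) + 1e-12)   # True: the entropy is Schur concave
```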
is Schur concave. Hint: Show first that log G(p) is Schur concave by showing that it is both
symmetric and concave (what is its Hessian matrix?).
From the following exercise it follows that not all Schur convex functions are symmetric
and convex. Therefore, in this sense, the notion of Schur convexity is weaker than (standard)
convexity.
Schur’s Test
Theorem 4.1.2. Let f : Prob(n) → R be a continuous function that is also
continuously differentiable on the interior of Prob(n). Then, f is Schur convex if and
only if the following two conditions hold:
Remark. Since the function f is symmetric, the condition in (4.38) is equivalent to the following condition: for all 0 < p ∈ Prob(n) and all x ≠ y ∈ [n]
(p_x − p_y)(∂f(p)/∂p_x − ∂f(p)/∂p_y) ⩾ 0 .   (4.39)
Proof. Suppose f is Schur convex. We need to show that (4.38) holds. Let p > 0 and observe
that if p1 = p2 the condition clearly holds. Therefore, without loss of generality we assume
that p1 > p2 (and recall that p2 > 0 since p > 0). Let 0 < ε < p1 − p2 and define
Since D is doubly stochastic, we get from Theorem 4.1.1 that p ≻ p̃ε . Therefore, from the
assumption that f is Schur convex we conclude that for all 0 < ε < p1 − p2
0 ⩽ [f(p) − f(p̃_ε)]/ε
  = [f(p) − f(p₁ − ε, p₂, . . . , pₙ)]/ε + [f(p₁ − ε, p₂, . . . , pₙ) − f(p̃_ε)]/ε   (4.42)
  → ∂f(p)/∂p₁ − ∂f(p)/∂p₂   as ε → 0⁺ .
Note that if p₁ = p₂ then the transformation does not affect p. Therefore, without loss of
generality suppose that p1 > p2 (recall that f is symmetric, so we can exchange between p1
and p2 if necessary). Now, from (4.38) we get that
d/dε f(p₁ − ε, p₂ + ε, p₃, . . . , pₙ) ⩽ 0 ,   (4.44)
for any 0 ⩽ ε ⩽ ½(p₁ − p₂) (note that in this domain p₁ − ε ⩾ p₂ + ε). We therefore conclude that the function
g(ε) := f(p₁ − ε, p₂ + ε, p₃, . . . , pₙ)   (4.45)
is non-increasing in the domain 0 ⩽ ε ⩽ ½(p₁ − p₂). Taking ε := (1 − t)(p₁ − p₂) we conclude that
f(p) = g(0) ⩾ g(ε) = f(p₁ − ε, p₂ + ε, p₃, . . . , pₙ) = f(tp₁ + (1 − t)p₂, tp₂ + (1 − t)p₁, p₃, . . . , pₙ) = f(Tp) ,   (4.46)
where the second equality on the last line follows from the definition of ε. This completes the proof.
As an example, consider the family of Rényi entropies defined for any α ∈ [0, ∞] and all
p ∈ Prob(n) as
H_α(p) := 1/(1 − α) log Σ_{x∈[n]} p_x^α ,   (4.47)
where the cases α = 0, 1, ∞ are defined in terms of their limits. In the next chapter we will study these functions in more detail. Here we show that for all α ∈ (0, ∞) the Rényi entropies are Schur concave. Due to the monotonicity of the log function, it is enough to show that f_α(p) := Σ_{x∈[n]} p_x^α is Schur concave for α ∈ (0, 1) and Schur convex for α ∈ (1, ∞). Indeed,
(p₁ − p₂)(∂f_α(p)/∂p₁ − ∂f_α(p)/∂p₂) = α(p₁ − p₂)(p₁^{α−1} − p₂^{α−1}) ,   (4.48)
which is always non-negative for α > 1 and non-positive for α ∈ (0, 1). Hence, from Schur's test it follows that H_α(p) is Schur concave for all α ∈ [0, ∞] (the cases α = 0, 1, ∞ follow from the continuity of H_α in α).
As another example, consider the elementary symmetric functions defined for each k ∈ [n]
by
f_k(p) := Σ_{x₁<···<x_k} p_{x₁} · · · p_{x_k}   ∀ p ∈ Prob(n) ,   (4.49)
where the sum is over all x₁, . . . , x_k ∈ [n],
and for k = n we have fn (p) = p1 · · · pn . From Schur’s test it follows that the elementary
symmetric functions are Schur concave.
Exercise 4.1.15. Use Schur’s test to verify that the elementary symmetric functions are
Schur concave.
In general, partial orders don’t always have maximal and minimal elements. For example,
the set
C := {(1/2, 1/4, 1/4)^T , (2/5, 2/5, 1/5)^T}   (4.51)
has no maximal nor minimal elements since neither of the two vectors majorizes the other.
On the other hand, one can define upper and lower bounds on a set of probability vectors.
Specifically, given a subset C ⊆ Prob(n)
• A vector p ∈ Prob(n) is said to be an upper bound of C if for all q ∈ C we have p ≻ q.
• A vector p ∈ Prob(n) is said to be a lower bound of C if for all q ∈ C we have q ≻ p.
Note that lower and upper bounds always exist since the vector e1 = (1, 0, . . . , 0)T is always
an upper bound and the vector u(n) is always a lower bound. Less trivial bounds are those
that are optimal: Given a subset C ⊆ Prob(n),
• An upper bound p ∈ Prob(n) of C is said to be optimal if for any other upper bound
p′ ∈ Prob(n) of C we have p′ ≻ p.
• A lower bound p ∈ Prob(n) of C is said to be optimal if for any other lower bound
p′ ∈ Prob(n) of C we have p ≻ p′ .
Exercise 4.2.1. Let
C := {p₁, . . . , pₘ} ⊂ Prob(n)   (4.52)
be a set consisting of m probability vectors, and for each z ∈ [n] denote by
s_z := max_{y∈[m]} ∥p_y∥(z) .   (4.53)
With this index k we define the steepest ε-approximation of p, denoted by p(ε) , whose
components are
p^(ε)_x := p₁ + ε if x = 1 ;   p_x if x ∈ {2, . . . , k} ;   1 − ε − ∥p∥(k) if x = k + 1 ;   0 otherwise .   (4.57)
Note that from its definition above, p(ε) is indeed a probability vector whose components
are arranged in non-increasing order.
Exercise 4.2.3. Utilize the definition of k as provided in (4.56) and the definition of p(ε)
as outlined in (4.57) to demonstrate the following two properties:
1. The components of p^(ε) are arranged in non-increasing order. In particular, p^(ε)_k > p^(ε)_{k+1}.
2. For all x ∈ {2, . . . , n},
p^(ε)_x ⩽ p_x .   (4.58)
In particular, p^(ε)_{k+1} < p_{k+1}.
The intuition behind the definition above is that we want to alter p in a way that it
becomes more similar to e1 . However, since p(ε) must be close to p we cannot increase p1
by too much. Indeed, the vector p(ε) as defined above is ε-close to p. To see this, observe
that from its definition
½∥p^(ε) − p∥₁ = Σ_{x∈[n]} (p^(ε)_x − p_x)₊ = p₁ + ε − p₁ = ε ,   (4.59)
where the second equality follows from (4.58).
Therefore, the vector p(ε) is indeed in Bε (p).
Theorem 4.2.1. Let p ∈ Prob↓(n) be such that ½∥p − e₁∥₁ > ε. Then, the vector p^(ε) as defined in (4.57) is the maximal element (under majorization) of Bε(p).
Proof. Since we already showed that p(ε) ∈ Bε (p) it is left to show that for any q ∈ Bε (p)
we have p^(ε) ≻ q. Indeed, since q ∈ Bε(p) it follows from (2.83) that for every ℓ ∈ [n]
∥q∥(ℓ) − ∥p∥(ℓ) ⩽ ½∥q − p∥₁ ⩽ ε .   (4.60)
Therefore, for ℓ ∈ [k] we get
∥q∥(ℓ) ⩽ ∥p∥(ℓ) + ε = ∥p^(ε)∥(ℓ) ,   (4.61)
where the equality follows from (4.57).
Combining this with the fact that for k + 1 ⩽ ℓ ⩽ n, ∥p(ε) ∥(ℓ) = 1, we conclude that p(ε) ≻ q.
This completes the proof.
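The construction (4.57) is simple to implement. In the sketch below (assuming numpy; since (4.56) is not restated here, k is taken to be the integer with ∥p∥(k) ⩽ 1 − ε < ∥p∥(k+1), which is what makes p^(ε) a sorted probability vector), we build p^(ε), check that it is exactly ε-away from p, and verify Theorem 4.2.1 on random members of Bε(p).

```python
import numpy as np

def steepest_approx(p, eps):
    """Steepest eps-approximation (4.57) of a sorted probability vector p,
    valid for eps < 0.5*||p - e1||_1; k is the integer with ||p||_(k) <= 1-eps < ||p||_(k+1)."""
    p = np.sort(p)[::-1]
    cums = np.cumsum(p)
    k = int(np.searchsorted(cums, 1 - eps, side='right'))  # number of partial sums <= 1-eps
    out = np.zeros_like(p)
    out[0] = p[0] + eps
    out[1:k] = p[1:k]
    out[k] = 1 - eps - cums[k - 1]
    return out

def majorizes(p, q, atol=1e-12):
    p, q = np.sort(p)[::-1], np.sort(q)[::-1]
    return bool(np.all(np.cumsum(p) >= np.cumsum(q) - atol))

p = np.array([0.4, 0.3, 0.2, 0.1])
eps = 0.05
pe = steepest_approx(p, eps)
print(pe, 0.5 * np.abs(pe - p).sum())      # the approximation and its distance (= eps) from p

# p^(eps) majorizes random states in the eps-ball around p (Theorem 4.2.1)
rng = np.random.default_rng(4)
for _ in range(1000):
    d = rng.normal(size=4); d -= d.mean()
    q = p + eps * d / np.abs(d).sum()       # 0.5*||q - p||_1 <= eps and q sums to one
    if (q >= 0).all():
        assert majorizes(pe, q)
```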
Exercise 4.2.4. Let m, n ∈ N be such that m < n, and let p ∈ Prob↓ (n) be such that
u(m) ̸≻ p. Show that a minimal element of the set
C_{p,m} := {q′ ∈ Prob(m) : q′ ≻ p}   (4.62)
One can use the steepest ε-approximation to compute the distance of a vector p ∈ Prob(n) to the set of all vectors r ∈ Prob(n) that majorize q. Specifically, let Majo(q) := {r ∈ Prob(n) : r ≻ q} denote the set of all vectors in Prob(n) that majorize q, and define the distance between p ∈ Prob(n) and the set Majo(q) as:
T(p, Majo(q)) := min_{r∈Majo(q)} ½∥p − r∥₁ .   (4.66)
Theorem 4.2.2. Using the same notations as above, for all p, q ∈ Prob(n)
T(p, Majo(q)) = max_{ℓ∈[n]} {∥q∥(ℓ) − ∥p∥(ℓ)} .   (4.67)
Proof. Without loss of generality we will assume that p, q ∈ Prob↓(n). For any ε ∈ (0, 1), let p^(ε) be the steepest ε-approximation of p; see (4.57). Observe that by definition, for any m ∈ [n] we have ∥p^(ε)∥(m) ⩽ ∥p∥(m) + ε, with equality if m ∈ [k]. In Theorem 4.2.1 we showed that p^(ε) is the maximal element of Bε(p) as long as ε < ½∥p − e₁∥₁ (otherwise, e₁ is the maximal element). Hence,
T(p, Majo(q)) := min{½∥p − r∥₁ : r ≻ q , r ∈ Prob(n)}
              = min{ε ∈ [0, 1] : r ≻ q for some r ∈ Bε(p)}   (4.68)
              = min{ε ∈ [0, 1] : p^(ε) ≻ q} ,
where the last equality follows from the fact that p^(ε) ≻ r for all r ∈ Bε(p).
That is, it is left to compute the smallest ε that satisfies p^(ε) ≻ q. We will show that this smallest ε equals
δ := max_{ℓ∈[n]} {∥q∥(ℓ) − ∥p∥(ℓ)} .   (4.69)
We first show that p^(δ) ≻ q. Let k be the integer satisfying (4.56) but with δ replacing ε. Then, from the definition of p^(δ) it follows that for m > k we have
∥p^(δ)∥(m) = 1 ⩾ ∥q∥(m) .   (4.70)
Moreover, for m ∈ [k] the definition in (4.69) gives δ ⩾ ∥q∥(m) − ∥p∥(m) so that
∥p^(δ)∥(m) = ∥p∥(m) + δ ⩾ ∥q∥(m) .   (4.71)
Hence, p^(δ) ≻ q. To prove the optimality of δ, we show that for any 0 < δ′ < δ we must have p^(δ′) ̸≻ q. Indeed, since δ′ < δ there exists m ∈ [n] such that
will focus on the scenario where u(n) ̸∈ Bε (p). In this context, the parameter ε satisfies the
following condition:
0 < ε < ½∥p − u^(n)∥₁ .   (4.74)
Additionally, we will assume that the components of the vector p are sorted in a non-
increasing order; i.e., p = p↓ .
Exercise 4.2.5. Let ε ∈ (0, 1), p ∈ Prob↓(n), and ℓ ∈ [n] be the integer satisfying p_ℓ ⩾ 1/n > p_{ℓ+1}. Show that the inequality in (4.74) holds if and only if
ε < ∥p∥(ℓ) − ℓ/n .   (4.75)
Hint: Start by expressing ½∥p − u^(n)∥₁ as Σ_{x∈[n]} (p_x − 1/n)₊.
The minimal element of Bε (p) can be found by “flattening” the tip of p (i.e. first few
components of p) and tail of p (i.e. the last few components of p). The intuition behind
this idea is to alter the vector p so that it becomes more similar to the uniform distribution
u(n) . This process involves replacing the first k components of p with a constant a, and
substituting the last n−m components with another constant b. We denote by p(ε) ∈ Prob(n)
the resulting vector. Its components are given by
p^(ε)_x := a if x ∈ [k] ;   p_x if k < x ⩽ m ;   b if x ∈ {m + 1, . . . , n} .   (4.76)
The objective is to select suitable values for a, b, k, and m, ensuring that p(ε) forms the
flattest ε-approximation of p.
To find the coefficients a, b, k, m we outline the properties that p(ε) has to satisfy:
1. The vector p^(ε) is a probability vector in Prob(n). Since all of its components are non-negative, we just need to require that they sum to one. Using the relation Σ_{x=k+1}^{m} p_x = ∥p∥(m) − ∥p∥(k) we get that the coefficients a, b, k, m must satisfy
1 = Σ_{x∈[n]} p^(ε)_x = ka + ∥p∥(m) − ∥p∥(k) + (n − m)b .   (4.77)
2. The vector p^(ε) ∈ Prob↓(n); i.e. its components are arranged in non-increasing order. Since p = p↓ it is sufficient to require that a > p_{k+1} and b < p_m (these inequalities are strict since we want k and m to mark the indices in (4.76) at which the "flattening" process ends and begins, respectively). We therefore conclude that
3. The vector p^(ε) ∈ Bε(p). Moreover, since p^(ε) is an optimal vector, we would expect it to be ε-close (and not δ-close with δ < ε) to p. Therefore, we require that
ε = ½∥p − p^(ε)∥₁ = Σ_{x∈[n]} (p_x − p^(ε)_x)₊ = Σ_{x∈[k]} (p_x − a) = ∥p∥(k) − ka ,   (4.79)
where the third equality follows from a ∈ (p_{k+1}, p_k] and b ∈ [p_{m+1}, p_m).
In addition to the three conditions above, we need to require that p(ε) is the minimal element
of Bε (p). However, we first show that the three conditions above already determine uniquely
the coefficients a, b, k, m. Indeed, from (4.77) it follows that
Comparing this equality with (4.79) implies that ε = ∥p∥(k) −ka and ε = (n−m)b+∥p∥(m) −1.
We therefore conclude that
a = (∥p∥(k) − ε)/k   and   b = (1 + ε − ∥p∥(m))/(n − m) .   (4.81)
That is, the equation above can be viewed as the definitions of a and b, and it is left to
determine k and m.
Substituting the above definitions of a and b into (4.78) and isolating ε gives that
Moreover, in the exercise below you show that the components of the vectors r := (r₁, . . . , rₙ)^T and s := (s₁, . . . , sₙ)^T are non-negative and satisfy r = r↑ and s = s↓. Thus, the relations
in (4.82) uniquely specify k and m. However, it is left to show that k ⩽ m since otherwise
p(ε) would not be well defined.
For this purpose, let ℓ ∈ [n] be the integer defined in Exercise 4.2.5. We will show that
k ⩽ ℓ ⩽ m. To prove k ⩽ ℓ, suppose by contradiction that k ⩾ ℓ + 1. Since ε ∈ [rk , rk+1 ) we
have ε ⩾ rk ⩾ rℓ+1 , where the second inequality follows from the fact that r = r↑ and our
assumption that k ⩾ ℓ + 1. Combining this with the definition of rℓ+1 in (4.83), we get
ε ⩾ ∥p∥(ℓ+1) − (ℓ + 1)p_{ℓ+1} = ∥p∥(ℓ) − ℓp_{ℓ+1} > ∥p∥(ℓ) − ℓ/n ,   (4.84)
where the last inequality follows from p_{ℓ+1} < 1/n. We thus obtain ε > ∥p∥(ℓ) − ℓ/n,
which is in contradiction with (4.75). Therefore, the assumption that k ⩾ ℓ + 1 cannot hold
and we conclude that k ⩽ ℓ.
Similarly, to prove that m ⩾ ℓ, suppose by contradiction that m ⩽ ℓ − 1. Since ε ∈
[sm+1 , sm ) we have ε ⩾ sm+1 ⩾ sℓ , where the second inequality follows from the fact that
s = s↓ and our assumption that m + 1 ⩾ ℓ. Combining this with the definition of sℓ in (4.83),
we get
ε ⩾ (n − ℓ)p_ℓ + ∥p∥(ℓ) − 1 ⩾ ∥p∥(ℓ) − ℓ/n ,   (4.85)
where the last inequality follows from p_ℓ ⩾ 1/n. We thus obtain ε ⩾ ∥p∥(ℓ) − ℓ/n,
which is again in contradiction with (4.75). Therefore, the assumption that m ⩽ ℓ−1 cannot
hold and we conclude that m ⩾ ℓ. Combining this with our earlier result that k ⩽ ℓ we
conclude that k ⩽ m.
Exercise 4.2.6. Show that the vectors r and s, whose components are given in (4.83) satisfy:
0 = r1 ⩽ r2 ⩽ · · · ⩽ rn = 1 − npn and np1 − 1 = s1 ⩾ s2 ⩾ · · · ⩾ sn = 0 . (4.86)
It's important to note that the index k is characterized by its role as the maximizer of the function ℓ ↦ t_ℓ := (∥p∥(ℓ) − ε)/ℓ. To put it another way, t_k = max_{ℓ∈[n]} {t_ℓ}. This implies that the coefficient a can be straightforwardly defined as:
a := max_{ℓ∈[n]} (∥p∥(ℓ) − ε)/ℓ .   (4.87)
To understand this, let ℓ be the largest integer that satisfies t_ℓ = max_{ℓ′∈[n]} {t_{ℓ′}}. The inequality t_ℓ > t_{ℓ+1} leads to (see Exercise 4.2.7):
0 < t_ℓ − t_{ℓ+1} = (r_{ℓ+1} − ε)/(ℓ(ℓ + 1)) ,   (4.88)
where rℓ := ∥p∥(ℓ) − ℓpℓ as previously defined. This implies that rℓ+1 > ε. Conversely, by
following a similar reasoning, the condition tℓ ⩾ tℓ−1 yields rℓ ⩽ ε. Therefore, we conclude
that ℓ is the integer for which ε falls in the interval [rℓ , rℓ+1 ), which leads us to deduce that
ℓ = k.
Exercise 4.2.7. Verify the equality t_ℓ − t_{ℓ+1} = (r_{ℓ+1} − ε)/(ℓ(ℓ + 1)).
Exercise 4.2.8. Using the same notations as above, show that for every ε ∈ (0, 1) and p ∈ Prob(n) the coefficient b can be expressed as:
b = min_{ℓ∈[n−1]} (1 + ε − ∥p∥(ℓ))/(n − ℓ) .   (4.89)
Theorem 4.2.3. Let ε ∈ (0, 1) and p ∈ Prob↓ (n) be a probability vector such
that (4.74) holds. Let k, m ∈ [n − 1] be the integers satisfying (4.82), and a and b be
the numbers defined in (4.81). Then, for these choices of k, m, a, and b, the vector
p(ε) as defined in (4.76) is the minimal element (under majorization) of Bε (p).
Proof. We already showed that p^(ε) ∈ Bε(p). It is therefore left to show that if q ∈ Bε(p) then q ≻ p^(ε). To establish that ∥q∥(ℓ) ⩾ ∥p^(ε)∥(ℓ) for every ℓ ∈ [n], we partition the proof into three distinct cases:
2. The case k ⩽ ℓ ⩽ m. From (2.83) we get
∥q∥(ℓ) ⩾ ∥p∥(ℓ) − ε = ∥p∥(k) − ε + Σ_{x=k+1}^{ℓ} p_x = ka + Σ_{x=k+1}^{ℓ} p_x = ∥p^(ε)∥(ℓ) ,   (4.92)
where the third equality follows from (4.81) and the last from (4.76).
3. The case m < ℓ ⩽ n. We use once more (2.83) (with m replacing k) to get ∥q∥(m) ⩾ ∥p∥(m) − ε. Moreover, observe that in this case we have for all m < ℓ ⩽ n
∥q∥(ℓ) = 1 − Σ_{x=ℓ+1}^{n} q↓ₓ   and   ∥p^(ε)∥(ℓ) = 1 − (n − ℓ)b .   (4.93)
Hence, it is sufficient to show that
(1/(n − ℓ)) Σ_{x=ℓ+1}^{n} q↓ₓ ⩽ b .   (4.94)
The inequality in (4.94) is equivalent to the statement that the average of the last
n − ℓ components q↓ is no greater than b. Since the components of q↓ are arranged
in non-increasing order, this average is no greater than the average of the last n − m
components of q↓ (recall that n − m > n − ℓ). Hence,
(1/(n − ℓ)) Σ_{x=ℓ+1}^{n} q↓ₓ ⩽ (1/(n − m)) Σ_{x=m+1}^{n} q↓ₓ = (1 − ∥q∥(m))/(n − m) ⩽ (1 + ε − ∥p∥(m))/(n − m) = b ,   (4.95)
where the second inequality follows from ∥q∥(m) ⩾ ∥p∥(m) − ε, and the last equality from (4.81).
One can use the flattest ε-approximation to compute the distance of a vector p ∈ Prob(n) to the set of all vectors r ∈ Prob(n) that are majorized by q. Specifically, let majo(q) := {r ∈ Prob(n) : q ≻ r} denote the set of all vectors in Prob(n) that are majorized by q, and define the distance between p ∈ Prob(n) and the set majo(q) as:
T(p, majo(q)) := min_{r∈majo(q)} ½∥p − r∥₁ .   (4.97)
Theorem 4.2.4. Using the same notations as above, for all p, q ∈ Prob(n)
T(p, majo(q)) = max_{ℓ∈[n]} {∥p∥(ℓ) − ∥q∥(ℓ)} .   (4.98)
Proof. Without loss of generality we will assume that p, q ∈ Prob↓(n) and q ̸≻ p. For any ε ∈ (0, 1), let p^(ε) be the flattest ε-approximation of p; see (4.76). By definition,
T(p, majo(q)) := min{½∥p − r∥₁ : q ≻ r , r ∈ Prob(n)}
              = min{ε ∈ [0, 1] : q ≻ r for some r ∈ Bε(p)}   (4.99)
              = min{ε ∈ [0, 1] : q ≻ p^(ε)} ,
where the last equality follows from the fact that r ≻ p^(ε) for all r ∈ Bε(p). That is, it is left to compute the smallest ε that satisfies q ≻ p^(ε). We will show that this smallest ε equals
δ := max_{ℓ∈[n]} {∥p∥(ℓ) − ∥q∥(ℓ)} .   (4.100)
We first show that q ≻ p(δ) . Let k, m ∈ [n − 1] be the integers satisfying (4.82), and a and
b be the numbers defined in (4.81), but with δ replacing ε. From Exercise 4.1.8 we have
q ≻ p(δ) if and only if
Now, for k ⩽ ℓ ⩽ m,
∥q∥(ℓ) − ∥p^(δ)∥(ℓ) = ∥q∥(k) − ka + Σ_{x=k+1}^{ℓ} (q_x − p_x) = δ + ∥q∥(k) − ∥p∥(k) + Σ_{x=k+1}^{ℓ} (q_x − p_x) = δ + ∥q∥(ℓ) − ∥p∥(ℓ) ,   (4.102)
where the second equality follows from ka = ∥p∥(k) − δ.
Hence, q ≻ p(δ) if and only if for all ℓ ∈ {k, . . . , m} we have δ ⩾ ∥p∥(ℓ) − ∥q∥(ℓ) . From its
definition, δ ⩾ ∥p∥(ℓ) − ∥q∥(ℓ) for all ℓ ∈ [n]. Hence, q ≻ p(δ) .
To prove the optimality of δ, we use the fact that p^(δ) is δ-close to p so that from (2.83) we get for any ℓ ∈ [n]
δ ⩾ ∥p∥(ℓ) − ∥p^(δ)∥(ℓ) ⩾ ∥p∥(ℓ) − ∥q∥(ℓ) ,   (4.103)
where the second inequality follows from q ≻ p^(δ).
Since the above inequality holds for all ℓ ∈ [n] we conclude that δ is optimal. This concludes
the proof.
Recall that (s − t)₊ = ½(|s − t| + s − t), and since the absolute value is a continuous function, these functions are continuous (although not differentiable). Observe that f_p(t) = 0 for t ⩾ p₁, whereas g_p(t) = 0 for t ∈ [0, p_n] and g_p(t) = nt − 1 for t ⩾ p₁. The function f_p(t) is non-increasing in t while g_p(t) is non-decreasing in t. See Fig. 4.1 for examples of f_p(t) and g_p(t).
Exercise 4.2.9. In this exercise we use the same notations used in this subsection with a fixed p ∈ Prob↓(n).
Figure 4.1: The functions f_p(t) and g_p(t). The red dots indicate the points at which the slope of the functions changes.
where k ∈ [n] is the integer satisfying r ∈ (rk , rk+1 ]. The inverse function gp−1 : [0, n − 1] →
[pn , 1] : s 7→ gp−1 (s) is given by (see Exercise 4.2.10)
g_p^{-1}(s) = (1 + s − ∥p∥(m))/(n − m) for s > 0 , and g_p^{-1}(0) = p_n .   (4.108)
Exercise 4.2.10. Consider the functions fp : [0, p1 ] → [0, 1] and gp : [pn , 1] → [0, n − 1] as
defined above and let fp−1 and gp−1 be as defined in (4.107) and (4.108).
1. Show that for any t ∈ [0, p₁] and r ∈ [0, 1]
f_p^{-1}(f_p(t)) = t   and   f_p(f_p^{-1}(r)) = r .   (4.109)
Relative Majorization
Definition 4.3.1. Let p, q ∈ Prob(n) and p′, q′ ∈ Prob(m) be two pairs of probability distributions. We say that (p, q) relatively majorizes (p′, q′), and write (p, q) ≻ (p′, q′), if there exists a column stochastic matrix E ∈ STOCH(m, n) such that Ep = p′ and Eq = q′.
Relative majorization is a pre-order. The property (p, q) ≻ (p, q) (i.e. reflexivity) follows
by taking E in the definition above to be the identity matrix. The transitivity of relative
majorization follows from the fact that the product of two column stochastic matrices is also
a column stochastic matrix.
2. Show that for any m, n ∈ N and any probability vectors, p ∈ Prob(n) and q ∈ Prob(m),
From the second relation in (4.113) it follows that without loss of generality we can
always assume that there is no x ∈ [n] such that px = qx = 0 (since any x-component with
px = qx = 0 can be removed from the vectors p and q without changing the equivalency).
Standard Form
Definition 4.3.2. A pair (p, q) of probability vectors in Prob(n) is said to be given
in a standard form if there is no x ∈ [n] such that px = qx = 0, and the components
of the vectors p and q are arranged such that
p₁/q₁ ⩾ p₂/q₂ ⩾ · · · ⩾ pₙ/qₙ ,   (4.116)
where we used the convention px /qx = ∞ for x ∈ [n] with px > 0 and qx = 0.
Observe that since (P p, P q) ∼ (p, q) for every permutation matrix P , any pair of vectors
is equivalent (under relative majorization) to its standard form. The choice of the order given
in (4.116) will be clear later on when we characterize relative majorization with testing
regions.
Exercise 4.3.2. Let {e1 , e2 } be the standard basis of R2 . Express the pair (e1 , e2 ) in the
standard form.
Exercise 4.3.3. Show that if p, q ∈ Prob(n) and q has the form q = (q1 , . . . , qr , 0, . . . , 0)
for some r < n then
(p, q) ∼ (p′ , q) (4.117)
for any p′ ∈ Prob(n) whose first r components equal the first r components of p.
In the following theorem we bound any pair of probability vectors by pairs of two-dimensional vectors. For any n ∈ N and p, q ∈ Prob(n), we denote by
λ_min := Σ_{x∈supp(p)} q_x   and   λ_max := min_{x∈supp(p)} q_x/p_x ,   (4.119)
where supp(p) := {x ∈ [n] : px ̸= 0}. Later in the book we will see that λmax and λmin are
related to the min and max relative entropies. In the following theorem we use the notations
e1 := (1, 0)T and e2 := (0, 1)T , and in addition denote by
vmax := λmax e1 + (1 − λmax )e2 and vmin := λmin e1 + (1 − λmin )e2 . (4.120)
Theorem 4.3.1. Using the same notations as above we have for any p, q ∈ Prob(n)
(e₁, v_max) ≻ (p, q) ≻ (e₁, v_min) .   (4.121)
Remark. Note that the bounds are not symmetric, meaning that if we swap p with q, the vector e₁ := (1, 0)^T will appear second in the bounding pairs, and λ_min and λ_max will also change. Moreover, if p > 0 (i.e. all the components of p are strictly positive) then λ_min = 1 and consequently the lower bound becomes trivial.
Proof. We first prove the upper bound. For this purpose, we need to find an n × 2 column stochastic channel E ∈ STOCH(n, 2) with the property that
Ee₁ = p   and   Ev_max = q ,   (4.122)
where e₁ := (1, 0)^T and e₂ := (0, 1)^T. Observe that the first condition above implies that the first column of E must be equal to p. Combining this with the definition of v_max and with the second condition above we get that
Ev_max = λ_max p + (1 − λ_max)Ee₂ = q   ⇒   Ee₂ = (q − λ_max p)/(1 − λ_max) .   (4.123)
By definition, λ_max ∈ [0, 1] and it has the property that q ⩾ λ_max p (i.e. q_x ⩾ λ_max p_x for each x ∈ [n]). Therefore, the right-hand side of the equation above is a probability vector. To summarize, the n × 2 column stochastic matrix E, whose first column is p and whose second column is (q − λ_max p)/(1 − λ_max), satisfies (4.122), so that by definition the upper bound in (4.121) holds.
We now prove the lower bound. By definition, it is sufficient to show that there exists a
channel E ∈ STOCH(2, n) such that
Ep = e1 and Eq = vmin = λmin e1 + (1 − λmin )e2 . (4.124)
Since E must be a column stochastic matrix with two rows, it follows that if its first row
is tT then its second row is (1n − t)T , where 1Tn = (1, . . . , 1). Hence, E satisfies the above
conditions if and only if
t · p = 1 and t · q = λmin . (4.125)
Note also that we must have 0 ⩽ t ⩽ 1n (element-wise) since E is column stochastic. We
therefore choose t = (t1 , . . . , tn )T with
(
1 if px > 0
tx = ∀ x ∈ [n] . (4.126)
0 if px = 0
It is simple to check that this t satisfies (4.125). This completes the proof.
Note that when supp(p) = supp(q), the value of λmax , as defined in (4.119), is constrained
to the range 0 < λmax < 1, ensuring that vmax > 0. In this case we can improve the upper
bound (e1 , vmax ). Indeed, let 0 < s < t ⩽ 1 be such that
((1 − t)/(1 − s)) p ⩽ q ⩽ (t/s) p .   (4.127)
Note that such s and t exists since we assume that p and q have the same support, and we
can take s close enough to zero and t close enough to one. Now, define a stochastic evolution
matrix E = [v₁ v₂] ∈ STOCH(n, 2), with the two columns v₁, v₂ ∈ Prob(n) given by
v₁ := ((1 − s)q − (1 − t)p)/(t − s)   and   v₂ := (tp − sq)/(t − s) .   (4.128)
Note that the conditions in (4.127) implies that E is indeed a column stochastic matrix since
tp − sq ⩾ 0 (entrywise) and (1 − s)q − (1 − t)p ⩾ 0. Moreover, denoting by s := (s, 1 − s)T
and t := (t, 1 − t)T we have by direct calculation (Exercise 4.3.5)
Es = p and Et = q . (4.129)
We therefore conclude that for every p, q ∈ Prob(n) with equal support, i.e., supp(p) =
supp(q), there exist s, t ∈ Prob>0 (2) satisfying the relation:
(s, t) ≻ (p, q) . (4.130)
Observe that the relation above hold as long as s, t ∈ [0, 1] satisfies s < t and
(1 − s)q ⩾ (1 − t)p and tp ⩾ sq . (4.131)
Exercise 4.3.5. Verify by direct calculation the relations in (4.129).
Exercise 4.3.6. Show that by taking t = 1 − λmax and s = 0 the relation (s, t) ≻ (p, q) is
equivalent to the upper bound in (4.121).
where u := (1/n, . . . , 1/n)^T is the uniform distribution. In the exercise below you will prove this
assertion using Theorem 4.1.1.
Exercise 4.3.7. Use the equivalence between the first two conditions in Theorem 4.1.1 to
prove (4.132).
where u(kx ) is the uniform probability vector in Prob(kx ). We then have the following
theorem.
Remark. Observe that any vector 0 < q ∈ Prob(n) ∩ Qn can be expressed as in (4.133) for
sufficiently large k. This k is a common denominator for all the components of q.
Proof. We first show that (p, q) ≻ (r, u^(k)). For any x ∈ [n], let E^(x) be the k_x × n matrix whose x-th column is u^(k_x) and all the remaining n − 1 columns are zero. Moreover, let E be the k × n matrix obtained by stacking the blocks E^(1), E^(2), . . . , E^(n) on top of one another:
E := [E^(1); E^(2); . . . ; E^(n)] .   (4.136)
By definition, E^(x) p = p_x u^(k_x), so that Ep = r. Similarly, E^(x) q = (k_x/k) u^(k_x) = (1/k) 1_{k_x}, so that Eq = u^(k). Therefore, since E is column stochastic we get that (p, q) ≻ (r, u^(k)).
For the converse, let F^(x) be the n × k_x matrix whose x-th row is [1, . . . , 1] and all the remaining n − 1 rows are zero. Moreover, denote by F the n × k column stochastic matrix given by
F := [F^(1) F^(2) · · · F^(n)] .   (4.137)
where t (also known as a probabilistic hypothesis test) is an n-dimensional vector with entries
between 0 and 1. Note that for any pair of probability vectors the points (0, 0) and (1, 1)
belong to its testing region. Explicitly, (0, 0) is obtained by taking t to be the zero vector,
and (1, 1) is obtained by taking t = (1, . . . , 1)T . An example of a testing region is plotted in
Fig. 4.2.
Exercise 4.3.11. Show that the testing region is convex, and that it has the symmetry that if (x, y) ∈ T(p, q) then also (1 − x, 1 − y) ∈ T(p, q). Hint: For the latter property, consider the vector t′ = (1, . . . , 1)^T − t.
The testing region is bounded by two curves known as lower and upper Lorenz curves.
Due to the symmetry that (1 − x, 1 − y) ∈ T(p, q) for any (x, y) ∈ T(p, q), the upper Lorenz
curve can be obtained from the lower Lorenz curve through a 180-degree rotation centered
at the midpoint (1/2, 1/2). Consequently, either the lower or the upper Lorenz curve is
sufficient to uniquely define the entire testing region.
Since the testing region is convex it can be characterized by its extreme points. It is tempting to draw a parallel with the convex set [0, 1]ⁿ, which possesses 2ⁿ extreme points, namely the elements of the set {0, 1}ⁿ. However, this analogy can be misleading in the context of our testing region. In reality, only 2n points are necessary to fully characterize the testing region. We will focus on the extreme points characterizing the lower Lorenz curve. Note that the lower Lorenz curve is a convex curve (while the upper Lorenz curve is concave).
Exercise 4.3.12. Let p, q ∈ Prob(n) and let P be an n × n permutation matrix.
1. Show that
T(P p, P q) = T(p, q) . (4.140)
2. Show that
T(p ⊕ 0, q ⊕ 0) = T(p, q) . (4.141)
Since for any permutation matrix P , (p, q) and (P p, P q) have the same testing region,
we can assume without loss of generality that (p, q) are always given in the standard form.
This is also justified by the fact that under relative majorization we have (p, q) ∼ (P p, P q)
for any permutation matrix P .
Theorem 4.3.3. Given p, q ∈ Prob(n) in standard form, the extreme points on the
lower boundary of the testing region T(p, q) (specifically, on its lower Lorenz curve)
are the n + 1 vertices:
(a_k, b_k) := ( Σ_{x∈[k]} p_x , Σ_{x∈[k]} q_x ) ,   k = 0, 1, . . . , n ,   (4.142)
where a₀ := 0 and b₀ := 0.
Remark. In general, the sum Σ_{x∈[k]} p_x does not equal ∥p∥(k) since the components of p are not necessarily arranged in non-increasing order. Instead, the components of p and q are arranged such that the order in (4.116) holds.
Proof. Let f : [0, 1] → [0, 1] be the function whose graph is the lower Lorenz curve of
(p, q). Then, by definition, for every a ∈ [0, 1], (a, f (a)) is the lowest point in T(p, q) whose
x-coordinate is a. Therefore, for any r ∈ [0, 1] we can express f(r) as
f(r) = min{q · t : t ∈ [0, 1]ⁿ , p · t = r} .   (4.143)
Our objective is to demonstrate that the function f(r) defines the segment connecting two adjacent vertices. Specifically, consider a fixed k ∈ {0, 1, . . . , n}. Our aim is to establish that
for any r in the interval [ak , ak+1 ), the function f (r) corresponds to the line segment joining
the points (ak , bk ) and (ak+1 , bk+1 ). Mathematically, this means that for all r ∈ [ak , ak+1 ),
we have:
f(r) = s_{k+1}(r − a_k) + b_k ,   (4.144)
where s_x := q_x/p_x for all x ∈ [n], adhering to the convention that s_x := ∞ if p_x = 0 and q_x > 0 (recall that we assume that there is no x ∈ [n] such that both p_x and q_x are equal to zero). Successfully proving this relationship implies that the set of points {(a_k, b_k)}ⁿ_{k=0} are indeed the extreme points on the lower Lorenz curve of the testing region T(p, q).
To prove (4.144), observe that the optimization problem in (4.143) is a linear program.
In Exercise 4.3.13 you will apply methods discussed in Sec. A.9, specifically the dual problem
framework, to express f (r) as:
f(r) = max{rs − v · 1ₙ : v ∈ Rⁿ₊ , sp − q ⩽ v , s ∈ R₊} ,   (4.145)
where 1n := (1, . . . , 1)T and the inequality is entry-wise. The maximization in (4.145) can
be simplified since the vector v with the smallest non-negative components that satisfies
the constraint v ⩾ sp − q is given by v = (sp − q)+ (the components of (sp − q)+ are
{(sp_x − q_x)₊}_{x∈[n]}). Hence,
f(r) = max_{s⩾0} {sr − (sp − q)₊ · 1ₙ} .   (4.146)
Now, from (4.116) it follows that s1 ⩽ s2 ⩽ · · · ⩽ sn . Therefore, for any s ⩾ 0 there exists
ℓ ∈ {0, 1, . . . , n} with the property that sℓ ⩽ s < sℓ+1 , where we added the definitions s0 := 0
and sn+1 := ∞. With this definition of ℓ we get
(sp − q)₊ · 1ₙ = Σ_{x∈[ℓ]} p_x(s − s_x) = sa_ℓ − b_ℓ .   (4.148)
Therefore, by splitting the maximization in (4.146) into maximization over all ℓ ∈ {0, 1, . . . , n}
and all s ∈ [sℓ , sℓ+1 ) we get
f (r) = max sup s(r − aℓ ) + bℓ . (4.149)
ℓ∈{0,1...,n} s∈[sℓ ,sℓ+1 )
Moreover, among all ℓ ∈ {0, 1, . . . , k} the choice ℓ = k yields the greatest value since the
right-hand side above is increasing in ℓ as long as ℓ ⩽ k (see Exercise 4.3.14); hence,
max_{ℓ∈{0,1,...,k}} sup_{s∈[s_ℓ, s_{ℓ+1})} {s(r − a_ℓ) + b_ℓ} = s_{k+1}(r − a_k) + b_k .   (4.151)
Furthermore, among all ℓ ∈ {k + 1, . . . , n} the choice ℓ = k + 1 yields the greatest value since
the right-hand side above is decreasing in ℓ as long as ℓ ⩾ k + 1 (see Exercise 4.3.14); that
is,
max_{ℓ∈{k+1,...,n}} sup_{s∈[s_ℓ, s_{ℓ+1})} {s(r − a_ℓ) + b_ℓ} = s_{k+1}(r − a_{k+1}) + b_{k+1} .   (4.153)
The right-hand side of (4.151) is in fact equal to the right-hand side of (4.153) (see Exer-
cise 4.3.14). We therefore conclude that for any k ∈ {0, 1, . . . , n} and any r ∈ [ak , ak+1 ) we
have f (r) = sk+1 (r − ak ) + bk . This completes the proof.
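The vertices (4.142) are straightforward to compute once the pair is sorted into the standard form (4.116). The sketch below (assuming numpy; the function names are ours) computes them and, anticipating the criterion of Theorem 4.3.4 below, tests relative majorization by comparing the two piecewise-linear lower Lorenz curves at the union of their breakpoints (which suffices, since both curves are linear between consecutive breakpoints).

```python
import numpy as np

def lorenz_vertices(p, q):
    """Vertices (a_k, b_k) of the lower Lorenz curve of (p, q), eq. (4.142):
    sort the pairs into standard form (4.116), i.e. by p_x/q_x non-increasing."""
    ratio = np.where(q > 0, p / np.where(q > 0, q, 1), np.inf)
    order = np.argsort(-ratio, kind='stable')
    a = np.concatenate([[0.0], np.cumsum(p[order])])
    b = np.concatenate([[0.0], np.cumsum(q[order])])
    return a, b

def relatively_majorizes(p, q, p2, q2, atol=1e-12):
    """Check (p, q) relatively majorizes (p2, q2): LC(p, q) nowhere above LC(p2, q2)."""
    a1, b1 = lorenz_vertices(p, q)
    a2, b2 = lorenz_vertices(p2, q2)
    xs = np.union1d(a1, a2)
    return bool(np.all(np.interp(xs, a1, b1) <= np.interp(xs, a2, b2) + atol))

p  = np.array([0.7, 0.2, 0.1]); q  = np.array([0.2, 0.3, 0.5])
p2 = np.array([0.5, 0.5]);      q2 = np.array([0.4, 0.6])
print(relatively_majorizes(p, q, p2, q2), relatively_majorizes(p2, q2, p, q))  # True False
```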
1. Show that the condition p·t = r in (4.143) can be replaced with p·t ⩾ r. Hint: Observe
that any t satisfying p · t > r can be rescaled to give p · t = r (and this rescaling can
only decrease q · t).
2. Prove the equality in (4.145). Hint: First express the minimization in (4.143) (after
replacing p · t = r with p · t ⩾ r) as a conic linear programming of the form (A.52)
(with vectors in Rn replacing Hermitian matrices, and the dot product replacing the
Hilbert-Schmidt inner product). Then use (A.57) and the strong duality to get (4.145).
Exercise 4.3.14. Show that the right-hand side of (4.150) is increasing in ℓ ∈ [k], and
the right-hand side of (4.152) is decreasing in ℓ ∈ {k + 1, . . . , n}. Moreover, show that the
two expressions are the same for ℓ = k and ℓ = k + 1 (i.e., show that the right-hand side
of (4.151) is equal to the right-hand side of (4.153)).
1. (p − tq)₊ · 1ₙ = ½(∥p − tq∥₁ + 1 − t) .
2. (tp − q)₊ · 1ₙ = ½(∥tp − q∥₁ + t − 1) .   (4.154)
Hint: Use the relation (a − b)₊ = ½|a − b| + ½(a − b).
Exercise 4.3.16. Compute the vertices of the lower Lorenz curve of the example given in
Fig. 4.2. If necessary, rearrange the components of p and q so that (4.116) holds.
Exercise 4.3.17. For given p, q ∈ Prob(n), find the vertices of T(p, q) that are located on the upper Lorenz curve of (p, q).
Exercise 4.3.18. Let p ∈ Prob(n). Show that the vertices of the lower Lorenz curve of the pair (p, u^(n)) are given by
(∥p∥(k) , k/n) ,   k = 0, 1, . . . , n ,   (4.155)
Remark. We will see shortly that the converse to the statement in the corollary above is also
true.
Proof. The proof follows immediately from the expression for the lower Lorenz curve in (4.146). Explicitly, by using the variable t := 1/s in (4.146), and using the notation f_{p,q}(r) for f(r), we get that for all r ∈ [0, 1]
f_{p,q}(r) = max_{t⩾1} [r − (p − tq)₊ · 1ₙ]/t = max_{t⩾1} [2r − ∥p − tq∥₁ + t − 1]/(2t) ,   (4.156)
where the second equality follows from (4.154).
Therefore, if ∥p − tq∥1 ⩾ ∥p′ − tq′ ∥1 for all t ⩾ 1 then fp,q (r) ⩽ fp′ ,q′ (r) for all r ∈ [0, 1];
i.e., the lower Lorenz curve of the pair (p, q) is nowhere above the lower Lorenz curve of
(p′ , q′ ) so that T(p, q) ⊇ T(p′ , q′ ).
Exercise 4.3.19. Let p, q ∈ Prob(n) and p′ , q′ ∈ Prob(n′ ). Show that if (p, q) ≻ (p′ , q′ )
then for all t ∈ R we have ∥p − tq∥1 ⩾ ∥p′ − tq′ ∥1 . Hint: Use the property 2.8 of the 1-norm
∥ · ∥1 .
Exercise 4.3.20. Let p, q ∈ Prob(n) and p′ , q′ ∈ Prob(n′ ).
1. Show that if (p, q) ≻ (p′ , q′ ) then T(p′ , q′ ) ⊆ T(p, q).
2. Show that if (p, q) ∼ (p′ , q′ ) then T(p′ , q′ ) = T(p, q).
Hint: For the first part, let E ∈ STOCH(n′, n) be such that p′ = Ep and q′ = Eq, and show first that for any t′ ∈ [0, 1]^{n′} the vector t := E^T t′ belongs to [0, 1]ⁿ and satisfies (t′ · p′, t′ · q′) = (t · p, t · q).
Characterization
Theorem 4.3.4. Let n, n′ ∈ N, p, q ∈ Prob(n), and p′ , q′ ∈ Prob(n′ ). Then, the
following are equivalent:
1. (p, q) ≻ (p′ , q′ ).
3. T(p, q) ⊇ T(p′ , q′ ).
Remark. The equivalence between 1 and 3 in Theorem 4.3.4 provides a very simple geomet-
rical characterization of relative majorization. Denoting by LC(p, q) and LC(p′ , q′ ) the two
lower Lorenz curves associated with the two testing regions, we have that (p, q) ≻ (p′ , q′ ) if
and only if LC(p, q) is nowhere above LC(p′ , q′ ). An example illustrating this property is
depicted in Fig. 4.3.
Figure 4.3: Lower Lorenz Curves. The red lower Lorenz curve LC(p, q) is nowhere above the blue
lower Lorenz curve LC(p′ , q′ ). This means that the pair (p, q) relatively majorizes the pair (p′ , q′ ).
Note that aside from the vertices (0, 0) and (1, 1), the vertices of LC(p, q) are (5/12, 1/12), (7/12, 1/6), (5/6, 1/2), and the vertices of LC(p′, q′) are (1/3, 1/12), (7/12, 1/4), (5/6, 7/12).
The implication 1 ⇒ 2 can be easily deduced from the monotonicity property of the norm
∥ · ∥1 , as discussed in (2.8). This part of the proof is straightforward and hence, is suggested
as an exercise for the reader (refer to Exercise 4.3.19). Having previously established the
implication 2 ⇒ 3 in Corollary 4.3.1, our remaining task is to demonstrate that 3 ⇒ 1. We
begin this proof by focusing on the case where both q and q′ consist of positive rational
components.
Lemma 4.3.1. Let q, p ∈ Prob(n) and p′ , q′ ∈ Prob(n′ ), and suppose that q and q′
have positive rational components. If T(p, q) ⊇ T(p′ , q′ ) then (p, q) ≻ (p′ , q′ ).
Proof. From Theorem 4.3.2 we get that there exist vectors r, r′ ∈ Prob(k) such that (p, q) ∼
(r, u(k) ) and (p′ , q′ ) ∼ (r′ , u(k) ), where k is a common denominator of all the components of
q and q′ . Therefore, from the second part of Exercise 4.3.20 we get that T(p, q) = T(r, u(k) )
and T(p′ , q′ ) = T(r′ , u(k) ). Moreover, since we assume that T(p, q) ⊇ T(p′ , q′ ) we get that
T(r, u(k) ) ⊇ T(r′ , u(k) ). Hence, the Lorenz curve LC(r, u(k) ) is nowhere above the Lorenz
curve LC(r′ , u(k) ). In addition, the non-zero vertices of LC(r, u(k) ) and LC(r′ , u(k) ) are given
Therefore, since the vertex (∥r∥(ℓ), ℓ/k) has the same y-coordinate as the vertex (∥r′∥(ℓ), ℓ/k),
and since the convex curve LC(r, u(k) ) is nowhere above the convex curve LC(r′ , u(k) ), we
get that ∥r∥(ℓ) ⩾ ∥r′ ∥(ℓ) for all ℓ ∈ [k]. That is, r ≻ r′ and from (4.138) this is equivalent to
(p, q) ≻ (p′ , q′ ). This completes the proof.
In order to complete the proof of Theorem 4.3.4 we will need a continuity argument that
extends the lemma above to the general case of arbitrary q and q′ .
Proof. We provide a geometrical proof using Fig. 4.4. By keeping p′ unchanged, we can raise
slightly and vertically the vertices of LC(p′ , q′ ) to get the lower Lorenz curve of LC(p′ , q′(ε) ),
where q′(ε) has positive rational components and is ε-close to q′ ; see Fig. 4.4(a) for an
illustration. Explicitly, let ε₁, . . . , ε_{n−1} be small enough positive numbers such that for all x ∈ [n − 1], q′^(ε)_x := q′_x + ε_x is a rational number. Furthermore, we can always choose ε₁, . . . , ε_{n−1} to be small enough such that their sum δ := Σ_{x∈[n−1]} ε_x < ε and such that q′^(ε)_n := q′_n − δ > 0 (due to the standard form of (p′, q′) we have q′_n > 0). For these choices, the vector q′^(ε) := (q′^(ε)_1, . . . , q′^(ε)_n)^T has positive rational components, and q′^(ε) is also ε-close to q′.
Furthermore, by construction, LC(p′ , q′(ε) ) is everywhere above LC(p′ , q′ ).
Once we have established LC(p′, q′(ε)), we construct q(ε) in a similar way; see Fig. 4.4(b).
Explicitly, let ν1, . . . , νn−1 be small enough positive numbers such that for every x ∈ [n − 1]
the number q(ε)_x := q_x + ν_x is rational. Furthermore, we can always choose ν1, . . . , νn−1 to be
small enough such that their sum ν := Σ_{x∈[n−1]} ν_x is smaller than ε and satisfies q(ε)_n := q_n − ν > 0. For
these choices, the vector q(ε) := (q(ε)_1, . . . , q(ε)_n)^T has positive rational components, is ε-close to q,
and, as long as ν1, . . . , νn−1 are sufficiently small, LC(p, q(ε)) is everywhere below LC(p′, q′(ε)).
This completes the proof.
With these lemmas at hand, we are now ready to prove the theorem.
In the exercise above we approximated (p, q) with a pair of vectors (p, q′ ), where q′ has
some desired properties (particularly, rational components), and p is fixed. In the next two
lemmas we remove some of the assumptions on q by allowing p to vary. We will only assume
that q′ is close to q.
Specifically, consider two probability vectors p, q ∈ Prob(n) and let ε ∈ (0, 1) be a
sufficiently small number to be determined later. Our objective is to show that for every
p′ ∈ Bε (p) there exists δ ∈ (0, 1) and q′ ∈ Bδ (q) such that (p′ , q′ ) ≻ (p, q). We will be
able to show that such a q′ exists if we assume that q > 0 and that supp(p′ ) ⊆ supp(p). In
the following lemma we will use the notation ε₀ := (1/2) p_min q_min, where p_min and q_min are the
smallest non-zero components of p and q, respectively.
Proof. We need to define q′ ∈ Bδ (q) and a channel E ∈ STOCH(n, n) such that Ep′ = p
and Eq′ = q. The key idea is to look for a matrix E of the form
Et := p + s (t − p′ ) ∀ t ∈ Prob(n) , (4.163)
where s ∈ R+ is some coefficient. Observe that we define the matrix E by its action on
probability vectors. Clearly, by construction, Ep′ = p. However, E above is not necessarily
a stochastic matrix since for arbitrary s ∈ R the vector p + s (t − p′) could have negative
components. We therefore choose s := min_{x∈supp(p′)} p_x/p′_x ; with this choice p − s p′ ⩾ 0, so that
Et ⩾ 0 for every t ∈ Prob(n) and E is indeed a stochastic matrix.
Observe that s ⩽ 1 (since we cannot have p_x > p′_x for all x ∈ supp(p′)), and since supp(p′) ⊆
supp(p) we have s > 0. In fact, since p′ is ε-close to p we must have p′x ⩽ px + ε for all
x ∈ [n] so that
s ⩾ min_{x∈supp(p′)} p_x/(p_x + ε) ⩾ p_min/(p_min + ε) . (4.165)
Our next goal is to define q′ such that Eq′ = q. Observe that according to (4.163) we
have Eq′ = p + s (q′ − p′ ) so that Eq′ = q if p + s (q′ − p′ ) = q. Isolating q′ we get that
q′ should have the form
q′ := p′ + (1/s)(q − p) (4.166)
(indeed, check that with this q′ we have Eq′ = q). However, it is not obvious that q′ has
non-negative components. Therefore, we next show that ε is small enough so that q′ ⩾ 0
(i.e. q′ is a probability vector).
Observe that q′ ⩾ 0 if and only if for all x ∈ [n] we have qx ⩾ px − sp′x . Now, since p is
ε-close to p′ we get that
Lemma 4.3.4. Let p, q, q′ ∈ Prob(n) and denote δ := 1 − min_{x∈[n]} q′_x/q_x . Then,
there exists p′ ∈ Bδ(p) such that (p, q) ≻ (p′, q′).
Observe that
δ = max_{x∈[n]} (q_x − q′_x)/q_x ⩽ ε/q_ℓ , (4.170)
where ℓ is the integer satisfying δ = 1 − q′_ℓ/q_ℓ . Therefore, if q and q′ are very close to each
other then so are p and p′.
Proof. Let s := 1 − δ = min_{x∈[n]} q′_x/q_x . From the definition of s we have q′ ⩾ sq (entry-wise),
so that the mapping
Et := q′ + s (t − q) ∀ t ∈ Prob(n) (4.171)
is a channel. By definition, Eq = q′ . Define
p′ := Ep = q′ + s (p − q) , (4.172)
where the second equality follows from (4.171). Therefore,
(1/2)∥p′ − p∥₁ = (1/2)∥q′ − sq − (1 − s)p∥₁ ⩽ (1/2)∥q′ − sq∥₁ + (1/2)(1 − s) = 1 − s , (4.173)
where the inequality is the triangle inequality, and the last equality follows from q′ ⩾ sq, which gives
∥q′ − sq∥₁ = 1 − s. Hence p′ ∈ B_{1−s}(p) = Bδ(p), and since Ep = p′ and Eq = q′ we conclude
that (p, q) ≻ (p′, q′). This completes the proof.
Exercise 4.3.26. Prove the following statement: Let p, q, p′ ∈ Prob(n) and denote δ :=
1 − min_{x∈[n]} p′_x/p_x . Then, there exists q′ ∈ Bδ(q) such that (p, q) ≻ (p′, q′).
We now turn to the trumping relation. We say that p trumps q, and write
p ≻∗ q , (4.174)
if there exist a finite m ∈ N and a catalyst vector r ∈ Prob(m) such that p ⊗ r ≻ q ⊗ r.
Remark. Observe that while we require that the dimension m < ∞, it is still unbounded. We
will see later that this implies that the trumping relation is very sensitive to perturbations,
and also leads to a phenomenon known as ‘embezzlement’ of entanglement.
By definition, if p ≻ q then necessarily p ≻∗ q. The example below demonstrates that
the opposite direction does not hold in general. In this sense, the trumping relation imposes
a weaker constraint than majorization.
Example. Consider the probability vectors
p := (1/2, 1/4, 1/4, 0)^T and q := (2/5, 2/5, 1/10, 1/10)^T .
It is simple to check (see exercise below) that p ̸≻ q and q ̸≻ p. Yet, p ≻∗ q since the vector
r = (3/5, 2/5)^T satisfies
p ⊗ r ≻ q ⊗ r . (4.175)
Exercise 4.4.1. Let p, q, r be as in the example above.
1. Verify that p ̸≺ q and q ̸≺ p.
2. Verify that (4.175) holds.
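As a quick numerical illustration of the example and of Exercise 4.4.1 (a sketch, not part of the text), one can check directly that p and q above are incomparable under majorization while p ⊗ r majorizes q ⊗ r:

```python
import numpy as np

def majorizes(p, q, tol=1e-12):
    """True if p majorizes q (both probability vectors of the same dimension)."""
    p, q = np.sort(p)[::-1], np.sort(q)[::-1]
    return bool(np.all(np.cumsum(p) >= np.cumsum(q) - tol))

p = np.array([1/2, 1/4, 1/4, 0.0])
q = np.array([2/5, 2/5, 1/10, 1/10])
r = np.array([3/5, 2/5])

print(majorizes(p, q), majorizes(q, p))          # False, False: p and q are incomparable
print(majorizes(np.kron(p, r), np.kron(q, r)))   # True: r acts as a catalyst
```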
Exercise 4.4.2. Show that if p, q ∈ Prob(3) and p ≻∗ q then p ≻ q.
Exercise 4.4.3. Show that the uniform probability vector u cannot act as a catalyst vector;
that is, show that if p ∈ Prob(n) and q ∈ Prob(m) are such that p ̸≻ q then for any k ∈ N,
p ⊗ u(k) ̸≻ q ⊗ u(k).
Exercise 4.4.4. Let f : Prob(n) → R be a Schur convex function that is additive under
tensor product; i.e. for all p ∈ Prob(n) and q ∈ Prob(m)
f (p ⊗ q) = f (p) + f (q) . (4.176)
Show that
p ≻∗ q ⇒ f (p) ⩾ f (q) . (4.177)
Relative Trumping
Definition 4.4.2. Let p, q ∈ Prob(n) and p′ , q′ ∈ Prob(m). We say that the pair
(p, q) relatively trumps the pair (p′ , q′ ), and write
(p ⊗ r, q ⊗ s) ≻ (p′ ⊗ r, q′ ⊗ s) . (4.179)
Exercise 4.4.5. Show that if we did not impose s > 0 (or alternatively that r > 0) in
the definition above then there would always exists a catalyst. Hint: Take r and s to be
orthogonal.
Remark. In the above definition, an alternative approach could have been to require that
the vectors r and s are not orthogonal instead of enforcing s > 0. Nevertheless, opting for
the stricter criterion of s > 0 brings two distinct benefits. Firstly, within the rational field,
where vectors like q, q′ , r and others comprise rational components, Theorem 4.3.2 implies
that (r, s) ∼ (t, u), for some vector t of higher dimensionality. This equivalence effectively
simplifies relative trumping to standard trumping in this scenario. Secondly, when applying
the concept of relative trumping to thermodynamic contexts, the vector s typically represents
a Gibbs state, which is inherently positive by nature. This correspondence ensures that the
mathematical model is in harmony with the underlying physical principles of Gibbs states
in thermodynamics.
In the next chapter we will study functions that behave monotonically under relative
trumping. A well-known family of such functions is the family of Rényi divergences. The Rényi
divergences are defined for any α ∈ [0, ∞] and any p, q ∈ Prob(n) as
Dα(p∥q) := { (1/(α−1)) log Σ_{x∈[n]} p_x^α q_x^{1−α}   if supp(p) ⊆ supp(q)
           { ∞                                          otherwise .        (4.180)
The cases α = 0, 1, ∞ are defined by taking the appropriate limits (more details will be given
in the next chapter). Both the trumping and relative trumping relations can be characterized
with the above family of Rényi divergences.
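For readers who want to experiment numerically, here is a minimal sketch (a hypothetical helper, not from the text; the base-2 logarithm is a convention choice) of the Rényi divergence (4.180), including its α → 1 limit, the Kullback–Leibler divergence:

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """D_alpha(p||q) as in (4.180), with the alpha -> 1 case given by the KL limit."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    if np.any((p > 0) & (q == 0)):
        return np.inf                       # supp(p) not contained in supp(q)
    mask = p > 0                            # terms with p_x = 0 contribute nothing
    if np.isclose(alpha, 1.0):
        return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))
    s = np.sum(p[mask] ** alpha * q[mask] ** (1 - alpha))
    return float(np.log2(s) / (alpha - 1))

p = np.array([1/2, 1/4, 1/4, 0.0])
q = np.array([2/5, 2/5, 1/10, 1/10])
for alpha in (0.5, 1.0, 2.0):
    print(alpha, renyi_divergence(p, q, alpha), renyi_divergence(q, p, alpha))
```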
p ≻∗ q (4.181)
if and only if for all α ⩾ 1/2
The proof of the theorem above is rather complicated and goes beyond the scope of this
book. In the section ‘Notes and References’ at the end of this chapter, we discuss its history
and provide relevant references for further reading. In the corollary below we show that this
theorem can be extended to relative trumping for the case that one of the vectors in each
pair has positive rational components. We use the notation
(p, q) ⊗ (p′, q′) := (p ⊗ p′, q ⊗ q′) ∀ p, q ∈ Prob(n) , ∀ p′, q′ ∈ Prob(m) . (4.183)
2. For all α ⩾ 1/2 we have
Dα(p∥q) > Dα(p′∥q′) and Dα(q∥p) > Dα(q′∥p′) . (4.185)
Note that the condition 1 above implies in particular that (p, q) ≻∗ (p′ , q′ ). We leave
the proof of the corollary as an exercise.
Exercise 4.4.6. Prove Corollary 4.4.1 using the combination of Theorem 4.4.1 and Theo-
rem 4.3.2.
Consider two sequences {pk}k∈N and {qk}k∈N in Prob(n) with limits p and q, and let {p′k}k∈N and {q′k}k∈N be sequences in Prob(m) with limits p′ and q′. Suppose now
that (pk , qk ) ≻ (p′k , q′k ) for all k ∈ N and recall from Theorem 4.3.4 that this means that
T(pk , qk ) ⊇ T(p′k , q′k ). Since this inclusion of testing regions is robust under taking the limit,
we conclude that T(p, q) ⊇ T(p′ , q′ ) so that necessarily (p, q) ≻ (p′ , q′ ).
The above argument cannot be applied to the trumping and relative trumping relations.
To see why, consider two sequences {pk }k∈N ⊆ Prob(n) and {qk }k∈N ⊆ Prob(m) with limits
p and q, and suppose that pk ≻∗ qk for all k ∈ N. This means that for each k ∈ N there
exists a catalyst vector rk ∈ Prob(ℓk ) such that pk ⊗ rk ≻ qk ⊗ rk , where ℓk ∈ N is the
dimension of rk that can depend on k. Without invoking additional arguments, one cannot
rule out the possibility that the dimension ℓk goes to infinity as k goes to infinity. This
means that we cannot conclude that p ≻∗ q. However, at the time of writing this book, it
is left open to find an example with convergent sequences satisfying pk ≻∗ qk for all k ∈ N,
whereas their limits satisfy p ̸≻∗ q.
Exercise 4.5.1. Show that if there exists an example as above, then there is also a similar
example for relative trumping. That is, there exist sequences {pk}k∈N, {qk}k∈N, {p′k}k∈N,
and {q′k }k∈N , in Prob(n), with limits p, q, p′ , and q′ , respectively, such that (pk , qk ) ≻∗
(p′k , q′k ) for all k ∈ N and (p, q) ̸≻∗ (p′ , q′ ).
Catalytic Majorization
Definition 4.5.1. Let m, n ∈ N and p, q ∈ Prob(n) and p′ , q′ ∈ Prob(m). We say
that (p, q) catalytically majorizes (p′, q′), and write (p, q) ≻c (p′, q′),
if for any ε > 0 there exist four vectors pε ∈ Bε(p), qε ∈ Bε(q), p′ε ∈ Bε(p′), and
q′ε ∈ Bε(q′) such that (pε, qε) ≻∗ (p′ε, q′ε).
Exercise 4.5.2. Show that ≻c is indeed a pre-order, and if (p, q) ≻∗ (p′ , q′ ) then necessarily
(p, q) ≻c (p′ , q′ ).
Exercise 4.5.3 (Robustness of Catalytic Majorization). Show that if for any ε > 0 there
exist four vectors pε ∈ Bε(p), qε ∈ Bε(q), p′ε ∈ Bε(p′), and q′ε ∈ Bε(q′) such that
(pε , qε ) ≻c (p′ε , q′ε ) then (p, q) ≻c (p′ , q′ ).
Lemma. Let p, q ∈ Prob(n) and p′, q′ ∈ Prob(m), and suppose that q > 0 and p′ > 0. Then, the following statements are equivalent:
1. (p, q) ≻c (p′, q′).
2. For any ε > 0 there exist qε ∈ Bε(q) and q′ε ∈ Bε(q′) such that (p, qε) ≻∗ (p′, q′ε).
Proof. Clearly, if for all ε > 0 the two vectors in the second statement exist, then (p, q) ≻c
(p′ , q′ ) since we can define pε := p and p′ε := p′ , so that the four vectors pε , qε , p′ε , and q′ε
satisfy the conditions in Definition 4.5.1. It thus remains to show the converse implication.
Suppose (p, q) ≻c (p′ , q′ ) and let pε , qε , p′ε , and q′ε be as in Definition 4.5.1. In particular,
(pε , qε ) ≻∗ (p′ε , q′ε ) and we choose ε > 0 to be sufficiently small so that qε > 0 (recall
that q > 0 and qε is ε-close to q) and p′ε > 0. From Lemma 4.3.3 it follows that there
exists r ∈ Bδ(qε) with δ := ε/q_{ε,min} (q_{ε,min} being the smallest component of qε) such that
(p, r) ≻ (pε, qε). Similarly, from the version of Lemma 4.3.4 that is given in Exercise 4.3.26
it follows that there exists a vector r′ ∈ B_{δ′}(q′ε) with δ′ := 1 − min_{x∈[n]} p′_x/(p′ε)_x, such that
(p′ε, q′ε) ≻ (p′, r′). Observe that r and r′ satisfy
(p, r) ≻ (pε, qε) ≻∗ (p′ε, q′ε) ≻ (p′, r′) .
Hence, in particular, (p, r) ≻∗ (p′, r′). Now, recall that r is δ-close to qε and therefore
(δ + ε)-close to q. Similarly, r′ is (δ ′ + ε)-close to q′ . The proof is therefore concluded by the
observation that both δ and δ ′ go to zero in the limit ε → 0 so that r and r′ can be made
arbitrarily close to q and q′ , respectively.
Exercise 4.5.4. Show that under the assumption that q > 0 and p′ > 0 both δ and δ′ in the
lemma above go to zero as ε goes to zero.
Theorem. Let p, q ∈ Prob(n) and p′, q′ ∈ Prob(m) be such that one of p, q has full support. Then, the following are equivalent:
1. (p, q) ≻c (p′, q′).
2. For all α ⩾ 1/2 we have Dα(p∥q) ⩾ Dα(p′∥q′) and Dα(q∥p) ⩾ Dα(q′∥p′).
Proof. The proof of the theorem for the special case that either p = q or p′ = q′ is very
simple and is left as an exercise. Therefore, we assume now that both p ̸= q and p′ ̸= q′ .
Due to the symmetry in the roles of p and q, and since one of them has full support, we
assume without loss of generality that it is q; that is, we assume
q > 0. The proof of the monotonicity property of Dα under catalytic majorization will be
detailed in Chapter 6, where we extensively study the properties of the Rényi divergences.
Consequently, this section will only cover the proof of the implication 2 ⇒ 1.
From Exercise 4.3.24 it follows that for any ε > 0 there exist qε ∈ Bε (q) and q′ε ∈ Bε (q′ )
with 0 < qε ∈ Prob(n) ∩ Qn and 0 < q′ε ∈ Prob(m) ∩ Qm such that
In Chapter 6 we will see that Dα behaves monotonically under both relative majorization
and catalytic majorization. Therefore, the relations above combined with our assumption
that (p, q) ≻c (p′ , q′ ), lead to the conclusion that
and similarly
Now, since both qε and q′ε have positive rational components, there exist two finite-dimensional
probability vectors rε, r′ε ∈ Prob(mε), with mε ∈ N, such that (see Theorem 4.3.2)
Our strategy is to use Theorem 4.4.1 in conjunction with the inequalities above in order to
obtain a majorization relation between r′ε and rε . However, since the inequalities above are
not strict, we cannot use Theorem 4.4.1 and will need to tweak a bit the vector r′ε .
We first rule out the possibility that r′ε = u(mε). Consulting the construction in Theorem 4.3.2,
we see that r′ε = u(mε) implies p′ = q′ε. However, this cannot occur for sufficiently
small ε > 0 since q′ε → q′ ̸= p′ as ε → 0⁺ by our assumption. Hence, we can assume r′ε ̸= u(mε) for
sufficiently small ε > 0. Moreover, observe that for any ε ∈ (0, 1), we have (see Exercise 4.1.2)
r′ε ≻ sε := (1 − ε) r′ε + εu(mε ) . (4.193)
A combination of the relation r′ε ≻ sε (note that sε > 0) with Theorem 4.4.1 and Eq. (4.192)
gives the following strict inequalities for all α ⩾ 1/2:
where {e^X_x}_{x∈[m]} is the standard basis of R^m, and {e^Y_y}_{y∈[n]} is the standard basis of R^n. While
mathematically p^{XY} is a vector in Prob(mn), conceptually we treat it as a joint probability
distribution.
To introduce the concept of conditional majorization, we build upon the foundation of
conditional mixing operations. Our objective is to characterize a set of evolution matrices
that possess a specific property: they increase the conditional uncertainty associated with
system X when provided access to system Y . To embark on this journey, consider three clas-
sical systems: X, Y , and Y ′ , each with dimensions m, n, and n′ , respectively. Additionally,
′
consider two probability vectors pXY ∈ Prob(mn) and qXY ∈ Prob(mn′ ). We say that pXY
′ ′
conditionally majorizes qXY and denote it as pXY ≻X qXY when there exists a conditional
mixing operation (to be define shortly), denoted as M ∈ STOCH(mn′ , mn), such that:
′
qXY = M pXY . (4.198)
The challenge lies in crafting a meaningful definition for M that aligns with the concept
of “conditional mixing.” In this context, we demonstrate that there exist three distinct
approaches to defining M , mirroring the three methodologies introduced in Section 4.1.2.
Remarkably, all of these approaches converge to the same definition of conditional mixing and
conditional majorization, thereby establishing a solid foundation for the notion of conditional
majorization.
This section is structured as follows: Initially, we introduce both the axiomatic and con-
structive approaches, showing that they both lead to the identical definition of a conditional
mixing operation. We then use this definition to establish conditional majorization and to
examine some of its key properties. Following that, we explore a useful characterization of
conditional majorization, paying special attention to cases in smaller dimensions. Lastly, in
the final subsection on this topic, we investigate the operational approach to conditional ma-
jorization and demonstrate its consistency with the axiomatic and constructive approaches.
uncertainty about system X when one has access to system Y . To address this, we introduce
a minimalistic causality assumption that accounts for the property that system X has no
causal effect on system Y ′ . In mathematical terms, this assumption implies that the compo-
nents of the stochastic matrix M = (µx′ y′ |xy ) satisfy the following equation for all x ∈ [m],
y ∈ [n], and y′ ∈ [n′]:
Σ_{x′∈[m]} µ_{x′y′|xy} = r_{y′|y} , (4.199)
where {ry′ |y } (with y ∈ [n], and y ′ ∈ [n′ ]) is some conditional probability distribution in-
dependent on x. We refer to this condition as non-signalling from X to Y ′ or in short
X ̸→ Y ′ -signalling (see (2.135) for a similar definition).
Matrices that satisfy the above non-signalling condition have a relatively simple form.
Specifically, for every y ∈ [n] and y′ ∈ [n′] let T^{(y,y′)} ∈ R_+^{m×m} be the matrix whose components are
t^{(y,y′)}_{x′|x} := µ_{x′y′|xy} / r_{y′|y}   ∀ x, x′ ∈ [m] . (4.200)
From (4.199) it then follows that T^{(y,y′)} is column stochastic; i.e., for every y ∈ [n] and
y′ ∈ [n′] we have T^{(y,y′)} ∈ STOCH(m, m). With these notations we can express M as (the
summations run over all x, x′ ∈ [m], y ∈ [n], and y′ ∈ [n′])
M = Σ_{x,x′,y,y′} µ_{x′y′|xy} |x′⟩⟨x| ⊗ |y′⟩⟨y| = Σ_{y,y′} r_{y′|y} Σ_{x,x′} t^{(y,y′)}_{x′|x} |x′⟩⟨x| ⊗ |y′⟩⟨y| , (4.201)
where we employed quantum notation by denoting by |x′⟩⟨x| (and similarly |y′⟩⟨y|) the
m × m rank-one matrix e_{x′} e_x^T, which has a one at the (x′, x)-position and zeros elsewhere.
We therefore conclude that M ∈ STOCH(mn′, mn) is X ̸→ Y′-signalling
if and only if there exist nn′ stochastic matrices T^{(y,y′)} ∈ STOCH(m, m) (with y ∈ [n] and
y′ ∈ [n′]), and another stochastic matrix R = (r_{y′|y}) ∈ STOCH(n′, n), such that
M = Σ_{y,y′} r_{y′|y} T^{(y,y′)} ⊗ |y′⟩⟨y| . (4.202)
In the following exercise you show that the above form of M represents a bipartite channel
that can be realized with one-way communication from Bob to Alice.
Exercise 4.6.1. Let M ∈ STOCH(mn′ , mn). Show that the following two statements are
equivalent:
1. M is X ̸→ Y ′ -signalling.
2. M can be realized with one-way communication from Bob to Alice. That is, M can be
expressed as (refer to Fig. 4.5):
M = Σ_{j∈[k]} T^{(j)} ⊗ R_j (4.203)
where k ∈ N, and for each j ∈ [k], T^{(j)} ∈ STOCH(m, m), R_j ∈ R_+^{n′×n}, and
R := Σ_{j∈[k]} R_j ∈ STOCH(n′, n).
1. Show that if M satisfies (4.199) then for every evolution matrix E ∈ STOCH(m, m)
the marginal channel N satisfies
N (E ⊗ In ) = N . (4.204)
This condition ensures that any operation E that Alice (system X) may choose to apply
to her system cannot be detected by Bob (system Y). Such a condition is also called
X ̸→ Y′ semi-causal. See Fig. 4.6 for an illustration of a semi-causal channel.
2. Show that if N satisfies (4.204) for all E ∈ STOCH(m, m) then M satisfies (4.199).
Figure 4.6: An illustration of a semi-causal classical bipartite channel M . The marginal channel
N equals N (E ⊗ In ) for any choice of E ∈ STOCH(m, m).
Note that the multiplication by row vector 1Tm in the expression above effectively functions
as the “tracing out” of system X.
Before we delve into characterizing the maps within CMO(mn′ , mn), let’s explore an
alternative approach, which we refer to as the constructive approach. We will demonstrate
that this approach ultimately yields the same set of conditional mixing operations.
obtained by Alice applying a mixing operation to her system (i.e., doubly stochastic map)
conditioned on information received from Bob (see Fig. 4.7). Mathematically,
M = Σ_{j∈[k]} D^{(j)} ⊗ R_j (4.207)
where j ∈ [k] is the information Bob sends to Alice after he processes his input y via
Rj = (ry′ j|y ). Upon receiving j Alice applies a mixing operation to her input x described by
the m × m doubly-stochastic matrix D(j) .
It is important to note that the expression in (4.207) is very similar to the one given in (4.203),
except that for any fixed j ∈ [k] the stochastic matrix T^{(j)} is replaced with the doubly
stochastic matrix D^{(j)}. Therefore, CDS channels are necessarily X ̸→ Y′ semi-causal. Moreover,
since each D^{(j)} is doubly stochastic we get that for any p^Y ∈ Prob(n)
M (u^X ⊗ p^Y) = Σ_{j∈[k]} D^{(j)} u^X ⊗ R_j p^Y = u^X ⊗ R p^Y , (4.208)
where the second equality follows from D^{(j)} u^X = u^X, and R = Σ_{j∈[k]} R_j. That is, M satisfies
the condition given in (4.205). We therefore
conclude that every CDS channel M is necessarily a CMO. In the next theorem we prove
that the converse also holds. Therefore, both the axiomatic and the constructive
approaches lead to the same set of conditionally mixing operations.
Exercise 4.6.3. Let M be as in (4.207). Show that without loss of generality we can assume
that the matrices D(j) are permutation matrices. Hint: Recall that every doubly-stochastic
matrix is a convex combination of permutation matrices.
Exercise 4.6.4. Show that for any doubly stochastic matrix D ∈ STOCH(m, m) and any
stochastic matrix R ∈ STOCH(n′, n) we have that D ⊗ R is CDS.
Proof. We already proved the inclusion CDS(mn′ , mn) ⊆ CMO(mn′ , mn) (see the discussion
below Definition 4.6.2). To prove the opposite inclusion, suppose M ∈ CMO(mn′ , mn). We
want to show that M ∈ CDS(mn′, mn). For this purpose we will use the form given in (4.202)
for X ̸→ Y′ semi-causal matrices, and show that each T^{(y,y′)} is an m × m doubly stochastic matrix.
Observe that if r_{y′|y} = 0 for some y′ ∈ [n′] and y ∈ [n], then replacing T^{(y,y′)} with the identity
matrix will not affect M, since r_{y′|y} = 0. Consequently, it suffices to demonstrate that T^{(y,y′)}
is doubly stochastic for those indices y and y′ for which r_{y′|y} ̸= 0.
Let {e^Y_y}_{y∈[n]} be the standard basis of R^n and consider the condition given in (4.205).
Fix y ∈ [n] and observe that from (4.202) we get
M (u^X ⊗ e^Y_y) = Σ_{y′∈[n′]} r_{y′|y} T^{(y,y′)} u^X ⊗ e^{Y′}_{y′} . (4.210)
On the other hand, since M is a conditionally mixing operation, the condition (4.205) implies that
M (u^X ⊗ e^Y_y) = u^X ⊗ q^{Y′}_y , (4.211)
for some vector q^{Y′}_y := Σ_{y′∈[n′]} q_{y′|y} e^{Y′}_{y′} in Prob(n′). Since {e^{Y′}_{y′}}_{y′∈[n′]} is an orthonormal basis
of R^{n′}, a comparison of (4.210) and (4.211) reveals that for all y ∈ [n] and y′ ∈ [n′]
r_{y′|y} T^{(y,y′)} u^X = q_{y′|y} u^X . (4.212)
Observe that since T^{(y,y′)} is column stochastic, T^{(y,y′)} u^X is a probability vector, so its dot
product with the vector 1_m equals one. Thus, by taking the dot product of both sides of
the equation above with 1_m we get r_{y′|y} = q_{y′|y} for all y ∈ [n] and y′ ∈ [n′]. We
therefore conclude that T^{(y,y′)} u^X = u^X (recall that we assume r_{y′|y} ̸= 0). Hence, T^{(y,y′)} is doubly
stochastic. This completes the proof.
In the proof above we showed that the matrices T^{(y,y′)} appearing in (4.202) are doubly
stochastic. We therefore conclude that M ∈ STOCH(mn′, mn) is a conditionally mixing
operation if and only if there exist a column stochastic matrix R ∈ STOCH(n′, n) and nn′
doubly stochastic matrices D^{(y,y′)} ∈ STOCH(m, m), with y ∈ [n] and y′ ∈ [n′], such that
M = Σ_{y∈[n]} Σ_{y′∈[n′]} r_{y′|y} D^{(y,y′)} ⊗ |y′⟩⟨y| . (4.213)
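The characterization (4.213) is easy to test numerically. The following sketch (illustrative only; building the doubly stochastic matrices as convex combinations of random permutation matrices is our choice, via Birkhoff's theorem) assembles a random conditionally mixing operation and verifies that it is column stochastic and that it maps u^X ⊗ p^Y to u^X ⊗ Rp^Y, as derived in (4.208):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, n_prime = 3, 2, 2

def random_doubly_stochastic(m, k=5):
    # convex combination of k random permutation matrices
    w = rng.dirichlet(np.ones(k))
    return sum(wi * np.eye(m)[rng.permutation(m)] for wi in w)

R = rng.dirichlet(np.ones(n_prime), size=n).T        # column-stochastic, shape (n', n)

M = np.zeros((m * n_prime, m * n))
for y in range(n):
    for yp in range(n_prime):
        D = random_doubly_stochastic(m)               # D^{(y,y')}
        E = np.zeros((n_prime, n)); E[yp, y] = 1.0    # |y'><y|
        M += R[yp, y] * np.kron(D, E)                 # X tensor Y ordering

print(np.allclose(M.sum(axis=0), 1.0))                # M is column stochastic

uX = np.ones(m) / m
pY = rng.dirichlet(np.ones(n))
print(np.allclose(M @ np.kron(uX, pY), np.kron(uX, R @ pY)))  # uniform on X is preserved
```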
Conditional Majorization
Definition 4.6.3. Let X, Y, and Y′ be three classical systems of dimensions m, n,
and n′, respectively. Further, let p^{XY} ∈ Prob(mn) and q^{XY′} ∈ Prob(mn′). We say
that p^{XY} conditionally majorizes q^{XY′} with respect to X, and write
p^{XY} ≻X q^{XY′} , (4.214)
if there exists M ∈ CMO(mn′, mn) such that q^{XY′} = M p^{XY}. We further write
p^{XY} ∼X q^{XY′} if both p^{XY} ≻X q^{XY′} and q^{XY′} ≻X p^{XY}.
Likewise, if we denote the components of q^{XY′} by {q_{xy′}}, we can represent q^{XY′} as
q^{XY′} = Σ_{y′∈[n′]} q^X_{y′} ⊗ e^{Y′}_{y′} , where q^X_{y′} := Σ_{x∈[m]} q_{xy′} e^X_x . (4.216)
Using these notations, the pre-order p^{XY} ≻X q^{XY′} can be expressed as a relationship between
the two sets of vectors {p^X_y}_{y∈[n]} and {q^X_{y′}}_{y′∈[n′]}. Observe further that all these vectors have
non-negative components and their sums are the marginal probability vectors:
p^X := Σ_{y∈[n]} p^X_y ∈ Prob(m) and q^X := Σ_{y′∈[n′]} q^X_{y′} ∈ Prob(m) . (4.217)
By definition, if p^{XY} ≻X q^{XY′} then there exists a matrix M of the form (4.213) such
that q^{XY′} = M p^{XY}. Using the notations above, this relation can be expressed as (see
Exercise 4.6.5)
q^X_{y′} = Σ_{y∈[n]} r_{y′|y} D^{(y,y′)} p^X_y   ∀ y′ ∈ [n′] , (4.218)
where R = (r_{y′|y}) ∈ STOCH(n′, n) and each D^{(y,y′)} is an m × m doubly stochastic matrix.
Exercise 4.6.5. Prove the relation (4.218) using the above forms of p^{XY} and q^{XY′}, and the
form (4.213) of M.
Exercise 4.6.6. Prove the relation (4.219). Hint: take in (4.218) Y′ = Y, and for each
y′, y ∈ [n] take r_{y′|y} = δ_{y′y} and D^{(y′,y)} = Π^{(y)}.
The relation (4.219) implies that without loss of generality we can assume that the
components of the vectors {pX X
y }y∈[n] and {qy ′ }y ′ ∈[n′ ] are arranged in non-increasing order.
We will therefore assume this order in the rest of this section.
Theorem 4.6.2. Let p^{XY} ∈ Prob(mn) and q^{XY′} ∈ Prob(mn′) with m := |X|,
n := |Y|, and n′ := |Y′|. Then, p^{XY} ≻X q^{XY′} if and only if there exists
R = (r_{y′|y}) ∈ STOCH(n′, n) such that
Σ_{y∈[n]} r_{y′|y} p^X_y ≻ q^X_{y′}   ∀ y′ ∈ [n′] , (4.220)
where {p^X_y}_{y∈[n]} and {q^X_{y′}}_{y′∈[n′]} are defined in (4.215) and (4.216), respectively.
Remarks:
1. We assume in the theorem above that for each y ∈ [n] we have p^X_y = p^{X↓}_y.
2. The vectors p^X_y and q^X_{y′} are not necessarily probability vectors since the sums of their
components, p_y := 1_m · p^X_y and q_{y′} := 1_m · q^X_{y′}, are in general smaller than one. Therefore,
the majorization relation in (4.220) implies in particular that
Σ_{y∈[n]} r_{y′|y} p_y = q_{y′}   ∀ y′ ∈ [n′] . (4.221)
where the inequality is entry-wise, and L is the m × m matrix defined in Exercise 4.1.5.
We will later demonstrate that the inequality (4.222) is instrumental in characterizing
conditional majorization as a semidefinite program.
Proof. Suppose p^{XY} ≻X q^{XY′} so that the relation (4.218) holds. Since each D^{(y,y′)} is doubly
stochastic we have p^X_y ≻ D^{(y,y′)} p^X_y. Multiplying both sides of this relation by r_{y′|y} and
summing over y ∈ [n] gives (see Exercise 4.1.6)
Σ_{y∈[n]} r_{y′|y} p^X_y ≻ Σ_{y∈[n]} r_{y′|y} D^{(y,y′)} p^X_y = q^X_{y′} , (4.223)
where the equality follows from (4.218).
Conversely, suppose (4.220) holds. Then, from Theorem 4.1.1, for every y′ ∈ [n′] there
exists a doubly stochastic matrix D^{(y′)} ∈ STOCH(m, m) such that
q^X_{y′} = D^{(y′)} Σ_{y∈[n]} r_{y′|y} p^X_y = Σ_{y∈[n]} r_{y′|y} D^{(y′)} p^X_y . (4.224)
Defining D^{(y,y′)} := D^{(y′)} we conclude that the above relation is a special case of the
relation (4.218). Hence, p^{XY} ≻X q^{XY′}. This completes the proof.
To get a better intuition about conditional majorization, we first consider the cases in
which one of the systems X, Y , and Y ′ , is trivial:
1. The Case |X| = 1. This is a trivial case in which there is no uncertainty about
system X. We therefore expect the pre-order to be trivial as well. Indeed, in this case
p^{XY} = p^Y ∈ Prob(n) and q^{XY′} = q^{Y′} ∈ Prob(n′), so the relation (4.220) becomes
q^{Y′} = R p^Y. Since for any p^Y ∈ Prob(n) and any q^{Y′} ∈ Prob(n′) there exists a
stochastic matrix R that satisfies q^{Y′} = R p^Y, we conclude that p^{XY} ∼X q^{XY′} for any
probability vectors p^{XY} and q^{XY′} with |X| = 1.
2. The Case |Y| = 1. In this case p^{XY} = p^X ∈ Prob(m), and the matrix R that
appears in Theorem 4.6.2 is a vector R = r := (r_1, . . . , r_{n′})^T ∈ Prob(n′). Therefore,
the relation (4.220) becomes r_{y′} p^X ≻ q^X_{y′} for all y′ ∈ [n′]. Moreover, denoting by
q_{y′} := 1_m · q^X_{y′} the sum of the components of q^X_{y′}, we get from (4.221) that r_{y′} = q_{y′} for
all y′ ∈ [n′]. We therefore conclude that for |Y| = 1,
p^X ≻X q^{XY′} ⟺ p^X ≻ q^X_{|y′}   ∀ y′ ∈ [n′] , (4.225)
where q^X_{|y′} := (1/q_{y′}) q^X_{y′} is the vector whose components are {q_{x|y′}}_{x∈[m]}.
3. The Case |Y′| = 1. In this case, q^{XY′} = q^X ∈ Prob(m), and since R in Theorem 4.6.2
has to be a 1 × n column stochastic matrix it must equal the row vector (1, . . . , 1).
Therefore, denoting by p^X := Σ_{y∈[n]} p^X_y we get from Theorem 4.6.2 that
p^{XY} ≻X q^X ⟺ p^X ≻ q^X . (4.226)
Exercise 4.6.7. Let p^{XY} ∈ Prob(mn) and q^{XY′} ∈ Prob(mn′). Show that if p^{XY} ≻X q^{XY′}
then p^X ≻ q^X.
2. Similarly, for any permutation/bijection π : [n] → [n] we get that (see Exercise 4.6.8)
p^{XY} ∼X Σ_{y∈[n]} p^X_{π(y)} ⊗ e^Y_y . (4.227)
p^{XY} ∼X (p^X_1 + p^X_2) ⊗ e^Y_1 + Σ_{y=3}^{n} p^X_y ⊗ e^Y_{y−1} (4.228)
Moreover, if there exists y ∈ [n−1] such that p1|y = p1|y+1 then, if necessary, we will exchange
the vectors pX X
y and py+1 so that p2|y ⩾ p2|y+1 . If the latter inequality is also an equality
we continue by induction until we get a k ∈ [m − 1] such that px|y = px|y+1 for all x ∈ [k]
but pk+1|y > pk+1|y+1 . Combining these observations with the exercise above we are ready to
define the standard form.
Standard Form
Let pXY ∈ Prob(mn) be as defined in (4.215). We say that pXY is given in the
standard form if the vectors {pX
y }y∈[n] satisfy the following three conditions:
1. For all y ∈ [n], p^X_y = p^{X↓}_y.
Exercise 4.6.10. Let p^{XY} ∈ Prob(mn) and q^{XY′} ∈ Prob(mn′) be two probability vectors given
in their standard form, and let L be the m × m matrix defined in Exercise 4.1.5. Use Theorem 4.6.2
to show that p^{XY} ≻X q^{XY′} if and only if there exists R ∈ STOCH(n′, n) such that
(L ⊗ R) p^{XY} ⩾ (L ⊗ I_{n′}) q^{XY′} , (4.230)
where the inequality is entrywise.
Based on the preceding discussion, particularly the three examples provided, we can
deduce that any pXY is, under conditional majorization, equivalent to its standard form.
Consequently, without loss of generality, we may always assume that pXY ∈ Prob(mn) is
presented in its standard form. We will now demonstrate that conditional majorization
between vectors in standard form indeed constitutes a partial order.
Theorem 4.6.3. Let p^{XY} ∈ Prob(mn) and q^{XY′} ∈ Prob(mn′) be two probability
vectors given in their standard form. Suppose further that p^{XY} ∼X q^{XY′}. Then,
p^{XY} = q^{XY′} (in particular, Y = Y′ and n = n′).
Proof. From Exercise 4.6.10, the relation p^{XY} ∼X q^{XY′} implies that there exist R ∈
STOCH(n′, n) and R′ ∈ STOCH(n, n′) such that
(L ⊗ R) p^{XY} ⩾ (L ⊗ I_{n′}) q^{XY′} and (L ⊗ R′) q^{XY′} ⩾ (L ⊗ I_n) p^{XY} . (4.231)
Denote S := R′R and S′ := RR′, and observe that the two relations above imply (see Exercise 4.6.11)
(L ⊗ S) p^{XY} ⩾ (L ⊗ I_n) p^{XY} and (L ⊗ S′) q^{XY′} ⩾ (L ⊗ I_{n′}) q^{XY′} . (4.232)
Denoting by s_{y|w} the (y, w)-component of the matrix S, we get that the first relation above is
equivalent to
Σ_{w∈[n]} s_{y|w} L p_w ⩾ L p_y   ∀ y ∈ [n] . (4.233)
On the other hand, observe that by taking the sum over y ∈ [n] on both sides of the equation
above we get an equality between the two sides. Therefore, all the n inequalities above must
be equalities! Multiplying both sides by the inverse of L gives
p_y = Σ_{w∈[n]} s_{y|w} p_w   ∀ y ∈ [n] . (4.234)
We now argue that sy|y = 1 for all y ∈ [n] (which is equivalent to sy|w = δyw and S = In ).
Otherwise, suppose by contradiction that there exists y ∈ [n] such that sy|y < 1. Without
loss of generality suppose that this y is n. We then get that
p_n = Σ_{w∈[n−1]} ( s_{n|w}/(1 − s_{n|n}) ) p_w . (4.236)
Denoting
t_{y|w} := s_{y|w} + s_{y|n} s_{n|w} / (1 − s_{n|n}) , (4.238)
we conclude that
p_y = Σ_{w∈[n−1]} t_{y|w} p_w   ∀ y ∈ [n − 1] . (4.239)
Observe that Σ_{y∈[n−1]} t_{y|w} = 1. Next, we rule out the case t_{y|w} = δ_{yw} for all y, w ∈ [n − 1].
Otherwise, this relation implies in particular that for all y, w ∈ [n − 1] with y ̸= w we
have sy|w = 0 and sy|n sn|w = 0. Now, recall that we assumed that sn|n < 1 so there exists
y ∈ [n − 1] such that sy|n > 0. For this choice of y ∈ [n − 1] the relation sy|n sn|w = 0 gives
s_{n|w} = 0 for all w ̸= y. Substituting this into (4.236) gives
p_n = ( s_{n|y}/(1 − s_{n|n}) ) p_y (4.240)
in contradiction with the third property of the standard form of the vector pXY . We therefore
conclude that there must exists y ∈ [n − 1] such that ty|y < 1. Observe that we started with
the relation (4.234) with the condition that there exists sy|y < 1 for some y ∈ [n], and we
reduced it to the relation (4.239) with the condition that there exists ty|y < 1 for some
y ∈ [n − 1]. Continuing by induction until we have only one term in the sum on the right-
hand side of (4.234) (or of (4.239)), we conclude that one of the vectors in {p^X_y}_{y∈[n]} is
proportional to another vector in the same set, in contradiction with the standard form of
p^{XY}. Therefore, the assumption that there exists y ∈ [n] such that s_{y|y} < 1 is incorrect,
and we conclude that S = I_n, or equivalently, R′R = I_n.
Moreover, following the same arguments as above we conclude that also S ′ := RR′ = In′ .
Combining this with R′ R = In we must have n′ = n and R′ = R−1 . However, the only
stochastic matrix whose inverse is also stochastic is a permutation matrix (that is, doubly
stochastic and orthogonal). We therefore conclude that the sets {p^X_y}_{y∈[n]} and {q^X_y}_{y∈[n]}
can only differ by a permutation; i.e., for all y ∈ [n], p^X_y = q^X_{π(y)} for some permutation
π : [n] → [n]. However, since {p^X_y}_{y∈[n]} and {q^X_y}_{y∈[n]} are ordered in the specific way given
in the second property of the standard form, we conclude that π(y) = y for all y ∈ [n]. This
completes the proof.
Exercise 4.6.11. Prove the relations in (4.232). Hint: Multiply both sides of the first
inequality in (4.231) by R′ and the second inequality by R.
Definition 4.6.4. A function f : ∪_{n,m∈N} Prob(mn) → R is said to be conditionally
Schur convex if for every p^{XY} ∈ Prob(mn) and q^{XY′} ∈ Prob(mn′)
p^{XY} ≻X q^{XY′} ⇒ f(p^{XY}) ⩾ f(q^{XY′}) . (4.241)
Observe that conditionally Schur convex functions reduce to Schur convex functions
when restricted to Prob(m) (i.e. n = 1). Conversely, in the theorem below we show that
every convex symmetric function on the set of probability vectors can be extended to a
conditionally Schur convex function. Recall that in Subsection 4.1.3 we established that such
symmetric convex functions are in particular Schur convex.
In the theorem below, for every convex symmetric function f : Prob(m) → R we define a
function H_f as in (4.242), where p_{|y} := (1/p_y) p^X_y is the probability vector whose components
are {p_{x|y}}_{x∈[m]}.
Theorem 4.6.4. Let f : ∪_{m∈N} Prob(m) → R be a symmetric convex function.
Then, the function H_f, as defined in (4.242), is conditionally Schur concave.
Proof. Recall that p^{XY} ≻X q^{XY′} if and only if there exists a stochastic matrix R ∈ STOCH(n′, n)
such that (4.220) holds. Rewriting (4.220) with p^X_y = p_y p_{|y} and q^X_{y′} = q_{y′} q_{|y′} we get that
Σ_{y∈[n]} ( r_{y′|y} p_y / q_{y′} ) p_{|y} ≻ q_{|y′}   ∀ y′ ∈ [n′] . (4.243)
where y ∈ [n], y′ ∈ [n′], a_y := p_{1y}, b_{y′} := q_{1y′}, and p_y := p_{1y} + p_{2y} and q_{y′} := q_{1y′} + q_{2y′} are
the sums of the components of p^X_y and q^X_{y′}, respectively. With these notations we get for all
y ∈ [n] and y′ ∈ [n′]
L p^X_y = (a_y, p_y)^T and L q^X_{y′} = (b_{y′}, q_{y′})^T , (4.248)
where L := [1 0; 1 1]. Moreover, since we assume that p^{XY} and q^{XY′} are given in their
standard form, we have (see (4.229))
a_1/p_1 ⩾ · · · ⩾ a_n/p_n and b_1/q_1 ⩾ · · · ⩾ b_{n′}/q_{n′} . (4.249)
From Theorem 4.6.2 we have that p^{XY} ≻X q^{XY′} if and only if there exists a stochastic
matrix R ∈ STOCH(n′, n) such that for all y′ ∈ [n′] we have
b_{y′} ⩽ Σ_{y∈[n]} r_{y′|y} a_y and q_{y′} ⩽ Σ_{y∈[n]} r_{y′|y} p_y . (4.250)
Observe that since Σ_{y′∈[n′]} q_{y′} = Σ_{y∈[n]} p_y = 1, the second inequality above must hold with
equality (in fact, we already know this from (4.221)).
Let a, b, p, q be the vectors whose components are respectively {a_y}_{y∈[n]}, {b_{y′}}_{y′∈[n′]},
{p_y}_{y∈[n]}, and {q_{y′}}_{y′∈[n′]}. To streamline our analysis, we omit the superscripts Y and
Y′ when referring to the vectors a, b, p, q. It is important for the reader to bear in mind
that the vectors a := a^Y and p := p^Y correspond to a system of dimension n (referred to
as system Y), while b := b^{Y′} and q := q^{Y′} pertain to a system of dimension n′ (referred to
as system Y′). With these notations we get that p^{XY} ≻X q^{XY′} if and only if there exists
R ∈ STOCH(n′, n) such that
Ra ⩾ b and Rp = q . (4.251)
This relation is closely related to relative majorization; however, note that the vectors a
and b are not probability vectors since their components do not, in general, sum to one. We
therefore say in this case that the pair (a, p) relatively submajorizes the pair (b, q). Observe
also that if a := ∥a∥₁ equals b := ∥b∥₁ then the inequality Ra ⩾ b can be replaced with
Ra = b, so that (4.251) becomes equivalent to relative majorization; i.e.
( (1/a) a , p ) ≻ ( (1/b) b , q ) . (4.252)
We therefore assume from now on that a > b (the case a < b is not possible since Ra ⩾ b and R is
column stochastic).
Even though the components of a do not sum to one (in general), we can still define its
testing region as (see (4.139))
T(a, p) := { (a · t, p · t) : t ∈ [0, 1]^n } . (4.253)
By taking t = 1_n we get the point (a, 1) ∈ T(a, p), as opposed to the point (1, 1) that one
would get if a were a probability vector. In fact, the testing region of the pair of probability
vectors ((1/a) a, p) is almost identical to that of (a, p) except for a rescaling of the x-axis by a
factor of a; that is, (r, s) ∈ T((1/a) a, p) if and only if (ar, s) ∈ T(a, p). Therefore, if (r, s) is
an extreme point of T((1/a) a, p) then (ar, s) is an extreme point of T(a, p). That is, there are
n + 1 extreme points on the lower Lorenz curve of T(a, p), given by (0, 0) and the n points
{(µ_ℓ, ν_ℓ)}_{ℓ∈[n]}, where
µ_ℓ := Σ_{x∈[ℓ]} a_x and ν_ℓ := Σ_{x∈[ℓ]} p_x . (4.254)
Recall that since we assume that pXY is given in its standard form, the components of a
and p satisfy (4.249). See the red line in Fig. 4.8 for an example of the lower Lorenz curve
of the pair (a, p).
In Fig. 4.8 we also depict another (purple) Lorenz curve, taken to be identical
to the Lorenz curve of (a, p) when the x-coordinate is no greater than b, and a vertical line
where the x-coordinate equals b. This purple curve is the Lorenz curve of some pair of vectors
(ã, p̃) for which ∥ã∥₁ = b and p̃ is a probability vector. Moreover, the Lorenz curve of
(ã, p̃) has the property that any other Lorenz curve LC(b, q) (see the blue curve in Fig. 4.8)
that is nowhere below the Lorenz curve of (a, p) is also nowhere below the Lorenz curve of
(ã, p̃). We will see shortly that this implies that the relation p^{XY} ≻X q^{XY′} is equivalent to
(ã, p̃) ≻ (b, q).
The vectors ã and p̃ that correspond to the purple Lorenz curve of Fig. 4.8 can be
expressed as follows. Let k ∈ [n − 1] be the integer satisfying µ_k ⩽ b < µ_{k+1}.
Such an index k exists since we assume that a > b. The line connecting the vertices v_k :=
(µ_k, ν_k) and v_{k+1} := (µ_{k+1}, ν_{k+1}) contains the point (b, λ) (see Fig. 4.8), where
λ := (p_{k+1}/a_{k+1}) (b − µ_k) + ν_k . (4.256)
Figure 4.8: Submajorization. Given the Lorenz curve LC(a, p) (red), we can always construct
another Lorenz curve, LC(ã, p̃) (purple), with ∥ã∥₁ = b, such that any Lorenz curve LC(b, q)
(blue) that is nowhere below LC(a, p) (red) is also nowhere below LC(ã, p̃) (purple). In this
example n = 5 and k = 3.
The point (b, λ) is therefore a vertex of the purple curve in Fig. 4.8. With these notations, ã
and p̃ are given by
Theorem 4.6.5. Using the same notations as above, for |X| = 2 the following
statements are equivalent:
1. p^{XY} ≻X q^{XY′}.
2. The Lorenz curve LC(b, q) is nowhere below the Lorenz curve LC(a, p).
3. (ã, p̃) ≻ (b, q).
Proof. The case a = b is relatively simple and is left as an exercise. We therefore prove the
theorem for the case a > b. Suppose p^{XY} ≻X q^{XY′}. Then, there exists R ∈ STOCH(n′, n)
such that (4.251) holds. Let (b · t′, q · t′) ∈ LC(b, q) be a point on the lower Lorenz
curve of the testing region of (b, q), where t′ is some vector in [0, 1]^{n′}. Then, the vector
t := R^T t′ ∈ [0, 1]^n has the property that
a · t = a^T R^T t′ ⩾ b · t′ and p · t = p^T R^T t′ = q · t′ , (4.258)
where we used the relations in (4.251). The above relation implies that for any point (b · t′, q ·
t′) in LC(b, q), there exists a point (a · t, p · t) in the testing region of (a, p) that is located
to its right (i.e. a point with the same y-coordinate and no smaller x-coordinate). Since the
lower Lorenz curve is convex, this means that LC(b, q) is nowhere below LC(a, p). That
is, the second statement of the theorem holds.
To prove that the second statement implies the third statement of the theorem, observe
that by construction of LC(ã, p̃) (see Fig. 4.8), the curve LC(b, q) is nowhere below the
curve LC(ã, p̃). Since ∥ã∥1 = b we get from Theorem 4.3.4 that (ã, p̃) ≻ (b, q).
It is therefore left to prove that the third statement in the theorem implies the first one.
Since we assume that (ã, p̃) ≻ (b, q) there exists a matrix S ∈ STOCH(n′ , k + 2) such that
Sã = b and S p̃ = q. On the other hand, the matrix
T := [ I_k , 0_{k,n−k} ; 0_{2,k} , M ] ∈ STOCH(k + 2, n) , where M := [ (λ − ν_k)/p_{k+1} , 0 , · · · , 0 ; (ν_{k+1} − λ)/p_{k+1} , 1 , · · · , 1 ] (4.259)
The Case |Y | = 2
In this case, pXY has the form
pXY = p1 ⊗ e1 + p2 ⊗ e2 , (4.261)
aw p1 + bw p2 ≻ qw ∀ w ∈ [n′ ] . (4.262)
q w = aw p 1 + b w p 2 ∀ w ∈ [n′ ] , (4.263)
qw tw = aw p1 . (4.265)
Using the fact that a ∈ Prob(n′), by summing both sides of the equation above over w ∈ [n′]
we get that t must satisfy
Σ_{w∈[n′]} q_w t_w = p_1 . (4.266)
We therefore conclude that p^{XY} ≻X q^{XY′} if and only if there exists t ∈ [0, 1]^{n′} that satisfies (4.266)
and, for all w ∈ [n′], t_w p_{|1} + (1 − t_w) p_{|2} ≻ q_{|w}.
In order to determine when such a t ∈ [0, 1]^{n′} exists, we assume that p^{XY} is given in its
standard form so that both p_{|1} = p↓_{|1} and p_{|2} = p↓_{|2}. With this property, the majorization
relation given in (4.264) is equivalent to
Our next goal is to characterize the constraints that the equation above imposes on t_w. For this
purpose, we denote by I+ , I0 , and I− the set of all indices k ∈ [m] for which ∥p|1 ∥(k) −∥p|2 ∥(k)
is positive, zero, and negative, respectively. With these notations, if k ∈ I0 then (4.267)
takes the form ∥p|2 ∥(k) ⩾ ∥q|w ∥(k) . On the other hand, if k ∈ I+ we can isolate tw to get
Therefore, since this inequality holds for all k ∈ I+ and since tw ⩾ 0 we conclude that
tw ⩾ µw for all w ∈ [n′ ], where
Similarly, by isolating t_w in (4.267) for the cases k ∈ I₋ we get t_w ⩽ ν_w for all w ∈ [n′],
where
ν_w := min{ 1 , min_{k∈I₋} ( ∥p_{|2}∥_(k) − ∥q_{|w}∥_(k) ) / ( ∥p_{|2}∥_(k) − ∥p_{|1}∥_(k) ) } . (4.270)
Theorem 4.6.6. Using the same notations as above, for the case |Y| = 2 we have
p^{XY} ≻X q^{XY′} if and only if the following conditions hold:
Exercise 4.6.14. Simplify the conditions in Theorem 4.6.6 for the case that I+ = [m].
Exercise 4.6.15. Consider the case |X| = |Y | = |Y ′ | = 2 and let pXY , qXY ∈ Prob(4)
be such that pY = qY = u(2) . Simplify the necessary and sufficient conditions given in the
theorem above for this case.
where L is the m × m matrix defined in Exercise 4.1.5. Denote the rows of R by r_1, . . . , r_{n′} ∈ R^n_+,
i.e., r_{y′} := (r_{y′|1}, . . . , r_{y′|n})^T, and denote by r ∈ R^{nn′}_+ the vector obtained by stacking these
rows on top of each other,
r := (r_1 ; r_2 ; . . . ; r_{n′}) . (4.272)
Note that in this vector form of R, the entry-wise non-negativity of R is equivalent to r ⩾ 0,
while the remaining constraints become linear inequalities in r. Define (here P := [p^X_1 · · · p^X_n]
denotes the m × n matrix whose columns are the vectors p^X_y)
M := [ −LP  0    · · ·  0
        0   −LP  · · ·  0
        ⋮    ⋮    ⋱     ⋮
        0    0   · · ·  −LP
       I_n  I_n  · · ·  I_n ]  ∈ R^{(mn′+n)×nn′}   and   b := ( −Lq^X_1 ; −Lq^X_2 ; . . . ; −Lq^X_{n′} ; 1_n ) ∈ R^{mn′+n} . (4.273)
It is then straightforward to check that the inequalities given in (4.271) can be expressed
compactly as M r ⩽ b. The only other constraint is that r ∈ R^{nn′}_+. The problem of determining
whether such a vector r exists is known as a linear-programming feasibility problem, and
there are several algorithms that can be used to solve it.
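This feasibility problem can be handed directly to an off-the-shelf LP solver. The sketch below (assumptions: both p^{XY} and q^{XY′} are given in standard form, P is the m × n matrix whose columns are the vectors p^X_y, and SciPy's HiGHS backend is used; the function name is ours) checks whether M r ⩽ b, r ⩾ 0 is feasible, with M and b as in (4.273):

```python
import numpy as np
from scipy.optimize import linprog

def conditionally_majorizes(PX, QX):
    """PX: m x n array with columns p^X_y; QX: m x n' array with columns q^X_{y'}.
    Each column is assumed to be sorted non-increasingly (standard form)."""
    m, n = PX.shape
    _, n_prime = QX.shape
    L = np.tril(np.ones((m, m)))                         # partial-sum matrix of Exercise 4.1.5
    top = np.kron(np.eye(n_prime), -L @ PX)              # block-diagonal part: one -LP block per y'
    bottom = np.kron(np.ones((1, n_prime)), np.eye(n))   # [I_n ... I_n]: column sums of R at most 1
    M = np.vstack([top, bottom])
    b = np.concatenate([(-L @ QX).T.ravel(), np.ones(n)])
    res = linprog(np.zeros(n * n_prime), A_ub=M, b_ub=b,
                  bounds=(0, None), method="highs")
    return res.status == 0                                # 0 = a feasible r was found

# tiny sanity check: with |Y| = |Y'| = 1 the test reduces to ordinary majorization
PX = np.array([[0.5], [0.3], [0.2]])
QX = np.array([[0.4], [0.4], [0.2]])
print(conditionally_majorizes(PX, QX))                    # True
```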
Exercise 4.6.16 (Farkas Lemma). Show that there exists r ∈ R^{nn′}_+ satisfying M r ⩽ b if and
only if for every v ∈ R^{mn′+n}_+ that satisfies v^T M ⩾ 0 (entrywise) we have v · b ⩾ 0. Hint:
For the harder direction, use the hyperplane separation theorem (Theorem A.2).
Dual Characterization
In the discussion above we saw that the condition p^{XY} ≻X q^{XY′} is equivalent to the existence
of a vector r ∈ R^{nn′}_+ such that M r ⩽ b. Moreover, from the exercise above it follows that
such an r exists if and only if for every v ∈ R^{mn′+n}_+ that satisfies v^T M ⩾ 0 we have v · b ⩾ 0.
We now express this latter condition in terms of sub-linear functionals.
For this purpose, we express v as
v := (v_1 ; . . . ; v_{n′} ; t) , (4.274)
where v_1, . . . , v_{n′} ∈ R^m_+ and t ∈ R^n_+. From the definition of M in (4.273) we get that the
condition v^T M ⩾ 0 can be expressed as
Similarly, from the definition of b in (4.273) we get that the condition v · b ⩾ 0 is equivalent
to
Σ_{y∈[n]} t_y ⩾ Σ_{y′∈[n′]} v_{y′}^T L q^X_{y′} . (4.277)
where we took ty in (4.277) to be equal to its smallest possible value as given in (4.276).
Finally, for each w ∈ [n′] let s_w := L^T v_w and observe that the inequality above can be
written as
Σ_{y∈[n]} max_{w∈[n′]} s_w · p^X_y ⩾ Σ_{y′∈[n′]} s_{y′} · q^X_{y′} . (4.279)
Note that since each v_w ∈ R^m_+ we get that s_w ∈ R^m_+ and s_w = s_w↓ (see Exercise 4.1.5). Finally,
by dividing both sides of the inequality above by a sufficiently large number and absorbing
it into each s_w, we can assume without loss of generality that the matrix S := [s_1 · · · s_{n′}] ∈
STOCH⩽(m, n′) is sub-stochastic (i.e., the components of each column sum to a number
smaller than or equal to one). We therefore arrive at the following theorem.
Theorem 4.6.7. Let p^{XY} ∈ Prob(mn) and q^{XY′} ∈ Prob(mn′) be given in their
standard form. Then, p^{XY} ≻X q^{XY′} if and only if for every sub-stochastic matrix
S := [s_1 · · · s_{n′}] ∈ STOCH⩽(m, n′), whose columns satisfy s_w = s_w↓ for all w ∈ [n′], we
have
Σ_{y∈[n]} max_{w∈[n′]} s_w · p^X_y ⩾ Σ_{y′∈[n′]} s_{y′} · q^X_{y′} . (4.280)
Exercise 4.6.17. Consider the theorem above without the assumption that s_w = s_w↓ for all
w ∈ [n′] and without the assumption that p^{XY} and q^{XY′} are given in their standard form.
Show that p^{XY} ≻X q^{XY′} if and only if for every sub-stochastic matrix S := [s_1 · · · s_{n′}] ∈
STOCH⩽(m, n′)
Σ_{y∈[n]} max_{w∈[n′]} s_w↓ · p↓_y ⩾ Σ_{y′∈[n′]} s_{y′}↓ · q↓_{y′} , (4.281)
rationale for the definition of conditional majorization as introduced in the previous subsec-
tions.
In the beginning of this chapter we introduced the concept of majorization using games
of chance. We saw that two probability vectors, p, q ∈ Prob(n) satisfy p ≻ q if and only
if in all games of chance, a player has better odds to win the game with the p-dice rather
than with the q-dice. Similarly, we will see that our definition of conditional majorization as
given in Definition 4.6.3 can be characterized with games of chance that involve a correlated
source.
We can think of a correlated source XY as two dice that are glued to each other.
Rolling the two dice results in an outcome x for system X and a correlated outcome y
for system Y. As before, we denote by p^{XY} ∈ Prob(mn) the probability matrix whose
(x, y)-entry, p_{xy}, represents the probability that X = x and Y = y. It will be convenient
to denote by p_{x|y} := p_{xy}/p_y the conditional probability that X = x given that Y = y, where
p_y := Σ_{x∈[m]} p_{xy} for any y ∈ [n].
Consider now a gambling game with two such correlated dice, in which a player, say
Alice, has to provide k ⩽ m numbers as her guesses for the value of X. If Alice has access to
the value y of Y, then she will choose the k numbers that have the largest probability of occurring
relative to the conditional probability {p_{x|y}}_{x∈[m]}. Therefore, the maximal probability of
winning such a k-gambling game is given by
Σ_{y∈[n]} p_y Σ_{x∈[k]} p↓_{x|y} . (4.282)
That is, Alice chooses the k numbers that have the largest probability of occurring after she
learns the value Y = y, which occurs with probability p_y.
The example provided earlier is not the only kind of gambling game that Alice can engage
in with a correlated source, like the two-dice system. More expansively, we can envisage a
game where the host randomly determines the value of k according to a certain distribution.
In line with our aim to explore the widest range of scenarios in a gambling game with a
correlated source, we allow the player a degree of control in choosing which k-gambling
game will be played. This control is exercised through the player selecting a number w ∈ [ℓ]
and communicating it to the game host. Subsequently, the host decides the value of k
based on a distribution T := (t_{k|w}) ∈ R^{m×ℓ}_+, a detail known to the player. This distribution
adheres to the conditions t_{k|w} ⩾ 0 and Σ_{k∈[m]} t_{k|w} ⩽ 1 for all w ∈ [ℓ]. Notably, we
also accommodate the scenario where the set {t_{k|w}}_{k∈[m]} does not sum to one, reflecting the
possibility of no value of k occurring, in which case the player loses the game from the onset.
The procedural steps of such a T -gambling game are illustrated in Fig. 4.9.
Note that the set encompassing all T-gambling games includes all k-gambling games as
well. This is evident when we consider T = (t_{k|w}) with t_{k|w} = δ_{k k₀}, where k₀ is a specific
integer within [m]. In this scenario, the game essentially becomes a k₀-gambling game,
meaning the host selects k = k₀ regardless of w. In another example, where t_{k|w} = 1/m, the
host picks k from a uniform distribution, also independently of w.
Generally, for a given Y = y and a chosen w ∈ [ℓ], the optimal chance Alice has of winning is Σ_{k∈[m]} t_{k|w} Σ_{x∈[k]} p↓_{x|y}.
Figure 4.9: A T -gambling game with correlated source pXY . Upon learning the value of Y , the
player provides the host a number w. Then, the host chooses k (at random) according to the
distribution {tk|w }k∈[m] . After that, the player provides her k guesses with the highest probability
to occur.
Consequently, for each Y = y, Alice will select the number w that maximizes this probability.
Therefore, the maximum likelihood of winning a T -gambling game, as outlined above, is given
by:
Pr_T(p^{XY}) = Σ_{y∈[n]} p_y max_{w∈[ℓ]} Σ_{k∈[m]} t_{k|w} Σ_{x∈[k]} p↓_{x|y} . (4.284)
The expression above for the winning probability, sometimes referred to as the reward
function in game theory, can be simplified. Consider the following change in the order of
summation:
Σ_{k∈[m]} Σ_{x∈[k]} t_{k|w} p↓_{x|y} = Σ_{x∈[m]} Σ_{k=x}^{m} t_{k|w} p↓_{x|y} . (4.287)
Let us introduce the matrix S = (s_{xw}) ∈ R^{m×ℓ}_+, whose coefficients are defined by
s_{xw} := Σ_{k=x}^{m} t_{k|w} . (4.288)
It is important to note that the columns of S are in non-increasing order; that is, for every
w ∈ [ℓ],
1 ⩾ s_{1w} ⩾ s_{2w} ⩾ · · · ⩾ s_{mw} . (4.289)
With this notation, the probability of winning can be expressed as
Pr_T(p^{XY}) = Σ_{y∈[n]} p_y max_{w∈[ℓ]} Σ_{x∈[m]} s_{xw} p↓_{x|y} . (4.290)
Equivalently, since p_y p↓_{x|y} is the x-th largest entry of the y-th column of the probability matrix, we can write
Pr_T(p^{XY}) = Σ_{y∈[n]} max_{w∈[ℓ]} s_w · p↓_y , (4.291)
where p_y ∈ R^m_+ is the vector with components {p_{xy}}_{x∈[m]}. Observe that
the formula above for calculating the winning probability coincides with the left-hand side
of (4.280). This observation provides our initial clue about the connection between games
of chance and conditional majorization. Another key insight is that conditional mixing
operations cannot increase the maximal probability of winning the game, as the following
lemma demonstrates.
Lemma 4.6.1. Let T ∈ STOCH⩽(m, ℓ), p^{XY} ∈ Prob(mn), and q^{XY′} ∈ Prob(mn′).
If p^{XY} ≻X q^{XY′} then the maximal probability of winning a T-gambling game satisfies
Pr_T(p^{XY}) ⩾ Pr_T(q^{XY′}) . (4.292)
Proof. Let S be the m × ℓ matrix whose components are defined in (4.288). Due to (4.289), the
columns of S = [s_1 · · · s_ℓ] satisfy s_w↓ = s_w for all w ∈ [ℓ]. According to (4.291), the function
Pr_T(p^{XY}) has the form (4.242) with f : Prob(m) → R being the sublinear functional
f(p) := max_{w∈[ℓ]} s_w · p↓   ∀ p ∈ Prob(m) . (4.293)
Since the function above is convex and symmetric (under permutations), we get from Theorem 4.6.4
that Pr_T(p^{XY}) is conditionally Schur concave and in particular Pr_T(p^{XY}) ⩾
Pr_T(q^{XY′}).
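The reward function (4.284)/(4.290) is straightforward to evaluate numerically. The sketch below (a hypothetical helper, not from the text; it assumes the columns of T are sub-normalized distributions over k) computes Pr_T(p^{XY}) for a given joint probability matrix, and can be combined with the CMO construction sketched earlier to test Lemma 4.6.1 on random instances:

```python
import numpy as np

def gambling_value(PXY, T):
    """PXY: m x n array of joint probabilities p_{xy}; T: m x l array whose w-th column
    is the (sub-normalized) distribution {t_{k|w}}_{k in [m]}.  Returns Pr_T(p^{XY})."""
    m, n = PXY.shape
    # s_{xw} = sum_{k >= x} t_{k|w}, cf. (4.288)
    S = np.flip(np.cumsum(np.flip(T, axis=0), axis=0), axis=0)
    total = 0.0
    for y in range(n):
        p_y_sorted = np.sort(PXY[:, y])[::-1]   # sorted joint column, cf. (4.291)
        total += np.max(S.T @ p_y_sorted)       # best announcement w for this value of y
    return total

# example: the 1-gambling game (guess the single most likely x) corresponds to t_{1|1} = 1,
# and the value is then sum_y max_x p_{xy}
PXY = np.array([[0.3, 0.1],
                [0.1, 0.3],
                [0.1, 0.1]])
T = np.zeros((3, 1)); T[0, 0] = 1.0
print(gambling_value(PXY, T))                   # 0.6
```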
The subsequent exercise establishes a one-to-one correspondence (bijection) between the
set of all m × ℓ T -matrices and all m × ℓ S-matrices.
Exercise 4.6.18. Use (4.288) to find an m × m matrix U such that S = U T. Show that U
is invertible by computing its inverse, and use that to show that for any matrix S ∈ R^{m×ℓ}_+
whose components satisfy (4.289), the matrix U⁻¹S has non-negative entries.
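For Exercise 4.6.18, a natural candidate (our guess at the intended solution, stated here only as an illustration) is the upper-triangular all-ones matrix, whose inverse is bidiagonal; the following sketch verifies the relevant claims numerically:

```python
import numpy as np

m, l = 4, 3
U = np.triu(np.ones((m, m)))                     # u_{xk} = 1 for k >= x, so (U T)_{xw} = s_{xw}
U_inv = np.eye(m) - np.diag(np.ones(m - 1), 1)   # +1 on the diagonal, -1 on the superdiagonal

rng = np.random.default_rng(1)
T = rng.dirichlet(np.ones(m), size=l).T          # random column-stochastic T (m x l)
S = U @ T

print(np.allclose(U @ U_inv, np.eye(m)))         # U is invertible
print(np.all(np.diff(S, axis=0) <= 1e-12))       # columns of S are non-increasing, cf. (4.289)
print(np.all(U_inv @ S >= -1e-12))               # recovering T = U^{-1} S gives non-negative entries
```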
Remark. We emphasize that the theorem above states that p^{XY} ≻X q^{XY′} if and only if
with the p^{XY}-dice pair Alice has better odds of winning all T-gambling games than with the
q^{XY′}-dice pair. Moreover, observe that instead of considering T-gambling games with T ∈
STOCH⩽(m, ℓ) over all ℓ ∈ N, it is sufficient to consider ℓ = n′. That is, the dimensions of
T are completely determined by X and Y′.
Proof. Due to Lemma 4.6.1, it is sufficient to prove that (4.294) implies p^{XY} ≻X q^{XY′}. Let
S := [s_1 · · · s_{n′}] ∈ STOCH⩽(m, n′) be a sub-stochastic matrix whose columns satisfy s_w = s_w↓
for all w ∈ [n′]. From Exercise 4.6.18 it follows that there exists a sub-stochastic matrix
T ∈ STOCH⩽(m, n′) that satisfies the relation (4.288). Therefore,
Σ_{y∈[n]} max_{w∈[n′]} s_w · p_y = Pr_T(p^{XY}) ⩾ Pr_T(q^{XY′}) = Σ_{y′∈[n′]} max_{w∈[n′]} s_w · q_{y′} ⩾ Σ_{y′∈[n′]} s_{y′} · q_{y′} , (4.295)
where the first inequality follows from (4.294). Since the above inequality holds for all
S := [s_1 · · · s_{n′}] ∈ STOCH⩽(m, n′) whose columns satisfy s_w = s_w↓ for all w ∈ [n′], we conclude
from Theorem 4.6.7 that p^{XY} ≻X q^{XY′}. This completes the proof.
This chapter explores methods to quantify the distinguishability between entities such as
probability distributions and quantum states. Unlike generic vectors, mathematical objects
like probability vectors and quantum states embody information about physical systems.
Consequently, their distinguishability is typically measured using functions attuned to this
inherent information. Consider this example: Alice possesses a system in her laboratory
that is either in state ρ (for instance, an electron with its spin oriented in the z-direction)
or in state σ (such as the same electron with spin in the x-direction). Alice can attempt to
discern the state of her system (whether it is ρ or σ) by performing a quantum measurement
on it. The underlying principle is that the greater the distinguishability between ρ and σ,
the easier (or more likely) it is for Alice to accurately identify which of the two states her
system is in.
In any task involving distinguishability, such as the one mentioned earlier, a key observa-
tion is that sending a system (like the electron in Alice’s lab) through a quantum communi-
cation channel does not enhance Alice’s ability to differentiate between two states, ρ and σ.
This implies that if E ∈ CPTP(A → B) represents a quantum channel, the states E(ρ) and
E(σ) that result from this channel are less distinguishable than the original states ρ and σ
(this concept is visually illustrated in Fig. 5.1). In essence, any measure that quantifies the
distinguishability between two quantum states ρ and σ must decrease (or at most stay the
same) under any quantum process that transforms the pair (ρ, σ) into (E(ρ), E(σ)). Func-
tions that adhere to this principle are known as quantum divergences. Their characteristic
of reducing in value under such transformations is often referred to as the data processing
inequality (DPI).
Quantum divergence extends the concept of divergences from classical to quantum realms.
In a classical context, divergences are functions that behave monotonically under transfor-
mations that map a pair of probability vectors (p, q) to (Ep, Eq), with E being a column
stochastic matrix. As a result, many metrics in Rn , like the Euclidean distance, do not serve
well for quantifying distinguishability between two probability vectors. It’s also notewor-
thy that divergences are functions that behave monotonically under relative majorization.
Therefore, the tools developed in Chapter 4 will be very useful in this context as well.
Classical Divergence
Definition 5.1.1. The function D, as defined in (5.1), is termed a divergence
provided it fulfills these two conditions:
1. Data processing inequality (DPI): D(p∥q) ⩾ D(Ep∥Eq) for all p, q ∈ Prob(n), all m, n ∈ N, and all E ∈ STOCH(m, n).
2. Normalization: D(1∥1) = 0.
Note that for the trivial dimension n = 1, Prob(n) contains only the number one. In
this dimension, we require the divergence to be zero. Functions as above that satisfy the
DPI but with D(1∥1) ̸= 0 will be called unnormalized divergences. Moreover, observe that
the DPI property of a divergence D can also be viewed as monotonicity under relative
majorization. That is, we can state the first property above as follows: For all p, q ∈ Prob(n)
and p′, q′ ∈ Prob(m) such that (p, q) ≻ (p′, q′) we have D(p∥q) ⩾ D(p′∥q′).
Since we assume t ⩾ 1 we have Dt (p∥p) = 0 for all p ∈ Prob(n). To show that Dt satisfies
the DPI we can use the relation (r)₊ = (|r| + r)/2 for all r ∈ R to express Dt as
Dt(p∥q) = (1/2) ( ∥p − tq∥₁ + 1 − t ) . (5.9)
Consequently, the property outlined in (2.8) implies that {Dt }t⩾1 is a family of divergences.
Moreover, from Theorem 4.3.4 we learn that this family of classical divergences can be used
to characterize relative majorization; i.e., for all p, q ∈ Prob(n) and p′ , q′ ∈ Prob(n′ ) we
have (see Exercise 5.1.1)
(p, q) ≻ (p′ , q′ ) ⇐⇒ Dt (p∥q) ⩾ Dt (p′ ∥q′ ) ∀t⩾1. (5.10)
Exercise 5.1.1. Prove (5.10). Hint: Use Theorem 4.3.4 in conjunction with Corollary 4.3.1.
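The divergences Dt are easy to compute from (5.9), so the characterization (5.10) can be probed numerically. The sketch below samples only finitely many values of t, so it is an illustration rather than a proof, and the example vectors are hypothetical:

```python
import numpy as np

def D_t(p, q, t):
    # D_t(p||q) = (||p - t q||_1 + 1 - t) / 2, cf. (5.9)
    return 0.5 * (np.abs(np.asarray(p, float) - t * np.asarray(q, float)).sum() + 1.0 - t)

# with q = q' uniform, (p, q) relatively majorizes (pp, qq) iff p majorizes pp
p  = np.array([0.5, 0.3, 0.2]);  q  = np.ones(3) / 3
pp = np.array([0.4, 0.4, 0.2]);  qq = np.ones(3) / 3
ts = np.linspace(1.0, 10.0, 200)
print(all(D_t(p, q, t) >= D_t(pp, qq, t) - 1e-12 for t in ts))   # expected: True
```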
is finite. Therefore, for some convex functions as above, we can have Df (p∥q) = ∞ for some
choices of p, q ∈ Prob(n). From the theorem below it will follow that Df is a divergence and
therefore is always non-negative (even if f (x) is negative for some x ∈ (0, ∞)). Furthermore,
observe that the f -divergence can be expressed for any p, q ∈ Prob(n) as
Df(p∥q) = f̃(0) Σ_{x∉supp(q)} p_x + Σ_{x∈supp(q)} q_x f(p_x/q_x) , (5.14)
where we split the sum in (5.12) into a sum over all x ∈ [n] with qx = 0 and over all x ∈ [n]
with qx ̸= 0.
Exercise 5.1.3. Show that for every t ⩾ 1 the function Dt as defined in (5.8) is an f -
divergence.
Exercise 5.1.4. Show that the definition above for the f -divergence is equivalent to the fol-
lowing definition. Let f : (0, ∞) → R be a convex function and define f (0) := limε→0+ f (ε).
Then, the f -Divergence is defined as in (5.12) for p, q ∈ Prob(n) with q > 0, and for q ̸> 0,
D_f(p∥q) := lim_{ε→0⁺} D_f( p ∥ (1 − ε)q + εu ) ,    (5.15)
where u ∈ Prob(n) is the uniform probability vector.
Exercise 5.1.5. Let f : (0, ∞) → R be a convex function that satisfies f(1) = 0, and let f̃(r) := r f(1/r). Show that f̃ is also convex with f̃(1) = 0, and prove that D_{f̃}(q∥p) = D_f(p∥q) for all p, q ∈ Prob(n).
Proof. The normalization condition D_f(1∥1) = 0 is directly derived from the requirement that f(1) = 0. To illustrate the data processing inequality, consider m, n ∈ N, a stochastic matrix E ∈ STOCH(m, n), and probability vectors p, q ∈ Prob(n). Define r := Ep and s := Eq. For each x ∈ [m] and y ∈ [n], let e_{x|y} represent the (x, y)-component of E. With these definitions, the x-components of r and s are respectively r_x = ∑_{y∈[n]} e_{x|y} p_y and s_x = ∑_{y∈[n]} e_{x|y} q_y. Assuming initially that q > 0, we find that if s_x = 0, then e_{x|y} = 0 for all y ∈ [n]. Consequently, if s_x = 0, it follows that r_x must also be 0. This leads to the conclusion that
D_f(Ep∥Eq) = D_f(r∥s) = ∑_{x∈supp(s)} s_x f(r_x/s_x) ,    (5.17)
where the summation is limited to all x ∈ [m] for which s_x ≠ 0. If s_x = 0, then r_x is also 0, which contributes 0·f(0/0) := 0 to the sum in (5.12), with (r, s) replacing (p, q).
The strategy of the proof involves representing r_x/s_x, as seen on the right-hand side of (5.17), as a convex combination of the ratios p_y/q_y, y ∈ [n]. This is achieved by defining, for each x ∈ supp(s),
t_{y|x} := e_{x|y} q_y / ∑_{y′∈[n]} e_{x|y′} q_{y′} = e_{x|y} q_y / s_x .    (5.18)
It is important to note that for every x ∈ supp(s), the set {t_{y|x}}_{y∈[n]} forms a probability vector, and for all x ∈ supp(s), it holds that
r_x/s_x = ∑_{y∈[n]} t_{y|x} (p_y/q_y) .    (5.19)
Combining this with the convexity of f gives
D_f(r∥s) = ∑_{x∈supp(s)} s_x f( ∑_{y∈[n]} t_{y|x} p_y/q_y )
         ⩽ ∑_{x∈supp(s)} ∑_{y∈[n]} s_x t_{y|x} f(p_y/q_y)
         = ∑_{y∈[n]} q_y f(p_y/q_y) ∑_{x∈supp(s)} e_{x|y} = D_f(p∥q) ,
where the equality ∑_{x∈supp(s)} e_{x|y} = 1 is valid because e_{x|y} = 0 if s_x = 0.
For the general case, in which q may have zero components, define q_ε := (1 − ε)q + εu. We then get that q_ε > 0, so that for any ε > 0
D_f(Ep∥Eq_ε) ⩽ D_f(p∥q_ε) .    (5.21)
Taking the limit ε → 0⁺ on both sides of the inequality above and using the continuity of D_f(p∥q) in q (see Exercise 5.1.6) completes the proof.
Exercise 5.1.6 (Continuity of D_f(p∥q) in q). In the last part of the proof above we used the fact that the f-divergence is continuous in q. Show that in general, if {q_k}_{k∈N} is a sequence of probability vectors in Prob(n) that satisfies q_k → q as k → ∞, then lim_{k→∞} D_f(p∥q_k) = D_f(p∥q).
Exercise 5.1.7. Prove the corollary above. Hint: Use (5.10), Exercise 5.1.3, and Theo-
rem 5.1.1 to prove the above corollary.
5.1.3 Examples
In this subsection we give several examples of f -divergences that play important role in
applications.
Kullback–Leibler divergence
The Kullback–Leibler divergence (also known as the KL-divergence or the relative entropy) is
perhaps the most well known divergence which appears in numerous applications in statistics,
information theory, and as we will see in resource theories. For this reason, it is the only
divergence that we will denote simply by D without any subscript. It is the f -divergence
that corresponds to the function f (r) = r log r. For this choice, we get,
D(p∥q) = { ∑_{x∈[n]} p_x (log p_x − log q_x)   if p ≪ q ;   ∞   otherwise }    (5.24)
where p ≪ q denotes supp(p) ⊆ supp(q), and we use the convention 0 log 0 = 0. In the next
chapters we will study the many properties of this divergence.
The trace distance, also known as the total variation distance (sometimes also called sta-
tistical distance), is an f -divergence with f (r) = 21 |r − 1|. For this convex function we
get
D_f(p∥q) = ∑_{x∈[n]} q_x · (1/2)| p_x/q_x − 1 | = (1/2) ∑_{x∈[n]} |p_x − q_x| = (1/2) ∥p − q∥₁ .    (5.25)
This f -divergence, which also functions as a metric, will be examined in detail in the subse-
quent sections.
The Hellinger distance is defined as the square root of the above expression:
H(p, q) := ( (1/2) ∑_{x∈[n]} (√p_x − √q_x)² )^{1/2} .    (5.27)
We will see later on that the above divergence is also a metric that is closely related to a
quantity known as the fidelity.
The α-Divergence
The α-divergence is the f-divergence corresponding to f_α(r) = (r^α − r)/(α(α − 1)), where α ∈ [0, ∞), the case α = 1 is defined by the limit lim_{α→1} (r^α − r)/(α(α − 1)) = r ln(r) (which yields the KL-divergence), and similarly the case α = 0 is given by −ln(r). For this choice of f we get
D_{f_α}(p∥q) = 1/(α(α − 1)) ∑_{x∈[n]} q_x ( (p_x/q_x)^α − p_x/q_x ) = 1/(α(α − 1)) ( ∑_{x∈[n]} p_x^α q_x^{1−α} − 1 ) .    (5.28)
The α-divergence can be expressed as a function of the Rényi divergences that we will study in Chapter 6.
Exercise 5.1.8. Show that all the functions f above are convex and satisfy f (1) = 0.
Exercise 5.1.9 (The Jensen–Shannon Divergence). Let f : (0, ∞) → R be given by
f(r) = (r + 1) log( 2/(r + 1) ) + r log r    ∀ r ∈ (0, ∞) .    (5.29)
Show that f is convex with f(1) = 0 and compute its f-divergence.
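The examples above can all be computed from the single expression (5.14). The following is a small numerical sketch (not part of the text), assuming NumPy; the helper name f_divergence and the base-2 logarithm are illustrative assumptions.

```python
import numpy as np

def f_divergence(p, q, f, f_tilde_0=np.inf):
    # Eq. (5.14): terms with q_x = 0 contribute f~(0) * p_x, terms with q_x > 0 contribute q_x f(p_x/q_x)
    total = 0.0
    for px, qx in zip(p, q):
        if qx > 0:
            total += qx * f(px / qx)
        elif px > 0:
            total += f_tilde_0 * px
    return total

p = np.array([0.6, 0.3, 0.1])
q = np.array([0.5, 0.25, 0.25])

kl  = f_divergence(p, q, lambda r: r * np.log2(r) if r > 0 else 0.0)            # KL divergence (5.24)
tv  = f_divergence(p, q, lambda r: 0.5 * abs(r - 1), f_tilde_0=0.5)             # trace distance (5.25)
hel = np.sqrt(f_divergence(p, q, lambda r: 0.5 * (np.sqrt(r) - 1) ** 2, f_tilde_0=0.5))  # Hellinger (5.27)
alpha = 2.0
a_div = f_divergence(p, q, lambda r: (r ** alpha - r) / (alpha * (alpha - 1)))  # alpha-divergence (5.28)
```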
Exercise 5.1.10. Let D′ be the α-divergence for α = 2, and let pk and qk as in (5.31).
Prove Eq. (5.33).
We show now that by utilizing the data processing inequality, if a divergence is continuous
in one of its arguments it is necessarily continuous in the second argument as well.
Ev := (1 − ε)(v − q) + q′    where    ε := 1 − min_{x∈[n]} (q′_x / q_x) .    (5.34)
Ev = (1 − ε)v + q′ − (1 − ε)q ⩾ 0 .    (5.35)
The inequality above implies that E is indeed column stochastic. Moreover, by definition
∥p − Ep∥₁ = ∥ε(p − q) + q′ − q∥₁ ⩽ ε∥p − q∥₁ + ∥q − q′∥₁ .    (5.36)
Since D(p∥q) is continuous in p, the expression on the right-hand side of the equation above
vanish when q → q′ .
Subsequently, we establish a comparable upper bound for D(p∥q) − D(p∥q′) by introducing Ẽ ∈ STOCH(n, n) and ε̃ ∈ (0, 1). These are defined identically to how E and ε were defined, but with the roles of q and q′ reversed. Specifically, Ẽ is defined by its action on every v ∈ Prob(n) as
Ẽv := (1 − ε̃)(v − q′) + q    where    ε̃ := 1 − min_{x∈[n]} (q_x / q′_x) .    (5.38)
By definition, Ẽq′ = q and ε̃ ∈ (0, 1). Further, following similar steps as above, it can
be verified that Ẽ is indeed column stochastic, and Ẽp approaches p as q approaches q′ .
Utilizing the DPI we also have
D(p∥q) − D(p∥q′) ⩽ D(p∥q) − D(Ẽp∥Ẽq′) = D(p∥q) − D(Ẽp∥q) .    (5.39)
Therefore, as before, due to the continuity of D(p∥q) in p, the expression on the right-hand side of the equation above vanishes when q′ → q. Combining this with the lower bound in (5.37), we conclude that D(p∥q) is continuous in q.
A Measure of Nonuniformity
Definition 5.1.3. A function
g : ⋃_{n∈N} Prob(n) → R ∪ {∞}    (5.40)
In Chapter 16 we will study the resource theory of nonuniformity in which the functions
above quantify the resource of this theory. Specifically, these functions quantify how different
a probability vector p is from the uniform distribution u. As an indication of this, note that
if D is a classical divergence that is continuous in its first argument, then the function
g_D(p) := D( p ∥ u^(n) )    ∀ p ∈ Prob(n) ,    (5.41)
is a measure of non-uniformity.
Exercise 5.1.11. Verify that gD indeed satisfies all the three properties above. Hint: For
the third property show that
D(p∥u(n) ) = D p ⊗ u(k) u(nk)
(5.42)
as a consequence of the DPI applied twice for channels introducing and removing an inde-
pendent distribution u(k) .
Exercise 5.1.12. Let f : (0, ∞) → R be convex with f (1) = 0. Show that for the f -
divergence, Df , we have
g_{D_f}(p) = (1/n) ∑_{x∈[n]} f(n p_x)    ∀ p ∈ Prob(n) .    (5.43)
Verify by direct calculation that this expression satisfies the three properties of g.
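As a quick sanity check of (5.43), the following sketch (not part of the text, NumPy assumed, helper names illustrative) compares it with the direct evaluation of D_f(p∥u^(n)).

```python
import numpy as np

def g_Df(p, f):
    # Eq. (5.43): g_{D_f}(p) = (1/n) * sum_x f(n * p_x)
    n = len(p)
    return sum(f(n * px) for px in p) / n

f_kl = lambda r: r * np.log2(r) if r > 0 else 0.0   # f for the KL divergence
p = np.array([0.5, 0.25, 0.125, 0.125])
u = np.full(4, 0.25)
# direct evaluation of D_f(p || u) = sum_x u_x f(p_x / u_x)
direct = sum(ux * f_kl(px / ux) for px, ux in zip(p, u))
assert abs(g_Df(p, f_kl) - direct) < 1e-12
```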
In Theorem 4.3.2, we established that for every n ∈ N, p ∈ Prob(n), and q ∈ Prob>0 (n)∩
Qn , there is a vector r ∈ Prob(k) with the property that (p, q) ∼ (r, u(k) ). To elaborate, let
q = (k₁/k, …, k_n/k)^T, where each k_x ∈ N and k := k₁ + ⋯ + k_n. The vector r is then expressed as:
r := ⊕_{x∈[n]} p_x u^(k_x) .    (5.44)
Building upon this equivalency, the following theorem demonstrates a bijective relationship
between divergences and measures of non-uniformity. However, prior to this, it’s essential
to explore the uniqueness of the vector r above.
The vector r is not unique. To see why, let {m_x}_{x∈[n]} be a set of n integers satisfying
q_x = k_x/k = m_x/m    where    m := ∑_{x∈[n]} m_x .    (5.45)
Given any such set, we can define the probability vector s := ⊕_{x∈[n]} p_x u^(m_x) so that (p, q) ∼ (s, u^(m)). This demonstrates that r is not unique. However, since k m_x = m k_x, we observe that:
u^(m) ⊗ r = ⊕_{x∈[n]} p_x u^(m k_x) = ⊕_{x∈[n]} p_x u^(k m_x) = u^(k) ⊗ s .    (5.46)
The above relation highlights that for any measure of non-uniformity g, following the third
property of Definition 5.1.3, we obtain:
g(r) = g( u^(m) ⊗ r ) = g( u^(k) ⊗ s ) = g(s) ,    (5.47)
where the second equality follows from (5.46), and
where the last equality again utilizes the third property of Definition 5.1.3. Thus, in this
context, r and s have the same non-uniformity.
Remark. Observe that Dg in (5.48) is well defined since g(r) = g(s) for any other vector s
as defined above. We will also show in the proof below that the continuous extension for
general q ∈ Prob(n) is well defined.
Proof. We first show that Dg is a divergence on the restricted space in which q ∈ Prob>0 (n)∩
Qn . The normalization of Dg holds since Dg (1∥1) = g(1) = 0. To show the DPI, let
p ∈ Prob(n), q ∈ Prob>0 (n) ∩ Qn , and E ∈ STOCH(m, n) ∩ Qm×n be a stochastic matrix
(channel) with rational components. Let further k ∈ N be large enough such that we can
express
q = (k₁/k, …, k_n/k)^T ,    q′ := Eq = (k′₁/k, …, k′_m/k)^T ,    (5.49)
Exercise 5.1.13. Describe explicitly the bijection between divergences that are continuous
in their second argument and measures of non-uniformity. That is, for any g express the
corresponding D and vice versa.
1. Data processing inequality (DPI): D(ρ∥σ) ⩾ D(E(ρ)∥E(σ)) for all ρ, σ ∈ D(A) and every channel E ∈ CPTP(A → B).
2. Normalization: D(1∥1) = 0.
Remark. Note that a classical divergence can be viewed as a quantum divergence whose
domain is restricted to classical systems. The union in (5.51) is over all systems A and
particularly over all finite dimensions |A|. Therefore, the domain of D consists of pairs of
density matrices (ρ, σ) in any dimension |A| ∈ N. For the case of a trivial system A with
|A| = 1 the only density matrix in D(A) is the number one. In this case, divergences satisfies
D(1∥1) = 0.
Like classical divergences, quantum divergences are non-negative since for any pair of states ρ, σ ∈ D(A) we have the trace Tr ∈ CPTP(A → 1), so that
D(ρ∥σ) ⩾ D( Tr[ρ] ∥ Tr[σ] ) = D(1∥1) = 0 ,    (5.52)
where the inequality follows from the DPI. Moreover, recall that any state ρ ∈ D(A) with dimension |A| > 1 can be viewed as a preparation channel ρ^{1→A} ∈ CPTP(1 → A). Hence,
D(ρ∥ρ) = D( ρ^{1→A}(1) ∥ ρ^{1→A}(1) ) ⩽ D(1∥1) = 0 ,    (5.53)
where again we used the DPI for divergences. Combining this with the non-negativity of
divergences we conclude that for any state ρ ∈ D(A) in any dimension |A| ∈ N
D(ρ∥ρ) = 0 . (5.54)
This is consistent with the intuition that divergences quantify the distinguishability between
two states.
An interesting question remaining is whether the converse of the above property also
holds. A quantum divergence, D, is said to be faithful if for any ρ, σ ∈ D(A), the condition
D(ρ∥σ) = 0 implies ρ = σ. We will see later on that not all quantum divergences are faithful.
However, in the following lemma we show that a quantum divergence is faithful if and only
if its reduction to classical systems is faithful.
Lemma 5.2.1. Let D be a quantum divergence. Then, D is faithful if and only if its
reduction to classical (diagonal) states is faithful.
But since D is faithful on diagonal states we get the contradiction that ∆(ρ) = ∆(σ). Hence,
D is faithful also on quantum states.
Exercise 5.2.1. Let ρ, σ ∈ D(A). Show that if ρ ̸= σ then there exists a basis of A such
that the diagonal of ρ in this basis does not equal to the diagonal of σ in the same basis.
The data processing inequality implies that quantum divergences are invariant under isometries. That is, for any isometry channel V ∈ CPTP(A → B) we have
D( V(ρ) ∥ V(σ) ) = D(ρ∥σ)    ∀ ρ, σ ∈ D(A) .    (5.56)
To see why, recall that every isometry channel has a left inverse channel R ∈ CPTP(B → A) that satisfies R^{B→A} ∘ V^{A→B} = id^A (see Section 3.5.8). Hence, by definition of R,
D(ρ∥σ) = D( R ∘ V(ρ) ∥ R ∘ V(σ) ) ⩽ D( V(ρ) ∥ V(σ) ) ⩽ D(ρ∥σ) ,    (5.57)
where both inequalities follow from the DPI. That is, all the inequalities above must be equalities, so that (5.56) holds.
Exercise 5.2.2. Use the invariance property of quantum divergences under isometries to show that any classical divergence D satisfies D(p∥q) = D(p ⊕ 0_k ∥ q ⊕ 0_k) for all p, q ∈ Prob(n) and all k ∈ N; that is, classical divergences are invariant under embedding into higher dimensions.
1. Show that if there exists a channel F ∈ CPTP(B → A) such that F ◦ E(ρ) = ρ and
F ◦ E(σ) = σ then
D(ρ∥σ) = D E(ρ) E(σ) . (5.59)
sup(ρ/σ) := sup_{0⩽Λ⩽I^A} Tr[ρΛ]/Tr[σΛ] .    (5.61)
2. Show that
sup(ρ/σ) = inf{ λ ∈ R : λσ − ρ ⩾ 0 } .    (5.63)
Joint Convexity
We say that a quantum divergence D is jointly convex if for any quantum system A, m ∈ N,
p ∈ Prob(m), and two sets, {ρx }x∈[m] and {σx }x∈[m] of m density matrices in D(A) we have
D( ∑_{x∈[m]} p_x ρ_x ∥ ∑_{x∈[m]} p_x σ_x ) ⩽ ∑_{x∈[m]} p_x D(ρ_x∥σ_x) .    (5.64)
Although not every quantum divergence exhibits joint convexity, the combination of joint
convexity with both the property described in (5.60), and the invariance under isometries,
results in a condition that is more stringent than DPI.
Lemma 5.2.2. Let D be a function with the same domain and range as a quantum
divergence that is invariant under isometries. Suppose further that D is jointly
convex and satisfies (5.60) for any quantum systems A and B, and quantum states
ρ, σ ∈ D(A) and ω ∈ D(B). Then D satisfies the DPI.
Proof. Due to Stinespring dilation theorem, the invariance under isometries implies that it
is sufficient to prove that for any two bipartite states ρ, σ ∈ D(AB)
D ρAB σ AB ⩾ D ρA σ A .
(5.65)
D(ρ^A ∥ σ^A) = D( ρ^A ⊗ u^B ∥ σ^A ⊗ u^B )
             = D( R^{B→B}(ρ^{AB}) ∥ R^{B→B}(σ^{AB}) )
             ⩽ (1/n) ∑_{x∈[n]} D( U_x^{B→B}(ρ^{AB}) ∥ U_x^{B→B}(σ^{AB}) ) ,    (5.66)
where the inequality follows from joint convexity.
where {|ax ⟩}x∈[m] and {|by ⟩}y∈[m] are orthonormal bases consisting of the eigenvectors of ρ
and σ, respectively. Define the probability vectors p̃, q̃ ∈ Prob(m2 ) whose components are
given by
p̃xy := px |⟨ax |by ⟩|2 and q̃xy = qy |⟨ax |by ⟩|2 ∀ x, y ∈ [m] . (5.68)
Now, if D is a classical divergence then we can extend it to ρ, σ ∈ D(A) by
Dq (ρ∥σ) := D p̃ q̃ .
(5.69)
Clearly, Dq (ρ∥σ) is zero if ρ = σ, and in the exercise below you show that if ρ and σ are
diagonal then Dq (ρ∥σ) := D p q , where p and q are the diagonals of ρ and σ. In the
following lemma we show that Dq is invariant under isometries.
Lemma 5.2.3. Let D be a classical divergence and define Dq as in (5.69). Then, for any isometry channel V ∈ CPTP(A → B) and any ρ, σ ∈ D(A),
Dq( V(ρ) ∥ V(σ) ) = Dq(ρ∥σ) .
Proof. The non-zero components of p̃ and q̃ remain unchanged if ρ and σ are replaced with V(ρ) and V(σ), for any isometry V ∈ CPTP(A → B). Moreover, note that m = |A| since both {|a_x⟩}_{x∈[m]} and {|b_y⟩}_{y∈[m]} are bases of A. Hence, denoting n := |B| and by K the image of A under the isometry V, the additional n − m zero eigenvalues of V(ρ) (and similarly of V(σ)) correspond to eigenvectors that lie in the orthogonal complement of K. Hence, if p̃, q̃ correspond to ρ and σ as in (5.68), then p̃ ⊕ 0_k and q̃ ⊕ 0_k correspond to V(ρ) and V(σ), respectively, where 0_k is the zero vector in dimension k := n² − m². Hence,
Dq( V(ρ) ∥ V(σ) ) = D( p̃ ⊕ 0_k ∥ q̃ ⊕ 0_k ) = D(p̃∥q̃) = Dq(ρ∥σ) ,
where the second equality follows from the fact that classical divergences are invariant under embedding (see Exercise 5.2.2).
Exercise 5.2.5. Show that if ρ and σ are diagonal then Dq (ρ∥σ) := D p q , where p and
q are the diagonals of ρ and σ.
Due to Lemma 5.2.3, the Stinespring dilation implies that Dq(ρ∥σ) as defined above is a quantum divergence if and only if it is non-increasing under the partial trace. This latter property does not hold in general; however, it does hold when D belongs to a large class of f-divergences.
Quantum f -Divergence
Definition 5.2.2. Let f : (0, ∞) → R be an operator convex function satisfying
f (1) = 0. Let Df be its corresponding classical f -divergence as defined in
Definition 5.1.2. The quantum f-divergence, Dfq, is defined on any ρ, σ ∈ D(A) as Dfq(ρ∥σ) := Df(p̃∥q̃), with p̃ and q̃ as in (5.68).
Remark. We will see below that the requirement that f is operator convex (vs just convex)
ensures that Dfq is indeed a quantum divergence. Moreover, from (5.15), for any ρ, σ ∈ D(A)
with spectral decomposition as in (5.67) we have (see exercise below)
D_f(ρ∥σ) = lim_{ε→0⁺} ∑_{x,y} (q_y + ε) f( p_x/(q_y + ε) ) |⟨a_x|b_y⟩|² .    (5.74)
Exercise 5.2.6. Prove (5.74) and use it to show that for any ρ, σ ∈ D(A)
D_f(ρ∥σ) = ∑_{x∈supp(p), y∈supp(q)} q_y f(p_x/q_y) |⟨a_x|b_y⟩|² + f(0) Tr[(I − ρ⁰)σ] + f̃(0) Tr[(I − σ⁰)ρ] ,    (5.75)
where ρ⁰ and σ⁰ denote the projections onto the supports of ρ and σ, respectively.
Quantum Formula
Theorem 5.2.1. Let f : (0, ∞) → R be an operator convex function satisfying f(1) = 0. For any ρ, σ ∈ D(A) with σ > 0 the quantum f-divergence can be expressed as
D_f(ρ∥σ) = Tr[ ϕ_σ^{AÃ} f( (σ^A)^{−1} ⊗ (ρ^Ã)^T ) ] ,    (5.76)
where |ϕ_σ^{AÃ}⟩ := (σ^{1/2} ⊗ I^Ã)|Ω^{AÃ}⟩ is a purification of σ. For σ ⩾ 0 the f-divergence satisfies (with u ∈ D(A) the maximally mixed state)
D_f(ρ∥σ) = lim_{ε→0⁺} D_f( ρ ∥ (1 − ε)σ + εu ) .    (5.77)
Proof. Suppose first that σ > 0. Then,
D_f(ρ∥σ) := D_f(p̃∥q̃) = ∑_{x,y∈[m]} q̃_{xy} f( p̃_{xy}/q̃_{xy} ) = ∑_{x,y∈[m]} q_y |⟨a_x|b_y⟩|² f( p_x/q_y ) .    (5.78)
Now, for every x, y ∈ [m] we can express q_y|⟨a_x|b_y⟩|² as follows:
q_y |⟨a_x|b_y⟩|² = ⟨Ω^{AÃ}| q_y|b_y⟩⟨b_y| ⊗ (|a_x⟩⟨a_x|)^T |Ω^{AÃ}⟩
                = ⟨ϕ_σ^{AÃ}| |b_y⟩⟨b_y| ⊗ (|a_x⟩⟨a_x|)^T |ϕ_σ^{AÃ}⟩ ,    (5.79)
where we used q_y|b_y⟩⟨b_y| = σ^{1/2}|b_y⟩⟨b_y|σ^{1/2}.
The case σ ⩾ 0 follows directly from Exercise 5.1.4 and is left as an exercise.
We now demonstrate that the expression for the f -divergence, as outlined in the preceding
theorem, satisfies the data processing inequality when f is operator convex.
From the quantum formula in (5.76) we see that the left hand side of (5.81) depends on
(σ A )−1 ⊗ (ρà )T whereas the right hand side depends on (σ AB )−1 ⊗ (ρÃB̃ )T . In the exercise
below you will show that there exists an isometry that relates between these two expressions.
Explicitly, you will show that there exists an isometry V : AÃ → AÃB B̃ (with V ∗ V = I AÃ )
such that
V^* ( (σ^{AB})^{−1} ⊗ (ρ^{ÃB̃})^T ) V = (σ^A)^{−1} ⊗ (ρ^Ã)^T .    (5.82)
Combining this with the operator Jensen's inequality (B.30) for operator convex functions we get
f( (σ^A)^{−1} ⊗ (ρ^Ã)^T ) = f( V^* ( (σ^{AB})^{−1} ⊗ (ρ^{ÃB̃})^T ) V )
                          ⩽ V^* f( (σ^{AB})^{−1} ⊗ (ρ^{ÃB̃})^T ) V ,    (5.83)
where the inequality is (B.30).
Finally, multiplying both sides by ϕ_σ^{AÃ} and taking the trace gives
D_f( ρ^A ∥ σ^A ) ⩽ Tr[ ϕ_σ^{AÃ} V^* f( (σ^{AB})^{−1} ⊗ (ρ^{ÃB̃})^T ) V ]
                = Tr[ ϕ_σ^{ABÃB̃} f( (σ^{AB})^{−1} ⊗ (ρ^{ÃB̃})^T ) ]    (by (5.86))
                = D_f( ρ^{AB} ∥ σ^{AB} ) .    (5.84)
3. Show that
V ( (σ^A)^{1/2} ⊗ I^Ã ) |Ω^{AÃ}⟩ = ( (σ^{AB})^{1/2} ⊗ I^{ÃB̃} ) |Ω^{ABÃB̃}⟩ ,    (5.86)
Examples:
1. The Umegaki Divergence. For the function f (r) = r log r
D_f(ρ∥σ) = lim_{ε→0⁺} ∑_{x,y} (q_y + ε) f( p_x/(q_y + ε) ) |⟨a_x|b_y⟩|²
         = lim_{ε→0⁺} ∑_{x,y} p_x log( p_x/(q_y + ε) ) |⟨a_x|b_y⟩|²
         = ∑_x p_x log p_x − ∑_{x,y} p_x |⟨a_x|b_y⟩|² log q_y
         = Tr[ρ log ρ] − Tr[ρ log σ] ,    (5.87)
where in the last line we used the relation ⟨b_y|ρ|b_y⟩ = ∑_x p_x |⟨a_x|b_y⟩|². Since f(r) =
r log r is operator convex, the above expression is a quantum divergence. It is known
as the Umegaki divergence or sometimes referred to as the relative entropy. We will
discuss many of its properties in the following chapters.
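The closed form Tr[ρ log ρ] − Tr[ρ log σ] in (5.87) is straightforward to evaluate numerically. The sketch below is not from the book; it assumes NumPy/SciPy, uses base-2 logarithms (an arbitrary choice of units), and the function names are illustrative.

```python
import numpy as np
from scipy.linalg import logm

def umegaki(rho, sigma):
    # D(rho||sigma) = Tr[rho log rho] - Tr[rho log sigma], in bits; supports assumed compatible
    return np.real(np.trace(rho @ logm(rho) - rho @ logm(sigma))) / np.log(2)

rng = np.random.default_rng(0)
def rand_state(d):
    # a random full-rank density matrix (so that logm is well defined)
    m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = m @ m.conj().T
    return rho / np.trace(rho)

rho, sigma = rand_state(2), rand_state(2)
assert umegaki(rho, sigma) >= -1e-10     # non-negativity of a divergence
assert abs(umegaki(rho, rho)) < 1e-10    # vanishes when the two states coincide
```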
2. The Trace Distance? The function f (r) = 21 |r − 1| is convex but it is not operator
convex (on any domain that includes 1). Therefore, we cannot conclude that for this
choice Dfq is a quantum divergence. Moreover, note that for this case
D_f(ρ∥σ) = lim_{ε→0⁺} ∑_{x,y} (q_y + ε) · (1/2)| p_x/(q_y + ε) − 1 | |⟨a_x|b_y⟩|²
         = (1/2) ∑_{x,y} | p_x − q_y | |⟨a_x|b_y⟩|² ,    (5.88)
Exercise 5.2.8. Use the quantum formula given in Theorem 5.2.1 to compute the Umegaki
divergence and the quantum α-divergence.
One may wonder whether such optimal quantum extensions exist. In the next theorem we prove that they do using the following construction.
Let D be a classical divergence, and for any ρ, σ ∈ D(A) define
D̲(ρ^A ∥ σ^A) := sup_X D( E^{A→X}(ρ^A) ∥ E^{A→X}(σ^A) ) ,    (5.91)
D̄(ρ∥σ) := inf_X { D(p^X ∥ q^X) : ρ^A = F^{X→A}(p^X), σ^A = F^{X→A}(q^X) } ,    (5.92)
where the optimizations are over the classical system X, the channels E ∈ CPTP(A → X) and F ∈ CPTP(X → A), as well as the probability distributions (diagonal density matrices) p, q ∈ D(X). Note that E is a POVM channel and therefore D(E(ρ)∥E(σ)) is well defined since E(ρ) and E(σ) are classical states; i.e., they can be viewed as probability vectors or diagonal density matrices. Similarly, p^X and q^X can be viewed either as diagonal density matrices or as probability vectors. Moreover, the supremum and infimum are taken over all dimensions |X| ∈ N.
Optimal Extensions
Theorem 5.3.1. Let D be a classical divergence, and let D̲ and D̄ be as in (5.91) and (5.92), respectively. Then, both D̲ and D̄ are quantum divergences that reduce to D on classical states. In addition, any other quantum divergence D′ that reduces to D on classical states satisfies (5.90).
Proof. We first prove the reduction property. Let ρ, σ ∈ D(A) be classical states. Then, for D̲ we can take X in (5.91) to be a classical system with |X| = |A| and E to be the identity channel. Since this identity channel is not necessarily the optimal channel, we get that
D̲(ρ∥σ) ⩾ D(ρ∥σ) .    (5.93)
Conversely, since ρ and σ are classical, any E in (5.91) can be assumed to be classical since
E(ρ) = E ∘ ∆(ρ)    and    E(σ) = E ∘ ∆(σ) ,    (5.94)
where ∆ is the completely dephasing channel. Therefore, if E is not classical we can replace it with E ∘ ∆, which is classical (recall that the output of E is classical). Now, by the DPI property of the classical divergence D we have, for all such classical E, D(E(ρ)∥E(σ)) ⩽ D(ρ∥σ). Hence, we must have
D̲(ρ∥σ) ⩽ D(ρ∥σ) .    (5.95)
Combining (5.93) with (5.95) we conclude that D̲(ρ∥σ) = D(ρ∥σ). Similarly, for D̄ we can assume that F in (5.92) is a classical channel since ρ and σ are classical. Hence, by the DPI of D we get the lower bound D̄(ρ∥σ) ⩾ D(ρ∥σ), and this bound can be saturated since we can take F in (5.92) to be the identity channel.
We next prove that D̲ and D̄ both satisfy the DPI. Let N ∈ CPTP(A → B). Then,
D̲( N(ρ) ∥ N(σ) ) = sup_X { D( E ∘ N(ρ) ∥ E ∘ N(σ) ) : E ∈ CPTP(B → X) }
                 ⩽ sup_X { D( E′(ρ) ∥ E′(σ) ) : E′ ∈ CPTP(A → X) }
                 = D̲(ρ∥σ) ,    (5.96)
where in the inequality we replaced E ∘ N with E′. For D̄ we have
D̄(ρ∥σ) := inf_X { D(p∥q) : ρ = F(p), σ = F(q), F ∈ CPTP(X → A) }
        ⩾ inf_X { D(p∥q) : N(ρ) = N ∘ F(p), N(σ) = N ∘ F(q), F ∈ CPTP(X → A) }
        ⩾ inf_X { D(p∥q) : N(ρ) = F′(p), N(σ) = F′(q), F′ ∈ CPTP(X → B) }
        = D̄( N(ρ) ∥ N(σ) ) ,    (5.97)
where the first inequality follows from the fact that if ρ = F(p) then necessarily N(ρ) = N ∘ F(p) (but the converse is not necessarily true), and in the second inequality we replaced N ∘ F with F′.
Finally, we prove the optimality of D̲ and D̄. First observe that from the DPI of D′ we have for any ρ, σ ∈ D(A) and any E ∈ CPTP(A → X)
D′(ρ∥σ) ⩾ D′( E(ρ) ∥ E(σ) ) = D( E(ρ) ∥ E(σ) ) ,
where the last equality follows from the fact that D′ reduces to D on classical states. Since the above inequality holds for all E ∈ CPTP(A → X) it also holds for the supremum over such E. We therefore conclude that D′(ρ∥σ) ⩾ D̲(ρ∥σ). For the second inequality, let ρ, σ ∈ D(A) and p, q ∈ D(X), and suppose there exists F ∈ CPTP(X → A) such that ρ = F(p) and σ = F(q). Then, from the DPI of D′ we get
D′(ρ∥σ) = D′( F(p) ∥ F(q) ) ⩽ D′(p∥q) = D(p∥q) ,
where the last equality follows from the fact that D′ reduces to D on classical states. Since the above inequality holds for all such p, q for which there exists an F that takes them to ρ and σ, it must also hold for the infimum over all such p, q. Hence, D′(ρ∥σ) ⩽ D̄(ρ∥σ).
Since the maximal and minimal extension provides upper and lower bounds on all exten-
sions, it can be useful to have a closed formula for them. Remarkably, a closed formula for
the maximal extension exists if one of the input states is pure, or for the f -Divergences if f
is operator convex. On the other hand, at the time of writing this book, a closed formula
for the minimal extension of the f -divergence is not known. However, for specific examples
such as the trace distance and fidelity, the minimal extension can be computed (see the
next section), and as we will see in Chapter 6 the regularized minimal extension can also be
computed for all known relative entropies.
Exercise 5.3.1. Let f : [0, ∞) → [0, ∞) be an operator convex function, and for all ρ, σ ∈ D(A) with σ > 0 define
D′_f(ρ∥σ) := Tr[ρ #_f σ] = Tr[ σ f( σ^{−1/2} ρ σ^{−1/2} ) ] ,    (5.100)
where #_f is the Kubo–Ando operator mean (see Definition B.5.1). Finally, let Df be the maximal f-divergence.
1. Show that Df′ reduces to the classical f -divergence when ρ and σ are classical (i.e.
diagonal in the same basis).
2. Show that Df′ satisfies the DPI in the domain D(A) × D>0 (A). Hint: Show that Df′
satisfies all the conditions of Lemma 5.2.2.
where n ∈ N, p, q ∈ Prob(n), and for each x ∈ [n], ωx ∈ D(A). Note that we replaced
F(|x⟩⟨x|) with ωx . The infimum above can include vectors p and q with zero components.
We now show that the number of zeros in each of these vectors can be restricted to be at
most one.
Proof. We first show that q can have this property. Since divergences are invariant under
(joint) permutation of the components of p and q, without loss of generality we can assume
that
q1 ⩾ · · · ⩾ qr > qr+1 = · · · = qn = 0 . (5.104)
where r is the number of non-zero components of q. With this order of q we have (see
Exercise 4.3.3) (p, q) ∼ (p′ , q), where
p′ = (p₁, …, p_r, p′_{r+1}, 0, …, 0)^T    where    p′_{r+1} := ∑_{x=r+1}^{n} p_x .    (5.105)
With this replacement,
∑_{x∈[n]} p_x ω_x = ∑_{x∈[r]} p_x ω_x + p′_{r+1} τ ,    (5.106)
where τ := (1/p′_{r+1}) ∑_{x=r+1}^{n} p_x ω_x. Therefore, the vectors
p̃ = (p₁, …, p_r, p′_{r+1})^T    and    q̃ = (q₁, …, q_r, 0)^T    (5.107)
satisfy both (5.103) with n replaced by r + 1, and D(p∥q) = D(p̃∥q̃). Repeating the same
arguments for p̃ completes the proof.
It’s important to note that the lemma mentioned above aids in simplifying the opti-
mization problem described in (5.102). This simplification is achieved by assuming, without
any loss of generality, that p and q have forms similar to p̃ and q̃ as specified in (5.107).
Consequently, we can redefine the infimum in (5.102) as an infimum over all 1 < n ∈ N,
p ∈ Prob(n), and 0 < q ∈ Prob(n − 1), provided there are n − 1 density matrices
{ωx }x∈[n−1] ⊂ D(A) meeting the following criteria:
ρ ⩾ ∑_{x∈[n−1]} p_x ω_x    and    σ = ∑_{x∈[n−1]} q_x ω_x ,    (5.108)
where it is understood that the inequality in the first relation is satisfied if and only if there
exists a density matrix ωn ∈ D(A) such that
ρ = ∑_{x∈[n−1]} p_x ω_x + p_n ω_n .    (5.109)
As we will see, in certain cases, working with the expression in (5.108) becomes more man-
ageable because q > 0. In other situations, it might be preferable to work with p > 0. It
is worth noting that by applying the same reasoning as above but substituting p for q and
vice versa, we can also express the infimum in (5.102) as an infimum over all 1 < n ∈ N,
0 < p ∈ Prob(n − 1) and q ∈ Prob(n), with the requirement of having n − 1 density matrices
{ωx }x∈[n−1] ⊂ D(A) that satisfy:
ρ = ∑_{x∈[n−1]} p_x ω_x    and    σ ⩾ ∑_{x∈[n−1]} q_x ω_x .    (5.110)
In the next theorem we employ this property to calculate the maximal divergence when one
of the input states is pure.
where
λ_max := max{ λ ∈ R : λψ ⩽ σ } .    (5.112)
Proof. Consider the relation (5.110) with the pure state ψ replacing ρ. Since ρ := ψ is a
pure state, the first relation in (5.110) can hold if and only if for any x ∈ [n − 1] we have
ωx = ψ. Substituting this into the second relation in (5.110) we obtain
σ ⩾ ∑_{x∈[n−1]} q_x ψ = (1 − q_n)ψ .    (5.113)
Finally, we simplify the expression above by demonstrating that we can confine the value of
n in the optimization above to be equal to two. To achieve this, let E be the 2 × n column
stochastic matrix
E := ( 1 ⋯ 1 0 ; 0 ⋯ 0 1 ) ,    (5.115)
i.e., the first row of E is (1, …, 1, 0) and the second row is (0, …, 0, 1).
Observe that Ep = (1, 0)T and Eq = (1 − qn , qn )T so that
Therefore, the minimum is obtained with n = 2 and with the pair (p, q) being equal to the
pair on the right hand side of the equation above.
where f˜(0) is defined in (5.13), and the infimum above is over all 1 < n ∈ N, p ∈ Prob(n)
and 0 < q ∈ Prob(n − 1), such that there exists n − 1 density matrices {ωx }x∈[n−1] ⊂ D(A)
satisfying (5.108). Denoting Λ_x := q_x σ^{−1/2} ω_x σ^{−1/2}, and applying the conjugation σ^{−1/2}(·)σ^{−1/2} to both sides of (5.108), gives the relations:
σ^{−1/2} ρ σ^{−1/2} ⩾ ∑_{x∈[n−1]} (p_x/q_x) Λ_x    and    ∑_{x∈[n−1]} Λ_x = I^A .    (5.118)
With these new notations, the infimum in (5.117) is taken over all 1 < n ∈ N, all p ∈ Prob(n), and all POVMs {Λ_x}_{x∈[n−1]} for which the inequality (5.118) holds with q_x := Tr[Λ_x σ] > 0.
One natural choice/guess for the optimal n, p and {Λ_x}_{x∈[n−1]} is to choose them such that the inequality in (5.118) becomes an equality. This is possible, for example, by taking n = |A| + 1, and for any x ∈ [n − 1] taking Λ_x = ψ_x ∈ Pure(A) with |ψ_x⟩ being the x-eigenvector of σ^{−1/2} ρ σ^{−1/2} corresponding to the eigenvalue p_x/q_x (i.e., p is chosen such that p_x/Tr[σΛ_x] is the x-eigenvalue of σ^{−1/2} ρ σ^{−1/2}). For this choice we have
σ^{−1/2} ρ σ^{−1/2} = ∑_{x∈[n−1]} (p_x/q_x) |ψ_x⟩⟨ψ_x| ,    (5.119)
which forces p_n to be
p_n = 1 − ∑_{x∈[n−1]} p_x = 1 − ∑_{x∈[n−1]} (p_x/q_x) ⟨ψ_x|σ|ψ_x⟩ = 1 − Tr[ρ] = 0 ,    (5.120)
where the last equality follows by multiplying both sides of (5.119) by σ and taking the
trace. Moreover, for these choices of n, p and {Λx }, we have
∑_{x∈[n−1]} q_x f(p_x/q_x) = ∑_{x∈[n−1]} Tr[ σ|ψ_x⟩⟨ψ_x| ] f(p_x/q_x)
                           = ∑_{x∈[n−1]} Tr[ σ f( (p_x/q_x)|ψ_x⟩⟨ψ_x| ) ]    (since f(t|ψ_x⟩⟨ψ_x|) = f(t)|ψ_x⟩⟨ψ_x| for all t ⩾ 0)
                           = Tr[ σ f( ∑_{x∈[n−1]} (p_x/q_x)|ψ_x⟩⟨ψ_x| ) ]    (since {|ψ_x⟩}_{x∈[n−1]} is orthonormal)
                           = Tr[ σ f( σ^{−1/2} ρ σ^{−1/2} ) ] ,    (5.121)
where the last equality follows from (5.119).
Note that we obtained the formula above for a particular choice of n, p and {Λ_x}_{x∈[n−1]}. Therefore, since this is not necessarily the optimal choice (recall D_f is defined in terms of an infimum), we must have
D_f(ρ∥σ) ⩽ Tr[ σ f( σ^{−1/2} ρ σ^{−1/2} ) ] = Tr[ρ #_f σ] ,    (5.122)
where #_f is the Kubo–Ando operator mean (see Definition B.5.1). Interestingly, to get this upper bound we did not even assume that f is convex, but if f is operator convex we get an equality above.
Remark.
ρ11 ζ
(see (B.75)) of the block ρ22 of ρ = .
∗
ζ ρ22
2. In Theorem B.5.1 we proved that for a continuous function f : [0, ∞) → [0, ∞) the Kubo–Ando operator mean #_f is operator convex if and only if it is jointly convex. Therefore, at least in the domain D(A) × D_{>0}(A), the maximal f-divergence is jointly convex for any operator convex f.
Proof. The proof of the theorem follows immediately from the inequality (5.122) combined
with the opposite inequality (5.101).
Examples:
1. The Belavkin–Staszewski divergence. Consider the function f (r) = r log r. In this
case we have f˜(0) = limε→0+ εf (1/ε) = limε→0+ log(1/ε) = ∞. According to the closed
form in (D.42), this means that unless supp(ρ) ⊆ supp(σ) we have Df (ρ∥σ) = ∞. For
the case supp(ρ) ⊆ supp(σ) we have
D_f(ρ∥σ) = Tr[ σ ( σ^{−1/2} ρ σ^{−1/2} ) log( σ^{−1/2} ρ σ^{−1/2} ) ] .    (5.125)
2. The α-Divergence. For the function f_α(r) = (r^α − r)/(α(α − 1)) we get (for σ > 0)
D_{f_α}(ρ∥σ) = 1/(α(α − 1)) Tr[ σ ( (σ^{−1/2} ρ σ^{−1/2})^α − σ^{−1/2} ρ σ^{−1/2} ) ]
             = 1/(α(α − 1)) ( Tr[ σ (σ^{−1/2} ρ σ^{−1/2})^α ] − 1 ) .    (5.127)
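The closed form (5.125) can be checked numerically against the Umegaki divergence, which it upper-bounds since the maximal extension dominates every other extension. The sketch below is not from the book; it assumes NumPy/SciPy with full-rank states, uses base-2 logarithms, and the function names are illustrative.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power, logm

rng = np.random.default_rng(1)
def rand_state(d):
    m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = m @ m.conj().T
    return rho / np.trace(rho)

def bs_divergence(rho, sigma):
    # Belavkin-Staszewski divergence, Eq. (5.125): Tr[ sigma (X log X) ] with X = sigma^{-1/2} rho sigma^{-1/2}
    s = fractional_matrix_power(sigma, -0.5)
    X = s @ rho @ s
    return np.real(np.trace(sigma @ X @ logm(X))) / np.log(2)

def umegaki(rho, sigma):
    return np.real(np.trace(rho @ (logm(rho) - logm(sigma)))) / np.log(2)

rho, sigma = rand_state(3), rand_state(3)
# the maximal f-divergence upper-bounds the Umegaki divergence
assert bs_divergence(rho, sigma) >= umegaki(rho, sigma) - 1e-9
```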
Note that if all the components of v are non-negative real numbers then all the inequalities
above must be equalities and we get in particular that ∥v∥ = ∥v∥1 .
Exercise 5.4.1.
1. Show that the trace norm is indeed a norm.
2. Show that the trace norm is always bigger than the norm induced by the inner prod-
uct (2.16).
Exercise 5.4.2. Show that for any 3 Hermitian operators M, N, σ ∈ Herm(A), with σ > 0,
the following holds:
1. Tr[MN] ⩽ ∥M∥₂ ∥N∥₂ .
2. ∥√σ M √σ∥₁ ⩽ ∥M∥₂ ∥σ∥₂ .
3. ∥M∥₁ ⩽ √(Tr[σ]) ∥σ^{−1/4} M σ^{−1/4}∥₂ .
Hint: Use part 2 with M replaced by σ^{−1/4} M σ^{−1/4} and σ replaced by √σ.
where ∥ · ∥2 is the norm induced by the Hilbert-Schmidt inner product.
The subsequent two lemmas establish that the trace norm can be formulated as opti-
mization problems. These formulations are instrumental in proving various properties of the
trace norm. We begin with an expression that is particularly useful for Hermitian matrices.
∥M∥₁ = max{ Tr[MΠ] : −I^A ⩽ Π ⩽ I^A, Π ∈ Herm(A) } .    (5.132)
Proof. Let M₊ and M₋ be the positive and negative parts of M (see (2.54)), and let Π₋ and Π₊ = I − Π₋ be the projections onto the negative and non-negative eigenspaces of M. With these notations we have |M| = M₊ + M₋, so that the trace norm of M can be expressed as
∥M∥₁ = Tr[M₊] + Tr[M₋]
     = Tr[ M(Π₊ − Π₋) ]
     = max_{−I⩽Π⩽I} Tr[MΠ] ,    (5.133)
where the last equality follows from Exercise 5.4.3, and the maximum is over all matrices Π ∈ Herm(A) with eigenvalues between −1 and 1.
Exercise 5.4.3. Prove the last equality in Eq. (5.133).
Exercise 5.4.4. Show that for any two (normalized) pure states |ψ⟩, |ϕ⟩ ∈ A we have
T(ψ, ϕ) := (1/2) ∥ |ψ⟩⟨ψ| − |ϕ⟩⟨ϕ| ∥₁ = √(1 − |⟨ψ|ϕ⟩|²) .    (5.134)
2
Hint: Denote |0⟩ := |ψ⟩ and express |ϕ⟩ := a|0⟩ + b|1⟩ where |1⟩ is some (normalized)
orthogonal vector to |0⟩.
Exercise 5.4.5. Let A be a Hilbert space and let |ψ⟩, |ϕ⟩ ∈ A be two (normalized) states in
A. Denote ψ := |ψ⟩⟨ψ| and ϕ := |ϕ⟩⟨ϕ|. Show that
(1/2) ∥ψ − ϕ∥₁ ⩽ ∥ |ψ⟩ − |ϕ⟩ ∥ ,    (5.135)
where the norm on the right-hand side is the induced inner-product norm ∥|χ⟩∥ := ⟨χ|χ⟩^{1/2}.
Hint: Use the previous exercise.
The trace norm can also be expressed as an optimization over partial isometries.
Lemma 5.4.1. Let A and B be two finite dimensional Hilbert spaces, and let
M : A → B be a linear operator. Then, the trace norm of M can be expressed as
∥M∥₁ = Tr[UM] ⩽ max_{V : B→A} Tr[VM] ,    (5.137)
Hence, all the inequalities in (5.138) must be equalities. This completes the proof.
Exercise 5.4.6. Show that if |A| ⩾ |B| in Lemma 5.4.1 then the maximization over partial isometries in (5.136) can be replaced with a maximization over isometries V : B → A. Similarly, show that if |A| ⩽ |B| then
∥M∥₁ = max_{U : A→B} Tr[U^* M] ,    (5.140)
where the maximization is over isometries U : A → B.
Hence, it will be sufficient to prove that ∥E(|ϕ_x⟩⟨ψ_x|)∥₁ ⩽ 1 for all x, since this would imply that ∥E(M)∥₁ ⩽ ∑_{x∈[n]} λ_x = ∥M∥₁. For simplicity of the exposition we remove the sub-index x from the rest of the proof, since nothing will depend on it.
Now, the square matrix E(|ϕ⟩⟨ψ|) has a polar decomposition
where {|φy ⟩} is an orthonormal basis of B, and {θy } are some phases. Hence,
From Exercise 3.4.4, it follows that E^* is positive and sub-unital (i.e., E^*(I) ⩽ I). Therefore, the matrices Λ_y := E^*(|φ_y⟩⟨φ_y|) ⩾ 0 form an incomplete POVM since
∑_{y∈[n]} Λ_y = E^*(I^B) ⩽ I^A .    (5.146)
Exercise 5.4.7. Provide an alternative (simpler) proof of the theorem above for the case
that M is Hermitian. Hint: Use the previous lemma and prove first that if −I B ⩽ Π ⩽ I B
then −I A ⩽ E ∗ (Π) ⩽ I A .
T(ρ, σ) := (1/2) ∥ρ − σ∥₁ .    (5.148)
The inclusion of the one-half factor is for normalization purposes, specifically to ensure
that the distance reaches its maximum value of 1 when the two states, ρ and σ, are orthogonal
(refer to Exercise 5.4.8 for more details).
Consider the case in which ρ and σ are classical, or equivalently commute, and are therefore diagonal in the same basis. In this case, denoting ρ = ∑_{x∈[n]} p_x|x⟩⟨x| and σ = ∑_{x∈[n]} q_x|x⟩⟨x|, we get
T(ρ, σ) := (1/2) ∥ ∑_{x∈[n]} (p_x − q_x)|x⟩⟨x| ∥₁ = (1/2) ∑_{x∈[n]} |p_x − q_x| =: T(p, q) ,    (5.149)
where p := (p1 , . . . , pn )T , q = (q1 , . . . , qn )T , and T (p, q) as defined above denotes the trace
distance between the classical probability vectors p and q.
In the general case, since both ρ and σ have the same trace,
Tr[(ρ − σ)₊] = Tr[(ρ − σ)₋] ,    (5.150)
where (ρ − σ)_± are the positive and negative parts of ρ − σ. Therefore, denoting by Π₊ the projection to the positive eigenspace of ρ − σ, we conclude that
T(ρ, σ) := (1/2)( Tr[(ρ − σ)₊] + Tr[(ρ − σ)₋] ) = Tr[(ρ − σ)₊] = Tr[(ρ − σ)Π₊] .    (5.151)
That is, the trace distance can be written as
T(ρ, σ) = max_{0⩽Π⩽I^A} Tr[(ρ − σ)Π] ,    (5.152)
where the maximization is over any matrix Π ∈ Pos(A) (not necessarily a projection) with eigenvalues between 0 and 1. The expression above for the trace distance will be useful in some of the applications we discuss later on.
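The equivalence between (5.148) and (5.152) is easy to verify numerically: the projection onto the positive eigenspace of ρ − σ attains the maximum. The following sketch is not from the book; it assumes NumPy and the function name is illustrative.

```python
import numpy as np

def trace_distance(rho, sigma):
    # T(rho, sigma) = (1/2)||rho - sigma||_1 = Tr[(rho - sigma) Pi_+], Eqs. (5.148) and (5.151)
    eigvals, eigvecs = np.linalg.eigh(rho - sigma)
    half_norm = 0.5 * np.abs(eigvals).sum()
    pos = eigvecs[:, eigvals > 0]
    Pi_plus = pos @ pos.conj().T                       # projection onto the positive eigenspace
    optimal_effect = np.real(np.trace((rho - sigma) @ Pi_plus))
    assert abs(half_norm - optimal_effect) < 1e-10      # the two expressions agree
    return half_norm

rho = np.array([[0.75, 0.10], [0.10, 0.25]])
sigma = np.diag([0.5, 0.5])
print(trace_distance(rho, sigma))
```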
The monotonicity of the trace norm under quantum channels (in fact positive maps)
implies the monotonicity of the trace distance as well. We summarize it in the following
theorem.
T (ρ, σ) = 1 ⇐⇒ ρσ = σρ = 0. (5.154)
Exercise 5.4.9. Let u ∈ D(A) be the maximally mixed state and ψ ∈ Pure(A) be a pure
state. Show that the trace distance between these two states is given by
T(u^A, ψ^A) = (m − 1)/m ,    (5.155)
where m := |A|.
where Πm ∈ Pos(A) is the projection to the subspace spanned by the m eigenvectors of the
m largest eigenvalues of ρ. By definition, Tr [ρΠm ] = ∥ρ∥(m) and ρ commutes with ρ(m) . In
the following exercise you use these properties to show that the trace distance between ρ
and ρ(m) is related to the Ky Fan norm.
Exercise 5.4.10. Show that
T( ρ, ρ^(m) ) = 1 − ∥ρ∥_(m) ,    (5.157)
where ∥·∥_(m) is the Ky Fan norm. Hint: Use the relations T(ρ, ρ^(m)) = Tr[(ρ^(m) − ρ)₊] and Tr[ρΠ_m] = ∥ρ∥_(m), and the fact that ρ commutes with ρ^(m).
In the following theorem we use the notation Dm (A) to denote the set of all density
matrices in D(A), whose rank is not greater than m.
Theorem 5.4.3. Using the same notations as above, the trace-distance of ρ to the set D_m(A) is given by
T( ρ, D_m(A) ) = T( ρ, ρ^(m) ) = 1 − ∥ρ∥_(m) .    (5.158)

Proof. Let σ ∈ D_m(A) and denote by Π_σ the projection onto its support, so that Tr[Π_σ] ⩽ m. Then,
∥ρ∥_(m) ⩾ Tr[Π_σ ρ]
        = Tr[Π_σ σ] + Tr[Π_σ(ρ − σ)]
        = 1 + Tr[ Π_σ( (ρ − σ)₊ − (ρ − σ)₋ ) ]
        ⩾ 1 − Tr[Π_σ(ρ − σ)₋]    (since Tr[Π_σ(ρ − σ)₊] ⩾ 0)
        ⩾ 1 − Tr[(ρ − σ)₋]    (since Π_σ ⩽ I^A)
        = 1 − T(ρ, σ) .    (5.159)
Since the inequality above holds for every σ ∈ D_m(A), we get
T(ρ, σ) ⩾ 1 − ∥ρ∥_(m) = T( ρ, ρ^(m) ) ,    (5.160)
where the equality follows from Exercise 5.4.10. On the other hand, since ρ^(m) ∈ D_m(A), by taking σ = ρ^(m) above we can achieve an equality. Hence,
T( ρ, D_m(A) ) = T( ρ, ρ^(m) ) = 1 − ∥ρ∥_(m) .    (5.161)
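The closed form (5.161) is immediate to evaluate: the distance to the nearest rank-m state is one minus the sum of the m largest eigenvalues of ρ. A minimal sketch (not from the book; NumPy assumed, function name illustrative):

```python
import numpy as np

def distance_to_rank_m(rho, m):
    # T(rho, D_m(A)) = 1 - ||rho||_(m), the Ky Fan norm (sum of the m largest eigenvalues), Eq. (5.161)
    eigvals = np.sort(np.linalg.eigvalsh(rho))[::-1]
    return 1.0 - eigvals[:m].sum()

rho = np.diag([0.5, 0.3, 0.15, 0.05])
print(distance_to_rank_m(rho, 2))   # 1 - (0.5 + 0.3) = 0.2
```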
For any ρ, σ ∈ D(A) define T^c(ρ, σ) := sup_X T( E(ρ), E(σ) ), where the supremum is over all classical systems X, POVM channels E ∈ CPTP(A → X), and the diagonal matrices E(ρ) and E(σ) are viewed as probability vectors.
Theorem 5.4.4. Using the same notations as above, for all ρ, σ ∈ D(A)
T^c(ρ, σ) = T(ρ, σ) := (1/2) ∥ρ − σ∥₁ .    (5.163)
Remark. The theorem above demonstrates that the quantum trace distance is the smallest
divergence that reduces to the classical trace distance on classical states.
Proof. From Section 5.3, particularly Theorem 5.3.1, it follows that for any ρ, σ ∈ D(A) we have T(ρ, σ) ⩾ T^c(ρ, σ), since T^c is the minimal quantum divergence that reduces to the classical trace distance when the inputs are restricted to classical states. To prove the converse inequality T(ρ, σ) ⩽ T^c(ρ, σ), let Π_± be the two projections to the positive and negative eigenspaces of ρ − σ, and let E ∈ CPTP(A → X) with |X| = 2 be its corresponding POVM channel; i.e., E(ω) = Tr[ωΠ₊]|0⟩⟨0| + Tr[ωΠ₋]|1⟩⟨1| for all ω ∈ D(A). Then, by definition,
E(ω) = Tr[ωΠ+ ]|0⟩⟨0| + Tr[ωΠ− ]|1⟩⟨1| for all ω ∈ D(A). Then, by definition,
1
T c ρ, σ ⩾ Tc E(ρ), E(σ) = (Tr[(ρ − σ)+ ] + Tr[(ρ − σ)− ]) = T (ρ, σ) . (5.164)
2
This completes the proof.
Then,
T( ρ^{XA}, σ^{XA} ) = (1/2) ∥ ∑_{x∈[m]} |x⟩⟨x| ⊗ (p_x ρ_x − q_x σ_x) ∥₁
                    = (1/2) ∑_{x∈[m]} ∥ p_x ρ_x − q_x σ_x ∥₁
                    = (1/2) ∑_{x∈[m]} ∥ p_x ρ_x − p_x σ_x + p_x σ_x − q_x σ_x ∥₁
                    ⩽ (1/2) ∑_{x∈[m]} ( ∥ p_x ρ_x − p_x σ_x ∥₁ + ∥ p_x σ_x − q_x σ_x ∥₁ )    (triangle inequality)
                    = ∑_{x∈[m]} p_x T(ρ_x, σ_x) + (1/2) ∑_{x∈[m]} |p_x − q_x| .    (5.167)
Hence,
T( ∑_{x∈[m]} p_x ρ_x^A , ∑_{x∈[m]} q_x σ_x^A ) = T( Tr_X[ρ^{XA}], Tr_X[σ^{XA}] ) ⩽ T( ρ^{XA}, σ^{XA} ) ,
where the inequality follows from the monotonicity of the trace distance under the partial trace.
Therefore, with Π being the optimal effect in (5.152) for the pair ( ∑_x p_x ρ_x , ∑_x q_x σ_x ),
T( ∑_{x∈[m]} p_x ρ_x , ∑_{x∈[m]} q_x σ_x ) = ∑_{x∈[m]} p_x Tr[ Π(ρ_x − σ_x) ] + ∑_{x∈[m]} (p_x − q_x) Tr[Πσ_x]
                                           ⩽ ∑_{x∈[m]} p_x T(ρ_x, σ_x) + ∑_{x∈[m]} (p_x − q_x)₊    (5.169)
                                           = ∑_{x∈[m]} p_x T(ρ_x, σ_x) + T(p, q) ,
where we used the fact that Tr[Πσx ] ⩽ 1 and the fact that px − qx ⩽ (px − qx )+ . This
completes the proof.
We conclude this subsection by discussing a nuanced yet crucial property of the trace
distance. This property is highly relevant to certain applications in quantum information,
though it is often overlooked. Consider ρ, σ ∈ D(A) and let us define ε := T (ρ, σ). If
ε is very small, it implies that ρ and σ are nearly identical states. This concept can be
articulated as follows: Decompose ρ − σ into positive and negative parts, written as ρ − σ =
(ρ − σ)+ − (ρ − σ)− . Then, define two states ω± := 1ε (ρ − σ)± . Given that ε = T (ρ, σ) =
Tr(ρ − σ)+ = Tr(ρ − σ)− , it follows that ω± are valid density matrices in D(A). Furthermore,
we can express:
ρ − σ = ε(ω+ − ω− ); . (5.170)
The importance of this equation lies in the fact that the matrix H := ω+ − ω− is bounded,
satisfying −I ⩽ H ⩽ I. Additionally, the equation ρ = σ + εH does not depend explicitly
on the underlying dimension |A|.
To further elucidate this point, consider the following straightforward example involving
the Schatten 2-norm (the norm induced by the Hilbert-Schmidt inner product). Let ρ_n := (1/n) I_n denote the n × n maximally mixed state. Observe that its 2-norm is calculated as follows:
∥ρ_n∥₂ := √(Tr[ρ_n²]) = 1/√n .    (5.171)
Consequently, as n approaches infinity, ∥ρn ∥2 tends towards zero, while the trace norm
∥ρn ∥1 = 1 for all n ∈ N.
Exercise 5.4.11. Using the same notations as above, show that if a set of Hermitian matrices
{Hn }, with each Hn ∈ Herm(Cn ), satisfies limn→∞ ∥Hn ∥1 = 0, then there exists a sequence
of positive numbers {εn } with a limit limn→∞ εn = 0 and a set of bounded matrices {Mn }
with −In ⩽ Mn ⩽ In such that Hn = εn Mn .
F(ρ, σ) := ∥√ρ √σ∥₁ = Tr|√ρ √σ| = Tr√( √σ ρ √σ ) .    (5.172)
where p := (p1 , . . . , pn )T , q = (q1 , . . . , qn )T , and F (p, q) as defined above denotes the fidelity
√ √
between the classical probability vectors p and q. If ρ = σ we get that F (ρ, ρ) = ∥ ρ ρ∥1 =
∥ρ∥1 = Tr[ρ] = 1. Moreover, for any ρ, σ ∈ D(A), the fidelity F (ρ, σ), cannot be greater
than one. This will follow trivially from Uhlmann’s theorem below, and can also be seen
from the following argument:
|√ρ √σ|² = √σ ρ √σ
         = σ − √σ (I − ρ) √σ    (using ρ = I − (I − ρ))
         ⩽ σ ⩽ I^A ,    (5.174)
where the inequality follows since √σ (I − ρ) √σ ⩾ 0. Therefore, |√ρ √σ| ⩽ I^A.
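The definition (5.172) is simple to evaluate numerically via an eigendecomposition of σ. The sketch below (not from the book, NumPy assumed, names illustrative) also checks the pure-state reduction F = |⟨ψ|ϕ⟩| discussed after Uhlmann's theorem below.

```python
import numpy as np

def fidelity(rho, sigma):
    # F(rho, sigma) = Tr sqrt( sqrt(sigma) rho sqrt(sigma) ), Eq. (5.172)
    vals_s, vecs_s = np.linalg.eigh(sigma)
    s = (vecs_s * np.sqrt(np.clip(vals_s, 0, None))) @ vecs_s.conj().T   # sqrt(sigma)
    vals = np.linalg.eigvalsh(s @ rho @ s)
    return np.sqrt(np.clip(vals, 0, None)).sum()

psi = np.array([1.0, 0.0])
phi = np.array([np.sqrt(0.5), np.sqrt(0.5)])
rho, sigma = np.outer(psi, psi.conj()), np.outer(phi, phi.conj())
# for two pure states the fidelity reduces to |<psi|phi>|
assert abs(fidelity(rho, sigma) - abs(psi.conj() @ phi)) < 1e-8
```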
Exercise 5.4.12. Let ρ, σ ∈ D(A).
1. Show that the fidelity is symmetric: F (ρ, σ) = F (σ, ρ). Hint: Use the fact that for any
complex matrix M , the matrix M ∗ M has the same non-zero eigenvalues as M M ∗ .
2. Show that if σ = |ψ⟩⟨ψ| is pure then F(ρ, σ) = √⟨ψ|ρ|ψ⟩ .
√ √
Exercise 5.4.13. Let ρ, σ ∈ D(A). Show that if λ is an eigenvalue of the matrix √ | ρ σ|
√
then λ2 is an eigenvalue of the non-Hermitian matrix ρσ. Hint: Let M = σρ σ and
N = ρσ and find a matrix η ⩾ 0 such that M = η −1 N η, where η −1 is the generalized inverse
of η.
Uhlmann’s Theorem
The last part in the exercise above also implies that if both ρ = |ψ⟩⟨ψ| and σ = |ϕ⟩⟨ϕ|
are pure, then the fidelity becomes the absolute value of the inner product between the two
states; i.e. F (ρ, σ) = |⟨ψ|ϕ⟩|. The following theorem by Uhlmann’s shows that this can be
extended to mixed states by considering all the possible purifications of ρ and σ.
Uhlmann’s Theorem
Theorem 5.4.6. Let ρ, σ ∈ D(A) be two density matrices, and let |ψ AB ⟩ and |ϕAC ⟩
be two purifications of ρA and σ A , respectively. Then,
Remark. We emphasize that the purifying systems B and C are not necessarily isomorphic. That is, we can have |B| ≠ |C|.
Proof. From Exercise 2.3.32 it follows that the purifications |ψ^{AB}⟩ and |ϕ^{AC}⟩ must have the form:
|ψ^{AB}⟩ = (√ρ ⊗ U^{Ã→B}) |Ω^{AÃ}⟩    and    |ϕ^{AC}⟩ = (√σ ⊗ W^{Ã→C}) |Ω^{AÃ}⟩ ,    (5.176)
Uhlmann’s theorem has numerous applications in quantum information, and we will use
it quite often later on in the book. The following corollary is an immediate consequence of
Uhlmann’s theorem. We leave its proof as an exercise.
Corollary 5.4.1. Let ρ, σ ∈ D(A). Then, F (ρ, σ) ⩽ 1 with equality if and only if ρ = σ.
The next consequence of Uhlmann’s theorem is the monotonicity of the fidelity under
quantum channels.
Proof. Let |ψ AC ⟩ and |ϕAC ⟩ be optimal purifications of ρ and σ such that the fidelity
F (ρ, σ) = |⟨ψ AC |ϕAC ⟩| (i.e. we are using Uhlmann’s Theorem). Now, from Stinespring
dilation theorem there exists an isometry V : A → BE such that
Denote by |ψ̃ BEC ⟩ := V A→BE ⊗ I C |ψ AC ⟩ and by |ϕ̃BEC ⟩ := V A→BE ⊗ I C |ϕAC ⟩. There-
fore the above equation implies that |ψ̃ BEC ⟩ and |ϕ̃BEC ⟩ are purifications of E(ρ) and E(σ).
We therefore get from Uhlmann’s Theorem that
Note that since the partial trace is a quantum channel it follows that for any two bipartite
states ρ, σ ∈ D(AB) we have
Remark. Note that from the corollary above it follows in particular that
F( ∑_{x∈[m]} p_x ρ_x , ∑_{x∈[m]} p_x σ_x ) ⩾ ∑_{x∈[m]} p_x F(ρ_x, σ_x) .    (5.183)
where C is some m-dimensional system. Note that the two states above are purifications of ∑_{x∈[m]} p_x ρ_x^A and ∑_{x∈[m]} q_x σ_x^A, respectively. Therefore, we must have
F( ∑_{x∈[m]} p_x ρ_x^A , ∑_{x∈[m]} q_x σ_x^A ) ⩾ |⟨ψ̃^{ABC}|ϕ̃^{ABC}⟩|
                                              = ∑_{x∈[m]} √(p_x q_x) ⟨ψ_x^{AB}|ϕ_x^{AB}⟩    (by (5.185))
                                              = ∑_{x∈[m]} √(p_x q_x) F(ρ_x, σ_x) .    (5.186)
Exercise 5.4.15. The square fidelity on Prob(n) × Prob(n) is defined for all p, q ∈ Prob(n) as
F(p, q)² = ( ∑_{x∈[n]} √(p_x q_x) )² = p · q + ∑_{x,y∈[n], x≠y} √(p_x q_x p_y q_y) .    (5.187)
Show that the square of the fidelity is concave in each of its arguments; that is, show that for any k ∈ N, {q_z}_{z∈[k]} ⊂ Prob(n), and t ∈ Prob(k) we have
∑_{z∈[k]} t_z F(p, q_z)² ⩽ F( p , ∑_{z∈[k]} t_z q_z )² .    (5.188)
Similarly, show that the square fidelity is concave with respect to the first argument.
Exercise 5.4.16. Let ρ, σ ∈ D(A) and let τ, ω ∈ D(B). Show that
F( ρ^A ⊗ τ^B , σ^A ⊗ ω^B ) = F(ρ^A, σ^A) F(τ^B, ω^B) .    (5.189)
where the infimum is over all classical systems X and all POVM channels E ∈ CPTP(A → X). By applying Theorem 5.3.1 to the classical divergence 1 − F(p, q) we get that any function f that satisfies the same monotonicity property (5.178) as the fidelity and that reduces to the fidelity on classical states must satisfy f(ρ, σ) ⩽ F(ρ, σ) for all ρ, σ ∈ D(A). Remarkably, Uhlmann's theorem implies that the fidelity in fact equals this maximal quantum extension.
Optimality
Corollary 5.4.4. For any ρ, σ ∈ D(A)
Proof. As discussed above, the inequality F (ρ, σ) ⩾ F (ρ, σ) follows by applying Theorem 6.4
to the classical divergence 1 − F (p, q). To prove the converse, let {|x⟩⟨x|}x∈[m] be the
orthonormal eigenbasis of the Hermitian matrix
Λ = σ^{−1/2} ( σ^{1/2} ρ σ^{1/2} )^{1/2} σ^{−1/2}    (5.192)
such that Λ|x⟩ = λx |x⟩, with {λx }x∈[m] being the eigenvalues of Λ. The key reason for this
choice of basis is that the matrix Λ satisfies
ΛσΛ = ρ . (5.193)
Since the function 1 − F (p, q) is a classical divergence, we can also define its maximal
quantum extension. This maximal extension corresponds to the minimal quantum extension
of the fidelity. The minimal quantum extension of the classical fidelity is given by
where the supremum is over all classical systems X, and over all p, q ∈ D(X) for which
there exists a channel E ∈ CPTP(X → A) such that ρ = E(p) and σ = E(q) (depending on
the context, we are using the notation p, q to indicate either diagonal density matrices in
D(A) or probability vectors in Prob(n) ).
Proof. Define
D_⋆(p∥q) := 1 − F(p, q) = ∑_{x∈[n]} ( q_x − √(p_x q_x) ) = ∑_{x∈supp(q)} q_x ( 1 − √(p_x/q_x) ) .    (5.197)
Hence,
F(ρ, σ) = 1 − D_f(ρ∥σ) = Tr[ σ̃ ( σ̃^{−1/2} ρ̃ σ̃^{−1/2} )^{1/2} ] .    (5.199)
5.4.4 The Relation Between the Trace Distance and the Fidelity
The trace distance and the fidelity satisfy the following inequalities.
Theorem 5.4.8. Let ρ, σ ∈ D(A), F be the fidelity, and T the trace distance. Then,
1 − F(ρ, σ) ⩽ T(ρ, σ) ⩽ √(1 − F(ρ, σ)²) .    (5.202)
This relation reveals that if the fidelity is close to one then the trace distance is close to
zero, and if the fidelity is close to zero then the trace distance is close to one.
Proof. We first prove the upper bound. Let ψ^{AB} and ϕ^{AB} be purifications of ρ^A and σ^A such that F(ρ^A, σ^A) = |⟨ψ^{AB}|ϕ^{AB}⟩|. Such purifications exist due to Uhlmann's theorem. We then have, from the monotonicity of the trace distance under the partial trace,
T(ρ^A, σ^A) ⩽ T(ψ^{AB}, ϕ^{AB})
            = √( 1 − |⟨ψ^{AB}|ϕ^{AB}⟩|² )    (Exercise 5.4.4)
            = √( 1 − F(ρ^A, σ^A)² ) ,    (5.203)
where the last equality follows from the definition of ψ^{AB} and ϕ^{AB}.
To get the lower bound of (5.202), we start by observing from Corollary 5.4.4 that there exists a POVM {Λ_x}_{x∈[n]} such that
F(ρ, σ) = ∑_{x∈[n]} √(p_x q_x)    where    p_x := Tr[Λ_x ρ] ,  q_x := Tr[Λ_x σ] .    (5.204)
Hence, from the equality 2√(p_x q_x) = p_x + q_x − (√p_x − √q_x)² it follows that
F(ρ, σ) = (1/2) ∑_{x∈[n]} ( p_x + q_x − (√p_x − √q_x)² ) = 1 − (1/2) ∑_{x∈[n]} (√p_x − √q_x)² .    (5.205)
Therefore,
1 − F(ρ, σ) = (1/2) ∑_{x∈[n]} (√p_x − √q_x)²
            ⩽ (1/2) ∑_{x∈[n]} |√p_x − √q_x| (√p_x + √q_x)    (since |√p_x − √q_x| ⩽ √p_x + √q_x)
            = (1/2) ∑_{x∈[n]} |p_x − q_x| = T(p, q) .    (5.206)
Combining this with T(p, q) ⩽ T(ρ, σ), which follows from the DPI of the trace distance, gives the lower bound in (5.202).
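The two Fuchs–van de Graaf-type bounds in (5.202) are easy to test on random states. The sketch below is not from the book; it assumes NumPy, and the helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
def rand_state(d):
    m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = m @ m.conj().T
    return rho / np.trace(rho)

def trace_distance(rho, sigma):
    return 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()

def fidelity(rho, sigma):
    vals_s, vecs_s = np.linalg.eigh(sigma)
    s = (vecs_s * np.sqrt(np.clip(vals_s, 0, None))) @ vecs_s.conj().T
    return np.sqrt(np.clip(np.linalg.eigvalsh(s @ rho @ s), 0, None)).sum()

for _ in range(100):
    rho, sigma = rand_state(3), rand_state(3)
    T, F = trace_distance(rho, sigma), fidelity(rho, sigma)
    assert 1 - F - 1e-9 <= T <= np.sqrt(1 - F**2) + 1e-9   # Eq. (5.202)
```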
Exercise 5.4.19.
1. Show that if two states ρ, σ ∈ D(A) are ε-close in trace distance then they are ε-close in fidelity.
2. Show that if two states ρ, σ ∈ D(A) are ε-close in fidelity, then they are √(2ε)-close in trace distance.
The relation between the trace distance and the fidelity can also be used to derive some
additional bounds on the trace distance. For example, consider a pure state ρAB whose
marginal (mixed) state ρA is ε-close to a pure state ψ A . Since ψ A is pure, this means that
the marginal state ρA is itself close to being pure, and this in turn means that the pure state
ρ^{AB} should be close to a product state ψ^A ⊗ ρ^B. We make this intuition rigorous in the
following lemma.
Lemma 5.4.2. Let ρ ∈ Pure(AB) be a pure state and let ψ ∈ Pure(A) be another pure state. If the marginal of ρ^{AB} satisfies T(ρ^A, ψ^A) ⩽ ε then
T( ρ^{AB} , ψ^A ⊗ ρ^B ) ⩽ 2√(2ε) .    (5.208)

= T( ρ^{AB} , ψ^A ⊗ ϕ^B ) + T( ϕ^B , ρ^B )
⩽ √(2ε) + √(2ε) = 2√(2ε) .    (5.213)
This completes the proof.
Exercise 5.4.20. Using the same notations as in the theorem above, suppose F (ρA , ψ A ) ⩾
1 − ε. What is the best lower bound that you can find for F (ρAB , ψ A ⊗ ρB )?
Exercise 5.4.21. Prove that the state defined in (5.215) is indeed a purification of ρ̃A .
where {Ex }x∈[m] are trace non-increasing CP maps, and {Ex (ρ)}x∈[m] are sub-normalized
states. These states provide both the information about the probability px := Tr[Ex (ρ)] that
an outcome x ∈ [m] occurs during the quantum measurement, and the post-measurement
state p1x Ex (ρ).
We previously saw that distance measures for normalized states are monotonic under
quantum channels and satisfy the DPI, a crucial aspect in applications. Quantum channels
map normalized states to normalized states, while trace non-increasing (TNI) CP maps,
including CPTP maps, take sub-normalized states to subnormalized states. Therefore, it’s
beneficial to define a distance measure for subnormalized states that is monotonic under
TNI-CP maps. We denote by CP⩽ (A → B) the set of all TNI maps in CP(A → B).
In section 5.3, we explored extending divergences from the classical to the quantum
domain. We now apply a similar approach to extend divergences from normalized to sub-
normalized states. However, unlike classical-to-quantum extensions, we will see that there
is no analogous ‘minimal’ extension from normalized to sub-normalized states. Thus, we
begin by introducing the maximal extension of a quantum divergence to the sub-normalized
domain.
where the infimum is over all systems R, and all density matrices ρ̃, σ̃ ∈ D(R) for which there exists E ∈ CP⩽(R → A) such that ρ = E(ρ̃) and σ = E(σ̃).
Remark. Note that earlier we used the same notation D to denote the maximal extension of
a classical divergence to a quantum one. The bar symbol over D in our notations will always
indicate maximal extensions from one domain to a larger one, whereas the domain of a given
extension should be clear from the context.
The maximal extension D̄ has the following three properties:
The last property justifies the name for D̄ as the maximal extension of D to subnormalized states.
Exercise 5.5.1. Prove the three properties above using the same techniques that were used to prove Theorem 5.3.1. Hint: As you follow the same lines used in the proof of Theorem 5.3.1, replace 'classical states' with 'normalized quantum states' and 'quantum states' with 'sub-normalized quantum states'.
Remarkably, the maximal extension has the following closed formula.
Closed Formula
Theorem 5.5.1. Let D be a quantum divergence and D̄ be its maximal extension to sub-normalized states as defined in (5.220). For any pair of sub-normalized states ρ, σ ∈ D⩽(A),
D̄(ρ∥σ) = D( ρ ⊕ (1 − Tr[ρ]) ∥ σ ⊕ (1 − Tr[σ]) ) .    (5.225)
Proof. Let ρ̃, σ̃ ∈ D(R) and E ∈ CP⩽(R → A) be a TNI-CP map such that ρ = E(ρ̃) and σ = E(σ̃). Moreover, define N ∈ CPTP(R → A ⊕ C) as
N(ω) := E(ω) ⊕ ( Tr[ω] − Tr[E(ω)] )    ∀ ω ∈ L(R) .    (5.226)
Then, since N is a CPTP map,
D(ρ̃∥σ̃) ⩾ D( N(ρ̃) ∥ N(σ̃) )
        = D( E(ρ̃) ⊕ (1 − Tr[E(ρ̃)]) ∥ E(σ̃) ⊕ (1 − Tr[E(σ̃)]) )    (5.227)
        = D( ρ ⊕ (1 − Tr[ρ]) ∥ σ ⊕ (1 − Tr[σ]) ) .
Since the above inequality holds for all such ρ̃, σ̃, E, we must have that D̄(ρ∥σ) is no smaller than the right-hand side of (5.225). To prove the converse inequality, take R = A ⊕ C, ρ̃ = ρ ⊕ (1 − Tr[ρ]), σ̃ = σ ⊕ (1 − Tr[σ]), and E(·) := P(·)P^†, where P is the projection to the subspace A in R. Then, ρ = E(ρ̃) and σ = E(σ̃), so that by definition (see (5.220)) we must have D̄(ρ∥σ) ⩽ D(ρ̃∥σ̃). Together with the previous inequality, this completes the proof of the equality in (5.225).
If D is a quantum divergence, its minimal extension D̲ can be defined in analogy with (5.91) as
D̲(ρ∥σ) := sup D( E(ρ) ∥ E(σ) )    ∀ ρ, σ ∈ D⩽(A) ,    (5.228)
where the supremum is over all systems R and all E ∈ CP⩽(A → R) such that E(ρ) and E(σ) are normalized states. However, such an E does not exist if either ρ or σ has trace strictly smaller than one. Hence, the minimal extension of D must satisfy
D̲(ρ∥σ) = 0    (5.229)
for all subnormalized states ρ, σ ∈ D⩽(A) with either Tr[ρ] < 1 or Tr[σ] < 1. Therefore,
this extension is rather pathological and not useful in applications. The following corollary
applies specifically to the case where D functions as both a divergence and a metric.
Corollary 5.5.1. Let D be a quantum divergence that is also a metric. Then, its
maximal extension to sub-normalized states, D, is also a metric.
Proof. We need to show that D̄ is symmetric and satisfies the triangle inequality. To see it, let ρ, σ, ω ∈ D⩽(A). The symmetry of D̄ follows from the symmetry of D:
D̄(ρ∥σ) = D( ρ ⊕ (1 − Tr[ρ]) ∥ σ ⊕ (1 − Tr[σ]) )
        = D( σ ⊕ (1 − Tr[σ]) ∥ ρ ⊕ (1 − Tr[ρ]) )    (D is symmetric)    (5.230)
        = D̄(σ∥ρ) .
5.5.1 Examples
The Generalized Trace Distance
The maximal extension of the trace distance to subnormalized states is known as the genar-
alized trace distance. From Theorem 5.5.1 we get that the generalized trace distance has the
following simple form.
Corollary 5.5.2. The generalized trace distance can be expressed for any ρ, σ ∈ D⩽(A) as
T̄(ρ, σ) = (1/2) ∥ρ − σ∥₁ + (1/2) |Tr[ρ − σ]| .    (5.232)
Exercise 5.5.2. Prove the corollary above using the formula given in Theorem 5.5.1 when
D is replaced by the trace distance.
To see why, set a := Tr[(ρ − σ)₊] and b := Tr[(ρ − σ)₋], and use the relation max{a, b} = (1/2)( a + b + |a − b| ). The formula above is consistent with the fact that T̄ is the largest extension of the trace distance to sub-normalized states that satisfies the monotonicity property under TNI-CP maps.
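The closed form (5.232), together with the max{a, b} identity just mentioned, is simple to verify numerically. The sketch below is not from the book; it assumes NumPy, and the helper name is illustrative.

```python
import numpy as np

def generalized_trace_distance(rho, sigma):
    # Eq. (5.232): (1/2)||rho - sigma||_1 + (1/2)|Tr[rho - sigma]|, for sub-normalized states
    vals = np.linalg.eigvalsh(rho - sigma)
    return 0.5 * np.abs(vals).sum() + 0.5 * abs(vals.sum())

rho   = 0.8 * np.diag([0.6, 0.4])   # sub-normalized: trace 0.8
sigma = 0.5 * np.diag([0.5, 0.5])   # sub-normalized: trace 0.5
vals = np.linalg.eigvalsh(rho - sigma)
# agrees with max{ Tr(rho - sigma)_+ , Tr(rho - sigma)_- }
assert abs(generalized_trace_distance(rho, sigma)
           - max(vals[vals > 0].sum(), -vals[vals < 0].sum())) < 1e-12
```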
Exercise 5.5.3. Show that for any ρ, σ ∈ D⩽(A) the function f(ρ, σ) = (1/2)∥ρ − σ∥₁ is also an extension of the trace distance to sub-normalized states that satisfies the exact same properties satisfied by T̄ except for the optimality. Give an example showing that f(ρ, σ) can be strictly smaller than T̄(ρ, σ).
Exercise 5.5.4. Show that for two sub-normalized pure states ψ, ϕ ∈ D⩽(A) the generalized trace distance can be expressed as
T̄(ψ, ϕ) = √( (1/4)(Tr[ψ + ϕ])² − |⟨ψ|ϕ⟩|² ) + (1/2) |Tr[ψ − ϕ]| .    (5.234)
Hint: Use similar techniques as in Exercise 5.4.4.
Lemma 5.5.1. Using the notations above, if Tr[ρΛ] ⩾ 1 − ε then (1/2)∥ρ − ρ̃∥₁ ⩽ √ε.
Remark. The gentle operator lemma’s extension to cases where ρ̃ = GρG∗ , with G ∈ L(A, B)
being an arbitrary element of a generalized measurement and Λ ≡ G∗ G ∈ Eff(A), may seem
promising. However, without imposing further constraints on G, such an extension could
result in non-informative bounds. Consider, for instance, the scenario where G is a unitary
matrix, making Λ = G∗ G = I A . Here, Tr[Λρ] = 1 ⩾ 1 − ε for any ε ⩾ 0. But, if we
choose ρ = |0⟩⟨0| and a unitary G such that G|0⟩ = |1⟩, it follows that 12 ∥ρ − ρ̃∥1 = 1.
This example illustrates that extending the gentle operator lemma to encompass arbitrary
elements of generalized measurements is impractical without specific additional constraints
on G.
Proof. Let |ψ^{AÃ}⟩ = (√ρ ⊗ I^Ã)|Ω^{AÃ}⟩ and |ψ̃^{AÃ}⟩ := (√Λ ⊗ I^Ã)|ψ^{AÃ}⟩ be purifications of ρ^A and ρ̃^A, respectively. Denote t := Tr[ρΛ] ⩾ 1 − ε and observe that
⟨ψ^{AÃ}|ψ̃^{AÃ}⟩ = ⟨ψ^{AÃ}| √Λ ⊗ I^Ã |ψ^{AÃ}⟩
                ⩾ ⟨ψ^{AÃ}| Λ ⊗ I^Ã |ψ^{AÃ}⟩    (since √Λ ⩾ Λ)
                = Tr[ρΛ] = t ,    (5.235)
where the last equality follows from |ψ^{AÃ}⟩ = (√ρ ⊗ I^Ã)|Ω^{AÃ}⟩.
From (5.232) we get that both sides of the inequality above contain the same term (1/2)|Tr[ρ − ρ̃]| = (1/2)|Tr[ψ − ψ̃]|, so we can cancel it. Combining this with Exercise 5.5.4 we conclude that
(1/2)∥ρ − ρ̃∥₁ ⩽ √( (1/4)(Tr[ψ + ψ̃])² − |⟨ψ|ψ̃⟩|² )
              ⩽ √( (1/4)(1 + t)² − t² )    (by (5.235))
              ⩽ √(1 − t)    (Exercise 5.5.5)
              ⩽ √ε ,    (5.237)
where the last inequality follows from t ⩾ 1 − ε.
Exercise 5.5.6. Show that one can use the gentle measurement lemma (Lemma 5.4.3) to
prove a slightly weaker version of the gentle operator lemma (Lemma 5.5.1). Use only
Lemma 5.4.3 and the triangle inequality of the trace norm to show that
Lemma 5.4.3 and the triangle inequality of the trace norm to show that
(1/2)∥ρ − ρ̃∥₁ ⩽ √ε + (1/2)ε .    (5.238)
Hint: Set ρ′ := √Λ ρ √Λ / Tr[Λρ] and write ρ − ρ̃ = ρ − ρ′ + ρ′ − ρ̃.
We can use the techniques developed above to extend the fidelity to sub-normalized states.
However, since the fidelity achieves its maximum for identical states, the infimum of (5.220)
will be replaced with a supremum.
where the supremum is over all systems R, and all density matrices ρ̃, σ̃ ∈ D(R) for
which there exists E ∈ CP⩽ (R → A) with the property that ρ = E(ρ̃) and σ = E(σ̃).
Since 1 − F (ρ, σ) is a quantum divergence we can use Theorem 5.5.1 to get a closed
formula for the generalized fidelity.
Remark. The formula (5.240) for the generalized fidelity reveals that we have F̄(ρ, σ) = ∥√ρ √σ∥₁ even if only one of the states is normalized. Note that for the trace distance, T̄(ρ, σ) = T(ρ, σ) only if both states are normalized.
The generalized fidelity has the following properties:
The last property above indicates that the generalized fidelity is the minimal extension of
the fidelity to sub-normalized states.
Exercise 5.5.8. Show that for two sub-normalized pure states ψ ∈ D⩽(A) and ϕ ∈ D⩽(A) the generalized fidelity is given by
F̄(ψ, ϕ) = |⟨ψ|ϕ⟩| + √( (1 − ⟨ψ|ψ⟩)(1 − ⟨ϕ|ϕ⟩) ) .    (5.241)
where the maximum is over all CPTP maps V(·) := V (·)V ∗ , where V : B → C is an
isometry.
Tr[V^{B→C}(ψ^{AB})] = Tr[ψ^{AB}] = Tr[ρ^A] ,    (5.243)

and similarly Tr[V^{B→C}(ϕ^{AB})] = Tr[σ^A]. Therefore, it is sufficient to show that

∥√ρ √σ∥₁ = max_{V^{B→C}} ∥ √(V^{B→C}(ψ^{AB})) √(ϕ^{AC}) ∥₁ .    (5.244)

Since V^{B→C}(ψ^{AB}) and ϕ^{AC} are rank-one sub-normalized states we have (Exercise 5.5.9)

∥ √(V^{B→C}(ψ^{AB})) √(ϕ^{AC}) ∥₁ = |⟨ψ^{AB}| I^A ⊗ V^∗ |ϕ^{AC}⟩| .    (5.245)
Hence, the rest of the proof follows the exact same lines as the proof of Uhlmann's theorem (Theorem 5.4.6). In particular, note that all the steps in (5.177) hold even if ρ and σ are sub-normalized.
Exercise 5.5.9. Prove the equality in Eq. (5.245).
Considering this close relationship between trace distance and fidelity when applied to pure
states, we will explore all possible extensions of the trace distance from pure states to mixed
states. In this context, we define the purified distance as the maximal extension among all
such extensions.
Definition 5.6.1. Let T be the trace distance. The purified distance is defined for
all ρ, σ ∈ D(A) as
P(ρ, σ) := inf { T(ψ, ϕ) : ρ = E(ψ), σ = E(ϕ), ψ, ϕ ∈ Pure(R), E ∈ CPTP(R → A) } ,    (5.247)
where the infimum is also over all systems R.
Remarks:
1. Observe that the extension of the trace distance in the definition above is reminiscent of the maximal quantum extension of the trace distance discussed in the previous sections. Later on we will develop a framework to extend certain functions (specifically
resource monotones) from one domain to a larger one. This framework is very general
and all extensions discussed in this chapter (including the above extension of the trace
distance, i.e. the purified distance) are just specific applications of the framework.
2. We will see below that the purified distance has a closed formula. Historically, this closed formula has been used as its definition. However, the definition above emphasizes its operational meaning as the largest mixed-state extension of the trace distance (see Theorem 5.6.1 below).
3. The justification for the name “purified distance” will become clear from the properties
discussed below.
We start by showing that the purified distance is an optimal divergence.
Theorem 5.6.1. The purified distance is a quantum divergence that reduces to the trace distance on pure states. Moreover, if D is another quantum divergence that reduces to the trace distance on pure states, then for any ρ, σ ∈ D(A) we have D(ρ, σ) ⩽ P(ρ, σ).
The proof follows very similar lines as in the proof of Theorem 6.4 and is left as an
exercise.
Exercise 5.6.1. Prove Theorem 5.6.1. Hint: Adopt the methodology used in Theorem 6.4
related to D. In this process, substitute each occurrence of a classical state on system X with
a pure state on system R.
The upcoming lemma demonstrates that the purified distance is derived from a purifica-
tion process, which justifies its name. We will utilize this lemma to derive a closed formula
for the purified distance.
Lemma 5.6.1. Let P be the purified distance and T the trace distance. Then, for
all ρ, σ ∈ D(A)
P(ρ^A, σ^A) = inf_{ψ,ϕ} T(ψ^{AB}, ϕ^{AB}) ,    (5.249)

where the infimum is over all purifications ψ^{AB} of ρ^A and ϕ^{AB} of σ^A.
Proof. Let ψ^{AB} and ϕ^{AB} be purifications of ρ^A and σ^A, respectively, and denote E^{AB→A} := Tr_B. By definition, E^{AB→A}(ψ^{AB}) = ρ^A and E^{AB→A}(ϕ^{AB}) = σ^A, so that ψ^{AB} and ϕ^{AB} satisfy the conditions in (5.247) with R := AB. Therefore, P(ρ^A, σ^A) cannot be greater than the right-hand side of (5.249). To get the other direction, recall the definition (5.247) and let ρ^A = E(ψ^R) and σ^A = E(ϕ^R) for some ψ, ϕ ∈ Pure(R) and E ∈ CPTP(R → A). Let V^{R→AB} ∈ CPTP(R → AB) be the isometry purifying E^{R→A}. Therefore, ρ^A = Tr_B[V^{R→AB}(ψ^R)] and σ^A = Tr_B[V^{R→AB}(ϕ^R)]. Finally, since the trace distance is invariant under isometries, denoting χ^{AB} := V^{R→AB}(ψ^R) and φ^{AB} := V^{R→AB}(ϕ^R) we get

T(ψ^R, ϕ^R) = T(χ^{AB}, φ^{AB}) ⩾ inf_{ψ′,ϕ′} T(ψ′^{AB}, ϕ′^{AB}) ,

where the infimum is over all purifications ψ′^{AB} and ϕ′^{AB} of ρ^A and σ^A. Hence, since ψ^R and ϕ^R were arbitrary pure states that satisfy the conditions in (5.247), we conclude that P(ρ^A, σ^A) is no smaller than the right-hand side of (5.249). This completes the proof.
Closed Formula
Theorem 5.6.2. Let P be the purified distance. Then, for all ρ, σ ∈ D(A)
P(ρ, σ) = √(1 − F(ρ, σ)²) .    (5.251)
The maximal extension of the purified distance from density matrices to subnormalized states follows trivially from Theorem 5.5.1. We therefore extend the definition of the purified distance to subnormalized states in the following way.
Remark. The purified distance on normalized states has been defined earlier as the maximal
extension of the trace distance from pure states to mixed states. Therefore, the purified
distance on subnormalized states can be viewed as the maximal extension of the trace dis-
tance from pure states to mixed subnormalized states. Moreover, observe that the purified
distance can also be expressed as:
P(ρ, σ) := √(1 − F(ρ̃, σ̃)²) ,    (5.254)
where ρ̃ := ρ ⊕ (1 − Tr[ρ]) and σ̃ := σ ⊕ (1 − Tr[σ]).
Finally, we show that the purified distance is a metric.
Theorem 5.6.3. The purified distance is a metric on the set of subnormalized states.
Proof. Since F (ρ, σ) ⩽ 1 the purified distance is non-negative. Since F (ρ, σ) = 1 if and only
if ρ = σ the purified distance P (ρ, σ) = 0 if and only if ρ = σ. Since F is symmetric also
P is symmetric. It is therefore left to show that the purified distance satisfies the triangle
inequality.
Let ρ, σ, ω ∈ D⩽ (A) and set ρ̃ := ρ ⊕ (1 − Tr[ρ]), σ̃ := σ ⊕ (1 − Tr[σ]), and ω̃ :=
ω ⊕ (1 − Tr[ω]). Moreover, let ψ, ϕ, φ ∈ D(BB̃) be purifications of ρ̃, σ̃, and ω̃ such that F(ρ̃, ω̃) = F(ψ, φ) and F(ω̃, σ̃) = F(φ, ϕ). Such purifications exist due to Uhlmann's theorem. Moreover, note that from Uhlmann's theorem we also have F(ρ̃, σ̃) ⩾ F(ψ, ϕ).
Hence,

P(ρ, ω) + P(ω, σ) = √(1 − F(ρ̃, ω̃)²) + √(1 − F(ω̃, σ̃)²)
  = √(1 − F(ψ, φ)²) + √(1 − F(φ, ϕ)²)
(5.134) →   = T(ψ, φ) + T(φ, ϕ)
Triangle inequality of T →   ⩾ T(ψ, ϕ)    (5.255)
(5.134) →   = √(1 − F(ψ, ϕ)²)
F(ρ̃, σ̃) ⩾ F(ψ, ϕ) →   ⩾ √(1 − F(ρ̃, σ̃)²)
  = P(ρ, σ) .
This completes the proof.
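The closed formula (5.251) and the metric property are easy to verify numerically. The sketch below assumes NumPy; all helper names are ours. It computes the fidelity F(ρ, σ) = ∥√ρ √σ∥₁, the trace distance, and the purified distance for random states, and checks both the bound T ⩽ P and the triangle inequality of Theorem 5.6.3.

```python
import numpy as np

def rand_dm(d, seed):
    rng = np.random.default_rng(seed)
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def psd_sqrt(m):
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.conj().T

def trace_dist(rho, sigma):
    return 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()

def fidelity(rho, sigma):
    s = psd_sqrt(rho)
    return np.trace(psd_sqrt(s @ sigma @ s)).real     # F(rho, sigma) = ||sqrt(rho) sqrt(sigma)||_1

def purified_dist(rho, sigma):
    return np.sqrt(max(0.0, 1.0 - fidelity(rho, sigma) ** 2))   # Eq. (5.251)

d = 3
rho, sigma, omega = rand_dm(d, 1), rand_dm(d, 2), rand_dm(d, 3)

# The trace distance never exceeds the purified distance ...
print(trace_dist(rho, sigma) <= purified_dist(rho, sigma) + 1e-10)
# ... and the purified distance satisfies the triangle inequality (Theorem 5.6.3).
print(purified_dist(rho, sigma)
      <= purified_dist(rho, omega) + purified_dist(omega, sigma) + 1e-10)
```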
Note that the purified distance is monotonic under TNI-CP maps. That is, for every
map E ∈ CP⩽ (A → B), and any two subnormalized states ρ, σ ∈ D⩽ (A) we have
P(E(ρ), E(σ)) ⩽ P(ρ, σ) .    (5.256)
This follows trivially from Theorem 5.5.1 and the monotonicity property in (5.223) (or equiv-
alently from the monotonicity of the generalized fidelity). Moreover, note that from Theo-
rem 5.5.2 it follows that for any ρ, σ ∈ D⩽ (A) and any purification ψ AB of ρA , there exists
a purification ϕAB of σ A such that
P (ρA , σ A ) = P (ψ AB , ϕAB ) . (5.257)
We end this subsection by showing that the purified distance is bounded by the generalized
trace distance, T .
6.1 Entropy
Entropy is pivotal in numerous fields, including statistical mechanics, thermodynamics, in-
formation theory, black hole physics, cosmology, chemistry, and even economics. This wide
range of applications has led to diverse interpretations of entropy. In thermodynamics, it’s
seen as a measure of energy dispersal at a specific temperature. In contrast, information the-
ory views it as a rate of compression. Other perspectives, explored extensively in literature,
link entropy to disorder, chaos, system randomness, and the concept of time’s arrow. These
varying attributes and contexts give rise to different measures of entropy, such as Gibbs
and Boltzmann entropy, Tsallis entropies, Rényi entropies, and von Neumann and Shannon
entropies, along with other entropy functions like molar entropy, entropy of mixing, and loop
entropy.
The multifaceted nature of entropy calls for a systematic and unifying approach, where
entropy is defined rigorously and context-independently. This requires identifying common
characteristics across all forms of entropy. One such universal trait is uncertainty, whether
it’s about the state of a physical system or the output of a compression scheme. In various
contexts, this uncertainty also encompasses concepts like disorder and randomness. For
instance, uncertainty about a system’s state correlates with its disorder level.
In Chapter 4, especially in Sec. 4.1, we delved into the role of majorization in defining
uncertainty. We employed three different methodologies – axiomatic, constructive, and op-
erational – to determine that every measure of uncertainty should inherently be a Schur
concave function. Consequently, it is reasonable to anticipate that entropy functions will
exhibit monotonic behavior under majorization.
Besides uncertainty, entropy embodies other attributes. A second key feature, related to
the second law of thermodynamics – especially the Clausius and Kelvin-Planck statements
– involves cyclic processes where a system undergoes a thermodynamic transition while all
other systems, including the environment and heat baths, return to their original state.
Recent developments in quantum information’s approach to small-scale thermodynamics,
as referenced in [29], categorize these as catalytic processes. Consider a thermodynamical
evolution where a physical system A in state ρA transitions into system B in state σ B . The
encompassing thermal machine, including heat baths, environment, etc., can be represented
as an additional system C in state τ C . Thus, for cyclic processes, the thermodynamic
transition can be described as:
ρA ⊗ τ C → σ B ⊗ τ C . (6.1)
In this framework, the second law asserts not just that system A's entropy is no greater than that of system B, but that this holds because the entropy of the combined state ρ^A ⊗ τ^C does not decrease in such a thermodynamic cyclic process in which τ^C is preserved. If entropy is measured with an additive function (under tensor products), then
the entropy of ρA being no greater than that of σ B implies the same relationship between
ρA ⊗ τ C and σ B ⊗ τ C . Thus, we define entropies as additive measures of uncertainty.
Consider a function

H : ⋃_{n∈ℕ} Prob(n) → ℝ    (6.2)

that maps probability vectors in all finite dimensions to the real numbers.
Entropy
Definition 6.1.1. The function H as given in (6.2) is called an entropy if it is not equal to the constant zero function and it satisfies the following two axioms:

1. Monotonicity: for any p ∈ Prob(n) and q ∈ Prob(m), if p ≻ q then H(p) ⩽ H(q).

2. Additivity: for any p ∈ Prob(n) and q ∈ Prob(m), H(p ⊗ q) = H(p) + H(q).
The first axiom ensures that an entropy quantifies uncertainty. In Sec. 4.1 we arrived
at the definition of majorization from a game of chance, indicating that p ≻ q if q is more
uncertain than p. Note, however, that we extend here the definition of majorization to vectors that are not necessarily of the same dimension. This is done by padding the lower-dimensional vector with zeros so that the two vectors have the same dimension. This means in particular that for any p ∈ Prob(n), any entropy H satisfies H(p ⊕ 0) = H(p). Note also that this axiom implies that H is Schur concave.
The additivity axiom distinguishes entropy functions from arbitrary measures of uncer-
tainty. For example, in Sec. 4.1.3 we encountered several Schur concave functions, such as the symmetric elementary functions (see (4.49)), that are in general not additive. Therefore, such functions cannot be entropies. The additivity property is consistent with the extensivity property of entropy in thermodynamics, and particularly with the monotonicity of entropy under cyclic thermodynamical processes. As mentioned above, in such cycles, all degrees of freedom other than those of the system remain intact at the end of the cycle. Suppose, then, that the system at the beginning and end of the cycle is characterized by probability vectors p and q, respectively. If the initial state of the system was described by p ⊗ r, where r corresponds to the remaining degrees of freedom, then at the end of the cycle the system and environment are described by q ⊗ r (i.e. with the same r). Since entropy should be monotonic under such a cycle, in which p ⊗ r ≻ q ⊗ r, this motivates the additivity property of an entropy function, which guarantees monotonicity under the trumping relation. That is, the monotonicity under mixing can be strengthened using the additivity property such that

p ≻∗ q   ⇒   H(p) ⩽ H(q) .    (6.4)

There are other arguments motivating the additivity axiom that come from information theory, and we will discuss them as we go along.
In the definition above we allow for the case that n = 1. In this trivial case, Prob(n) =
Prob(1) contains only the 1-dimensional vector (i.e. number) one. Observe that for any
p ∈ Prob(n) we get from the additivity axiom that H(p) = H(p ⊗ 1) = H(p) + H(1), so
that H(1) = 0. From the fact that 1 ≻ e_x ≻ 1 for all x ∈ [n] (i.e. 1 ∼ e_x), where {e_x}_{x∈[n]} is the standard (elementary) basis of ℝ^n, we conclude that also H(e_x) = 0 for all x ∈ [n].
Moreover, since for every n ∈ N and every p ∈ Prob(n) we have ex ≻ p we get from the
monotonicity axiom that H(p) ⩾ H(ex ) = 0. That is, entropy functions cannot be negative.
In the definition of entropy above we assumed that the entropy H is not the zero function.
This means that there exists n ∈ N and p ∈ Prob(n) such that H(p) ̸= 0. Since entropy
cannot be negative, this means that H(p) > 0. On the other hand, for sufficiently large m ∈ ℕ we have p ≻ (u^{(2)})^{⊗m} (see Exercise 4.1.4), so that

H(u^{(2)}) = (1/m) H((u^{(2)})^{⊗m}) ⩾ (1/m) H(p) > 0 ,    (6.5)

where the inequality follows from p ≻ (u^{(2)})^{⊗m} and the monotonicity of H.
Therefore, all entropy functions take strictly positive values on u^{(2)} := ½(1, 1)^T ∈ Prob(2). It will be convenient to normalize all entropy functions such that

H(u^{(2)}) = 1 .    (6.6)
Throughout the remainder of the book, we will focus exclusively on entropy functions that
are normalized as above.
Proof. The inequality follows from the Schur concavity of H and the fact that p ≻ u^{(n)}. To prove the equality, define f : ℕ → ℝ via f(n) := H(u^{(n)}). From the normalization (6.6) we have f(2) = 1, and from the additivity f(2^k) = k for all k ∈ ℕ. More generally, for any m, n ∈ ℕ the additivity gives

f(n^m) = H(u^{(n^m)}) = H((u^{(n)})^{⊗m}) = m H(u^{(n)}) = m f(n) .    (6.8)
Moreover, from the monotonicity property of H and the fact that u(n) ≻ u(n+1) we get that
f is monotonically non-decreasing. Using these properties of f we get for all n, m ∈ ℕ

f(n) = (1/m) f(n^m) = (1/m) f(2^{m log(n)})
f is non-decreasing →   ⩽ (1/m) f(2^{⌈m log(n)⌉})    (6.9)
  = (1/m) ⌈m log(n)⌉ .

Similarly, taking the floor instead of the ceiling above gives f(n) ⩾ (1/m)⌊m log(n)⌋. In the limit m → ∞ both of these bounds converge to log n. This concludes the proof.
Exercise 6.1.1. Show that any convex combination of entropies is itself an entropy. That is, if {H_x}_{x∈[k]} is a set of entropies and s ∈ Prob(k), then ∑_{x∈[k]} s_x H_x is itself an entropy.
where the cases α = 0, 1, ∞ are defined by the appropriate limits. That is, for α = 0 the Rényi entropy is also known as the max-entropy and is given by

H_max(p) := lim_{α→0⁺} H_α(p) = log |supp(p)| ,    (6.11)

where |supp(p)| is the number of non-zero components of p. For α = 1 the Rényi entropy reduces to the Shannon entropy

H(p) = lim_{α→1} H_α(p) = −∑_{x∈[n]} p_x log p_x .    (6.12)

Finally, for the case α = ∞ the Rényi entropy is also known as the min-entropy and is given by

H_min(p) := lim_{α→∞} H_α(p) = −log max_{x∈[n]} {p_x} .    (6.13)
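The limiting cases above are convenient to check numerically. The following sketch (assuming NumPy; base-2 logarithms are used so that H(u^{(2)}) = 1; the function name is ours) evaluates H_α for several values of α and verifies the additivity axiom under tensor products.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Classical Renyi entropy H_alpha(p) in bits, with the alpha -> 0, 1, infinity limits."""
    p = np.asarray(p, dtype=float)
    pos = p[p > 0]
    if alpha == 0:
        return float(np.log2(len(pos)))                 # max-entropy: log |supp(p)|
    if alpha == 1:
        return float(-(pos * np.log2(pos)).sum())       # Shannon entropy
    if alpha == np.inf:
        return float(-np.log2(pos.max()))               # min-entropy
    return float(np.log2((pos ** alpha).sum()) / (1.0 - alpha))

p = np.array([0.5, 0.25, 0.125, 0.125])
q = np.array([0.7, 0.3])
print(renyi_entropy(p, 0), renyi_entropy(p, 1), renyi_entropy(p, np.inf))

# Additivity under tensor products: H_alpha(p (x) q) = H_alpha(p) + H_alpha(q).
for a in (0, 0.5, 1, 2, np.inf):
    assert abs(renyi_entropy(np.kron(p, q), a)
               - renyi_entropy(p, a) - renyi_entropy(q, a)) < 1e-12
print("additivity verified")
```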
Exercise 6.1.4.
1. Show that the Rényi entropy satisfies the additivity axiom of an entropy.
The following is a very interesting result proved in [165]. It essentially states that every entropy function is a convex combination of Rényi entropies. We refer the reader to [165] for the proof as it goes beyond the scope of this book.
that maps density matrices in all finite dimensions to the real numbers.
Quantum Entropy
Definition 6.1.2. Let H be as in (6.16) and suppose it is not equal to the constant
zero function. Then, H is called an entropy if it satisfies the following two axioms:
vector consisting of the eigenvalues of ρ. Therefore, any classical entropy H_classical can be extended to the quantum domain via (m := |A|)

H_quantum(ρ) := H_classical(λ₁, …, λ_m) ,

where {λ_x}_{x∈[m]} are the eigenvalues of ρ. It is left as a simple exercise to show that H_quantum is indeed a quantum entropy that satisfies the two axioms of the definition above.
As an example, consider the classical Rényi entropies as defined in (6.10). By replacing
the components {px }x∈[n] with the eigenvalues {λx }x∈[n] of ρ, we get the quantum version of
the Rényi entropies. For any α ∈ [0, ∞] they are given by
H_α(ρ) := (1/(1−α)) log ∑_{x∈[n]} λ_x^α = (1/(1−α)) log Tr[ρ^α] .    (6.19)
Similarly, from the classical case we get that the limits α = 0, 1, ∞ are given for all ρ ∈ D(A)
by:
Relative Entropy
Definition 6.2.1. The function D in (6.26) is called a relative entropy if it satisfies
the following three conditions:
In the definition above we did not include the normalization condition D(1∥1) = 0 (as satisfied by all normalized divergences) since it follows from the additivity property. Indeed, let p, q ∈ Prob(n) and observe that

D(p∥q) = D(p ⊗ 1 ∥ q ⊗ 1) = D(p∥q) + D(1∥1) ,

so that D(1∥1) = 0.
where e1 = (1, 0)T and e2 = (0, 1)T . Hint: Show first that D(e1 ∥e2 ) ⩾ 1 and then use the
additivity property together with the DPI to show that for any n ∈ N, D(e1 ∥e2 ) ⩾ n.
Remark. We use the convention that if q_x = p_x = 0 then p_x^α q_x^{1−α} = 0 even for α > 1. With this convention, the conditions that supp(p) ⊆ supp(q), or that α ∈ [0, 1) and p · q ≠ 0, are precisely the conditions under which the expression (1/(α−1)) log ∑_{x∈[n]} p_x^α q_x^{1−α} is well defined. Otherwise, if it is not well defined, the Rényi relative entropy is set to infinity.
For α = 0 the Rényi relative entropy is called the min-relative entropy. It is given by

D_min(p∥q) := lim_{α→0⁺} D_α(p∥q) = −log ∑_{x∈supp(p)} q_x .    (6.30)

Observe that if D_min(p∥q) ≠ 0 then p must have zero components. For α = ∞ the Rényi relative entropy is called the max-relative entropy. It is given by

D_max(p∥q) := lim_{α→∞} D_α(p∥q) = log max_{x∈[n]} (p_x / q_x) .    (6.31)

Finally, for α = 1 the Rényi relative entropy is called the Kullback–Leibler divergence, or in short the KL-divergence. It is given by

D(p∥q) := lim_{α→1} D_α(p∥q) = ∑_{x∈[n]} p_x (log p_x − log q_x) ,    (6.32)
3. Explain why, in the first inequality above, r must be greater than one, whereas in the second it must be smaller than one.
Exercise 6.2.5. Show that all the Rényi relative entropies satisfy the additivity and normalization properties of a relative entropy as given in Definition 6.2.1.
Exercise 6.2.6. A relative entropy D is said to be pathological if D(u^{(2)} ∥ e₁^{(2)}) = 0, where u^{(2)} := (½, ½)^T is the uniform distribution in Prob(2) and e₁^{(2)} := (1, 0)^T. Show that D_min is pathological, and use this to show that D_path, which is defined for any n ∈ ℕ and p, q ∈ Prob(n) as

D_path(p∥q) := D_min(p∥q) + D_min(q∥p) ,    (6.34)

is a relative entropy.
We now show that, in addition to the additivity and normalization, the Rényi relative entropies also satisfy the DPI.
Theorem 6.2.1. The Rényi relative entropy of any order α ∈ [0, ∞] is a relative
entropy; i.e. it satisfies the axioms of DPI, additivity, and normalization, as given in
Definition 6.2.1.
Proof. The additivity and normalization were proved in Exercise 6.2.5. To show the DPI, recall the α-divergences given in (5.28) by

D_{f_α}(p∥q) = (1/(α(α−1))) ( ∑_{x∈[n]} p_x^α q_x^{1−α} − 1 ) .    (6.35)

Since the above expression is derived from the convex function f_α(r) = (r^α − r)/(α(α−1)), it is an f-divergence and in particular satisfies the DPI. For α = 1 the above expression coincides with the Rényi relative entropy of that order (i.e. the KL-divergence), so in this case the DPI property follows. For α ≠ 1 we denote

Q_α(p∥q) := ∑_{x∈[n]} p_x^α q_x^{1−α} .    (6.36)

Observe that from the DPI of D_{f_α} we get that for α > 1 the function Q_α(p∥q) is monotonically non-increasing under maps (p, q) ↦ (Ep, Eq) with E ∈ STOCH(m, n), and for α < 1 it is monotonically non-decreasing under such maps. Since the Rényi relative entropy can be expressed as D_α(p∥q) = (1/(α−1)) log Q_α(p∥q), and the log is a monotonically increasing function, we conclude that D_α(p∥q) satisfies the DPI.
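The DPI proved above is straightforward to probe numerically. The sketch below (assuming NumPy; all names are ours) samples a pair of distributions and a column-stochastic matrix E, and checks that D_α(Ep∥Eq) ⩽ D_α(p∥q) for several orders α.

```python
import numpy as np

def renyi_rel_entropy(p, q, alpha):
    """Classical Renyi relative entropy D_alpha(p||q) in bits (assumes supp(p) within supp(q))."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    if alpha == 1:                                   # KL divergence, Eq. (6.32)
        return float((p[mask] * (np.log2(p[mask]) - np.log2(q[mask]))).sum())
    Q = (p[mask] ** alpha * q[mask] ** (1.0 - alpha)).sum()
    return float(np.log2(Q) / (alpha - 1.0))

rng = np.random.default_rng(7)
n, m = 4, 3
p, q = rng.dirichlet(np.ones(n)), rng.dirichlet(np.ones(n))
E = rng.dirichlet(np.ones(m), size=n).T              # column-stochastic m x n matrix

for a in (0.3, 0.5, 1, 1.5, 2):
    assert renyi_rel_entropy(E @ p, E @ q, a) <= renyi_rel_entropy(p, q, a) + 1e-10
print("DPI verified for the sampled stochastic map")
```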
Exercise 6.2.7. Show that for any α ∈ (0, 1) and p, q ∈ Prob(n)

D_α(p∥q) = (α/(1−α)) D_{1−α}(q∥p) .    (6.37)
Exercise 6.2.8. Show that if p, q ∈ Prob(n) and ρ, σ ∈ D(X) are two diagonal density
matrices with diagonals p and q, respectively, then
D_α(p∥q) = (1/(α−1)) log Tr[ρ^α σ^{1−α}] .    (6.38)
Theorem 6.2.2. Let D be a relative entropy, and let {e_x}_{x∈[n]} be the standard (elementary) basis of ℝ^n. Then, for any p ∈ Prob(n) and x ∈ [n] we have D(e_x∥p) = −log p_x.
The proof of the theorem above is based on the following lemma by Erdös.
Erdös Theorem
Lemma 6.2.1. Let g : N → R be a function from the set of natural numbers to the
real line. Suppose g is non-decreasing and is additive; i.e. g(mn) = g(n) + g(m) for all
n, m ∈ N. Then, there exists a constant c ∈ R such that g(n) = c log(n) for all n ∈ N.
Proof. Suppose by contradiction that g(n)/log n is not constant. Then there exist m, n ∈ ℕ such that

g(m)/log m > g(n)/log n .    (6.40)

Denote a := g(m)/log m and b := g(n)/log n, and observe that a > b, or equivalently b/a < 1. Multiplying both sides of the inequality b/a < 1 by the positive number (log n / log m) k, where k is any positive integer, gives

(b/a)(log n / log m) k < (log n / log m) k .    (6.41)

Therefore, for sufficiently large k ∈ ℕ there must exist an integer between the above two numbers; i.e. there exists ℓ ∈ ℕ such that

(b/a)(log n / log m) k < ℓ < (log n / log m) k .    (6.42)
The above two inequalities can be expressed as ℓ log m < k log n and k g(n) < ℓ g(m). The first implies that the integers n^k and m^ℓ satisfy n^k > m^ℓ, and from the additivity of g the second gives g(n^k) = k g(n) < ℓ g(m) = g(m^ℓ). These two conclusions contradict the assumption that g is non-decreasing. This completes the proof.
Proof of Theorem 6.2.2. Since divergences (and therefore relative entropies) are invariant
under permutations (see (5.11)), it is sufficient to show that D(e1 ∥p) = − log p1 . We first
show that for any vector r = (r1 , . . . , rn )T ∈ Prob(n) with r1 = 0 we have
(e₁, p) ∼ (e₁, p₁e₁ + (1 − p₁)r) ,    (6.45)

where the symbol ∼ corresponds to the equivalence relation under relative majorization. Define E := [e₁, r, …, r] ∈ STOCH(n, n) to be the column stochastic matrix whose first column is e₁ and whose remaining n − 1 columns equal r. We then have

(e₁, p) ≻ (Ee₁, Ep) = (e₁, p₁e₁ + (1 − p₁)r) .    (6.46)

Conversely, define p̃ := (1/(1 − p₁)) (0, p₂, …, p_n)^T ∈ Prob(n) and Ẽ := [e₁, p̃, …, p̃] ∈ STOCH(n, n). Then,

(e₁, p₁e₁ + (1 − p₁)r) ≻ (Ẽe₁, p₁Ẽe₁ + (1 − p₁)Ẽr) = (e₁, p) .    (6.47)

Combining (6.46) and (6.47) gives (6.45).
The relation in (6.45) implies that

D(e₁∥p) = D(e₁ ∥ p₁e₁ + (1 − p₁)r) ,    (6.48)

so that the function f(p₁) := D(e₁∥p) is independent of p₂, …, p_n. Moreover, the function f : [0, 1] → ℝ₊ ∪ {∞} has the following two properties:
1. f is monotonically non-increasing.
is non-decreasing and additive. Therefore, from Erdös theorem there exists a constant c ∈ R
such that g(m) = c log m for all m ∈ N. The condition g(2) = f (1/2) = D(e1 ∥u(2) ) = 1 gives
c = 1. Therefore, for any m ∈ N we have f (1/m) = log m. Furthermore, observe that for
any k ⩽ m the additivity of f gives
log k + f(k/m) = f(1/k) + f(k/m) = f(1/m) = log m .    (6.50)

Hence, f(k/m) = log m − log k = −log(k/m), so that f(r) = −log r for all rationals r in [0, 1].
To prove that this relation holds for any r ∈ [0, 1] (possibly irrational), let {sk } and {tk } be
two sequences of rational numbers in [0, 1] both with limit r and with sk ⩽ r ⩽ tk for all
k ∈ N. Then, the monotonicity property of f gives for any k ∈ N
− log sk = f (sk ) ⩾ f (r) ⩾ f (tk ) = − log tk . (6.51)
Taking the limit k → ∞ on both sides and using the continuity of the log function gives
f (r) = − log r. This completes the proof.
The theorem above has the following interesting corollary, which justifies the terminology of the max and min relative entropies.
Corollary 6.2.1. Let D be a relative entropy. Then for any n ∈ N and any
p, q ∈ Prob(n),
Dmin (p∥q) ⩽ D(p∥q) ⩽ Dmax (p∥q) . (6.52)
Triangle Inequality
Theorem 6.2.3. Let D be a relative entropy. Then, for any p, q, r ∈ Prob(n),

D(p∥q) ⩽ D(p∥r) + D_max(r∥q) .    (6.56)
Proof. The key idea of the proof is to denote by ε := 2−Dmax (r∥q) and observe that the right
hand side of (6.56) can be expressed as
D(p∥r) + D_max(r∥q) = D(p∥r) − log ε
Theorem 6.2.2 →   = D(p∥r) + D( e₁ ∥ (ε, 1 − ε)^T )    (6.57)
Additivity →   = D( p ⊗ e₁ ∥ r ⊗ (ε, 1 − ε)^T ) .
Note that D is both lower and upper semi-continuous at (p, q) if and only if it is continuous
at (p, q).
Exercise 6.2.12.
1. Show that the max relative entropy, D_max(p∥q), is not upper semi-continuous when q does not have full support. Hint: Consider the sequences {p_k}_{k∈ℕ} and {q_k}_{k∈ℕ} with p_k := (1/k, 1 − 1/k)^T and q_k := (1/k², 1 − 1/k²)^T.
2. Show that Dpath (p∥q) := Dmin (p∥q) + Dmin (q∥p) is not lower semi-continuous at the
boundary of Prob(n) × Prob(n).
From the exercise above it is clear that we cannot expect relative entropies to be con-
tinuous everywhere in Prob(n) × Prob(n). However, if we remove some of the points in the
boundary, we get the following continuity property.
Proof. Let (pk , qk )k∈N be a sequence in Prob(n) × Prob(n) that converges to (p, q). For
any k ∈ N, define a column stochastic matrix Ek ∈ STOCH(n, n) by its action on every
s ∈ Prob(n) as
E_k s := p_k + 2^{−D_max(p∥p_k)} (s − p) .    (6.67)

Since lim_{k→∞} p_k = p, for sufficiently large k we have 2^{−D_max(p∥p_k)} > 0 (see the exercise below). Moreover, from the definition of D_max we get that p_k − 2^{−D_max(p∥p_k)} p ⩾ 0, so that E_k is indeed a column stochastic matrix. Using these notations, we derive the following from the DPI:

D(p∥q) ⩾ D(E_k p ∥ E_k q)
(6.67) →   = D(p_k ∥ E_k q)    (6.68)
Theorem 6.2.3 →   ⩾ D(p_k∥q_k) − D_max(E_k q ∥ q_k) .
Moving the term involving D_max to the other side and taking the limit superior on both sides gives

lim sup_{k→∞} D(p_k∥q_k) ⩽ D(p∥q) + lim sup_{k→∞} D_max(E_k q ∥ q_k) .    (6.69)

The second term on the right-hand side above vanishes since the vector

q̃_k := E_k q = 2^{−D_max(p∥p_k)} q + p_k − 2^{−D_max(p∥p_k)} p    (6.70)

has the limit lim_{k→∞} q̃_k = q, so that

lim sup_{k→∞} D_max(q̃_k ∥ q_k) = 0 .    (6.71)
Note that we used indirectly the fact that q > 0, since for sufficiently large k we must
have qk > 0 so the limit above is indeed zero. This completes the proof that D is upper
semi-continuous on Prob(n) × Prob>0 (n).
We now prove the lower semi-continuity on Prob>0 (n) × Prob>0 (n). Note that since we
already proved upper semi continuity in this domain, this will imply that D is continuous on
Prob>0 (n) × Prob>0 (n). For any k ∈ N, we define Ek as before but with the role of pk and
p interchanged; i.e. Ek ∈ STOCH(n, n) is defined by its action on any s ∈ Prob(n) as
E_k s := p + 2^{−D_max(p_k∥p)} (s − p_k) .    (6.72)

Note that for all k, 2^{−D_max(p_k∥p)} > 0 since we assume p > 0. Moreover, from the definition of D_max we have p − 2^{−D_max(p_k∥p)} p_k ⩾ 0, so that E_k is indeed a column stochastic matrix. With the above notations we get from the DPI

D(p_k∥q_k) ⩾ D(E_k p_k ∥ E_k q_k)
(6.72) →   = D(p ∥ E_k q_k)    (6.73)
Theorem 6.2.3 →   ⩾ D(p∥q) − D_max(E_k q_k ∥ q) .

Taking the limit inferior on both sides gives

lim inf_{k→∞} D(p_k∥q_k) ⩾ D(p∥q) ,    (6.74)
Faithfulness
Theorem 6.2.5. Let D be a relative entropy. The following statements are
equivalent:
1. D is not faithful.
Proof. The direction 2 ⇒ 1 is trivial. We therefore prove that 1 ⇒ 2. Since D is not faithful
there exist p, q ∈ Prob(m) such that p ≠ q and D(p∥q) = 0. For any n ∈ ℕ it follows from the additivity property of D that also D(p^{⊗n}∥q^{⊗n}) = 0. Now, in Corollary 8.3.1 of the next chapter we will see that for any s, t ∈ Prob_{>0}(2) and large enough n we have (p^{⊗n}, q^{⊗n}) ≻ (s, t). Therefore,

0 = D(p^{⊗n}∥q^{⊗n}) ⩾ D(s∥t) .    (6.78)
We therefore conclude that D(s∥t) = 0 for all s, t ∈ Prob>0 (2). It is left to show that this
also holds in dimensions higher than two.
Indeed, let p, q ∈ Prob(m) with supp(p) = supp(q), and recall from (4.130) that there exist s, t ∈ Prob_{>0}(2) such that (s, t) ≻ (p, q). The DPI therefore gives D(p∥q) ⩽ D(s∥t). Since we already proved that D(s∥t) = 0 for all s, t ∈ Prob_{>0}(2), we conclude that D(p∥q) = 0 for all p and q with the same support. This completes the proof.
Remark. Note that the corollary above in particular implies that relative entropies that are
continuous in the first argument must be faithful.
Proof. Let {p_k}_{k∈ℕ} be a sequence in Prob(m) such that supp(q) = supp(p_k) and p_k → e₁ as k → ∞. Such a sequence exists since q₁ ∈ (0, 1). From the theorem above it follows that D(p_k∥q) = 0, so that
More generally, every relative entropy D can be used to define an entropy H via

H(p) := log n − D(p ∥ u^{(n)})    ∀ p ∈ Prob(n) ,    (6.82)

where u^{(n)} is the uniform distribution in Prob(n).
Exercise 6.2.14. Show that if D is a relative entropy then H as defined in (6.82) satisfies
the normalization and additivity axioms of an entropy.
To show that H as defined in (6.82) is indeed an entropy, we need to prove the monotonic-
ity property (in addition to the properties you proved in the exercise above). Recall that if
p, q ∈ Prob(n) and p ≻ q then there exists a doubly stochastic matrix D ∈ STOCH(n, n)
such that q = Dp. Therefore, in this case we get that

H(q) = log n − D(Dp ∥ Du^{(n)}) ⩾ log n − D(p ∥ u^{(n)}) = H(p) ,

where we used Du^{(n)} = u^{(n)} together with the DPI. That is, H satisfies the monotonicity property of an entropy if the two vectors have the same dimension. If p ∈ Prob(n) and q ∈ Prob(m) have different dimensions (i.e. n ≠ m) then
the relation p ≻ q is equivalent to a majorization relation between two vectors with the
same dimension max{n, m} in which one of the vectors is padded with zeros to make the
dimensions equal. Therefore, to show that H above satisfies the monotonicity property of
an entropy, it is left to show that it is invariant under embedding; i.e. H(p ⊕ 0) = H(p) for
all p ∈ Prob(n). For this purpose, note that
H(e₁^{(n)}) = log n − D(e₁^{(n)} ∥ u^{(n)}) = log n − log n = 0 ,    (6.84)

where we used Theorem 6.2.2. Therefore, from the additivity property that you proved in the exercise above it follows that for any p ∈ Prob(n)

H(p) = H(p ⊗ e₁^{(n+1)}) = H((p ⊕ 0) ⊗ e₁^{(n)}) = H(p ⊕ 0) .    (6.85)
Hence, H satisfies the monotonicity property of an entropy, and when combined with the
exercise above we conclude that Eq. (6.82) demonstrates that for any relative entropy there
is a corresponding entropy. Remarkably, the next theorem shows that the converse is also
true.
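The correspondence (6.82) is easy to illustrate. In the sketch below (assuming NumPy; base-2 logarithms; names ours) we take D to be the KL divergence, so that H(p) = log n − D(p∥u^{(n)}) recovers the Shannon entropy, and we also check the invariance H(p ⊕ 0) = H(p) derived above.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence in bits."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return float((p[m] * (np.log2(p[m]) - np.log2(q[m]))).sum())

def entropy_from_kl(p):
    """H(p) = log n - D(p || u^(n)) with D the KL divergence, cf. Eq. (6.82)."""
    n = len(p)
    return np.log2(n) - kl(p, np.full(n, 1.0 / n))

p = np.array([0.5, 0.25, 0.125, 0.125])
shannon = float(-(p * np.log2(p)).sum())
print(entropy_from_kl(p), shannon)                 # both equal 1.75 bits
print(entropy_from_kl(np.append(p, 0.0)))          # invariance under embedding: H(p ⊕ 0) = H(p)
```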
One-To-One Correspondence
Theorem 6.2.6. There exists a bijection f, with inverse f^{−1}, between relative entropies that are continuous in the second argument and entropies.
via

D_H(p∥q) := log k − H( ⊕_{x=1}^{n} p_x u^{(k_x)} ) ,    (6.88)

for all n ∈ ℕ, p ∈ Prob(n), and q ∈ Prob_{>0}(n) ∩ ℚ^n with q = (k₁/k, …, k_n/k)^T for k_x ∈ ℕ and k = k₁ + · · · + k_n. Note that this construction is equivalent to the one given in Theorem 5.1.3
with g(p) := log n − H(p), although we do not assume here that H is continuous, and
therefore also g is not assumed to be continuous. Still, the same arguments given in the
proof of Theorem 5.1.3 imply that DH is a divergence in the restricted domain in which the
second argument has positive rational components. Moreover, since H is additive also DH as
defined above is additive under tensor products; i.e., DH is a relative entropy with a restricted
domain (see Exercise 6.2.15 below). This restricted domain will not change the arguments
leading to (6.123) and we therefore conclude that for any fixed n ∈ N and p ∈ Prob(n),
DH (p∥q) is continuous in q ∈ Prob>0 (n) ∩ Qn . Therefore, the continuous extension of DH
to Prob(n) × Prob(n) is well defined. We therefore define f^{−1}(H) := D_H, where D_H is the continuous extension of the expression in (B.3.3) to the full domain Prob(n) × Prob(n). Note that the data-processing inequality and additivity are preserved under continuous extensions, and thus the resulting quantity D_H is indeed a relative entropy, concluding the proof.
Exercise 6.2.15. Show that D_H as defined in (B.3.3) is a relative entropy on the restricted domain

⋃_{n∈ℕ} Prob(n) × (Prob_{>0}(n) ∩ ℚ^n) .    (6.89)

Explicitly, show that:

1. Normalization: D_H(e₁^{(2)} ∥ u^{(2)}) = 1.
In Exercise 6.2.12 you showed that Dpath (p∥q) := Dmin (p∥q) + Dmin (q∥p), provides a
counterexample to lower semi-continuity. Note that f (Dpath ) = Hmax , and Hmax is in turn
mapped to Dmin (p∥q) by its inverse f−1 ; i.e. f−1 (Hmax ) = Dmin so that the contribution
Dmin (q∥p) that is discontinuous in q is lost in the process. This underscores why the conti-
nuity of relative entropies in the second argument is essential for the existence of the bijection
f. Finally, observe that the correspondence between relative entropies and entropies allows us to import certain results from relative entropies to entropies.
We end this section by recalling Theorem 6.1.1 proved by [165]. This theorem states
that any entropy function can be expressed as a convex combination of Rényi entropies.
Combining this with the one-to-one correspondence between entropies and relative entropies
we get the following uniqueness result.
Observe the crucial need for continuity in the second argument. This is highlighted by the fact that D_path(p∥q) := D_min(p∥q) + D_min(q∥p) lacks continuity in its second argument and is not a convex combination of Rényi divergences.
that is acting on pairs of quantum states in all finite dimensions |A| < ∞.
2. Additivity: D(ρ ⊗ ρ′ ∥ σ ⊗ σ′) = D(ρ∥σ) + D(ρ′∥σ′) .    (6.94)

3. Normalization: D(|0⟩⟨0| ∥ u^{(2)}) = 1, where |0⟩⟨0|, u^{(2)} ∈ D(ℂ²).
Proof. Fix x ∈ [n] and let E ∈ CPTP(AX → AX) be a quantum channel that acts as the identity channel if the input on the classical system X is |x⟩⟨x|^X, and otherwise acts as a replacement channel on system A with output σ_x^A. Explicitly, for all τ ∈ D(A) and w ∈ [n],

E^{AX→AX}(τ^A ⊗ |w⟩⟨w|^X) := τ^A ⊗ |x⟩⟨x|^X if w = x, and σ_x^A ⊗ |w⟩⟨w|^X otherwise.    (6.97)

Then, denoting p^X := ∑_{w∈[n]} p_w |w⟩⟨w|^X, we get from the DPI of D

D(ρ^A ⊗ |x⟩⟨x|^X ∥ σ^{AX}) ⩾ D( E^{AX→AX}(ρ^A ⊗ |x⟩⟨x|^X) ∥ E^{AX→AX}(σ^{AX}) )
(6.97) →   = D(ρ^A ⊗ |x⟩⟨x|^X ∥ σ_x^A ⊗ p^X)    (6.98)
Additivity →   = D(ρ^A ∥ σ_x^A) + D(|x⟩⟨x|^X ∥ p^X) .

The combination of the above equation with (6.98) concludes the proof.
Exercise 6.3.2. Let D be a quantum relative entropy, ρ, σ ∈ D(A), ω ∈ D(B), and t ∈ [0, 1]. In addition, let Z be a |A| × |B| complex matrix such that the block matrix

tσ          Z
Z^∗   (1 − t)ω

is a density matrix in D(A ⊕ B). Show that

D( diag(ρ, 0) ∥ the block matrix above ) ⩾ D(ρ∥σ) − log t ,    (6.101)

with equality if Z = 0, where diag(ρ, 0) denotes the block matrix with ρ in the upper-left block and zeros elsewhere.
Exercise 6.3.3. Let D be a quantum relative entropy and u ∈ D(A) be the maximally mixed
state.
1. Show that
H(ρA ) := log |A| − D(ρA ∥uA ) ∀ ρ ∈ D(A) , (6.102)
is a quantum entropy.
2. Show that if D is jointly convex then H, as defined above, is concave.
Before we discuss additional properties of quantum relative entropies, we first consider
an example of a family of relative entropies that generalizes the Rényi relative entropies.
Remark. If supp(ρ) ⊆ supp(σ) the trace in the definition above is strictly positive for all
α ∈ [0, ∞]. Also, if α < 1 and ρσ ̸= 0 (i.e. ρ and σ are not orthogonal) then also ρα σ 1−α ̸= 0
and we have in this case Tr [ρα σ 1−α ] > 0. In all other cases, the trace in the definition above
is either zero or not well defined. One can also extend the definition to α > 2 however we
will see below that the DPI only holds for α ∈ [0, 2].
Exercise 6.3.4. Prove all the statements in the remark above (except for the very last one
about α > 2).
Exercise 6.3.5. Let ρ, σ ∈ D(A) and consider their spectral decomposition as given in (5.67).
Set m := |A|, and let pXY , qXY ∈ Prob(m2 ) be the probability vectors whose components are
{p_x |⟨a_x|b_y⟩|²}_{x,y∈[m]} and {q_y |⟨a_x|b_y⟩|²}_{x,y∈[m]}, respectively (cf. (5.68)). Show that

D_α(ρ∥σ) = D_α(p^{XY} ∥ q^{XY}) ,    (6.103)

where the right-hand side is the classical Rényi divergence between p^{XY} and q^{XY}.
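The reduction above can be checked directly for commuting states. The following sketch (assuming NumPy; `mat_pow` and `petz_renyi` are our own helper names) evaluates the Petz quantum Rényi divergence D_α(ρ∥σ) = (1/(α−1)) log Tr[ρ^α σ^{1−α}] for diagonal states and compares it with the classical formula.

```python
import numpy as np

def mat_pow(m, t):
    """Real power of a positive definite Hermitian matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * vals ** t) @ vecs.conj().T

def petz_renyi(rho, sigma, alpha):
    """Petz quantum Renyi divergence in bits (assumes sigma > 0 and alpha != 1)."""
    q = np.trace(mat_pow(rho, alpha) @ mat_pow(sigma, 1.0 - alpha)).real
    return float(np.log2(q) / (alpha - 1.0))

# For commuting (here: diagonal) states the quantum formula reduces to the classical one.
p = np.array([0.6, 0.3, 0.1]); q = np.array([0.2, 0.5, 0.3])
alpha = 1.5
classical = np.log2((p ** alpha * q ** (1 - alpha)).sum()) / (alpha - 1)
print(np.isclose(petz_renyi(np.diag(p), np.diag(q), alpha), classical))
```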
The Petz-Rényi divergence satisfies all the properties of a relative entropy. The normal-
ization and additivity properties you will prove in the exercise below, and we now prove the
data processing inequality.
Theorem 6.3.2. The Petz quantum α-Rényi divergence is a relative entropy for any
α ∈ [0, 2].
for α ∈ (1, 2] the expression Tr[ρ^α σ^{1−α}] is monotonically decreasing under such mappings. Therefore, for any α ∈ [0, 2] with α ≠ 1 the Petz quantum Rényi α-divergence satisfies the DPI. The DPI for the case α = 1 was proven in (5.87).
Exercise 6.3.6. Show that for any α ⩾ 0, the Petz quantum Rényi divergence D_α satisfies the normalization and additivity properties of a relative entropy.
The calculation of the limits α → 0, 1 of the Petz quantum Rényi divergence is a bit
more subtle than the classical case. For this purpose, we will use the expression (6.103) in
Exercise 6.3.5. For the limit α → 0 observe that
where Πρ is the projection to the support of ρ. The quantity above is also known as the min
quantum relative entropy and is denoted by
Note that the quantum min relative entropy reduces to the classical min relative entropy
when the states are classical (i.e. diagonal).
For the limit α → 1 we use again (6.103) to get
Exercise 6.3.7. Prove the last two lines in the equation above; in particular, show that ∑_{x,y} p_x |⟨u_x|v_y⟩|² log(p_x/q_y) = Tr[ρ log ρ] − Tr[ρ log σ].
Exercise 6.3.8 (Quasi-Convexity). Show that for any α ∈ [0, 2], ρ, ω₀, ω₁ ∈ D(A), and t ∈ [0, 1] we have

D_α(ρ ∥ tω₀ + (1 − t)ω₁) ⩽ max{ D_α(ρ∥ω₀), D_α(ρ∥ω₁) } .    (6.111)
Similar to the definition of the min quantum relative entropy, we can extend the max
relative entropy to the quantum domain.
2. D_max(ρ∥σ) reduces to the classical max relative entropy when ρ and σ commute.

3. For the case that supp(ρ) ⊆ supp(σ), D_max(ρ∥σ) = log ∥σ^{−1/2} ρ σ^{−1/2}∥_∞. Hint: conjugate both sides of tσ ⩾ ρ by σ^{−1/2}(·)σ^{−1/2}.

4. D_max(ρ∥σ) = lim_{α→∞} D_α(ρ∥σ) if ρ and σ commute, and give an example for which D_max(ρ∥σ) ≠ lim_{α→∞} D_α(ρ∥σ). Here D_α refers to the same formula as the Petz quantum Rényi divergence but with α > 2.
Theorem 6.3.3. Let D be a relative entropy. Then for any quantum system A and any ρ, σ, ω ∈ D(A):

1. Bounds:
D_min(ρ∥σ) ⩽ D(ρ∥σ) ⩽ D_max(ρ∥σ) .    (6.113)

2. Triangle Inequality:
D(ρ∥σ) ⩽ D(ρ∥ω) + D_max(ω∥σ) .    (6.114)
Proof. Let Π_ρ denote the projector onto the support of ρ. Define the POVM channel E ∈ CPTP(A → X) with |X| = 2 as

E(σ) := Tr[σΠ_ρ] |0⟩⟨0|^X + Tr[σ(I − Π_ρ)] |1⟩⟨1|^X .    (6.115)

Then,

D(ρ∥σ) ⩾ D(E(ρ) ∥ E(σ))
(6.115) →   = D( |0⟩⟨0| ∥ Tr[σΠ_ρ] |0⟩⟨0| + Tr[σ(I − Π_ρ)] |1⟩⟨1| )    (6.116)
Theorem 6.2.2 →   = −log Tr[σΠ_ρ]
  = D_min(ρ∥σ) .
For the second inequality, denote t := 2^{D_max(ρ∥σ)}, and note that in particular tσ ⩾ ρ (i.e. tσ − ρ ⩾ 0). Define a channel E ∈ CPTP(X → A) with |X| = 2 by

E(|0⟩⟨0|) := ρ    and    E(|1⟩⟨1|) := (tσ − ρ)/(t − 1) .    (6.117)

Furthermore, denote

q^X := (1/t) |0⟩⟨0|^X + ((t − 1)/t) |1⟩⟨1|^X ,    (6.118)

and observe that E(q^X) = σ. Hence,

D(ρ∥σ) = D( E(|0⟩⟨0|^X) ∥ E(q^X) )
DPI →   ⩽ D( |0⟩⟨0|^X ∥ q^X )    (6.119)
Theorem 6.2.2 →   = −log(1/t) = D_max(ρ∥σ) .
This completes the proof of (6.113).
To prove the triangle inequality (6.114), note first that for |A| = 1 the statement is trivial
so we can assume |A| ⩾ 2. Let ε := 2−Dmax (ω∥σ) ∈ (0, 1), and observe that σ ⩾ εω so that
the matrix τ := (σ − εω)/(1 − ε) is a density matrix satisfying
σ = εω + (1 − ε)τ . (6.120)
From the definition of ε we have

D(ρ∥ω) + D_max(ω∥σ) = D(ρ∥ω) − log ε
Theorem 6.2.2 →   = D(ρ∥ω) + D( |0⟩⟨0| ∥ ε|0⟩⟨0| + (1 − ε)|1⟩⟨1| )    (6.121)
Additivity →   = D( ρ ⊗ |0⟩⟨0| ∥ ω ⊗ (ε|0⟩⟨0| + (1 − ε)|1⟩⟨1|) )
DPI →   ⩾ D(ρ∥σ) ,
where in the last inequality we used the DPI property of D with a quantum channel that
acts as an identity upon measuring |0⟩⟨0| in the second register, and produces a constant
output τ upon measuring |1⟩⟨1| in the second register.
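The bounds (6.113) can be checked numerically for a concrete choice of D. The sketch below (assuming NumPy; helper names ours) takes D to be the familiar Umegaki relative entropy D(ρ∥σ) = Tr[ρ(log ρ − log σ)] (in bits), a pure state ρ = ψ (so that D_min(ψ∥σ) = −log Tr[σψ]), and D_max computed via item 3 of the exercise above.

```python
import numpy as np

def eig_fun(m, f):
    """Apply a scalar function to a Hermitian matrix through its eigendecomposition."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * f(vals)) @ vecs.conj().T

rng = np.random.default_rng(3)
d = 3
v = rng.normal(size=d) + 1j * rng.normal(size=d)
psi = np.outer(v, v.conj()) / np.vdot(v, v).real              # pure state |psi><psi|
g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
sigma = g @ g.conj().T; sigma /= np.trace(sigma).real          # full-rank state sigma > 0

# D_min(psi||sigma) = -log Tr[sigma Pi_psi]; for a pure state Pi_psi = psi itself.
d_min = -np.log2(np.trace(sigma @ psi).real)
# Umegaki: Tr[psi log psi] = 0 for a pure state, so D(psi||sigma) = -Tr[psi log sigma].
d_umegaki = -np.trace(psi @ eig_fun(sigma, np.log2)).real
# D_max(psi||sigma) = log || sigma^{-1/2} psi sigma^{-1/2} ||_inf.
s_inv_half = eig_fun(sigma, lambda x: x ** -0.5)
d_max = np.log2(np.linalg.eigvalsh(s_inv_half @ psi @ s_inv_half).max())

print(d_min <= d_umegaki <= d_max)                             # the bounds (6.113)
```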
Exercise 6.3.10. The quantum Thompson metric is defined for any ρ, σ ∈ D(A) by

D_T(ρ∥σ) := max{ D_max(ρ∥σ), D_max(σ∥ρ) } .    (6.122)
1. Prove that the quantum Thompson’s metric is both a quantum divergence and a metric
in D(A) × D(A).
2. Prove that any quantum relative entropy D satisfies, for all ρ, σ, σ′ ∈ D(A),

|D(ρ∥σ) − D(ρ∥σ′)| ⩽ D_T(σ∥σ′) .    (6.123)
The exercise above demonstrates that quantum relative entropies are continuous in their
second argument. One can also get a continuity property in the first argument.
where the second inequality holds if σ > 0 and λmin (ρ′ ) > ∥ρ − ρ′ ∥∞ .
Proof. In somewhat of a variation of the previous theorem, fix 0 ⩽ s ⩽ 2^{−D_max(ρ′∥ρ)} and denote ε := 2^{−D_max(ρ+s(σ−ρ′)∥σ)}. Then,

D(ρ′∥σ) + D_max(ρ + s(σ − ρ′) ∥ σ) = D(ρ′∥σ) − log ε
Theorem 6.2.2 →   = D(ρ′∥σ) + D( |0⟩⟨0| ∥ ε|0⟩⟨0| + (1 − ε)|1⟩⟨1| )    (6.126)
Additivity →   = D( ρ′ ⊗ |0⟩⟨0| ∥ σ ⊗ (ε|0⟩⟨0| + (1 − ε)|1⟩⟨1|) )
DPI →   ⩾ D( N(ρ′) ∥ εN(σ) + (1 − ε)ω ) ,
where in the last inequality we used the DPI with a channel that acts as some channel
N ∈ CPTP(A → A) when measuring |0⟩⟨0| in the second register and outputs some state
ω ∈ D(A) when measuring |1⟩⟨1|. In other words, the inequality above holds for all N ∈
CPTP(A → A) and all ω ∈ D(A). It is therefore left to show that there exists such N and
ω that satisfy N (ρ′ ) = ρ and εN (σ) + (1 − ε)ω = σ. The latter implies that we can define
ω to be
ω := (σ − εN(σ)) / (1 − ε) .    (6.127)
Note that we need to choose N such that σ − εN (σ) ⩾ 0 so that ω ∈ D(A). We take
N ∈ CPTP(A → A) to be a measurement-prepare channel of the form
where we want to choose τ such that both N (ρ′ ) = ρ and σ − εN (σ) ⩾ 0. The condition
N (ρ′ ) = ρ can be expressed as ρ = sρ′ + (1 − s)τ . Isolating τ we get that
τ = (ρ − sρ′) / (1 − s) .    (6.129)
The above matrix is positive semidefinite if and only if ρ ⩾ sρ′, which holds since s ⩽ 2^{−D_max(ρ′∥ρ)}. We therefore choose τ as above so that N(ρ′) = ρ. It is left to check that σ − εN(σ) ⩾ 0. Indeed, since N(σ) = sσ + (1 − s)τ we have

σ − εN(σ) = (1 − εs)σ − ε(1 − s)τ
(6.129) →   = (1 − εs)σ − ε(ρ − sρ′)    (6.130)
  = σ − ε(ρ + s(σ − ρ′))
By definition of ε →   ⩾ 0 .

To summarize, we showed that for any 0 ⩽ s ⩽ 2^{−D_max(ρ′∥ρ)} we have
Since we now assume that µ := λ_min(σ) > 0, we can take r = 1 + (1 − s)/µ. Note that for this choice of r we have

(r − s)σ = (1 − s)(1 + µ) σ/µ ⩾ (1 − s)(1 + µ) I^A ⩾ ρ − sρ′ ,    (6.133)
since ρ − sρ′ is a subnormalized state with trace 1 − s. Moreover, if λ_min(ρ′) ⩾ ∥ρ − ρ′∥_∞ then we can take s = 1 − ∥ρ − ρ′∥_∞ / λ_min(ρ′), since in this case s ⩽ 2^{−D_max(ρ′∥ρ)} (or equivalently ρ ⩾ sρ′, see Exercise 6.3.11). We therefore get for these choices of r and s

D(ρ∥σ) − D(ρ′∥σ) ⩽ log r = log( 1 + ∥ρ − ρ′∥_∞ / (λ_min(ρ′) λ_min(σ)) ) .    (6.134)

This completes the proof.
Exercise 6.3.11. Show that if λ_min(ρ′) ⩾ ∥ρ − ρ′∥_∞ > 0 then ρ ⩾ sρ′, where s = 1 − ∥ρ − ρ′∥_∞ / λ_min(ρ′).
Exercise 6.3.12. Show that if ρ, σ ∈ D(A) and λ_min(ρ) > ∥σ − ρ∥_∞ then

D_max(ρ∥σ) ⩽ −log( 1 − ∥σ − ρ∥_∞ / λ_min(ρ) ) .    (6.135)

Use this to get a bound on D_T(σ∥σ′) in (6.123).
Proof. Let (ρk , σk )k∈N be a sequence in D(A)×D(A) that converges to (ρ, σ). For any k ∈ N,
define a quantum channel Ek ∈ CPTP(A → A) by its action on any ω ∈ D(A) as
E_k(ω) := ρ_k + 2^{−D_max(ρ∥ρ_k)} (ω − ρ) .    (6.136)

Note that for sufficiently large k, 2^{−D_max(ρ∥ρ_k)} > 0 (see the exercise below). Moreover, observe that ρ_k − 2^{−D_max(ρ∥ρ_k)} ρ ⩾ 0, so that E_k is indeed a quantum channel. With the above notations we get from the DPI

D(ρ∥σ) ⩾ D(E_k(ρ) ∥ E_k(σ))
(6.136) →   = D(ρ_k ∥ E_k(σ))    (6.137)
(6.114) →   ⩾ D(ρ_k∥σ_k) − D_max(E_k(σ) ∥ σ_k) .
Moving the term involving D_max to the other side and taking the limit superior on both sides gives

lim sup_{k→∞} D(ρ_k∥σ_k) ⩽ D(ρ∥σ) + lim sup_{k→∞} D_max(E_k(σ) ∥ σ_k) .    (6.138)

The second term on the right-hand side above vanishes since the density matrix

σ̃_k := E_k(σ) = 2^{−D_max(ρ∥ρ_k)} σ + ρ_k − 2^{−D_max(ρ∥ρ_k)} ρ    (6.139)

has the limit lim_{k→∞} σ̃_k = σ, so that

lim sup_{k→∞} D_max(σ̃_k ∥ σ_k) = 0 .    (6.140)
Note that we used indirectly the fact that σ > 0, since for sufficiently large k we must
have σk > 0 so the limit above is indeed zero. This completes the proof that D is upper
semi-continuous on D(A) × D>0 (A).
We now prove the lower semi-continuity on D>0 (A)×D>0 (A). Note that since we already
proved upper semi continuity in this domain, this will imply that D is continuous on D>0 (A)×
D>0 (A). For any k ∈ N, we define Ek as before but with the role of ρk and ρ interchanged;
i.e. E_k ∈ CPTP(A → A) is defined by its action on any ω ∈ D(A) as

E_k(ω) := ρ + 2^{−D_max(ρ_k∥ρ)} (ω − ρ_k) .    (6.141)

Since we assume that ρ > 0 we get that 2^{−D_max(ρ_k∥ρ)} > 0 for all k. Moreover, observe that ρ − 2^{−D_max(ρ_k∥ρ)} ρ_k ⩾ 0, so that E_k is indeed a quantum channel. With the above notations we get from the DPI

D(ρ_k∥σ_k) ⩾ D(E_k(ρ_k) ∥ E_k(σ_k))
(6.141) →   = D(ρ ∥ E_k(σ_k))    (6.142)
(6.114) →   ⩾ D(ρ∥σ) − D_max(E_k(σ_k) ∥ σ) .
Exercise 6.3.13.
1. Show that if {ρ_k}_{k∈ℕ} is a sequence in D(A) that converges to ρ ∈ D(A), then for sufficiently large k we have D_max(ρ∥ρ_k) < ∞. Hint: Show that for sufficiently large k, supp(ρ) ⊆ supp(ρ_k).
where the optimizations are over the classical system X, the channels E ∈ CPTP(A → X)
and F ∈ CPTP(X → A) as well as the diagonal density matrices p, q ∈ D(X). The
functions D and D are in general not additive even if the D is a classical relative entropy
(and therefore additive). However, in the following lemma we show that in this case D is
super-additive while D is sub-additive.
Lemma 6.4.1. Let D be a classical relative entropy, and let D and D be its maximal
and minimal quantum extensions as defined in (5.91). Then, for all ρ1 , σ1 ∈ D(A1 )
and ρ2 , σ2 ∈ D(A2 ) we have:
1. Super-Additivity: D(ρ₁ ⊗ ρ₂ ∥ σ₁ ⊗ σ₂) ⩾ D(ρ₁∥σ₁) + D(ρ₂∥σ₂).

2. Sub-Additivity: D(ρ₁ ⊗ ρ₂ ∥ σ₁ ⊗ σ₂) ⩽ D(ρ₁∥σ₁) + D(ρ₂∥σ₂).
Proof. We will prove the super-additivity property and leave it as an exercise to prove the sub-additivity along similar lines. By definition we have

D(ρ₁ ⊗ ρ₂ ∥ σ₁ ⊗ σ₂) = sup_{E∈CPTP(A₁A₂→X)} D( E(ρ₁ ⊗ ρ₂) ∥ E(σ₁ ⊗ σ₂) )
Restricting E = E₁ ⊗ E₂ →   ⩾ sup_{E₁∈CPTP(A₁→X₁), E₂∈CPTP(A₂→X₂)} D( E₁(ρ₁) ⊗ E₂(ρ₂) ∥ E₁(σ₁) ⊗ E₂(σ₂) )    (6.148)
Additivity of D →   = sup_{E₁} D( E₁(ρ₁) ∥ E₁(σ₁) ) + sup_{E₂} D( E₂(ρ₂) ∥ E₂(σ₂) )
  = D(ρ₁∥σ₁) + D(ρ₂∥σ₂) .
In Exercise 6.4.2 below you will show that the limits above exist and that in general D^reg(ρ∥σ) ⩾ D(ρ∥σ) and D^reg(ρ∥σ) ⩽ D(ρ∥σ). Moreover, note that by definition, D^reg and D^reg are at least partially additive in the sense that for any n ∈ ℕ and any ρ, σ ∈ D(A)

D^reg(ρ^{⊗n} ∥ σ^{⊗n}) = n D^reg(ρ∥σ)    and    D^reg(ρ^{⊗n} ∥ σ^{⊗n}) = n D^reg(ρ∥σ) .    (6.150)

It is an open problem to determine whether D^reg and D^reg are fully additive. We will see below that in many examples D^reg and D^reg turn out to be fully additive, so that they are in fact relative entropies. The following theorem shows that these functions remain optimal.
Theorem 6.4.1. Let D be a classical relative entropy, and let D^reg and D^reg be as above. Then, both D^reg and D^reg are partially additive quantum divergences that reduce to D on classical states. In addition, any other quantum relative entropy D′ that reduces to D on classical states satisfies, for all ρ, σ ∈ D(A),

D^reg(ρ∥σ) ⩽ D′(ρ∥σ) ⩽ D^reg(ρ∥σ) .    (6.151)
Remark. Observe that since in general D^reg(ρ∥σ) ⩾ D(ρ∥σ) and D^reg(ρ∥σ) ⩽ D(ρ∥σ), the bounds on D′ above are tighter than the bounds given in (5.90). We are able to get tighter bounds since D′ is additive.
Proof. We already saw that D^reg and D^reg are partially additive quantum divergences that reduce to D on classical states. It is therefore left to prove the inequality (6.151). From (5.90) we have for all n ∈ ℕ

D(ρ^{⊗n} ∥ σ^{⊗n}) ⩽ D′(ρ^{⊗n} ∥ σ^{⊗n}) ⩽ D(ρ^{⊗n} ∥ σ^{⊗n}) .    (6.152)

Since D′ is additive under tensor products, dividing the above by n gives

(1/n) D(ρ^{⊗n} ∥ σ^{⊗n}) ⩽ D′(ρ∥σ) ⩽ (1/n) D(ρ^{⊗n} ∥ σ^{⊗n}) .    (6.153)

The proof is concluded by taking the limit n → ∞.
Exercise 6.4.2. Let ρ, σ ∈ D(A) and let D be a classical relative entropy with maximal and minimal quantum extensions D and D. Denote

a_n := D(ρ^{⊗n} ∥ σ^{⊗n})    and    b_n := D(ρ^{⊗n} ∥ σ^{⊗n}) .    (6.154)

1. Show that the sequences {a_n} and {b_n} satisfy, for all n, m ∈ ℕ,

a_{n+m} ⩽ a_n + a_m    and    b_{n+m} ⩾ b_n + b_m .    (6.155)
2. Then, use this selected En to compute the limit as n → ∞, which will lead to the
desired closed formula.
This approach enables the development of a precise and concise formula representing the
minimal quantum extension for the Rényi divergence.
A natural guess for the optimal POVM channels E_n is given by the pinching channels discussed in Sec. 3.5.12. Recall that for any ρ, σ ∈ D(A) and a pinching channel P_σ ∈ CPTP(A → A), the states P_σ(ρ) and σ commute. Therefore, P_σ(ρ) and σ have a common eigenbasis {|x⟩}_{x∈[m]} (with m := |X| = |A|) that spans A. Let ∆ ∈ CPTP(A → X) be the completely dephasing channel in this basis. Then, the channel ∆ ∈ CPTP(A → X) is a POVM channel that we can take to be E₁. From Exercise 3.5.21 it follows that ∆(σ) = σ and ∆(ρ) = P_σ(ρ) (see (3.231)).
In general, for any n ∈ N, we can choose En = ∆n , where ∆n ∈ CPTP(An → X n ) is the
completely dephasing channel in the common eigenbasis of Pσ⊗n (ρ⊗n ) and σ ⊗n . We will see
shortly that this choice is indeed optimal in the limit n → ∞.
Before we continue with the derivation of the closed formula, we first give a snapshot of
what one can expect the formula to be. With {|x⟩}x∈[m] being the common eigenbasis of
Pσ (ρ) and σ we get (cf. (3.231)) that
D_α( P_σ(ρ) ∥ σ ) = D_α( ∆(ρ) ∥ σ ) = (1/(α−1)) log ∑_{x∈[m]} ⟨x|ρ|x⟩^α ⟨x|σ|x⟩^{1−α} ,    (6.158)

where we used the fact that each |x⟩ is a common eigenvector of both σ and P_σ(ρ). In particular, for any λ ∈ ℝ we have ⟨x|σ^λ|x⟩ = ⟨x|σ|x⟩^λ. Therefore, the term inside the sum above can be expressed as

⟨x|ρ|x⟩^α ⟨x|σ|x⟩^{1−α} = ( ⟨x|σ^{(1−α)/(2α)}|x⟩ ⟨x|ρ|x⟩ ⟨x|σ^{(1−α)/(2α)}|x⟩ )^α
|x⟩⟨x|σ^{(1−α)/(2α)}|x⟩ = σ^{(1−α)/(2α)}|x⟩ →   = ( ⟨x| σ^{(1−α)/(2α)} ρ σ^{(1−α)/(2α)} |x⟩ )^α .    (6.159)

Since the function x ↦ x^α is concave for α ∈ (0, 1) and convex for α ⩾ 1, it follows from Jensen's inequality (B.31) that

D_α( P_σ(ρ) ∥ σ ) ⩽ (1/(α−1)) log ∑_{x∈[m]} ⟨x| ( σ^{(1−α)/(2α)} ρ σ^{(1−α)/(2α)} )^α |x⟩
  = (1/(α−1)) log Tr[ ( σ^{(1−α)/(2α)} ρ σ^{(1−α)/(2α)} )^α ] .    (6.161)
The expression on the right-hand side is known as the sandwiched Rényi relative entropy.
Remarkably, we will see below that the regularization of the left-hand side equals the right-
hand side in the equation above. For this purpose, it will be convenient to denote the trace
in the equation above as
Q̃_α(ρ∥σ) := Tr[ ( σ^{(1−α)/(2α)} ρ σ^{(1−α)/(2α)} )^α ] .    (6.162)
Exercise 6.4.3. Show that for any isometry channel V ∈ CPTP(A → B), any ρ, σ ∈ D(A),
and any ω ∈ D(C),
Q̃_α( V(ρ) ∥ V(σ) ) = Q̃_α(ρ∥σ)    and    Q̃_α(ρ ⊗ ω ∥ σ ⊗ ω) = Q̃_α(ρ∥σ) .    (6.163)
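The pinching bound (6.161) can be probed numerically. The sketch below (assuming NumPy; names ours; random seed arbitrary) dephases ρ in an eigenbasis of σ, evaluates the resulting commuting Rényi divergence, and checks that it does not exceed the sandwiched quantity D̃_α(ρ∥σ) = (1/(α−1)) log Q̃_α(ρ∥σ).

```python
import numpy as np

def mat_pow(m, t, tol=1e-12):
    """Power of a positive semidefinite matrix (zero eigenvalues are left at zero)."""
    vals, vecs = np.linalg.eigh(m)
    out = np.zeros_like(vals)
    mask = vals > tol
    out[mask] = vals[mask] ** t
    return (vecs * out) @ vecs.conj().T

def sandwiched_renyi(rho, sigma, alpha):
    """Sandwiched Renyi relative entropy, Eqs. (6.161)-(6.162), in bits (sigma > 0 assumed)."""
    s = mat_pow(sigma, (1.0 - alpha) / (2.0 * alpha))
    return float(np.log2(np.trace(mat_pow(s @ rho @ s, alpha)).real) / (alpha - 1.0))

def petz_renyi(rho, sigma, alpha):
    return float(np.log2(np.trace(mat_pow(rho, alpha) @ mat_pow(sigma, 1.0 - alpha)).real)
                 / (alpha - 1.0))

def pinch(rho, sigma):
    """Dephase rho in an eigenbasis of sigma (equals the pinching channel P_sigma
    whenever sigma has a non-degenerate spectrum)."""
    _, vecs = np.linalg.eigh(sigma)
    diag = np.diag(vecs.conj().T @ rho @ vecs).real
    return vecs @ np.diag(diag) @ vecs.conj().T

rng = np.random.default_rng(11)
def rand_dm(d):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    r = g @ g.conj().T
    return r / np.trace(r).real

rho, sigma, alpha = rand_dm(3), rand_dm(3), 2.0
# Eq. (6.161): the pinched (commuting) divergence never exceeds the sandwiched one.
print(petz_renyi(pinch(rho, sigma), sigma, alpha)
      <= sandwiched_renyi(rho, sigma, alpha) + 1e-10)
```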
We first show that D̃_α is indeed a relative entropy. Its additivity and normalization properties are relatively easy to show and are left as an exercise.
Exercise 6.4.4. Show that the sandwiched Rényi relative entropy of order α ∈ [0, ∞] satisfies
the additivity and normalization properties of a quantum relative entropy.
Hint: Recall that for any complex matrix M , the matrices M M ∗ and M ∗ M have the same
non-zero eigenvalues.
Theorem 6.4.2. The sandwiched Rényi relative entropy of any order α ∈ [0, ∞] is a
quantum relative entropy; i.e., it satisfies the three relative entropy axioms of DPI,
additivity, and normalization.
Proof. Since D̃α (ρ∥σ) fulfills both additivity and normalization properties (as shown in Ex-
ercise 6.4.4), our task is to demonstrate its compliance with the DPI. For α > 1, the DPI of
D̃α is derived from that of Q̃α . For α ∈ [ 12 , 1), it follows from the DPI of −Q̃α . Based on
Exercise 6.4.3 and Lemma 5.2.2, we know that if Q̃α is jointly convex for α > 1, then it sat-
isfies the DPI. Similarly, for α ∈ [ 12 , 1), if Q̃α is jointly concave, then −Q̃α satisfies the DPI.
Our objective is therefore to show that for α > 1, Q̃α is jointly convex, and for α ∈ [ 21 , 1),
Q̃α is jointly concave. The case α ∈ (0, 12 ] is effectively covered by the case α ∈ [ 12 , 1) when
we swap ρ with σ, thus it need not be considered separately.
Firstly, consider α > 1 and define β := (α−1)/(2α). The proof's central strategy is to decompose the trace Tr[(σ^{−β} ρ σ^{−β})^α] into two terms, one dependent only on ρ and the other solely on σ. This decomposition allows us to separately assess the convexity in ρ and σ. To obtain this decomposition, we utilize Young's inequality (2.75), choosing M = σ^{−β} ρ σ^{−β}, N = σ^β η σ^β, p = α, and q = α/(α−1) = 1/(2β), where η is an arbitrary positive semidefinite matrix in Pos(A). With these choices, Tr[MN] = Tr[ρη], leading to the inequality (cf. (2.75))

Tr[ρη] ⩽ (1/α) Tr[(σ^{−β} ρ σ^{−β})^α] + ((α−1)/α) Tr[(σ^β η σ^β)^{1/(2β)}] .    (6.165)

Rearranging terms and recalling that Q̃_α(ρ∥σ) = Tr[(σ^{−β} ρ σ^{−β})^α], we obtain

Q̃_α(ρ∥σ) ⩾ α Tr[ρη] − (α − 1) Tr[(σ^β η σ^β)^{1/(2β)}] .    (6.166)

This inequality holds for all η ∈ Pos(A), with equality if M^p = N^q, which translates to (Exercise 6.4.6)

η = σ^{−β} (σ^{−β} ρ σ^{−β})^{α−1} σ^{−β} .    (6.167)

Therefore, Q̃_α(ρ∥σ) can be expressed as

Q̃_α(ρ∥σ) = sup_{η⩾0} { α Tr[ρη] − (α − 1) Tr[(σ^β η σ^β)^{1/(2β)}] } .    (6.168)
With this expression, we can now analyze the convexity of each term independently.
A consequence of Lieb's concavity theorem, given in Corollary B.6, establishes the concavity of the function

σ ↦ Tr[(σ^β η σ^β)^{1/(2β)}] = Tr[(η^{1/2} σ^{2β} η^{1/2})^{1/(2β)}] ,    (6.169)

where we used the fact that LL^∗ and L^∗L have the same non-zero eigenvalues, with L := σ^β η^{1/2}. Therefore, the term −(α − 1)Tr[(σ^β η σ^β)^{1/(2β)}] is convex in σ. Furthermore, the linearity
of αTr[ρη] in ρ ensures its convexity in ρ. As a result, for any p ∈ Prob(n) and two sets of
n density matrices in D(A), {ρ_x}_{x∈[n]} and {σ_x}_{x∈[n]}, it follows that

Q̃_α( ∑_{x∈[n]} p_x ρ_x ∥ ∑_{x∈[n]} p_x σ_x )
  ⩽ sup_{η⩾0} { α ∑_{x∈[n]} p_x Tr[ρ_x η] − (α − 1) ∑_{x∈[n]} p_x Tr[(η^{1/2} σ_x^{2β} η^{1/2})^{1/(2β)}] }
  ⩽ ∑_{x∈[n]} p_x sup_{η⩾0} { α Tr[ρ_x η] − (α − 1) Tr[(η^{1/2} σ_x^{2β} η^{1/2})^{1/(2β)}] }
(6.168) →   = ∑_{x∈[n]} p_x Q̃_α(ρ_x ∥ σ_x) .    (6.170)
This proves the case α > 1. For α ∈ [½, 1), we apply similar reasoning using the reverse Young inequality (3.240). Using the same substitutions for M and N, we obtain (6.166) but with the inequality reversed. Consequently, we have

Q̃_α(ρ∥σ) = inf_{η⩾0} { α Tr[ρη] + (1 − α) Tr[(σ^β η σ^β)^{1/(2β)}] } .    (6.171)

Observe that β < 0 since α < 1. As with the previous case, the joint concavity of Q̃_α follows from the concavity of the function in (6.169), completing the proof.
Exercise 6.4.6. Using the same notations as in the proof above, show that M^p = N^q if and only if η has the form given in (6.167).
Exercise 6.4.7. Show that for any ρ, σ ∈ D(A), the function α 7→ D̃α (ρ∥σ) is continuous
for all α ∈ [0, ∞].
We are now ready to prove the closed formula for the minimal quantum Rényi relative
entropy.
D^reg_α(ρ∥σ) = D̃_α(ρ∥σ) .    (6.172)
Remark. Recall that a priori, D^reg_α is only known to be partially additive; however, the theorem above implies that it is fully additive.
Proof. It is sufficient to prove the theorem for all α ⩾ ½, since if the theorem holds for this case then the case α ∈ (0, ½) simply follows from Exercise 6.2.7 via the relation

D^reg_α(ρ∥σ) = (α/(1−α)) D^reg_{1−α}(σ∥ρ)
(6.172) →   = (α/(1−α)) D̃_{1−α}(σ∥ρ)    (6.173)
  = D̃_α(ρ∥σ) .
We will therefore assume in the rest of the proof that α ⩾ ½. Since D̃_α is a relative entropy that reduces to the Rényi relative entropy in the classical domain, it follows from Theorem 6.4.1 that

D^reg_α(ρ∥σ) ⩽ D̃_α(ρ∥σ) .    (6.174)

For the reversed inequality, we first show that

D^reg_α(ρ∥σ) ⩾ lim_{n→∞} (1/n) D_α( P_{σ^{⊗n}}(ρ^{⊗n}) ∥ σ^{⊗n} ) .    (6.175)
Indeed, since Pσ⊗n (ρ⊗n ) commutes with σ ⊗n they have a common eigenbasis that spans
An . Let ∆n ∈ CPTP(An → An ) be the completely dephasing channel in this basis. From
Exercise 3.5.21 we have P_{σ^{⊗n}}(ρ^{⊗n}) = ∆_n(ρ^{⊗n}). Therefore,

sup_{E∈CPTP(Aⁿ→X)} D_α( E(ρ^{⊗n}) ∥ E(σ^{⊗n}) ) ⩾ D_α( ∆_n(ρ^{⊗n}) ∥ ∆_n(σ^{⊗n}) )
  = D_α( P_{σ^{⊗n}}(ρ^{⊗n}) ∥ σ^{⊗n} ) .    (6.176)

Dividing both sides of the equation above by n and taking the limit n → ∞ proves (6.175).
It is left to show that the right-hand side of (6.175) is no smaller than D̃α (ρ∥σ). We will
divide this part of the proof into several cases:
1. The case α > 1 and ρ ≪̸ σ. Recall that for every n ∈ ℕ the states P_{σ^{⊗n}}(ρ^{⊗n}) and σ^{⊗n} have a common eigenbasis. Therefore, both of these states are diagonal in this eigenbasis, and the condition ρ ≪̸ σ implies that these diagonal states also satisfy P_{σ^{⊗n}}(ρ^{⊗n}) ≪̸ σ^{⊗n} (see Exercise 3.5.22). Hence, we must have D_α( P_{σ^{⊗n}}(ρ^{⊗n}) ∥ σ^{⊗n} ) = ∞ for all n ∈ ℕ.
2. The case α > 1 and ρ ≪ σ. Observe first that since P_σ(ρ) and σ commute, we have (cf. Exercise 6.2.8)

D_α( P_σ(ρ) ∥ σ ) = (1/(α−1)) log Tr[ ( σ^{(1−α)/(2α)} P_σ(ρ) σ^{(1−α)/(2α)} )^α ] .    (6.177)

Now, from the pinching inequality (3.235) we have ρ ⩽ |spec(σ)| P_σ(ρ), so that (cf. Exercise B.3.2)

D_α( P_σ(ρ) ∥ σ ) ⩾ (1/(α−1)) log Tr[ ( σ^{(1−α)/(2α)} (ρ/|spec(σ)|) σ^{(1−α)/(2α)} )^α ]
  = (1/(α−1)) log Tr[ ( σ^{(1−α)/(2α)} ρ σ^{(1−α)/(2α)} )^α ] − (α/(α−1)) log |spec(σ)| .    (6.178)

Hence, replacing ρ and σ above with ρ^{⊗n} and σ^{⊗n}, and recalling from (8.102) that |spec(σ^{⊗n})| ⩽ (n + 1)^{|A|}, we get in the limit n → ∞

lim_{n→∞} (1/n) D_α( P_{σ^{⊗n}}(ρ^{⊗n}) ∥ σ^{⊗n} ) ⩾ (1/(α−1)) log Tr[ ( σ^{(1−α)/(2α)} ρ σ^{(1−α)/(2α)} )^α ] .    (6.179)
4. The case α ∈ [½, 1) and ρ ⊥̸ σ. In this case, the first inequality in (6.178) holds in the opposite direction since the factor 1/(α−1) is negative. We therefore need another argument or trick. First, observe that

( σ^{(1−α)/(2α)} ρ σ^{(1−α)/(2α)} )^α = ( σ^{(1−α)/(2α)} ρ σ^{(1−α)/(2α)} )^{α−1} ( σ^{(1−α)/(2α)} ρ σ^{(1−α)/(2α)} ) .    (6.180)
Hint: Recall that P_σ(ρ) commutes with σ and that all the pinching projectors commute with all the operators above except for the single ρ.
Proof. Follows trivially from a combination of the theorem above with Eqs. (6.175,6.179).
Note that the corollary above demonstrates that an optimizer for (6.157) is En = ∆n .
Exercise 6.4.9. Show that for all α ∈ [0, ∞]

D^reg_α(ρ∥σ) ⩾ lim sup_{n→∞} (1/n) D_α( ρ^{⊗n} ∥ P_{ρ^{⊗n}}(σ^{⊗n}) ) .    (6.187)

Further, show that the equality above holds for all α ∈ (0, ½).
Proof. Let {e₁, e₂} be the standard basis of ℝ², and observe that λ_max in (5.112) is precisely 2^{−D_max(ψ∥σ)}. Therefore, Theorem 5.3.2 gives

D(ψ∥σ) = D( e₁ ∥ 2^{−D_max(ψ∥σ)} e₁ + (1 − 2^{−D_max(ψ∥σ)}) e₂ )    (6.189)
Theorem 6.2.2 →   = D_max(ψ∥σ) ,

where the last equality holds since D is a relative entropy.
The corollary above demonstrates that the maximal quantum extension is closely related
to Dmax . The corollary is universal in the sense that it holds for any classical relative entropy
D, however, it is quite limited as it holds only for pure ρ. In the corollary below we will see
that for some of the Rényi relative entropies there exists a closed formula for the maximal
quantum extension without any restriction on ρ and σ. This closed formula is given in terms
of the family of geometric relative entropies.
b α (ρ∥σ) := lim D
D b α (ρ∥σ + εI) . (6.191)
+ε→0
Remarks:
1. Alternatively, one can define the geometric relative entropy for any ρ, σ ∈ D(A) using
the decomposition (D.27) with ρ̃ := ρ11 − ζρ−1 ∗
22 ζ and σ̃ := σ11 . Then, the geometric
relative entropy of order α ∈ [0, 2] is given by
( h 1 α i
1 −2 − 12
b α (ρ∥σ) = α−1 log Tr σ̃ σ̃ ρ̃σ̃ if α ∈ [0, 1) or ρ ≪ σ
D (6.192)
∞ otherwise
2. The geometric relative entropy can be written differently using the relation M f (M ∗ M ) =
1 1
f (M M ∗ )M given in Exercise B.0.1. Denoting M := ρ 2 σ − 2 we get
1
log Tr σM ∗ M (M ∗ M )α−1
D
b α (ρ∥σ) =
α−1
1
log Tr σM ∗ (M ∗ M )α−1 M
= (6.193)
α−1
1 1
1 α−1
−1
= log Tr ρ ρ 2 σ ρ 2 .
α−1
4. Observe that for α = 2 the definition of the geometric relative entropy coincides with
the Petz quantum Rényi divergence of the same order.
Exercise 6.4.10. Show that the two definitions above for the geometric relative entropy are
equivalent (for any two density matrices ρ, σ ∈ D(A)); i.e., prove (6.192).
Exercise 6.4.11. Show that the geometric relative entropy satisfies the properties (axioms)
of additivity and normalization of a quantum relative entropy.
Exercise 6.4.12. Show that the geometric relative entropy reduces to the Rényi relative
entropy in the classical domain.
Instead of proving directly that the geometric relative entropy satisfies the DPI, we will
show that it is equal to the maximal quantum extension of the Rényi relative entropy. Since
the latter satisfies the DPI, this will imply that geometric relative entropy also satisfies the
DPI.
Proof. The proof follows directly from Theorem 5.3.3 for the case σ > 0 and from Theo-
rem D.2.1 for the general case. We leave the details as an exercise.
Exercise 6.4.13. Provide the full details of the proof of the corollary above.
Conditional Entropy
In this chapter, we delve into a variant of the entropy function, widely prevalent in informa-
tion theory and quantum resource theories, especially in the realm of dynamical resources,
which is the focus of the second volume of this book. This variant is known as conditional
entropy, which pertains to the entropy associated with a physical system A that shares a
correlation with another system B. When an observer, say Bob, has access to system B, he
can reduce his uncertainty about system A by performing a quantum measurement on his
subsystem. In essence, conditional entropy quantifies the residual uncertainty of system A
when such access to system B is available.
Traditionally, the conditional entropy of a bipartite state ρAB is defined in terms of the von
Neumann entropy associated with system AB minus the von Neumann entropy associated
with system B. This is given by:
H(A|B)ρ := H ρAB − H ρB .
(7.1)
See Fig. 7.1 for a heuristic description of this definition in terms of a Venn diagram. However,
in this chapter, we take a different approach. Here, conditional entropy is defined axiomati-
cally, similar to how we defined entropy and relative entropy. This approach provides a more
rigorous definition of conditional entropy, placing the intuitive Venn diagram interpretation
on a more solid theoretical foundation.
329
330 CHAPTER 7. CONDITIONAL ENTROPY
′
D(A) and σ ∈ D(A′ ) we say that ρ majorizes σ and write ρA ≻ σ A if the probability
vector consisting of the eigenvalues of ρ, majorizes the probability vector consisting of the
eigenvalues of σ. For conditional majorization, such a straightforward extension from the
classical to the quantum domain is more complex as it involves two systems that can be
correlated quantumly (i.e. entangled). For this reason we employ the axiomatic approach
to introduce quantum conditional majorization, and then discuss some of its key properties.
As discussed above, intuitively, conditional majorization is a pre-order on the set D(AB)
that characterizes the uncertainty of system A given access to system B. To make this intu-
ition more precise, we employ the axiomatic approach determining which set of operations
in CPTP(AB → AB ′ ) can only increase the uncertainty of system A (even if one has access
to system B). We will now examine two highly intuitive axioms that these channels must
adhere to. These two axioms extend the principles explored in Sec. 4.6.1 to the quantum
domain.
′ ′
where N AB→B := TrA ◦ N AB→AB . Such a channel N ∈ CPTP(AB → AB ′ ) is referred to
as conditionally unital. It is important to note that both the input and output systems on
Alice’s side remain the same, whereas on Bob’s side, the systems B and B ′ can be different.
Lemma 7.1.1. Let N ∈ CPTP(AB → ÃB ′ ) be a bipartite quantum channel and let
′
JNAB ÃB be its Choi matrix. Then, N AB→ÃB is conditionally unital if and only if its
Choi matrix satisfies
′ ′
JNB ÃB = JNBB ⊗ uà . (7.4)
′
Proof. We begin by proving that the channel N AB→ÃB is conditionally unital if its Choi
matrix has the form in (7.4). To this end, let ρ ∈ D(B) and consider that
h ′
T ′
i
N uA ⊗ ρB = TrAB JNAB ÃB uA ⊗ ρB ⊗ I ÃB
1 h
B ÃB ′
B T
ÃB ′
i
= TrB JN ρ ⊗I
|A| (7.5)
1 h
à BB ′
B T
ÃB ′
i
(7.4)→ = TrB u ⊗ JN ρ ⊗I
|A|
= uà ⊗ σ B ,
h T i
1 ′ ′
where σ B := |A| TrB JNBB ρB ⊗ I B .
′ ′
We next prove that the Choi matrix of N AB→ÃB has the form in (7.4) if N AB→ÃB is
conditionally unital. Recall that the defining property of a conditionally unital channel is
that for every state ρ ∈ D(B), there exists a state σ ∈ D(B ′ ) such that
′ ′
N AB→ÃB (uA ⊗ ρB ) = uà ⊗ σ B . (7.6)
By taking the trace over à on both sides of the equation above we get that
h i
B′ AB→B ′ A B ABB ′ A B T B′
σ =N (u ⊗ ρ ) = TrAB JN u ⊗ ρ ⊗I
1 h i (7.7)
BB ′ B T B′
= TrB JN ρ ⊗I .
|A|
On the other hand, observe that
h i
AB→ÃB ′ A B ′ B T ÃB ′
TrAB JNAB ÃB A
N (u ⊗ ρ ) = u ⊗ ρ ⊗I
1 h i (7.8)
B ÃB ′ B T ÃB ′
= TrB JN ρ ⊗I .
|A|
Therefore, from the two expressions above for σ B and N (uA ⊗ ρB ) we conclude that (7.6)
can be expressed as
h i h i
B ÃB ′ B T ÃB ′ Ã BB ′ B T ÃB ′
TrB JN ρ ⊗I = TrB (u ⊗ JN ) ρ ⊗I . (7.9)
′ ′ ′
Denote by η B ÃB := JNB ÃB − uà ⊗ JNBB and observe that the equation above can be written
as h i
B ÃB ′ B T ÃB ′
Tr η ρ ⊗I =0 ∀ ρ ∈ D(B). (7.10)
Due to the existence of bases of density operators that span the space of linear operators
acting on B, we conclude from (7.10) that for every operator ζ ∈ L(B) we have
h ′
′
i
TrB η B ÃB ζ B ⊗ I ÃB =0. (7.11)
Note that by multiplying both sides of the equation above by any element ξ ∈ L(ÃB ′ ) and
taking the trace we get that
B ÃB ′ B ÃB ′
Tr η ζ ⊗ξ =0. (7.12)
Since the equation above holds for all ζ ∈ L(B) and all ξ ∈ L(ÃB ′ ) it also holds for any
′ ′
linear combinations of matrices of the form ζ B ⊗ ξ ÃB . Since matrices of the form ζ B ⊗ ξ ÃB
′
span the whole space L(B ÃB ′ ) we conclude that η B ÃB is orthogonal (in the Hilbert-Schmidt
′
inner product) to all the elements of L(B ÃB ′ ). Therefore, we must have η B ÃB = 0 which is
equivalent to (7.4). This completes the proof.
′
Figure 7.2: An illustration of an A ̸→ B ′ semi-causal bipartite channel N AB→AB . The marginal
channel N AB→B ′ equals N AB→B ′ ◦ MA→A for any choice of M ∈ CPTP(A → A).
In essence, channels that are A ̸→ B ′ signalling are those that can be implemented via
one-way communication from Bob to Alice. An illustration of this concept can be found in
Fig. 7.3.
Remark. The theorem above demonstrate the intuitive assertion that semi-causal bipartite
channels are channels that can be realized with one-way communication from Bob to Alice.
With such channels, Alice cannot influence Bob’s system. We also point out that the relation
in (7.15) has been written in a compact form; that is, we removed identity channels so that
′
′ ′
′
E RA→A ◦ F B→RB := E RA→A ⊗ idB →B ◦ idA→A ⊗ F B→RB . (7.16)
Proof. We begin by proving the implication 1 ⇒ 2. Consider the following marginal of the
′
channel N AB→AB :
′ ′
N AB→B := TrA ◦ N AB→AB
′
(7.15)→ = TrA ◦ E RA→A ◦ F B→RB
′ (7.17)
= TrRA ◦ F B→RB
′
= TrA ◦ F B→B ,
where we have utilized the trace-preserving property of quantum channels and denoted
′ ′
F B→B := TrR ◦ F B→RB . It is evident that this channel satisfies (7.13) with any trace
′
preserving map MA→A , establishing that N AB→AB is A ̸→ B ′ semi-causal.
Moving on to the implication 2 ⇒ 1, we examine two distinct purifications of the marginal
′
Choi matrix JNABB :
′
1. Consider a physical system C and a pure (unnormalized) state ψ ABAB C , which acts as
′ ′
a purification of both JNABAB and its marginal state JNABB .
′ 1 ′
2. Denoting by φBB R , an (unnormalized) purification of the operator J BB ,
|A| N
we get
BB ′ R ABB ′
from (7.14), that ΩAÃ ⊗ φ is another purification JN .
1
Since the marginal of JNAB equals I AB , it implies that φB = |A| JNB = I B . This property
implies the existence of an isometry F ∈ CPTP(B → B ′ R) that satisfies
′ ′
φBB R = F B̃→B R (ΩB B̃ ) . (7.18)
Moreover, given that two purifications of the same positive semi-definite matrix are connected
by an isometry (as per Exercise 2.3.32), there must be an isometry V ∈ CPTP(RÃ → AC)
satisfying
ABAB ′ C RÃ→AC AÃ BB ′ R
ψ =V Ω ⊗ϕ . (7.19)
Finally, tracing out system C and denoting by E RÃ→A := TrC ◦ V RÃ→AC gives
′
′
JNABAB = E RÃ→A ΩAÃ ⊗ ϕBB R
RÃ→A AÃ B̃→B ′ R B B̃
(7.18)→ = E Ω ⊗F Ω (7.20)
′
= E RÃ→A ◦ F B̃→B R Ω(AB)(ÃB̃) .
The equation above implies that (7.15) holds. This completes the proof.
Exercise 7.1.2. Show that if |B ′ | = 1 then any channel in CPTP(AB → AB ′ ) is A ̸→ B ′
semi-causal.
′
The Choi matrix, JNAB ÃB , of a channel N ∈ CMO(AB → ÃB ′ ) that is both conditional
unital and A ̸→ B ′ semi-causal must satisfies:
′ ′
1. JNB ÃB = JNBB ⊗ uà (i.e. N is conditionally unital; see (7.4)).
′ ′
2. JNABB = uA ⊗ JNBB (i.e. N is semi-causal; see (7.14)).
3. JNAB = I AB (i.e. N is trace preserving).
′
4. JNAB ÃB ⩾ 0 (i.e. N is completely positive).
Observe the symmetry of the first two conditions above under exchange of the local input
system A and the local output system Ã.
As a straightforward example of a CMO, consider that the reference system R in Theo-
′
rem 7.1.1 is classical. In such a scenario, we define X := R and the channel N AB→AB can
be expressed as:
′ ′ ′
X
N AB→AB = E XA→A ◦ F B→XB = A→A
E(x) ⊗ FxB→B (7.21)
x∈[m]
′
Here, {FxB→B }x∈[m] constitutes a quantum instrument, and for each x ∈ [m], the map E(x)
A→A
is a quantum channel in CPTP(A → A). We will explore later that this channel typifies
one-way LOCC (Local Operations and Classical Communication). Notably, if each E(x) is
a unital channel, this one-way LOCC is also conditionally unital. This channel essentially
represents Bob performing a quantum measurement on his system and conveying the result
to Alice, who then applies a unital channel to her system.
Exercise 7.1.3. Let ω ∈ D(B) and N ∈ CMO(AB → ÃB ′ ). Show that the channel
E ∈ CPTP(A → ÃB ′ ) defined for any ρ ∈ L(A) by
′ ′
E A→ÃB (ρA ) := N AB→ÃB (ρA ⊗ ω B ) (7.22)
is also CMO; i.e. show that E ∈ CMO(A → ÃB ′ ).
Exercise 7.1.4. Let Υ ∈ L(AB ÃB ′ → AB ÃB ′ ) be a linear map defined for all ω ∈
L(AB ÃB ′ ) as
AB ÃB ′ A B ÃB ′ BB ′ Ã Ã ABB ′ A BB ′
Υ ω := u ⊗ ω −ω ⊗u +u ⊗ ω −u ⊗ω . (7.23)
′
1. Show that a channel N ∈ CPTP(AB → ÃB ′ ) is CMO if and only if Υ(JNAB ÃB ) = 0.
2. Show that Υ is self-adjoint; i.e. show that Υ = Υ∗ .
3. Show that Υ is idempotent; i.e. Υ ◦ Υ = Υ.
Exercise 7.1.5. Show that quantum conditional majorization as defined above is a pre-order.
This definition effectively extends the concept of majorization. Specifically, when |B| =
|B ′ | = 1, the set CMO(A → A) coincides with the set of unital channels. Consequently, the
relation ρA ≻A σ A as defined above (under the condition |B| = |B ′ | = 1) transforms into the
well-known majorization relation ρA ≻ σ A . Expanding on this concept, quantum conditional
majorization exhibits the following notable property.
Lemma 7.1.2. Let ρ ∈ D(AB) and σ ∈ D(AB ′ ) be two product states; i.e.
′ ′
ρAB = ρA ⊗ ρB and σ AB = σ A ⊗ σ B . Then,
′
ρAB ≻A σ AB ⇐⇒ ρA ≻ σ A . (7.25)
Proof. If ρA ≻ σ A then there exists a unital channel such that σ A = U A→A (ρA ). Let
′
E ∈ CPTP(B → B ′ ) be a replacement channel that always outputs σ B . Then, the channel
′ ′
N AB→AB := U A→A ⊗ E B→B (7.26)
′ ′ ′
is CMO and satisfies σ A ⊗ σ B = N AB→AB ρA ⊗ ρB . Hence, ρAB ≻A σ AB . Conversely, if
′ ′ ′
ρAB ≻A σ AB then there exists a semi-causal quantum channel N AB→AB = E RA→A ◦ F B→RB
(that is also conditionally unital) that satisfies
′ ′
σ A ⊗ σ B = E RA→A ◦ F B→RB ρA ⊗ ρB .
(7.27)
′
Tracing system B ′ on both sides, and denoting τ R := TrB ′ ◦ F B→RB ρB gives
σ A = E RA→A ρA ⊗ τ R .
(7.28)
U A→A (ω A ) := E RA→A ω A ⊗ τ R
∀ ω ∈ D(A) , (7.29)
′
is a unital channel since N AB→AB is conditionally unital. Since by definition, σ A = U A→A ρA ,
we conclude that ρA ≻ σ A . This completes the proof.
′ ′
Determining whether ρAB ≻A σ AB for two given bipartite quantum states ρAB and σ AB
can be challenging. This task is essentially about verifying the existence of a Choi matrix
′
J AB ÃB N that satisfies both the four initial conditions outlined at the start of this subsection
(characteristic of a CMO) and the additional criterion:
h ′
T ′
i ′
TrAB JNAB ÃB ρAB ⊗ I ÃB = σ ÃB . (7.30)
′
These five conditions imposed on J AB ÃB N constitute an SDP (Semidefinite Programming)
feasibility problem that can be solved efficiently and algorithmically on a computer.
ρAB ≻A ρA ⊗ σ B . (7.32)
A more nuanced example is the fact that the maximally entangled state conditionally ma-
jorizes all states of the same dimensions.
Proof. In Sec.1.4.1, we discussed how the maximally entangled state ΦAB can be utilized for
teleporting an unknown quantum state. Specifically, a teleportation protocol from Bob to
Alice (note that in Sec.1.4.1, we examined teleportation from Alice to Bob) involves Bob
performing a joint quantum measurement on his part of ΦAB and the state to be teleported.
This is followed by classical communication to Alice, who then applies a unitary operation.
Such a protocol conforms to the structure of CMO channels described in (7.21), where each
A→A
E(x) is a unitary channel. This implies that for any bipartite state ρAB , there exists a
channel N ∈ CMO(AB → AB) such that
We emphasize that in the realization of the channel N AB→AB , Bob locally prepares the
state ρB B with B ′ ∼
′
= A and then employs the maximally entangled state ΦAB to teleport
′
the B subsystem to Alice, resulting in the state ρAB . The equation above implies that
ΦAB ≻A ρAB .
The next theorem characterizes the states in D(AB) that are conditionally majorized by
a state in Pure(A). Note that all the states in Pure(A) are equivalent under majorization.
Theorem 7.1.3. Let ρ ∈ D(AB). Then, the following statements are equivalent:
ψ A ≻A ρAB . (7.36)
Proof. To demonstrate that the first statement implies the second, we search for a channel
N ∈ CMO(A → AB) satisfying ρAB = N A→AB (ψ A ). Our strategy involves considering a
binary measurement-and-prepare channel defined as:
equals
I A ⊗ N A→B (uA ) = I A ⊗ ρB , (7.40)
using (7.38) with τ B = ρB . Equating these two operators dictates that τ AB must be
I A ⊗ ρB − ρAB
τ AB := . (7.41)
|A| − 1
τ AB is positive semi-definite, as we assume I A ⊗ρB ⩾ ρAB , and has unit trace, qualifying as a
density matrix. Also, it satisfies τ B = ρB . This concludes the proof that the first statement
implies the second.
Conversely, if N ∈ CMO(A → AB) is such that ρAB = N A→AB (ψ A ), then
ρB = N A→B (ψ A ) , (7.42)
where N A→B := TrA ◦ N A→AB is the marginal channel. Since N A→AB is A ̸→ B semi-causal
the marginal channel N A→B must satisfy
N A→B uA = N A→B ψ A
(7.43)
(7.42)→ = ρB .
where the last inequality follows from the fact that I A − ψ A ⩾ 0 and N A→AB is a completely
positive map. This concludes the proof.
In Theorem 7.1.2 we saw that under conditional majorization ΦAB is the maximal element
of D(AB). On the other hand, the maximally mixed state uA satisfies the opposite inequality
that ρAB ≻A uA for all ρ ∈ D(AB). Combining this maximal and minimal elements gives
the state
ΦAB ⊗ uà . (7.46)
Remarkably, the following theorem shows that under conditional majorization, this state is
equivalent to any pure state in Pure(AÃ).
Proof. We first prove that ΦAB ⊗ uà ≻Aà ψ Aà . We will denote by τ Aà the density matrix
Therefore, it is left to show that the channel N is CMO. Since the channel N AÃB→AÃ does
not have an output on Bob’s side it is trivially A ̸→ B ′ semi-causal (|B ′ | = 1). To show that
it is conditionally unital observe that for any σ ∈ D(B) we have
AÃB→AÃ AÃ B
= E AB→AÃ mI A ⊗ σ B
N I ⊗σ
(7.48)→ = I AÃ .
Hence, N ∈ CMO(AÃB → AÃ). This completes the proof that ΦAB ⊗ uà ≻Aà ψ Aà .
To prove that ΦAB ⊗ uà ≻Aà ψ Aà , we set τ AB := (I AB − ψ AB )/(m2 − 1) and denote by
N AÃ→AÃB a quantum channel defined for all ω ∈ L(AÃ) as
AÃ→AÃB AÃ AÃ→AB
N (ω ) := N (ω ) ⊗ uà .
AÃ
(7.53)
where
N AÃ→AB (ω AÃ ) := Tr[ψ AÃ ω AÃ ]ΦAB + Tr[(I AÃ − ψ AÃ )ω AÃ ]τ AB . (7.54)
Therefore, it is left to show that the channel N is CMO. To show that it is conditionally
unital observe that
AÃ→AÃB AÃ
= ΦAB + (I AB − ΦAB ) ⊗ uÃ
N I
(7.56)
AB Ã
=I ⊗u .
To show that it is A ̸→ B semi-causal observe that for all ω ∈ D(AÃ) the marginal channel
N AÃ→B satisfies
N AÃ→B ω AÃ = Tr[ψ AÃ ω AÃ ]uB + Tr[(I AÃ − ψ AÃ )ω AÃ ]τ B
(7.57)
Exercise (7.1.6)→ = uB ,
Exercise 7.1.6. Use the definition of τ AB to verify the first equality in (7.56) and the second
equality in (7.57).
With this extension, we can now define conditional entropy. Specifically, we consider
[
H: D(AB) → R; , (7.58)
A,B
as a function mapping the set of all bipartite states across finite dimensions to the real line.
The function H assigns to each bipartite density matrix ρAB a real number, denoted by
H(A|B)ρ . This notation distinguishes conditional entropy from the entropy of the marginal
state ρA .
Our objective is to identify when H constitutes a conditional entropy. For systems where
|B| = 1, we denote H(A|B)ρ as H(A)ρ := H(ρA ), aligning with the notation of conditional
entropy. This notation proves useful when examining composite systems with multiple sub-
systems. Since we define entropy functions as non-constant zero functions, we assume (im-
plicitly throughout this book, and in the definition below) the existence of a quantum system
A and a state ρ ∈ D(A) such that H(A)ρ ̸= 0.
There are several properties of conditional entropy that follows from the definition above.
First, observe that the case that system B is trivial, i.e. |B| = 1, a conditional entropy
function reduces to an entropy function. Moreover, if ρAB = ω A ⊗ τ B is a product state,
then it can be converted reversibly to the product state ω A ⊗ uB by a product channel of
the form idA→A ⊗ E B→B which is in CMO(AB → AB). Therefore, from the monotonicity
property above it follows that
so that H(A|B)ρ depends only on ω A . Moreover, the function ω A 7→ H(A|B)ωA ⊗uB satisfies
the two axioms of entropy and therefore can be considered itself as an entropy of ω A . In other
words, conditional entropy reduces to entropy on product states as intuitively expected.
Next, conditional entropy is invariant under the action of local isometric channels. That
′ ′
is, for a bipartite state ρAB and isometric channels U A→A and V B→B ,
This strict inequality allows us to set a normalization factor for conditional entropy. To be
consistent with the normalization convention for unconditional entropy, we set for the case
|A| = 2 that H(A|B)u⊗ρ = 1, which in turn implies that for |A| > 2
In the rest of this book we will always assume that H is normalized in this way.
Exercise 7.2.2. Let H be conditional entropy. Show that for every Hilbert spaces A and B
with equality if ρAB = uA ⊗ τ B for some τ ∈ D(B). Hint: Find a channel in CMO(AB →
AB) that takes ρAB to uA ⊗ ρB .
That is, the maximally entangled state has the least amount of conditional entropy. We will
see below that H(A|B)Φ is negative.
Unlike entropy, quantum conditional entropy can be negative. This unintuitive phenom-
ena puzzled the community for quite some time until an operational interpretation for the
quantum conditional entropy was found. This operational interpretation is given in terms of
a protocol known as quantum state merging (which we study in volume 2 of this book). In
the following theorem we show that certain entangled states must have negative conditional
entropy, while classical conditional entropy is always non-negative.
The lower bound in the theorem below is given in terms of the conditional min-entropy
defined on every bipartite state ρ ∈ D(AB) as
Hmin (A|B)ρ := − inf log2 {λ : ρAB ⩽ λIA ⊗ ρB } . (7.66)
λ⩾0
In the next subsection we will see that the conditional min-entropy is indeed a conditional
entropy. Originally, this quantity was given the name conditional min-entropy because it was
known to be the least among all Rényi conditional entropies. The theorem below strengthen
this observation by proving that all plausible quantum conditional entropies are not smaller
than the conditional min-entropy.
Exercise 7.3.1. Consider the conditional min-entropy as defined in (7.66).
1. Show that for the maximally entangled state ΦAB (with |A| = |B|) we have
Hmin (A|B)Φ = − log |A| . (7.67)
2. Show that if a density matrix ρ ∈ D(AB) satisfies Hmin (A|B)ρ = log |A| then ρAB =
uA ⊗ ρB .
3. Show that a state ρ ∈ D(AB) has non-negative conditional min-entropy if and only if
I A ⊗ ρB ⩾ ρAB .
4. Show that if ρAB is separable then its conditional min-entropy is non-negative.
Remark. The theorem above states that for the maximally entangled state ΦAB we have
H(A|B)Φ = Hmin (A|B)Φ . Combining this with Exercise 7.3.1 we conclude that
H(A|B)Φ = − log |A| . (7.69)
That is, all conditional entropies are negative on the maximally entangled state and equal to
− log |A|. Moreover, in conjunction with the third part of Exercise 7.3.1, the theorem above
implies that all conditional entropies are non-negative on separable states (and therefore also
on classical states).
Proof. Let’s start by examining the scenario where Hmin (A|B)ρ ⩾ 0, and denote m := |A|.
The proof strategy in this case revolves around identifying the largest integer k ∈ [m] such
that a classical system X, with dimension |X| = k, fulfills the condition uX ≻A ρAB . Once
we establish this optimal value of k, we can infer that every conditional entropy H must
satisfy
H(A|B)ρ ⩾ H(X)u = log k . (7.70)
H (A|B)
We begin by establishing that it is feasible to set k := 2 min ρ
.
H (A|B)
Let X be a classical system with dimension k := 2 min ρ
. By the assumption
that Hmin (A|B)ρ ⩾ 0, and given the dimension bound Hmin (A|B)ρ ⩽ log2 m, it follows
that k ∈ [m]. Observe that the case k = m implies that Hmin (A|B)ρ = log |A|. In this
case, according to the second part of Exercise 7.3.1 we must have ρAB = uA ⊗ ρB so that
H(A|B)ρ = log |A| = Hmin (A|B)ρ .
We therefore assume now that k < m. We look for a channel N ∈ CMO(A → AB) that
satisfies
A→AB 1 A
N Π = ρAB , (7.71)
k
where ΠA is a projection onto a k-dimensional subspace of A (i.e., set ΠA := x∈[k] |x⟩⟨x|A ,
P
where {|x⟩}x∈[m] is some orthonormal basis of A). The existence of such a channel will
prove that uX ≻A ρAB since uX (with |X| = k) is equivalent to k1 ΠA under conditional
majorization. We choose N A→AB to be a measure-and-prepare channel of the form
where τ AB is some density matrix that is chosen (see below) such that N A→AB is CMO.
Indeed,
X X the action of this channel is to perform a measurement according to the POVM
X
Π ,I − Π and prepare the state ρAB if the first outcome is obtained and the state τ AB
if the second outcome is obtained. By definition, this channel satisfies (7.71).
The channel N A→AB is A ̸→ B signalling if and only if the marginal channel N A→B :=
TrA ◦N A→AB satisfies N A→B ◦MA→A = N A→B for all M ∈ CPTP(A → A). In other words,
N A→AB is A ̸→ B signalling if and only if the marginal channel N A→B is a replacement
channel. Now, for every ω ∈ D(A) we have that
Therefore, by taking τ AB to have the property that its marginal τ B = ρB , we get that the
right-hand side does not depend on ω A , so that N A→AB is A ̸→ B semi-causal.
The channel N A→AB is conditional unital if and only if the state
where the we used (7.73) with τ B = ρB . The equality between the two states above forces
τ AB to be
I A ⊗ ρB − kρAB
τ AB := . (7.76)
m−k
The operator τ AB is positive semi-definite because m − k > 0 and
1 A
I ⊗ ρB − ρAB ⩾ 2−Hmin (A|B)ρ I A ⊗ ρB − ρAB
k (7.77)
(7.66)→ ⩾ 0.
Also, τ AB has trace equal to one (so it is a density matrix) with marginal τ B = ρB . We
therefore proved that
H(A|B)ρ ⩾ log k = log 2Hmin (A|B)ρ .
(7.78)
Finally, since all conditional entropies are additive for tensor-product states, we conclude
that
1
H(A|B)ρ = lim H(An |B n )ρ⊗n
n→∞ n
1 j n n
k
(7.78)→ ⩾ lim log 2Hmin (A |B )ρ⊗n
n→∞ n (7.79)
1 nH (A|B)ρ
= lim log 2 min
n→∞ n
= Hmin (A|B)ρ .
Next, consider the case Hmin (A|B) < 0. The idea of the proof in this case is to find the
largest possible k ∈ N such that the system X (can be taken to be a classical system) with
dimension |X| = k satisfies
ψ XA ≻AX ρA ⊗ uX , (7.80)
where ψ ∈ Pure(XA) is some pure state. Due to the monotonicity property of every condi-
tional entropy H the above relation implies that
0 = H(XA)ψ
(7.80)→ ⩽ H(XA|B)ρ⊗u (7.81)
Additivity→ = H(X)u + H(A|B)ρ .
Finally, since H(X) log k we get that H(A|B)ρ ⩾ − log(k). We first show that (7.80)
−Hu =(A|B)
holds with k := 2 min
.
To prove the relation (7.80), consider the measure-and-prepare channel N ∈ CPTP(AX →
AXB) defined on all ω ∈ L(XA) as
where ΠAX := I AX − ψ XA and τ XAB is chosen (see below) such that N is CMO. Observe
that by definition we have
N XA→XAB ψ XA = uX ⊗ ρAB .
(7.83)
We next show that there exists τ ∈ D(XAB) such that N as defined above is indeed CMO.
If N is XA ̸→ B semi-causal then we must have that the marginal channel N XA→B :=
TrXA ◦N XA→XAB is a replacement channel (i.e., a constant channel). Now, for all ω ∈ D(XA)
N XA→B ω XA = Tr ψ XA ω XA ρB + Tr ΠXA ω XA τ B .
(7.84)
Thus, by choosing τ XAB to have the property τ B = ρB , we get that the right-hand side of
the equation above does not depend on ω, so that N XA→XAB is XA ̸→ B semi-causal.
Next, the channel N XA→XAB is conditionally unital if the operator
N XA→XAB I XA = uX ⊗ ρAB + kmτ XAB
(7.85)
equals the operator
I XA ⊗ N XA→B uXA = I XA ⊗ τ B ,
(7.86)
B B
where the last equality follows from (7.84) with τ = ρ . We therefore conclude that
N ∈ CPTP(XA → XAB) as defined above is CMO if and only if τ XAB equals
kI A ⊗ ρB − ρAB
τ XAB := uX ⊗ . (7.87)
km − 1
Observe that this τ XAB is indeed a density matrix since by definition of k we have
kI A ⊗ ρB ⩾ 2−Hmin (A|B) I A ⊗ ρB
(7.88)
(7.66)→ ⩾ ρAB .
Moreover, from its definition in (7.87) we have τ B = ρB . Hence, with this τ XAB the channel
N XA→XAB is CMO that maps the pure states ψ XA to the state uX ⊗ ρAB . We therefore
conclude that
H(A|B)ρ ⩾ − log k = − log 2−Hmin (A|B) .
(7.89)
Finally, from the additivity property of conditional entropies we get
1
H(A|B)ρ = lim H(An |B n )ρ⊗n
n→∞ n
1 l
−Hmin (An |B n )ρ⊗n
m
(7.89)→ ⩾ − lim log 2
n→∞ n (7.90)
1
= − lim log 2−nHmin (A|B)ρ
n→∞ n
= Hmin (A|B)ρ .
It is left to prove the equality on maximally entangled states. Since conditional entropy is
invariant under local isometries we can assume without loss of generality that m := |A| = |B|.
From Theorem 7.1.4 we know that the state ΦAB ⊗ uà is equivalent under conditional ma-
jorization to any pure state in Pure(AÃ). Since the entropy of every pure state in Pure(AÃ)
is zero, we conclude that
0 = H(AÃ|B)Φ⊗u
Additivity→ = H(A|B)Φ + H(Ã)u (7.91)
= H(A|B)Φ + log m ,
where we used the fact that the entropy of the uniform state uà is log2 m. Hence,
H(A|B)Φ = − log m = Hmin (A|B)Φ (7.92)
This completes the proof.
We saw in the theorem above that the conditional entropy is positive for all separable
states. This does no mean that the conditional entropy is positive just for separable states.
In fact, some entangled states (i.e. states that are not separable) have positive conditional
entropy. The following corollary provides a simple criterion to determine if a bipartite state
has a positive conditional entropy.
Corollary 7.3.1. Let ρ ∈ D(AB). Then, the following statements are equivalent:
ψ A ≻A ρAB . (7.94)
4. The state ρAB can be obtained by CMO channel from a classical distribution;
i.e. there exists ω ∈ D(XY ) such that
ω XY ≻A ρAB . (7.95)
Proof. The proof follows directly from Theorem 7.3.1 in conjunction with Theorem 7.1.3.
Exercise 7.3.2. Use Exercise 4.6.7 to show that in the classical domain, every entropy
function H is also a conditional entropy; that is, show that function
is a conditional entropy of classical states. Moreover, give a counter example to the same
statement in the quantum domain.
At first glance, these functions appear quite reasonable. For example, for any two states
ρ, σ ∈ D(AB) with ρAB ≻A σ AB , if ω XY ≻X ρAB , then it necessarily follows that ω XY ≻X
σ AB . Consequently,
This means H(A|B)ρ exhibits monotonic behavior under conditional majorization, aligning
with expectations for a measure of conditional uncertainty.
However, in general, H(A|B)ρ is not well defined! This is because CMO channels form
a subset of one-way LOCC and thus cannot generate entanglement. Since ω ∈ D(XY )
is classical and hence separable, any state N (ω) resulting from a one-way LOCC channel
N ∈ CMO(XY → AB) lies within SEP(AB). Therefore, H(A|B)ρ is undefined if ρAB is
entangled.
Exercise 7.3.3. Show that H(A|B)ρ is well defined if and only if I A ⊗ ρB ⩾ ρAB . Hint:
Recall Corollary 7.3.1.
Furthermore, this measure exhibits monotonic behavior under conditional majorization and
is inherently non-negative by definition. Does this not contradict Theorem ??? The answer
is no. The key lies in understanding that Hreg is only weakly additive, rather than strongly
additive, in general.
Exercise 7.3.4. Calculate Hreg (AÃ|B)ρ , where ρ is defined as ΦAB ⊗ uà . Question: Does
this result in a value of zero?
Interestingly, the function Hreg serves as an example of a function meeting all the criteria
expected of a quantum conditional entropy, except it is only weakly additive. Its non-
negativity teaches us an important lesson: the tendency of quantum conditional entropies to
assume negative values on certain entangled states is intrinsically connected to their property
of full additivity.
The up arrow in the notation above indicates the optimization over σ B . By definition,
H↑ (A|B)ρ ⩾ H(A|B)ρ for all ρ ∈ D(AB).
Proof. First, we demonstrate that H satisfies the monotonicity property of conditional en-
′ ′
tropy. Let N AB→A B be a CMO, and consider the bipartite density matrix ρAB . We begin
by considering the case where A = A′ so that
H(A B)N (ρ) = log |A| − D N (ρAB ) uA ⊗ TrA N ρAB
. (7.105)
′ ′
Since N is A ̸→ B ′ semi-causal the marginal channel N AB→B := TrA ◦ N AB→AB satisfies
′ ′
N AB→B ρAB = N AB→B uA ⊗ ρB .
(7.106)
To see this, take MA→A in (7.13) to be the completely randomizing channel. With this at
hand, we get
uA ⊗ TrA N ρAB = uA ⊗ TrA N uA ⊗ ρB
(7.107)
A B
N is conditionally unital→ = N u ⊗ ρ . (7.108)
Substituting this into (7.105) we obtain
H(A B)N (ρ) = log |A| − D N ρAB N uA ⊗ ρB
Clearly, if V is a unitary channel (i.e. A ∼ = A′ ) we have H(A′ |B)V(ρ) = H(A|B)ρ since in this
′ A A′
case |A| = |A | and V(u ) = u . We can therefore assume without loss of generality that
AB
′ ρ 0
V A→A ρAB = ρAB ⊕ 0CB :=
(7.112)
CB
0 0
′
since the conditional entropy of V A→A ρAB does not change by a unitary channel on A′ .
|A| A′
Moreover, denote by t := |A ′ | and observe that u can be expressed as
′
uA = tuA ⊕ (1 − t)uC . (7.113)
Hence, substituting (7.112) and (7.113) into (7.111) gives
H(A|B)V(ρ) = log |A′ | − D ρAB ⊕ 0CB tuA ⊗ ρB ⊕ (1 − t)uC ⊗ ρB
To prove the additivity property, let ρ ∈ D(A1 B1 ) and σ ∈ D(A2 B2 ) and observe that
since uA1 A2 = uA1 ⊗ uA2 we have
H(A1 A2 |B1 B2 )ρ⊗σ = log |A1 A2 | − D ρA1 B1 ⊗ σ A2 B2 uA1 ⊗ ρA1 B1 ⊗ uA2 ⊗ σ A2 B2
(7.115)
D is additive→ = H(A1 |B1 )ρ + H(A2 |B2 )σ .
Finally, the normalization property follows from the fact that when |B| = 1 and |A| = 2 we
have by definition
H(A|B)u = log 2 − D uA uA = 1 .
(7.116)
This completes the proof.
Exercise 7.4.1. Show that if D = Dmax then its corresponding conditional entropy H is the
conditional min entropy.
In the exercise below you will show that H ↑ behaves monotonically under conditional
unital channels and consequently behaves monotonically under conditional majorization.
However, in general, H ↑ does not necessarily satisfy the additivity property (at least a
general proof of additivity of H↑ is unknown to the author). Still, this expression has been
used extensively by the community, particularly since it can be shown that for D = Dα or
D = D̃α it is additive (here D̃α is the sandwiched Rényi divergence; see Definition 6.4.1).
This includes the Umegaki relative entropy, however, as we will see shortly, in this case
H(A|B)ρ = H ↑ (A|B)ρ = H(A|B)ρ for all ρ ∈ D(AB).
Exercise 7.4.2. Consider the function H↑ as defined above.
1. Show that H↑ does not increase under conditional unital channels, and use it to conclude
that it satisfies the monotonicity property of conditional entropy.
2. Prove that H↑ satisfies the invariance and normalization property of conditional en-
tropy.
H(A|B)ρ = H ρAB − H ρB .
(7.119)
This formula is consistent with the intuition of conditional entropy as depicted in Fig. 7.1.
The finding from the previous section, which establishes that conditional entropy is non-
negative for separable states, leads to an intriguing implication regarding the von Neumann
entropy.
Corollary 7.5.1. Let {px , ρx }x∈[m] be an ensemble of quantum states in D(A). The
von-Neumann entropy satisfies
X X X
px H(ρx ) ⩽ H px ρx ⩽ px H(ρx ) + H(p) (7.120)
x∈[m] x∈[m] x∈[m]
Proof. The lower bound follows from the concavity of H (see Exercise 6.3.3). To get the
upper bound, let ρXA := x∈[m] px |x⟩⟨x|X ⊗ ρA XA
P
x . Since ρ is a cq-state, and in particular
separable, it follows that
where ρA = px ρ A
P
x∈[m] x is the marginal state. Therefore,
X
H px ρx ⩽ H(ρAX )
x∈[m]
X (7.122)
Exercise (7.5.1)→ = px H(ρx ) + H(p) .
x∈[m]
Exercise 7.5.2. The quantum mutual information is a quantity defined for any ρ ∈ D(AB)
as
I(A : B)ρ := D ρAB ρA ⊗ ρB .
(7.124)
1. Express the quantum mutual information in terms of H(A|B)ρ and H(A)ρ .
2. Show that the von-Neumann entropy is subadditive; i.e. prove that for all ρ ∈ D(AB)
Triangle Equality
Lemma 7.5.1. Let D be the Umegaki relative entropy. Then for any ρ ∈ D(AB),
σ, τ ∈ D(B) and ω ∈ D(A), we have
D ρAB ω A ⊗ σ B = D ρAB ω A ⊗ ρB + D ρB σ B .
(7.126)
D ρAB ω A ⊗ σ B = D ρAB ω A ⊗ ρB + D ω A ⊗ ρB ω A ⊗ σ B
(7.127)
Proof. By definition
= D ρAB ω A ⊗ ρB + D ρB σ B .
D ρAB uA ⊗ σ B = D ρAB uA ⊗ ρB + D ρB σ B .
(7.131)
Therefore,
H ↑ (A|B)ρ := log |A| − min D ρAB uA ⊗ σ B
σ∈D(B)
The equality H(A|B)ρ = H ↑ (A|B)ρ reveals that the von-Neumann conditional entropy is
monotonic under conditionally unital channels that are not necessarily A ̸→ B semi-causal
(see part 1 of Exercise 7.4.2).
Another useful property satisfied by the von-Neumann entropy is known as the strong
subadditivity property. Recall from Exercise 7.5.2 that the von-Neumann entropy is subad-
ditive, that is, for any ρ ∈ D(AB),
where H(AB)ρ denotes H(ρAB ). A stronger version of this inequality, known as the ‘strong
subadditivity of the von-Neumann entropy’ states that for any ρ ∈ D(ABC) we have
Note that this is a stronger version of the previous inequality since for |B| = 1 it reduces to
subadditivity. The above inequality is unique to the von-Neumann entropy and in general
is not satisfied by other entropy functions (at least not in this form).
We can express the strong subadditivity in terms of conditional entropies. Observe that
since H(A|BC)ρ = H(ABC)ρ − H(BC)ρ and H(A|B)ρ = H(AB)ρ − H(B)ρ , the strong
subadditivity can be expressed as
This version of the strong subadditivity is perhaps more intuitive than (7.134) since it can be
interpreted as the statement that by removing the access to system C, one can only increase
the uncertainty about system A. Note also that the above form of the strong subadditivity
is satisfied by any conditional entropy function. That is, for any conditional entropy H and
ρ ∈ D(ABC) we have
H(A|BC)ρ ⩽ H(A|B)ρ . (7.136)
The above inequality is a simply consequence of the monotonicity property of conditional
entropy, since tracing out system C is a map belonging to CMO(ABC → AB). In terms of
conditional majorization, we can express it as
with equality if ρABC is a pure state. Hint: If ρABC is a mixed state, let ψ ABCD be its
purification, and express H(A|C)ρ in terms of systems A, B, D (for example, H(AC)ψ =
H(BD)ψ ). Finally, use (7.134) with D replacing C.
Closed Formula
Theorem 7.5.1. Let ρ ∈ D(AB) and α ∈ [0, 2]. Then,
↑ α h i
B 1/α
α
where ηαB := TrA ρAB
Hα (A|B)ρ = log Tr ηα . (7.140)
1−α
The conditional min-entropy is defined in terms of an SDP. To see this, first observe that
from the above formula and from the definition of Dmax we get
↑
n o
2−Hmin (A|B)ρ = min t : tI A ⊗ σ B ⩾ ρAB , σ ∈ D(B), t ∈ R
n o (7.144)
B
ΛB := tσ B −−−−→ = min Tr Λ : I A ⊗ ΛB ⩾ ρAB , Λ ∈ Pos(B) .
N (ω B ) := I A ⊗ ω B ∀ ω ∈ L(B) , (7.145)
we conclude that
↑
2−Hmin (A|B)ρ = min Tr[ΛH1 ] : Λ ∈ K1 , N (Λ) − H2 ∈ K2 .
(7.146)
The above optimization problem has precisely the same form as the conic linear programming
given in (A.52). Since the cones K1 and K2 are the sets of positive semidefinite matrices, this
conic program is an SDP program.
The above expression has a dual given by (A.57). Therefore, the conditional min-entropy
can be expressed in terms of the following optimization problem
↑
2−Hmin (A|B)ρ = max Tr[ηH2 ] : η ∈ K∗2 , H1 − N ∗ (η) ∈ K∗1
n
AB AB
B B
o (7.147)
Exercise 7.5.5→ = max Tr η ρ : η ∈ Pos(AB), η = I .
Any η AB as above is a Choi matrix; hence, it can be expressed as η AB = E ∗Ã→B ΩAÃ for
some channel E ∈ CPTP(B → A). We therefore get that
↑
h i
2−Hmin (A|B)ρ = max Tr ρAB E ∗Ã→B ΩAÃ
E∈CPTP(B→Ã)
D E
ΦAÃ E B→Ã ρAB ΦAÃ
= |A| max (7.148)
E∈CPTP(B→Ã)
F 2 E B→Ã ρAB , ΦAÃ ,
= |A| max
E∈CPTP(B→Ã)
where F is the fidelity. That is, the conditional min-entropy can be expressed in terms
of the maximal overlap of E B→Ã ρAB with the maximally entangled state. We now use
the above expression to prove that the optimized conditional min-entropy is additive under
tensor products, and thereby prove that the optimized conditional min-entropy is indeed a
quantum conditional entropy as defined in Definition 7.2.1.
Proof. Since the optimized conditional min-entropy equals H̃α↑ with α = ∞, it is left to prove
↑
that it is additive. Let ρ ∈ D(AB), τ ∈ D(A′ B ′ ), and denote by Qmin (A|B)ρ := 2−Hmin (A|B)ρ .
Therefore, the additivity of Hmin would follow from the multiplicativity of Qmin . On the one
hand, from the primal problem (7.144) we have
n ′ ′ ′ ′ ′
o
Qmin (AA′ |BB ′ )ρ⊗τ = min Tr ΛBB : I AA ⊗ ΛBB ⩾ ρAB ⊗ τ A B , Λ ∈ Pos(BB ′ )
n o
B B AA′ B B′ AB A′ B ′ ′
⩽ min Tr Λ1 Tr Λ2 : I ⊗ Λ1 ⊗ Λ2 ⩾ ρ ⊗ τ , Λ1 ∈ Pos(B) , Λ2 ∈ Pos(B )
⩽ Qmin (A|B)ρ Qmin (A′ |B ′ )τ , (7.149)
′
where in the first inequality we restricted ΛBB to have the form ΛB B
1 ⊗ Λ2 , and in the last
AA′ B B′ AB A′ B ′
inequality we replaced the condition I ⊗ Λ1 ⊗ Λ2 ⩾ ρ ⊗ τ with the two conditions
A B AB A′ B′ A′ B ′
I ⊗ Λ1 ⩾ ρ and I ⊗ Λ2 ⩾ τ .
To get the opposite inequality we use the dual expression of the conditional min-entropy
as given in (7.148). Specifically,
′ ′ 2
′ ′ ′
Qmin (AA′ |BB ′ )ρ⊗τ = |AA′ | max F E BB →ÃÃ ρAB ⊗ τ BB , ΦAÃ ⊗ ΦA Ã
E∈CPTP(BB ′ →ÃÃ′ )
′ ′
2
′ BB ′ A′ Ã′
E1B→Ã AB
E2B →Ã
AÃ
E = E1 ⊗ E2 −−−−→ ⩾ |AA | max F ρ ⊗ τ ,Φ ⊗ Φ
E1 ∈CPTP(B→Ã)
E2 ∈CPTP(B ′ →Ã′ )
where {px }x∈[m] is a probability distribution, and each ρx ∈ D(B). Furthermore, it’s im-
portant to note that CPTP(B → Ã), which is the same as CPTP(B → X̃), comprises of
POVM channels that were initially introduced in Sec. 3.5.4. Under these circumstances, the
aforementioned equation simplifies to the following (Exercise7.5.6):
↑ X
2−Hmin (X|B)ρ = max px Tr ΛB B
x ρx (7.153)
{Λx }
x∈[m]
where the maximum is over all POVMs {ΛB x }x∈[m] on system B. The expression above can
be interpreted as the maximum probability for Bob to guess correctly the value of X. Specif-
ically, given the cq-state ρXB , Bob can try to learn the classical value of X by performing a
quantum measurement/POVM, {ΛB x }x∈[m] , on his system with m := |X| possible outcomes.
The probability that X = x is px, and the probability that Bob gets the outcome y given
that X = x is given by Tr ΛB B
y ρx . If Bob’s takes y to be his guess for the value of X then
B B
Tr Λx ρx is the probability that Bob guesses correctly the value of X. Given that X = x
with probability px , we get that
X
px Tr ΛB B
Prg (X|B)ρ := max x ρx (7.154)
{Λx }
x∈[m]
is the maximal overall probability that Bob’s guess of X is correct. With this notation, the
conditional entropy of ρXB can be expressed as
↑
Hmin (X|B)ρ = − log Prg (X|B)ρ . (7.155)
According to Theorem 7.5.1 and Exercise 7.5.4, the function Hα↑ is additive for all
↑
α ∈ [0, 2]. Hence, the additivity of the conditional max-entropy (i.e., Hα=0 ) is established.
Therefore, we can affirm that the conditional max-entropy qualifies as a legitimate condi-
tional entropy measure. Moreover, given that the min-relative entropy Dmin is the smallest
relative entropy, the following inequality holds for all conditional entropies H↑ derived from
a relative entropy D (as specified in (7.104)), for any quantum state ρ ∈ D(AB):
This inequality signifies that the conditional max-entropy establishes an upper limit for all
conditional entropies defined in relation to a relative entropy. Additionally, as will be ex-
plored in subsequent discussions, the conditional max-entropy is essentially the counterpart,
or the dual, of the conditional min-entropy.
Remark. Since the conditional entropy is invariant under local isometries (specifically, H(A|C)φ
remains invariant under isometries on system C) the dual to a conditional entropy is well
defined as it does not depend on the choice of the purifying system C.
By definition, the dual to a conditional entropy satisfies the invariance and additivity
properties of conditional entropy (see Exercise 7.6.1). To see that it satisfies also the nor-
malization property of a conditional entropy, let ρAB = uA with |A| = 2 and |B| = 1. A
purification of ρAB can be expressed as the maximally entangled state ΦAC with C = Ã.
Therefore,
Hdual (A)u = Hdual (A|B)ρ
by definition→ = −H(A|C)Φ (7.160)
(7.69)→ = log 2 = 1 .
Therefore, the dual to a conditional entropy would be itself a conditional entropy if it satisfies
the monotonicity property. We will see shortly that this is indeed the case for all the
conditional entropies studied in literature, although a general proof for all conditional entropy
functions is unknown to the author.
Exercise 7.6.1. Show that the dual to a conditional entropy satisfies the invariance and
additivity properties of a conditional entropy.
The relation (7.158) implies that the conditional von-Neumann entropy is self dual; i.e.
dual
H (A|B)ρ = H(A|B)ρ for all ρ ∈ D(AB). Consider the Petz conditional Rényi entropy of
order α ∈ [0, 2] given for all ρ ∈ D(AB) by
1 h
AB α
A
i
B 1−α
Hα (A|B)ρ = log Tr ρ I ⊗ρ . (7.161)
1−α
In the lemma below we compute it’s dual.
Proof. Let X
ρAB = px |φx ⟩⟨φx |AB (7.163)
x∈[n]
be the spectral decomposition of ρAB , and let ρABC = |φ⟩⟨φ|ABC , with C ∼ = AB, be the
purification of ρAB given by
X√ 1
|φABC ⟩ = px |φx ⟩AB |φx ⟩C = ρAB 2 ⊗ I C |Ω(AB)C ⟩ . (7.164)
x∈[n]
where |Ω(AB)C ⟩ = x∈[n] |φj ⟩AB |φj ⟩C is the maximally entangled operator between system
P
be the spectral decompositions of ρB and ρAC , respectively, and consider the following
Schmidt decomposition between system B and system AC
X√ 1/2
|φABC ⟩ = qy |y⟩B |χy ⟩AC = ρB ⊗ I AC ΩB(AC) (7.168)
y∈[m]
where |ΩB(AC) = y∈[m] |y⟩B |χy ⟩AC . Substituting the above expression for |φABC ⟩ into (7.166)
P
gives
h α A 1−α i 2−α α−1 B(AC)
Tr ρAB I ⊗ ρB = ΩB(AC) I A ⊗ ρB ⊗ ρC Ω . (7.169)
we conclude that
h α A 1−α i 2−α A α−1 B(AC)
Tr ρAB I ⊗ ρB = ΩB(AC) I B ⊗ ρAC I ⊗ ρC Ω
h 2−α A α−1 i (7.171)
= Tr ρAC I ⊗ ρC .
Therefore,
1 h 2−α A α−1 i
Hα (A|B)ρ = log Tr ρAC I ⊗ ρC
1−α (7.172)
= −H2−α (A|C)ρ .
Note that the above equality is equivalent to
↑ ↑
Remark. It is noteworthy that the lemma above establishes H̃1/2 as the dual of Hmin . Conse-
↑
quently, H̃1/2 (A|B)ρ is sometimes referred to as the conditional max-entropy. However, we
↑
choose not to use this terminology here because, generally speaking, H̃1/2 (A|B)ρ ̸= Hmax (A)ρ ,
AB A B
particularly when ρ = ρ ⊗ ρ . In fact, as we will show later, the true dual of Hmin (as op-
↑
posed to Hmin ) aligns with the conditional max-entropy as defined in (7.156). Additionally,
when integrating the above lemma with (7.148), we derive the following relationship:
F E B→Ã ρAB , ΩAÃ = max F ρAE , I A ⊗ τ E .
max (7.175)
E∈CPTP(B→Ã) τ ∈D(E)
↑
Proof. We start with the expression for Qmin (A|B)ρ = 2−Hmin (A|B)ρ , given in (7.148) as
F 2 E B→Ã ρAB , ΩAÃ .
Qmin (A|B)ρ = max (7.176)
E∈CPTP(B→Ã)
For any E ∈ CPTP(B → Ã) let VE ∈ CPTP(B → ÃR) be its Stinespring’s isometry. Observe
that VFB→ÃR ρABE is a purification of F B→Ã ρAB . Moreover, since ΩAÃ is already pure,
any purification of ΩAÃ in AÃRE must be of the form ΩAÃ ⊗ χRE , where χ ∈ Pure(RE).
Hence, from the Uhlmann’s theorem we get that
2 AÃ B→Ã AB 2 AÃ RE B→ÃR ABE
F Ω ,E ρ = max F Ω ⊗ χ , VE ψ . (7.177)
χ∈Pure(RE)
Now, observe that any purification of the state ρAE := TrB ρABE in Pure(AÃER) has the
form VEB→ÃR ρABE for some E ∈ CPTP(B → Ã). Therefore, when we add the maximiza-
tion over all E ∈ CPTP(B → Ã) to both sides of the equation above we get
Qmin (A|B)ρ = max F 2 ΩAÃ ⊗ χRE , ψ AÃRE , (7.178)
ψ∈Pure(AÃRE)
ψ AE =ρAE , χ∈Pure(RE)
where on the right-hand side we replaced that maximum over all E ∈ CPTP(B → Ã) with
a maximum over all pure states ψ ∈ Pure(AÃRE) with marginal ψ AE = ρAE . Finally,
applying the Uhlmann’s theorem to the expression above we conclude that
Exercise 7.6.3. Show that Qmin is a convex function. That is, show that for every set of n
bipartite quantum states {ρAB
x }x∈[n] and every p ∈ Prob(n) we have
X X
Qmin (A|B)ρ ⩽ px Qmin (A|B)ρx where ρAB = px ρAB
x . (7.180)
x∈[n] x∈[n]
More generally, the duals of Hα↑ and H̃α↑ can also be computed and they are given by (see
the section ‘Notes and References’ below for more details)
1 1
H̃α↑dual (A|B)ρ = H̃β↑ (A|B)ρ for + = 2 , α, β ∈ [1/2, ∞]
α β (7.181)
H̃αdual (A|B)ρ = Hβ↑ (A|B)ρ for αβ = 1 , α, β ∈ [0, ∞] .
Observe that from the first equality above, by taking α = ∞ (and hence β = 1/2) we get
the statement given in the lemma above that the dual to the optimized conditional min
↑
entropy is H̃1/2 . On the other hand, from the second equality we see that the dual of Hmin
is H0↑ = Hmax (see Definition (7.5.1)). That is, for all ρ ∈ D(AB)
dual
Hmin (A|B)ρ = Hmax (A|B)ρ . (7.182)
We therefore get the following corollary.
Corollary 7.6.1. Let H be a quantum conditional entropy and suppose its dual
Hdual is also a quantum conditional entropy. Then, for all ρ ∈ D(AB)
Remark. Previously, we established that conditional entropies defined as in (7.104) are upper
bounded by the conditional max-entropy. However, the corollary above does not require the
conditional entropy H to be defined with respect to a relative entropy. Instead, it assumes
that the dual entropy Hdual is also a valid conditional entropy. It is worth noting that it
remains an open problem whether this additional assumption can be removed, i.e., whether
the upper bound provided by the conditional max-entropy applies to all conditional entropies
or just to those whose dual is also a conditional entropy.
Proof. Let φ ∈ Pure(ABC) be a purification of ρAB . From the definition of Hdual we get
H(A|B)ρ = −Hdual (A|C)φ
Theorem 7.3.1 applied to Hdual → ⩽ −Hmin (A|C)φ (7.184)
(7.182)→ = Hmax (A|B)ρ .
This completes the proof.
Exercise 7.6.4. Use the lemma above and the relations (7.181) to show that for any φ ∈
Pure(ABC) we have
Hα (A|B)φ + Hβ (A|C)φ = 0 for α + β = 2, α, β ∈ [0, 2]
1 1
H̃α↑ (A|B)φ + H̃β↑ (A|C)φ = 0 for + = 2, α, β ∈ [1/2, ∞] (7.185)
α β
Hα↑ (A|B)φ + H̃β (A|C)φ = 0 for αβ = 1, α, β ∈ [0, ∞] .
Exercise 7.6.5. In the following, use the duality relations above.
1. Show that the dual of Hα is itself a quantum conditional entropy for all α ∈ [0, 2] (i.e.
you need to show the monotonicity property).
2. Show that H̃α↑ is a quantum conditional entropy for all α ∈ [0, ∞] (i.e. you need to
show the additivity property).
3. Show that the dual of H̃α↑ is itself a quantum conditional entropy for all α ∈ [0, ∞].
4. Use part 2 to provide an alternative proof for the additivity of the optimized conditional
min-entropy.
In this context, the G-twirling map acts as a completely randomizing channel, also known
as the completely depolarizing channel.
When a quantum channel N ∈ CPTP(A → B) is applied to both sides of the equation
above, we obtain: Z
dU A N A→B U A ρAE U ∗A = τ B ⊗ ρE
(7.187)
U(A)
where τ B := N A→B uA . The decoupling theorem estimates how closely N A→B U A ρAE U ∗A
(i.e., removing the integral and considering one specific unitary matrix) can approximate the
decoupled state τ A ⊗ ρE . Our discussion begins with a lemma using the square of the
Frobenius norm for this estimation. In this lemma, we utilize the function:
m 2 2
f ω AB := √ Tr ω AB − Tr uA ⊗ ω B
∀ ω ∈ L(AB) , (7.188)
m2 − 1
where m := |A|. To simplify the notation in this section, we will omit the square brackets
AB 2
in hcertain expressions.
i For instance, in the above formula, we used Tr ω instead of
2
Tr ω AB . It’s important to note that with this revised notation, all powers are included
within the trace operation.
Exercise 7.7.1. Let ρ ∈ Pos(AB) (we also assume ρAB is not the zero matrix) and set
m := |A|.
1. Show that
1 Tr(ρAB )2
⩽ ⩽m (7.189)
m Tr(ρA )2
h i
Hint: Start by showing Tr(ρB )2 = Tr ρAB ⊗ I à I A ⊗ ρÃB and then use the
Cauchy-Schwarz inequality. For the other side, show first that ρAB ⩽ mI A ⊗ ρB .
Lemma 7.7.1. Let ρ ∈ L(AE), m := |A|, N ∈ L(A → B), and τ AB := m1 JNAB , where
JNAB is the Choi matrix of N A→B .
Z 2
∗
dU A Tr N A→B U A ρAE U A − τ B ⊗ ρE = f ρAE f τ AB ,
(7.191)
U(A)
Remark. Observe that we do not assume that ρAE is a density matrix (not even Hermitian)
nor that N A→B is a quantum channel (just a linear map). However, if ρAE ⩾ 0 we can use
the bound (7.190) in conjunction with the lemma above to get the relatively simple upper
bound Z
2 2 2
dU A NUA→B ρAE − τ B ⊗ ρE 2 ⩽ Tr ρAE Tr τ AB ,
(7.192)
U(A)
where NUA→B := N A→B ◦ U A→A , with U A→A (·) := U A (·)U ∗A , and we used the fact that any
Hermitian matrix η ∈ Herm(BE) satisfies ∥η∥22 = Tr[η 2 ]. Moreover, taking the square root
on both sides of the equation above and using Jensen’s inequality (see Sec. B.4) we obtain
that sZ
q
2
Tr (ρAE )2 Tr (τ AB )2 ⩾ dU A ∥NUA→B (ρAE ) − τ B ⊗ ρE ∥2
U(A)
Z (7.193)
Jensen′ s Inequality→ ⩾ dU A NUA→B ρ AE
− τ B ⊗ ρE
2
U(A)
Finally, it’s important to recognize that since the average of the integrand in the above
equation is less than the expression on the
left-hand side, it implies the existence
AE of at least
AB
A A→B AE B E
one unitary U for which NU ρ − τ ⊗ ρ 2 is smaller than Tr ρ Tr τ .
Proof. For simplicity of the exposition we will omit the superscript from NUA→B and simply
write it as NU . With these notations, the integrand of (7.191) can be decomposed into three
terms:
2
Tr NU ρAE − τ B ⊗ ρE
2 2 (7.194)
= Tr NU ρAE − 2Tr τ B ⊗ ρE NU ρAE + Tr τ B ⊗ ρE .
From (7.187), the integral of the second term above can be simplified as
Z
2
dU Tr τ B ⊗ ρE NU ρAE = Tr τ B ⊗ ρE .
(7.195)
U(A)
Therefore, taking the integral over U(A) on both sides of (7.194) gives
Z
2
dU Tr NU ρAE − τ B ⊗ ρE
U(A)
Z 2 2 2 (7.196)
AE
= dU Tr NU ρ − Tr τ B Tr ρE .
U(A)
To compute the remaining integral we use a linearization technique that is based on Exer-
cise 3.5.28. That is, we linearize the square in the integrand by using Exercise 3.5.28 with
the flip operator F B B̃E Ẽ = F B B̃ ⊗ F E Ẽ . Explicitly,
AE
2 h
AE
AE
B B̃E Ẽ i
Tr NU ρ = Tr NU ρ ⊗ NU ρ F
h ⊗2 i
= Tr NU⊗2 ρAE F B B̃E Ẽ (7.197)
h ⊗2 i
= Tr ρAE NU∗⊗2 F B B̃ ⊗ F E Ẽ
.
Taking the integral over U(A) on both sides and using the fact that NU∗ = U ∗ ◦ N ∗ we obtain
Z
AE
2 h
AE ⊗2 ∗⊗2 B B̃
E Ẽ
i
dU Tr NU ρ = Tr ρ G N F ⊗F , (7.198)
U(A)
Next, we make use of the fact that the twirling channel turns states to symmetric ones
(see (3.247)). Specifically, observe that from (3.247) we get
∗⊗2 B B̃
G N F = aI AÃ + bF AÃ . (7.200)
where the coefficients a, b ∈ R will be computed shortly using (3.247). Substituting (7.200)
into (7.198) gives
Z 2 h ⊗2 AÃ i
dU Tr NU ρAE = Tr ρAE aI + bF AÃ ⊗ F E Ẽ
U(A)
(7.201)
h ⊗2 E Ẽ i h ⊗2 AÃE Ẽ i
= aTr ρE F + bTr ρAE F
2 2
(3.248)→ = aTr ρE + bTr ρAE .
It is therefore left to compute the coefficients a and b. From (3.247) they can be expressed
as: h i h AÃ i
∗⊗2 B B̃ ∗⊗2 B B̃
mTr N F − Tr N F F
a := (7.203)
m(m2 − 1)
and h i h i
mTr N ∗⊗2 F B B̃ F AÃ − Tr N ∗⊗2 F B B̃
b := . (7.204)
m(m2 − 1)
To simplify the expressions above we use the definition of the adjoint map to get
h i h ⊗2 B B̃ i
Tr N ∗⊗2 F B B̃ = Tr N (I A ) F
h i
2 B ⊗2 B B̃
N I A
= JN = mτ −−−−→ = m Tr
B B
τ F (7.205)
h 2 i
2
τB
h i
F B := TrB̃ F B B̃ = I B −−−−→ = m Tr ,
and h i h i
Tr F AÃ N ∗⊗2 F B B̃ = Tr F B B̃ N ⊗2 F AÃ . (7.206)
⊗2
Moreover, since the Choi matrix of N ⊗2 is given by m2 τ AB we get
h i h h ⊗2 AÃ ii
Tr F B B̃ N ⊗2 (F AÃ ) = m2 Tr F B B̃ TrAÃ τ AB F ⊗ I B B̃
h ⊗2 AÃ i
= m2 Tr τ AB F ⊗ F B B̃ (7.207)
(3.248)→ = m2 Tr (τ AB )2 .
Exercise 7.7.2. Demonstrate clearly that substituting the expressions in the proof above for
B 2
a − Tr τ and b into (7.202) results in the equality (7.191).
Exercise 7.7.3. Using the same notations as in the lemma above, with ρ ∈ D(AE) and
N ∈ CP(A → B), show that for all σ ∈ D(E),
Z 2
A A→B A AE
A ∗ B E
⩾ f ρAE f τ AB ,
dU Tr N U ρ U −τ ⊗σ (7.210)
U(A)
Therefore, working with this expression for the twirling map, we obtain the following corol-
lary.
Decoupling Theorem
1
Theorem 7.7.1. Let ρ ∈ D⩽ (AE), N ∈ CP(A → B), and τ AB := |A| JNAB , where
JNAB is the Choi matrix of N A→B . Then,
Z
↑ ↑
∗ 1
dU A N A→B U A ρAE U A − τ B ⊗ ρE ⩽ 2− 2 H̃2 (A|E)ρ +H̃2 (A|B)τ . (7.214)
U(A) 1
Proof. In the first step of the proof we upper bound the trace norm with the Hilbert Schmidt
norm. Working with the Frobenius norm we will be able to use (7.193). From the third part
of Exercise 5.4.2 it follows that for any matrix M ∈ Herm(A) and σ ∈ Pos(A)
p
∥M ∥1 ⩽ Tr[σ] σ −1/4 M σ −1/4 2 . (7.215)
Taking σ = η B ⊗ ζ E ∈ D(BE) and
M = NU ρAE − τ B ⊗ ρE
(7.216)
gives
NU ρAE − τ B ⊗ ρE 1
− 1 − 1 (7.217)
⩽ η B ⊗ ζ E 4 NU ρAE − τ B ⊗ ρE η B ⊗ ζ E 4
2
B − 41 1
B E
The choice of η and ζ will be made later. Denoting by := (η ) ÑUA→B (·) NUA→B (·)(η B )− 4 ,
AE := E − 14 AE E − 41 AB := 1 AB
ρ̃ (ζ ) ρ (ζ ) , and τ̃ J , we get
m Ñ
Exercise 7.7.5. Use Theorem (7.7.1) and Corollary (7.7.1) to prove the corollary above.
Exercise 7.7.6. Show that if ω AB := 1t JNAB , where t := Tr JNAB (i.e. ω AB = |A|
t
τ AB is a
AB := 1 AB AB
density matrix), and similarly, σ r
ρ with r := Tr ρ then the decoupling theorem
above can be expressed as
Z
rt − 12 H̃2↑ (A|E)σ +H̃2↑ (A|B)ω
∗
dU A N A→B U A ρAE U A − τ B ⊗ ρE ⩽ 2 . (7.222)
U(A) 1 |A|
As the dimension of a physical system grows, one can employ several tools from probability
theory and statistics (e.g. the law of large numbers), to study its behaviour and properties.
Specifically, one of the main goals of quantum resource theories is to determine the rate
at which many copies of one resource can be converted into many copies of another. The
methods and tools developed here provide the foundations for several topics in this asymp-
totic domain. We start by reviewing some of these concepts and their generalizations to the
quantum world.
373
374 CHAPTER 8. THE ASYMPTOTIC REGIME
Pr (X n = xn ) ≈ 2−nH(X) , (8.3)
Despite the variety of typical sequences xn , they all share approximately the same probability
of occurrence. This effect is known as the asymptotic equipartition property, a direct result
of the (weak) law of large numbers.
Since we only consider in this book sets with finite cardinality, these conditions will trivially
hold.
1
PThe law above is very intuitive as it shows that for very large n, the probability that
n j∈[n] Xj is close to E(X) is almost one. In particular, (8.6) is equivalent to the statement
that
1 X
lim Xj = E(X) in probability. (8.7)
n→∞ n
j∈[n]
E(S_n²) = (1/n²) Σ_{j,k=1}^n E(X_j X_k) .   (8.8)

A key observation is that for j ≠ k, the two random variables X_j and X_k are independent, and consequently

E(X_j X_k) = Σ_{x_j,x_k∈X} x_j x_k p_{x_j} p_{x_k} = E(X_j) E(X_k) = 0 .   (8.9)

Therefore, the only contributing terms in (8.8) are those with j = k. Hence,

E(S_n²) = (1/n²) Σ_{j∈[n]} E(X_j²) = (1/n) E(X²) .   (8.10)
The above equation already demonstrates that for very large n the variance of S_n is very small, indicating that S_n concentrates around a single value in the limit n → ∞. On the other hand, E(S_n²) can be split into two terms, those for which the value of S_n is close to zero and
Hoeffding's Inequality

Theorem 8.1.2. Let X₁, …, Xₙ be n independent random variables satisfying a_j ⩽ X_j ⩽ b_j for all j = 1, …, n. Then,

Pr( (1/n) Σ_{j∈[n]} (X_j − E(X_j)) > ε ) ⩽ exp( − 2n²ε² / Σ_{j∈[n]} (b_j − a_j)² ) .   (8.14)
Note that Hoeffding’s inequality above does not assume that the random variables X1 , . . . , Xn
are identically distributed. If we add this assumption (so that X1 , . . . , Xn are i.i.d.), then
we get a simplified version of Hoeffding’s inequality given by
Pr( (1/n) Σ_{j∈[n]} X_j − E(X) > ε ) ⩽ exp( − 2nε²/(b − a)² ) .   (8.15)
Hoeffding’s Lemma
Lemma 8.1.1. Let X be a real valued bounded random variable with expected value
E(X) = µ and a ⩽ X ⩽ b for some a, b ∈ R with b > a. Then, for all t ∈ R we have
E[e^{tX}] ⩽ exp( tµ + t²(b − a)²/8 ) .   (8.16)
Proof. Consider first the case µ = 0. We therefore must have a ⩽ 0 ⩽ b. Also, it's important to note that if a = 0, then the condition E(X) = 0 leads to E[e^{tX}] = 1 (can you see why?). As a result, the inequality (8.16) is valid under these circumstances. Therefore, we will proceed with the assumption that a < 0. The convexity of the function f(x) := e^{tx} implies that for any a ⩽ x ⩽ b we have
e^{tx} ⩽ ((b − x)/(b − a)) e^{ta} + ((x − a)/(b − a)) e^{tb} ,   (8.17)

where we wrote x as the convex combination x = a (b − x)/(b − a) + b (x − a)/(b − a). The key idea of the inequality
. The key idea of the inequality
above is that the right-hand side depends linearly on x. Applying this inequality to the
random variable X we get

E[e^{tX}] ⩽ ((b − E(X))/(b − a)) e^{ta} + ((E(X) − a)/(b − a)) e^{tb}
µ = 0 →  = (b/(b − a)) e^{ta} − (a/(b − a)) e^{tb}   (8.18)
c := −a/(b − a) →  = (1 − c + c e^{t(b−a)}) e^{ta} .

Note that c > 0 since a < 0. Finally, denoting s := t(b − a), the right-hand side of the equation above becomes equal to

(1 − c + c e^s) e^{−cs} = e^{f(s)} ,  where  f(s) := −cs + log(1 − c + c e^s) ,   (8.19)

and we used the equality e^{ta} = e^{−cs}. Consider the Taylor expansion of f(s) up to its second order

f(s) = f(0) + s f′(0) + ½ s² f″(q) ,   (8.20)

where q is some real number between zero and s. By a straightforward calculation, we get that f(0) = f′(0) = 0 and f″(q) ⩽ 1/4 (see Exercise 8.1.2 for more details). Combining everything we conclude that

E[e^{tX}] ⩽ e^{f(s)} = e^{½ s² f″(q)} ⩽ e^{s²/8} = e^{t²(b−a)²/8} .   (8.21)

This completes the proof for the case µ = 0. The proof for the case µ ≠ 0 is obtained immediately by defining X̃ := X − µ and applying the lemma to X̃ (see Exercise 8.1.3).
Exercise 8.1.2. Show that f″(q) ⩽ 1/4. Hint: Calculate the second derivative f″(q) and show that it can be expressed as p(1 − p) for some number p > 0 (that depends on q), and use the fact that p(1 − p) ⩽ 1/4.
Exercise 8.1.3. Show that the proof for the case µ ̸= 0 in the lemma above follows imme-
diately by defining X̃ := X − µ and applying the theorem for X̃.
= e^{−rt} E[ ∏_{j=1}^n e^{r(X_j − E(X_j))} ]
{X_j} are independent →  = e^{−rt} ∏_{j=1}^n E[ e^{r(X_j − E(X_j))} ]   (8.22)
Hoeffding's Lemma →  ⩽ e^{−rt} ∏_{j=1}^n e^{(1/8) r²(b_j − a_j)²}
= e^{g(r)} ,

where g(r) := −rt + (1/8) r² Σ_{j∈[n]} (b_j − a_j)² is a quadratic function whose minimum yields the right-hand side of (8.14).
Exercise 8.1.4. Show that the minimum of the function g(r) above is given by the right-hand
side of (8.14).
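As an illustration (not part of the original development), the short Python sketch below estimates the deviation probability for i.i.d. Bernoulli(1/2) variables and compares it with the one-sided Hoeffding bound of (8.15); NumPy is assumed to be available and the parameters are chosen only for demonstration.

    import numpy as np

    # Numerical check of Hoeffding's inequality (8.15) for i.i.d. Bernoulli(1/2)
    # variables, for which a = 0 and b = 1, so the bound reads exp(-2 n eps^2).
    rng = np.random.default_rng(0)
    n, eps, trials = 200, 0.1, 100_000

    samples = rng.integers(0, 2, size=(trials, n))     # X_1, ..., X_n in {0, 1}
    deviation = samples.mean(axis=1) - 0.5             # (1/n) sum_j X_j - E(X)
    empirical = np.mean(deviation > eps)               # empirical tail probability
    bound = np.exp(-2 * n * eps**2)                    # Hoeffding bound

    print(f"empirical Pr(deviation > {eps}) = {empirical:.5f}")
    print(f"Hoeffding bound                 = {bound:.5f}")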
Typical Sequence
Definition 8.1.1. Let ε > 0 and let X be a random variable with cardinality
|X | = m, corresponding to an i.i.d. source. A sequence of n source outputs
xn := (x1 , . . . , xn ) is called ε-typical if
By taking the log on all sides of (8.23), the condition in (8.23) can be re-expressed as

| (1/n) log₂(1/Pr(X^n = x^n)) − H(X) | ⩽ ε .   (8.24)
More generally, for any set of sequences K_n ⊆ [m]^n we will use the notation Pr(K_n) to denote the probability that a sequence belongs to K_n. That is,

Pr(K_n) := Σ_{x^n∈K_n} Pr(X^n = x^n) .   (8.27)
c := 2 / log²(p_max/p_min) > 0 ,   (8.28)

where p_min > 0 and p_max are the smallest and largest positive (i.e., non-zero) components of p := (p₁, …, p_m)ᵀ.
Theorem 8.1.3. Let p ∈ Prob(m), ε ∈ (0, 1), δ_n := e^{−cε²n} where c is defined in (8.28), X be a random variable associated with an i.i.d.∼p source, and for each n ∈ N, let K_n ⊆ [m]^n be a set of sequences with cardinality |K_n| ⩽ 2^{nr} for some r < H(X). Then, for all n ∈ N the following three inequalities hold:

1. Pr(Tε(X^n)) > 1 − δ_n .
Proof. For the first inequality, we assume without loss of generality that p > 0, since any x with p_x = 0 never occurs and can be removed from the alphabet of X. Let Y := − log₂ p_X be the random variable whose alphabet symbols are given by Y := {− log₂ Pr(X = x)}_{x∈[m]}, with corresponding probabilities p_x := Pr(X = x). Let Y₁, Y₂, … be an i.i.d. sequence of random variables where each Y_j corresponds to X_j as above. By definition, each Y_j satisfies − log p_max ⩽ Y_j ⩽ − log p_min. Therefore, from Hoeffding's inequality, particularly (8.15), we get that

Pr( (1/n) Σ_{j∈[n]} Y_j − E(Y) > ε ) ⩽ e^{−cε²n} .   (8.29)

Observe that

E(Y) = − Σ_{x∈[m]} p_x log₂(Pr(X = x)) = − Σ_{x∈[m]} p_x log₂ p_x = H(X) .   (8.30)

Moreover,

(1/n) Σ_{j∈[n]} Y_j = −(1/n) Σ_{j∈[n]} log₂ Pr(X_j) = −(1/n) log₂ Pr(X^n) = (1/n) log₂(1/Pr(X^n)) .   (8.31)
The equation above, together with (8.29) and (8.30), states that the probability that the random variable X^n = (X₁, …, Xₙ) is not an ε-typical sequence is no greater than e^{−cε²n}. This completes the proof of the first part of the theorem.
For the second inequality, we get from the definition of ε-typical sequences that

1 ⩾ Σ_{x^n∈Tε(X^n)} Pr(X^n = x^n) ⩾ Σ_{x^n∈Tε(X^n)} 2^{−n(H(X)+ε)} = |Tε(X^n)| 2^{−n(H(X)+ε)} .   (8.33)

Therefore,

|Tε(X^n)| ⩽ 2^{n(H(X)+ε)} .   (8.34)

On the other hand, observe that from the first part and the definition of ε-typical sequences we get

1 − δ_n ⩽ Σ_{x^n∈Tε(X^n)} Pr(X^n = x^n) ⩽ Σ_{x^n∈Tε(X^n)} 2^{−n(H(X)−ε)} = |Tε(X^n)| 2^{−n(H(X)−ε)} .   (8.35)

Hence,

(1 − δ_n) 2^{n(H(X)−ε)} ⩽ |Tε(X^n)| .   (8.36)
For the last part of the proof (i.e., the third inequality), let 0 < ε′ < ½(H(X) − r). The probability of K_n can be expressed as:

Σ_{x^n∈K_n} Pr(X^n = x^n) = Σ_{x^n∈K_n∩T_{ε′}(X^n)} Pr(X^n = x^n) + Σ_{x^n∈K_n, x^n∉T_{ε′}(X^n)} Pr(X^n = x^n) .   (8.37)

From the first part of the theorem, the last term cannot exceed δ′_n := e^{−cε′²n}, so that

Σ_{x^n∈K_n} Pr(X^n = x^n) ⩽ Σ_{x^n∈K_n∩T_{ε′}(X^n)} Pr(X^n = x^n) + δ′_n
x^n is ε′-typical →  ⩽ 2^{−n(H(X)−ε′)} |K_n| + δ′_n
|K_n| ⩽ 2^{nr} →  ⩽ 2^{−n(H(X)−ε′−r)} + δ′_n   (8.38)
ε′ < ½(H(X)−r) →  ⩽ 2^{−(n/2)(H(X)−r)} + δ′_n .

Since both δ′_n and 2^{−(n/2)(H(X)−r)} decrease exponentially fast to zero, there exists c′ > 0 sufficiently small such that Pr(K_n) ⩽ e^{−c′n}. This completes the proof.
It's also pertinent to mention that (8.32) can be expressed equivalently as:

(1/n) log₂(1/Pr(X^n)) → H(X)  in probability as n → ∞ .   (8.39)

This expression is the exact formulation of the asymptotic equipartition property, which will be examined in greater detail later in the book.
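For very small alphabets the typical set can be enumerated exhaustively, which gives a concrete feel for Theorem 8.1.3. The following Python sketch (NumPy assumed; the source distribution is an illustrative choice) computes Pr(Tε(X^n)) and |Tε(X^n)| for a binary source and compares the cardinality with the bounds 2^{n(H(X)±ε)}.

    import itertools
    import numpy as np

    # Weak typicality for a small i.i.d. source (exhaustive enumeration is only
    # feasible for small n and alphabet size m).
    p = np.array([0.2, 0.8])          # illustrative source distribution, m = 2
    H = -np.sum(p * np.log2(p))       # Shannon entropy H(X)
    n, eps = 12, 0.1

    typ_prob, typ_count = 0.0, 0
    for xn in itertools.product(range(2), repeat=n):
        prob = np.prod(p[list(xn)])
        # epsilon-typicality condition (8.24)
        if abs(-np.log2(prob) / n - H) <= eps:
            typ_prob += prob
            typ_count += 1

    print(f"H(X) = {H:.4f},  Pr(T_eps) = {typ_prob:.4f}")
    # (8.34)/(8.36): Pr(T_eps) * 2^{n(H-eps)} <= |T_eps| <= 2^{n(H+eps)}
    print(f"|T_eps| = {typ_count},  bounds: "
          f"[{typ_prob * 2**(n*(H-eps)):.1f}, {2**(n*(H+eps)):.1f}]")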
Exercise 8.1.5. Using the same notations as in the theorem above, show that if instead of
Hoeffding’s inequality we use the law of large numbers (i.e., Theorem 8.1.1) then we can still
show that for any δ > 0 and sufficiently large n ∈ N
Exercise 8.1.6 (Variant of Part 3 of Theorem 8.1.3). Prove the following variant of part
3 of the theorem above: Let r < H(X) and let {Kn }n∈N be sets of sequences of size n, and
suppose for each a ∈ N there exists n > a such that |Kn | ⩽ 2nr . Then, for any δ > 0 and
every b ∈ N there exists n > b such that Pr(Kn ) ⩽ δ.
Proof. Suppose r > H(X). We need to show that there exists a reliable compression scheme of rate r. Let δ > 0 and let ε > 0 be such that r > H(X) + ε. Then, from the first part of Theorem 8.1.3, for all n ∈ N we have Pr(Tε(X^n)) ⩾ 1 − e^{−cε²n}, where c > 0 is the constant defined in (8.28). Let k ∈ {1, 2, …, |Tε(X^n)|} be the index labeling all the ε-typical sequences in Tε(X^n). We assume that Alice and Bob agreed on this ordering beforehand. Define the compression map C : X^n → {0,1}^m, with m := ⌈log₂ |Tε(X^n)|⌉, as follows. If x^n is the k-th sequence of Tε(X^n) then C(x^n) is the binary representation of k. If x^n ∉ Tε(X^n) then C(x^n) = (0, …, 0); i.e. if Bob receives the zero sequence he knows there is an error.
Now, from the second part of the theorem of typical sequences we know that
|Tε (X n )| ⩽ 2n(H(X)+ε) < 2nr . (8.42)
Therefore, for large enough n, the sequence y m = C(xn ) is of size m = ⌈log2 |Tε (X n )|⌉ ⩽ nr.
The decoding scheme D : {0, 1}m → X n is defined as follows. If y m is the zero sequence
Bob declares an error. Otherwise, if y m is the binary representation of k, then D(y m ) = xn
with x^n being the k-th sequence of Tε(X^n). It is left to show that the success probability goes to one in the asymptotic limit n → ∞. Indeed, by construction,

Pr(Z^n = x^n | X^n = x^n) = { 0  if x^n is not ε-typical ;  1  if x^n is ε-typical } .   (8.43)
Therefore,

Pr(Z^n = X^n) = Σ_{x^n∈X^n} Pr(X^n = x^n) Pr(Z^n = x^n | X^n = x^n)
(8.43)→  = Σ_{x^n∈Tε(X^n)} Pr(X^n = x^n)
          = Pr(Tε(X^n)) ⩾ 1 − δ_n .
Since limn→∞ δn = 0 we conclude that limn→∞ Pr(Z n = X n ) = 1. Hence, the compression-
decompression scheme above is reliable.
Conversely, suppose there exists a compression-decompression scheme of rate r < H(X). Then, there are at most 2^{nr} possible outputs for D(y^m). Consequently, the set
Kn := { xn : D (C(xn )) = xn } , (8.44)
satisfies |Kn | ⩽ 2nr for all n. From the third part of Theorem 8.1.3 we get that limn→∞ Pr(Kn ) =
0. Hence, a compression-decompression scheme of rate r < H(X) cannot be reliable. This
completes the proof.
Note that in the proof above we showed that if r < H(X), then not only is the scheme not reliable, but in fact the probability that Z^n = X^n goes to zero; i.e. lim_{n→∞} Pr(K_n) = 0. In other words, the error probability goes to one. This type of behaviour is known in classical and quantum Shannon theory as the strong converse, whereas the weak converse corresponds to a proof in which the error probability is shown to be bounded away from zero as n goes to infinity (but does not necessarily go to one).
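The typical-set compression scheme used in the proof above is easy to prototype. The sketch below (Python with NumPy; the source parameters and the extra reserved "error" codeword are illustrative assumptions, not the book's exact construction) enumerates the typical set, assigns each typical sequence a binary index, and reserves the all-zero string for errors.

    import itertools
    import numpy as np

    p = np.array([0.11, 0.89])                 # illustrative binary source
    H = -np.sum(p * np.log2(p))
    n, eps = 14, 0.1

    # Enumerate the eps-typical set and fix an ordering agreed on in advance.
    typical = [xn for xn in itertools.product(range(2), repeat=n)
               if abs(-np.log2(np.prod(p[list(xn)])) / n - H) <= eps]
    index = {xn: k for k, xn in enumerate(typical, start=1)}   # k = 1, 2, ...
    m_bits = int(np.ceil(np.log2(len(typical)))) + 1           # one extra bit so 0 can flag errors

    def compress(xn):
        # Binary representation of the index k, or the all-zero string if not typical.
        return format(index.get(xn, 0), f"0{m_bits}b")

    def decompress(ym):
        k = int(ym, 2)
        return None if k == 0 else typical[k - 1]   # None signals a decoding error

    rng = np.random.default_rng(1)
    xn = tuple(rng.choice(2, size=n, p=p))
    print("rate m/n =", round(m_bits / n, 3), " vs H(X) =", round(H, 3))
    print("decoded correctly:", decompress(compress(xn)) == xn)

For the small n used here the rate is still noticeably above H(X); the gap closes only as n grows, in line with (8.42).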
In contrast to classical sequences y n := (y1 , . . . , yn ), the quantum sequence above may not
be distinguishable, as the states {|ϕy ⟩}y∈[k] of the source are not orthogonal in general.
Imagine that Alice wishes to transmit the aforementioned state |ϕyn ⟩ to Bob. If Alice is
aware of the value of y n , she can employ Shannon’s compression coding to send y n to Bob
over a classical channel at a rate of H(Y ) (meaning the transmission of each source symbol
incurs a cost of H(Y )[c → c]). Upon receiving y n , Bob can recreate the state |ϕyn ⟩ (assuming
Bob knows the quantum source). However, as we will discuss, if Alice and Bob have access to
noiseless quantum channels, not only is it unnecessary for Alice to know y n , but she can also
transmit the state |ϕyn ⟩ to Bob more efficiently! Specifically, each source state transmission
costs H(A)ρ [q → q], where H(A)ρ := −Tr[ρA log ρA ] denotes the von Neumann entropy of
the state X
ρA := qy |ϕy ⟩⟨ϕy |A . (8.46)
y∈[k]
Observe that from the upper bound in (7.120) with ρx replaced by the pure state ϕy and p
replaced by q, we get that H(A)ρ ⩽ H(q) = H(Y ).
Without the classical knowledge of y, the quantum source generates the state ρ as men-
tioned above at each usage. Thus, we will denote by i.i.d.∼ ρ, an i.i.d. quantum source drawn
from an ensemble of states whose average state, as described in (8.46), is ρ. Additionally,
after n uses of the source, the produced state is:
ρ ⊗ ρ ⊗ · · · ⊗ ρ := ρ⊗n . (8.47)
While we assume Alice lacks access to the classical register y of the source, it’s plausible
that this value is recorded in some register system R. If R is classical, each source use
generates the cq-state:
ρ^{RA} := Σ_{y∈[k]} q_y |y⟩⟨y|^R ⊗ ϕ^A_y .   (8.48)

Alternatively, if the register system R is quantum, then each use of the source produces the state

|ψ^{RA}⟩ = Σ_{y∈[k]} √q_y |y⟩^R |ϕ_y⟩^A .   (8.49)
Both the classical and quantum register systems record the value of y. However, without access to R, Alice and Bob cannot distinguish the source {q_y, ϕ_y}_{y∈[k]} from another source {r_z, ψ_z}_{z∈[ℓ]} whose average state satisfies Σ_{z∈[ℓ]} r_z ψ_z = ρ.
Although we assume that Alice and Bob do not have access to system R, it is necessary to
think about the quantum source with a recording system. Otherwise, without the knowledge
of y n , the states |ϕyn ⟩ become equivalent to ρ⊗n so that Bob, in principle, can prepare any
number of copies of ρ without any communication from Alice. However, if other parties have
access to y n , then they can verify that the state ρ⊗n that Bob prepared is not the original
state |ϕyn ⟩ that Alice intended to send.
Note that by applying the completely dephasing map ∆^R on system R, the entangled state |ψ^{RA}⟩ in (8.49) becomes the cq-state in (8.48). This demonstrates that taking the register to be quantum is more general, and we therefore adopt the entangled description |ψ^{RA}⟩ of a quantum source. Note that in this picture, after n uses of the source, Alice shares with the register the state

|ψ^{R^n A^n}⟩ := |ψ^{RA}⟩^{⊗n} .   (8.50)
In the quantum version of the compression scheme discussed above, the task of Alice is to
transfer her system An to Bob using the smallest possible number of noiseless qubit channels
[q → q]. We postpone the full details of this task to volume 2 of the book where we study
quantum Shannon theory in more details.
where {px } are the eigenvalues of ρ and {|x⟩} are the corresponding eigenvectors. We define
the classical system (random variable) X to have alphabet symbols x ∈ [m] corresponding
to a probability distribution {px }. We point out that the alphabet symbols of the system Y
that we discussed above corresponds to a different probability distribution {qy }.
Now, observe that the state

ρ^{⊗n} = Σ_{x₁∈[m]} Σ_{x₂∈[m]} ⋯ Σ_{xₙ∈[m]} p_{x₁} p_{x₂} ⋯ p_{xₙ} |x₁⋯xₙ⟩⟨x₁⋯xₙ|^{A^n}
       := Σ_{x^n∈[m]^n} p_{x^n} |x^n⟩⟨x^n|^{A^n} ,   (8.52)
where |xn ⟩ := |x1 · · · xn ⟩ and pxn := px1 px2 · · · pxn . We use the notation Tε (X n ) to denote
the set of all ε-typical sequences xn with respect to a classical system X corresponding to
an i.i.d.∼ p source. It is important to note that the components of the vector p ∈ Prob(m)
are the eigenvalues of ρ. With this notation, for every such i.i.d.∼ ρA source, we define a
corresponding typical subspace
Tε (An ) := span {|x1 · · · xn ⟩ : xn ∈ Tε (X n )} ⊆ An , (8.53)
and a typical projection

Π^n_ε := Σ_{x^n∈Tε(X^n)} |x^n⟩⟨x^n| .   (8.54)
Theorem 8.2.1. Let ρ ∈ D(A), ε ∈ (0, 1), δ_n := e^{−cε²n} where c is defined in (8.28), and let Tε(A^n) and Π^n_ε be the typical subspaces and projections associated with a quantum i.i.d.∼ρ source. Further, for each n ∈ N, let P_n ∈ Pos(A^n) be an orthogonal projection onto a subspace of dimension Tr[P_n] ⩽ 2^{nr} for some r < H(A)_ρ (r is independent of n). Then, for all n ∈ N the following three inequalities hold:

1. Tr[Π^n_ε ρ^{⊗n}] ⩾ 1 − δ_n .   (8.55)
2. (1 − δ_n) 2^{n(H(A)_ρ−ε)} ⩽ Tr[Π^n_ε] ⩽ 2^{n(H(A)_ρ+ε)} .   (8.56)
3. Tr[P_n ρ^{⊗n}] ⩽ e^{−c′n} for some c′ > 0 .   (8.57)
Proof. The proofs of the first two parts of the theorem follow from their classical counterparts. Particularly, for the first part

Tr[Π^n_ε ρ^{⊗n}] = Σ_{x^n∈Tε(X^n)} p_{x^n} = Pr(Tε(X^n)) ⩾ 1 − δ_n .   (8.58)

For the second part,

Tr[Π^n_ε] = dim(Tε(A^n)) = |Tε(X^n)| .   (8.59)

Therefore, this part follows as well from its classical counterpart. It is therefore left to prove the third part.
We first split the trace into two parts:

Tr[P_n ρ^{⊗n}] = Tr[P_n ρ^{⊗n} Π^n_ε] + Tr[P_n ρ^{⊗n}(I − Π^n_ε)] .   (8.60)
Our objective is to show that both of these terms are going to zero as n → ∞. For the first one,

Tr[P_n ρ^{⊗n} Π^n_ε] = Σ_{x^n∈Tε(X^n)} p_{x^n} ⟨x^n|P_n|x^n⟩
p_{x^n} ⩽ 2^{−n(H(A)−ε)} →  ⩽ 2^{−n(H(A)−ε)} Σ_{x^n∈Tε(X^n)} ⟨x^n|P_n|x^n⟩
                             ⩽ 2^{−n(H(A)−ε)} Tr[P_n]   (8.61)
Tr[P_n] ⩽ 2^{nr} →  ⩽ 2^{−n(H(A)−ε−r)} → 0  as n → ∞ ,

where we assumed that ε > 0 is small enough so that H(A) − r > ε. For the second term,

Tr[P_n ρ^{⊗n}(I − Π^n_ε)] = Σ_{x^n∉Tε(X^n)} p_{x^n} ⟨x^n|P_n|x^n⟩ ⩽ Σ_{x^n∉Tε(X^n)} p_{x^n} → 0  as n → ∞ .   (8.62)
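Since the typical projector is built entirely from the classical typical set of the eigenvalue distribution, it is straightforward to construct numerically. The sketch below (Python with NumPy; the qubit state and the small n are illustrative, so the inequalities of Theorem 8.2.1 are satisfied but loose) builds Π^n_ε for a qubit source and evaluates Tr[Π^n_ε ρ^{⊗n}] and Tr[Π^n_ε].

    import itertools
    import numpy as np
    from functools import reduce

    rho = np.array([[0.7, 0.2], [0.2, 0.3]])     # illustrative qubit state
    evals, evecs = np.linalg.eigh(rho)           # p_x and |x>
    H = -np.sum(evals * np.log2(evals))
    n, eps = 8, 0.25

    dim = 2**n
    Pi = np.zeros((dim, dim))
    for xn in itertools.product(range(2), repeat=n):
        prob = np.prod(evals[list(xn)])
        if abs(-np.log2(prob) / n - H) <= eps:                # x^n is eps-typical
            vec = reduce(np.kron, [evecs[:, x] for x in xn])  # |x^n> = |x_1>...|x_n>
            Pi += np.outer(vec, vec)

    rho_n = reduce(np.kron, [rho] * n)
    print("Tr[Pi rho^n] =", round(np.trace(Pi @ rho_n), 4))
    print("Tr[Pi] =", int(round(np.trace(Pi))),
          " vs 2^{n(H-eps)}, 2^{n(H+eps)} =",
          (round(2**(n*(H-eps)), 1), round(2**(n*(H+eps)), 1)))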
| D(p∥q) − (1/n) log(p_{x^n}/q_{x^n}) | ⩽ ε .   (8.63)
We saw in Sec. 8.1.1 that given an i.i.d.∼ p source, all typical sequences with large size n,
that are generated by the source, have approximately the same probability to occur given by
≈ 2−nH(p) . This phenomenon was dubbed as the asymptotic equipartition property (AEP).
Here we study a variant of this property as described in the following theorem.
1. Pr(T^rel_ε(X^n)) > 1 − δ_n .
2. (1 − δ_n) 2^{n(D(p∥q)−ε)} ⩽ |T^rel_ε(X^n)| ⩽ 2^{n(D(p∥q)+ε)} .
3. Pr(K_n) ⩽ e^{−c′n} , for some c′ > 0 .
Remark. Roughly speaking, the above theorem indicates that almost all sequences x^n ∈ [m]^n have the same ratio q_{x^n}/p_{x^n} ≈ 2^{−nD(p∥q)}. Observe also that although we consider two probability distributions p and q, the sequences X₁, X₂, … are drawn from a single p-source.
Proof. Due to the similarity of this theorem to Theorem 8.1.3, we provide here only the proof of the first inequality, leaving the remaining two inequalities as an exercise for the reader. Without loss of generality suppose q > 0 and let Y := log(p_X/q_X) be the random variable with alphabet {log(p_x/q_x)}_{x∈[m]} and corresponding probabilities p = (p₁, …, p_m)ᵀ. Consider the sequence X^n = (X₁, X₂, …, Xₙ) drawn from an i.i.d.∼p source, and for each j ∈ [n] let Y_j := log(p_{X_j}/q_{X_j}).
Therefore, since

(1/n) log(p_{X^n}/q_{X^n}) = (1/n) log( (p_{X₁}⋯p_{Xₙ}) / (q_{X₁}⋯q_{Xₙ}) ) = (1/n) Σ_{j∈[n]} log(p_{X_j}/q_{X_j}) ,   (8.66)

we conclude that

Pr(T^rel_ε(X^n)) = Pr( | (1/n) log(p_{X^n}/q_{X^n}) − D(p∥q) | < ε )
                 = Pr( | (1/n) Σ_{j∈[n]} Y_j − E(Y) | < ε )   (8.67)
Hoeffding's Inequality →  > 1 − e^{−ncε²} .
Exercise 8.3.2. Prove the outstanding inequalities of the aforementioned theorem. Hint:
Utilize a method similar to that applied in deriving the analogous inequalities in Theo-
rem 8.1.3.
The theorem above demonstrates that the probability that a sequence is relative ε-typical is very high. Note, however, that the probability Pr(T^rel_ε(X^n)) is computed with respect to an i.i.d.∼p source. If, on the other hand, we replace p with q we would get the probability

Pr(T^rel_ε(X^n))_q := Σ_{x^n∈T^rel_ε(X^n)} q_{x^n} .   (8.68)
In the following exercise you show that this probability goes to zero exponentially fast with
n.
Exercise 8.3.3. Let p, q ∈ Prob(m) with supp(p) ⊆ supp(q), and ε ∈ (0, 1). Show that for all n ∈ N

(1 − δ_n) 2^{−n(D(p∥q)+ε)} ⩽ Pr(T^rel_ε(X^n))_q ⩽ 2^{−n(D(p∥q)−ε)} .   (8.69)

Hint: Take the sum over all x^n ∈ T^rel_ε(X^n) in all sides of (8.64), and use the inequalities 1 ⩾ Pr(T^rel_ε(X^n)) > 1 − δ_n.
The pair of probability vectors (p⊗n , q⊗n ) becomes more distinguishable as we increase
n. In the following corollary, we use the theorem above to characterize this distinguishability
with relative majorization.
Proof. Let ε > 0 be a small number and define the stochastic evolution matrix E ∈ STOCH(2, 2^n) by its action on the standard basis {e_{x^n} := e_{x₁} ⊗ ⋯ ⊗ e_{xₙ}}_{x^n∈[m]^n} as

E e_{x^n} := e₁^{(2)}  if x^n ∈ T^rel_ε(X^n) ,  and  E e_{x^n} := e₂^{(2)}  if x^n ∉ T^rel_ε(X^n) ,   (8.71)

where e₁^{(2)} := (1, 0)ᵀ and e₂^{(2)} := (0, 1)ᵀ form the standard basis of R².
2−n(D(p∥q)−ε) (see (8.69)) the pair of vectors on the right-hand side approaches the pair
(e1 , e2 ) as n → ∞, where {e1 , e2 } is the standard basis of R2 . Therefore, combining this
with Exercise 4.3.22 we conclude that for any s, t ∈ Prob>0 (2), and sufficiently large n ∈ N,
Similar to the notations in the previous section, for any integer n we will denote by y n :=
(y1 , . . . , yn ), qyn = qy1 · · · qyn , and |ϕyn ⟩ := |ϕy1 ⟩ ⊗ · · · ⊗ |ϕyn ⟩. Then, for any ε > 0 and n ∈ N
We also denote by Πrel,nε the projection to the relative typical subspace Trel n
ε (A ). Note that
Tr [ρ log σ] is well defined since supp(ρ) ⊆ supp(σ).
The definition provided above clearly does not revert to the classical definition of a
relative typical sequence when ρ and σ commute. Nevertheless, as we will explore in the
theorem below and the subsequent sections, this definition proves to be an effective tool for
examining the distinguishability of quantum states. Furthermore, as demonstrated in the
upcoming exercise, relative typical subspaces do indeed converge to typical subspaces in the
case where ρ = σ.
Exercise 8.3.4. Show that if ρ = σ then the relative typical subspace reduces to the typical subspace of ρ as defined in the previous section. That is, show that in this case T^rel_ε(A^n) = Tε(A^n).
Theorem 8.3.2. Let ε > 0, ρ, σ ∈ D(A) with supp(ρ) ⊆ supp(σ), and c > 0 as defined in (8.82). Then, for all n ∈ N

Tr[Π^{rel,n}_ε ρ^{⊗n}] ⩾ 1 − e^{−ncε²} .   (8.76)
Therefore, denoting the relative distribution r_y := ⟨ϕ_y|ρ|ϕ_y⟩, and by Y the random variable whose alphabet is [m] and whose corresponding distribution is {r_y}_{y∈[m]}, we get that
Then, by definition,

Tr[Π^{rel,n}_ε ρ^{⊗n}] = Σ_{y^n∈C^n_ε} ⟨ϕ_{y^n}|ρ^{⊗n}|ϕ_{y^n}⟩ = Σ_{y^n∈C^n_ε} r_{y^n} ,   (8.80)

where the last term is the probability that a sequence Y^n belongs to C^n_ε. Therefore,

Tr[Π^{rel,n}_ε ρ^{⊗n}] = Pr{C^n_ε}
                       = Pr{ | E(log q_Y) − (1/n) Σ_{i∈[n]} log q_{Y_i} | ⩽ ε }   (8.81)
Hoeffding's Inequality →  ⩾ 1 − e^{−cnε²} ,
where qmin is the smallest non-zero eigenvalue of σ, and qmax is the largest eigenvalue
of σ.
Type of a Sequence
Definition 8.4.1. Let n, m ∈ N. For every x^n := (x₁, …, xₙ) ∈ [m]^n and z ∈ [m], let N(z|x^n) be the number of elements in the sequence x^n that are equal to z. The type of the sequence x^n is a probability vector in Prob(m) given by

t(x^n) := ( t₁(x^n), …, t_m(x^n) )ᵀ ,  where  t_z(x^n) := (1/n) N(z|x^n)  ∀ z ∈ [m] .   (8.84)
For example, for m = 3, the type of the sequence x6 = (2, 1, 1, 3, 2, 2) is the probability
vector t(x6 ) = (1/3, 1/2, 1/6).
The significance of types comes into play when considering an i.i.d.∼p source. In this case, the probability of a sequence x^n ∈ [m]^n drawn from the source is given by

p_{x^n} := p_{x₁} ⋯ p_{xₙ} = p₁^{N(1|x^n)} ⋯ p_m^{N(m|x^n)}
∀ r > 0, r = 2^{log r} →  = 2^{Σ_{z∈[m]} N(z|x^n) log₂ p_z}
N(z|x^n) = n t_z(x^n) →  = 2^{n Σ_{z∈[m]} t_z(x^n) log₂ p_z}   (8.85)
                           = 2^{−n( H(t(x^n)) + D(t(x^n)∥p) )} ,

where H(t(x^n)) is the Shannon entropy of the type of the sequence x^n, and D(t(x^n)∥p) is the KL-divergence between t(x^n) and p (see (5.24)). The above formula shows that the probability of a sequence drawn from an i.i.d. source depends only on the type of the sequence. As we will see below, this property can lead to a significant simplification in some applications.
We denote by Type(n, m) ⊆ Prob(m) the set of all types of sequences in [m]^n. For example, for sequences of bits (i.e. m = 2)

Type(n, 2) = { (1, 0)ᵀ , ((n−1)/n, 1/n)ᵀ , ((n−2)/n, 2/n)ᵀ , … , (0, 1)ᵀ } .   (8.86)

Note that any type t ∈ Type(n, m) has m components of the form k/n where k ∈ {0, …, n}. Therefore, the number of types in Type(n, m) cannot exceed (n + 1)^m, which is polynomial in n. The exact number of types can be computed using the "stars and bars" method in combinatorics. It is given by

|Type(n, m)| = (n+m−1 choose n) ⩽ (n + 1)^m .   (8.87)

On the other hand, the number of all sequences of size n is m^n, which is exponential in n.
The set of all sequences x^n of a given type t = (t₁, …, t_m) will be denoted as X^n(t). We emphasize that X^n(t) denotes the set of all sequences in [m]^n whose type is t, whereas t(x^n) denotes a single probability vector (i.e. the type of a specific sequence x^n). The number of sequences in the set X^n(t) is given by the combinatorial formula for arranging nt₁, …, nt_m objects in a sequence,

|X^n(t)| = (n choose nt₁, …, nt_m) := n! / ∏_{x=1}^m (nt_x)! .   (8.88)
The above formula is somewhat cumbersome, but by using Stirling’s approximation we can
find simpler lower and upper bound.
Proof. Let x^n be a sequence of size n drawn from an i.i.d. source according to the distribution t. Then,

1 = Σ_{x^n∈[m]^n} t_{x^n} ⩾ Σ_{x^n∈X^n(t)} t_{x^n}
(8.85) with p = t →  = Σ_{x^n∈X^n(t)} 2^{−nH(t)}   (8.90)
                      = |X^n(t)| 2^{−nH(t)} .

This proves that |X^n(t)| ⩽ 2^{nH(t)}. For the other inequality, we make use of Stirling's bounds

√(2π) n^{n+1/2} e^{−n} ⩽ n! ⩽ e n^{n+1/2} e^{−n} .   (8.91)

By using the lower bound for n! and the upper bound for each (nt_x)! in (8.88) we get that

|X^n(t)| = n! / ∏_{x=1}^m (nt_x)! ⩾ √(2π) n^{n+1/2} e^{−n} / ∏_{x=1}^m [ e (nt_x)^{nt_x+1/2} e^{−nt_x} ]
         = √(2π) n^{1/2} / ( e^m n^{m/2} ∏_{x=1}^m t_x^{nt_x+1/2} )   (8.92)
         = ( √(2π) √n / ( (e√n)^m √(t₁⋯t_m) ) ) 2^{nH(t)} .

It is left as an exercise (see Exercise 8.4.1) to show that

√(2π) √n / ( (e√n)^m √(t₁⋯t_m) ) ⩾ 1/(n + 1)^m   (8.93)

for all n, m, and t₁, …, t_m.
Exercise 8.4.1. Prove the inequality in (8.93). Hint: Use the fact that the product t₁⋯t_m is Schur concave and achieves its maximum when t₁ = t₂ = ⋯ = t_m = 1/m.
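The counting statements above are easy to verify by brute force for small n and m. The following Python sketch (illustrative parameters; standard library plus NumPy-free) groups all sequences in [m]^n by their type and checks |Type(n, m)| ⩽ (n+1)^m together with the bounds (n+1)^{−m} 2^{nH(t)} ⩽ |X^n(t)| ⩽ 2^{nH(t)} established in the proof above.

    import itertools
    import math
    from collections import Counter

    n, m = 8, 3                                   # illustrative sizes

    def entropy(t):
        return -sum(ti * math.log2(ti) for ti in t if ti > 0)

    # Group all sequences in [m]^n by their type t(x^n).
    counts = Counter()
    for xn in itertools.product(range(m), repeat=n):
        counts[tuple(xn.count(z) / n for z in range(m))] += 1

    print("|Type(n,m)| =", len(counts), " <= (n+1)^m =", (n + 1)**m)
    for t, size in list(counts.items())[:3]:
        H = entropy(t)
        print(f"t = {t}:  |X^n(t)| = {size},  "
              f"bounds ({2**(n*H) / (n+1)**m:.2f}, {2**(n*H):.2f})")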
Exercise 8.4.2. Let K ⊂ Type(n, m) be a set of probability distributions (that are types), and define C_n := {x^n ∈ [m]^n : t(x^n) ∈ K}. Fix q ∈ K. Show that

Pr(C_n)_q := Σ_{x^n∈C_n} q_{x^n}   (8.94)

approaches one in the limit n → ∞. Hint: Denote by Cᶜ_n the complement of C_n in [m]^n and show that Pr(Cᶜ_n)_q approaches zero in the limit n → ∞. Use (8.85), (8.89), and the fact that D(t∥q) > 0 for any type t ≠ q.
Exercise 8.4.3. Let n ∈ N. Use the strong Stirling approximation, which states that

√(2πn) (n/e)^n ⩽ n! ⩽ √(2πn) (n/e)^n e^{1/(12n)} ,   (8.95)

to show that:

(n choose np) ⩽ 2^{nh(p)} / √(πnp(1 − p)) ,   (8.96)
where spec(σ) = {q₁, …, q_m}, and {P_x}_{x∈[m]} are orthogonal projectors. For n copies of σ,

σ^{⊗n} = Σ_{x^n∈[m]^n} q_{x^n} P_{x^n} ,   (8.99)

where q_{x^n} := q_{x₁}⋯q_{xₙ} and P_{x^n} := P_{x₁} ⊗ ⋯ ⊗ P_{xₙ}. From (8.85) the probability q_{x^n} = 2^{−n(H(t(x^n))+D(t(x^n)∥q))} depends only on the type of x^n. Therefore, we can express σ^{⊗n} as

σ^{⊗n} = Σ_{t∈Type(n,m)} 2^{−n(H(t)+D(t∥q))} P_t ,   (8.100)
Note that the set {Pt }t∈Type(n,m) is itself a set of orthogonal projectors. The significance of
the formula above is that the number of terms in the sum is given by
which is polynomial in n. Therefore, the original sum in (8.99) that consists of mn terms,
has been reduced to a sum with a polynomial number of terms.
This exponential reduction in the number of terms can be applied to the pinching map
PH given in Eqs. (3.227,3.233). For the case that H = σ ⊗n for some σ ∈ D(A) we have
|spec(H)| = |spec (σ ⊗n )| ⩽ (n + 1)m . Therefore, when combined with the pinching inequal-
ity (3.235) we conclude that for all ρ ∈ D(An )
P_{σ^{⊗n}}(ρ^{A^n}) ⩾ (1/(n + 1)^m) ρ^{A^n} .   (8.103)
We will see later on that this inequality can be very useful as polynomial terms such as
(n + 1)m turns out to be “negligible” in some applications.
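As an illustration of the pinching argument, the following Python sketch (NumPy assumed; the pinching map is implemented directly from its definition as a sum of eigenprojector conjugations, and the qubit σ and random ρ are illustrative) checks that σ^{⊗n} has at most (n+1)^m distinct eigenvalues and that the operator inequality (8.103) holds numerically.

    import numpy as np
    from functools import reduce

    def pinch(rho, H, tol=1e-6):
        """Sum of P rho P over the eigenprojectors P of H (the pinching of rho)."""
        evals, evecs = np.linalg.eigh(H)
        out = np.zeros_like(rho, dtype=complex)
        for lam in np.unique(np.round(evals, 8)):          # distinct eigenvalues of H
            cols = np.abs(evals - lam) < tol
            P = evecs[:, cols] @ evecs[:, cols].conj().T    # eigenprojector of H
            out += P @ rho @ P
        return out

    rng = np.random.default_rng(2)
    n = 4
    sigma = np.diag([0.6, 0.4])                             # qubit, so m = 2
    H = reduce(np.kron, [sigma] * n)

    # A random density matrix on n qubits.
    G = rng.normal(size=(2**n, 2**n)) + 1j * rng.normal(size=(2**n, 2**n))
    rho = G @ G.conj().T
    rho /= np.trace(rho).real

    gap = pinch(rho, H) - rho / (n + 1)**2                  # (8.103) with m = 2
    n_spec = len(np.unique(np.round(np.linalg.eigvalsh(H), 8)))
    print("distinct eigenvalues of sigma^n:", n_spec, " <= (n+1)^m =", (n + 1)**2)
    print("min eigenvalue of P(rho) - rho/(n+1)^m =",
          np.round(np.linalg.eigvalsh(gap).min(), 6))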
where

K_n := { x^n ∈ [m]^n : t(x^n) ∈ C ∩ Type(n, m) } .   (8.105)
When n is very large one can expect the type of x^n to be relatively close to q (we will make this notion precise in the next subsection when we study strong typicality). Therefore, if q ∉ C one can expect the probability Pr_n(C) to decrease with n. Indeed, Sanov's theorem states that for large n we have Pr_n(C) ≈ 2^{−nD(p⋆∥q)}, where the probability vector p⋆ ∈ Prob(m) is defined as

p⋆ := arg min_{p∈C} D(p∥q) ,   (8.106)

where D is the KL-divergence. This result has the geometrical interpretation that the exponential decay rate of Pr_n(C) increases with the "distance" (as measured by the KL-divergence) of q from the set C (see Fig. 8.2).
Figure 8.2: Sanov’s Theorem. The exponential decay factor is determined by the distance of q
from C (as measured by the KL-divergence). The green triangle represents the probability simplex
Prob(m), and the oval shape the set C.
Sanov's Theorem

Theorem 8.4.1. Let C ⊆ Prob(m) be a non-empty set of probability distributions such that C is the closure of its interior, and consider an i.i.d.∼q source. Using the same notations as above,

lim_{n→∞} −(1/n) log Pr_n(C) = min_{p∈C} D(p∥q) := D(p⋆∥q) .   (8.107)

Remark. Note that while the vector p⋆ is not necessarily in ∪_{n∈N} Type(n, m), there exists a sequence of types {p_n}_{n∈N}, with p_n ∈ C ∩ Type(n, m) for sufficiently large n, such that p_n → p⋆ as n → ∞.
Proof. We will prove the theorem by finding upper and lower bounds for Pr_n(C). For the upper bound, using (8.85) we get

Pr_n(C) = Σ_{x^n∈K_n} 2^{−n(H(t(x^n))+D(t(x^n)∥q))} = Σ_{t∈C∩Type(n,m)} |X^n(t)| 2^{−n(H(t)+D(t∥q))}
(8.89)→  ⩽ Σ_{t∈C∩Type(n,m)} 2^{−nD(t∥q)}
By definition of p⋆ →  ⩽ Σ_{t∈C∩Type(n,m)} 2^{−nD(p⋆∥q)}   (8.108)
                        ⩽ |Type(n, m)| 2^{−nD(p⋆∥q)}
(8.87)→  ⩽ (n + 1)^m 2^{−nD(p⋆∥q)} .
Note that we got this upper bound without assuming that C is the closure of its interior.
We only assumed that C is non-empty so that p⋆ exists.
For the lower bound, let {p_n}_{n∈N} be a sequence with p_n ∈ C ∩ Type(n, m) for sufficiently large n, such that p_n → p⋆ as n → ∞ (see the remark above). Then, for sufficiently large n we have

Pr_n(C) = Σ_{t∈C∩Type(n,m)} |X^n(t)| 2^{−n(H(t)+D(t∥q))}
Taking only the term t = p_n in the sum →  ⩾ |X^n(p_n)| 2^{−n(H(p_n)+D(p_n∥q))}   (8.109)
(8.89)→  ⩾ (1/(n + 1)^m) 2^{−nD(p_n∥q)} .
In other words, x^n is ε-typical if the entropy of its type approximates the entropy of p, with a negligible correction term D(t(x^n)∥p) as t(x^n) converges to p. To establish a more robust concept of typicality, one might require that the type of x^n is ε-close to p. This is a somewhat more natural requirement, and the question is then: which metric should we use? The fundamental requirement is that each element of t(x^n) should be ε-close to its counterpart in p, necessitating

|p_z − t_z(x^n)| ⩽ ε  ∀ z ∈ [m] .   (8.112)

Note that any sequence x^n that satisfies ∥p − t(x^n)∥_p ⩽ ε (for some p ⩾ 1) will satisfy the above equation, and therefore this imposes a slightly stronger condition on the sequence x^n. On the other hand, the condition ∥p − t(x^n)∥_∞ ⩽ ε is precisely equivalent to the above condition, and we will therefore use it to measure the distance between t(x^n) and p.
Note that, in accordance with our previous notation, the condition stipulated in the above definition can be equivalently expressed as:

| p_z − N(z|x^n)/n | ⩽ ε  ∀ z ∈ [m] such that p_z > 0 ,   (8.114)

and N(z|x^n) = 0 whenever p_z = 0. This latter condition intuitively implies that a typical sequence should not include any alphabet characters that have a zero probability of occurrence.
In the next theorem we prove several properties of strongly typical sequences. We will use the notation

Pr(Tˢᵗ_ε(X^n)) := Σ_{x^n∈Tˢᵗ_ε(X^n)} p_{x^n}   (8.115)

to denote the probability that a sequence is strongly ε-typical (with respect to an i.i.d.∼p source). Moreover, we set the constant a > 0 to be

a := − log ∏_{z∈supp(p)} p_z .   (8.116)
Remark. Observe that the probability that a sequence is strongly ε-typical approaches one exponentially fast with n. The second property highlights the equipartition property, indicating that the probability of every typical sequence is approximately 2^{−nH(X)}. The first and third properties bear resemblance to their counterparts for weakly typical sequences. However, due to the subtle distinctions between the two concepts of typicality, the upcoming proof will incorporate additional tools to address these differences.
Proof. For any z ∈ [m], let 1_z(X) be the indicator random variable that equals 1 if X = z and 0 otherwise. Fix z ∈ [m], and let Z₁, Z₂, … be an i.i.d. sequence of random variables where each Z_j := 1_z(X_j) is an indicator random variable as above that corresponds to X_j. From the law of large numbers, particularly the application of Hoeffding's inequality (8.15) to the sequence Z₁, …, Zₙ, we get

Pr( (1/n) Σ_{j∈[n]} Z_j − E(Z) > ε ) ⩽ e^{−2nε²} ,   (8.117)

where we used the fact that 0 ⩽ Z_j ⩽ 1, so that the constants a and b in (8.15) are given by a = 0 and b = 1. By definition,

E(Z) = Σ_{x∈[m]} p_x δ_{xz} = p_z ,   (8.118)

and

(1/n) Σ_{j∈[n]} Z_j = (1/n) Σ_{j∈[n]} δ_{x_j z} = (1/n) N(z|x^n) = t_z(x^n) .   (8.119)

The equation above holds for all z ∈ [m] and all n ∈ N. Hence, it states that the probability that the random variable X^n = (X₁, …, Xₙ) is not a strongly ε-typical sequence is no greater than e^{−2nε²}. This completes the proof of the first part of the theorem.
Suppose now that x^n ∈ Tˢᵗ_ε(X^n) and observe that

p_{x^n} = p_{x₁} p_{x₂} ⋯ p_{xₙ} = ∏_{z∈supp(p)} p_z^{N(z|x^n)} .   (8.121)
p_z − ε ⩽ t_z(x^n) ⩽ p_z + ε .   (8.123)

That is,

−εa − H(X) ⩽ (1/n) log p_{x^n} ⩽ εa − H(X) .   (8.125)

Multiplying all sides by n and exponentiating (base 2) completes the proof of the second part.
To prove the third part, observe that from the lower bound of Part 2 we get

1 ⩾ Σ_{x^n∈Tˢᵗ_ε(X^n)} p_{x^n} ⩾ Σ_{x^n∈Tˢᵗ_ε(X^n)} 2^{−n(H(X)+εa)} = |Tˢᵗ_ε(X^n)| 2^{−n(H(X)+εa)} .   (8.126)

Therefore, |Tˢᵗ_ε(X^n)| ⩽ 2^{n(H(X)+εa)}. On the other hand, from Part 1 we get

1 − e^{−2nε²} ⩽ Σ_{x^n∈Tˢᵗ_ε(X^n)} p_{x^n}
Upper bound of Part 2 →  ⩽ Σ_{x^n∈Tˢᵗ_ε(X^n)} 2^{−n(H(X)−εa)}   (8.127)
                          = |Tˢᵗ_ε(X^n)| 2^{−n(H(X)−εa)} .

Hence, (1 − e^{−2nε²}) 2^{n(H(X)−εa)} ⩽ |Tˢᵗ_ε(X^n)|. This completes the proof of the third part.
Exercise 8.5.1. Let xn ∈ [m]n and y k ∈ [m]k be two sequences of size n and k, respectively,
drawn from the same i.i.d.∼ p source. Show that if both xn and y k are strongly ε-typical then
also the joint sequence (xn , y k ) ∈ [m]n+k is strongly ε-typical with respect to the i.i.d.∼ p
source.
Exercise 8.5.2. Let 1 < n ∈ N, ε ∈ (1/n, 1), and consider an i.i.d.∼p source, where p ∈ Prob(m).

1. Show that if a sequence x^{n−1} is strongly ε-typical then for any y ∈ [m] the sequence x^n := (y, x^{n−1}) is strongly (ε + 1/n)-typical.

2. Let ε′ := ε − 1/n. Show that T′ ⊂ Tˢᵗ_ε(X^n), where
where {px }x∈[m] are the eigenvalues of ρ and {|x⟩}x∈[m] are the corresponding eigenvectors.
As before we denote
ρ^{⊗n} = Σ_{x^n∈[m]^n} p_{x^n} |x^n⟩⟨x^n|^{A^n} ,   (8.131)

where |x^n⟩ := |x₁, …, xₙ⟩ and p_{x^n} := p_{x₁} p_{x₂} ⋯ p_{xₙ}. For any i.i.d. quantum source ρ we define a corresponding strongly typical subspace

Tˢᵗ_ε(A^n) := span{ |x^n⟩ ∈ A^n : x^n ∈ Tˢᵗ_ε(X^n) } ,   (8.132)

where Tˢᵗ_ε(X^n) is the set of strongly typical (classical) sequences of size n, drawn from an i.i.d.∼p source (with p being the probability vector whose components are the eigenvalues of ρ). The strongly typical projection onto this subspace is given by

Π^{n,st}_ε := Σ_{x^n∈Tˢᵗ_ε(X^n)} |x^n⟩⟨x^n| .   (8.133)
Theorem 8.5.2. Let ρ ∈ D(A), ε > 0, and for each n ∈ N let Tˢᵗ_ε(A^n) and Π^{n,st}_ε be the strongly typical subspace and projection associated with a quantum i.i.d.∼ρ source. The following inequalities hold for all n ∈ N:

1. Tr[Π^{n,st}_ε ρ^{⊗n}] ⩾ 1 − e^{−2ε²n} .

3. (1 − δ) 2^{n(H(ρ)−εa)} ⩽ dim Tˢᵗ_ε(A^n) ⩽ 2^{n(H(ρ)+εa)} .
The proof follows directly from the classical version of this theorem and is left as an exercise.
Exercise 8.5.4. Let ρ ∈ D(A), ε > 0, integer m = o(n) (e.g. m = ⌊ns ⌋ for some 0 < s < 1),
and σm ∈ D(Am ). Let also Πn,st
ε be the strongly typical projection associated with the quantum
i.i.d.∼ ρ source. Show that
assumption can be high. However, with an increase in the number of rolls, her probability
of making a mistake significantly reduces. This prompts a natural question: how rapidly
does the error probability decrease as the number of rolls, n, gets larger and larger? This
situation encapsulates the essence of a classical hypothesis testing problem.
In the realm of hypothesis testing, an observer or player aims to decide between two
hypotheses related to two i.i.d. sources. These hypotheses are represented as the p-source
and the q-source. Upon n independent interactions with this source, the observer receives
a sequence denoted as xn = (x1 , . . . , xn ) that belongs to the set [m]n . The challenge is to
ascertain the correct hypothesis based on this sequence.
The observer’s decision-making process can be represented by a function gn : [m]n →
{0, 1}. This function divides all potential sequences into two distinct groups:
1. The set {xn ∈ [m]n : gn (xn ) = 0} corresponds to the first hypothesis. Here, the
observer believes the sequences in this set are from the p-source.
2. The set {xn ∈ [m]n : gn (xn ) = 1} pertains to the second hypothesis, indicating that
the observer surmises the sequences are from the q-source.
Given this decision-making framework, two potential errors can emerge:

1. Type I Error. The observer incorrectly concludes that the sequence is from the q-source when, in reality, it is from the p-source. The probability of this error occurring is:

α(g_n) := Σ_{x^n∈[m]^n : g_n(x^n)=1} p_{x^n} .   (8.135)

2. Type II Error. The observer mistakenly assumes the sequence is from the p-source when it actually originates from the q-source. The likelihood of this error is:

β(g_n) := Σ_{x^n∈[m]^n : g_n(x^n)=0} q_{x^n} .   (8.136)

In the two errors above, we considered a deterministic hypothesis test, where the function g_n : [m]^n → {0, 1} remains fixed. A more general approach introduces an element of randomness to the problem. Here, the observer randomly selects the function g_n based on a specific probability distribution.
To illustrate, consider a set of ℓ functions denoted as {g_{n,k}}_{k∈[ℓ]}, where each g_{n,k} : [m]^n → {0, 1}. Accompanying these functions is a probability vector s ∈ Prob(ℓ). In this probabilistic framework, when given the sequence x^n, the observer first samples a value k according to the distribution s. The observer then attributes the sequence to the p-source if g_{n,k}(x^n) = 0, and to the q-source if g_{n,k}(x^n) = 1. It's crucial to note that for each k ∈ [ℓ]:

β(g_{n,k}) = Σ_{x^n∈[m]^n : g_{n,k}(x^n)=0} q_{x^n} = q^{⊗n} · b_k ,   (8.137)

where the x^n-component of the bit vector b_k ∈ {0,1}^{m^n} is one if g_{n,k}(x^n) = 0 and zero otherwise. The vector

t := Σ_{k∈[ℓ]} s_k b_k   (8.138)
is termed the probabilistic hypothesis test. With the aforementioned notations and consider-
ing this broader context, the two types of errors can be described as follows:
1. Type I Error. This pertains to the likelihood of the observer incorrectly attributing the sequence to the q-source when it originates from the p-source:

α(t) := Σ_{k∈[ℓ]} s_k Σ_{x^n∈[m]^n : g_{n,k}(x^n)=1} p_{x^n} = 1 − p^{⊗n} · t .   (8.139)

2. Type II Error. This represents the chance of the observer mistakenly deducing the sequence belongs to the p-source when it is from the q-source:

β(t) := Σ_{k∈[ℓ]} s_k Σ_{x^n∈[m]^n : g_{n,k}(x^n)=0} q_{x^n} = q^{⊗n} · t .   (8.140)
From its definition (8.138), all the components of the probabilistic hypothesis test vector t are between zero and one (i.e. t ∈ [0,1]^{m^n}). Conversely, any vector in [0,1]^{m^n} can be expressed as a convex combination of bit vectors in {0,1}^{m^n}. Hence, t uniquely characterizes the probabilistic hypothesis test performed by the observer.
The goal of the observer is therefore to choose a probabilistic test vector t such that both types of error are very small. There are two common ways to do that, and we discuss both now. The first one is the asymmetric method, in which the observer minimizes the type II error, β(t), while at the same time keeping the type I error, α(t), below a certain threshold ε > 0. The optimal way to do this is characterized by Stein's lemma. The second method, also known as the symmetric method, assumes a prior {s₀, s₁} known to the observer, in which the p-source occurs with probability s₀ and the q-source with probability s₁. In this case, the goal is to minimize the error probability, which is given by s₀α(t) + s₁β(t). The optimal value of this probability of error is characterized by the Chernoff information. A fundamental instrument in these methods is the divergence used in hypothesis testing.
Definition 8.6.1. For any p, q ∈ Prob(m) and ε ∈ [0, 1), the classical hypothesis testing divergence is defined as

D^ε_min(p∥q) := − log min{ q · t : p · t ⩾ 1 − ε , t ∈ [0,1]^m } ,   (8.141)

where the minimization is over all probabilistic hypothesis test vectors t whose components are in the interval [0, 1].
= Dmin (p∥q) .
To see that D^ε_min in the definition above is indeed an (unnormalized) divergence, let p, q ∈ Prob(m), E ∈ STOCH(n, m), and observe that

D^ε_min(Ep∥Eq) = − log min{ (Eq)ᵀ s : (Ep)ᵀ s ⩾ 1 − ε , s ∈ [0,1]^n }
Replacing Eᵀs ∈ [0,1]^m with an arbitrary t ∈ [0,1]^m →  ⩽ − log min{ q · t : p · t ⩾ 1 − ε , t ∈ [0,1]^m }   (8.143)
= D^ε_min(p∥q) .
Exercise 8.6.1. Show that the constraint p · t ⩾ 1 − ε in (8.141) can be replaced with p · t = 1 − ε (i.e. both constraints lead to the same value of D^ε_min(p∥q)).

Exercise 8.6.2. Show that for all p, q ∈ Prob(m), D^ε_min(p∥q) is non-decreasing in ε, and

D^ε_min(p∥q) ⩾ − log(1 − ε) ,   (8.144)

with equality if p = q.
The classical hypothesis testing divergence is closely related to the testing region defined
in (4.139). To see the connection, first observe that we can replace the condition p · t ⩾ 1 − ε
in (8.141) with the equality p·t = 1−ε (since any t that satisfies p·t > 1−ε is not optimal).
With this change, the optimal q · t in (8.141) can be interpreted as the lowest point of the
intersection of the testing region T(p, q) with the vertical line x = 1 − ε (see Fig. 8.3). That
is, the optimal q · t is the y-component of the lower Lorenz curve LC(p, q) at x = 1 − ε.
We can use the above geometrical interpretation of the hypothesis testing divergence to obtain a closed formula for D^ε_min. Without loss of generality, suppose that the components of p and q are ordered as in (4.116). Then, from Theorem 4.3.3 we know that the vertices of the lower Lorenz curve of (p, q) are given by {(a_k, b_k)}_{k=0}^m as defined in (4.142). Let ℓ be
Figure 8.3: The location of the point in the testing region T(p, q) with the optimal testing vector
t that minimizes (8.141).
an integer such that a_ℓ < 1 − ε ⩽ a_{ℓ+1}. Then, the optimal point on the lower Lorenz curve is located between the ℓ-th and the (ℓ+1)-th vertices. The line between these two vertices has a slope

(b_{ℓ+1} − b_ℓ)/(a_{ℓ+1} − a_ℓ) = q_{ℓ+1}/p_{ℓ+1} .   (8.145)

Hence, the y-component of the optimal point is given by

q · t = b_ℓ + (1 − ε − a_ℓ) q_{ℓ+1}/p_{ℓ+1} .   (8.146)

To summarize, we can express the hypothesis testing divergence as

D^ε_min(p∥q) = − log( b_ℓ + (1 − ε − a_ℓ) q_{ℓ+1}/p_{ℓ+1} ) ,   (8.147)
where ℓ ∈ {0, …, m − 1} is the integer satisfying a_ℓ < 1 − ε ⩽ a_{ℓ+1}. Recall that D^ε_min is non-decreasing with ε (see Exercise 8.7.4), as is also evident from the equation above. Therefore, we can bound the hypothesis testing divergence by taking the two extreme cases 1 − ε = a_ℓ and 1 − ε = a_{ℓ+1}, which gives the simpler bounds

− log b_{ℓ+1} ⩽ D^ε_min(p∥q) ⩽ − log b_ℓ ,   (8.148)

where ℓ, as before, is the integer satisfying a_ℓ < 1 − ε ⩽ a_{ℓ+1}.
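The closed formula (8.147) is easy to implement once the vertices (a_k, b_k) are available. The sketch below (Python with NumPy; it assumes, as in (4.116) and (4.142), that p and q are reordered so that p_x/q_x is non-increasing and that a_k and b_k are the corresponding partial sums of p and q, and it further assumes q > 0) computes D^ε_min(p∥q) for a small example.

    import numpy as np

    def Dmin_eps(p, q, eps):
        # Reorder so that p_x/q_x is non-increasing (assumed convention of (4.116)).
        order = np.argsort(-p / q)
        p, q = p[order], q[order]
        a = np.concatenate(([0.0], np.cumsum(p)))      # a_0, ..., a_m
        b = np.concatenate(([0.0], np.cumsum(q)))      # b_0, ..., b_m
        ell = np.searchsorted(a, 1 - eps, side="left") - 1   # a_ell < 1-eps <= a_{ell+1}
        val = b[ell] + (1 - eps - a[ell]) * q[ell] / p[ell]  # q_{ell+1}/p_{ell+1} in 0-based indexing
        return -np.log2(val)

    p = np.array([0.5, 0.3, 0.2])
    q = np.array([0.2, 0.3, 0.5])
    for eps in (0.05, 0.1, 0.3):
        print(f"eps = {eps}:  D_min^eps(p||q) = {Dmin_eps(p, q, eps):.4f}")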
Remark. The theorem above states that the type II error can be made as small as ≈ 2^{−nD(p∥q)} while at the same time keeping the type I error below the threshold ε. The rate of this exponential decay is given by the KL-divergence. We postpone the proof of this theorem to the next section, in which we prove the more general result known as the quantum Stein's lemma. For the interested reader, we also provide two more direct proofs of this theorem (applicable only to the classical case) in Appendix D.4.
= s₀ − max_{t∈[0,1]^m} (s₀p − s₁q) · t
= s₀ − Σ_{x∈[m]} (s₀p_x − s₁q_x)₊   (8.151)
λ := s₁/s₀ →  = s₀( 1 − Σ_{x∈[m]} (p_x − λq_x)₊ )
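As a quick sanity check of the optimal-error expression just derived, the following Python sketch (illustrative distributions; NumPy assumed) compares (8.151) with a brute-force minimization of s₀α(g) + s₁β(g) over all deterministic tests g : [m] → {0, 1}; the two agree because the optimum is attained by a deterministic test.

    import itertools
    import numpy as np

    def pr_error_formula(p, q, s0):
        s1 = 1 - s0
        return s0 - np.sum(np.maximum(s0 * p - s1 * q, 0.0))

    def pr_error_brute(p, q, s0):
        s1, m = 1 - s0, len(p)
        best = np.inf
        for g in itertools.product((0, 1), repeat=m):   # g(x) = 1 means "guess the q-source"
            g = np.array(g)
            alpha = np.sum(p[g == 1])                    # type I error
            beta = np.sum(q[g == 0])                     # type II error
            best = min(best, s0 * alpha + s1 * beta)
        return best

    p = np.array([0.5, 0.3, 0.2])
    q = np.array([0.1, 0.2, 0.7])
    for s0 in (0.5, 0.8):
        print(s0, round(pr_error_formula(p, q, s0), 6), round(pr_error_brute(p, q, s0), 6))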
3. Show that

lim inf_{n→∞} −(1/n) log Pr_error(p^{⊗n}, q^{⊗n}, s₀) ⩾ − log Σ_{x∈[m]} p_x^α q_x^{1−α} .   (8.156)
The exercise above demonstrates that in the asymptotic limit the optimal probability of
error is bounded by
lim inf_{n→∞} −(1/n) log Pr_error(p^{⊗n}, q^{⊗n}, s₀) ⩾ ξ(p, q) ,   (8.157)

where

ξ(p, q) := − log min_{α∈[0,1]} Σ_{x∈[m]} p_x^α q_x^{1−α}
Definition 6.2.2 →  = max_{α∈[0,1]} (1 − α) D_α(p∥q) .   (8.158)
In the following theorem we show that the inequality in (8.157) is in fact an equality.
Remark. Note that the Chernoff bound ξ(p, q) does not depend on s0 . Moreover, the
theorem above also states that for very large n we have that the probability of error,
Prerror (p⊗n , q⊗n , s0 ) ≈ 2−nξ(p,q) , decays exponentially fast with n with an exponential factor
given by the Chernoff bound.
Proof. We need to prove the opposite inequality of (8.157). We will establish it by finding a lower bound on the probability of error Pr_error(p^{⊗n}, q^{⊗n}, s₀). For this purpose, set λ := s₁/s₀ and K := {x ∈ [m] : p_x ⩾ λq_x}. From (8.151) we have

Pr_error(p, q, s₀) = s₀ Σ_{x∈Kᶜ} p_x + s₁ Σ_{x∈K} q_x ⩾ min{ Σ_{x∈Kᶜ} p_x , Σ_{x∈K} q_x } ,   (8.160)
Next, we characterize the set K_n. From (8.85) the inequality p_{x^n} ⩾ λq_{x^n} holds if and only if

2^{−n(H(t(x^n))+D(t(x^n)∥p))} ⩾ λ 2^{−n(H(t(x^n))+D(t(x^n)∥q))} ,   (8.162)

which is equivalent to

D(t(x^n)∥q) − D(t(x^n)∥p) ⩾ (1/n) log λ .   (8.163)

Therefore, denoting by

C_n := { t ∈ Prob(m) : D(t∥q) − D(t∥p) ⩾ (1/n) log λ } ,   (8.164)
The first sum corresponds to the probability (with respect to an i.i.d.∼p source) that a sequence of size n has a type belonging to Cᶜ_n, whereas the second sum corresponds to the probability (with respect to an i.i.d.∼q source) that a sequence of size n has a type belonging to C_n. These probabilities are very similar to the ones appearing in Sanov's theorem (see Theorem 8.4.1) except that here the set C_n depends on n. To remove this dependence on n, observe that in the limit n → ∞ the set C_n approaches the set
and similarly, the set Cᶜ_n (in Prob(m)) approaches the set

D(p⋆∥p) := min_{r∈S} D(r∥p)  and  D(q⋆∥q) := min_{r∈C} D(r∥q) .   (8.168)

From these definitions and the continuity of the relative entropy, there exist two sequences of vectors {p_n}_{n∈N} and {q_n}_{n∈N}, with limits p_n → p⋆ and q_n → q⋆ as n → ∞, such that for each n ∈ N the vector p_n ∈ Cᶜ_n ∩ Type(n, m) and the vector q_n ∈ C_n ∩ Type(n, m). Therefore,
following similar lines as given in Sanov's theorem, we get for the first sum

Σ_{x^n∈[m]^n : t(x^n)∈Cᶜ_n} p_{x^n} = Σ_{t∈Cᶜ_n∩Type(n,m)} |X^n(t)| 2^{−n(H(t)+D(t∥p))}
Taking only the term t = p_n in the sum →  ⩾ |X^n(p_n)| 2^{−n(H(p_n)+D(p_n∥p))}   (8.169)
(8.89)→  ⩾ (1/(n + 1)^m) 2^{−nD(p_n∥p)} .
∂L(r)/∂r_x = log(e) + log(r_x/p_x) + µ log(q_x/p_x) + ν = 0 ,   (8.174)

which gives, after isolating r_x,

r_x = a p_x (q_x/p_x)^µ = a p_x^{1−µ} q_x^µ   (8.175)

r_x = p_x^{1−µ} q_x^µ / Σ_{x′∈[m]} p_{x′}^{1−µ} q_{x′}^µ .   (8.176)

Hence, denoting by r_µ the probability vector whose components are as above, we conclude that

min_{r∈K} D(r∥p) = D(r_µ∥p) ,   (8.177)
Exercise 8.6.6. Prove the second equality of (8.172). Hint: Suppose that the minimum is obtained for some r ∈ Prob(m) that satisfies D(r∥q) < D(r∥p) and get a contradiction by showing that the vector t = (1 − ε)r + εp (with small ε > 0) satisfies D(t∥p) < D(r∥p).

Exercise 8.6.7. Prove Eq. (8.180). Hint: Show that the condition D(r_µ∥p) = D(r_µ∥q) is equivalent to Σ_{x∈[m]} p_x^{1−µ} q_x^µ log(q_x/p_x) = 0, and compare it with the derivative of the function f(s) := Σ_{x∈[m]} p_x^{1−s} q_x^s.
the measurement outcome x she may infer the state to be ρ or σ. In this section, we delve
into the best strategy Alice can employ to accurately determine the state of her quantum
system.
We note that it’s adequate to contemplate POVMs composed of only two elements. We
can define Λ ∈ Eff(A) to be the sum of all effects {Λx }, where x leads Alice to infer ρ.
Conversely, I − Λ is the sum of the remaining POVM elements, corresponding to x values
that result in Alice inferring σ. Two types of errors might arise:
1. Type I Error: Alice possesses the state ρ but incorrectly infers it as σ. The associated
probability is:
α(Λ) := Tr [ρ(I − Λ)] (8.181)
2. Type II Error: Alice has the state σ but mistakenly deduces it as ρ. The correspond-
ing probability is:
β(Λ) := Tr [σΛ] (8.182)
As in the classical scenario, we explore strategies to minimize the error probabilities α(Λ)
and β(Λ). With the asymmetric approach, the objective is to minimize the Type II error,
β(Λ), while ensuring that the Type I error, α(Λ), stays beneath a specific threshold ε > 0.
The optimal approach is encapsulated by the quantum Stein’s lemma. In the symmetric
strategy, the observer is aware of a prior {s0 , s1 } where ρ occurs with probability s0 , and σ
with probability s1 . The aim here is to reduce the overall error probability represented by
s0 α(Λ) + s1 β(Λ).
It’s worth noting that any pair of quantum states ρ, σ ∈ D(A) that aren’t identical satisfy
Tr[ρσ] < 1. This implies:
In essence, as n approaches infinity, the states ρ⊗n and σ ⊗n become orthogonal with respect
to the Hilbert-Schmidt inner product. Naturally, we might question the rate at which these
states turn distinguishable. Both the quantum Stein’s lemma and the quantum Chernoff
bound address this question, in the asymmetric and symmetric contexts, respectively.
Exercise 8.7.1. Show that the optimal probability β⋆(ε) = 0 for all ε ⩾ 0 if and only if ρ and σ satisfy ρσ = 0.
By taking the − log of the above error probability one obtains a quantity known as the
quantum hypothesis testing divergence.
Definition 8.7.1. The quantum hypothesis testing divergence is defined for all ρ, σ ∈ D(A) and ε ∈ [0, 1) as

D^ε_min(ρ∥σ) := − log min_{Λ∈Eff(A)} { Tr[σΛ] : Tr[ρΛ] ⩾ 1 − ε } .   (8.185)
The hypothesis testing divergence is invariably non-negative and reaches infinity when ρ and σ are orthogonal. Furthermore, Exercise 8.7.2 guides you to demonstrate that when ρ and σ are diagonal in the same basis, the quantum hypothesis testing divergence, denoted as D^ε_min(ρ∥σ), simplifies to its classical equivalent, D^ε_min(p∥q). Here, p and q represent the diagonal elements of ρ and σ, respectively. In the classical context, we observed that for ε = 0, the divergence D^ε_min reduces to the min relative entropy. This observation is consistent in the quantum scenario as well (refer to Exercise 8.7.2). Such parallels justify the use of the 'min' subscript in naming the quantum hypothesis testing divergence. Next, we aim to establish that this function indeed qualifies as an (unnormalized) divergence.
Exercise 8.7.2. Consider the definition above of the quantum hypothesis testing divergence.

1. Show that if ρ, σ ∈ D(A) are diagonal in the same basis of A then D^ε_min(ρ∥σ) reduces to its classical counterpart D^ε_min(p∥q), where p and q are the diagonals of ρ and σ, respectively.

2. Show that for ε = 0, the quantum hypothesis testing divergence simplifies to the quantum min relative entropy. That is, show that for all ρ, σ ∈ D(A)

D^{ε=0}_min(ρ∥σ) = D_min(ρ∥σ) = − log Tr[σΠ_ρ] .   (8.189)

Exercise 8.7.3. Show that the constraint Tr[ρΛ] ⩾ 1 − ε in (8.185) can be replaced with Tr[ρΛ] = 1 − ε (i.e. both constraints lead to the same value of D^ε_min(ρ∥σ)).
Exercise 8.7.4.

1. Show that for all ρ, σ ∈ D(A) we have

D^ε_min(ρ∥σ) ⩾ − log(1 − ε) ,   (8.190)

with equality if ρ = σ.

2. Show that D^ε_min(ρ∥σ) is non-decreasing in ε.

Exercise 8.7.5. Show that the quantum hypothesis testing divergence equals its minimal extension from classical states. That is, show that for all ρ, σ ∈ D(A)

D^ε_min(ρ^A∥σ^A) = sup_{E∈CPTP(A→X)} D^ε_min( E^{A→X}(ρ^A) ∥ E^{A→X}(σ^A) ) ,   (8.191)

where the supremum is over all classical systems X and POVM channels E ∈ CPTP(A → X) that take ρ and σ to diagonal density matrices (i.e. probability vectors).
Note that N is a linear map and its dual map N* : R ⊕ Herm(A) → Herm(A) is given by (see the exercise below)

N*(t ⊕ ω) = tρ − ω .   (8.194)

From (A.57) it then follows that the dual to the above SDP optimization problem is given by

2^{−D^ε_min(ρ∥σ)} = max{ Tr[(t ⊕ ω)H₂] : H₁ − N*(t ⊕ ω) ⩾ 0 , t ∈ R₊ , ω ∈ Pos(A) } .   (8.195)

The maximization above is over all t ∈ R₊ and ω ∈ Pos(A). For a fixed t, we want to minimize Tr[ω] such that ω ⩾ 0 and ω ⩾ tρ − σ. Under these constraints, it follows from Exercise 8.7.7 that the choice ω := (tρ − σ)₊ has the minimal trace. We therefore conclude that

D^ε_min(ρ∥σ) = − log max_{t∈R₊} f(t) ,   (8.197)
Remark. Given that D̃α represents the minimal quantum extension of the classical α-Rényi
relative entropy, it follows that D̃α (ρ∥σ) ⩽ Dα (ρ∥σ). Consequently, in the upper bound
of (8.201), we can substitute D̃α (ρ∥σ) with Dα (ρ∥σ).
Proof. Let Λ ∈ Eff(A) be such that 2^{−D^ε_min(ρ∥σ)} = Tr[Λσ] and Tr[Λρ] = 1 − ε. Set p := Tr[Λσ], and define the binary POVM channel E ∈ CPTP(A → X) via

By definition →  = (1/(α−1)) log( (1 − ε)^α p^{1−α} + ε^α (1 − p)^{1−α} )
Removing ε^α(1 − p)^{1−α} →  ⩾ (1/(α−1)) log( (1 − ε)^α p^{1−α} )   (8.204)
                              = (α/(α−1)) log(1 − ε) − log p
By definition of p →  = (α/(α−1)) log(1 − ε) + D^ε_min(ρ∥σ) .
This concludes the proof of (8.201).
To prove (8.202), let α ∈ (0, 1). We will use the expression for D^ε_min(ρ∥σ) as given in (8.197) and (8.198). To bound the expression Tr(tρ − σ)₊ in equation (8.198), we employ the quantum weighted geometric-mean inequality given by (B.68). This inequality asserts that for any pair of matrices M, N ∈ Pos(A) and any value of α within the range [0, 1]:

(1/2) Tr[ M + N − |M − N| ] ⩽ Tr[ M^α N^{1−α} ] .   (8.205)

Since the term |M − N| can be expressed as |M − N| = 2(M − N)₊ − (M − N), the above inequality is equivalent to
It is straightforward to check that for fixed α, ρ, σ, ε, the function t ↦ −tε + t^α 2^{(α−1)D_α(ρ∥σ)} attains its maximal value at

t = (α/ε)^{1/(1−α)} 2^{−D_α(ρ∥σ)} .   (8.209)

Substituting this value into the optimization in (8.208) gives

2^{−D^ε_min(ρ∥σ)} ⩽ (1 − α) (α/ε)^{α/(1−α)} 2^{−D_α(ρ∥σ)} .   (8.210)

By taking − log on both sides we get (8.202). This concludes the proof.
where D(ρ∥σ) := Tr[ρ log ρ] − Tr[ρ log σ] is known as the Umegaki relative entropy.
Remark. The quantum Stein's lemma indicates that the optimal type II error behaves approximately as ≈ 2^{−nD(ρ∥σ)} with respect to the number of copies, n, of ρ and σ. Specifically, the lemma offers an operational interpretation of the Umegaki divergence, D(ρ∥σ), as the maximal rate at which the type II error diminishes to zero exponentially with increasing n. Additionally, it's worth noting that the theorem above implies that the limit on the right-hand side of (8.211) exists and is independent of ε.
Proof. The proof follows from the bounds in Theorem 8.7.2. Specifically, from (8.201) we get for any ε ∈ (0, 1) and any α > 1

lim sup_{n→∞} (1/n) D^ε_min(ρ^{⊗n}∥σ^{⊗n}) ⩽ lim sup_{n→∞} (1/n)[ D̃_α(ρ^{⊗n}∥σ^{⊗n}) + (α/(α−1)) log(1/(1−ε)) ]   (8.212)
                                            = D̃_α(ρ∥σ) ,

where in the last equality we used the additivity (under tensor products) of D̃_α. Since the equation above holds for all α > 1 we conclude that

lim sup_{n→∞} (1/n) D^ε_min(ρ^{⊗n}∥σ^{⊗n}) ⩽ lim_{α→1⁺} D̃_α(ρ∥σ) = D(ρ∥σ) ,   (8.213)

where the equality above follows from the continuity in α of the function α ↦ D̃_α(ρ∥σ).
For the opposite inequality, we use the bound (8.202) to get for all α ∈ (0, 1)

lim inf_{n→∞} (1/n) D^ε_min(ρ^{⊗n}∥σ^{⊗n}) ⩾ lim inf_{n→∞} (1/n)[ D_α(ρ^{⊗n}∥σ^{⊗n}) + h(α)/(1−α) + (α/(1−α)) log ε ]   (8.214)
                                            = D_α(ρ∥σ) ,

where we used the additivity of D_α. Since D_α is continuous in α, and since the equation above holds for all α ∈ (0, 1), it must also hold for α = 1; that is,

lim inf_{n→∞} (1/n) D^ε_min(ρ^{⊗n}∥σ^{⊗n}) ⩾ D(ρ∥σ) .   (8.215)

Combining this with the inequality (8.213), we conclude that the limit

lim_{n→∞} (1/n) D^ε_min(ρ^{⊗n}∥σ^{⊗n})   (8.216)

exists and is equal to D(ρ∥σ).
Exercise 8.7.8 (The Umegaki Relative Entropy). Let D be the Umegaki relative entropy.

1. Show that D satisfies the DPI. Hint: Use (8.211) and the fact that D^ε_min satisfies the DPI.

2. Show by direct calculation that for any two cq-states ρ^{AX} := Σ_{x∈[n]} p_x ρ^A_x ⊗ |x⟩⟨x|^X and σ^{AX} := Σ_{x∈[n]} q_x σ^A_x ⊗ |x⟩⟨x|^X in D(AX) we have

D(ρ^{AX}∥σ^{AX}) = Σ_{x∈[n]} p_x D(ρ^A_x∥σ^A_x) + D(p∥q) ,   (8.217)

where the components of the probability vectors p and q are {p_x}_{x∈[n]} and {q_x}_{x∈[n]}, respectively.

3. Use the above two properties to show that for any two ensembles of states {p_x, ρ_x}_{x∈[n]} and {q_x, σ_x}_{x∈[n]} we have

D( Σ_{x∈[n]} p_x ρ_x ∥ Σ_{x∈[n]} q_x σ_x ) ⩽ Σ_{x∈[n]} p_x D(ρ_x∥σ_x) + D(p∥q) .   (8.218)
2^{−D^ε_min(ρ∥σ)} = sup_{t∈(0,1)} ( Pr_error(ρ, σ, t) − tε ) / (1 − t) .   (8.222)

Hint: Recall that Tr(ρ − rσ)₊ = sup_{Λ∈Eff(A)} Tr[Λ(ρ − rσ)] and split the supremum over all ε ∈ (0, 1) and all Λ ∈ Eff(A) such that Tr[Λρ] = 1 − ε.
As previously discussed, with increasing n copies of ρ and σ, the states ρ⊗n and σ ⊗n
become more distinguishable. We will demonstrate in the upcoming theorem that the error
probability, Prerror (ρ⊗n , σ ⊗n , t), diminishes at an exponential rate as n approaches infinity.
This rate is characterized by what is known as the quantum Chernoff bound. The classical
counterpart of the subsequent theorem, along with its proof, can be found in Section 8.6 (see
Theorem 8.6.2). We will use the notation ξQ (ρ, σ) to denote the quantum extension of the
classical Chernoff bound ξ(p, q) as given in (8.158). In the quantum domain it is defined as
Proof. In the proof of Theorem 8.7.2 we used (8.207) to bound Tr(tρ − σ)₊. Dividing both sides of (8.207) by t and denoting r := 1/t we get that (8.207) is equivalent to
Hence,

Pr_error(ρ^{⊗n}, σ^{⊗n}, t) ⩽ t₀^α t₁^{1−α} ( Tr[ρ^α σ^{1−α}] )^n ,   (8.228)

so that

lim inf_{n→∞} −(1/n) log Pr_error(ρ^{⊗n}, σ^{⊗n}, t) ⩾ − log Tr[ρ^α σ^{1−α}] .   (8.229)
Since the above equation holds for all 0 ⩽ α ⩽ 1 we have

lim_{n→∞} −(1/n) log Pr_error(ρ^{⊗n}, σ^{⊗n}, t) ⩾ max_{α∈[0,1]} ( − log Tr[ρ^α σ^{1−α}] )
                                                  = − log min_{α∈[0,1]} Tr[ρ^α σ^{1−α}] .   (8.230)
be the spectral decompositions of ρ and σ (here m := |A|), where ψ_x, ϕ_y ∈ Pure(A) for all x, y ∈ [m]. Then, for any projection Π ∈ Pos(A) (i.e. Π² = Π) we have

Tr[Πρ] = Σ_{x∈[m]} p_x ⟨ψ_x|Π|ψ_x⟩ = Σ_{x∈[m]} p_x ⟨ψ_x|Π²|ψ_x⟩
Σ_{y∈[m]} ϕ_y = I^A →  = Σ_{x∈[m]} p_x Σ_{y∈[m]} ⟨ψ_x|Π|ϕ_y⟩⟨ϕ_y|Π|ψ_x⟩   (8.232)
                         = Σ_{x,y∈[m]} p_x |⟨ψ_x|Π|ϕ_y⟩|²
Moreover, since any two complex numbers c₁ and c₂ satisfy |c₁|² + |c₂|² ⩾ ½|c₁ + c₂|², we get that

Pr_error(Π, ρ, σ, t₀) ⩾ (1/2) Σ_{x,y∈[m]} min{t₀p_x, t₁q_y} | ⟨ψ_x|I − Π|ϕ_y⟩ + ⟨ψ_x|Π|ϕ_y⟩ |²
                       = (1/2) Σ_{x,y∈[m]} min{t₀p_x, t₁q_y} |⟨ψ_x|ϕ_y⟩|²   (8.235)
                       = (1/2) Σ_{x,y∈[m]} min{t₀p_{xy}, t₁q_{xy}}
(8.154)→               = (1/2) Pr_error(p, q, t) ,
where p = (pxy ) ∈ Prob(m2 ) and q = (qxy ) ∈ Prob(m2 ) are probability vectors with
components
pxy := px |⟨ψx |ϕy ⟩|2 and qxy := qy |⟨ψx |ϕy ⟩|2 . (8.236)
Moreover, note that the relation (8.236) respects tensor products. That is, for ρ⊗n and σ ⊗n
the corresponding probability vectors are p⊗n and q⊗n , respectively. Hence,
lim inf_{n→∞} −(1/n) log Pr_error(ρ^⊗n, σ^⊗n, t) ⩽ lim inf_{n→∞} −(1/n) log ( ½ Pr_error(p^⊗n, q^⊗n, t) )
Theorem 8.6.2→ = max_{0⩽α⩽1} (1 − α)D_α(p∥q)   (8.237)
(6.103)→ = max_{0⩽α⩽1} (1 − α)D_α(ρ∥σ)
= ξ_Q(ρ, σ) .
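As a quick numerical illustration of the quantity appearing in (8.230) and (8.237), the following Python sketch (using numpy; logarithms taken base 2, the minimization done by a simple grid search, and the two qubit states chosen only for illustration) evaluates −log min_{0⩽α⩽1} Tr[ρ^α σ^{1−α}].

import numpy as np

def mat_power(A, t):
    # A^t for a positive semidefinite matrix A (eigenvalues clipped at zero)
    w, V = np.linalg.eigh(A)
    w = np.clip(w, 0.0, None)
    return (V * w**t) @ V.conj().T

def chernoff_quantity(rho, sigma, num=199):
    # xi_Q(rho, sigma) = -log_2 min_{0<=alpha<=1} Tr[rho^alpha sigma^(1-alpha)],
    # evaluated on an interior grid of alpha values (a sketch, not an exact optimizer)
    alphas = np.linspace(0.005, 0.995, num)
    traces = [np.real(np.trace(mat_power(rho, a) @ mat_power(sigma, 1 - a)))
              for a in alphas]
    return -np.log2(min(traces))

rho = np.array([[0.9, 0.0], [0.0, 0.1]])
sigma = np.array([[0.5, 0.45], [0.45, 0.5]])   # a non-commuting pair, chosen only for illustration
print(chernoff_quantity(rho, sigma))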
CHAPTER 9

Static Quantum Resource Theories
In this chapter, we present a precise definition of a quantum resource theory (QRT) and
explore its general characteristics. As mentioned in the introduction, any set of natural
constraints on a physical system results in a QRT. A prime example is the spatial separation
between two individuals, Alice and Bob, which naturally leads to the LOCC (Local Oper-
ations and Classical Communication) constraint, forming the basis of entanglement theory.
In this theory, every physical system is analyzed in the context of spatial separation. This
implies that any physical system, for instance, system A, is considered a bipartite composite
system, denoted as A = (AA , AB ). Here, AA represents a subsystem on Alice’s side, and
AB is a subsystem on Bob’s side. It’s important to note that even if A is not inherently
a composite system and is solely located on Alice’s side, it can still be regarded in this
framework with AA := A and AB being a trivial subsystem (i.e., |AB | = 1). For simplicity,
in entanglement theory, the notations A for AA and B for AB are often used. However,
in the context of general resource theories, it is crucial to remember that physical systems,
symbolized as A, B, C, etc., are interpreted in relation to the constraints applied to them.
and conversely. By adopting this identification, we can interpret all entities in quantum
mechanics — such as states, POVMs, quantum instruments, and others — as specific forms
of quantum channels. This integrative perspective aligns with the methodologies utilized in
resource theories. We will embrace this approach in our discussions throughout the book.
1. Doing nothing is free. For any physical system A, the identity channel
idA ∈ F(A → A).
3. Discarding a system is free. For any system A, the set F(A → 1) ̸= ∅; i.e.
F(A → 1) = CPTP(A → 1) = {Tr}.
Moreover, the set F(A → B) is called the set of free operations from system A to
system B, and the set F(A) := F(1 → A), is identified as the set of free states.
(i.e. E ∈ CPTP(A → B) but E ̸∈ F(A → B)). In particular, the second property above implies the following rule, known as the “golden” rule of QRTs.
We included in the definition above the property that the trace is a free operation. In all QRTs studied in the literature this is indeed the case, although one can consider a QRT in which “waste” or “trash” is treated as a resource. In this book we will not consider such resource theories, and will always take the trace to be a free operation. This assumption also leads to the following very useful property of QRTs.
Suppose σ ∈ F(B) is a free state, and define the replacement channel
NσA→B (ρA ) := Tr[ρA ]σ B ∀ ρ ∈ L(A) . (9.1)
Then, if we view σ B as a channel, σ 1→B , from the trivial system 1 to B, the channel NσA→B
can be expressed as a combination of the trace channel and the channel σ 1→B ; specifically,
NσA→B = σ 1→B ◦ Tr , (9.2)
and since both Tr and σ 1→B are free, it follows that NσA→B is free. Note that this means
that we can convert any state ρ ∈ D(A) to any free state σ ∈ F(B) by free operations (as
intuitively expected).
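To make the composition in (9.2) concrete, here is a minimal Python sketch (function names and the particular choice of σ are illustrative only) in which the replacement channel is literally built by composing the discarding channel with a preparation channel.

import numpy as np

def trace_channel(rho):
    # the discarding channel A -> 1: the output is the number Tr[rho]
    return np.trace(rho)

def preparation_channel(sigma):
    # the preparation channel 1 -> B: a scalar c is mapped to c * sigma
    return lambda c: c * sigma

def replacement_channel(sigma):
    # N_sigma = (preparation of sigma) composed with the trace, cf. (9.2)
    prepare = preparation_channel(sigma)
    return lambda rho: prepare(trace_channel(rho))

sigma = np.diag([0.5, 0.5])                  # an illustrative free state on B
rho = np.array([[0.7, 0.2], [0.2, 0.3]])     # an arbitrary input state on A
N = replacement_channel(sigma)
print(np.allclose(N(rho), sigma))            # True: every density matrix is mapped to sigma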
QRTs emerge from a specific set of limitations or constraints applied to the entire spec-
trum of quantum operations. The mapping F exemplifies this, as the set F(A → B) generally
forms a strict subset of all channels in CPTP(A → B). While every QRT is linked to a unique
set of restrictions, these restrictions frequently share common characteristics that contribute
to extra structural complexity. These characteristics are so prevalent that some researchers
have integrated them into the foundational definition of a QRT.
According to the fundamental principle of QRTs, if ρ ∈ F(A) is a free state, then the state
E A→BX (ρA ) must also be a free state in F(BX). This state, expressed as
E^{A→BX}(ρ^A) = Σ_{x∈[m]} p_x σ_x^B ⊗ |x⟩⟨x|^X ,   (9.4)

is a classical-quantum (cq) state, where for each x ∈ [m], p_x := Tr[E_x(ρ)] and σ_x^B := (1/p_x) E_x^{A→B}(ρ^A). If there existed an x ∈ [m] for which p_x ̸= 0 and σ_x ̸∈ F(B), then the quantum instrument E^{A→BX} would create a resource σ_x^B from the free state ρ ∈ F(A) with non-zero probability. To prevent such scenarios, in this book we always assume the axiom of free instruments.
Note that the axiom of free instruments (AFI) reduces to the golden rule of QRTs when
|X| = 1, thus serving as an extension of this rule to encompass quantum measurements.
Additionally, when |X| > 1, the golden rule of QRTs only ensures that E^{A→BX}(ρ^A) is a free cq-state. Without further assumptions like the AFI, we cannot infer that each E_x^{A→B}(ρ^A)/Tr[E_x(ρ)] is a free state. Since physical QRTs comply with the AFI (as do all QRTs studied in the literature), the rest of this book will proceed under the assumption that QRTs adhere to the AFI, without explicitly stating it each time. We will use the notation F_⩽(A → B) ⊂ CP_⩽(A → B) for the set of trace non-increasing CP maps that are part of free quantum instruments. Specifically, E ∈ F_⩽(A → B) if there exists a classical system X with dimension m ∈ N and maps E₁, . . . , E_m ∈ CP_⩽(A → B), with the properties that (1) E_x = E for some x ∈ [m], and (2) Σ_{x∈[m]} E_x ⊗ |x⟩⟨x| ∈ F(A → BX).
4. Completely free operations: For any three systems A, B, and C, and a channel
E ∈ F(A → B), it holds that E A→B ⊗ idC ∈ F(AC → BC).
Condition 5 above is very intuitive, as it just states that relabeling the subsystems of Aⁿ does not affect whether a channel is free (see Fig. 9.1).
Figure 9.1: Illustration of the fifth condition. Relabeling maintains the “freeness” of N .
Conditions 1-5 above have several additional implications. First, note that since the trace
is a free channel (property 3), the partial trace is also free (since id ⊗ Tr is free). Second,
note that if N ∈ F(A → B) and M ∈ F(A′ → B′) then N ⊗ M ∈ F(AA′ → BB′).
In particular, this means that if two states are free then their tensor product is also free.
Finally, appending a free state is also a free channel. Specifically, let σ ∈ F(B) and define N^{A→AB}(ρ^A) := ρ^A ⊗ σ^B for all ρ ∈ L(A). This channel can be viewed as a tensor product of two free channels, namely N^{A→AB} = id^A ⊗ σ^{1→B}, and therefore is free.
6. For any physical system A the set of free states F(A) is closed.
7. For any physical system A the set of free states F(A) is convex.
Property 6 states that if for a sequence of states {ρ_n}_{n∈N} ⊂ F(A) the limit ρ := lim_{n→∞} ρ_n exists, then that limit is in F(A) as well. Equivalently, if {ρ_n}_{n∈N} ⊂ F(A) and there exists ρ ∈ D(A) such that lim_{n→∞} T(ρ, ρ_n) = 0, where T is the trace distance (or any other distance measure), then ρ ∈ F(A). Note that if this property did not hold, there would exist a sequence of free states approaching a resource ρ. However, if T(ρ, ρ_n) is extremely small, say 10⁻¹⁰⁰, then for all practical purposes it is not possible to distinguish between ρ and ρ_n. Therefore, the assumption that F(A) is closed is very practical and consequently satisfied by all the QRTs studied in the literature so far.
Property 7 is not satisfied by all QRTs (e.g. non-Gaussianity in quantum optics), although many resource theories, like entanglement, do satisfy it, and it is quite common. Besides being a convenient mathematical property, we can develop some intuition for it. Consider a QRT in which an agent, say Alice, has access to an unbiased coin. She can flip the unbiased coin and prepare the state ρ ∈ F(A) if she gets heads and otherwise prepare the state σ ∈ F(A). Since ρ and σ are free, she can prepare them at no cost. Suppose now that Alice forgets which state she prepared. We will assume here that this “forgetting” is itself a free operation. Then, her description of the system is now ½ρ + ½σ. Therefore, we can assume that the convex combination ½ρ + ½σ := τ is also free, since Alice prepared it at no cost. Moreover, if τ is free, Alice can repeat the same process with ρ and τ to get that ¾ρ + ¼σ is also free. Repeating this process, Alice can prepare any combination (k/2ⁿ)ρ + (1 − k/2ⁿ)σ, with n ∈ N and k ∈ [2ⁿ]. Therefore, such convex combinations must also be free. Finally, since the set {k/2ⁿ} is dense in [0, 1], Property 6 implies that for any t ∈ [0, 1] the convex combination tρ + (1 − t)σ is free.
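The dyadic construction described above is easy to simulate. The short Python sketch below (purely illustrative; the target weight and states are arbitrary) prepares the weight k/2ⁿ by exactly the repeated fair-coin mixing procedure, feeding the binary digits of k least-significant first.

import numpy as np

def coin_flip_mixture(rho, sigma, k, n):
    # build (k/2^n) * rho + (1 - k/2^n) * sigma using only repeated 1/2-1/2 mixing steps
    bits = [(k >> j) & 1 for j in range(n)]      # binary digits of k, least significant first
    state = sigma.copy()                         # start with weight 0 on rho
    for b in bits:
        ingredient = rho if b else sigma
        state = 0.5 * state + 0.5 * ingredient   # one fair-coin mixing step
    return state

rho = np.diag([1.0, 0.0])
sigma = np.diag([0.0, 1.0])
approx = coin_flip_mixture(rho, sigma, k=5, n=3)            # target weight 5/8
print(np.allclose(approx, (5/8) * rho + (3/8) * sigma))     # True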
2. Suppose that Alice has access to a biased coin, with probability 0 < p < 1 to get a
head (and 1 − p to get a tail). Show that Alice can use the coin to prepare any convex
combination of free states.
The two properties of closedness and convexity can also be applied to quantum channels.
That is, we can require that for any two physical systems A and B the set F(A → B) is
both closed and convex. However, we postpone the discussion of them to the second volume
of this book where we study dynamical QRTs.
states. It’s interesting to note that these free states, represented as F(A) = F(1 → A),
can themselves be considered a unique kind of free operations, specifically as preparation
channels. This approach affords substantial flexibility in choosing a consistent set of free
operations for any given set of free states, even within QRTs that admit a tensor-product structure.
Consider, for example, the phenomenon of quantum coherence. Quantum coherence epit-
omizes a key aspect of quantum mechanics, illustrating the principle that particles, such as
electrons or photons, can simultaneously exist in multiple states. This phenomenon stems
from the principle of superposition, enabling particles to exist in a mixture of states, or
in coherent superposition, thus allowing them to interfere with one another in predictable
manners. However, coherence is a fragile state, easily disturbed by external influences in a
process known as decoherence, where quantum systems relinquish their superposition and
adopt more classical behaviors. In recent developments, the capability to control and pre-
serve quantum coherence has become crucial for the advancement of cutting-edge quantum
technologies, including quantum computing and quantum cryptography, empowering the
execution of tasks that surpass the capabilities of classical physics.
Considering the significance of this pivotal phenomenon, extensive efforts have been
dedicated to characterizing it within the realm of quantum resource theories. How is this
achieved? We start by identifying the set of free states in D(A). This is accomplished as fol-
lows: for any system A, a classical basis of the system is identified, denoted as {|x⟩}x∈[m] ⊂ A.
Subsequently, the set of free states, or incoherent states, is defined as all diagonal density
matrices in D(A) with respect to the classical basis. Thus, in the QRT of coherence, the set
of free states is clearly defined and is specified for any system A with dimension m := |A| as:

F(A) = { Σ_{x∈[m]} p_x |x⟩⟨x| : p ∈ Prob(m) } .   (9.7)
Consequently, the primary challenge in the resource theory of quantum coherence lies in
identifying a set of free operations that aligns consistently with the above set of free states.
Exercise 9.2.1. Let F(A) be the set of free states defined in (9.7), and let ∆ ∈ CPTP(A →
A) be the completely dephasing map defined with respect to the classical basis. Show that for
all ρ ∈ D(A) we have that ρ ∈ F(A) if and only if ∆(ρ) = ρ.
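For the QRT of coherence the membership test of Exercise 9.2.1 is a one-liner. The following Python sketch (the classical basis is taken to be the standard basis, and the example states are chosen only for illustration) checks whether a state is free by comparing it with its completely dephased version.

import numpy as np

def dephase(rho):
    # completely dephasing map with respect to the standard (classical) basis
    return np.diag(np.diag(rho))

def is_incoherent(rho, tol=1e-12):
    # a state is free (incoherent) if and only if Delta(rho) = rho, cf. Exercise 9.2.1
    return np.allclose(dephase(rho), rho, atol=tol)

p = np.array([0.2, 0.3, 0.5])
free_state = np.diag(p)                          # a state of the form (9.7)
max_coherent = np.full((3, 3), 1/3)              # the maximally coherent qutrit state
print(is_incoherent(free_state), is_incoherent(max_coherent))    # True False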
Physical factors often play a pivotal role in determining the choice of free operations
within the realm of quantum mechanics. Nonetheless, even when these free operations are
well-defined and grounded in physical principles, it is advantageous to investigate other
classes of free operations that correspond to the same set of free states. This exploration
can provide valuable insights and potentially reveal alternative mathematical or theoretical
frameworks, as alternate classes might offer simpler or more elegant solutions that are not
immediately apparent in operations primarily motivated by physical factors. A pertinent
example is found in entanglement theory, where characterizing the class of LOCC is notably
complex. To circumvent these complexities, considerable research has focused on entangle-
ment theory within broader and more mathematically accessible sets of operations, such as
RNG Operations
Definition 9.2.1. Let F(A) ⊂ D(A) be the set of free states on any physical system
A. The set of resource non-generating operations (RNG) between two physical
systems A and B is defined as:
RNG(A → B) := { N ∈ CPTP(A → B) : N(ρ) ∈ F(B) ∀ ρ ∈ F(A) } .   (9.8)
RNG operations form the maximal set of free operations. That is, every other QRT F with the same set of free states satisfies F(A → B) ⊆ RNG(A → B) for all systems A and B.
In the QRT of coherence this set of RNG operations is denoted by MIO(A → B), where the acronym MIO stands for maximally incoherent operations. Denoting by ∆^A ∈ CPTP(A → A) and ∆^B ∈ CPTP(B → B) the completely dephasing channels with respect to the classical bases of A and B, respectively, we get from the definition above, in conjunction with Exercise 9.2.1, that

MIO(A → B) = { N ∈ CPTP(A → B) : ∆^B ∘ N^{A→B} ∘ ∆^A = N^{A→B} ∘ ∆^A } .   (9.10)
Exercise 9.2.3. Consider the QRT of coherence where F(A) and F(B) are the sets of diagonal density matrices with respect to some fixed bases {|x⟩^A}_{x∈[m]} and {|y⟩^B}_{y∈[n]} of A and B, respectively. Show that a quantum channel N ∈ CPTP(A → B) belongs to MIO(A → B) if and only if there exists a conditional probability distribution {p_{y|x}} such that for all x ∈ [m]

N^{A→B}(|x⟩⟨x|^A) = Σ_{y∈[n]} p_{y|x} |y⟩⟨y|^B .   (9.11)
nature. This leads us to a new definition that not only adheres to the golden rule of Quantum
Resource Theories (QRTs) but also integrates the tensor product structure.
The definition above generalizes the concepts of k-positivity and complete-positivity (see
Definition 3.4.1) to QRTs. Specifically, if we take the free set F(A) = D(A) to be the set
of all density matrices acting on A, then maps that are k-RNG and completely-RNG are
equivalent to maps that are k-positive and completely positive, respectively. Moreover, the
set of k-RNG maps with k = 1 is simply the set of RNG maps.
As an example, let F(AB) := SEP(AB) ⊂ D(AB) be the set of all separable states, and
let N ∈ CRNG(AB → A′ B ′ ) be a (bipartite) quantum channel that takes separable states
to separable states even when acting on subsystems. Specifically, for any composite reference
system R = R_A R_B the channel id^R ⊗ N^{AB→A′B′} is non-entangling (i.e. RNG). Recall from the discussion at the beginning of this chapter that every system R in entanglement theory is viewed as a bipartite system R_A R_B with R_A on Alice's side and R_B on Bob's side. Taking R_A ≅ A and R_B ≅ B we get that the state Φ^{R_A A} ⊗ Φ^{R_B B} is a product state between Alice's composite system R_A A and Bob's composite system R_B B (see Fig. 9.2a). Therefore, since product states are in particular separable, and since id^R ⊗ N^{AB→A′B′} is non-entangling, we get that

N^{AB→A′B′}(Φ^{R_A A} ⊗ Φ^{R_B B}) ∈ SEP(R_A A′ R_B B′) .   (9.12)
Figure 9.2: The state ΦRA A ⊗ ΦRB B in the lens of two bipartite cuts.
A key observation in this example is that the state Φ^{R_A A} ⊗ Φ^{R_B B} can be viewed as a maximally entangled state between system R = R_A R_B ≅ AB and system AB (see Fig. 9.2b); i.e.

Φ^{(R_A R_B)(AB)} = Φ^{R_A A} ⊗ Φ^{R_B B} .   (9.13)
Therefore, the state in (9.12) is proportional to the Choi matrix of N. Since R_A ≅ A and R_B ≅ B we conclude that the Choi matrix

J_N^{ABA′B′} := N^{ÃB̃→A′B′}(Ω^{AÃ} ⊗ Ω^{BB̃})   (9.14)
is an unnormalized separable state between Alice’s system AA′ and Bob’s system BB ′ . From
Exercise 3.2.4 it follows that the Choi matrix can be expressed as
J_N^{ABA′B′} = Σ_{j∈[k]} ψ_j^{AA′} ⊗ ϕ_j^{BB′} ,   (9.15)

where the sets {ψ_j^{AA′}}_{j∈[k]} and {ϕ_j^{BB′}}_{j∈[k]} consist of (possibly unnormalized) pure states (i.e. rank-one operators) in Pos(AA′) and Pos(BB′), respectively. For each j ∈ [k], we can write

|ψ_j^{AA′}⟩ = (I^A ⊗ M_j)|Ω^{AÃ}⟩ and |ϕ_j^{BB′}⟩ = (I^B ⊗ N_j)|Ω^{BB̃}⟩ ,   (9.16)

for some complex matrices M_j ∈ L(A, A′) and N_j ∈ L(B, B′). Using this notation in (9.15) and comparing it with (9.14) we conclude that the channel N has the following operator-sum representation:

N^{AB→A′B′}(ρ^{AB}) = Σ_{j∈[k]} (M_j ⊗ N_j) ρ^{AB} (M_j ⊗ N_j)^* ∀ ρ ∈ L(AB) .   (9.17)
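The operator-sum form (9.17) makes it transparent why such channels cannot create entanglement from product inputs: every Kraus term is itself a product operator. The Python sketch below (random illustrative Kraus operators, rescaled only so that the map is trace non-increasing) verifies this term-by-term structure numerically.

import numpy as np

rng = np.random.default_rng(0)
dA = dB = 2

def random_product_kraus(k):
    # random product Kraus operators M_j (x) N_j, rescaled so the map is trace non-increasing
    pairs = [(rng.normal(size=(dA, dA)), rng.normal(size=(dB, dB))) for _ in range(k)]
    S = sum(np.kron(M, N).conj().T @ np.kron(M, N) for M, N in pairs)
    c = np.linalg.eigvalsh(S).max()
    return [(M / c**0.25, N / c**0.25) for M, N in pairs]

def channel(rho, pairs):
    # the map of Eq. (9.17)
    return sum(np.kron(M, N) @ rho @ np.kron(M, N).conj().T for M, N in pairs)

pairs = random_product_kraus(3)
psi = np.diag([1.0, 0.0]); phi = np.diag([0.3, 0.7])          # a product (separable) input
out = channel(np.kron(psi, phi), pairs)
# term by term, the output is a sum of product positive operators, hence separable:
sep = sum(np.kron(M @ psi @ M.conj().T, N @ phi @ N.conj().T) for M, N in pairs)
print(np.allclose(out, sep))                                   # True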
Exercise 9.2.4. Let RNG(AB → A′ B ′ ) be the set of non-entangling operations (i.e. RNG
with respect to the set F(AB) = SEP(AB)).
2. Show that
We saw above that for F(AB) = SEP(AB) we have CRNG ⊂ RNG where the inclusion
is strict since the global swap operator is non-entangling but also not a separable channel.
Moreover, we will see in Chapter 12 that also the inclusion RNG ⊇ 2-RNG in the exercise
above can be strict. However, in other resource theories some of these inclusions can be
equalities. For example, in the QRT of coherence, in which F(A) ⊂ D(A) consists of diagonal
states with respect to a fixed basis {|x⟩A }x∈[m] , we have that RNG = CRNG. To see this,
let N ∈ MIO(A → B) (recall that in the QRT of coherence we denote all RNG operations
from system A to B by MIO(A → B)). According to (9.10), ∆B ◦ N ◦ ∆A = N ◦ ∆A . We
need to show that for any system C, we have N ⊗ idC ∈ MIO(AC → BC). Let ∆C be the
completely dephasing channel with respect to the classical basis of system C. In the exercise
below you show that ∆AC = ∆A ⊗ ∆C . Therefore,
Exercise 9.2.5. Let {|x⟩^A}_{x∈[m]} and {|y⟩^B}_{y∈[n]} be, respectively, two orthonormal bases of
A and B. Further, let ∆AB be the completely dephasing channel with respect to the basis
{|xy⟩AB }x,y . Show that ∆AB = ∆A ⊗ ∆B , where ∆A and ∆B are the completely dephasing
channels with respect to the bases {|x⟩A }x∈[m] and {|y⟩B }y∈[n] , respectively.
Now, from Exercise 9.2.3 we know that a channel V ∈ CPTP(A → A) is MIO if and only if
V(|x⟩⟨x|A ) is a diagonal state in D(A) for all x ∈ [m] (here m := |A|). Therefore, if V is a
where π is some permutation on m elements. This relation implies that the unitary matrix V
satisfies V |x⟩A = eiθx |π(x)⟩A . In other words, up to phases, all the free unitary operations in
the QRT of coherence are permutations. Given that permutation matrices form an extremely
small set of operations relative to the set of all unitary channels, it is not too hard to show (see
the relevant references at the end of this chapter) that there exist channels in MIO(A → A)
that do not have the form (9.20) with free (i.e. incoherent) U AE and free (i.e. diagonal)
γ E . In other words, it costs coherence (i.e. resources) to implement some free channels in
MIO(A → A).
The above problem does not occur in the QRT of entanglement in which the set of
free operations is LOCC. This is always the case whenever a QRT is defined in terms of a
physical restriction (e.g. distant labs in entanglement theory) that is imposed on the set of
free operations. On the other hand, any QRT such as quantum coherence, in which first the
free states are identified, and only then consistent free operations are proposed, may face
such an implementation problem. Aside from the QRT of coherence, all the QRTs studied in this book will have a physically implementable set of free operations.
Definition 9.2.3. Let F be a QRT, and A and B two physical systems. We say that
F(A → B) is physically implementable if any channel in F(A → B) can be generated
by a sequence of unitary channels (possibly on composite systems), projective
measurements, appending of free states, and processing of the classical outcomes,
where each element in the sequence is itself a free action (see Fig. 9.3 for an
illustration).
Remark. In the definition above we added classical processing as a possible free physically implementable operation. This includes, for example, classical communication between subsystems, if it is allowed in the QRT (e.g. entanglement theory). Note also that if
the free operations in a QRT are not physically implementable (according to the definition
above), then the QRT would identify certain maps as being free with no way to physically
implement these processes using free operations.
For a given designation of free states F(A), it is possible to construct a unique physically
implementable QRT that admits a tensor-product structure. Simply define the free opera-
tions to be any composition of (i) appending arbitrary free states, (ii) CRNG unitaries and
projective measurements, (iii) discarding subsystems, and (iv) all free classical-processing
maps. For any two systems A and B we denote this set of physically implementable operations (PIO) by PIO(A → B). By design, PIO(A → B) is physically implementable and has a tensor-product structure. Most QRTs that have been studied in the literature have the property
that all the isometries in RNG are completely free. In such QRTs, PIO is the minimal set
of free operations that is consistent with the set of free states F(A). The class PIO(A → B)
Figure 9.3: Example of a physically implementable free operation on a composite system AB.
Exercise 9.2.6. Consider the resource theory of quantum entanglement. Show that if E ∈
SEP(AB → A′ B ′ ) then
E*(σ^{A′B′}) / Tr[E*(σ^{A′B′})] ∈ SEP(AB) ∀ σ ∈ SEP(A′B′) .   (9.24)
The need for normalization in (9.23) can be eliminated by broadening the definition of
RNG operations to include cone-preserving operations. Specifically, for each system A, let
us define K(A) ⊆ Pos(A) as the cone

K(A) := { tσ : σ ∈ F(A) , t ∈ R₊ } .   (9.25)
We then classify a map E ∈ CP(A → B) as a K-preserving operation if, for every η ∈ K(A),
it holds that E(η) ∈ K(B). By extending the scope of RNG operations to maps that are
not necessarily trace-preserving, the condition in (9.23) essentially signifies that E ∗ is a K-
preserving operation.
Definition 9.2.4. Using the same notations as above, we say that a quantum
channel E ∈ CPTP(A → B) is dually resource non-generating if both E and its dual
map E ∗ are K-preserving operations.
Tr[σ^B E(ρ^A)] = Tr[σ^B (E ∘ ∆^A)(ρ^A)] .   (9.28)

Since σ ∈ F(B) if and only if σ = ∆^B(τ) for some τ ∈ D(B), we can rewrite the left-hand side of the equation as:

Tr[σ^B E(ρ^A)] = Tr[τ^B (∆^B ∘ E)(ρ^A)] ,   (9.29)

and the right-hand side as:

Tr[σ^B (E ∘ ∆^A)(ρ^A)] = Tr[τ^B (∆^B ∘ E ∘ ∆^A)(ρ^A)]   (9.30)
E ∈ MIO(A → B)→ = Tr[τ^B (E ∘ ∆^A)(ρ^A)] .

Combining these equations, it is concluded that for all ρ ∈ D(A) and τ ∈ D(B), the following holds:

Tr[τ^B (E ∘ ∆^A)(ρ^A)] = Tr[τ^B (∆^B ∘ E)(ρ^A)] .   (9.31)

Hence, the condition becomes:

∆^B ∘ E^{A→B} = E^{A→B} ∘ ∆^A .   (9.32)
This means that if E ∈ dRNG(A → B), it must satisfy the above condition. The exercise
below demonstrates that the converse is also true, leading to the conclusion that a quantum
channel E ∈ CPTP(A → B) is in dRNG(A → B) if and only if it satisfies the condition
in Equation (9.32). Notably, for the case where A = B, this condition simplifies to the
commutation relation [E, ∆] = 0.
We note that in the QRT of coherence, quantum channels that satisfy Equation (9.32)
are identified as Dephasing-covariant Incoherent Operations, abbreviated as DIO. Therefore,
we have demonstrated that within the resource theory of coherence, the set dRNG(A → B),
is equivalent to the set of DIO channels, denoted as DIO(A → B).
Exercise 9.2.7. Show that DIO(A → B) ⊆ dRNG(A → B); i.e., let E ∈ CPTP(A → B) be
a quantum channel satisfying (9.32), and show that E ∈ dRNG(A → B).
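The commutation condition (9.32) is straightforward to test numerically. In the Python sketch below (qubit examples chosen only for illustration) a diagonal-phase unitary passes the test while the Hadamard gate, which creates coherence, fails it.

import numpy as np

def dephase(rho):
    return np.diag(np.diag(rho))

def is_dephasing_covariant(kraus, d, tol=1e-10):
    # check Delta o E = E o Delta, Eq. (9.32), on the matrix units (enough by linearity)
    def E(rho):
        return sum(K @ rho @ K.conj().T for K in kraus)
    for i in range(d):
        for j in range(d):
            unit = np.zeros((d, d), dtype=complex); unit[i, j] = 1.0
            if not np.allclose(dephase(E(unit)), E(dephase(unit)), atol=tol):
                return False
    return True

Z = np.diag([1.0, -1.0])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
print(is_dephasing_covariant([Z], 2))   # True: a diagonal (phase) unitary is DIO
print(is_dephasing_covariant([H], 2))   # False: the Hadamard gate creates coherence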
An Affine Set
Definition 9.3.1. Let A be a physical system, and F(A) ⊆ D(A) be a set of density matrices. The set F(A) is called an affine set if every affine combination of n free states,

σ := Σ_{x∈[n]} t_x ρ_x ,  ρ₁, . . . , ρ_n ∈ F(A) ,  t₁, . . . , t_n ∈ R ,  Σ_{x∈[n]} t_x = 1 ,   (9.33)

that is also a density matrix (i.e. σ ∈ D(A)) belongs to F(A).
As an example of an affine set, let F be the QRT of coherence in which F(A) is the set of all diagonal states in D(A) with respect to a fixed basis of A. The set of diagonal states F(A) is affine since any affine combination of diagonal states is diagonal; that is, if Σ_{x∈[n]} t_x ρ_x ⩾ 0, where each state ρ_x ∈ F(A) (i.e. each ρ_x is diagonal) and Σ_{x∈[n]} t_x = 1, then Σ_{x∈[n]} t_x ρ_x is also a diagonal state in D(A). Therefore, the set of free states in the QRT of coherence is affine.
Exercise 9.3.1. Let F(A) be the set of all density matrices in D(A) with real components
with respect to a fixed basis {|x⟩}x∈[m] of A. That is, ρ ∈ F(A) ⊂ D(A) if and only if the
number ⟨x|ρ|x′ ⟩ is real for all x, x′ ∈ [m]. Show that F(A) is affine.
Exercise 9.3.2. Let F(A) ⊆ D(A) be a set of density matrices, and let K(A) := span_R{F(A)} be the subspace of Herm(A) consisting of all linear combinations of the elements of F(A). Show that F(A) is affine if and only if F(A) = K(A) ∩ D(A).
Not all convex sets are affine. For example, the set of separable states in entanglement
theory is not affine. To see why, recall that product states of the form ψ A ⊗ϕB are free states.
Let m := |A|, n := |B|, let {ψ_x^A}_{x∈[m²]} be a rank-one basis of Herm(A), and let {ϕ_y^B}_{y∈[n²]} be a rank-one basis of Herm(B). Then, the m²n² states {ψ_x^A ⊗ ϕ_y^B}_{x,y} form a basis of Herm(AB). This means that any density matrix ρ ∈ D(AB) can be expressed as an affine combination of product states; i.e.

ρ^{AB} = Σ_{x∈[m²]} Σ_{y∈[n²]} t_{xy} ψ_x^A ⊗ ϕ_y^B ,   (9.35)

where {t_{xy}} is a set of real numbers. Hence, the set of separable states is not affine, since even entangled states can be expressed as affine combinations of product states. In this sense, the set of separable states is maximally non-affine. More generally, we say that a set F(A) is maximally non-affine if

Herm(A) = span_R{F(A)} .   (9.36)

Exercise 9.3.3. Show that the set {t_{xy}}_{x,y} in (9.35) must satisfy Σ_{x,y} t_{xy} = 1.
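For two qubits the claim that product states span all of Herm(AB) can be verified directly. The Python sketch below (a standard choice of four rank-one projectors per qubit, not taken from the text) checks that the sixteen product projectors have full rank as vectors in the real vector space Herm(AB).

import numpy as np

def rank_one_basis():
    # four pure-state projectors that span Herm(C^2) over the reals
    vecs = [np.array([1, 0]), np.array([0, 1]),
            np.array([1, 1]) / np.sqrt(2), np.array([1, 1j]) / np.sqrt(2)]
    return [np.outer(v, v.conj()) for v in vecs]

prods = [np.kron(P, Q) for P in rank_one_basis() for Q in rank_one_basis()]
# Herm(AB) for two qubits is a 16-dimensional real vector space; stack real and
# imaginary parts so real-linear (in)dependence becomes ordinary matrix rank.
M = np.array([np.concatenate([P.real.ravel(), P.imag.ravel()]) for P in prods])
print(np.linalg.matrix_rank(M))   # 16: product states span Herm(AB), cf. (9.35)-(9.36)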
such that N ∈ CPTP(A → B), the channel N is free (i.e. N ∈ F(A → B)).
We will see in the next two chapters that ARTs have several properties that make them
much easier to study. Particularly, many problems in ARTs can be solved with semidefinite
programming, unlike certain convex QRTs, such as entanglement theory, in which even determining whether a given state is free is very hard (more precisely, the problem is NP-hard). In the following theorem we show that if the set of
free operations is RNG or CRNG then the QRT is affine if and only if the set of free states
is affine.
Theorem 9.3.1. Let F(A) ⊆ D(A) be a set of free states on any physical system A,
and for any two physical systems A and B, let RNG(A → B) and CRNG(A → B) be
as defined in Definitions 9.2.1 and 9.2.2 (with respect to the sets F(A) and F(B)).
Then, the following statements are equivalent:
We need to show that N ∈ RNG(A → B). Let σ ∈ F(A) be a free state. Since N is a
quantum channel N A→B (σ A ) ∈ D(B). On the other hand, from the definition of N , we can
write this relation as
N^{A→B}(σ^A) = Σ_{x∈[n]} t_x ω_x^B ∈ D(B) , where ω_x^B := E_x^{A→B}(σ^A) .   (9.39)

Since σ ∈ F(A) and each E_x ∈ RNG(A → B), it follows that each ω_x ∈ F(B). Finally, using the assumption that F(B) is affine we get that N^{A→B}(σ^A) ∈ F(B). Since σ was an arbitrary state in F(A) we conclude that N ∈ RNG(A → B).
The implication 2 ⇒ 3: Let E1 , . . . , En ∈ CRNG(A → B) be a set of n CRNG channels,
and let t1 , . . . , tn ∈ R be such that (9.38) holds. We need to show that N ∈ CRNG(A → B)
or equivalently, that for any reference system R, idR ⊗ N ∈ RNG(RA → RB). Since each
idR ⊗ Ex ∈ RNG(RA → RB) and since we assume that RNG(RA → RB) is affine it follows
that

id^R ⊗ N = Σ_{x∈[n]} t_x (id^R ⊗ E_x^{A→B}) ∈ RNG(RA → RB) .   (9.40)
state, essentially ”destroying” the resource. At the same time, it functions as the identity
channel when applied to free states. This dual capability highlights its unique role in these
particular QRTs. We point out that such maps do not necessarily have to be channels, and
they might not even be linear. However, in the context of this book, where our exploration
is confined to theories within the realm of quantum mechanics, we will consistently assume
that these resource-destroying maps are at least linear.
Remark. It is important to note that the definition of a RDM is relative to the set of free
states F(A). Consequently, different QRTs with an identical set of free states F(A) may
share the same RDM. Conversely, there could exist multiple RDMs corresponding to the
same set F(A). Furthermore, it is evident that ∆ ∈ Pos(A → A). This is because for any
positive operator Λ ∈ Pos(A), the transformed state ∆(Λ/Tr[Λ]) is a free state and therefore
positive semidefinite. However, it is crucial to understand that a RDM is not necessarily
completely positive.
As an example of a RDM, consider the QRT of coherence in which F(A) ⊂ D(A) is the
set of diagonal states with respect to the basis {|x⟩}x∈[m] (here m := |A|). Relative to this
basis, define the completely dephasing channel
∆(ρ) := Σ_{x∈[m]} ⟨x|ρ|x⟩ |x⟩⟨x| .   (9.41)
It is simple to check that ∆ as defined above is a RDM with respect to the set of diagonal
matrices. In this example ∆ is a quantum channel.
Exercise 9.3.5. Show that ∆ as defined in (9.41) is a RDM. Moreover, show that it is
self-adjoint; i.e. ∆∗ = ∆.
Exercise 9.3.6. Show that any RDM ∆ ∈ L(A → A) is idempotent; i.e. ∆ ◦ ∆ = ∆.
In the following lemma we show that not all QRTs have a RDM.
Lemma 9.3.1. Let F be a QRT and let A be a quantum system. If F(A) is not
affine then the QRT F does not have a RDM.
Proof. Suppose by contradiction that F(A) is not affine and yet there exists a RDM ∆ ∈ L(A → A). Then, by definition, there exist t₁, . . . , t_n ∈ R with Σ_{x∈[n]} t_x = 1 and states σ₁, . . . , σ_n ∈ F(A) such that

σ := Σ_{x∈[n]} t_x σ_x ∈ D(A) but σ ̸∈ F(A) .   (9.42)
Moreover, since ∆ is a RDM we must have ∆(σ) ∈ F(A). On the other hand,

∆(σ) = Σ_{x∈[n]} t_x ∆(σ_x) = Σ_{x∈[n]} t_x σ_x = σ ̸∈ F(A) ,   (9.43)

in contradiction with the fact that ∆(σ) ∈ F(A). Therefore, if a QRT admits a RDM then its set of free states must be affine.
Self-Adjoint RDM
Definition 9.3.4. Let F(A), K(A), and K(A)⊥ be as defined above. The linear map
∆ : Herm(A) → Herm(A)
Remark. Note that in the context above, the map ∆ is referred to as the self-adjoint RDM.
This designation is justified by demonstrating that for every ART, there exists a unique
self-adjoint RDM (see theorem below). This uniqueness within each ART underscores the
specific role that such a map plays in the theory.
Exercise 9.3.7. Verify that ∆ in the definition above is indeed a RDM which is self-adjoint
(with respect to the Hilbert-Schmidt inner product).
Theorem 9.3.2. Let F(A) ⊆ D(A) be an affine set. Then, there exists a unique
self-adjoint RDM associated with F(A) which is given by ∆ as defined in (9.44).
Proof. The existence of ∆ follows from Definition 9.3.4 and Exercise 9.3.7. To prove unique-
ness, let ∆ : Herm(A) → Herm(A) be a self-adjoint RDM. We would like to show that it
coincides with the RDM given in Definition 9.3.4. Indeed, from the linearity of ∆, and the
fact that ∆ is a RDM, we must have ∆(η) = η for all η ∈ K(A). Moreover, since ∆ is
self-adjoint for every ζ ∈ K(A)⊥ and η ∈ K(A) we have
0 = Tr[ηζ] = Tr[∆(η)ζ] = Tr[η∆(ζ)] . (9.45)
Since the above equation holds for all η ∈ K(A) we conclude that ∆(ζ) ∈ K(A)⊥ . However,
since ∆ is a RDM we must have ∆(ζ) ∈ K(A). Both conditions hold only if ∆(ζ) = 0. To
summarize, we get that for all η ∈ K(A) and all ζ ∈ K(A)^⊥ we have

∆(η + ζ) = ∆(η) + ∆(ζ)
∆(ζ) = 0 → = ∆(η)   (9.46)
= η .
Hence, ∆ coincides with the map defined in (9.44). This completes the proof.
Exercise 9.3.8. Let F(A) ⊆ D(A) be an affine set, and let {η_x}_{x∈[m]} be an orthonormal basis of span_R{F(A)} (with respect to the Hilbert-Schmidt inner product). Show that the linear map

∆(ω) := Σ_{x∈[m]} Tr[η_x ω] η_x ∀ ω ∈ L(A) ,   (9.47)

coincides with the self-adjoint RDM defined in (9.44).
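Equation (9.47) says that the self-adjoint RDM is simply the orthogonal (Hilbert-Schmidt) projection onto span_R{F(A)}. The Python sketch below (coherence on a qutrit, chosen because the projection then reduces to complete dephasing; states are illustrative) builds the map from an orthonormal basis and checks two of its defining properties.

import numpy as np

def self_adjoint_rdm(free_basis):
    # the map of Eq. (9.47): Delta(w) = sum_x Tr[eta_x w] eta_x, with {eta_x} an
    # orthonormal (Hilbert-Schmidt) basis of span_R{F(A)}
    def Delta(w):
        return sum(np.trace(eta.conj().T @ w) * eta for eta in free_basis)
    return Delta

d = 3
etas = []
for x in range(d):                       # {|x><x|} is an orthonormal basis of the free span
    eta = np.zeros((d, d)); eta[x, x] = 1.0
    etas.append(eta)
Delta = self_adjoint_rdm(etas)

rho = np.array([[0.5, 0.2, 0.1], [0.2, 0.3, 0.0], [0.1, 0.0, 0.2]])
print(np.allclose(Delta(rho), np.diag(np.diag(rho))))    # reduces to complete dephasing
print(np.allclose(Delta(Delta(rho)), Delta(rho)))         # idempotent, cf. Exercise 9.3.6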
Resource Witness
Definition 9.4.1. Let F be a QRT and let A be a physical system. An operator W ∈ Herm(A) is called a resource witness if the following two conditions hold:

1. Tr[Wσ] ⩾ 0 for all σ ∈ F(A).

2. There exists ρ ∈ D(A) such that

Tr[Wρ] < 0 .   (9.50)
Since Pos(A) ⊆ F(A)∗ we conclude that the set of all resource-witnesses can be viewed as
the non-positive semidefinite matrices in F(A)∗ . If F(A) is closed and convex then the set of
all witnesses completely determines the set of free states.
Theorem 9.4.1. Let A be a physical system, F(A) ⊆ D(A) be a closed and convex
subset of density matrices, and σ ∈ D(A). Then, σ ∈ F(A) if and only if

Tr[Wσ] ⩾ 0   (9.52)

for all W ∈ F(A)∗.
Proof. This theorem follows from the property that any closed and convex set K ⊂ Herm(A)
satisfies K∗∗ = K (see Theorem A.8.1). Hence, in particular, F(A)∗∗ = F(A). The latter
Note that the inequality above holds trivially for all W ⩾ 0. Therefore, it is sufficient to
check it for all 0 ̸⩽ W ∈ F(A)∗ ; i.e. for all resource witnesses.
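As a concrete numerical illustration of Definition 9.4.1 (a standard two-qubit example, not taken from the text), the operator W = ½I − Φ⁺ is non-negative on product states, and hence on all separable states, while it is negative on the maximally entangled state it detects. The Python sketch below samples random product states to exhibit both properties.

import numpy as np

phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
Phi = np.outer(phi, phi)
W = 0.5 * np.eye(4) - Phi                        # a resource (entanglement) witness

rng = np.random.default_rng(1)
def random_product_state():
    a = rng.normal(size=2) + 1j * rng.normal(size=2); a /= np.linalg.norm(a)
    b = rng.normal(size=2) + 1j * rng.normal(size=2); b /= np.linalg.norm(b)
    return np.kron(np.outer(a, a.conj()), np.outer(b, b.conj()))

vals = [np.real(np.trace(W @ random_product_state())) for _ in range(1000)]
print(min(vals) >= -1e-12)                       # Tr[W sigma] >= 0 on sampled product states
print(np.real(np.trace(W @ Phi)))                # -0.5 < 0: W detects the Bell state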
We point out that for affine QRTs, determining whether a quantum state is free is a relatively easy task. Since any affine set F(A) has a self-adjoint resource destroying map
(see Definition 9.3.4), to determine if a state ρ ∈ D(A) is free or not, all we have to do is to
check if ∆(ρ) = ρ. Such a simplification does not occur in certain important convex QRTs
(e.g. entanglement theory).
One of the most useful aspects of a QRT is that it generates precise and operationally
meaningful ways to quantify a given physical resource. Here we study a variety of resource
measures that can be introduced in any QRT. We start with the definition of a resource
measure, and discuss some additional desirable properties that any resource measure should
satisfy. After that, we study different families of specific resource measures applicable to
any QRT. We put emphasis on the Umegaki relative entropy of a resource, as it turns out
that this resource measure has several operational interpretations, and it plays a major role
throughout this book.
where the union is over all Hilbert spaces A. This union also includes the trivial system
A = C (i.e. |A| = 1) in which case D(A) = {1} consists of only one element, namely the
number 1.
2. Normalization: M(1) = 0.
The first property is fundamental in resource theories. It asserts that the value of any
resource measure cannot be increased through the use of free operations. This principle,
known as monotonicity, is consistent with the “golden rule” of QRTs that free operations
cannot generate resources. The normalization condition, in conjunction with monotonicity,
leads to the positivity of every resource measure M , which can be expressed as
The positivity follows from the fact that the trace is a free operation in any QRT, and
therefore, from the monotonicity of M under free operations we have
Similarly, the two conditions of normalization and monotonicity imply that for any finite dimensional system A,

σ ∈ F(A) ⇒ M(σ) = 0 .   (10.4)

Indeed, any state σ ∈ F(A) can be viewed as a free channel σ^{1→A}, where 1 represents the trivial system corresponding to the Hilbert space C. Hence,
where the inequality follows from the monotonicity property of M. Therefore, since M is
non-negative we must have M(σ) = 0 for all σ ∈ F(A) and all finite dimensional systems A.
We discuss now several properties that are satisfied by some, but not all, resource measures.
Faithfulness
The condition expressed in (10.4) quantitatively defines the notion of “no resource.” Intu-
itively, one might be inclined to consider that the reverse of (10.4) should also hold true.
This concept is referred to as faithfulness. A general resource measure M is deemed faithful
if M (ρ) = 0 necessarily implies that ρ is a free state.
However, it’s important to recognize that for certain tasks, some resource states may not
offer any operational advantage over free states. In such scenarios, these states should be
assigned a zero value by any measure that quantifies their utility for performing the specified
task. For instance, as we will explore later, the measure of distillable entanglement, which
is a significant measure of entanglement, is zero for all bound entangled states. Therefore,
although faithfulness is an intuitively attractive property, it is not an essential requirement
for a resource measure. This perspective allows for a more nuanced understanding of resource
measures and their application in various contexts within quantum resource theories.
Resource Monotones
In certain QRTs, quantum measurements do not belong to the set of free operations. One
such example is quantum thermodynamics as we will see in Chapter 17. However, in many
other QRTs, like entanglement, quantum measurements can be free, and they represent
an important component of the theory. In such QRTs, the set of quantum instruments
F(A → BX), where X is a classical ‘flag’ system, is not empty. In particular, any such channel E ∈ F(A → BX) can be expressed as E^{A→BX} = Σ_{x∈[m]} E_x^{A→B} ⊗ |x⟩⟨x|^X, and consequently, any resource measure M satisfies for all ρ ∈ D(A)

M(ρ^A) ⩾ M(E^{A→BX}(ρ^A)) = M( Σ_{x∈[m]} p_x σ_x^B ⊗ |x⟩⟨x|^X ) ,   (10.6)

where

p_x := Tr[E_x^{A→B}(ρ^A)] and σ_x^B := (1/p_x) E_x^{A→B}(ρ^A) .   (10.7)

M( Σ_{x∈[m]} p_x σ_x^B ⊗ |x⟩⟨x|^X ) = Σ_{x∈[m]} p_x M(σ_x^B ⊗ |x⟩⟨x|^X) .   (10.8)
Almost all resource measures studied in the literature are convex linear on QC states. One reason is that the equality above is satisfied by many functions, like the von Neumann entropy, the Rényi entropies, all the Schatten p-norms, etc. If a QRT admits a tensor-product structure then the partial trace is free, so that for every x ∈ [m]

M(σ_x^B ⊗ |x⟩⟨x|^X) ⩾ M(σ_x^B) .   (10.9)

Combining this with (10.6) and (10.8) we get that in such QRTs

M(ρ^A) ⩾ Σ_{x∈[m]} p_x M(σ_x^B) .   (10.10)
Resource Monotone
Definition 10.1.2. A resource measure M is called a resource monotone if it
satisfies:
As we delve further, we will see that the convexity property above is extremely useful
from a mathematical perspective in calculating the resource monotone for a specific state.
Concurrently, a common physical interpretation of convex measures is that the process of
mixing states does not result in an increase in the resource quantity. However, it’s important
to be cautious in drawing parallels between this mathematical notion of convexity and the
physical process of mixing states, as the latter typically involves discarding information. We
have previously discussed this important distinction in Section 9.1.3. Additionally, it’s worth
noting that in QRTs where a freely available classical (flag) basis does not exist, and thus
strong monotonicity is not a relevant concept, convex resource measures will be referred to
as resource monotones.
Subadditivity
Some resource measures have additional properties that are mathematically convenient. One
of such properties is subadditivity. A resource measure M is said to be subadditive if for any
ρ ∈ D(A) and σ ∈ D(B),
M(ρ ⊗ σ) ⩽ M(ρ) + M(σ) . (10.13)
While subadditivity is a natural property to expect from a resource measure, it will not
hold for all measures in a general QRT. In particular, we will see examples of that when we
discuss superactivation.
Additivity
An even stronger property of a resource measure is additivity. That is, M is said to be
additive when equality holds in (10.13) for all states. While most resource measures do not
satisfy this property, there exists a procedure known as regularization that allows for the
general construction of measures that are additive on multiple copies of the same state. We
have already encountered this procedure implicitly in a few places in previous chapters. The regularization of a resource measure M is defined for all ρ ∈ D(A) as

M^reg(ρ) := lim_{n→∞} (1/n) M(ρ^⊗n) ,   (10.14)

provided the limit exists. In the following exercise you show that the limit above exists if M satisfies a weaker form of subadditivity.

Exercise 10.1.1. Show that the limit in (10.14) exists if M satisfies for all n, m ∈ N and any density matrix ρ

M(ρ^⊗(m+n)) ⩽ M(ρ^⊗m) + M(ρ^⊗n) .   (10.15)

Hint: Use Exercise 6.4.2.
Asymptotic Continuity
It’s a reasonable expectation for any resource measure with physical significance to exhibit
continuity. This expectation stems from the idea that if one quantum state is a slight per-
turbation of another, their resource contents should be very similar. However, it's important to note that a function f : ∪_A D(A) → R₊ satisfying the following condition:
is indeed continuous. Yet, in the context of very large dimensions (i.e., |A| ≫ 1), this type of
continuity may not be practically useful. This is because, for the difference |f (ρ)−f (σ)| to be
small, ρ and σ need to be so closely aligned that they are virtually identical for all practical
purposes. Therefore, a more robust notion of continuity, known as asymptotic continuity,
is often considered. Asymptotic continuity is especially pertinent in the realm of large
dimensions. It limits the dependence on dimension to a logarithmic scale, thereby providing
a more practical and realistic measure of continuity when dealing with high-dimensional
quantum states. This concept is particularly useful in assessing the continuity of resource
measures in quantum systems where the dimensionality plays a significant role.
Asymptotic Continuity
Definition 10.1.3. A resource measure M is said to be asymptotically continuous if
for any ρ, σ ∈ D(A), and ε := ½∥ρ − σ∥₁,
Note that the above notion of continuity is stronger than the regular notion of continuity in the sense that the right-hand side of (10.17) depends on the dimension only through a logarithmic factor.
Moreover, since f is continuous, c := max_{0⩽ε⩽1} f(ε) < ∞. Hence, taking σ ∈ F(A) to be free we get from the above equation that for all n ∈ N

(1/n) M(ρ^⊗n) ⩽ c log |A| .   (10.18)

Hence, taking the limit n → ∞ we get M^reg(ρ) < ∞.
Exercise 10.1.2. Let f : ∪_A D(A) → R₊ be a function that satisfies (10.16), and suppose there exists a state σ ∈ D(A) such that lim_{n→∞} (1/n) f(σ^⊗n) < ∞. Show that for all other ρ ∈ D(A) we must have

lim_{n→∞} (1/n) f(ρ^⊗n) = ∞ .   (10.19)
If the set of free states, F(A), contains a full rank state for any system A, then one can
define a slightly weaker version of asymptotic continuity that will be very useful for our
study, since most QRTs have this property.
Observe that any density matrix η ∈ D(A) satisfies ∥η −1 ∥∞ ⩾ |A|. Therefore, the above
notion of asymptotic continuity is a weaker one than the version given in Definition 10.1.3.
On the other hand, if the QRT F has the property that there exists a constant 0 < c < ∞,
independent of the dimensions, such that
for any choice of system A (and c is independent of |A|), then the two notions of asymptotic continuity become equivalent. Since all the QRTs studied in this book satisfy the above condition, we will use these two notions of asymptotic continuity interchangeably.
Exercise 10.1.3. Let F be a QRT in which the maximally mixed state is free. Show that
the two notions of asymptotic continuity coincide in this case.
Asymptotic continuity is a property that is extensively utilized in QRTs, especially in
the asymptotic regime. Functions that are asymptotically continuous often incorporate the
von Neumann entropy or the Umegaki relative entropy. This reliance is partly because the
Umegaki relative entropy is the only asymptotically continuous relative entropy, making it
a unique and pivotal tool in QRTs. The proof of this uniqueness theorem, which establishes
the singular nature of the Umegaki relative entropy in terms of asymptotic continuity, is
an important aspect of these theories. However, we will delve into the details of this proof
later in Section 11.4. For now, our focus will shift to introducing key examples of resource measures. These examples will provide a practical illustration of how the theoretical concepts discussed above are applied in QRTs.
Remark. In this book, we consistently regard the set of free states, F(A), as a closed and
compact set. Consequently, the infimum in (10.22) can be substituted with a minimum. This
means that there exists an optimal state σ⋆ ∈ F(A) which fulfills

D(ρ∥F) = D(ρ∥σ⋆) .   (10.23)

The state σ⋆ is called a closest free state (CFS); see Fig. 10.1 for an illustration.
Remarkably, the two conditions that a quantum divergence has to satisfy, namely, DPI
and normalization, are sufficient to guarantee that D(·∥F) is a resource measure. Indeed,
D(·∥F) is non-negative since D is non-negative, and if ρ ∈ F(A) then D(ρ∥F) = 0. To see the
monotonicity of D(·∥F) under free operations observe that for any E ∈ F(A → B) and any
ρ ∈ D(A) we have
D(E^{A→B}(ρ^A)∥F) := inf_{ω∈F(B)} D(E^{A→B}(ρ^A)∥ω^B)
restricting ω = E(σ) → ⩽ inf_{σ∈F(A)} D(E^{A→B}(ρ^A)∥E^{A→B}(σ^A))   (10.24)
DPI → ⩽ inf_{σ∈F(A)} D(ρ^A∥σ^A)
= D(ρ^A∥F) .
D(ρ₁^A ⊗ ρ₂^B∥F)
restricting σ = σ₁ ⊗ σ₂ → ⩽ inf_{σ₁∈F(A), σ₂∈F(B)} D(ρ₁^A ⊗ ρ₂^B ∥ σ₁^A ⊗ σ₂^B)   (10.27)
subadditivity of D → ⩽ inf_{σ₁∈F(A)} D(ρ₁^A∥σ₁^A) + inf_{σ₂∈F(B)} D(ρ₂^B∥σ₂^B)
= D(ρ₁^A∥F) + D(ρ₂^B∥F) .
D^reg(ρ∥F) := lim_{n→∞} (1/n) D(ρ^⊗n∥F)   (10.28)

exists. Note that D^reg(·∥F) is at least weakly additive in the sense that D^reg(ρ^⊗m∥F) = m D^reg(ρ∥F) for all m ∈ N.
where the Umegaki relative entropy is D(ρ∥σ) := Tr[ρ log ρ] − Tr[ρ log σ], and the infimum is taken over all free states σ ∈ F(A). For a general divergence D, the D-divergence of a resource constructed in this way is not guaranteed to satisfy strong monotonicity (10.11), but for the relative entropy of a resource this is indeed the case.
Theorem 10.2.1. Let D be the Umegaki relative entropy and F be a convex QRT.
Then, the relative entropy of a resource, D(·∥F), is a resource monotone.
Proof. Since F is convex, and since the Umegaki relative entropy is jointly convex (see
Exercise 8.7.8), it follows from (10.29) that D(·∥F) is convex. It is therefore left to show
that D(·∥F) satisfies the strong monotonicity property. Consider a free quantum instrument E = Σ_{x∈[m]} E_x ⊗ |x⟩⟨x|^X ∈ F(A → BX), with each E_x ∈ CP(A → B) and Σ_{x∈[m]} E_x trace-preserving. For any resource state ρ ∈ D(A), denote σ^{BX} := Σ_{x∈[m]} p_x σ_x^B ⊗ |x⟩⟨x|^X. Then,

D(ρ^A∥F) ⩾ D(E^{A→BX}(ρ^A)∥F) = D(σ^{BX}∥F) = min_{ω∈F(BX)} D(σ^{BX}∥ω^{BX}) ,   (10.31)
where we used the monotonicity under free operations of the relative entropy of a resource. Denoting ω^{BX} := Σ_{x∈[m]} q_x ω_x^B ⊗ |x⟩⟨x|^X we continue

D(ρ^A∥F) ⩾ min_{ω∈F(BX)} D( Σ_{x∈[m]} p_x σ_x^B ⊗ |x⟩⟨x|^X ∥ Σ_{x∈[m]} q_x ω_x^B ⊗ |x⟩⟨x|^X )
Exercise 8.7.8 → = min_{ω∈F(BX)} [ Σ_{x∈[m]} p_x D(σ_x^B∥ω_x^B) + D(p∥q) ]
D(p∥q) ⩾ 0 → ⩾ min_{{ω_x}⊂F(B)} Σ_{x∈[m]} p_x D(σ_x^B∥ω_x^B)
= Σ_{x∈[m]} p_x min_{ω∈F(B)} D(σ_x^B∥ω^B) = Σ_{x∈[m]} p_x D(σ_x^B∥F) .
So far we have learned that the relative entropy of a resource is a faithful, subadditive resource monotone, assuming F is a QRT whose set of free states F(A) is convex. The final property that we prove is that the Umegaki relative entropy of a resource is also asymptotically continuous. Later on, we will see that this measure is the only asymptotically continuous relative entropy of a resource, which makes the Umegaki relative entropy of a resource unique.
Moreover, from the following chapters it will follow that asymptotic continuity has several
important applications in QRTs.
The relative entropy of a resource is not always bounded. As a very simple example,
suppose the set of free states F(A) consists of only one pure state |0⟩⟨0|. In this case, we get
that
D(|1⟩⟨1|∥F) = ∞ . (10.32)
Hence, for such a (pathological) example the relative entropy of a resource is not bounded,
and in particular, cannot be asymptotically continuous. However, in most QRTs, the set of
free states F(A) contains a full rank state. If such a state exists, say η, then since η⁻¹ exists and satisfies η⁻¹ ⩽ ∥η⁻¹∥_∞ I^A, it follows that

D(ρ^A∥F) ⩽ D(ρ^A∥η) ⩽ log ∥η⁻¹∥_∞ .

For example, if the set of free states contains the maximally mixed state then D(ρ^A∥F) ⩽ log |A|.
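The bound above is easy to test numerically. The Python sketch below (logarithms base 2; the maximally coherent and maximally mixed states are used only as an illustration) computes D(ρ∥η) for a full-rank free state η and compares it with log∥η⁻¹∥_∞.

import numpy as np

def relative_entropy(rho, sigma):
    # Umegaki relative entropy D(rho||sigma) in bits; assumes sigma > 0
    lam = np.linalg.eigvalsh(rho)
    tr_rho_log_rho = sum(l * np.log2(l) for l in lam if l > 1e-12)
    mu, W = np.linalg.eigh(sigma)
    log_sigma = W @ np.diag(np.log2(mu)) @ W.conj().T
    return float(np.real(tr_rho_log_rho - np.trace(rho @ log_sigma)))

d = 4
psi = np.ones(d) / np.sqrt(d)
rho = np.outer(psi, psi)                         # maximally coherent state (a resource)
eta = np.eye(d) / d                              # full-rank free state (maximally mixed)
bound = np.log2(np.linalg.norm(np.linalg.inv(eta), 2))   # log ||eta^{-1}||_inf = log|A|
print(relative_entropy(rho, eta), "<=", bound)            # here both equal log|A| = 2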
then

|D(ρ∥F) − D(σ∥F)| ⩽ εκ + (1 + ε)h( ε/(1+ε) ) ,   (10.34)

where h(x) = −x log x − (1 − x) log(1 − x) is the binary Shannon entropy.
where for the second equality we used (5.170). The key idea of the proof is to find lower and upper bounds for D(γ∥F) in terms of D(ρ∥F), D(σ∥F), and κ.
Since F(A) is convex, we saw above that the relative entropy of a resource is convex, so that

D(γ∥F) ⩽ tD(σ∥F) + (1 − t)D(ω₊∥F) ⩽ tD(σ∥F) + (1 − t)κ .   (10.36)
To get a lower bound, let η be such that D(γ∥F) = D(γ∥η). Then, from the definition of
the Umegaki relative entropy we have
From Corollary 7.5.1 we have that the first term on the right-hand side above satisfies
where in the last line we removed the term (1 − t)D(ω− ∥η) ⩾ 0, and replaced D(ρ∥η) with
D(ρ∥F). Combining the lower bound above with the upper bound in (10.36) we conclude
that

D(ρ∥F) − D(σ∥F) ⩽ t⁻¹ h(t) + ((1 − t)/t) κ = (1 + ε)h( ε/(1+ε) ) + εκ .   (10.40)
The same upper bound holds for D(σ∥F) − D(ρ∥F) by exchanging between ρ and σ every-
where above. Hence, this completes the proof.
Exercise 10.2.1. Let F be as in Theorem 10.2.2 and suppose further that there exists a free state η ∈ F(A) that is full rank; i.e. η > 0. Show that there exists a continuous function f : [0, 1] → R₊, independent of the dimension of A, such that f(0) = 0 and
We now argue that the Umegaki relative entropy is asymptotically continuous. This is a
simple consequence of Theorem 10.2.2.
Proof. Let ρ, ρ′, σ ∈ D(A) and set ε := ½∥ρ − ρ′∥₁. Since supp(ρ) ⊆ supp(σ) and supp(ρ′) ⊆
supp(σ) we can assume without loss of generality that σ > 0. Let F(A) := {σ} be the
set consisting of σ (i.e., F(A) contains only one density matrix). The set F(A) is trivially
closed and convex. Moreover, note that for this F we get D(ρ∥F) = D(ρ∥σ) and similarly
D(ρ′∥F) = D(ρ′∥σ). Therefore, applying Theorem 10.2.2 gives

|D(ρ∥σ) − D(ρ′∥σ)| ⩽ εκ + (1 + ε)h( ε/(1+ε) ) ,   (10.43)

with

κ = max_{ω∈D(A)} D(ω∥σ) ⩽ max_{ω∈D(A)} −Tr[ω log σ] ,   (10.44)
where we dropped the term Tr[ω log ω] = −H(ω) as it is negative. Moreover, note that −Tr[ω log σ] = Tr[ω log(σ⁻¹)], and since σ⁻¹ ⩽ ∥σ⁻¹∥_∞ I^A we get that κ ⩽ log ∥σ⁻¹∥_∞, where we used the operator monotonicity of the log function. This completes the proof (see Exercise 10.2.2).
Exercise 10.2.2. Show that there exists a function f : [0, 1] → R₊, independent of the dimension of A, such that f(0) = 0 and

ε log ∥σ⁻¹∥_∞ + (1 + ε)h( ε/(1+ε) ) ⩽ f(ε) log ∥σ⁻¹∥_∞ .   (10.46)
Exercise 10.2.3. Show that any conditional entropy H (see Definition 7.2.1) is a resource
measure in the QRT in which F(AB → AB ′ ) = CMO(AB → AB ′ ).
cf. (7.132) → = D(ρ^{AB}∥u^A ⊗ ρ^B)   (10.48)
(7.117) → = log |A| − H(A|B)_ρ .
κ := max_{ω∈D(AB)} D(ω^{AB}∥F)
Exercise 10.2.4. Use Theorem 10.2.2 and the expressions above to prove the corollary.
Exercise 10.2.5. Using the same notations as in the corollary above, show that there exists
a continuous function f : [0, 1] → R+ satisfying limδ→0+ f (δ) = 0 and
mixing with noise. In other words, they quantify the ability of the resource to maintain its
usefulness in the presence of disturbances. The term “global” refers to the fact that the noise
can be any density matrix ω ∈ D(A), which represents a wide range of possible disturbances.
However, if we limit the density matrix ω to only represent free states, then the resulting
quantity is called the robustness.
By definition, Rg (ρ) ⩽ R(ρ) (see exercise below) and if ρ ∈ F(A) then R(ρ) = Rg (ρ) = 0
since s above can be taken to be zero. Furthermore, from the following exercise, the converse
of this statement is also true; that is, Rg is faithful.
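In the QRT of coherence the global robustness can be computed with a small semidefinite program. The Python sketch below (it assumes the cvxpy package is available; the pure state is chosen only for illustration) minimizes Tr[X] − 1 over diagonal X ⩾ ρ, which is exactly the smallest s with ρ ⩽ (1 + s)σ for an incoherent σ.

import numpy as np
import cvxpy as cp                                # assumes cvxpy is installed

def global_robustness_coherence(rho):
    # R_g(rho) = min{ s >= 0 : rho <= (1+s) sigma, sigma incoherent }
    d = rho.shape[0]
    x = cp.Variable(d, nonneg=True)               # diagonal of the unnormalized free operator
    prob = cp.Problem(cp.Minimize(cp.sum(x) - 1), [cp.diag(x) - rho >> 0])
    prob.solve()
    return prob.value

psi = np.array([0.8, 0.5, 0.33]); psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi)                          # a real pure state with coherence
# For pure states the value is expected to match (sum_i psi_i)^2 - 1, a closed form
# known from the coherence literature (not derived in the text here).
print(global_robustness_coherence(rho))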
Exercise 10.2.7. Consider the robustness and global robustness as defined above.
1. Show that Rg (ρ) ⩽ R(ρ) for all ρ ∈ D(A).
2. Show that if F is affine then R(ρ) = ∞ for all ρ ∈ D(A) that is not free.
3. Show that Rg (ρ) = 0 if and only if ρ ∈ F(A).
Exercise 10.2.8. Let ρ ∈ D(A) and suppose R(ρ) < ∞.

1. Show that there exist τ, ω ∈ F(A) such that

ρ = (1 + R(ρ))τ − R(ρ)ω .   (10.55)
Proof. Since we already saw that Rg(ρ) = 0 for all ρ ∈ F(A), we prove now the strong monotonicity property. Let E := Σ_{x∈[m]} E_x ⊗ |x⟩⟨x| ∈ F(A → BX) be a free quantum instrument (if F(A → BX) is an empty set then strong monotonicity holds trivially). Let ρ ∈ D(A) be as in (10.56) and for all x ∈ [m] denote

σ_x := E_x(ρ)/Tr[E_x(ρ)] = (1/Tr[E_x(ρ)]) [ (1 + Rg(ρ)) E_x(τ) − Rg(ρ) E_x(ω) ]   (10.58)
= (1 + s) E_x(τ)/Tr[E_x(τ)] − s E_x(ω)/Tr[E_x(ω)] ,

where s := Rg(ρ) Tr[E_x(ω)]/Tr[E_x(ρ)]. On the other hand, σ_x can also be written in terms of its optimal pseudo-mixture,

σ_x = (1 + Rg(σ_x))τ_x − Rg(σ_x)ω_x ,   (10.59)

for some ω_x ∈ D(B) and τ_x ∈ F(B). Therefore, from the two expressions above for σ_x, and the optimality of the pseudo-mixture in (10.59), we get that Rg(σ_x) is no greater than s; that is,
Rg(σ_x) ⩽ Rg(ρ) Tr[E_x(ω)]/Tr[E_x(ρ)] .   (10.60)

From the above inequality we conclude that

Σ_{x∈[m]} Tr[E_x(ρ)] Rg(σ_x) ⩽ Σ_{x∈[m]} Tr[E_x(ω)] Rg(ρ) = Rg(ρ) ,   (10.61)

where in the last equality we used the fact that Σ_{x∈[m]} E_x is trace-preserving. This completes the proof of strong monotonicity.
Next, suppose that F(A) is convex, and let {px , ρx }x∈[m] be an ensemble of quantum
states in D(A). Express each ρ_x as a pseudo-mixture

ρ_x = (1 + Rg(ρ_x))τ_x − Rg(ρ_x)ω_x   (10.62)

for some τ_x ∈ F(A) and ω_x ∈ D(A). Denote ρ̄ := Σ_{x∈[m]} p_x ρ_x. Then, from the equation above we have

ρ̄ = Σ_{x∈[m]} p_x [ (1 + Rg(ρ_x))τ_x − Rg(ρ_x)ω_x ] = (1 + r)τ − rω ,   (10.63)

where

r := Σ_{x∈[m]} p_x Rg(ρ_x) ,  τ := (1/(1+r)) Σ_{x∈[m]} p_x (1 + Rg(ρ_x))τ_x ,  ω := (1/r) Σ_{x∈[m]} p_x Rg(ρ_x)ω_x .   (10.64)
Note that τ ∈ F(A) since each τ_x ∈ F(A) and F(A) is convex. Since the pseudo-mixture in (10.63) is not necessarily the optimal one, we conclude that

Rg(ρ̄) ⩽ r = Σ_{x∈[m]} p_x Rg(ρ_x) .   (10.65)
Exercise 10.2.9. Let F be a convex QRT, and let R be the corresponding robustness mea-
sure. Prove that R is a resource monotone. Hint: Follow similar steps as in the proof of
Theorem 10.2.3.
The terminology of Dmax (ρ∥F) is due to the following connection between Dmax (ρ∥F) and
Rg .
Proof. By definition of Dmax and the logarithmic global robustness of a resource, we have for all ρ ∈ D(A)

Dmax(ρ∥F) = min{ log t : tσ ⩾ ρ , σ ∈ F(A) }
= min{ log t : tσ − ρ = (t − 1)ω , σ ∈ F(A) , ω ∈ D(A) , t ⩾ 1 } ,   (10.68)
For any α ∈ [0, 2], we also define the α-Rényi relative entropy of a resource as
Note that the case α = 0 corresponds to Dmin(ρ∥F). The special case α = 1 is Dα=1(ρ∥F) = D(ρ∥F) (the relative entropy of a resource). Since the Petz quantum Rényi divergence Dα(·∥·) is non-decreasing in α, also Dα(·∥F) is non-decreasing in α. The continuity of Dα(·∥·) in α also carries over to Dα(·∥F), including the continuity at α = 1. This result is a simple consequence of Sion's minimax theorem.
Lemma 10.2.2 (Sion’s Minimax Theorem). Let X be a compact convex subset of a linear
topological space, and let Y be a convex subset of a topological space. Let f : X × Y →
R ∪ {−∞, +∞} be a real valued function satisfying
1. For every fixed y ∈ Y , the function x 7→ f (x, y) is lower semicontinuous and quasi-
convex on X.
2. For every fixed x ∈ X, the function y 7→ f (x, y) is upper semicontinuous and quasi-
concave on Y.
Then

min_{x∈X} sup_{y∈Y} f(x, y) = sup_{y∈Y} min_{x∈X} f(x, y) .   (10.72)
Lemma 10.2.3. Let ρ ∈ D(A) be a fixed density matrix, and define g : [0, 2] → R+
as g(α) := Dα (ρ∥F) for all α ∈ [0, 2]. Then, g(α) is a continuous function.
In order to switch the order between the sup and min above we need to verify that all the
conditions in Sion’s minimax theorem are satisfied. Indeed, the function f (ω, α) := Dα (ρ∥ω)
has the property that it is continuous in ω (and therefore lower semi-continuous). Moreover,
note that for a fixed α ∈ [0, 2], the function ω ↦ f(ω, α) is a quasi-convex function since for any t ∈ [0, 1] and ω₀, ω₁ ∈ F(A) we have

f(tω₀ + (1 − t)ω₁, α) = Dα(ρ∥tω₀ + (1 − t)ω₁)
(6.111) → ⩽ max{ Dα(ρ∥ω₀), Dα(ρ∥ω₁) }   (10.75)
= max{ f(ω₀, α), f(ω₁, α) } .

On the other hand, for a fixed ω ∈ F(A) the function α ↦ f(ω, α) is a continuous function (and therefore upper semi-continuous) and quasi-concave, since for any t ∈ [0, 1] and α₀, α₁ ∈ [0, 2] we have

f(ω, tα₀ + (1 − t)α₁) = D_{tα₀+(1−t)α₁}(ρ∥ω)
monotonicity of Dα in α → ⩾ D_{min{α₀,α₁}}(ρ∥ω)   (10.76)
= min{ f(ω, α₀), f(ω, α₁) } .
Therefore, f (ω, α) satisfies all the requirements of Sion’s minimax theorem. This means that
we can switch the order of the sup and min in (10.74) to get
The limit above exists since the α-Rényi relative entropy of a resource is subadditive (see
Exercise 10.1.1). From Corollary 10.2.3 it follows that
lim sup_{n→∞} (1/n) D^ε_min(ρ^⊗n∥F) ⩽ lim_{α→1⁺} D_α^reg(ρ∥F) ,   (10.81)

and similarly

lim inf_{n→∞} (1/n) D^ε_min(ρ^⊗n∥F) ⩾ lim_{α→1⁻} D_α^reg(ρ∥F) .   (10.82)
Observe that in general we do not know if Dαreg (ρ∥F) is continuous at α = 1, but we can
show continuity from the right.
Lemma 10.2.4. Let ρ ∈ D(A) and let F be a quantum resource theory admitting a
tensor product structure, and has the property that F(A) ⊆ D(A) is closed and
convex. Then,
lim_{α→1⁺} D_α^reg(ρ∥F) = D^reg(ρ∥F) .   (10.83)
In several resource theories the opposite inequality also holds, but in general we do not know whether the limit lim_{α→1⁻} D_α^reg(ρ∥F) equals D^reg(ρ∥F). At the time of writing this book, it is a big open problem in the field to determine under what conditions the inequality in the equation above can be replaced with an equality.
where Bε (ω) is the set of all density matrices that are ε-close (in trace distance) to ω. That
is, in any neighbourhood of a state on the boundary of F(A) there exists at least one state
in F(A) and at least one state not in F(A).
The state σ can be thought of as the closest free state (CFS) to ρ, when we measure the
“distance” with the relative entropy. As we already mentioned, the computation of σ can
be very hard. However, for a given state ω ∈ ∂F(A) we can compute all the resource states
in D(A) for which ω is the CFS. This converse problem has several applications and can be
used to produce examples of resource states for which one knows the value of the relative
entropy of a resource.
Note that if 0 < ρ ̸∈ F(A) and D(ρ∥F) = D(ρ∥σ) (i.e. σ is a CFS), then σ > 0, since otherwise D(ρ∥F) = D(ρ∥σ) = ∞. For simplicity of the exposition here, we will always assume that σ has full rank, and refer the interested reader to the end of this chapter for more details and references on the singular case. We start by showing that if 0 < σ ∈ F(A) is a CFS then σ ∈ ∂F(A).
Theorem 10.3.1. Let 0 < σ ∈ F(A) be a closest free state of a resource state
ρ ∈ D(A). Then, σ ∈ ∂F(A).
Proof. Consider the following Taylor expansion of the logarithmic function. This expansion
is based on the divided difference approach discussed in Appendix D.1. For any t > 0,
0 < σ ∈ D(A), and η ∈ Herm(A) we have
where Lσ : Herm(A) → Herm(A) is a linear operator defined as follows. Let {px }x∈[m] (with
m := |A|) be the eigenvalues of σ, and let {ηxy }x,y∈[m] be the matrix components of a matrix
η ∈ Herm(A) in the eigenbasis of σ. Then, the matrix components of Lσ (η) are given by
[L_σ(η)]_{xy} := ((log p_x − log p_y)/(p_x − p_y)) η_{xy}        ∀ x, y ∈ [m] .    (10.88)
In Exercise 10.3.1 below you show that Lσ is a linear self-adjoint map that satisfies Lσ (σ) = I.
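For readers who wish to experiment numerically, the following minimal sketch (not from the book) builds the map L_σ of (10.88) from the eigendecomposition of a full-rank state and checks the property L_σ(σ) = I used in the proof below; the natural logarithm is used so that the identity holds exactly (a different base only changes an overall constant).

```python
# Illustrative sketch of the divided-difference map L_sigma of Eq. (10.88), with the check L_sigma(sigma) = I.
import numpy as np

def L_sigma(sigma, eta):
    """[L_sigma(eta)]_{xy} = (log p_x - log p_y)/(p_x - p_y) * eta_{xy} in the eigenbasis of sigma,
    with the divided difference of log at equal arguments taken to be 1/p_x."""
    p, U = np.linalg.eigh(sigma)                    # sigma = U diag(p) U^dagger, assumed full rank
    eta_eig = U.conj().T @ eta @ U                  # components of eta in the eigenbasis of sigma
    P, Q = np.meshgrid(p, p, indexing="ij")
    with np.errstate(divide="ignore", invalid="ignore"):
        coeff = (np.log(P) - np.log(Q)) / (P - Q)
    equal = np.isclose(P, Q)
    coeff[equal] = 1.0 / P[equal]                   # diagonal convention: divided difference of log is 1/p
    return U @ (coeff * eta_eig) @ U.conj().T

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
sigma = A @ A.conj().T
sigma /= np.real(np.trace(sigma))                   # a generic full-rank density matrix
print(np.allclose(L_sigma(sigma, sigma), np.eye(3)))   # True: L_sigma(sigma) = I
```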
Now, suppose by contradiction that 0 < σ ∈ F(A) is a CFS of ρ ∈ D(A), and σ ̸∈ ∂F(A).
This means that σ is in the interior of F(A), and in particular, there exists ε > 0 such that
Bε(σ) does not contain any resource state (i.e. Bε(σ) ⊂ F(A)). Moreover, since σ > 0 it
follows that for any σ′ ∈ D(A) (i.e. not necessarily free) and small enough |t|, where t ∈ R
can be negative, the state ω := (1 − t)σ + tσ′ ∈ Bε(σ) ⊂ F(A). Hence, for small enough |t|,
since σ is a CFS of ρ. The above expression is equivalent to f (t) ⩽ Tr[ρ log σ] , where
Since f (0) = Tr[ρ log σ] achieves the maximum value, we must have f ′ (0) = 0. Using (10.87)
we get
f′(0) = Tr[ρ L_σ(σ′ − σ)]
      = Tr[ρ (L_σ(σ′) − I)]        (10.92)
      = Tr[L_σ(ρ)σ′] − 1 .
Therefore, the condition that f ′ (0) = 0 implies that Tr [Lσ (ρ)σ ′ ] = 1 for all σ ′ ∈ D(A). This
means that Lσ (ρ) = I which is possible only if ρ = σ. But since we assume that ρ ̸∈ F(A)
we get a contradiction. This completes the proof.
2. Show that Lσ is a linear self-adjoint map. That is, show that for any η, ζ ∈ Herm(A)
[L_σ^{-1}(ζ)]_{xy} := ((p_x − p_y)/(log p_x − log p_y)) ζ_{xy}        ∀ x, y ∈ [m] , ∀ ζ ∈ Herm(A) .    (10.94)
The next theorem provides a formula for all the resource states that have the same CFS.
We will use the notation WITF (A) to denote the subset of Herm(A) that consists of all the
normalized resource witnesses of the QRT F. Explicitly,
WIT_F(A) := { η ∈ F(A)* : η ̸⩾ 0 , ∥η∥₁ = 1 } .    (10.95)
Note that we normalized the resource witnesses to have a unit trace norm since if η ∈
Herm(A) is a resource witness also aη with 0 < a ∈ R is a resource witness, and for our
purposes it will be sufficient to consider only one representative of the set {aη}a>0 .
where a_max is the largest positive number that satisfies a_max L_σ^{-1}(η) ⩽ σ.
Remark. The conditions Tr[ση] = 0 and a ⩽ a_max ensure that the state σ − a L_σ^{-1}(η) is a
density matrix. Indeed, the condition a ⩽ a_max ensures that it is positive semidefinite, and
its trace is one since the self-adjointness of L_σ^{-1} gives
Tr[L_σ^{-1}(η)] = Tr[L_σ^{-1}(I)η] = Tr[ση] = 0 .    (10.97)
Proof. From the supporting hyperplane theorem, (see Theorem A.6.4) it follows that for any
(fixed) σ ∈ ∂F(A) there exists an Hermitian matrix η ∈ Herm(A) such that
Moreover, since both σ and σ ′ are normalized, if η satisfies the equation above, also η + aI
satisfies it for any a ∈ R. We will therefore assume without loss of generality that Tr[ση] = 0
which means that Tr[σ ′ η] ⩾ 0 for all σ ′ ∈ F(A); i.e. η is a resource witness (observe that
the condition Tr[ση] = 0 implies that η ̸⩾ 0 since σ > 0). Note also that we can always
normalize η such that ∥η∥1 = 1. Quite often, such a resource witness that satisfies these
three conditions (i.e. Tr[ση] = 0, Tr[σ ′ η] ⩾ 0 for all σ ′ ∈ F(A), and ∥η∥1 = 1) is unique,
although for some special boundary points σ ∈ ∂F(A), there is a cone of such witnesses of
dimension greater than one (see Fig. 10.2).
Figure 10.2: A schematic diagram of free states (red) and resource states (green). Most points
on the boundary, like the points D and E, have a unique supporting hyperplane (which is also the
tangent plane). The point E is the closest free state of all the points on the vertical line from it.
Some of the points, like the points C and F, have more than one supporting hyperplane. The point
F is the closest free state of all the points in the shaded black area. Some points on the boundary,
like the points A and B, cannot be closest free states; for example, separable states of rank 1
(i.e. product states) are on the boundary of the set of separable states, but can never be the closest
separable states of any entangled state.
Let ρ be a resource state in D(A) for which σ is the closest free state. The main idea of
the proof is the observation that η ′ := I A − Lσ (ρ) is a resource witness. To see that, first
observe that
Tr[η′σ] = 1 − Tr[σ L_σ(ρ)]
        = 1 − Tr[L_σ(σ)ρ]        (L_σ is self-adjoint)       (10.99)
        = 1 − Tr[ρ] = 0 .        (L_σ(σ) = I^A)
Moreover, for every σ ′ ∈ F(A), define f (t) as in (10.91), but with non-negative t ∈ [0, 1]
(recall that here σ is a boundary point not in the interior of F(A), so that we can only
conclude that ω := (1 − t)σ + tσ ′ is a free state for non-negative t ∈ [0, 1]). Since σ is the
closest free state to ρ, we must have that f ′ (0) ⩽ 0 (we cannot conclude that the derivative
is zero since t cannot be negative). From (10.92) we get for all σ ′ ∈ F(A)
0 ⩽ −f′(0) = 1 − Tr[L_σ(ρ)σ′]
           = Tr[(I − L_σ(ρ))σ′]        (10.100)
           = Tr[η′σ′] .
Hence, η′ is a resource witness. We can then normalize it as η := (1/a)η′ with a > 0 such that
∥η∥₁ = 1. We then conclude from the definition η′ := I^A − L_σ(ρ) that
ρ = L_σ^{-1}(I^A − η′)
  = L_σ^{-1}(I^A − aη)          (η′ = aη)       (10.101)
  = σ − a L_σ^{-1}(η) .         (L_σ^{-1}(I) = σ)
Conversely, suppose ρ = σ −aL−1 σ (η) for some η ∈ WITF (A) and a > 0. We need to show
that D(ρ∥F) = D(ρ∥σ). For this purpose, let σ ′ ∈ F(A) be any free state, and observe that
D(ρ∥σ) ⩽ D(ρ∥σ ′ ) if and only if f (0) ⩾ f (1), where f (t) is defined in (10.91). From the joint
convexity of the relative entropy (and particularly its convexity in the second argument) it
follows that the function f (t) is concave (see Exercise 10.3.2). This means that if f ′ (0) ⩽ 0
then we must have f (0) ⩾ f (1) (Exercise 10.3.2). Now, note that from (10.92)
f′(0) = Tr[L_σ(ρ)σ′] − 1
      = Tr[L_σ(σ − a L_σ^{-1}(η))σ′] − 1        (ρ = σ − a L_σ^{-1}(η))       (10.102)
      = −a Tr[ησ′]
      ⩽ 0 .                                     (η is a resource witness)
Hence, f(0) ⩾ f(1), which is equivalent to D(ρ∥σ) ⩽ D(ρ∥σ′). Since σ′ was an arbitrary state
in F(A), this completes the proof.
The significance of the theorem above is that if for a given resource state ρ we have a
candidate σ that we believe to be a closest free state, then we can check it with the formula
in (10.96). Specifically, what needs to be checked is whether the matrix I − Lσ (ρ) is a
resource witness. We will see how this can be done when we compute the relative entropy
of entanglement on pure bipartite states. We also point out that the techniques used above
are not limited to the Umegaki relative entropy, and similar results can be obtained for the
α-Rényi relative entropy of a resource as defined in (10.71).
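As an illustration of this criterion, the following sketch (not from the book) uses the QRT of coherence, in which the free states are the density matrices that are diagonal in a fixed basis; for that theory the dephased state ∆(ρ) is known to be the closest free state under the Umegaki relative entropy, and the check below verifies that η′ := I − L_σ(ρ) with σ = ∆(ρ) indeed satisfies Tr[η′σ] = 0 and Tr[η′σ′] ⩾ 0 for every diagonal σ′ (the latter reduces to the diagonal of η′ being non-negative).

```python
# Illustrative closest-free-state check for the QRT of coherence (free = diagonal states).
import numpy as np

def cfs_witness_check(rho, atol=1e-10):
    """Check that eta' := I - L_sigma(rho), with sigma = Delta(rho), satisfies Tr[eta' sigma] = 0
    and Tr[eta' sigma'] >= 0 for every diagonal sigma' (i.e. the diagonal of eta' is non-negative)."""
    p = np.real(np.diag(rho))                      # sigma = Delta(rho) is diagonal with entries p
    P, Q = np.meshgrid(p, p, indexing="ij")
    with np.errstate(divide="ignore", invalid="ignore"):
        coeff = (np.log(P) - np.log(Q)) / (P - Q)  # divided differences of log at the entries of p
    equal = np.isclose(P, Q)
    coeff[equal] = 1.0 / P[equal]
    L_sigma_rho = coeff * rho                      # sigma is already diagonal in this basis
    d = np.real(np.diag(np.eye(len(p)) - L_sigma_rho))
    return abs(np.dot(p, d)) < atol and np.all(d >= -atol)

rng = np.random.default_rng(1)
B = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
rho = B @ B.conj().T
rho /= np.real(np.trace(rho))
print(cfs_witness_check(rho))                      # True: Delta(rho) passes the closest-free-state test
```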
Exercise 10.3.2. Let f (t) be the function defined in (10.91).
1. Show that f (t) is concave. Hint: Use the convexity of D(ρ∥σ) in σ (with fixed ρ).
2. Show that if f ′ (0) ⩽ 0 then f (0) ⩾ f (1).
Exercise 10.3.3. Let 0 < ρ ∈ D(A) be a full rank resource state (i.e. ρ ̸∈ F(A)). Show
that the closest free state to ρ is unique. Hint: Let σ ̸= σ ′ be two closest free states, define
tσ + (1 − t)σ ′ , and use the strict concavity of the function f (σ) = Tr[ρ log σ].
This quantity has a simple closed formula if the set of free states is affine (see Sec. 9.3) and,
in addition, satisfies for any α ∈ [0, 2]
σ^α / Tr[σ^α] ∈ F(A)        ∀ σ ∈ F(A) .    (10.104)
Tr[ρ^α σ^{1−α}] = Tr[ρ^α ∆(σ^{1−α})]
               = Tr[∆(ρ^α) σ^{1−α}]        (10.106)
               = ∥∆(ρ^α)∥_{1/α} Tr[γ^α σ^{1−α}] ,
where
γ := ∆(ρ^α)^{1/α} / Tr[∆(ρ^α)^{1/α}] .    (10.107)
The proof is concluded with the observation that for α ⩽ 1 we have Tr[γ^α σ^{1−α}] ⩽ 1 (Hölder's
inequality), and for α > 1 we have Tr[γ^α σ^{1−α}] ⩾ 1 (reverse Hölder inequality), where
equality holds in both cases for σ = γ. Therefore, σ = γ is the optimizer.
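A quick numerical sanity check of this optimizer (not from the book) can be done in the QRT of coherence, where the free states are diagonal and the conditions above hold. Assuming the α-relative entropy in (10.71) is based on the Petz divergence D_α(ρ∥σ) = (1/(α−1)) log Tr[ρ^α σ^{1−α}], the value at γ should never exceed a random search over diagonal free states.

```python
# Illustrative check that gamma of Eq. (10.107) minimizes the Petz divergence over diagonal free states.
import numpy as np
from scipy.linalg import fractional_matrix_power as mpow

def petz_renyi(rho, sigma, alpha):
    val = np.real(np.trace(mpow(rho, alpha) @ mpow(sigma, 1 - alpha)))
    return np.log2(val) / (alpha - 1)

rng = np.random.default_rng(2)
d, alpha = 3, 0.7
B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = B @ B.conj().T
rho /= np.real(np.trace(rho))

q = np.real(np.diag(mpow(rho, alpha))) ** (1 / alpha)    # Delta(rho^alpha)^(1/alpha)
gamma = np.diag(q / q.sum())                              # the claimed optimizer (10.107)

best = min(petz_renyi(rho, np.diag(rng.dirichlet(np.ones(d))), alpha) for _ in range(20000))
print(petz_renyi(rho, gamma, alpha), best)   # the value at gamma does not exceed the random-search minimum
```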
Exercise 10.3.4. Let α ∈ [0, 2]. Give a closed expression for Dα (ρ∥F) for the following
cases:
1. ρ ∈ D(A) and F(A) consists of a set of diagonal density matrices in some fixed basis.
3. ρ ∈ D(A) and F(A) consists of a set of symmetric density matrices (i.e. σ ∈ F(A) if
and only if σ = σ T where the transpose is taken in some fixed basis).
4. G is a unitary group, ρ ∈ D(A), and F(A) consists of the set of G-invariant states
(i.e. σ ∈ F(A) if and only if σ = U σU ∗ for all U ∈ G).
Exercise 10.3.5. Show that the expression given in (10.105) for the α-relative entropy of a
resource can be rewritten as
Dα (ρ∥F) = H1/α ∆(ρα ) − Hα (ρ) (10.108)
This lack of smoothness in the logarithmic robustness can be traced back to the discontinuity
present in the max relative entropy. In contrast to the Umegaki relative entropy, Dmax does
not exhibit asymptotic continuity. In fact, more broadly, it is not continuous with respect to
its first argument. For example, take
σ = ½|0⟩⟨0| + ½|1⟩⟨1|    and    ρ_ε = (½ − ε)|0⟩⟨0| + ½|1⟩⟨1| + ε|2⟩⟨2| .    (10.111)
For these choices we have D(ρ_ε∥σ) = ∞ for all ε ∈ (0, 1/2] whereas D(ρ_{ε=0}∥σ) = 0. On the
other hand, in the laboratory, the preparation of a physical system in a state ρ always results
in some error, so that the intended state ρ differs (in trace distance) from the prepared state
by some small ε > 0. Therefore, discontinuous resource measures are unlikely to have
practical physical significance unless some smoothing procedure has been applied to them. In
the following definition we provide a simple method to smooth a resource measure.
Remark. In the definition we employed the notation Bε(ρ) := {ρ′ ∈ D(A) : ½∥ρ′ − ρ∥₁ ⩽ ε}
to denote the “ball” of states ρ′ that are ε-close to ρ in trace distance. The rationale for
choosing the minimum over the ball Bε (ρ) stems from the intention to identify the minimum
amount of resource present within this ball. This approach ensures that the value Mε (ρ)
represents the minimum guaranteed resource level in the system, even when our knowledge
is limited to the state of the system being ε-close to ρ. Essentially, this method accounts
for uncertainty in the system’s state by considering the least amount of resource that can
be confidently ascribed to states within an ε-radius of ρ. This approach is both cautious
and practical, as it provides a conservative estimate of the resource quantity in the practical
situations where exact state information is not available.
= inf_{σ∈F(A)} D^ε(ρ∥σ) ,
where D^ε is defined as
D^ε(ρ∥σ) := min_{ρ′∈Bε(ρ)} D(ρ′∥σ) .    (10.115)
The quantity Dε is called the ε-smoothed version of the quantum divergence D. Smoothed di-
vergences play key roles in QRTs and in the next theorem we prove some useful relationships
among some of them.
Exercise 10.4.1. Let D be a quantum divergence and ε > 0. Show that Dε is itself a quantum
divergence.
We denote the ε-smoothed version of Dmax by D_max^ε. For Dmin, the notation D_min^ε already
signifies the quantum hypothesis testing divergence (see Sec. 8.7.1). This aligns with the
notion that the quantum hypothesis testing divergence is a smoothed version of Dmin (refer
to Exercise 8.7.4). Additionally, when smoothing Dmin in the form min_{ρ′∈Bε(ρ)} Dmin(ρ′∥σ),
the result is always zero. This is because for any ε > 0 and ρ ∈ D(A), there is a ρ′ ∈ D(A)
that is ε-close to ρ with ρ′ > 0, making Dmin(ρ′∥σ) = 0. Henceforth, D_min^ε will exclusively
represent the quantum hypothesis testing divergence in this book.
In the forthcoming theorem, we will establish specific inequalities that involve D_min^ε and
D_max^ε. These inequalities are crucial and will play a significant role in the subsequent dis-
cussions and analyses. The relationships between D_min^ε and D_max^ε are fundamental in un-
derstanding various aspects of quantum resources, both in the single-shot regime and in
the asymptotic domain. Furthermore, later on we will use some of these relationships to
provide operational interpretations of both D_min^ε and D_max^ε.
ρ̃ = ρ + ε1(ω_+ − ω_−) ,    (10.119)
where without loss of generality we assumed the equality ½∥ρ̃ − ρ∥₁ = ε1. Denote r := D_max^{ε1}(ρ∥σ),
so that ρ̃ ⩽ 2^r σ. Combining this with the inequality ρ ⩽ ρ̃ + ε1 ω_− (which follows
from the equation above) gives
ρ ⩽ 2^r σ + ε1 ω_− .    (10.120)
Hence, Λ ∈ Eff(A). Finally, denoting ρ′ := GρG*/Tr[Λρ] (so that ρ′ ∈ D(A)), the condition
tρ ⩽ σ + ω implies that t Tr[Λρ] ρ′ ⩽ σ, so that
Dmax(ρ′∥σ) ⩽ −log( t Tr[Λρ] ) .    (10.130)
Next, we estimate Tr[Λρ]:
Tr[Λρ] = 1 − Tr[(I − Λ)ρ]
       ⩾ 1 − (1/t) Tr[(I − Λ)(σ + ω)]        (tρ ⩽ σ + ω)
       = 1 − Tr[ω]/t                          (Tr[Λ(σ + ω)] = 1)       (10.131)
       ⩾ ε ,                                  (by (10.127))
where we used the expression for Λ in (10.129) to get that Tr [Λ(σ + ω)] = 1. Combining
the two equations above gives Dmax(ρ′∥σ) ⩽ −log(tε). To estimate t, we use the fact that
Tr[ω] ⩾ 0 to get from the relation in (10.126) that 2^{−D_min^ε(ρ∥σ)} ⩽ (1 − ε)t. Substituting this
lower bound on t into the inequality Dmax(ρ′∥σ) ⩽ −log(tε) gives
Dmax(ρ′∥σ) ⩽ −log( (ε/(1 − ε)) 2^{−D_min^ε(ρ∥σ)} )
           = D_min^ε(ρ∥σ) − log( ε/(1 − ε) ) .    (10.132)
It is therefore left to show that ρ′ is √(1 − ε²)-close to ρ.
Let |ψ^{AÃ}⟩ := (√ρ ⊗ I)|Ω^{AÃ}⟩ and |ψ̃^{AÃ}⟩ := (G ⊗ I)|ψ^{AÃ}⟩. Observe that ψ^{AÃ} and ψ̃^{AÃ} are
purifications of ρ and ρ̃ := GρG*, respectively. Moreover, observe that |ψ′^{AÃ}⟩ := (1/√Tr[ρ̃]) |ψ̃^{AÃ}⟩
is a purification of ρ′. From Uhlmann's theorem the fidelity between ρ and ρ′ satisfies:
F(ρ, ρ′) ⩾ |⟨ψ′^{AÃ}|ψ^{AÃ}⟩|
         ⩾ |⟨ψ̃^{AÃ}|ψ^{AÃ}⟩|                                 (Tr[ρ̃] ⩽ 1)
         ⩾ ½( ⟨ψ̃^{AÃ}|ψ^{AÃ}⟩ + ⟨ψ^{AÃ}|ψ̃^{AÃ}⟩ )           (real part)       (10.133)
         = ⟨ψ^{AÃ}| P ⊗ I^{Ã} |ψ^{AÃ}⟩                        (P := ½(G + G*))
         = Tr[ρP] .
Combining this with the fact that P ⩽ I A (see Exercise 2.3.14) we obtain
F(ρ, ρ′) ⩾ Tr[ρP] = 1 − Tr[ρ(I − P)]
         ⩾ 1 − (1/t) Tr[(σ + ω)(I − P)]                               (tρ ⩽ σ + ω)
         = 1 − 1/t − Tr[ω]/t + (1/t) Tr[σ^{1/2}(σ + ω)^{1/2}]         (by the definition of P)       (10.134)
         ⩾ 1 − Tr[ω]/t                                                ((σ + ω)^{1/2} ⩾ σ^{1/2})
         ⩾ ε .                                                        (by (10.127))
Therefore, from the relation (5.202) between the trace distance and the fidelity we get
½∥ρ − ρ′∥₁ ⩽ √(1 − ε²). This completes the proof.
The relation between the smoothed max and min relative entropies can be used to obtain a
generalized version of the AEP property.
Corollary 10.4.1. For any ρ, σ ∈ D(A) with supp(ρ) ⊆ supp(σ), and any ε ∈ (0, 1),
lim_{n→∞} (1/n) D_max^ε(ρ^{⊗n}∥σ^{⊗n}) = D(ρ∥σ) .    (10.135)
Proof. From (10.116) we get that for any ε1, ε2 ∈ (0, 1) with ε1 + ε2 < 1
lim inf_{n→∞} (1/n) D_max^{ε1}(ρ^{⊗n}∥σ^{⊗n}) ⩾ lim inf_{n→∞} [ (1/n) D_min^{ε2}(ρ^{⊗n}∥σ^{⊗n}) + (1/n) log(1 − ε1 − ε2) ]
    = lim inf_{n→∞} (1/n) D_min^{ε2}(ρ^{⊗n}∥σ^{⊗n})        (10.136)
    = D(ρ∥σ) .                                              (by (8.211))
Conversely, from (10.118), after exchanging the roles of δ := √(1 − ε²) and ε, we get for
every ε ∈ (0, 1)
lim sup_{n→∞} (1/n) D_max^ε(ρ^{⊗n}∥σ^{⊗n}) ⩽ lim sup_{n→∞} [ (1/n) D_min^δ(ρ^{⊗n}∥σ^{⊗n}) − (1/n) log(δ/(1 − δ)) ]
    = lim sup_{n→∞} (1/n) D_min^δ(ρ^{⊗n}∥σ^{⊗n})           (10.137)
    = D(ρ∥σ) .                                              (by (8.211))
From the two equations above it follows that (10.135) must hold.
The technique applied in the aforementioned theorem, especially in the proof of (10.118),
is also applicable for upper-bounding the smoothed max relative entropy. This can be
achieved by smoothing the second argument of Dmax . For further details, please refer to
Appendix D.3.
Exercise 10.4.2. Let ρ ∈ D(AB) and ε ∈ (0, 1). The smoothed version of Hmin and Hmax
(see Definition 7.5.1) are defined, respectively, as
H_min^ε(A|B)_ρ := max_{ρ′∈Bε(ρ)} H_min(A|B)_{ρ′}    and    H_max^ε(A|B)_ρ := min_{ρ′∈Bε(ρ)} H_max(A|B)_{ρ′} .    (10.138)
1. Show that
lim_{n→∞} (1/n) H_min^ε(A^n|B^n)_{ρ^{⊗n}} = H(A|B)_ρ .    (10.139)
Hint: Use Corollary 10.4.1.
2. Show that
lim_{n→∞} (1/n) H_max^ε(A^n|B^n)_{ρ^{⊗n}} = H(A|B)_ρ .    (10.140)
Hint: Use the duality relation between H_min(A|B) and H_max(A|B).
Observe that this definition is consistent with the definition of a smoothed relative entropy.
Specifically, if H is related to a quantum divergence D as H(ρ) = log |A| − D(ρ∥u) then Hε
as defined above is related to the smooth relative entropy Dε as Hε (ρ) = log |A| − Dε (ρ∥u).
Theorem 10.4.2. Let H be a quantum entropy and let ε ∈ [0, 1). Then, the
ε-smoothed version of H is given by
H^ε(ρ) = H(p^{(ε)})        ∀ ρ ∈ D(A) ,    (10.142)
Proof. Let ρ′ be an optimal quantum state in Bε (ρ) such that Hε (ρ) = H(ρ′ ). We first argue
that without loss of generality we can assume that ρ′ commutes with ρ. To see this, let
∆ ∈ CPTP(A → A) be the completely dephasing channel in the eigenbasis of ρ. Then, since
∆ is a doubly stochastic channel we have H (∆(ρ′ )) ⩾ H (ρ′ ). Moreover, since ∆(ρ) = ρ we
have
½∥∆(ρ′) − ρ∥₁ = ½∥∆(ρ′) − ∆(ρ)∥₁
              ⩽ ½∥ρ′ − ρ∥₁ ⩽ ε .        (DPI)       (10.143)
Hence, ∆(ρ′) is also ε-close to ρ, so that ∆(ρ′) is also an optimizer of (10.141). Hence,
without loss of generality we can assume that ρ′ is diagonal in the same eigenbasis of ρ.
Let p = p↓ be the vector consisting of the eigenvalues of ρ. From the argument above
we can express the smoothed entropy in (10.141) as
Now, since Bε (p) has the property that for every p′ ∈ Bε (p) the vector p(ε) as defined
in (4.76) satisfies p′ ≻ p(ε) so that H(p′ ) ⩽ H(p(ε) ). Therefore, the choice p′ = p(ε) gives
the maximum value.
As a simple example, consider the min-entropy as defined in (6.22); i.e., H_min(ρ) =
−log∥ρ∥∞ for all ρ ∈ D(A). Note that this entropy is related to the max relative entropy
via H_min(ρ) = log|A| − Dmax(ρ∥u). From the theorem above, H_min^ε(ρ) = −log∥p^{(ε)}∥∞.
Using the definition of p^{(ε)} in (4.76) we get from (10.142) that
H_min^ε(ρ) = −log(a)
           = log( k / (∥p∥_{(k)} − ε) ) ,        (by (4.81))       (10.145)
where k is the integer satisfying (4.82), which is equivalent to
p_{k+1} < (∥p∥_{(k)} − ε)/k ⩽ p_k .    (10.146)
Alternatively, observe that from (4.87) we can also express H_min^ε(ρ) as
H_min^ε(ρ) = −log max_{ℓ∈[n]} (∥p∥_{(ℓ)} − ε)/ℓ .    (10.147)
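The formula (10.147) is straightforward to evaluate numerically; the following small sketch (not from the book) computes H_min^ε from the eigenvalues of a state, assuming ε < 1 so that the maximum in (10.147) is positive.

```python
# Illustrative implementation of Eq. (10.147): H_min^eps(rho) = -log2 max_l (||p||_(l) - eps)/l,
# where ||p||_(l) is the sum of the l largest eigenvalues of rho.
import numpy as np

def smoothed_hmin(p, eps):
    p = np.sort(np.asarray(p, dtype=float))[::-1]
    vals = (np.cumsum(p) - eps) / np.arange(1, len(p) + 1)   # (||p||_(l) - eps)/l for each l
    return -np.log2(vals.max())

p = [0.5, 0.25, 0.15, 0.1]
print(smoothed_hmin(p, 0.0))    # 1.0, i.e. -log2(max p): no smoothing recovers H_min
print(smoothed_hmin(p, 0.1))    # ~1.32: smoothing increases the min-entropy
```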
It is worth noting that for the case H = Hmax (recall that Hmax(A)_ρ := log Rank(ρ)
for all ρ ∈ D(A)), the definition in (10.141) results in a quantity that always equals log|A|,
since for any ε ∈ (0, 1) and any ρ ∈ D(A) there always exists a full-rank state ρ′ ∈ D(A)
that is ε-close to ρ. Therefore, in this case, instead of taking the maximum in (10.141) we
take the minimum, so that the smoothed version of Hmax is defined as
H_max^ε(A)_ρ := min_{ρ′∈Bε(ρ)} H_max(A)_{ρ′} .    (10.148)
Lemma 10.4.2. Let ε ∈ [0, 1) and ρ ∈ D(A). Then, the ε-smoothed max-entropy is
given by
H_max^ε(A)_ρ = log(m) ,    (10.149)
where m is the integer satisfying ∥ρ∥_{(m−1)} < 1 − ε ⩽ ∥ρ∥_{(m)}.
where we used the notation D_m(A) to denote the set of all density matrices in D(A) whose
rank is not greater than m. In Theorem 5.4.3 we showed that T(ρ, D_m(A)) = 1 − ∥ρ∥_{(m)}.
Substituting this into the equation above we conclude that
H_max^ε(A)_ρ = min{ log m : ∥ρ∥_{(m)} ⩾ 1 − ε } .    (10.151)
This completes the proof.
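As with the min-entropy, this characterization is easy to evaluate; the sketch below (not from the book) computes H_max^ε directly from (10.151).

```python
# Illustrative implementation of Eq. (10.151): H_max^eps = log2(m) for the smallest m with ||p||_(m) >= 1 - eps.
import numpy as np

def smoothed_hmax(p, eps):
    p = np.sort(np.asarray(p, dtype=float))[::-1]
    m = int(np.searchsorted(np.cumsum(p), 1 - eps - 1e-12)) + 1
    return np.log2(m)

p = [0.5, 0.25, 0.15, 0.1]
print(smoothed_hmax(p, 0.0), smoothed_hmax(p, 0.25))   # log2(4) = 2 and log2(2) = 1
```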
It’s crucial to emphasize, however, that this quantity is not necessarily a divergence, given the
optimization uses a maximum instead of a minimum (thus, Lemma 10.4.1 isn’t applicable).
Nevertheless, in this book, we will encounter this function in some applications.
Exercise 10.4.4. Show that for every ρ ∈ D(A) we have
H_max^ε(A)_ρ = log|A| − D_min^{(ε)}(ρ∥u) .    (10.155)
Proof. We first prove the theorem for the classical case. For every ℓ ∈ [m] denote by Probℓ (m)
the set of probability vectors in Prob(m) with at most ℓ non-zero components. Then, by
definition,
D_min^{(ε)}(p∥u^{(m)}) = max_{q∈Bε(p)} D_min(q∥u^{(m)})
    = −min_{q∈Bε(p)} log( |supp(q)| / m )                              (10.157)
    = log m − log min{ ℓ : T(p, Prob_ℓ(m)) ⩽ ε }        (ℓ := |supp(q)|)
    = log m − log min{ ℓ : ∥p∥_{(ℓ)} ⩾ 1 − ε } .        (Theorem 5.4.3)
Hence,
D_min^{(ε)}(p∥u^{(m)}) = log(m/ℓ) ,    (10.158)
where ℓ ∈ [m] is the smallest integer satisfying ∥p∥_{(ℓ)} ⩾ 1 − ε. The above expression coincides
with the lower bound in (8.148) for the case q = u^{(m)}. Hence, D_min^ε(p∥u^{(m)}) ⩾ D_min^{(ε)}(p∥u^{(m)}).
For the case that q has positive rational components (as given in (4.133)) we use Theo-
rem 4.3.2, particularly the relation (p, q) ∼ (r, u(k) ), where r is defined in (4.134) to get
D_min^ε(p∥q) = D_min^ε(r∥u^{(k)})
    ⩾ D_min^{(ε)}(r∥u^{(k)})                          (10.159)
    = max_{r′∈Bε(r)} D_min(r′∥u^{(k)}) .
It is crucial to observe that since D_min^{(ε)} is not a divergence we cannot conclude that
D_min^{(ε)}(r∥u^{(k)}) = D_min^{(ε)}(p∥q). Instead, let C be the set of all vectors r′ ∈ Prob(k) that satisfy
(r′, u^{(k)}) ∼ (p′, q) for some p′ ∈ Bε(p), and observe that C ⊂ Bε(r). Combining this with the
equation above gives
D_min^ε(p∥q) ⩾ max_{r′∈C} D_min(r′∥u^{(k)})
    = max_{p′∈Bε(p)} D_min(p′∥q)                      (10.160)
    = D_min^{(ε)}(p∥q) .
Finally, the case of arbitrary q ∈ Prob(m) follows from the continuity of both D_min^ε and
D_min^{(ε)} in their second argument. This completes the proof for the classical case.
For the quantum case we get from (8.191) that
D_min^ε(ρ∥σ) = sup_{E∈CPTP(A→X)} D_min^ε( E(ρ) ∥ E(σ) )
    ⩾ sup_{E∈CPTP(A→X)} D_min^{(ε)}( E(ρ) ∥ E(σ) )        (from the classical case)
where the suprema are over all classical systems X and POVM channels E ∈ CPTP(A → X)
that take ρ and σ to diagonal density matrices (i.e. probability vectors).
Exercise 10.4.5. Let ρ, σ ∈ D(A) and ε ∈ (0, 1). Show that
D_min^{(ε)}(ρ∥σ) ⩾ D_min( √Λ ρ √Λ / Tr[Λρ] ∥ σ ) ,    (10.162)
for any Λ ∈ Eff(A) that satisfies Tr[Λρ] ⩾ 1 − ε². Hint: Use the gentle measurement lemma
(Lemma 5.4.3).
In the quantum Stein's lemma (Theorem 8.7.3) we saw that the regularization of D_min^ε
yields the Umegaki relative entropy. We now show that the same holds also for D_min^{(ε)}. We
will use this result later on when we discuss the uniqueness of the Umegaki relative entropy.
Theorem 10.4.4. Let ε ∈ (0, 1), and ρ, σ ∈ D(A) with supp(ρ) ⊆ supp(σ). Then,
lim_{n→∞} (1/n) D_min^{(ε)}(ρ^{⊗n}∥σ^{⊗n}) = D(ρ∥σ) .    (10.163)
lim sup_{n→∞} (1/n) D_min^{(ε)}(ρ^{⊗n}∥σ^{⊗n}) ⩽ lim sup_{n→∞} (1/n) D_min^ε(ρ^{⊗n}∥σ^{⊗n})
    = D(ρ∥σ) .        (Theorem 8.7.3)       (10.164)
In order to prove the opposite inequality, we make use of the method of the relative typical
subspace introduced in Sec. 8.3.2. Set ε, δ ∈ (0, 1), let Π_δ^{rel,n} be the projection to the
relative typical subspace given in (8.75), let P_δ^n be the projection to the δ-typical subspace
associated with ρ, and define
ρ_n := Π_δ^{rel,n} P_δ^n ρ^{⊗n} P_δ^n Π_δ^{rel,n} / Tr[ Π_δ^{rel,n} P_δ^n ρ^{⊗n} ] .    (10.165)
… ⩾ 1 − δ1 − δ2 ,        ((8.76), (8.55))
Due to the first property, it follows by the definition (8.75) of the relative typical subspace that
Tr[Π_{ρ_n} σ^{⊗n}] ⩽ 2^{n(Tr[ρ log σ]+δ)} Tr[Π_{ρ_n}]. Combining this with the above equation we conclude
that for sufficiently large n
D_min^{(ε)}(ρ^{⊗n}∥σ^{⊗n}) ⩾ −log( 2^{n(Tr[ρ log σ]+δ)} Tr[Π_{ρ_n}] )
As a simple application of the result above, consider the smoothed max-entropy as defined
in (10.148). Then, from Exercise 10.4.4 and the theorem above we get the following version
of the AEP: for all ε ∈ (0, 1) and all ρ ∈ D(A)
lim_{n→∞} (1/n) H_max^ε(A^n)_{ρ^{⊗n}} = H(A)_ρ ,    (10.171)
where H(A)_ρ is the von Neumann entropy of ρ.
Exercise 10.4.6. Prove this AEP version, and compare it with (10.140) for the case |B| = 1.
∫_{U(A)} dU^A ∥ E^{A→B}(U^A ρ^{AE} U^{A*}) − τ^B ⊗ ρ^E ∥₁ ⩽ 2^{−½(H_min^ε(A|E)_ρ + H_min^ε(A|B)_τ)} + 8ε ,    (10.172)
where U(A) is the group of all unitary matrices acting on A, and ∫_{U(A)} dU denotes
the integral over the Haar measure on U(A).
Proof. This corollary concerns the replacement of the terms involving H₂ in the decoupling
theorem with the smoothed min-entropy. For this purpose, let ρ̃^{AE} and τ̃^{AB} be such
that H_min^ε(A|E)_ρ = H_min(A|E)_{ρ̃} and H_min^ε(A|B)_τ = H_min(A|B)_{τ̃}. Note also that by definition
∥ρ^{AE} − ρ̃^{AE}∥₁ ⩽ 2ε and ∥τ^{AB} − τ̃^{AB}∥₁ ⩽ 2ε. Denoting by Ẽ the CP map whose Choi matrix
is τ̃^{AB}, we get
2^{−½(H_min^ε(A|E)_ρ + H_min^ε(A|B)_τ)} = 2^{−½(H_min(A|E)_{ρ̃} + H_min(A|B)_{τ̃})}
    ⩾ 2^{−½(H₂(A|E)_{ρ̃} + H₂(A|B)_{τ̃})}                                                     (H_min ⩽ H₂)
    ⩾ ∫_{U(A)} dU^A ∥ Ẽ^{A→B}(U^A ρ̃^{AE} U^{A*}) − τ̃^B ⊗ ρ̃^E ∥₁                            (Theorem 7.7.1)       (10.173)
    ⩾ ∫_{U(A)} dU^A ∥ Ẽ^{A→B}(U^A ρ̃^{AE} U^{A*}) − τ^B ⊗ ρ^E ∥₁ − 4ε ,                       (see (10.174) below)
where in the last inequality we used the fact that η := Ẽ (U ρ̃U ∗ ) ∈ Pos(BE) satisfies
Ẽ(Uρ̃U*) = E(UρU*) + E(U(ρ̃ − ρ)U*) + (Ẽ − E)(Uρ̃U*) .    (10.175)
∥ Ẽ(Uρ̃U*) − τ ⊗ ρ ∥₁ ⩾ ∥ E(UρU*) − τ ⊗ ρ ∥₁ − ∥ E(U(ρ̃ − ρ)U*) ∥₁ − ∥ (Ẽ − E)(Uρ̃U*) ∥₁ .    (10.176)
It is therefore left to bound the average of the last two terms over the group U(A). Denote
by η± := (ρ̃AE − ρAE )± and ζ± := (τ̃ AB − τ AB )± . Since ρ̃ and τ̃ are ε-close to ρ and τ ,
respectively, we have Tr[η+ + η− ] ⩽ 2ε and Tr[ζ+ + ζ− ] ⩽ 2ε. Now, denote by N± the
CP maps whose Choi matrices are ζ± , respectively. We then have Ẽ − E = N+ − N− and
ρ̃ − ρ = η+ − η− , so that
∫_{U(A)} dU ∥ E(U(ρ̃ − ρ)U*) ∥₁ = ∫_{U(A)} dU ∥ E(U(η_+ − η_−)U*) ∥₁
    ⩽ ∫_{U(A)} dU Tr[E(Uη_+U*)] + ∫_{U(A)} dU Tr[E(Uη_−U*)]            (triangle inequality)
    = Tr[E(u)] Tr[η_+] + Tr[E(u)] Tr[η_−]                               (∫_{U(A)} dU U^A η_±^{AE} U^{A*} = u^A ⊗ η_±^E)
    = Tr[τ] ( Tr[η_+] + Tr[η_−] )
    ⩽ 2ε .    (10.177)
Similarly,
∫_{U(A)} dU ∥ (Ẽ − E)(UρU*) ∥₁ = ∫_{U(A)} dU ∥ (N_+ − N_−)(UρU*) ∥₁
    ⩽ ∫_{U(A)} dU Tr[N_+(UρU*)] + ∫_{U(A)} dU Tr[N_−(UρU*)]
    = Tr[N_+(u^A) ⊗ ρ^E] + Tr[N_−(u^A) ⊗ ρ^E]
    = Tr[ζ_+ + ζ_−] Tr[ρ^E]
    ⩽ 2ε .    (10.178)
Combining everything we get
2^{−½(H_min^ε(A|E)_ρ + H_min^ε(A|B)_τ)} ⩾ ∫_{U(A)} dU ∥ E(UρU*) − τ ⊗ ρ ∥₁ − 8ε .    (10.179)
G_η(N^{A→A′}(ρ^A)) = sup_{M∈F(A′→B)} Tr[η^B M^{A′→B}(N^{A→A′}(ρ^A))] − c_η
    ⩽ sup_{E∈F(A→B)} Tr[η^B E^{A→B}(ρ^A)] − c_η        (replacing M ∘ N with E)       (10.181)
    = G_η(ρ^A) .
sup_{E∈F(A→B)} Tr[η^B E^{A→B}(σ^A)] = sup_{ω∈F(B)} Tr[η^B ω^B] ,    (10.182)
where ω^B = E^{A→B}(σ^A) ∈ F(B) can be taken to be any free state (by choosing E to
be a replacement channel in F(A → B) that outputs ω^B). Hence, G_η(σ^A) = 0 for all
σ ∈ F(A).
instrument. Then, observe that for any free channel M ∈ F(A′X → B) we have
M^{A′X→B} ∘ N^{A→A′X} = Σ_{x∈[m]} M_x^{A′→B} ∘ N_x^{A→A′} ,    (10.183)
where for every x ∈ [m]
M_x^{A′→B}(ω^{A′}) := M^{A′X→B}(ω^{A′} ⊗ |x⟩⟨x|^X)        ∀ ω ∈ L(A′) .    (10.184)
where σ_x^{A′} := (1/p_x) N_x^{A→A′}(ρ^A) and p_x := Tr[N_x^{A→A′}(ρ^A)]. Combining this with the
On the other hand, since the partial trace is a free operation we get that
G_η(ρ^A) ⩽ G_η(ρ^{AX}) .    (10.188)
Recall that the combination of the monotonicity and normalization properties ensures that
G_η(ρ) ⩾ 0 for all density matrices. Additionally, if we define C_ρ := {E(ρ) : E ∈ F(A → B)},
then the support function of C_ρ in the space of Hermitian matrices Herm(B) is described by:
As we will explore later, this family of resource monotones is complete, meaning that it can
be utilized to fully determine exact interconversions among resources. Furthermore, these
monotones are formulated as conic linear programming problems, and in some QRTs they
reduce to semidefinite programs, which are comparatively simpler to compute.
F(AB) = {u^A ⊗ σ^B : σ ∈ D(B)} .    (10.191)
Therefore, for this QRT, for any η ∈ D(AB′) the coefficient c_η is given by
c_η = sup_{σ∈D(B′)} Tr[η^{AB′}(u^A ⊗ σ^{B′})] = (1/|A|) ∥η^{B′}∥∞ .    (10.192)
Hence, for every η ∈ D(AB′) the function G_η as defined above can be expressed as
G_η(A|B)_ρ := sup_{N∈CMO(AB→AB′)} Tr[η^{AB′} N^{AB→AB′}(ρ^{AB})] − (1/|A|) ∥η^{B′}∥∞ .    (10.193)
We denote by f_η the first term on the right-hand side above. In terms of the Choi matrix of
N, this function can be expressed as
f_η(A|B)_ρ = sup_{J^{ABÃB′}} Tr[ J^{ABÃB′}( ρ^{AB} ⊗ η^{ÃB′} ) ] ,    (10.194)
Exercise 10.5.1. Use the strong duality relation of an SDP to show that the function f_η
can also be expressed as:
f_η(A|B)_ρ = |A| inf_{ξ^{ABÃB′}⩾0} { Tr[ξ^{B′}] : u^{AÃB′} ⊗ ξ^B + Υ(ξ^{ABÃB′}) ⩾ ρ^{AB} ⊗ η^{ÃB′} } .    (10.196)
Exercise 10.5.2. Show that if |A| = |B| then for the maximally entangled state ρ^{AB} = Φ^{AB},
G_η as defined in (10.193) is given by
G_η(A|B)_Φ = ∥η^{AB′}∥∞ − (1/|A|) ∥η^{B′}∥∞ .    (10.197)
Hint: Recall that for all states ρ ∈ D(AB) we have Φ^{AB} ≻_A ρ^{AB}.
Manipulation of Resources
One of the central goals of QRTs is to understand optimal and efficient ways to convert one
resource to another. A resource in this context corresponds to a class of equivalent resource
states. We say that two resource states ρ, σ ∈ D(A) are equivalent if both ρ −F→ σ (i.e. ρ can
be converted to σ by free operations) and σ −F→ ρ. In this chapter we study the conversion of
resources in two regimes: the single-shot regime and the asymptotic regime.
Theorem 11.1.1. Let F be a closed convex QRT, ρ ∈ D(A), and σ ∈ D(B). The
following are equivalent:
1. There exists N ∈ F(A → B) such that σ B = N A→B ρA .
Proof. Let C_ρ := {E^{A→B}(ρ^A) : E ∈ F(A → B)}. Observe that C_ρ is a convex set in Herm(B).
From the hyperplane separation theorem (see Theorem A.1.1), σ ̸∈ C_ρ if and only if there
exists a hyperplane η ∈ Herm(B) that separates them; that is,
Tr[η^B σ^B] > max_{ω∈C_ρ} Tr[η^B ω^B] .    (11.2)
Tr[η^B σ^B] ⩽ max_{ω∈C_ρ} Tr[η^B ω^B]
            = max_{E∈F(A→B)} Tr[η^B E^{A→B}(ρ^A)] .    (11.3)
Note that if the equation above holds for some η ∈ Herm(B) then it also holds if we replace
η B with η B + cI B and vice versa (here c is any real number). Therefore, the equation above
holds for all η ∈ Herm(B) if and only if it holds for all η ∈ Pos(B). Similarly, by dividing
both sides of the equation above by Tr[η] we conclude that σ ∈ Cρ if and only if (11.3) holds
for all density matrices η ∈ D(B).
Now, observe that σ ∈ C_ρ if and only if for all M ∈ F(B → B) we have that M(σ) ∈ C_ρ.
To see this, suppose σ ∈ C_ρ, so that σ^B = E^{A→B}(ρ^A) for some E ∈ F(A → B). Then,
M^{B→B}(σ^B) = M^{B→B} ∘ E^{A→B}(ρ^A), and since M ∘ E ∈ F(A → B) we conclude that M(σ) ∈
C_ρ. Conversely, if M(σ) ∈ C_ρ for all M ∈ F(B → B), then by taking the identity channel
M = id^B ∈ F(B → B) we get immediately that σ ∈ C_ρ.
Finally, from (11.3) we get that for any M ∈ F(B → B) we have M(σ) ∈ C_ρ if and only
if for all η ∈ D(B)
Tr[η^B M^{B→B}(σ^B)] ⩽ max_{E∈F(A→B)} Tr[η^B E^{A→B}(ρ^A)] .
Hence, σ ∈ C_ρ if and only if the above equation holds for all M ∈ F(B → B). Taking
the maximum over all such M ∈ F(B → B) we conclude that σ ∈ C_ρ if and only if for all
η ∈ D(B)
max_{M∈F(B→B)} Tr[η^B M^{B→B}(σ^B)] ⩽ max_{E∈F(A→B)} Tr[η^B E^{A→B}(ρ^A)] .    (11.5)
The proof is concluded by recognizing that the above inequality is equivalent to (11.1).
In general, the theorem above does not provide an efficient way to determine if one
resource can be converted to another by free operations. This is the case even if the resource
monotones Gη themselves can be computed efficiently, as we need to check the conditions
for all η ∈ D(B). Therefore, instead, we can use (11.3) to conclude that ρA can be converted
to σ B by free operations if and only if
min_{η∈D(B)} max_{E∈F(A→B)} Tr[ η^B ( E^{A→B}(ρ^A) − σ^B ) ] ⩾ 0 .    (11.6)
For some QRTs, the optimization problem above is an SDP and therefore can be solved
efficiently.
Theorem 11.1.2. Let σ ∈ D(B) be such that σ ̸∈ F(B) (i.e. σ is a resource state).
Then, the function fσ : D(A) → [0, 1], defined via
f_σ(ρ) := Pr(ρ −F→ σ)        ∀ ρ ∈ D(A) ,    (11.10)
Proof. First observe that from the axiom of free instruments and the fact that σ is a resource
state, we must have fσ (ρ) = 0 for all ρ ∈ F(A). Next, we show that fσ is a resource measure.
Let N ∈ F(A → C) be a free channel. Let M ∈ F⩽ (C → B) be an optimal free instrument
satisfying
Pr(N(ρ) −F→ σ) = Tr[M(N(ρ))] .    (11.11)
Define E := M ∘ N, and observe that E ∈ F_⩽(A → B) and Pr(N(ρ) −F→ σ) = Tr[E(ρ)].
Hence, from the definition of Pr(ρ −F→ σ) in (11.9) we get
Pr(ρ −F→ σ) ⩾ Pr(N(ρ) −F→ σ) .    (11.12)
By definition, this is equivalent to fσ (ρ) ⩾ fσ N (ρ) . We therefore established that fσ is a
resource measure.
To prove strong monotonicity, let N ∈ F(A → CY) and denote
τ^{CY} := N^{A→CY}(ρ^A) = Σ_{y∈[n]} t_y τ_y^C ⊗ |y⟩⟨y|^Y ,    (11.13)
where each τ_y ∈ D(C) and {t_y}_{y∈[n]} is a probability distribution. From the monotonicity of
f_σ under free channels (in particular, under N) we get
f_σ(ρ^A) ⩾ f_σ(τ^{CY}) = Pr( Σ_{y∈[n]} t_y τ_y^C ⊗ |y⟩⟨y|^Y −F→ σ^B ) .    (11.14)
Let E^{(y)} ∈ F_⩽(C → B) be an optimal trace non-increasing CP map such that Pr(τ_y^C −F→ σ^B) =
Tr[E^{(y)}(τ_y)] and E^{(y)}(τ_y) is proportional to σ. We also define M ∈ F_⩽(CY → B) as:
In Exercise 11.1.1 you show that MCY →B is indeed an element of F⩽ (CY → B). By
Now, let M be a resource measure that satisfies the strong monotonicity property. Then, by
definition M satisfies
M(ρ^A) ⩾ p_1 M(σ^B) + Σ_{x=2}^{|X|} p_x M(ω_x^B) ⩾ p_1 M(σ^B) .    (11.20)
In other words, the probability p_1 to convert ρ^A to σ^B cannot exceed the ratio M(ρ^A)/M(σ^B).
Since this is true for all resource measures that satisfy the strong monotonicity property,
we get that
Pr(ρ^A −F→ σ^B) ⩽ inf_M  M(ρ^A)/M(σ^B) ,    (11.21)
where the infimum is over all resource measures, M, that satisfy the strong monotonicity
property. Moreover, from the theorem above, for a fixed σ, the function M_σ(ω^A) := Pr(ω^A −F→ σ^B)
is itself a resource measure that satisfies the strong monotonicity property. Hence,
inf_M  M(ρ^A)/M(σ^B) ⩽ M_σ(ρ^A)/M_σ(σ^B) = M_σ(ρ^A) = Pr(ρ^A −F→ σ^B) .    (11.22)
Pr(ρ^A −F→ σ^B) = inf_M  M(ρ^A)/M(σ^B) ,    (11.23)
where the infimum is over all resource measures, M, that satisfy the strong
monotonicity property.
It is evident that the conversion distance is zero if ρ −F→ σ is achievable. However, deterministic
conversion from ρ to σ is often not feasible, raising the question of how closely σ can
istic conversion from ρ to σ is often not feasible, raising the question of how closely σ can
be approximated by applying free operations to ρ. As such, conversion distance not only
provides a meaningful way to evaluate the efficiency of these conversions but, as the following
lemma demonstrates, also serves as a resource measure in its own right.
Exercise 11.1.3. Let F be a QRT, ρ, σ ∈ D(A), ε ∈ [0, 1], and k ∈ N. Show that if
T(ρ −F→ σ) ⩽ ε then
T(ρ^{⊗k} −F→ σ^{⊗k}) ⩽ kε .    (11.28)
The subsequent lemma highlights an additional property of the conversion distance: small
changes in ρ result in only minor variations in the conversion distance. This property un-
derscores the stability of the conversion distance measure against slight perturbations in the
resource state.
Lemma 11.1.2. Let ε ∈ (0, 1), ρ ∈ D(A), σ ∈ D(B), and ρ̃ ∈ Bε(ρ). Then,
| T(ρ −F→ σ) − T(ρ̃ −F→ σ) | ⩽ ε .    (11.29)
Definition 11.1.1. The sequence of resource states {Φm }m∈N is called a golden unit
if for all m, n ∈ N the following two conditions hold:
1. Φn ⊗ Φm ∼ Φnm .
2. If n ⩾ m then Φ_n −F→ Φ_m .
A golden unit can be used as a scale to measure the resourcefulness of a given state
ρ ∈ D(A). There are two distinct ways to do that.
Definition 11.1.2. Let ρ ∈ D(A), {Φm }m∈N be a golden unit, and ε ∈ (0, 1).
That is, the ε-single-shot cost can be seen as the smoothed version of its zero-error
counterpart. Why does something similar not hold for Distill^ε(ρ)?
In some resource theories there exists a golden unit {Φ_m}_{m∈N} with the property that
κ := max_{ω∈D(A)} D(ω∥F) = D(Φ_m∥F) = log(m) .    (11.35)
In such QRTs, one can use the asymptotic continuity of the relative entropy of a resource
to obtain an upper bound on the single-shot ε-distillable resource of a resource state
ρ ∈ D(A). Specifically, let m ∈ N be such that T(ρ −F→ Φ_m) ⩽ ε. Then, for such m there
exists σ ∈ D(A′) with m := |A′| such that ρ −F→ σ and σ ≈_ε Φ_m. Now, since the relative
entropy of a resource is a resource monotone we get
D(ρ∥F) ⩾ D(σ∥F)
        ⩾ D(Φ_m∥F) − εκ − (1 + ε) h( ε/(1 + ε) ) .        (by (10.34))       (11.36)
In many resource theories there exists a golden unit {Φ_m}_{m∈N} with the property that κ =
D(Φ_m∥F) = log(m). Therefore, for such resource theories the inequality above takes the
form
(1 − ε) log(m) ⩽ D(ρ∥F) + (1 + ε) h( ε/(1 + ε) ) .    (11.37)
Since m was an arbitrary integer satisfying T(ρ −F→ Φ_m) ⩽ ε, the inequality above implies
that
Distill^ε(ρ) ⩽ (1/(1 − ε)) D(ρ∥F) + ((1 + ε)/(1 − ε)) h( ε/(1 + ε) ) .    (11.38)
Note that for a small ε > 0 the upper bound is close to the relative entropy of a resource.
Recall also that for any ε > 0 the smoothed version of the logarithmic robustness is defined
as
D_max^ε(ρ∥F) := min_{ρ′∈Bε(ρ)} D_max(ρ′∥F) ,    (11.40)
From the exercise below it follows that Dreg (ρ∥F) is well defined since the limit on the
right-hand side of the equation above exists.
Exercise 11.2.1.
1. Show that for any sequence of real numbers {a_n}_{n=1}^∞ that is sub-additive, i.e. a_{n+m} ⩽
a_n + a_m, the limit lim_{n→∞} a_n/n exists. Hint: See the hint given in Exercise 6.4.2.
2. Show that the limit of the sequence {a_n}, with a_n := (1/n) D(ρ^{⊗n}∥F), exists.
Remark. At first glance it may not be very clear why the theorem above corresponds to
the AEP property. Therefore, after the proof we will give examples demonstrating that
for different choices of F(A), the above theorem reduces to the various variants of the AEP
studied in literature. In other words, the above theorem unifies all the variants of AEP into
a single formula.
We divide the proof into two lemmas.
Combining this with the inequality (10.78), we then get for all α ∈ (1, 2)
lim sup_{n→∞} (1/n) D_max^ε(ρ^{⊗n}∥F) ⩽ lim sup_{n→∞} (1/n) D_α(ρ^{⊗n}∥F) = D_α^{reg}(ρ∥F) ,    (11.45)
where D_α(·∥F) is the α-relative entropy of a resource as defined in (10.71). Finally, since the
above inequality holds for all α ∈ (1, 2) we conclude that for all ε ∈ (0, 1)
lim sup_{n→∞} (1/n) D_max^ε(ρ^{⊗n}∥F) ⩽ lim_{α→1+} D_α^{reg}(ρ∥F)
    = D^{reg}(ρ∥F) .        (Lemma 10.2.4)       (11.46)
This completes the proof.
Note that in the conjecture above we removed the limit ε → 0+ that appears in Theorem 11.2.1.
One may be able to prove the conjecture above by first showing that
D_{1−}^{reg}(ρ∥F) := lim_{α→1−} D_α^{reg}(ρ∥F)    (11.53)
is equal to D^{reg}(ρ∥F). From the next lemma we get that this continuity conjecture of
D_α^{reg}(ρ∥F) at α = 1, if true, would imply the strong AEP.
Proof. Recall from Lemma 10.4.1 that for any ε1, ε2 ∈ (0, 1) with ε1 + ε2 < 1 we have
(cf. (10.116))
D_min^{ε1}(ρ∥σ) ⩽ D_max^{ε2}(ρ∥σ) − log(1 − ε1 − ε2) .    (11.55)
Then, from the definitions it follows that for any such ε1, ε2 ∈ (0, 1) with ε1 + ε2 < 1 we have
D_min^{ε1}(ρ∥F) ⩽ D_max^{ε2}(ρ∥F) − log(1 − ε1 − ε2) ,    (11.56)
so that
lim inf_{n→∞} (1/n) D_max^{ε2}(ρ^{⊗n}∥F) ⩾ lim inf_{n→∞} (1/n) D_min^{ε1}(ρ^{⊗n}∥F)
    ⩾ D_{1−}^{reg}(ρ∥F) .        (by (10.82))       (11.57)
Hence, if D_{1−}^{reg}(ρ∥F) = D^{reg}(ρ∥F) then the strong AEP holds. In other words, the conjecture
that the function α ↦ D_α^{reg}(ρ∥σ) is continuous at α = 1 is stronger than the conjecture of
the strong AEP.
However, in this case the inequality above still holds even if we remove the limit over ε.
Indeed, let ε ∈ (0, 1) and let δ > 0 be such that ε + δ < 1. Observe that from (10.116) we have
lim inf_{n→∞} (1/n) D_max^ε(ρ^{⊗n}∥σ^{⊗n}) ⩾ lim inf_{n→∞} (1/n) D_min^{1−ε−δ}(ρ^{⊗n}∥σ^{⊗n})
    = D(ρ∥σ) .        (by (8.211))       (11.60)
Combining this with the general result of Lemma 11.43 we arrive at the following stronger
result for the special case that F(An ) = {σ ⊗n }.
To see how this corollary relates to the AEP discussed in Sec. 8.1.1, take σ^A = u^A and
observe that D(ρ∥u) = log|A| − H(ρ) and D_max^ε(ρ∥u) = log|A| − H_min^ε(ρ), where
H_min^ε(ρ) := max_{ρ′∈Bε(ρ)} H_min(ρ′)    (11.62)
is known as the smoothed min-entropy of ρ. Therefore, the corollary above implies in
particular that for any ε ∈ (0, 1)
lim_{n→∞} (1/n) H_min^ε(ρ^{⊗n}) = H(ρ) .    (11.63)
Recall that H_min(ρ) = −log λ_max(ρ), so the above equation states that for any ε > 0
and sufficiently large n, there exists a state ρ′_n ∈ D(A^n) that is ε-close to ρ^{⊗n} and for which
λ_max(ρ′_n) ≈ 2^{−nH(ρ)}. In other words, only a small perturbation is needed to make all of the
eigenvalues of ρ^{⊗n} bounded from above by (approximately) 2^{−nH(ρ)}.
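A small numerical illustration of this statement (not from the book): using the closed formula (10.147) for the smoothed min-entropy, the rate (1/n) H_min^ε(ρ^{⊗n}) for a qubit with eigenvalues (0.9, 0.1) slowly climbs from H_min(ρ) ≈ 0.152 towards H(ρ) ≈ 0.469 as n grows.

```python
# Illustrative only: the smoothed min-entropy rate of rho^(tensor n) approaches H(rho), Eq. (11.63).
import numpy as np

def smoothed_hmin(p, eps):
    p = np.sort(np.asarray(p, dtype=float))[::-1]
    vals = (np.cumsum(p) - eps) / np.arange(1, len(p) + 1)
    return -np.log2(vals.max())

p, eps = np.array([0.9, 0.1]), 0.05
H = float(-(p * np.log2(p)).sum())                # von Neumann entropy of the qubit state
for n in (4, 8, 12, 16):
    eigs = np.array([1.0])
    for _ in range(n):
        eigs = np.outer(eigs, p).ravel()          # eigenvalues of rho^(tensor n)
    print(n, round(smoothed_hmin(eigs, eps) / n, 3), "vs H =", round(H, 3))
```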
The second example we consider here is a variant of the AEP involving the conditional
entropy. In this variant, we take the set of free states to be
F(AB) := {u^A ⊗ ρ^B : ρ ∈ D(B)} .    (11.64)
= min_{σ∈D(B)} D(ρ^{AB}∥u^A ⊗ σ^B)
= log|A| − H(A|B)_ρ                               (by (7.132))       (11.65)
= D^{reg}(ρ^{AB}∥F) .                             (additivity of the conditional entropy)
Similarly,
D_max^ε(ρ^{AB}∥F) := min_{σ∈F(AB), ρ̃∈Bε(ρ)} D_max(ρ̃^{AB}∥σ^{AB})
H(A|B)_ρ = lim_{ε→0+} lim inf_{n→∞} (1/n) H_min^ε(A^n|B^n)_{ρ^{⊗n}} = lim_{ε→0+} lim sup_{n→∞} (1/n) H_min^ε(A^n|B^n)_{ρ^{⊗n}} .    (11.67)
One of the open problems in the field is whether or not the following generalization of the
quantum Stein’s lemma holds.
Recall from Lemma 10.4.1 that for any ε1, ε2 ∈ (0, 1) with ε1 + ε2 < 1 we have (cf. (10.116))
D_min^{ε1}(ρ∥σ) ⩽ D_max^{ε2}(ρ∥σ) − log(1 − ε1 − ε2) .    (11.71)
Then, from the definitions it follows that for any such ε1, ε2 ∈ (0, 1) with ε1 + ε2 < 1 we have
D_min^{ε1}(ρ∥F) ⩽ D_max^{ε2}(ρ∥F) − log(1 − ε1 − ε2) ,    (11.72)
so that
lim sup_{n→∞} (1/n) D_min^{ε1}(ρ^{⊗n}∥F) ⩽ lim sup_{n→∞} (1/n) D_max^{ε2}(ρ^{⊗n}∥F)
    ⩽ D^{reg}(ρ∥F) .        (Lemma 11.43)       (11.73)
This provides a proof for the strong converse of the conjecture above (note that we already
showed it in (10.85) using a different approach). However, a proof for the direct part is
unknown at the time of writing this book.
Equivalence of Conjectures
Theorem 11.3.1. Let F be a QRT and ρ ∈ D(A). Then the following two
statements are equivalent:
Proof. Recall the relation (10.118) from Lemma 10.4.1. This relation implies that
for any ε ∈ (0, 1) we have
D_min^ε(ρ∥F) ⩾ D_max^{√(1−ε²)}(ρ∥F) + log( ε/(1 − ε) ) .    (11.74)
Therefore, if Eq. (11.52) holds for all ε ∈ (0, 1) we get from the above equation that
lim inf_{n→∞} (1/n) D_min^ε(ρ^{⊗n}∥F) ⩾ D^{reg}(ρ∥F) .    (11.76)
Combining this with (11.73) gives (11.70).
Conversely, suppose (11.70) holds. Then, from (11.72) it follows that
lim inf_{n→∞} (1/n) D_max^{ε2}(ρ^{⊗n}∥F) ⩾ lim inf_{n→∞} (1/n) D_min^{ε1}(ρ^{⊗n}∥F)
    = D^{reg}(ρ∥F) .        (assuming (11.70) holds)       (11.77)
Since the above inequality holds for all ε2 ∈ (0, 1), by combining it with Lemma 11.43 we
get that (11.52) must hold for all ε ∈ (0, 1). This completes the proof.
We now give an example in which the generalized quantum Stein’s lemma does hold.
Consider the set of free states defined in (11.64); i.e.,
F(AB) := {u^A ⊗ ρ^B : ρ ∈ D(B)} .    (11.79)
Since H_α^↑ is additive under tensor products (see Theorem 7.5.1 and Exercise 7.5.4) we conclude
that D_α(ρ∥F) = D_α^{reg}(ρ∥F). Combining this with (11.81) gives
lim inf_{n→∞} (1/n) D_min^ε(ρ^{⊗n}∥F) ⩾ lim_{α→1−} D_α(ρ∥F)
    = D(ρ∥F) .    (11.83)
Finally, combining this with (11.73) we get that Conjecture 11.3.1 does hold for the set of
free states defined in (11.79). We summarize it in the following theorem.
Theorem 11.3.2. Let F(AB) be the set given in (11.79). Then, for all ρ ∈ D(AB)
and all ε ∈ (0, 1) we have
lim_{n→∞} (1/n) D_min^ε(ρ^{⊗n}∥F) = D(ρ∥F) .    (11.84)
Observe that the combination of the above theorem with Theorem 11.3.1 implies that
for F(AB) as in (11.79) we also have
lim_{n→∞} (1/n) D_max^ε(ρ^{⊗n}∥F) = D(ρ∥F) .    (11.85)
The above relation is equivalent to (11.68). Hence, Theorem 11.2.2 can be viewed as a
corollary of Theorem 11.3.2.
This quantum relative entropy plays a key role in numerous applications in quantum in-
formation theory and beyond. We already saw in the quantum Stein’s lemma that it can
be interpreted as the optimal decay rate of the type-II error exponent. Among all relative
entropies, it is the most well known, and in this section we show that the Umegaki rela-
tive entropy can be singled out as the only quantum relative entropy that is asymptotically
continuous. We will see later on in the book that this is the key reason of its “popularity”.
Following Definition 10.2.2, we say that a relative entropy D is asymptotically continuous
if there exists a continuous function f : [0, 1] → R+ such that f (0) = 0 and for all ρ, ρ′ , σ ∈
D(A), with supp(ρ) ⊆ supp(σ) and supp(ρ′ ) ⊆ supp(σ)
|D(ρ∥σ) − D(ρ′ ∥σ)| ⩽ f (ε) log ∥σ −1 ∥∞ (11.87)
where ε := 21 ∥ρ − ρ′ ∥1 , and σ −1 is the generalized inverse of σ. We emphasize that f is
independent of |A|.
Recall from Corollary 10.2.1 that the Umegaki relative entropy is shown to be asymp-
totically continuous. The theorem we are discussing asserts that no other relative entropy
possesses this property of asymptotic continuity. To substantiate this claim, we will utilize
the following lemma, which introduces a notation for any relative entropy D:
D^{(ε)}(ρ∥σ) := max_{ρ′∈Bε(ρ)} D(ρ′∥σ) .    (11.88)
In other words, D(ε) represents a form of smoothing, albeit using the maximum rather than
the minimum over all states that are ε-close to ρ. Consequently, unlike Dε , the function
D(ε) does not qualify as a divergence. It’s also worth noting that this specific notation was
previously used in the context of the min relative entropy in Theorem 10.4.4. Both the
lemma and this notation are crucial for our proof, as they facilitate the examination of how
relative entropies respond to minor perturbations in the state ρ.
Proof. Let ε ∈ (0, 1) and for each n ∈ N let ρ′_n ∈ D(A^n) be such that ½∥ρ′_n − ρ^{⊗n}∥₁ ⩽ ε.
Then, applying (11.87) to n copies of ρ and σ gives
| D(ρ∥σ) − (1/n) D(ρ′_n∥σ^{⊗n}) | ⩽ f(ε) log∥σ^{-1}∥∞ .    (11.90)
Therefore, taking the lim inf_{n→∞} or lim sup_{n→∞} on both sides of the equation above,
followed by lim_{ε→0+}, completes the proof.
Proof of Theorem 11.4.1. Since the lemma above states that (11.87) implies (11.89), it is
sufficient to prove that the Umegaki relative entropy is the only relative entropy that satis-
fies (11.89). Let D(ρ∥σ) be a relative entropy satisfying (11.89). Therefore,
D(ρ∥σ) = lim_{ε→0+} lim inf_{n→∞} (1/n) D^ε(ρ^{⊗n}∥σ^{⊗n})
       ⩽ lim_{ε→0+} lim inf_{n→∞} (1/n) D_max^ε(ρ^{⊗n}∥σ^{⊗n})        (by (6.113))       (11.91)
Conversely,
D(ρ∥σ) = lim_{ε→0+} lim sup_{n→∞} (1/n) D^{(ε)}(ρ^{⊗n}∥σ^{⊗n})
       ⩾ lim_{ε→0+} lim sup_{n→∞} (1/n) D_min^{(ε)}(ρ^{⊗n}∥σ^{⊗n})     (by (6.113))       (11.92)
       = D(ρ∥σ) .                                                       (Theorem 10.4.4)
Remark. The two definitions above are not independent of each other. Specifically, observe
that
Distill(ρ → σ) = 1 / Cost(ρ → σ) .    (11.95)
This relationship is consistent with the intuition that if ρ is a free state and σ is a resource
state, then Cost(ρ → σ) is equal to infinity, while Distill(ρ → σ) equals zero. This is because,
in the former case, no matter how many copies of ρ you have, they are insufficient to prepare
even a single copy of σ. In the latter case, it is impossible to distill or extract a resource
state σ from a free state ρ.
In the above definitions of cost and distillation, we did not impose any constraints on the
integers m and n. However, as intuition suggests, it is typically the case that m and n are
both very large. In fact, for any natural number a, we can include the condition n, m ⩾ a
in the aforementioned definitions without altering their value. Specifically, we argue that:
Cost(ρ → σ) = lim_{ε→0+} inf_{n,m∈N, n,m⩾a} { n/m : T(ρ^{⊗n} −F→ σ^{⊗m}) ⩽ ε } ,    (11.96)
(and similarly we can add n, m ⩾ a to Distill(ρ → σ)). To see why, observe first that the
left-hand side of the equation above cannot be greater than the right-hand side, since by
adding the restriction n, m ⩾ a one can only increase the infimum. To prove that we must
have equality, recall from Exercise 11.1.3 that for any such a ∈ N, if T(ρ^{⊗n} −F→ σ^{⊗m}) ⩽ ε
then T(ρ^{⊗na} −F→ σ^{⊗ma}) ⩽ aε. Therefore,
Cost(ρ → σ) ⩾ lim_{ε→0+} inf_{n,m∈N} { n/m : T(ρ^{⊗na} −F→ σ^{⊗ma}) ⩽ aε }
    ⩾ lim_{ε→0+} inf_{n′,m′∈N, n′,m′⩾a} { n′/m′ : T(ρ^{⊗n′} −F→ σ^{⊗m′}) ⩽ aε }        (replacing na, ma with n′, m′)       (11.97)
    = lim_{ε′→0+} inf_{n′,m′∈N, n′,m′⩾a} { n′/m′ : T(ρ^{⊗n′} −F→ σ^{⊗m′}) ⩽ ε′ }        (ε′ := aε)
Exercise 11.5.1. Consider the asymptotic cost and distillable rates defined above.
1. Show that for a fixed resource state σ ∈ D(B), the function fσ (ρ) := Distill(ρ → σ) is
a resource measure.
2. Show that for a fixed resource state ρ ∈ D(B), the function gρ (σ) := Cost(ρ → σ) is a
resource measure.
Exercise 11.5.2. Let T ′ be another metric that is topologically equivalent to the trace dis-
tance T (i.e., there exists a, b > 0 such that aT ⩽ T ′ ⩽ bT ). Further, for every ρ ∈ D(A)
and σ ∈ D(B) let Distill′ (ρ → σ) and Cost′ (ρ → σ) be the distillation and cost rates obtained
by replacing the trace distance in (11.93) and (11.94)with the metric T ′ . Show that for all
ρ ∈ D(A) and all σ ∈ D(B)
Note that due to (11.95) the condition that a QRT is reversible can also be expressed as
or as
Cost(ρ → σ) Cost(σ → ρ) = 1 .    (11.101)
Distill(ρ → σ) ⩽ D^{reg}(ρ∥F) / D^{reg}(σ∥F) .    (11.102)
Remark. The theorem above can also be expressed in terms of the asymptotic cost rate.
Specifically, we have the bound
Cost(ρ → σ) ⩾ D^{reg}(σ∥F) / D^{reg}(ρ∥F) ,    (11.103)
⩾ D(σ^{⊗m_n}∥F) − ε_n κ_n − (1 + ε_n) h( ε_n/(1 + ε_n) ) ,        (by (10.34))       (11.105)
where κ_n := max_{ω∈D(B^{m_n})} D(ω∥F). Dividing both sides by n and taking the limit n → ∞
yields
D^{reg}(ρ∥F) ⩾ lim_{n→∞} (m_n/n)(1/m_n) D(σ^{⊗m_n}∥F) = Distill(ρ → σ) D^{reg}(σ∥F) ,    (11.106)
Distill(ρ → σ) ⩽ D^{reg}(ρ∥F)/D^{reg}(σ∥F) ⩽ Cost(σ → ρ) .    (11.108)
Hence, if F is reversible then both of the inequalities above must be equalities. This completes
the proof.
of Φ2 ) that can be distilled from each copy of ρ. For this reason the quantity Distill(ρ → Φ2 )
is called the distillable resource of ρ, and denoted by
Distill(ρ) := Distill(ρ → Φ2 ) . (11.109)
Conversely, one can use the asymptotic cost rate to quantify the cost (in resource units Φ2 )
of a resource state ρ. Specifically, the quantity
Cost(ρ) := Cost(Φ2 → ρ) (11.110)
quantifies the cost in resource units (i.e. copies of Φ2 ) that are needed to prepare each copy of
ρ. The asymptotic cost and distillation of a resource are related to their single-shot versions
as follows.
Proof. We prove the first equality and leave the second one to Exercise 11.5.5. By definition,
inf_{n∈N} (1/n) Cost^ε(ρ^{⊗n}) = inf { (log m)/n : T(Φ_m −F→ ρ^{⊗n}) ⩽ ε , n, m ∈ N }
    ⩽ inf { k/n : T(Φ_{2^k} −F→ ρ^{⊗n}) ⩽ ε , n, k ∈ N }        (restricting m = 2^k)       (11.113)
    = inf { k/n : T(Φ_2^{⊗k} −F→ ρ^{⊗n}) ⩽ ε , n, k ∈ N } .     (property of a golden unit)
Hence,
Cost(ρ) ⩾ lim_{ε→0+} inf_{n∈N} (1/n) Cost^ε(ρ^{⊗n})
        = lim_{ε→0+} inf_{a⩽n∈N} (1/n) Cost^ε(ρ^{⊗n})        ∀ a ∈ N .        (see Exercise 11.5.4 below)       (11.114)
Conversely, let ε, δ ∈ (0, 1) and let a ∈ N be large enough so that 1/a < δ. Then,
lim inf_{n→∞} (1/n) Cost^ε(ρ^{⊗n}) ⩾ inf_{a⩽n∈N} (1/n) Cost^ε(ρ^{⊗n})
    = inf_{n,m∈N, n⩾a} { (log m)/n : T(Φ_m −F→ ρ^{⊗n}) ⩽ ε } ,        (by definition)       (11.116)
where we used the fact that m ⩽ 2^k, so that Φ_{2^k} −F→ Φ_m and consequently T(Φ_{2^k} −F→ ρ^{⊗n}) ⩽
T(Φ_m −F→ ρ^{⊗n}). Now, observe that for n ⩾ a we have (k − 1)/n ⩾ k/n − δ, so that
lim_{ε→0+} lim inf_{n→∞} (1/n) Cost^ε(ρ^{⊗n}) ⩾ lim_{ε→0+} inf_{n,k∈N} { k/n : T(Φ_{2^k} −F→ ρ^{⊗n}) ⩽ ε } − δ
    = Cost(ρ) − δ .    (11.118)
Since the above inequality holds for all δ ∈ (0, 1) we conclude that
lim_{ε→0+} lim inf_{n→∞} (1/n) Cost^ε(ρ^{⊗n}) ⩾ Cost(ρ) .    (11.119)
The two inequalities (11.115) and (11.119) then give the desired equality (11.111).
(1/m) Cost(ρ^{⊗m}) ⩾ Cost(ρ)    and    (1/m) Distill(ρ^{⊗m}) ⩽ Distill(ρ) .    (11.120)
Exercise 11.5.4. Let ρ ∈ D(A). Show that for any a ∈ N we have
lim_{ε→0+} inf_{n∈N} (1/n) Cost^ε(ρ^{⊗n}) = lim_{ε→0+} inf_{a⩽n∈N} (1/n) Cost^ε(ρ^{⊗n}) .    (11.121)
Hint: Use similar arguments to those used to prove the equality in (11.96).
In many QRTs it is possible to choose the golden unit such that DFreg (Φ2 ) = 1. With this
normalization we get from Theorem 11.5.1 that
Particularly, if the QRT F is reversible then both the asymptotic cost and the asymptotic
distillation of the resource ρ equals Dreg (ρ∥F). Therefore, for reversible QRTs, the regularized
relative entropy of a resource is the unique measure of a resource in the asymptotic domain.
We make this statement rigorous in the following corollary.
Corollary 11.5.1. Let F be a reversible QRT with a golden unit {Φk }k∈N such that
D(Φ2 ∥F) = 1, and let M be a resource measure that is asymptotically continuous
and normalized such that M(Φ2 ) = 1. Then,
Exercise 11.5.6. Prove the corollary above. Hint: Follow all the lines leading to (11.122),
but with M replacing everywhere D(·∥F).
Definition 11.5.3. Let δ ∈ [0, 1] and F be a QRT. We say that a quantum channel
N ∈ CPTP(A → B) is RNGδ if it belongs to the set
RNGδ(A → B) := { E ∈ CPTP(A → B) : R_g(E(σ)) ⩽ δ  ∀ σ ∈ F(A) } ,    (11.124)
We used the global robustness in the definition above since it is a resource monotone
that is faithful (see Exercise 10.2.7), so the inequality R_g(E(σ)) ⩽ δ implies that E(σ) is
close to a free state. Specifically, suppose µ := R_g(E(σ)) ⩽ δ. Then, from (10.56) it follows
that
E(σ) = (1 + µ)τ − µω    (11.125)
for some τ ∈ F(B) and ω ∈ D(B). Hence, from the above equality we get
½∥E(σ) − τ∥₁ = ½∥µ(τ − ω)∥₁ ⩽ µ ⩽ δ .    (11.126)
In other words, if R_g(E(σ)) ⩽ δ then E(σ) is δ-close to a free state.
Definition 11.5.4. Let F be a QRT, and for each n ∈ N let An and Bn be two
physical systems. A sequence of quantum channel {En }n∈N , with
En ∈ CPTP(An → Bn ), is said to be asymptotically RNG if there exists a sequence of
non-negative real numbers {δn }n∈N with limn→∞ δn = 0 such that for each n ∈ N,
En ∈ RNGδn (An → Bn ).
Note that in the definition above we do not specify how quickly δ_n goes to zero. The
main result of this section would not be affected even if we required in addition that δ_n goes to
zero exponentially fast with n. However, to keep the notion of asymptotically RNG in its
full generality we did not include such a condition in the definition above.
That is, the sequence {mn }n∈N is such that ρ⊗n can be converted by RNGδn to a state that
is ε-close to σ ⊗mn . Hence, the set Rε (ρ → σ) ⊂ R+ consists of all achievable conversion
rates under asymptotically RNG that tolerate an ε-error. To get the optimal distillable rate
we will have to take the limit ε → 0+ .
Exercise 11.5.7. Let ρ ∈ D(A), σ ∈ D(B), and
rε := sup r : r ∈ Rε (ρ → σ) . (11.128)
1. Show that Rε (ρ → σ) = [0, rε ]; in particular, show that the supremum in the definition
of rε can be replaced with a maximum.
Definition 11.5.5. Let F be a QRT, ρ ∈ D(A) and σ ∈ D(B). Using the notation
given in (11.128) of the exercise above, the asymptotically RNG distillable rate is
defined as
Distill(ρ → σ) := lim+ rε . (11.129)
ε→0
The condition (11.127) implies that there exists En ∈ RNGδn (An → B mn ) such that
En (ρ⊗n ) ≈ε σ ⊗mn . Moreover, the condition that limn→∞ δn = 0 implies that this sequence
of channels {En } is asymptotically RNG. Therefore, for a given ε ∈ (0, 1), we get that
ρ⊗n can be converted by RNGδn to σ ⊗mn up to an ε-error. The reason that we require
lim_{n→∞} m_n/n = r instead of just sup{m_n/n} = r is that the supremum can be achieved with a
finite n, in which case δ_n may not be very small. Taking the limit n → ∞ ensures that the
conversion ρ^{⊗n} → σ^{⊗m_n} under RNG_{δ_n} (up to an ε-error) is achieved with a very small δ_n.
Exercise 11.5.8. Show that if in the definition of R_ε(ρ → σ) we require sup{m_n/n} = r
instead of lim_{n→∞} m_n/n = r, then we get Distill(ρ → σ) = ∞. Hint: Let n_0 be a large
integer and take δ_n = n_0 for n ⩽ n_0 and δ_n = 0 if n > n_0.
For simplicity of the notation, we did not include a subscript in Distill(ρ → σ) to indi-
cate that the asymptotic distillable rate is calculated with respect to asymptotically RNG
operations. Similarly, we denote by Cost(ρ → σ) = 1/Distill(ρ → σ) the asymptotic cost
rate of ρ into σ under asymptotic RNG operations.
Towards Reversibility
In this book, we will restrict our attention to QRTs that meet the following condition:
κ(A) := lim sup_{n→∞} (1/n) max_{ω∈D(A^n)} D(ω∥F) < ∞ .    (11.130)
It’s worth noting that this assumption is extremely lenient and is fulfilled by the majority, if
not all, of the QRTs discussed in the existing literature. In fact, for many QRTs κ(A) = 0.
Theorem 11.5.2. For any ρ ∈ D(A) and σ ∈ D(B), the asymptotic distillable rate
of ρ into σ under asymptotically RNG operations is bounded by
Distill(ρ → σ) ⩽ D^{reg}(ρ∥F) / D^{reg}(σ∥F) .    (11.131)
Remark. The theorem above does not follow from Theorem 11.5.1 since Distill(ρ → σ) is
calculated with respect to asymptotically RNG operations. Since these operations allow
for the generation of a resource (although small amount that vanishes asymptotically), the
proof of Theorem 11.5.1 cannot be applied directly, and a revised version is necessary to
accommodate this case.
Distill(ρ → σ) > D^{reg}(ρ∥F)/D^{reg}(σ∥F) + 2δ    (11.132)
for some small positive δ. By definition, this means in particular that for sufficiently small
ε ∈ (0, 1) there exists r ∈ R_ε(ρ → σ) such that
r > D^{reg}(ρ∥F)/D^{reg}(σ∥F) + δ .    (11.133)
Since r ∈ R_ε(ρ → σ) there exists a sequence {m_n}_{n∈N} ⊂ N satisfying both r = lim_{n→∞} m_n/n
and (11.127). From (11.127) it follows that there exists E_n ∈ RNG_{δ_n}(A^n → B^{m_n}) such that
E_n(ρ^{⊗n}) ≈_ε σ^{⊗m_n} ,    (11.134)
where
c_n := max_{ω∈D(B^{m_n})} D(ω∥F) .    (11.136)
D^{reg}(σ∥F) ⩽ lim_{n→∞} (1/m_n) D(E_n(ρ^{⊗n})∥F) + κ(B)ε .    (11.137)
D(ρ^{⊗n}∥ω_n) = D(ρ^{⊗n}∥F) .    (11.138)
= (1/r) D^{reg}(ρ∥F) + κ(B)ε .        (lim_{n→∞} m_n/n = r)
However, since r > D^{reg}(ρ∥F)/D^{reg}(σ∥F) + δ, for sufficiently small ε ∈ (0, 1) we get the
contradiction
D^{reg}(σ∥F) ⩽ (1/r) D^{reg}(ρ∥F) + κ(B)ε
            < D^{reg}(σ∥F) .        (Exercise 11.5.9)       (11.142)
Recall from Eqs. (10.85) (see also (11.73)) and (10.82) that for all ε ∈ (0, 1)
D_{1−}^{reg}(ρ∥F) ⩽ D_min^{ε,reg}(ρ∥F) ⩽ D^{reg}(ρ∥F) .    (11.147)
If the generalized quantum Stein's lemma is valid, then the upper bound simplifies to an
equality.
Theorem 11.5.3. For any ρ ∈ D(A), σ ∈ D(B), and ε ∈ (0, 1), the asymptotic
distillable rate of ρ into σ under asymptotically RNG operations is bounded by
Distill(ρ → σ) ⩾ lim_{ε→0+} D_min^{ε,reg}(ρ∥F) / D^{reg}(σ∥F) ⩾ D_{1−}^{reg}(ρ∥F) / D^{reg}(σ∥F) .    (11.148)
Proof. The second inequality follows from the fact that D_min^{ε,reg}(ρ∥F) ⩾ D_{1−}^{reg}(ρ∥F) for all
and let r be a positive number satisfying r < a/D^{reg}(σ∥F). Our goal is to prove that
Distill(ρ → σ) ⩾ r. For this purpose, we fix ε ∈ (0, 1) and denote m_n := ⌈nr⌉, so that
lim_{n→∞} m_n/n = r. We need to construct a sequence of channels {E_n}_{n∈N} with the following
two properties:
1. For sufficiently large n ∈ N, the channel En ∈ RNGδn (An → B mn ) with δn := 2−nδ (for
some δ > 0). Hence, the sequence {En }n∈N is asymptotically RNG.
Note that from the definition of Distill(ρ → σ), if for any choice of ε ∈ (0, 1) there exists
a sequence {En }n∈N that satisfies the above two conditions then we must have Distill(ρ →
σ) ⩾ r.
The idea behind the construction of the channels {En }n∈N is to try to achieve the rate r
with (two-outcome) measurement-prepare channels of the form
E_n(η) := Tr[Λ_n η] σ_n + Tr[(I^{A^n} − Λ_n)η] ω_n        ∀ η ∈ L(A^n) ,    (11.150)
for some σ_n, ω_n ∈ D(B^{m_n}) and some Λ_n ∈ Eff(A^n). We therefore need to check whether there
exist σ_n, ω_n, and Λ_n that satisfy both E_n(ρ^{⊗n}) ≈_ε σ^{⊗m_n} and E_n ∈ RNG_{δ_n}(A^n → B^{m_n}). Note
that if we choose Λ_n such that Tr[Λ_n ρ^{⊗n}] is close to one then E_n(ρ^{⊗n}) will be close to σ_n.
Therefore, if σ_n is close to σ^{⊗m_n} we will get in this case that E_n(ρ^{⊗n}) is also close to σ^{⊗m_n}.
We take ω_n ∈ F(B^{m_n}) to be any free density matrix, and define Λ_n and σ_n below.
Combining the two equations above implies that the optimal effect Λn satisfies (for
and δ > 0 and sufficiently large n ∈ N)
ε
maxn Tr [Λn τn ] ⩽ 2−n(a−δ) and Tr[ρ⊗n Λn ] = 1 − . (11.153)
τn ∈F(A ) 2
The intuition behind this choice is that besides being ε/2-close to σ^{⊗m_n}, the density matrix σ_n does not have “too much” robustness. To see why, recall first that from Lemma 11.2.1 it follows that

D^{reg}(σ∥F) ⩾ lim sup_{n→∞} (1/m_n) D_max^{ε/2}(σ^{⊗m_n}∥F)
 = lim sup_{n→∞} (1/m_n) D_max(σ_n∥F)   (11.155)
lim_{n→∞} m_n/n = r → = (1/r) lim sup_{n→∞} (1/n) D_max(σ_n∥F) .
Now, since the inequality r < a/Dreg (σ∥F) is strict, there exists δ > 0 sufficiently small
such that r < (a − 2δ)/Dreg (σ∥F), or equivalently
rDreg (σ∥F) < a − 2δ . (11.156)
Hence, by combining the two equations above we get that for sufficiently large n
Dmax (σn ∥F) ⩽ n (a − 2δ) . (11.157)
To show that for these choices the channel En is RNGδn , let η ∈ F(An ) be a free state,
and denote by tn := Tr [Λn η] and rn := Rg (σn ). Then, from the convexity of the global
robustness we get
R_g(E_n(η)) ⩽ t_n R_g(σ_n) + (1 − t_n) R_g(ω_n)
ω_n ∈ F(B^{m_n}) → = t_n r_n ⩽ t_n (1 + r_n) .   (11.159)
2. Show that the right-hand side of (11.162) is larger than the bound on R(σn ) given
in (11.158).
As a second example, consider the QRT consisting of conditional unital channels. In this QRT the free states are given by

F(AB) = { u^A ⊗ σ^B : σ ∈ D(B) } .   (11.167)

Since this QRT is also an affine QRT it has a self-adjoint resource destroying channel given by

∆^{AB→AB}(ω^{AB}) := u^A ⊗ ω^B   ∀ ω ∈ L(AB) .   (11.168)

Also in this QRT the α-relative entropy of a resource is additive so that we get reversibility. In particular, for this QRT we have

Distill(ρ^{AB} → σ^{AB}) = (log|A| − H(A|B)_ρ) / (log|A| − H(A|B)_σ) .   (11.169)
Entanglement Theory
CHAPTER 12
Pure-State Entanglement
Quantum Entanglement
Definition 12.1.1. Entanglement is a characteristic of a composite physical system
that cannot be created or enhanced through local (quantum) operations and classical
communication (LOCC).
This definition precisely captures the intuition that entanglement is a quantum property
of a composite system that corresponds to correlations that are not classical. Historically,
this intuition led many researchers to associate entanglement with the non-local correlations
exhibited by composite physical systems. These correlations find expression in the proba-
Note that we do not impose any constraint on the classical system Y , only that it is finite
dimensional. Setting n := |Y |, an LOCC1 channel can be expressed as
F^{BY→B′} ∘ E^{A→A′Y} = Σ_{y∈[n]} F_{(y)}^{B→B′} ⊗ E_y^{A→A′} .   (12.2)

Here, for every y ∈ [n], the operation F_{(y)} ∈ CPTP(B → B′), and E_y ∈ CP(A → A′). Furthermore, the sum Σ_{y∈[n]} E_y is trace preserving.
By incorporating an additional round of communication from Bob to Alice, we obtain
channels in LOCC2 . Specifically, a channel N ∈ LOCC2 (AB → A′ B ′ ) can be expressed as
follows:
N^{AB→A′B′} = E_1^{A_1X_1→A′} ∘ F^{BY_1→B′X_1} ∘ E_0^{A→A_1Y_1} ,   (12.3)
where A1 represents an additional system on Alice’s side, E0 and E1 are channels on Alice’s
side, and F is a channel on Bob’s side. It’s important to note that without the second round
of communication, which corresponds to the case when |X1 | = 1, the description reverts to
a channel in LOCC1 .
Exercise 12.1.1. Show that if |X1 | = 1 then the channel in (12.3) belongs to LOCC1 .
In the same fashion, one can continue and express the most general protocol in LOCCn .
Clearly, from the construction above it is obvious that the expression of LOCC protocols
can be very complicated particularly if it involves a large number of classical communication
rounds (see Fig. 12.1). Moreover, it is also known that LOCCn is a strict subset of LOCCn+1
for all n ∈ N. Due to this notorious complexity of LOCC, and despite the enormous body
of work in recent years on the study of LOCC, there are still many open problems in entan-
glement theory. For this reason, it is sometimes convenient to consider a slightly larger class
of operations that contains LOCC and has a simpler characterization. We will consider in
the next chapter two such sets of operations known as the separable set and the PPT set.
However, as we will see in this chapter, the complexity of LOCC is reduced dramatically
when the bipartite system is initially in a pure state.
Figure 12.1: An LOCC operation. The double purple lines represent classical communication between the parties.
1. Show that
LOCC(AB → A′ B ′ ) ⊆ SEP(AB → A′ B ′ ) . (12.5)
where {|ψ_x⟩^A}_{x∈[d]} and {|ϕ_x⟩^B}_{x∈[d]} are orthonormal bases of A and B, respectively, Λ^A = Diag(√p_1, . . . , √p_d) is a diagonal matrix in the basis {|ψ_x⟩^A}_{x∈[d]}, and |Ω^{AB}⟩ = Σ_{x∈[d]} |ψ_x^A⟩ ⊗ |ϕ_x^B⟩ is an (unnormalized) maximally entangled state.
We consider now the effect of a local measurement on Bob's side. For this purpose, let N be some d × d complex matrix, and note that

I^A ⊗ N |ψ^{AB}⟩ = (Λ ⊗ N)|Ω^{AB}⟩ = (ΛN^T ⊗ I^B)|Ω^{AB}⟩ .   (12.8)

Exercise 12.2.1. Show that the matrix ΛN^T has the same singular values as NΛ. Hint: Show that for any square matrix C, the matrices C and C^T have the same singular values.
Since ΛN T and N Λ are two square matrices with the same singular values, it follows from
the singular value decomposition that there exists two unitary matrices U and V such that
ΛN T = U N ΛV T (12.9)
where the transpose on V is for convenience. Substituting this into (12.8) gives
I A ⊗ N |ψ AB ⟩ = U N ΛV T ⊗ I B |ΩAB ⟩
= (U N Λ ⊗ V ) |ΩAB ⟩ (12.10)
= (U N ⊗ V ) |ψ AB ⟩ .
That is, the vectors I A ⊗ N |ψ AB ⟩ and N ⊗ I B |ψ AB ⟩ are equivalent up to the local
unitary map U ⊗ V . This observation leads to the following result.
Lo-Popescu’s Theorem
Theorem 12.2.1. The effect of any LOCC map on a pure bipartite state can be
simulated by the following protocol: Alice performs a generalized quantum
measurement {Myx }x,y , sends the result (x, y) to Bob who then performs a local
unitary map Vyx on his system, and in the final step, Alice and Bob discard the value
of y.
E_x^{B→B}(ψ^{AB}) = Σ_{y∈[n]} (I^A ⊗ N_{yx}) ψ^{AB} (I^A ⊗ N_{yx})^*
(12.10)→ = Σ_{y∈[n]} (U_{yx}N_{yx} ⊗ V_{yx}) ψ^{AB} (N_{yx}^* U_{yx}^* ⊗ V_{yx}^*) ,   (12.11)

where we used (12.10) for each x ∈ [m] and y ∈ [n], with U_{yx} and V_{yx} being unitary matrices. Denoting by M_{yx} := U_{yx}N_{yx} the above equation becomes

E_x^{B→B}(ψ^{AB}) = Σ_{y∈[n]} (M_{yx} ⊗ V_{yx}) ψ^{AB} (M_{yx}^* ⊗ V_{yx}^*) .   (12.12)
Moreover, since Σ_{x∈[m]} Σ_{y∈[n]} M_{yx}^* M_{yx} = I^A, we conclude that any quantum instrument that is performed by Bob can be simulated with the following protocol: Alice performs a generalized quantum measurement {M_{yx}}_{x∈[m],y∈[n]}, sends the outcome (x, y) to Bob, who then performs a unitary matrix V_{yx}. At the end of the protocol, Alice and Bob discard or forget
the value of y. Therefore, in any LOCC protocol, all the local quantum instruments on Bob’s
side can be simulated with unitaries and measurements on Alice’s side. Since a sequence
of quantum instruments (generalized measurements) on Alice’s side can be combined into a
single generalized measurement (followed by coarse graining, i.e. discarding of information),
we conclude that the most general LOCC protocol on a pure bipartite state can be simu-
lated with a single generalized measurement on Alice's side followed by a unitary on Bob's
side that depends on Alice’s measurement outcome, and ends with the discarding of partial
information of the measurement outcome.
Exercise 12.2.2. Show that a sequence of two generalized measurements can be viewed as
a single generalized measurement. That is, given two generalized measurement {Mx }x∈[m]
and {Ny }y∈[n] show that the set of matrices {Lxy := Mx Ny }x∈[m],y∈[n] is also a generalized
measurement.
Exercise 12.2.3. Let ψ ∈ Pure(AB) and σ ∈ D(AB). Show that if there exists a deter-
LOCC
ministic LOCC protocol that converts ψ AB to σ AB , i.e. ψ AB −−−→ σ AB , then there exists a
set {Mx }x∈[m] of complex matrices in L(A), and a set {Ux }x∈[m] of unitary matrices in L(B)
such that X
σ AB = (Mx ⊗ Ux ) ψ AB (Mx ⊗ Ux )∗ . (12.13)
x∈[m]
σ AB = U BX→B ◦ E A→AX ψ AB .
(12.14)
Theorem 12.2.1 can be simplified further if we consider only LOCC protocols that take
pure bipartite states to pure bipartite states. In this case, any LOCC transformation can
be simulated by the following simple protocol: Alice performs a generalized measurement
{Mx }, sends the outcome x to Bob, who then performs a local unitary operation Vx . This
simplification of LOCC will be crucial for the study of pure-state entanglement theory.
Exercise 12.2.4. The Schmidt rank of a pure bipartite state is defined as the number of
non-zero Schmidt coefficients; for example, the Schmidt rank of the state given in (12.7) is
the rank of the matrix ΛA . We denote the Schmidt rank of a bipartite state ψ ∈ Pure(AB)
by SR(ψ). Show that for two bipartite states ψ, ϕ ∈ Pure(AB) with SR(ϕ) > SR(ψ) it is
impossible to convert ψ to ϕ by LOCC (not even with probability less than one).
where {|ψx ⟩A }x∈[d] and {|ϕx ⟩B }x∈[d] are orthonormal bases of A and B, respectively. Let U
and V be unitary matrices such that U |ψx ⟩A = |x⟩A and V |ϕx ⟩B = |x⟩B , where {|x⟩A } and
{|x⟩B } are the standard bases of A and B, respectively. Hence,
U ⊗ V |ψ^{AB}⟩ = Σ_{x∈[d]} √p_x |xx⟩^{AB} .   (12.16)
Note also that by applying additional local permutations (which are unitaries) to the state above we can rearrange the order of the Schmidt coefficients. Therefore, there exist unitary matrices U′ ∈ L(A) and V′ ∈ L(B) such that

|ψ̃^{AB}⟩ := U′ ⊗ V′|ψ^{AB}⟩ = Σ_{x∈[d]} √p_x |xx⟩^{AB}   with   p_1 ⩾ p_2 ⩾ · · · ⩾ p_d .   (12.17)
The above form is called the standard form of |ψ AB ⟩. Note that |ψ AB ⟩ can be converted by
LOCC to another state |ϕAB ⟩ if and only if the standard form of |ψ AB ⟩ can be converted by
LOCC to the standard form of |ϕAB ⟩ (see Fig. 12.2). Therefore, without loss of generality
we will assume here that both |ψ AB ⟩ and |ϕAB ⟩ are given in their standard form.
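For readers who wish to experiment numerically, the standard form can be read off from the singular value decomposition of the coefficient matrix of |ψ^{AB}⟩. The following Python/NumPy sketch (the function name and example state are ours, chosen for illustration only) returns the ordered Schmidt probability vector:

import numpy as np

def schmidt_probabilities(psi, dA, dB):
    # Reshape the state vector into its dA x dB coefficient matrix and take an SVD;
    # the squared singular values are the Schmidt coefficients p_1 >= p_2 >= ...
    s = np.linalg.svd(psi.reshape(dA, dB), compute_uv=False)
    p = s**2
    return np.sort(p)[::-1] / p.sum()

# Example: a 3x3 state with Schmidt vector (0.5, 0.3, 0.2)
psi = np.zeros(9)
psi[[0, 4, 8]] = np.sqrt([0.5, 0.3, 0.2])
print(schmidt_probabilities(psi, 3, 3))     # [0.5 0.3 0.2]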
Figure 12.2: LOCC maps between two pure bipartite states and their standard forms.
The next theorem provides a connection between LOCC conversions and majorization.
For any two density matrices ρ, σ ∈ D(A), we will say that ρ majorizes σ, and write ρ ≻ σ,
if the probability vectors p and q, consisting, respectively, of the eigenvalues of ρ and σ,
satisfy p ≻ q.
Proof. From the argument above we can assume without loss of generality that both ψ AB
and ϕAB are given in their standard forms. Furthermore, Exercise 12.2.4 implies that we
can assume without loss of generality that supp(σ A ) ⊆ supp(ρA ). This in turn implies that
we can assume without loss of generality that ρA > 0 since otherwise we embed both |ψ AB ⟩
and |ϕAB ⟩ in supp(ρA ) ⊗ B.
From Theorem 12.2.1, any LOCC map that takes a pure bipartite state |ψ^{AB}⟩ to another pure bipartite state |ϕ^{AB}⟩ can be simulated by 1-way LOCC (i.e. LOCC_1) of the following
form: Alice performs a single generalized measurement {Mz }z∈[m] on her system, sends the
measurement outcome z to Bob, who then performs the unitary Vz . Therefore, after outcome
z occurred, the post-measurement state is given by
(1/√t_z) (M_z ⊗ V_z)|ψ^{AB}⟩ ,   (12.19)
where tz := ⟨ψ AB |Mz∗ Mz ⊗ I B |ψ AB ⟩ is the probability that Alice’s measurement outcome
is z. Hence, if |ψ AB ⟩ can be converted by LOCC to |ϕAB ⟩ with 100% success rate, then
there must exist a generalized measurement {Mz }z∈[m] and a collection of unitary matrices
{Vz }z∈[m] such that
(1/√t_z) (M_z ⊗ V_z)|ψ^{AB}⟩ = |ϕ^{AB}⟩   ∀ z ∈ [m] .   (12.20)
Since we assume without loss of generality that both |ψ AB ⟩ and |ϕAB ⟩ are given in their
standard form, we have
|ψ^{AB}⟩ = Σ_{x∈[d]} √p_x |xx⟩^{AB} = (√ρ ⊗ I^B)|Ω^{AB}⟩
|ϕ^{AB}⟩ = Σ_{x∈[d]} √q_x |xx⟩^{AB} = (√σ ⊗ I^B)|Ω^{AB}⟩   (12.21)

where ρ and σ are, respectively, the reduced density matrices of |ψ^{AB}⟩ and |ϕ^{AB}⟩. Explicitly, ρ = Σ_{x∈[d]} p_x |x⟩⟨x|^A and σ = Σ_{x∈[d]} q_x |x⟩⟨x|^A. Substituting this into (12.20) gives

(1/√t_z) (M_z √ρ ⊗ V_z)|Ω^{AB}⟩ = (√σ ⊗ I^B)|Ω^{AB}⟩ .   (12.22)
From Exercise 2.3.26 it follows that the above equation holds if and only if

(1/√t_z) M_z √ρ V_z^T = √σ .   (12.23)

Note that the matrix U_z := (V_z^{−1})^T is unitary. With this notation, the above equation is equivalent to

M_z = √t_z √σ U_z ρ^{−1/2} .   (12.24)
The only constraint on M_z is that Σ_{z∈[m]} M_z^* M_z = I^A. We therefore conclude that |ψ^{AB}⟩ can be converted to |ϕ^{AB}⟩ by LOCC if and only if there exists m ∈ N, unitary matrices {U_z}_{z∈[m]}, and probabilities {t_z}_{z∈[m]} such that

Σ_{z∈[m]} t_z ρ^{−1/2} U_z^* σ U_z ρ^{−1/2} = I^A ,   (12.25)

or equivalently,

ρ = Σ_{z∈[m]} t_z U_z^* σ U_z .   (12.26)
In other words, |ψ AB ⟩ can be converted to |ϕAB ⟩ by LOCC if and only if there exists a
mixture of unitaries that transforms the reduced density matrix of |ϕAB ⟩ to the reduced
density matrix of |ψ AB ⟩. Observe that such a random unitary channel is a unital channel. In
Section 3.5.9 we showed that if ρ = E(σ), with E being a unital channel, then there exists a
doubly stochastic matrix D such that p = Dq (recall that p and q are the probability vectors
whose components consist of the eigenvalues of ρ and σ, respectively). From Theorem 4.1.1
it then follows that q ≻ p or equivalently, σ ≻ ρ. We therefore conclude that if ψ AB can be
converted to ϕAB by LOCC then we must have σ ≻ ρ.
Conversely, if σ ≻ ρ then from Theorem 4.1.1 we have p = Dq for some doubly stochastic matrix D. From the Birkhoff/von-Neumann theorem (see Theorem A.5.1) every doubly stochastic matrix can be written as a convex combination of permutation matrices. Therefore, there exist m ∈ N, permutation matrices {Π_z}_{z∈[m]}, and probabilities {t_z}_{z∈[m]}, such that p = Σ_z t_z Π_z q. In Exercise 12.2.5 you will show that this relation can be expressed as

ρ = Σ_{z∈[m]} t_z Π_z σ Π_z^T .   (12.27)
Exercise 12.2.5. Show that if ρ is a diagonal matrix, and p is the vector consisting of its diagonal elements, then Π_z ρ Π_z^T is also a diagonal matrix, with the diagonal elements given by the components of Π_z p. Use this to show that ρ = Σ_{z∈[m]} t_z Π_z σ Π_z^T if and only if p = Σ_{z∈[m]} t_z Π_z q.
Exercise 12.2.6. Show that the maximally entangled state |Φ^{AB}⟩ := (1/√d) Σ_{x∈[d]} |xx⟩ can be converted by LOCC to any other state in Pure(AB). Moreover, show that any state |ψ⟩ ∈ AB can be converted by LOCC to any product state of the form |ϕ⟩|χ⟩ ∈ AB.
Exercise 12.2.7. For any bipartite state ψ ∈ Pure(AB), and for any k = 1, . . . , d, define
E_(k)(ψ^{AB}) := 1 − Σ_{x∈[k]} p_x^↓ = 1 − ∥p∥(k) ,   (12.28)

where p is the Schmidt vector of |ψ^{AB}⟩, and ∥p∥(k) is the Ky Fan norm of p (cf. Definition 2.3.2). Show that Nielsen Majorization Theorem can be expressed as

ψ^{AB} −LOCC→ ϕ^{AB}   ⟺   E_(k)(ψ^{AB}) ⩾ E_(k)(ϕ^{AB})   ∀ k ∈ [d] .   (12.29)
1. Prove the following theorem, assuming that Alice and Bob share the state ψ AB (but no
other entangled systems). Theorem. Faithful teleportation of a d-dimensional qudit
is possible if, and only if,

p_max ⩽ 1/d ,   (12.30)

where p_max is the largest Schmidt coefficient of ψ^{AB}. That is, teleportation is possible
if, and only if, none of the Schmidt coefficients are greater than 1/d. This also implies
that the Schmidt rank m is greater than or equal to d. Hint: Use Nielsen majorization
theorem.
2. Find a protocol for faithful teleportation of a qubit from Alice’s lab to Bob’s lab as-
suming Alice and Bob share the partially entangled state
1 1 1
|ψ AB ⟩ = √ |0⟩A |0⟩B + |1⟩A |1⟩B + |2⟩A |2⟩B . (12.31)
2 2 2
In particular, determine the projective measurement performed by Alice and the unitary
operators performed by Bob. What is the optimal classical communication cost? That
is, how many classical bits Alice has to send to Bob?
Exercise 12.2.9. Consider m bipartite states {ψzAB }z∈[m] in Pure(AB). Find an optimal
state ϕAB ∈ Pure(AB) such that:
1. The state ϕAB can be converted by LOCC to ψzAB , for all z ∈ [m].
2. If another state, χ ∈ Pure(AB), can be converted by LOCC to ψzAB , for all z ∈ [m],
then χAB can also be converted to ϕAB . That is, ϕAB is optimal.
to temporarily use an entangled state during their LOCC protocols. The condition here is
that they must return the entangled systems in their original state at the end of the protocols.
At first glance, it might seem that borrowing an entangled system wouldn’t provide any
advantage for tasks that cannot be accomplished with standard LOCC. However, as we will
now demonstrate, Nielsen’s majorization theorem reveals that this entanglement-assisted
LOCC (eLOCC) actually represents a significantly broader set of operations compared to
LOCC alone.
We start with the following example. Consider the two entangled states
|ψ^{AB}⟩ = √(2/5)|00⟩ + √(2/5)|11⟩ + √(1/10)|22⟩ + √(1/10)|33⟩
|ϕ^{AB}⟩ = √(1/2)|00⟩ + √(1/4)|11⟩ + √(1/4)|22⟩ .   (12.32)
The Schmidt probability vectors associated with the two states above are given by
p := (2/5, 2/5, 1/10, 1/10)^T   and   q := (1/2, 1/4, 1/4, 0)^T ,   (12.33)
respectively. In Exercise 4.4.1, you confirmed that neither p majorizes q nor q majorizes p,
symbolized as p ̸≻ q and q ̸≻ p. Consequently, Nielsen’s majorization theorem implies that
neither ψ AB can be converted to ϕAB , nor can ϕAB be converted to ψ AB using LOCC. Now,
consider the state
|χ^{A′B′}⟩ = √(3/5)|00⟩ + √(2/5)|11⟩ ,   (12.34)
and let its Schmidt vector be denoted by r := (3/5, 2/5)T . Interestingly, it is easy to verify
that
q⊗r≻p⊗r. (12.35)
Therefore, according to Nielsen's theorem, the transformation

|ψ^{AB}⟩ ⊗ |χ^{A′B′}⟩ −LOCC→ |ϕ^{AB}⟩ ⊗ |χ^{A′B′}⟩   (12.36)

is achievable with a 100% success rate. In this context, the state χ^{A′B′} functions as a catalyst for the conversion of ψ^{AB} into ϕ^{AB}, and thus is referred to as an entanglement catalyst.
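The catalysis phenomenon in this example is easy to verify numerically. The following sketch (function name and tolerance are ours) checks that p and q are incomparable while q ⊗ r majorizes p ⊗ r:

import numpy as np

def majorizes(q, p, tol=1e-12):
    # True if q majorizes p (both vectors are padded with zeros to a common length).
    d = max(len(p), len(q))
    ps = np.sort(np.pad(p, (0, d - len(p))))[::-1].cumsum()
    qs = np.sort(np.pad(q, (0, d - len(q))))[::-1].cumsum()
    return bool(np.all(qs >= ps - tol))

p = np.array([2/5, 2/5, 1/10, 1/10])   # Schmidt vector of psi, cf. (12.33)
q = np.array([1/2, 1/4, 1/4, 0])       # Schmidt vector of phi
r = np.array([3/5, 2/5])               # Schmidt vector of the catalyst chi

print(majorizes(q, p), majorizes(p, q))          # False False: p and q are incomparable
print(majorizes(np.kron(q, r), np.kron(p, r)))   # True: the catalyst enables the conversion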
Exercise 12.2.10. Show that there is no entanglement catalyst if both ψ AB and ϕAB have
Schmidt rank 3.
Exercise 12.2.11. Show that the maximally entangled state cannot act as a catalyst in any
eLOCC conversions that are not possible by LOCC.
Entanglement catalysis motivates the definition of a new partial order between probability
vectors that we studied in Sec. 4.4 and is called the trumping relation. Recall that for any
p, q ∈ Prob(n) we say that q trumps p and write
q ≻∗ p , (12.37)
Exercise 12.2.12. Show that any Schur concave function f : Prob(n) → R that is additive
under tensor product, behaves monotonically under the trumping relation. That is, for any
p, q ∈ Prob(n) we have
q ≻∗ p ⇒ f (p) ⩾ f (q) . (12.38)
A well known family of functions that behaves monotonically under the trumping relation
are the Rényi entropies. The set of functions
f_α(p) := (sign(α)/(1−α)) log Σ_{x∈[m]} p_x^α   if 0 ≠ α ∈ [−∞, ∞] ,
f_α(p) := −log(p_1 · · · p_m)   if α = 0 .   (12.39)
satisfies the monotonicity under the trumping relation and additivity. For α ⩾ 0 (i.e. f_α = H_α is the Rényi entropy) these functions are entropy functions as they also satisfy the normalization condition that they are zero for p = (1, 0, . . . , 0). For α ⩽ 0 they are defined to be −∞ if p ̸> 0. Such functions are not entropy functions since they do not satisfy the normalization condition; however, they are useful in the characterization of the trumping relation. Note that any convex combination of the functions above is also additive and monotonic under the trumping relation.
Exercise 12.2.13. Show that the condition (4.182) of Theorem 4.4.1 is equivalent to the
condition that
fα (p) > fα (q) ∀ α ∈ [−∞, ∞] , (12.40)
where fα is defined in (12.39)
For each α ∈ [−∞, ∞] and ψ ∈ Pure(AB) define
Eα ψ AB := fα (p)
(12.41)
where p is the Schmidt probability vector of ψ AB , and fα is defined in (12.39). We will see
later on that the functions {Eα }α are measures of entanglement on pure states. From Theo-
rem 4.4.1 it follows that these functions can be used to characterize eLOCC transformations.
be quantified using functions that are monotonic under LOCC. In this section, we focus on
pure state entanglement and consider a measure of entanglement to be a function
E : ∪_{A,B} Pure(AB) → R   (12.43)
where U and V are d × d unitary matrices, {px }x∈[d] are the Schmidt coefficients of ψ AB ,
and {|x⟩A } and {|x⟩B } are fixed bases of A and B. By definition, since the LOCC map
U ⊗ V is reversible (having an LOCC inverse U ∗ ⊗ V ∗ ), we must have for any measure of
entanglement on pure states:
E ψ AB = E ψ̃ AB = f (p) ,
(12.45)
The entropy of entanglement is arguably the most important measure of pure-state entanglement, with several operational interpretations. It is denoted by E and defined for any ψ ∈ Pure(AB) by

E(ψ^{AB}) := H(p) ,   (12.47)
where H is the Shannon entropy and p is the Schmidt probability vector of ψ AB . We will
see in the next sections that the entropy of entanglement equals both the entanglement cost
and the distillable entanglement in the asymptotic regime.
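As a quick illustration, the entropy of entanglement can be computed directly from the Schmidt coefficients; the sketch below (names ours, logarithms in base 2 so the answer is in ebits) evaluates (12.47) for a Bell state and for a product state:

import numpy as np

def entropy_of_entanglement(psi, dA, dB):
    # Shannon entropy (in bits) of the Schmidt probability vector, cf. (12.47).
    s = np.linalg.svd(psi.reshape(dA, dB), compute_uv=False)
    p = s**2
    p = p[p > 1e-15]
    return float(-(p * np.log2(p)).sum())

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)      # maximally entangled two-qubit state
prod = np.array([1.0, 0, 0, 0])                 # product state |00>
print(entropy_of_entanglement(bell, 2, 2))      # 1.0 (one ebit)
print(entropy_of_entanglement(prod, 2, 2))      # 0.0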
C(ψ^{AB}) = ⟨ψ̄^{AB}| σ_y ⊗ σ_y |ψ^{AB}⟩ ,   (12.51)

where ψ̄^{AB} is defined such that if |ψ^{AB}⟩ = Σ_{x,y∈{0,1}} c_{xy}|x⟩|y⟩ then |ψ̄^{AB}⟩ = Σ_{x,y∈{0,1}} c̄_{xy}|x⟩|y⟩.
where Z is a ‘flag’ system registering the value z. In this view, the LOCC protocol converted
the state ψ AB to the cq-state σ ZAB . The question we study here is to which cq-states the
pure state ψ^{AB} can be transformed into by LOCC. Since the output state σ^{ZAB} is not a pure state, we cannot apply the Nielsen majorization theorem directly. Yet, as we show now, Nielsen's theorem is instrumental in answering this question.
Proof. Let p be the Schmidt probability vector associated with ψ AB . For each z ∈ [n], define
q_z := (q_{1|z}, . . . , q_{d|z})^T as the Schmidt probability vector associated with ϕ_z^{AB}. We can assume without loss of generality that p = p^↓ and q_z = q_z^↓, since the order of the Schmidt vectors can always be rearranged by applying local unitary (permutation) maps to |ψ^{AB}⟩ and |ϕ_z^{AB}⟩. Also, denote by

q := Σ_{z∈[n]} t_z q_z   and by   |ϕ^{AB}⟩ := Σ_{x∈[d]} √q_x |xx⟩^{AB} ,   (12.54)

the bipartite state whose Schmidt vector is q. Since q_z = q_z^↓ for all z ∈ [n], it follows that q = q^↓ as well. The components of q are thus given by

q_x = Σ_{z∈[n]} t_z q_{x|z}   ∀ x ∈ [d] .   (12.55)
Consequently, for each k ∈ [d], the entanglement measure E_(k) for ϕ^{AB} is given by

E_(k)(ϕ^{AB}) = Σ_{x=k+1}^{d} q_x
(12.55)→ = Σ_{z∈[n]} Σ_{x=k+1}^{d} t_z q_{x|z}   (12.56)
 = Σ_{z∈[n]} t_z E_(k)(ϕ_z^{AB}) .
Hence, from Nielsen majorization theorem as expressed in Exercise 12.2.7, it follows that (12.53)
holds if and only if ψ AB can be converted to ϕAB by LOCC. Therefore, to complete the proof
we now show that ϕ^{AB} can be converted by LOCC to the ensemble {ϕ_z^{AB}, t_z}_{z∈[n]}. The conversion is achieved by the following single measurement performed by Alice

M_z := Σ_{x∈[d]} √( t_z q_{x|z} / q_x ) |x⟩⟨x|^A   (12.57)
Remark. Note that for k = 1, E1 (ψ AB ) = 1 for all ψ ∈ Pure(AB). Therefore, the expression
on the right-hand side of the equation above can never exceed one. Note further that the
corollary above is a simplification of the formal result given in Corollary 11.1.1. That is, for
pure bipartite states it is sufficient to check the ratios of only d resource measures in order
to compute the maximum probability of conversion.
Proof. Consider an optimal LOCC protocol that converts ψ^{AB} to ϕ^{AB} with the maximum possible probability. Such an LOCC protocol yields ϕ^{AB} with probability p := Pr(ψ^{AB} −LOCC→ ϕ^{AB})
and other states with probability 1 − p. All such other states can always be converted
deterministically (by LOCC) to the product state |0⟩⟨0|A ⊗ |0⟩⟨0|B . Therefore, without
loss of generality we can assume that ψ AB is converted to ϕAB with probability p, and to
|0⟩⟨0|A ⊗ |0⟩⟨0|B with probability 1 − p. Since Ek (|0⟩⟨0|A ⊗ |0⟩⟨0|B ) = 0 for all k = 2, . . . , d,
Theorem 12.4.1 implies that such an LOCC protocol is possible if and only if
E(k) (ψ AB ) ⩾ pE(k) (ϕAB ) ∀ k ∈ {2, . . . , d} . (12.61)
The proof is concluded by recognizing that the equation above is equivalent to

p ⩽ min_{k∈{2,...,d}} E_(k)(ψ^{AB}) / E_(k)(ϕ^{AB}) .   (12.62)
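A numerical sketch of the optimal conversion probability may also clarify the formula. Below we use the convention that the k-th monotone is the sum of the Schmidt coefficients from position k onward (so that the k = 1 ratio equals one, matching the remark above); the function name and example vectors are ours:

import numpy as np

def max_conversion_probability(p, q):
    # Minimum over k of the ratio of the tails sum_{x>=k} p_x / sum_{x>=k} q_x,
    # with both Schmidt vectors sorted in non-increasing order.
    d = max(len(p), len(q))
    p = np.sort(np.pad(np.asarray(p, float), (0, d - len(p))))[::-1]
    q = np.sort(np.pad(np.asarray(q, float), (0, d - len(q))))[::-1]
    ratios = [p[k:].sum() / q[k:].sum() for k in range(d) if q[k:].sum() > 1e-15]
    return min(1.0, min(ratios))

p = np.array([0.8, 0.2])   # Schmidt vector of psi
q = np.array([0.5, 0.5])   # Schmidt vector of the (more entangled) target phi
print(max_conversion_probability(p, q))   # 0.4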
where the maximum is over all pure-state decompositions of σ (i.e. over all pure-state ensembles {p_x, ϕ_x}_{x∈[m]} that satisfy σ = Σ_{x∈[m]} p_x ϕ_x).

Proof. Suppose the condition in (12.63) holds. Then, there exists an ensemble of states {p_x, ϕ_x^{AB}}_{x∈[m]} such that E_(k)(ψ^{AB}) ⩾ Σ_{x∈[m]} p_x E_(k)(ϕ_x^{AB}) for all k ∈ [d]. From Theorem 12.4.1 it follows that ψ^{AB} can be converted by LOCC to the cq-state

σ^{XAB} = Σ_{x∈[m]} p_x |x⟩⟨x|^X ⊗ ϕ_x^{AB} .   (12.64)
Since tracing out the classical system X is an LOCC operation we conclude that ψ^{AB} −LOCC→ σ^{AB}.
Conversely, suppose ψ^{AB} −LOCC→ σ^{AB}. From Theorem 12.2.1 it follows that there exists a generalized measurement on Alice's system {M_x}_{x∈[m]} and a set of unitary matrices on Bob's system {U_x}_{x∈[m]} such that

σ^{AB} = Σ_{x∈[m]} (M_x ⊗ U_x) ψ^{AB} (M_x ⊗ U_x)^* .   (12.65)
Denote by |ϕ_x^{AB}⟩ := (1/√p_x)(M_x ⊗ U_x)|ψ^{AB}⟩, where p_x := ⟨ψ^{AB}|M_x^*M_x ⊗ I^B|ψ^{AB}⟩. Then, from the equation above we get that the ensemble {p_x, ϕ_x^{AB}}_{x∈[m]} forms a pure-state decomposition of σ^{AB}. Moreover, by definition, ψ^{AB} can be converted by LOCC to ϕ_x^{AB} with probability p_x. Therefore, from Theorem 12.4.1 it follows that

min_{k∈[d]} { E_(k)(ψ^{AB}) − Σ_{x∈[m]} p_x E_(k)(ϕ_x^{AB}) } ⩾ 0 .   (12.66)
It's important to note that the value of T(ψ^{AB} −LOCC→ ϕ^{AB}) is not greater than that of P⋆(ψ^{AB} −LOCC→ ϕ^{AB}). This is because restricting σ^{AB} to be a pure state in the calculation of T(ψ^{AB} −LOCC→ ϕ^{AB}) can only increase the minimum value obtained in the optimization process. Furthermore, as we will explore in the next chapter (specifically, in Lemma 13.3.1), it will be shown that T(ψ^{AB} −LOCC→ ϕ^{AB}) can actually be strictly smaller than P⋆(ψ^{AB} −LOCC→ ϕ^{AB}). However, we will soon discover that the value of P⋆ remains unchanged even when the optimization is extended from Pure(AB) to the full set of density matrices D(AB).
The optimization problem in (12.68) can be simplified as follows. Initially, observe that if p, r ∈ Prob(d) are the Schmidt vectors of ψ^{AB} and φ^{AB}, respectively, then according to Nielsen's theorem, the condition ψ^{AB} −LOCC→ φ^{AB} is equivalent to r ≻ p. Thus, for any given Schmidt vector r of φ^{AB}, we first perform the optimization over all states φ ∈ Pure(AB) with the same Schmidt vector r. Denoting by |φ̃^{AB}⟩ = Σ_{x∈[d]} √r_x |xx⟩, this is equivalent to optimization over the local unitaries U ⊗ V, such that |φ^{AB}⟩ = U ⊗ V |φ̃^{AB}⟩. Due to the relationship between trace distance and fidelity, we have

min_{U,V∈U(d)} P(ϕ^{AB}, φ^{AB}) = √( 1 − max_{U,V∈U(d)} |⟨ϕ^{AB}|φ^{AB}⟩|² )   (12.69)
Denoting by |ϕAB ⟩ = N ⊗ I B |ΩAB ⟩ and by D the diagonal matrix with diagonal r, we obtain
where q ∈ Prob(d) is the Schmidt vector of ϕ^{AB}. Taking everything into consideration we obtain the following simplification:

P⋆(ψ^{AB} −LOCC→ ϕ^{AB}) = min_{r∈Prob(d)} { P(q, r) : r ≻ p } ,   (12.71)

where p is the Schmidt vector of ψ^{AB}, q the Schmidt vector of ϕ^{AB}, and P(q, r) := √(1 − F²(q, r)) is the purified distance between probability vectors.
Observe that we added the subscript ⋆ to P⋆ since the conversion distance between two pure states ψ, ϕ ∈ Pure(AB) as measured by the purified distance is defined as:

P(ψ^{AB} −LOCC→ ϕ^{AB}) := min_{σ∈D(AB)} { P(ϕ^{AB}, σ^{AB}) : ψ^{AB} −LOCC→ σ^{AB} } .   (12.72)

At first glance, this conversion distance may seem to be different than P⋆; however, the following theorem demonstrates that the two are equal.
follows trivially since restricting σ^{AB} in (12.72) to be a pure state φ^{AB} can only increase the quantity. To prove the opposite inequality let σ^{AB} be an optimizer of (12.72). Since ψ^{AB} −LOCC→ σ^{AB} it follows from Corollary 12.4.2 and its proof that there exists an ensemble {t_z, φ_z^{AB}}_{z∈[k]} such that σ^{AB} = Σ_{z∈[k]} t_z φ_z^{AB} and ψ^{AB} −LOCC→ {t_z, φ_z^{AB}}_{z∈[k]}. For each z ∈ [k] let r_z be the Schmidt vector of φ_z^{AB} and define

r := Σ_{z∈[k]} t_z r_z^↓ .   (12.75)
Let φ^{AB} be a pure state with Schmidt vector r, so that ψ^{AB} −LOCC→ φ^{AB}. Observe that the squared fidelity is given by

F²(ϕ^{AB}, σ^{AB}) = ⟨ϕ^{AB}|σ^{AB}|ϕ^{AB}⟩ = Σ_{z∈[k]} t_z |⟨ϕ^{AB}|φ_z^{AB}⟩|²
cf. (12.70)→ ⩽ Σ_{z∈[k]} t_z F²(q^↓, r_z^↓)   (12.76)
(5.188)→ ⩽ F²(q^↓, r^↓) .
Hence,

P(ψ^{AB} −LOCC→ ϕ^{AB}) = √(1 − F(ϕ^{AB}, σ^{AB})²)
(12.76)→ ⩾ √(1 − F(q^↓, r^↓)²)   (12.77)
 ⩾ P⋆(ψ^{AB} −LOCC→ ϕ^{AB}) .

Comparing the above inequality with (12.74) we get the equality in (12.73).
Remark. Observe that the corollary above provides an operational meaning to the entanglement monotones E_(m). That is, E_(m)(ψ^{AB}) measures how close (in terms of the square of the purified distance) Φ_m can get to ψ^{AB} by LOCC. Furthermore, it's noteworthy that when m ⩾ |A|, the conversion distance P(Φ_m −LOCC→ ψ^{AB}) equals zero. This outcome arises because, in this scenario, Nielsen's majorization theorem guarantees that the conversion Φ_m −LOCC→ ψ^{AB} can be accomplished exactly.
Proof. Let p ∈ Prob↓(n), with n := |A|, be the Schmidt vector corresponding to ψ^{AB}. From Theorem 12.5.1 it follows that

P(Φ_m −LOCC→ ψ^{AB}) = min_{r∈Prob↓(n)} { P(p, r) : r ≻ u^{(m)} } ,   (12.79)
where u^{(m)} is the uniform probability vector in Prob(m). Now, observe that the condition
r ≻ u(m) holds if and only if r has at most m non-zero components. Denoting by |r| the
number of non-zero components in r, and using the fact that the square of the purified
distance equals one minus the square of the fidelity, we get
P²(Φ_m −LOCC→ ψ^{AB}) = 1 − max_{r∈Prob(n), |r|=m} ( Σ_{x∈[n]} √(r_x p_x) )²
 = 1 − max_{r∈Prob(m)} ( Σ_{x∈[m]} √(r_x p_x) )²   (12.80)
Exercise 12.5.1→ = 1 − Σ_{x∈[m]} p_x
 = E_(m)(ψ^{AB}) .
Exercise 12.5.1. Let {s_x}_{x∈[n]} be a set of non-negative real numbers. Show that

max_{r∈Prob(n)} Σ_{x∈[n]} √(r_x) s_x = ∥s∥_2 := √( Σ_{x∈[n]} s_x² ) .   (12.81)
Considering the established equivalence between the two conversion distances, it makes sense
to primarily use T⋆ for further analysis, owing to its following closed-form expression.
Closed Formula
Theorem 12.5.2. Let ψ, ϕ ∈ Pure(AB) be two bipartite states with d := |A| = |B|,
and let p, q ∈ Prob(d) be the corresponding Schmidt probability vectors of ψ AB and
ϕAB , respectively. Then,
T⋆(ψ^{AB} −LOCC→ ϕ^{AB}) = max_{k∈[d]} { ∥p∥(k) − ∥q∥(k) } .   (12.87)

Proof. The proof follows directly from Theorem 4.2.2. To see this, observe that

T⋆(ψ^{AB} −LOCC→ ϕ^{AB}) := min_{r∈Majo(p)} (1/2)∥q − r∥_1
 = T(q, Majo(p))   (12.88)
Theorem 4.2.2→ = max_{k∈[d]} { ∥p∥(k) − ∥q∥(k) } .
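The closed formula (12.87) is straightforward to evaluate; the following sketch (names ours) computes T⋆ from the two Schmidt vectors via their cumulative sums:

import numpy as np

def T_star(p, q):
    # Closed formula (12.87): max_k ( ||p||_(k) - ||q||_(k) ).
    d = max(len(p), len(q))
    cp = np.sort(np.pad(np.asarray(p, float), (0, d - len(p))))[::-1].cumsum()
    cq = np.sort(np.pad(np.asarray(q, float), (0, d - len(q))))[::-1].cumsum()
    return float(np.max(cp - cq))   # the k = d term is zero, so the result is >= 0

p = np.array([0.7, 0.3])   # Schmidt vector of psi
q = np.array([0.5, 0.5])   # Schmidt vector of phi (a Bell state)
print(T_star(p, q))        # 0.2: psi cannot reach the Bell state exactly
print(T_star(q, p))        # 0.0: the Bell state converts to psi exactly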
From Nielsen majorization theorem, ψ^{AB} −LOCC→ Φ_m if and only if 1/m ⩾ p_1, where p_1 is the first component of p = p^↓. Hence,

Distill^{ε=0}(ψ^{AB}) = max_{m∈N} { log m : m ⩽ 1/p_1 }
m is an integer→ = log⌊1/p_1⌋ .   (12.90)
The quantity 1/p_1 can be expressed in terms of the min-entropy of p. Specifically,

Distill^{ε=0}(ψ^{AB}) = log⌊2^{H_min(A)_ρ}⌋ ,   (12.91)

where H_min(A)_ρ is the min-entropy (see (6.22)) of the reduced density matrix ρ^A := Tr_B ψ^{AB}.
To extend the formula above to the case that ε > 0, we use the computable conversion distance to calculate the single-shot distillable entanglement. Specifically, for any ε ∈ (0, 1) and ψ ∈ Pure(AB), we define the ε-single-shot distillable entanglement as:

Distill^ε(ψ^{AB}) := max_{m∈N} { log m : T⋆(ψ^{AB} −LOCC→ Φ_m) ⩽ ε } .   (12.92)
Theorem 12.5.3. Using the same notations as above, for every ε ∈ [0, 1) and
ψ ∈ Pure(AB) we have
Distill^ε(ψ^{AB}) = log⌊2^{H^ε_min(A)_ρ}⌋ ,   (12.93)

where H^ε_min(A)_ρ is the smoothed min-entropy as given in (10.147).
Remark. In (10.147) we found a closed form for the smoothed min-entropy. Using this form we can express the ε-single-shot distillable entanglement of ψ^{AB} as:

Distill^ε(ψ^{AB}) = min_{k∈{ℓ,...,d}} log⌊ k/(∥p∥(k) − ε) ⌋ ,   (12.94)

where ℓ ∈ [d] is the integer satisfying ∥p∥(ℓ−1) ⩽ ε < ∥p∥(ℓ).
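The closed form (12.94) can be evaluated directly from the Schmidt vector. The sketch below (names ours; logarithms in base 2; we include the floor implied by the integrality of m) is one possible implementation:

import numpy as np

def distill_eps(p, eps):
    # Evaluate (12.94): the smallest value of log2 floor( k / (||p||_(k) - eps) )
    # over k = ell, ..., d, where ell is the first index with ||p||_(ell) > eps.
    p = np.sort(np.asarray(p, float))[::-1]
    cum = p.cumsum()                               # cum[k-1] = ||p||_(k)
    ell = int(np.argmax(cum > eps)) + 1
    m = min(int(np.floor(k / (cum[k - 1] - eps))) for k in range(ell, len(p) + 1))
    return np.log2(m)

p = np.array([0.4, 0.3, 0.2, 0.1])
print(distill_eps(p, 0.0))    # 1.0 = log2 floor(1/p_1)
print(distill_eps(p, 0.1))    # ~1.585: smoothing buys extra distillable entanglement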
Proof. From Theorem 12.5.2 and Exercise 12.5.2 we have

T⋆(ψ^{AB} −LOCC→ Φ_m) = max_{k∈[d]} { ∥p∥(k) − ∥u^{(m)}∥(k) }
 = max_{k∈[d]} { ∥p∥(k) − k/m } .   (12.95)
Combining this with the definition in (12.92) we obtain

Distill^ε(ψ^{AB}) = max_{m∈N} { log m : ∥p∥(k) − k/m ⩽ ε ∀ k ∈ [d] }
 = max_{m∈N} { log m : ∥p∥(k) − k/m ⩽ ε ∀ k ∈ {ℓ, . . . , d} } ,   (12.96)

since from the definition of ℓ, if k < ℓ then the inequality ∥p∥(k) − k/m ⩽ ε holds trivially. Finally, observe that for each k ∈ {ℓ, . . . , d}, the condition ∥p∥(k) − k/m ⩽ ε can be expressed as m ⩽ k/(∥p∥(k) − ε), and since m is an integer, this condition is equivalent to m ⩽ a_k, where

a_k := ⌊ k/(∥p∥(k) − ε) ⌋ .   (12.97)
Exercise 12.5.3. Use the formula in (12.94) to compute Distillε (ψ AB ) for the two extreme
cases: (1) ψ AB is a maximally entangled state; (2) ψ AB is a product state. Give a physical
interpretation to the results.
lim_{ε→1−} Distill^ε(ψ^{AB}) = ∞ .   (12.99)
Proof. From Theorem 12.5.2 and Exercise 12.5.2 we have for any m ∈ N

T⋆(Φ_m −LOCC→ ψ^{AB}) = max_{k∈[m]} Σ_{x∈[k]} ( 1/m − p_x ) = max_{k∈[m]} { k/m − ∥p∥(k) } .   (12.102)
We therefore have

Cost^ε(ψ^{AB}) = min_{m∈N} { log m : k/m − ∥p∥(k) ⩽ ε ∀ k ∈ [m] }
 = min_{m∈N} { log m : m ⩾ k/(∥p∥(k) + ε) ∀ k ∈ [m] }
Exercise 12.5.5→ = min_{m∈N} { log m : m ⩾ m/(∥p∥(m) + ε) }   (12.103)
 = min_{m∈N} { log m : ∥p∥(m) ⩾ 1 − ε } .
Corollary 12.5.2. Let ε ∈ [0, 1), ψ ∈ Pure(AB), and denote by ρ^A := Tr_B ψ^{AB} its reduced density matrix. Then, the ε-single-shot entanglement cost of ψ^{AB} is given by

Cost^ε(ψ^{AB}) = H^ε_max(A)_ρ .   (12.109)

Proof. The proof follows trivially from a combination of the theorem above and the expression for H^ε_max as given in Lemma 10.4.2.
where the normalization factor H_n := Σ_{x∈[n]} 1/x is known as the harmonic number. We will demonstrate that |χ_n⟩ can serve as a catalyst for the generation of any arbitrary bipartite state |ψ⟩ ∈ C^m ⊗ C^m, with |χ_n⟩ undergoing minimal change. More precisely, for any ε > 0, there exists an n ∈ N such that

T⋆(χ_n −LOCC→ ψ ⊗ χ_n) ⩽ ε .   (12.111)
This remarkable result implies that it’s feasible to ‘embezzle’ a copy of |ψ⟩ from the catalyst
|χn ⟩, effectively borrowing some of its entanglement while leaving it largely unchanged.
To see how it works, recall from Lemma 11.1.1 that

T⋆(χ_n −LOCC→ ψ ⊗ χ_n) ⩽ T⋆(χ_n −LOCC→ Φ_m ⊗ χ_n) ,   (12.112)

since the maximally entangled state |Φ_m⟩ = (1/√m) Σ_{x∈[m]} |xx⟩ can be converted by LOCC to
the state ψ. It is therefore sufficient to show that the right-hand side of the equation above
can be made arbitrarily small as we increase the dimension n. Let p be the Schmidt vector of
χn and q be the Schmidt vector of Φm ⊗ χn . Observe that p ∈ Prob(n) and q ∈ Prob(nm).
From Theorem 12.5.2 we know that

T⋆(χ_n −LOCC→ Φ_m ⊗ χ_n) = max_{k∈[n]} { ∥p∥(k) − ∥q∥(k) } .   (12.113)
Now, observe that the components of q have the form p_x/m. Therefore, for any decomposition k = am + b, with a := ⌊k/m⌋ and some b ∈ {0, 1, . . . , m − 1}, we have

∥q∥(k) = ∥p∥(a) + (b/m) p_{a+1} .   (12.114)
Substituting this into (12.113) gives

T⋆(χ_n −LOCC→ Φ_m ⊗ χ_n) = max_{k∈[n], b∈{0,...,m−1}} { ∥p∥(k) − ∥p∥(⌊k/m⌋) − (b/m) p_{⌊k/m⌋+1} }
 = max_{k∈[n]} { ∥p∥(k) − ∥p∥(⌊k/m⌋) } .   (12.115)
Now, from the specific form of χ_n in (12.110) we have ∥p∥(k) = H_k/H_n so that the above equality is equivalent to

T⋆(χ_n −LOCC→ Φ_m ⊗ χ_n) = max_{k∈[n]} ( H_k − H_{⌊k/m⌋} ) / H_n .   (12.116)
Finally, we use the well known bounds on the harmonic number H_n given by

ln(n) + 1/n ⩽ H_n ⩽ ln(n) + 1 .   (12.117)

Using these bounds we estimate

H_k − H_{⌊k/m⌋} ⩽ ln(k) + 1 − ln⌊k/m⌋ − 1/⌊k/m⌋   (12.118)
Exercise 12.5.7→ ⩽ 1 + ln(2m) .
We therefore conclude that

T⋆(χ_n −LOCC→ Φ_m ⊗ χ_n) ⩽ (1 + ln(2m))/H_n −(n→∞)→ 0 ,   (12.119)

since H_n goes to infinity as n goes to infinity.
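The rate of convergence is easy to observe numerically. The sketch below (names ours) computes the conversion distance T⋆(χ_n → Φ_m ⊗ χ_n) directly from the Schmidt vectors and shows the roughly 1/ln(n) decay:

import numpy as np

def T_star(p, q):
    # max_k ( ||p||_(k) - ||q||_(k) ), cf. Theorem 12.5.2.
    d = max(len(p), len(q))
    cp = np.sort(np.pad(p, (0, d - len(p))))[::-1].cumsum()
    cq = np.sort(np.pad(q, (0, d - len(q))))[::-1].cumsum()
    return float(np.max(cp - cq))

def embezzling_error(n, m):
    # Schmidt vector of chi_n has components proportional to 1/x; the Schmidt vector
    # of Phi_m (x) chi_n consists of the same components, each repeated m times and
    # divided by m.
    p = 1.0 / np.arange(1, n + 1)
    p /= p.sum()
    q = np.kron(np.full(m, 1.0 / m), p)
    return T_star(p, q)

for n in (10, 100, 1000, 10000):
    print(n, round(embezzling_error(n, m=2), 4))   # decays roughly like 1/ln(n)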
Exercise 12.5.7. Prove the second inequality of (12.118).
Exercise 12.5.8. Fix α ∈ R, and consider the bipartite entangled state

|φ_n⟩ := (1/√N_n) Σ_{x∈[n]} √(x^α) |xx⟩ ,   (12.120)

where N_n = Σ_{x∈[n]} x^α is the normalization factor. Show that only for α = −1 can the state |φ_n⟩ be used to embezzle entanglement.
In the general case of an arbitrary family of pure bipartite states {χ_n}_{n∈N}, observe that for any integer ℓ ⩽ a := ⌊n/m⌋ + 1,

max_{k∈{m(ℓ−1),...,mℓ−1}} { ∥p∥(k) − ∥p∥(⌊k/m⌋) } = max_{k∈{m(ℓ−1),...,mℓ−1}} { ∥p∥(k) − ∥p∥(ℓ−1) }
 = ∥p∥(mℓ−1) − ∥p∥(ℓ−1) ,   (12.121)

where we used the convention that ∥p∥(k) := 1 for an integer k > n. With this convention, we conclude that

T⋆(χ_n −LOCC→ Φ_m ⊗ χ_n) = max_{ℓ∈[a]} { ∥p∥(mℓ−1) − ∥p∥(ℓ−1) } .   (12.122)
However, observe that the condition above is in general insufficient to determine if the states
{χn }n∈N form an embezzling family, since the maximizer ℓ in (12.122) can depend on n.
where |Φ_2⟩ := (1/√2)(|00⟩ + |11⟩) is the 2 × 2 dimensional maximally entangled state (i.e. a Bell state). In the following subsections we provide closed formulas for these measures of
entanglement and discuss their relations to the single-shot quantities Costε and Distillε that
we studied in the previous section.
In the following theorem we compute the entanglement cost and prove a stronger version of
the above relation.
Remark. Observe that from the theorem above it follows that there is no need to take the limit ε → 0+ in (12.127); that is, by taking the limit n → ∞ the dependence on ε is eliminated (as long as ε ∈ (0, 1)).
Proof. The proof follows directly from a combination of Corollary 12.5.2 and the variant of
the AEP property given in (10.171). Specifically, denoting by ρA := TrB ψ AB , we get from
Corollary 12.5.2 that
lim_{n→∞} (1/n) Cost^ε(ψ^{⊗n}) = lim_{n→∞} (1/n) H^ε_max(A^n)_{ρ^{⊗n}}   (12.129)
(10.171)→ = H(A)_ρ .
In the following theorem we compute the distillable entanglement and prove a stronger
version of the above relation.
Proof. The proof follows directly from a combination of Theorem 12.5.3 and the variant of the AEP property given in (11.61). Specifically, denoting by ρ^A := Tr_B ψ^{AB}, we get from Theorem 12.5.3 that

lim_{n→∞} (1/n) Distill^ε(ψ^{⊗n}) = lim_{n→∞} (1/n) H^ε_min(A^n)_{ρ^{⊗n}} .   (12.132)
For those readers seeking additional insights, an alternative proof utilizing the concept of
typicality is provided in Appendix D.5. This proof offers a different perspective and leverages
the principles of typicality, which may be of interest to readers who are keen on exploring
diverse approaches and methodologies within the field.
Exercise 12.6.2. Use the theorems above to show that for any ψ, ϕ ∈ Pure(AB)
Distill(ψ^{AB} → ϕ^{AB}) = E(ψ)/E(ϕ)   and   Cost(ψ^{AB} → ϕ^{AB}) = E(ϕ)/E(ψ) .   (12.134)
Mixed-State Entanglement
To gain a better understanding of bipartite entanglement theory, we will delve into its most general form in this chapter. The free states of the theory consist of separable states, denoted for any composite system AB by

SEP(AB) := { Σ_{x∈[n]} p_x σ_x^A ⊗ ω_x^B : σ_x ∈ D(A), ω_x ∈ D(B), p ∈ Prob(n), n ∈ N } ,   (13.1)
where we used the notation p := (p1 , . . . , pn )T . Observe that SEP(AB) is a closed convex
set. Any quantum state ρ ∈ D(AB) that does not belong to SEP(AB) is referred to as
an entangled state. This chapter will reveal the intricate structure of entangled states,
highlighting the complexity of mixed-state entanglement theory.
Entanglement Witness
Definition 13.1.1. An operator Γ ∈ Herm(AB) is called an entanglement witness if the following two conditions hold:
From the condition in (13.2), and the fact that SEP(AB) is the convex hull of product states, it follows that if Γ ∈ Herm(AB) is an entanglement witness then for any product state ψ ⊗ ϕ ∈ Pure(AB) we must have

⟨ψ^A ⊗ ϕ^B| Γ^{AB} |ψ^A ⊗ ϕ^B⟩ ⩾ 0 .   (13.4)
On the other hand, the condition (13.3) also implies that there exists a state χ ∈ Pure(AB) such that

⟨χ^{AB}| Γ^{AB} |χ^{AB}⟩ < 0 .   (13.5)
In other words, the condition (13.3) implies that ΓAB is not positive semidefinite so that we
can take χAB , for example, to be an eigenstate corresponding to a negative eigenvalue of
ΓAB .
Based on Theorem 9.4.1, we can conclude that entanglement witnesses are an effective
tool for detecting entanglement. Specifically, ρ ∈ D(AB) is an entangled state if and only if
there exists an entanglement witness Γ ∈ Herm(AB) such that

Tr[Γ^{AB} ρ^{AB}] < 0 .   (13.6)
This characteristic can be employed to demonstrate that the set of separable states occupies
a non-zero volume.
Theorem 13.1.1. The set SEP(AB) has a non-zero volume in D(AB). Specifically,
there exists ε > 0 such that Bε (uAB ) ⊂ SEP(AB), where Bε (uAB ) is the “ball” of all
states in D(AB) that are ε-close to the maximally mixed state uAB = uA ⊗ uB .
Proof. Suppose by contradiction that the statement in the theorem is false. Then, there
exists a sequence of bipartite entangled states {τnAB }n∈N such that
lim_{n→∞} (1/2) ∥τ_n^{AB} − u^{AB}∥_1 = 0 .   (13.7)
Since we assume that τnAB is entangled, we have τn ̸∈ SEP(AB). Therefore, there exists an
entanglement witness Γ_n^{AB} such that

Tr[τ_n^{AB} Γ_n^{AB}] < 0 .   (13.8)
Without loss of generality we can assume that for each n ∈ N the witness Γ_n^{AB} is normalized with respect to the Hilbert-Schmidt inner product; i.e.

Tr[ (Γ_n^{AB})² ] = 1 .   (13.9)
Therefore, the sequence {Γ_n^{AB}} is a sequence of Hermitian operators in the unit sphere of Herm(AB). Since the unit sphere is compact, there exists a subsequence {n_k}_{k∈N} of integers such that the limit lim_{k→∞} Γ_{n_k}^{AB} exists and equals some normalized operator Γ_⋆^{AB} ∈ Herm(AB). Since each Γ_{n_k}^{AB} is an entanglement witness, the limit Γ_⋆^{AB} must satisfy

Tr[Γ_⋆^{AB} σ^{AB}] ⩾ 0   ∀ σ ∈ SEP(AB) .   (13.10)
On the other hand, taking the limit k → ∞ on both sides of the inequality Tr[τ_{n_k}^{AB} Γ_{n_k}^{AB}] < 0 gives

Tr[Γ_⋆^{AB} u^{AB}] ⩽ 0 ,   (13.11)

so that Tr[Γ_⋆^{AB}] ⩽ 0. Now, let {|ψ_x⟩^A}_{x∈[m]} be an orthonormal basis of A, and {|ϕ_y⟩^B}_{y∈[ℓ]}
be an orthonormal basis of B. Then,

0 ⩾ Tr[Γ_⋆^{AB}] = Σ_{x∈[m]} Σ_{y∈[ℓ]} Tr[Γ_⋆^{AB} (ψ_x^A ⊗ ϕ_y^B)] .   (13.12)
From (13.10) it follows that for each x ∈ [m] and y ∈ [ℓ] we have Tr[Γ_⋆^{AB}(ψ_x^A ⊗ ϕ_y^B)] ⩾ 0. We therefore get that for any x ∈ [m] and y ∈ [ℓ], Tr[Γ_⋆^{AB}(ψ_x^A ⊗ ϕ_y^B)] = 0. Finally, since the orthonormal bases {|ψ_x⟩^A}_{x∈[m]} and {|ϕ_y⟩^B}_{y∈[ℓ]} were arbitrary, we conclude that

Tr[Γ_⋆^{AB}(ψ^A ⊗ ϕ^B)] = 0   ∀ ψ ∈ Pure(A), ∀ ϕ ∈ Pure(B) .   (13.13)
However, from Exercise 3.3.6 it follows that the above equation holds if and only if ΓAB
⋆ =0
in contradiction with the fact that ΓAB
⋆ is normalized, so in particular, cannot be the zero
matrix. This completes the proof.
Exercise 13.1.1. Let A1 , . . . , Am be m physical systems and let SEP(A1 · · · Am ) be the set
of multipartite separable states; i.e. SEP(A1 · · · Am ) is the convex hull of the set of all m-
fold product states of the form ρ1 ⊗ · · · ⊗ ρm , with ρx ∈ D(Ax ) for all x ∈ [m]. Show that
SEP(A1 · · · Am ) has a non-zero volume in D(A1 · · · Am ).
The following theorem shows a close connection between entanglement witnesses and
positive maps.
Theorem 13.1.2. Any entanglement witness is the Choi matrix of a positive map
that is not completely positive. Explicitly, Γ ∈ Herm(AB) is an entanglement
witness if and only if ΓAB = JEAB for some positive map E ∈ Pos(A → B) that is not
completely positive (i.e. E ̸∈ CP(A → B)).
Proof. Suppose first that Γ^{AB} = J_E^{AB} for some positive map E ∈ Pos(A → B) and suppose E ̸∈ CP(A → B). Then, for any product state ρ ⊗ σ ∈ Pure(AB) we have

Tr[Γ^{AB}(ρ^A ⊗ σ^B)] = Tr[J_E^{AB}(ρ^A ⊗ σ^B)]
Tr_A[J_E^{AB}(ρ^A ⊗ I^B)] = E^{A→B}((ρ^A)^T) → = Tr[σ^B E^{A→B}((ρ^A)^T)]   (13.14)
 ⩾ 0 ,
where the last inequality follows from the fact that E ∈ Pos(A → B) so that E ρT ⩾ 0.
Finally, the existence of a state χ ∈ Pure(AB) that satisfies (13.5) follows from the fact that
E is not completely positive so its Choi matrix ΓAB is not positive semidefinite.
Conversely, suppose ΓAB is an entanglement witness and let E ∈ L(A → B) be such
that Γ^{AB} = J_E^{AB} (but we do not assume that E is positive). Then, for any ρ ∈ D(A) and σ ∈ D(B) we have

Tr[σ^B E^{A→B}(ρ^A)] = Tr[J_E^{AB}((ρ^A)^T ⊗ σ^B)]
 = Tr[Γ^{AB}((ρ^A)^T ⊗ σ^B)]   (13.15)
 ⩾ 0 ,

where the last inequality follows from the fact that Γ^{AB} is an entanglement witness and (ρ^A)^T ⊗ σ^B ∈ SEP(AB). Since ρ^A and σ^B were arbitrary states, the above inequality
implies that E ∈ Pos(A → B). The map E is not completely positive since its Choi matrix,
ΓAB , is not positive semidefinite (as ΓAB is an entanglement witness). This completes the
proof.
Observe that the set of all entanglement witnesses consists of all the operators in the dual cone of the set of separable states that are not positive semidefinite. Specifically, for the composite system AB, the set of all entanglement witnesses, denoted by WIT(AB), is given by

WIT(AB) = { Γ ∈ SEP(AB)* : Γ^{AB} ̸⩾ 0 } .   (13.16)
We will now provide two examples of how Theorem 9.4.1, adapted to entanglement theory
with the set WIT(AB) mentioned above, can be used to determine whether a quantum state
is entangled.
ρ_t^{AB} = t Φ_m^{AB} + (1 − t) τ^{AB} ,   (13.17)
Observe that Φ_m τ = τ Φ_m = 0, and furthermore, since Φ_m^{AB} is invariant under the action of the twirling channel G defined in (3.251), also ρ_t^{AB} has this property. In fact, the family {ρ_t^{AB}}_{t∈[0,1]} can be viewed as the set of all quantum states that are invariant under G (see (3.255)). In the following, we will utilize this property to make the argument that the isotropic state ρ_t^{AB} satisfies:

ρ_t^{AB} ∈ SEP(AB)   ⟺   t ⩽ 1/m .   (13.19)
To prove the above statement we follow Theorem 9.4.1. Specifically, ρAB t is separable if
and only if Tr[Γ^{AB} ρ_t^{AB}] ⩾ 0 for all entanglement witnesses Γ ∈ WIT(AB). The key idea is to use the invariance of ρ_t^{AB} under G to get that

Tr[G(Γ^{AB}) σ^{AB}] = Tr[Γ^{AB} G(σ^{AB})] ⩾ 0 .   (13.21)
where a, b ∈ R. In the final equality, we made use of the fact that I^{AB} and Φ_m^{AB} span the subspace of G-invariant operators in Herm(AB).
To ensure that the matrix Γ = aI + bΦ_m is an entanglement witness, we need to appropriately specify the coefficients a and b. Let's start by noting that a must be non-negative, as evidenced by

a = Tr[ Γ (|1⟩⟨1| ⊗ |2⟩⟨2|) ] ⩾ 0 .   (13.23)
Furthermore, it’s important to recognize that Γ exhibits two distinct eigenvalues: a with
multiplicity |AB| − 1, and a + b with multiplicity one. Given that an entanglement witness
has at least one negative eigenvalue, and considering that a ⩾ 0, it is necessary for b to
satisfy b < −a. It's also worth mentioning that the scenario where a = 0 does not yield an entanglement witness (this is an interesting point to ponder – why is this the case?).
Consequently, after rescaling ΓAB by a positive factor a > 0, we can, without loss of
generality, assume that ΓAB takes the form
W = I^{AB} − r Φ_m^{AB} ,   (13.24)
where r > 1. From (13.4) the matrix ΓAB is an entanglement witness if and only if for any
product state ψ ⊗ ϕ ∈ Pure(AB)
Proof. Without loss of generality suppose |A| = 2 and |B| ⩽ 3. From Theorem 13.1.2 there
exists E ∈ Pos(A → B) such that
Γ^{AB} = E^{Ã→B}(Ω^{AÃ}) .   (13.33)
is a positive semidefinite matrix. We will say that ρ^{AB} has positive partial transpose (PPT) if this property holds, and otherwise, we will say that it has a negative partial transpose (NPT) or simply that the state is an NPT state.
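The PPT test is simple to implement numerically; the sketch below (names ours) applies the partial transpose on B by permuting tensor indices and checks the smallest eigenvalue:

import numpy as np

def partial_transpose_B(rho, dA, dB):
    # Swap the two B indices of rho viewed as a (dA, dB, dA, dB) tensor.
    return rho.reshape(dA, dB, dA, dB).transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)

def is_ppt(rho, dA, dB, tol=1e-12):
    return bool(np.linalg.eigvalsh(partial_transpose_B(rho, dA, dB)).min() >= -tol)

phi = np.array([1, 0, 0, 1]) / np.sqrt(2)        # two-qubit maximally entangled state
print(is_ppt(np.outer(phi, phi), 2, 2))          # False: NPT, hence entangled
print(is_ppt(np.eye(4) / 4, 2, 2))               # True: the maximally mixed state is PPT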
We have seen before that the 2-qubit maximally entangled state is an NPT state, and in
Exercise 3.4.2 you showed that all pure entangled states are NPT. Therefore, it is natural
to ask if all entangled states are NPT. In low dimensions, the following theorem states that
this is indeed the case.
Proof. If ρ^{AB} is an NPT state then from (13.34) it cannot be separable. Conversely, suppose ρ^{AB} is a PPT state, and recall from Theorem 9.4.1 (when applied to entanglement theory) that ρ^{AB} is separable if and only if

Tr[ρ^{AB} Γ^{AB}] ⩾ 0   ∀ Γ ∈ WIT(AB) .   (13.35)

Now, fix Γ ∈ WIT(AB). From Theorem 13.1.3, Γ^{AB} has the form (13.32) for some η_1, η_2 ∈ Pos(AB). Hence,

Tr[ρ^{AB} Γ^{AB}] = Tr[ ρ^{AB} ( η_1^{AB} + T^{B→B}(η_2^{AB}) ) ]
ρ^{AB} is PPT→ ⩾ 0 .
Since ΓAB was an arbitrary entanglement witness in WIT(AB) we conclude that the above
equation holds for all Γ ∈ WIT(AB) so that ρAB must be a separable state. This concludes
the proof.
The condition |AB| ⩽ 6 in the theorem above is optimal. Indeed, there are examples of
PPT entangled states in higher dimensions, including the case |A| = 2 and |B| = 4, as well
as the case |A| = |B| = 3.
where I_9 is the 9×9 identity matrix. It then follows that the bipartite density matrix ρ := (1/3)Π is entangled (see Exercise 13.1.6). However, the state ρ is also PPT since

T^{B→B}(ρ^{AB}) = (1/3)( I^{AB} − Σ_{x∈[5]} T^{B→B}(ψ_x^{AB}) ) = (1/3)( I^{AB} − Σ_{x∈[5]} ψ_x^{AB} ) = ρ^{AB} ⩾ 0 .   (13.39)
Exercise 13.1.8. Let K be the set of PPT operators in Pos(AB). Show that K∗ consists of
all operators Γ ∈ Herm(AB) of the form (13.32).
I A ⊗ ρB ⩾ ρAB . (13.41)
In particular, if ρAB is separable then it must satisfy the condition above. This criterion for
separability is known as the reduction criterion.
The reduction criterion can be expressed in terms of the positive map P ∈ Pos(A → A) defined by

P(ω^A) := Tr[ω^A] I^A − ω^A   ∀ ω ∈ L(A) .   (13.42)

Recall from Exercise 3.4.15 that the map described above is positive but not 2-positive, and therefore not completely positive. Utilizing this map, the reduction criterion can be expressed as:

P^{A→A}(ρ^{AB}) ⩾ 0 .   (13.43)
Exercise 13.1.9 below provides an alternative expression for the positive map P^{A→A} when |A| = 2; specifically, for this case

P^{A→A}(ω^A) = σ_y (ω^A)^T σ_y   ∀ ω ∈ L(A) .   (13.44)
Therefore, in this case the reduction criterion is equivalent to the PPT criterion.
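The reduction criterion is likewise easy to check numerically; in the sketch below (names ours) a negative eigenvalue of I^A ⊗ ρ^B − ρ^{AB} certifies entanglement:

import numpy as np

def violates_reduction(rho, dA, dB, tol=1e-12):
    # Entanglement is certified when I_A (x) rho_B - rho_AB has a negative eigenvalue.
    r = rho.reshape(dA, dB, dA, dB)
    rho_B = np.einsum('abad->bd', r)                       # partial trace over A
    gap = np.kron(np.eye(dA), rho_B) - rho
    return bool(np.linalg.eigvalsh(gap).min() < -tol)

phi = np.array([1, 0, 0, 1]) / np.sqrt(2)                  # Bell state
print(violates_reduction(np.outer(phi, phi), 2, 2))        # True
print(violates_reduction(np.eye(4) / 4, 2, 2))             # False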
In general, however, the PPT criterion is a more powerful criterion for detecting entan-
glement than the reduction criterion, since there exist entangled states that violate the PPT
criterion, yet cannot be detected by the reduction criterion, whereas the converse is not true.
This means that the PPT criterion can detect a larger class of entangled states than the
reduction criterion. The reason for that is that for |A| > 2 the map P above has the form
(see Exercise 13.1.9)
P = P1 + T ◦ P2 , (13.45)
where P1 , P2 ∈ CP(A → A) and T ∈ Pos(A → A) is the transpose map.
Although the reduction criterion may not be as effective as the PPT criterion in detecting
entanglement, further investigation reveals that it still holds importance in the field of quan-
tum resource theories. In fact, quantum states that do not satisfy the reduction criterion
possess non-zero distillable entanglement, which emphasizes the usefulness of the criterion
in other aspects of quantum information. In the upcoming sections, we will explore some of
these implications in greater detail.
Exercise 13.1.9. Let P A→A be as in (13.42).
1. Prove the relation (13.44) for the case that |A| = 2.
2. Prove the relation (13.45) for the case |A| ⩾ 2. Hint: Show that the partial transpose
of the Choi matrix of P A→A is positive semidefinite.
Theorem 13.1.5. Using the same notations as above, a quantum state ρ ∈ D(AB)
is entangled if it satisfies

Σ_{x∈[k]} λ_x > 1 .   (13.47)
it is sufficient to show that ΛAB is an entanglement witness. Indeed, let ψ ∈ Pure(A) and
ϕ ∈ Pure(B). Then,
Tr[ Λ^{AB} (ψ^A ⊗ ϕ^B) ] = 1 − Σ_{x∈[k]} Tr[ψη_x] Tr[ϕζ_x] .   (13.49)
Now, let v, u ∈ Rk be the vectors whose components are {Tr[ψηx ]}x∈[k] and {Tr[ϕζx ]}x∈[k] ,
respectively. Then, we need to show that v · u ⩽ 1. Since {ηx }x∈[k] is an orthonormal set in
Herm(A) (which can be completed to a full orthonormal basis of Herm(A)), in terms of the
Frobenius norm (i.e. the norm induced by the Hilbert-Schmidt inner product)
1 = ∥ψ∥_2² ⩾ ∥ Σ_{x∈[k]} Tr[ψη_x] η_x ∥_2²
{η_x}_{x∈[k]} is orthonormal → = Σ_{x∈[k]} |Tr[ψη_x]|² = v · v .   (13.50)
We then argue (see Exercise 13.1.10) that the sum appearing in (13.47) can be expressed as

Σ_{x∈[m²]} λ_x = ∥ρ̃^{AB}∥_1 .   (13.53)

In other words, the realignment criterion can be stated as follows: if the trace norm of the realigned matrix ρ̃^{AB} is greater than one (i.e. ∥ρ̃^{AB}∥_1 > 1) then the state ρ^{AB} is entangled.
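The realignment (trace-norm) test can be implemented with a single reshape; the sketch below (names ours) computes ∥ρ̃^{AB}∥_1 and flags entanglement when it exceeds one:

import numpy as np

def realignment_trace_norm(rho, dA, dB):
    # Rearrange rho[(i,j),(k,l)] into R[(i,k),(j,l)] and sum its singular values.
    R = rho.reshape(dA, dB, dA, dB).transpose(0, 2, 1, 3).reshape(dA * dA, dB * dB)
    return float(np.linalg.svd(R, compute_uv=False).sum())

phi = np.array([1, 0, 0, 1]) / np.sqrt(2)                    # Bell state
print(realignment_trace_norm(np.outer(phi, phi), 2, 2))      # 2.0 > 1: entangled
print(realignment_trace_norm(np.eye(4) / 4, 2, 2))           # 0.5: not detected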
Exercise 13.1.11. Consider the case |A| = 2 and |B| = 4. Let p ∈ [0, 1] and ρ ∈ Herm(AB)
be the matrix
ρ^{AB} = (1/(7p + 1)) [ pI_4 , pξ ; pξ^T , η ]   (13.54)

(written here in 2 × 2 block form).
This extension is considered symmetric since the original state can be obtained by tracing
out either the B or the B̃ systems; i.e., the marginals of σ AB B̃ satisfy σ AB = σ AB̃ . On
the other hand, when dealing with entangled states, it is not immediately clear whether
a symmetric extension, ρAB B̃ , with the property ρAB = ρAB̃ exists for an entangled state
ρ ∈ D(AB). While this property holds trivially for separable states, it doesn’t hold for all
entangled states.
Note that the separable state in (13.56) can also be extended to k copies of B via

ρ^{AB^k} = Σ_{x∈[m]} p_x ψ_x^A ⊗ ϕ_x^{B_1} ⊗ · · · ⊗ ϕ_x^{B_k} ,   (13.58)

where B ≅ B_1 ≅ · · · ≅ B_k. We say that ρ^{AB^k} is a symmetric k-extension of ρ^{AB}.
We saw above that every separable quantum state is k-extendible for all k ∈ N. Sur-
prisingly, the converse of this statement is also true! This means that if a quantum state
ρ ∈ D(AB) is k-extendible for all k ∈ N, then it must be separable. However, proving this
statement requires certain techniques that are beyond the scope of this book. Specifically,
it involves the use of the quantum de Finetti theorem. Interested readers can find more
information in the ‘notes and references’ section at the end of this chapter.
Given a quantum state ρ ∈ D(AB), how can we determine if it is k-extendible? Observe
that the conditions ρAB1 = ρABj , for all j ∈ [k], can be expressed as
Tr[ ρ^{AB^k} Λ_j^{AB^k} ] = 0 ,   (13.60)
where η ∈ Herm(AB1 ) and ξ ∈ Herm(ABj ). Note that the linearity of the condition above
implies that we can restrict η and ξ to belong to orthonormal bases of Herm(AB1 ) and
Herm(ABj ), respectively. Thus, we conclude that there exists a finite number of operators
{Λ_{jℓ}}_{j∈[k],ℓ∈[n]} such that ρ^{AB_1} = ρ^{AB_j}, for all j ∈ [k], if and only if

Tr[ ρ^{AB^k} Λ_{jℓ}^{AB^k} ] = 0   ∀ j ∈ [k], ℓ ∈ [n] .   (13.62)
The conditions specified above indicate that the determination of whether ρAB is k-extendible
requires the solution of an SDP feasibility problem. Therefore, the criterion for k-extendibility
can be computed algorithmically and efficiently.
Exercise 13.1.12. Using the same notations as above:
1. Find an upper bound on n.
2. Use Farkas lemma of Exercise 4.6.16 to express the dual form of (13.62).
2. E(1) = 0, where 1 corresponds to the only element of D(AB) when |A| = |B| = 1.
Exercise 13.2.1. Show that any measure of entanglement E as defined above satisfies the
following two conditions: (1) It is always non-negative, that is, E(ρAB ) ⩾ 0 for all ρ ∈
D(AB), and (2) it satisfies E(σ AB ) = 0 for all σ ∈ SEP(AB).
In general, LOCC can be stochastic, in the sense that ρ^{AB} can be converted to σ_x^{AB} with some probability p_x. In this case, the map from ρ^{AB} to σ_x^{AB} cannot be described by a CPTP map. However, by introducing a classical ‘flag’ system X, we can view the ensemble {σ_x^{AB}, p_x}_{x∈[m]} as a classical-quantum state σ^{XAB} := Σ_{x∈[m]} p_x |x⟩⟨x|^X ⊗ σ_x^{AB}. Hence, if ρ^{AB} can be converted by LOCC to σ_x^{AB} with probability p_x, then there exists a map E ∈ LOCC(AB → XAB) such that E(ρ^{AB}) = σ^{XAB}. Since the ‘flag’ system X is classical, both Alice and Bob have access to it: if Alice holds it she can communicate it to Bob, and vice versa. Therefore, the definition above of a measure of entanglement also captures probabilistic transformations. Particularly, E must satisfy E(σ^{XAB}) ⩽ E(ρ^{AB}).
Almost all measures of entanglement studied in the literature (although not all) satisfy

E(σ^{XAB}) = Σ_{x∈[m]} p_x E(σ_x^{AB}) ,   (13.65)

which is very intuitive since X is just a classical system encoding the value of x. We call the relation in (13.65) the direct sum property since, mathematically, σ^{XAB} can also be viewed as ⊕_{x∈[m]} p_x σ_x^{AB}. If the direct sum property holds then the condition E(σ^{XAB}) ⩽ E(ρ^{AB}) becomes Σ_x p_x E(σ_x^{AB}) ⩽ E(ρ^{AB}), meaning that LOCC cannot increase entanglement on average. Therefore, the direct sum property is in general stronger than the strong monotonicity property (10.10) of a resource measure M. In fact, the condition (13.65) also implies that E is convex (see Exercise 13.2.2). We therefore conclude that any measure of entanglement that satisfies the direct sum property is an entanglement monotone.
Exercise 13.2.2. Let E be a measure of entanglement satisfying the direct sum prop-
erty (13.65). Show that E is convex; i.e. for any ensemble of states {px , σxAB }x∈[m] we
have

E( Σ_{x∈[m]} p_x σ_x^{AB} ) ⩽ Σ_{x∈[m]} p_x E(σ_x^{AB}) .   (13.66)
Entanglement of Formation

Definition 13.2.1. Let E be a measure of pure-state entanglement. The convex roof extension of E is a function E_F : ∪_{A,B} D(AB) → R defined on any ρ ∈ D(AB) via

E_F(ρ^{AB}) = inf Σ_{x∈[m]} p_x E(ψ_x^{AB}) ,   (13.68)

where the infimum is over all pure-state decompositions {p_x, ψ_x^{AB}}_{x∈[m]} of ρ^{AB}. E_F is called the entanglement of formation associated with the pure-state measure E.
Remark. The term “entanglement of formation” originated from historical reasons, as it was
originally believed that EF (ρAB ), with E taken as the entropy of entanglement, represented
the entanglement cost required to create the state ρAB . However, we will discover later on
that it is actually the regularized entanglement of formation that can be interpreted as the
entanglement cost of ρAB .
Exercise 13.2.3. Show that if E is the entropy of entanglement then its corresponding
entanglement of formation satisfies for all ρ ∈ D(AB),
E_F(ρ^{AB}) ⩽ min{ H(ρ^A), H(ρ^B) } .   (13.69)
To show that the entanglement of formation is indeed a measure of entanglement, recall
that any measure of pure state entanglement, E, can be expressed for all ψ ∈ Pure(AB) as
E ψ AB = g ρA with ρA := TrB ψ AB ,
(13.70)
for some Schur concave function g. A slightly stronger condition than Schur concavity is the
condition that g is both symmetric and concave. In this case, the resulting entanglement of
formation is an entanglement monotone.
We first prove the following auxiliary lemma. In this lemma we only consider a quantum
instrument on Bob’s system and consider a pure initial bipartite state. We show that for
this simpler case, the convex roof extension of E satisfies strong monotonicity.
Lemma 13.2.1. Let ψ ∈ Pure(AB), let E^{B→B′X} := Σ_{x∈[m]} E_x^{B→B′} ⊗ |x⟩⟨x|^X be a quantum instrument on Bob's system, and for every x ∈ [m] define

σ_x^{AB′} := (1/p_x) E_x^{B→B′}(ψ^{AB})   where   p_x := Tr[ E_x^{B→B′}(ψ^{AB}) ] .   (13.71)

Then,

Σ_{x∈[m]} p_x E_F(σ_x^{AB′}) ⩽ E_F(ψ^{AB}) .   (13.72)
Remark. In the proof below, we adopt the notations like ϕA := TrB ϕAB to denote the
reduced density matrix of a pure bipartite state. This notation is instrumental in reducing
the number of symbols used, enhancing clarity and conciseness. However, it’s crucial to
remember that ϕA in this context represents a mixed state, despite the notation resembling
that typically used for pure states. This distinction is important for a correct understanding
of the concepts and calculations involved in the proof.
Hence,

Σ_{x∈[m]} p_x E_F(σ_x^{AB′}) ⩽ Σ_{x∈[m]} p_x g(σ_x^A)
   g is concave→ ⩽ g(Σ_{x∈[m]} p_x σ_x^A) .        (13.75)

Finally, observe that the reduced density matrix of ψ^{AB} satisfies ψ^A = Σ_{x∈[m]} p_x σ_x^A. Substituting this into the equation above gives

Σ_{x∈[m]} p_x E_F(σ_x^{AB′}) ⩽ g(ψ^A) = E(ψ^{AB}) = E_F(ψ^{AB}) .        (13.76)
We now prove the strong monotonicity of E_F. The proof strategy is to show that
EF does not increase on average under a general quantum instrument on Bob’s subsystem.
We can then apply a similar argument to Alice’s side, demonstrating that EF remains non-
increasing under quantum instruments on either subsystem. The significance of this lies
in the fact that LOCC consists of such local quantum instruments, coupled with rounds
of classical communication, which do not affect the monotonicity property. Therefore, by
demonstrating the non-increasing nature of EF under a quantum instrument on both Bob’s
side (and, by symmetry arguments, also on Alice’s side), it can be concluded that EF satisfies
the strong monotonicity property under LOCC.
Let {E_z^{B→B′}}_{z∈[k]} be a quantum instrument on Bob's subsystem, and for each z ∈ [k], let ρ_z^{AB′} be the post-measurement state after outcome z occurred. Moreover, for every z ∈ [k] and x ∈ [m] we denote r_z := Tr[E_z^{B→B′}(ρ^{AB})], t_{z|x} := Tr[E_z^{B→B′}(ψ_x^{AB})], and

σ_{xz}^{AB′} := (1/t_{z|x}) E_z^{B→B′}(ψ_x^{AB}) .        (13.78)

We need to show that the average entanglement Σ_{z∈[k]} r_z E_F(ρ_z^{AB′}) cannot exceed E_F(ρ^{AB}). For this purpose, for each state ρ_z^{AB′} we need to find a suitable pure-state decomposition that can be related to ρ^{AB}. Observe that the equation above involves the mixed states {σ_{xz}^{AB′}}_{x∈[m]}. Therefore, for each σ_{xz}^{AB′} we denote by {s_{y|xz}, ϕ_{xyz}^{AB′}}_{y∈[n]} its optimal pure-state decomposition, so that

E_F(σ_{xz}^{AB′}) = Σ_{y∈[n]} s_{y|xz} E(ϕ_{xyz}^{AB′}) .        (13.80)
With this final notation, we get our desired pure-state decomposition of ρ_z^{AB′}:

ρ_z^{AB′} = Σ_{x∈[m]} Σ_{y∈[n]} (p_x t_{z|x} s_{y|xz} / r_z) ϕ_{xyz}^{AB′} .        (13.81)

Since the above pure-state decomposition of ρ_z^{AB′} is not necessarily optimal, we conclude

Σ_{z∈[k]} r_z E_F(ρ_z^{AB′}) ⩽ Σ_{x,y,z} r_z (p_x t_{z|x} s_{y|xz} / r_z) E(ϕ_{xyz}^{AB′}) = Σ_{x,y,z} p_x t_{z|x} s_{y|xz} E(ϕ_{xyz}^{AB′})
   (13.80)→ = Σ_{x,z} p_x t_{z|x} E_F(σ_{xz}^{AB′})        (13.82)
   Lemma 13.2.1→ ⩽ Σ_{x∈[m]} p_x E_F(ψ_x^{AB})
   (13.77)→ = E_F(ρ^{AB}) .
We therefore conclude that EF does not increase on average under a general quantum in-
strument on Bob’s subsystem. This completes the proof of strong monotonicity.
It is therefore left to show that E_F is convex. Indeed, let {p_x, ρ_x^{AB}}_{x∈[m]} be an ensemble of bipartite states, and for each x ∈ [m] let {q_{y|x}, ψ_{xy}^{AB}}_{y∈[n]} be an optimal pure-state decomposition of ρ_x^{AB} such that

E_F(ρ_x^{AB}) = Σ_{y∈[n]} q_{y|x} E(ψ_{xy}^{AB}) .        (13.83)

Now, observe that {p_x q_{y|x}, ψ_{xy}^{AB}}_{x,y} is a pure-state decomposition of Σ_x p_x ρ_x^{AB}. Thus,

E_F(Σ_x p_x ρ_x^{AB}) ⩽ Σ_{x,y} p_x q_{y|x} E(ψ_{xy}^{AB})
   (13.83)→ = Σ_{x∈[m]} p_x E_F(ρ_x^{AB}) .        (13.84)
Exercise 13.2.4. Consider E and C as the entropy of entanglement and the concurrence for pure states, respectively. Let A and B be qubit systems (i.e., |A| = |B| = 2), and define the function g : [0, 1] → [0, 1] as

g(x) = h_2((1 + √(1 − x²))/2) .        (13.85)
The exercise above shows that in order to compute the entanglement of formation of a two-qubit state ρ^{AB} it is sufficient to compute its concurrence of formation. In the following theorem we give a closed formula for the concurrence of formation. The closed formula is given in terms of the density matrix

ρ_⋆^{AB} := (σ_2 ⊗ σ_2) ρ̄^{AB} (σ_2 ⊗ σ_2) ,        (13.88)

where the orthonormal basis {|x⟩}_{x∈{0,1}} (and similarly {|y⟩}_{y∈{0,1}}) is such that σ_2 has the form −i|0⟩⟨1| + i|1⟩⟨0|.
Closed Formula

Theorem 13.2.2. Let ρ ∈ D(AB) be a two-qubit mixed state (i.e. |A| = |B| = 2). Then, the concurrence of formation of ρ^{AB} is given by

C_F(ρ^{AB}) = max{0, λ_1 − λ_2 − λ_3 − λ_4} ,        (13.90)

where λ_1, . . . , λ_4 are the square roots of the four eigenvalues of the matrix √ρ ρ_⋆ √ρ, arranged in non-increasing order.
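The closed formula (13.90) is simple to evaluate numerically. Below is a minimal sketch (not part of the text; it assumes NumPy and the standard computational basis, with the λ_i computed as the square roots of the eigenvalues of ρρ_⋆, which is how the formula is usually evaluated in practice):

    import numpy as np

    SY = np.array([[0, -1j], [1j, 0]])        # sigma_2 in the basis of (13.88)
    YY = np.kron(SY, SY)

    def concurrence_of_formation(rho):
        # Concurrence of formation of a two-qubit state via Eq. (13.90).
        rho_star = YY @ rho.conj() @ YY       # spin-flipped state, Eq. (13.88)
        evals = np.linalg.eigvals(rho @ rho_star).real
        lam = np.sort(np.sqrt(np.clip(evals, 0, None)))[::-1]   # non-increasing
        return max(0.0, lam[0] - lam[1] - lam[2] - lam[3])

    # Example: mixture of a Bell state with white noise.
    phi = np.zeros(4); phi[0] = phi[3] = 1 / np.sqrt(2)
    bell = np.outer(phi, phi)
    for t in (0.2, 0.5, 0.9):
        rho = t * bell + (1 - t) * np.eye(4) / 4
        print(t, concurrence_of_formation(rho))   # entangled iff t > 1/3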
In the derivation of the formula (13.90), we will use a bilinear form denoted ⟨·, ·⟩ on C² ⊗ C². This bilinear form will not only be instrumental in proving the formula but will also be valuable in the analysis of multipartite entanglement. It is defined for any two vectors |ψ⟩, |ϕ⟩ ∈ AB as

⟨|ψ⟩, |ϕ⟩⟩ := ⟨ψ̄^{AB}| σ_2 ⊗ σ_2 |ϕ^{AB}⟩ ,        (13.91)

where ψ̄^{AB} is defined such that if |ψ^{AB}⟩ = Σ_{x,y∈{0,1}} c_{xy}|x⟩|y⟩ then |ψ̄^{AB}⟩ = Σ_{x,y∈{0,1}} c̄_{xy}|x⟩|y⟩. Note that ψ̄^{AB} is well defined only with respect to some fixed bases {|0⟩^A, |1⟩^A} ⊂ A and {|0⟩^B, |1⟩^B} ⊂ B of A and B, respectively. These orthonormal bases are chosen such that σ_2 has the form −i|0⟩⟨1| + i|1⟩⟨0|. The relation of this bilinear form to our study here can be found in Exercise 12.3.2, in which you had to show that the concurrence of a two-qubit pure state ψ ∈ Pure(AB) can be expressed as C(ψ^{AB}) = |⟨|ψ⟩, |ψ⟩⟩|. In the following exercise you prove several additional properties of this bilinear form.
1. Show the linearity of the bilinear form; that is, show that for any vectors |ψ⟩, |ψ1 ⟩,
|ψ2 ⟩, |ϕ⟩, |ϕ1 ⟩, and |ϕ2 ⟩ in C2 ⊗ C2
2. Show that the bilinear form is symmetric; that is, for any two vectors |ψ⟩, |ϕ⟩ ∈ C2 ⊗C2
Proof of Theorem 13.2.2. Consider first the case that λ_1 > λ_2 + λ_3 + λ_4. Let {p_x, ψ_x}_{x∈[4]} and {q_y, ϕ_y^{AB}}_{y∈[n]} be two pure-state decompositions of ρ^{AB}, and for each x ∈ [4] and y ∈ [n] (here n ⩾ 4), let |ψ̃_x⟩ := √p_x |ψ_x⟩ and |ϕ̃_y^{AB}⟩ := √q_y |ϕ_y^{AB}⟩. Recall from Exercise 2.3.15 that there exists an n × 4 isometry V = (v_{yx}) such that for all y ∈ [n]

|ϕ̃_y⟩ = Σ_{x∈[4]} v_{yx} |ψ̃_x⟩ .        (13.95)

Denoting by M_ψ and M_ϕ the matrices whose components are ⟨ψ̃_x, ψ̃_{x′}⟩ and ⟨ϕ̃_y, ϕ̃_{y′}⟩, respectively, we get that the above equation can be written as

M_ϕ = V M_ψ V^T .        (13.97)

Since M_ψ is symmetric (see the second part of Exercise 13.2.5), there exists a 4 × 4 unitary matrix U such that U M_ψ U^T is diagonal. Moreover, by an appropriate choice of U, the diagonal elements of U M_ψ U^T can always be made real and positive (i.e. they are the singular values of M_ψ), and arranged on the diagonal of U M_ψ U^T in non-increasing order. We therefore conclude that there exists a pure-state decomposition that is diagonal with respect to the bilinear form. For simplicity of the exposition, we take it to be {p_x, ψ_x^{AB}}_{x∈[4]} itself; that is, ⟨ψ̃_x, ψ̃_{x′}⟩ = λ_x δ_{xx′}. Hence, for any other pure-state decomposition {q_y, ϕ_y^{AB}}_{y∈[n]},

Σ_{y∈[n]} q_y C(ϕ_y^{AB}) = Σ_{y∈[n]} q_y |⟨|ϕ_y⟩, |ϕ_y⟩⟩| = Σ_{y∈[n]} |⟨|ϕ̃_y⟩, |ϕ̃_y⟩⟩|
   (13.99)→ = Σ_{y∈[n]} |Σ_{x∈[4]} v_{yx}² λ_x|        (13.100)
   ⩾ Σ_{y∈[n]} (|v_{y1}|² λ_1 − |v_{y2}|² λ_2 − |v_{y3}|² λ_3 − |v_{y4}|² λ_4)
   = λ_1 − λ_2 − λ_3 − λ_4 ,
where we used the inequality |a + b + c + d| ⩾ |a| − |b| − |c| − |d| for every a, b, c, d ∈ C, together with the fact that Σ_{y∈[n]} |v_{yx}|² = 1 for each x (since V is an isometry). Moreover, the inequality above can be saturated by taking V to be the unitary matrix (i.e. taking n = 4)

V = ½ [ −1  i  i  i ;  1  −i  i  i ;  1  i  −i  i ;  1  i  i  −i ] .        (13.101)

Indeed, observe that the matrix V above is unitary and has the property that for all y ∈ [4], |Σ_{x∈[4]} v_{yx}² λ_x| = ¼(λ_1 − λ_2 − λ_3 − λ_4). Therefore, with this V we get from (13.100) that the average concurrence of {q_y, ϕ_y}_{y∈[4]} is

Σ_{y∈[4]} |Σ_{x∈[4]} v_{yx}² λ_x| = λ_1 − λ_2 − λ_3 − λ_4 .        (13.102)
Exercise 13.2.9. Let ρ ∈ D(AB) with |A| = |B| = 2, and define the quantity

C_a(ρ^{AB}) := max Σ_{x∈[m]} p_x C(ψ_x^{AB}) ,        (13.107)

where the maximum is over all pure-state decompositions of ρ^{AB} (i.e. C_a is defined similarly to C_F but with a maximum instead of a minimum). Show that C_a(ρ^{AB}) = F(ρ^{AB}, ρ_⋆^{AB}), where F is the fidelity. Hint: Use similar lines as in the proof above. Show also that the square of the fidelity above can be expressed as

F²(ρ^{AB}, ρ_⋆^{AB}) = ∥√(ρ^{AB}) √(ρ_⋆^{AB})∥_1² .        (13.109)
For k ∈ [d], with d := min{|A|, |B|}, the measures E_(k) are defined as

E_(k)(ρ^{AB}) := min Σ_x p_x (1 − ∥ρ_x^A∥_{(k)}) ,

where the minimum is over all pure-state decompositions ρ^{AB} = Σ_x p_x ψ_x^{AB}, with ρ_x^A := Tr_B[ψ_x^{AB}], and ∥·∥_{(k)} is the Ky Fan norm.
Exercise 13.2.10. Show that the functions E(k) as defined above are entanglement mono-
tones.
Note that for the two-qubit case, the only non-trivial measure E_(k) is the one with k = 1. In this case,

E_(1)(ρ^{AB}) := min Σ_{x∈[m]} p_x λ_min(ρ_x^A) ,        (13.111)

since each ρ_x^A is a qubit, so that its minimum eigenvalue is λ_min(ρ_x^A) = 1 − ∥ρ_x^A∥_{(1)}. Moreover, observe that

λ_min(ρ_x^A) = ½(1 − √(1 − 4 det(ρ_x^A)))
             = ½(1 − √(1 − C²(ψ_x^{AB}))) ,        (13.112)

where C²(ψ_x^{AB}) is the square of the concurrence of ψ_x^{AB}. In the following exercise you show that the above relation can be used to prove that for any two-qubit state ρ ∈ D(AB) with |A| = |B| = 2 we have

E_(1)(ρ^{AB}) = ½(1 − √(1 − C_F²(ρ^{AB}))) ,        (13.113)

where C_F²(ρ^{AB}) is the square of the concurrence of formation of ρ^{AB}. Hence, the closed formula for the concurrence of formation can be used to compute E_(1).
Exercise 13.2.11. Let ρ ∈ D(AB) be a two-qubit state with |A| = |B| = 2.
1. Prove the relation (13.112).
2. Prove the relation (13.113). Hint: The proof is similar to the proof of (13.87).
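Once the concurrence of formation C_F has been computed (e.g. via (13.90)), the entanglement of formation of a two-qubit state follows from the relation E_F = g(C_F) established in Exercise 13.2.4, and E_(1) follows from (13.113). The short sketch below (not part of the text; it assumes NumPy and illustrative function names) evaluates both quantities from a given value of C_F:

    import numpy as np

    def h2(p):
        # Binary Shannon entropy in bits.
        if p in (0.0, 1.0):
            return 0.0
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

    def g(c):
        # g(x) of Eq. (13.85): pure-state entropy of entanglement as a
        # function of the concurrence.
        return h2((1 + np.sqrt(1 - c ** 2)) / 2)

    def entanglement_of_formation(c_f):
        # E_F of a two-qubit state from its concurrence of formation.
        return g(c_f)

    def e1_measure(c_f):
        # E_(1) of Eq. (13.113) from the concurrence of formation.
        return (1 - np.sqrt(1 - c_f ** 2)) / 2

    c_f = 0.7   # e.g. the output of the closed formula (13.90)
    print(entanglement_of_formation(c_f), e1_measure(c_f))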
In Corollary 12.4.2 we found necessary and sufficient conditions to convert, by LOCC, a pure bipartite state to a mixed bipartite state. Interestingly, for the case that d := |A| = |B| = 2, the minimization in (12.63) over k ∈ {1, 2} becomes trivial since for all ψ ∈ Pure(AB) we have E_(2)(ψ^{AB}) = 0. We therefore arrive at the following corollary.
Optimal Extensions

In Chapter 5 we introduced a method to extend divergences from classical to quantum systems. This method is in fact quite general and can be slightly modified to incorporate extensions of measures of entanglement from pure to mixed states. Specifically, let E be a measure of pure-state entanglement. For any ρ ∈ D(AB) the maximal extension of E is defined as

E̅(ρ^{AB}) := inf { E(ψ^{A′B′}) : ψ^{A′B′} −LOCC→ ρ^{AB} } ,        (13.117)

where the infimum is over all systems A′B′ and all pure states ψ ∈ Pure(A′B′) for which ψ^{A′B′} can be converted by LOCC to ρ^{AB}. Similarly, the minimal extension is defined as

E̲(ρ^{AB}) := sup { E(ψ^{A′B′}) : ρ^{AB} −LOCC→ ψ^{A′B′} } .        (13.118)
Exercise 13.2.13. Let E be a measure of pure-state entanglement, and let E̅ and E̲ be its maximal and minimal extensions.

3. Show that if E is additive under tensor products of pure bipartite states, then E̅ is sub-additive and E̲ is super-additive under tensor products of mixed bipartite states.

The minimal extension E̲ is not a very useful measure of entanglement, since typically a mixed bipartite state cannot be converted by LOCC to a pure entangled state. Therefore, for such mixed entangled states E̲ takes the value zero. On the other hand, the maximal extension is a faithful measure of entanglement (i.e. it takes the value zero only on separable states).
As an example, recall that the Schmidt rank is a measure of entanglement on pure states. Its maximal extension to mixed states is given by

SR̅(ρ^{AB}) := inf { SR(ψ^{A′B′}) : ψ^{A′B′} −LOCC→ ρ^{AB} } .        (13.120)

In general, the condition ψ^{A′B′} −LOCC→ ρ^{AB} can be very complicated. However, we can replace ψ^{A′B′} in the equation above with the maximally entangled state Φ_k, where k := SR(ψ^{A′B′}) = SR(Φ_k), since whenever ψ^{A′B′} −LOCC→ ρ^{AB} we also have Φ_k −LOCC→ ρ^{AB}. We therefore conclude that

SR(ρ^{AB}) := SR̅(ρ^{AB}) = min { k : Φ_k −LOCC→ ρ^{AB} } ,        (13.121)

where, for simplicity of the exposition, we removed the over-line symbol from SR̅(ρ^{AB}).
1. At least one of the states, in any pure-state decomposition of ρAB , has a Schmidt rank
no smaller than k.
2. There exists a pure-state decomposition of ρAB with all states having Schmidt rank at
most k.
The relative entropy of entanglement is defined for all ρ ∈ D(AB) as

E_R(ρ^{AB}) := min_{σ∈SEP(AB)} D(ρ^{AB} ∥ σ^{AB}) .        (13.123)
As discussed in Sec. 10.3, computing the relative entropy of entanglement can generally be
quite challenging. However, in certain special cases, such as with pure states and symmetric
states, it is feasible to compute this measure. The complexity in computing the relative
entropy of entanglement typically arises from the need to optimize over the large set of
separable states, which can be a demanding task for most mixed states. Yet, for pure
states and certain states with specific symmetrical properties, this complexity is significantly
reduced, making the calculation manageable.
For any pure state ψ ∈ Pure(AB), the relative entropy of entanglement reduces to the entropy of entanglement:

E_R(ψ^{AB}) = E(ψ^{AB}) := H(A)_ψ ,        (13.124)

where H(A)_ψ is the von Neumann entropy of the reduced state ψ^A.
Proof. The proof is based on the closed formula given in Theorem 10.3.2 for the relative entropy of a resource. Let

|ψ⟩ = Σ_{x∈[n]} √p_x |xx⟩        (13.125)

be a Schmidt decomposition of ψ^{AB}, and set σ_⋆^{AB} := Σ_{x∈[n]} p_x |xx⟩⟨xx|.
We argue now that σ_⋆ is the closest separable state (see Fig. 13.1); that is, we argue that

min_{σ∈SEP(AB)} D(ψ^{AB} ∥ σ^{AB}) = D(ψ^{AB} ∥ σ_⋆^{AB}) .        (13.127)

Indeed, from Theorem 10.3.2 it follows that σ_⋆^{AB} satisfies the above equality if and only if there exists an entanglement witness η ∈ WIT(AB) such that

ψ^{AB} = σ_⋆^{AB} − a L_{σ_⋆}^{−1}(η) .        (13.128)

The above equality can be expressed as L_{σ_⋆}^{−1}(aη) = σ_⋆^{AB} − ψ^{AB}, which is equivalent to

aη = L_{σ_⋆}(σ_⋆ − ψ) = I − L_{σ_⋆}(ψ) .        (13.129)

That is, σ_⋆ satisfies (13.127) if and only if the right-hand side of the equation above is an entanglement witness. Now, by direct computation we have (see Exercise 13.2.16)

L_{σ_⋆}(ψ) = Σ_{x,y∈[n]} √(p_x p_y) (log p_x − log p_y)/(p_x − p_y) |xx⟩⟨yy| .        (13.130)

Denote r_{xy} := p_x/p_y and observe that for any x, y ∈ [n]

c_{xy} := √(p_x p_y) (log p_x − log p_y)/(p_x − p_y) = √r_{xy} log(r_{xy})/(r_{xy} − 1) ⩽ 1 ,        (13.131)

where the last inequality follows from the fact that the logarithm function satisfies log(r) ⩽ (r − 1)/√r for r ⩾ 1 and the opposite inequality for 0 < r ⩽ 1. We therefore get for any product state ϕ^A ⊗ φ^B
Tr[(ϕ^A ⊗ φ^B) L_{σ_⋆}(ψ)] = Σ_{x,y∈[n]} c_{xy} ⟨ϕ|x⟩⟨φ|x⟩⟨y|ϕ⟩⟨y|φ⟩
   ⩽ Σ_{x,y∈[n]} c_{xy} |⟨ϕ|x⟩⟨φ|x⟩⟨y|ϕ⟩⟨y|φ⟩|
   c_{xy} ⩽ 1 → ⩽ Σ_{x,y∈[n]} |⟨ϕ|x⟩⟨φ|x⟩⟨y|ϕ⟩⟨y|φ⟩|        (13.132)
   |ab| ⩽ ½(|a|² + |b|²) → ⩽ ¼ Σ_{x,y∈[n]} (|⟨ϕ|x⟩|² + |⟨φ|x⟩|²)(|⟨y|ϕ⟩|² + |⟨y|φ⟩|²)
   = 1 .

Hence, for every product state ϕ^A ⊗ φ^B,

Tr[(ϕ^A ⊗ φ^B)(I − L_{σ_⋆}(ψ))] ⩾ 0 ,        (13.133)

so that I − L_{σ_⋆}(ψ) is an entanglement witness and (13.127) follows.
Now consider a state ρ^{AB} that is invariant under a twirling channel G ∈ LOCC(AB → AB), i.e. ρ^{AB} = G(ρ^{AB}), and recall that G(SEP(AB)) ⊆ SEP(AB). Then,

E_R(ρ^{AB}) ⩽ min_{σ∈SEP(AB)} D(ρ^{AB} ∥ G(σ^{AB}))        (since G(SEP(AB)) ⊆ SEP(AB))
           = min_{σ∈SEP(AB)} D(G(ρ^{AB}) ∥ G(σ^{AB})) .      (since ρ^{AB} = G(ρ^{AB}))        (13.134)

By the data processing inequality the right-hand side is also no greater than E_R(ρ^{AB}), so that

E_R(ρ^{AB}) = min_{σ∈SEP(AB)} D(ρ^{AB} ∥ G(σ^{AB})) .        (13.135)

The importance of the above formula lies in the fact that the minimization over separable states of the form G(σ^{AB}) might involve significantly fewer parameters compared to the minimization over the entire set of separable states SEP(AB). This simplification is what makes the computation of the relative entropy of entanglement feasible for these symmetric states.
As an example, let’s consider the isotropic state defined for all t ∈ [0, 1] (with m := |A| = |B|) as

ρ_t^{AB} = t Φ_m^{AB} + (1 − t) τ^{AB}   where   τ^{AB} := (I^{AB} − Φ_m)/(m² − 1) .        (13.136)

Previously, we observed that this state is invariant under the twirling channel described in (3.251) and is separable if and only if t ⩽ 1/m. At one extreme (t = 0), the isotropic state is the separable state τ^{AB}. At the other extreme of the separable range (t = 1/m), the isotropic state is the separable state

ρ_{1/m}^{AB} = (m/(m+1)) u^{AB} + (1/(m+1)) Φ_m^{AB} .        (13.137)
For an entangled isotropic state ρ_t^{AB} with t > 1/m, equation (13.135) yields

E_R(ρ_t^{AB}) = min_{t′∈[0,1/m]} D(ρ_t^{AB} ∥ ρ_{t′}^{AB}) ,        (13.138)

since for every separable state σ ∈ SEP(AB), the twirled state G(σ^{AB}) = ρ_{t′}^{AB} for some t′ ∈ [0, 1/m]. This demonstrates how the symmetry of ρ_t^{AB} simplifies the optimization problem. Furthermore, in Exercise 13.2.17, you will show that the optimal t′ is t′ = 1/m, resulting in

E_R(ρ_t^{AB}) = D(ρ_t^{AB} ∥ ρ_{1/m}^{AB})
             = log m + t log t + (1 − t) log((1 − t)/(m − 1)) .        (13.139)

This result is intuitive, as the separable state ρ_{t′}^{AB} with t′ = 1/m is on the boundary of the set of separable states and, roughly speaking, is the closest to ρ_t^{AB}.
Exercise 13.2.17. Prove the equalities in (13.139). Hint: Observe that [ρ_t^{AB}, ρ_{t′}^{AB}] = 0 for all t, t′ ∈ [0, 1] and use it to show that D(ρ_t^{AB} ∥ ρ_{t′}^{AB}) = D(t ∥ t′), where t := (t, 1 − t)^T and t′ := (t′, 1 − t′)^T.
A similar symmetry argument applies to Werner states. For an entangled Werner state ρ_W^{AB} one finds

E_R(ρ_W^{AB}) = D(ρ_W^{AB} ∥ ω^{AB}) ,        (13.140)

where

ω^{AB} := (1/(m(m+1))) Π_{Sym}^{AB} + (1/(m(m−1))) Π_{Asy}^{AB} .        (13.141)
The negativity of a bipartite state ρ ∈ D(AB) is defined as

N(ρ^{AB}) := (∥ρ^Γ∥_1 − 1)/2 .        (13.149)

Remark. We chose as a convention for this book that ρ^Γ represents the partial transpose on Bob's side. This convention does not affect the definition above, since the partial transpose on Alice's side is the (full) transpose of ρ^Γ and the trace norm is invariant under transposition. Therefore, the definition of the negativity above does not depend on whether the partial transpose is taken on Bob's side or on Alice's side.
Note that ρ^Γ has trace one, since the partial transpose does not affect the trace. Therefore, if λ_1, . . . , λ_n are the eigenvalues of ρ^Γ then they sum to one. Suppose, without loss of generality, that the first k eigenvalues of ρ^Γ are non-negative and the remaining n − k are negative. We then get

∥ρ^Γ∥_1 = Σ_{x∈[n]} |λ_x| = Σ_{x∈[k]} λ_x − Σ_{x=k+1}^{n} λ_x
        = 1 − 2 Σ_{x=k+1}^{n} λ_x ,        (13.151)

where we used Σ_{x∈[n]} λ_x = 1. That is, the negativity of ρ^{AB} is the absolute value of the sum of all the negative eigenvalues of ρ^Γ. We can therefore express it also as

N(ρ^{AB}) = Tr[ρ^Γ_−] ,        (13.153)

where ρ^Γ_− := (ρ^Γ)_− is the negative part of ρ^Γ. Note that this also demonstrates that the negativity is zero on separable states.
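Since the negativity only requires the spectrum of ρ^Γ, it is one of the simplest entanglement measures to compute. The sketch below (not part of the text; it assumes NumPy, with illustrative helper names) evaluates (13.149) and the quantity log∥ρ^Γ∥_1, which is discussed below as the logarithmic negativity:

    import numpy as np

    def partial_transpose_B(rho, dA, dB):
        # Partial transpose on Bob's subsystem.
        r = rho.reshape(dA, dB, dA, dB)
        return r.transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)

    def negativity(rho, dA, dB):
        # N(rho) = (||rho^Gamma||_1 - 1)/2, Eqs. (13.149)/(13.153).
        evals = np.linalg.eigvalsh(partial_transpose_B(rho, dA, dB))
        return float(np.sum(np.abs(evals)) - 1) / 2

    def log_negativity(rho, dA, dB):
        # log2 ||rho^Gamma||_1 = log2(2 N(rho) + 1).
        return np.log2(2 * negativity(rho, dA, dB) + 1)

    phi = np.zeros(4); phi[0] = phi[3] = 1 / np.sqrt(2)
    rho = np.outer(phi, phi)            # maximally entangled two-qubit state
    print(negativity(rho, 2, 2))        # -> 0.5
    print(log_negativity(rho, 2, 2))    # -> 1.0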
Exercise 13.2.24. Let ρ ∈ D(AB). Show that there exist density matrices ρ_+, ρ_− ∈ D(AB) such that ρ_+ ρ_− = ρ_− ρ_+ = 0 and

ρ^Γ = (1 + N(ρ^{AB})) ρ_+^{AB} − N(ρ^{AB}) ρ_−^{AB} .        (13.154)

The decomposition (13.154) of ρ^Γ in the exercise above is optimal in the following sense. Suppose there exist σ, τ ∈ D(AB) and a ⩾ 0 such that

ρ^Γ = (1 + a)σ^{AB} − aτ^{AB} .        (13.155)

Denoting by Π_− the projection onto the strictly negative eigenspace of ρ^Γ, so that Tr[Π_− ρ^Γ] = −N(ρ^{AB}), we get

−N(ρ^{AB}) = Tr[Π_− ((1 + a)σ^{AB} − aτ^{AB})]
           ⩾ −a Tr[Π_− τ^{AB}]        (13.157)
           ⩾ −a .

Hence, we must have a ⩾ N(ρ^{AB}). In other words, we can express the negativity of ρ^{AB} as

N(ρ^{AB}) = inf { a ∈ R : ∃ σ, τ ∈ D(AB) s.t. ρ^Γ = (1 + a)σ^{AB} − aτ^{AB} } .        (13.158)
Proof. To prove the strong monotonicity property, let {E_x}_{x∈[m]} be a quantum instrument on Alice's system, with each E_x ∈ CP(A → A′) being trace non-increasing and Σ_{x∈[m]} E_x ∈ CPTP(A → A′). For each x ∈ [m], denote ρ_x^{A′B} := (1/p_x) E_x^{A→A′}(ρ^{AB}), where p_x := Tr[E_x^{A→A′}(ρ^{AB})]. Finally, set ν := N(ρ^{AB}). By definition we have

ρ_x^Γ = (1/p_x) (E_x^{A→A′}(ρ^{AB}))^Γ
   partial transpose acts on Bob's side → = (1/p_x) E_x^{A→A′}(ρ^Γ)        (13.159)
   (13.154)→ = ((1 + ν)/p_x) E_x^{A→A′}(ρ_+^{AB}) − (ν/p_x) E_x^{A→A′}(ρ_−^{AB}) .
Since the above decomposition of ρ_x^Γ is not necessarily optimal (in the sense of (13.158)), we must have

N(ρ_x^{A′B}) ⩽ (ν/p_x) Tr[E_x^{A→A′}(ρ_−^{AB})] .        (13.160)

We therefore get that

Σ_{x∈[m]} p_x N(ρ_x^{A′B}) ⩽ ν Σ_{x∈[m]} Tr[E_x^{A→A′}(ρ_−^{AB})]
                           = ν = N(ρ^{AB}) ,        (13.161)
where we used the fact that Σ_{x∈[m]} E_x is trace preserving. That is, the negativity of en-
tanglement cannot increase on average by a quantum instrument on Alice’s side. Since
the negativity is not affected if we take the partial transpose on Alice’s system (instead of
Bob’s), using similar arguments as above, we get that the negativity cannot increase on
average under any quantum instrument applied on Bob’s side. We therefore conclude that
the negativity satisfies the strong monotonicity condition of an entanglement monotone. It
is left to show that the negativity is convex.
Let {p_x, ρ_x^{AB}}_{x∈[m]} be an ensemble of density matrices in D(AB). Then, by definition,

N(Σ_{x∈[m]} p_x ρ_x^{AB}) = ½ ∥(Σ_{x∈[m]} p_x ρ_x)^Γ∥_1 − ½
   = ½ ∥Σ_{x∈[m]} p_x ρ_x^Γ∥_1 − ½        (13.162)
   ⩽ ½ Σ_{x∈[m]} p_x ∥ρ_x^Γ∥_1 − ½ = Σ_{x∈[m]} p_x N(ρ_x^{AB}) .

This establishes the convexity of the negativity and completes the proof.
Exercise 13.2.25. Show that the negativity of a pure bipartite state ψ ∈ Pure(AB), with m := |A| = |B| and Schmidt coefficients {p_x}_{x∈[m]}, is given by

N(ψ^{AB}) = Σ_{x<y; x,y∈[m]} √(p_x p_y) .        (13.163)

The logarithmic negativity is defined for all ρ ∈ D(AB) as

LN(ρ^{AB}) = log ∥ρ^Γ∥_1 .        (13.164)

Note that the logarithmic negativity can be expressed as a function of the negativity, namely,

LN(ρ^{AB}) = log(2 N(ρ^{AB}) + 1) .        (13.165)
Therefore, the logarithmic negativity is a measure of entanglement, since the negativity is an entanglement monotone. On the other hand, the logarithmic negativity is not itself an entanglement monotone; in particular, it is in general not convex (Exercise 13.2.26).
The logarithmic negativity is additive under tensor products. To see why, let ρ ∈ D(AB) and σ ∈ D(A′B′). Then,

LN(ρ^{AB} ⊗ σ^{A′B′}) = log ∥(ρ ⊗ σ)^Γ∥_1 = log ∥ρ^Γ ⊗ σ^Γ∥_1        (13.166)
   = log(∥ρ^Γ∥_1 ∥σ^Γ∥_1) = log ∥ρ^Γ∥_1 + log ∥σ^Γ∥_1
   = LN(ρ^{AB}) + LN(σ^{A′B′}) .
We will see later on that the logarithmic negativity provides an upper bound to the distillable
entanglement.
Exercise 13.2.26. Show that the logarithmic negativity is not convex.
The κ-Entanglement

The κ-entanglement is another measure of entanglement that is based on the partial transpose. In Sec. 13.9 we will see that the regularized version of this measure has an operational meaning as the zero-error entanglement cost under PPT operations. The κ-entanglement is defined for all ρ ∈ D(AB) as

E_κ(ρ^{AB}) := log min { Tr[Λ] : −Λ^Γ ⩽ ρ^Γ ⩽ Λ^Γ , Λ ∈ Pos(AB) } .

In Sec. 13.9 we will see that E_κ behaves monotonically under a set of operations that is larger than LOCC. Moreover, if ρ ∈ PPT(AB) then we can take Λ = ρ in the optimization above, so that E_κ vanishes on PPT states.

Remark. From the lemma above it follows that the κ-entanglement equals the logarithmic negativity if

|ρ^Γ|^Γ ⩾ 0 .        (13.169)

To see why, note that in this case the state ρ_⋆ := |ρ^Γ|/∥ρ^Γ∥_1 ∈ PPT(AB), so by taking σ = ρ_⋆ we get that the upper bound satisfies

min_{σ∈PPT(AB)} D_max(|ρ^Γ| ∥ σ) ⩽ D_max(|ρ^Γ| ∥ ρ_⋆)
   Exercise 13.2.27→ = LN(ρ) .        (13.170)
Tr[Λ] = Tr[Λ^Γ] = Tr[Π_+ Λ^Γ Π_+] + Tr[Π_− Λ^Γ Π_−]        (13.174)
   (13.173)→ ⩾ 1 + 2N(ρ) = ∥ρ^Γ∥_1 .

Since the above inequality holds for all Λ ∈ Pos(AB) that satisfy −Λ^Γ ⩽ ρ^Γ ⩽ Λ^Γ, we conclude that the lower bound in (13.168) must hold.

To get an upper bound, observe that for all ρ ∈ D(AB)

E_κ(ρ) ⩽ min_{Λ∈Pos(AB)} { log Tr[Λ] : Λ^Γ ⩾ |ρ^Γ| }
   Λ^Γ = tσ → = min_{σ∈PPT(AB)} { log(t) : tσ ⩾ |ρ^Γ| }        (13.175)
This function quantifies the total (i.e. both quantum and classical) amount of correlation between Alice and Bob (see Fig. 13.2a). An extension of this quantity, known as the conditional mutual information (CMI), is a function on a tripartite density matrix defined by

I(A : B|R)_ρ := H(A|R)_ρ − H(A|BR)_ρ .

Since the CMI is defined with respect to the conditional von Neumann entropy, it can also be expressed for all ρ ∈ D(ABR) as

I(A : B|R)_ρ = H(AR)_ρ + H(BR)_ρ − H(ABR)_ρ − H(R)_ρ .
2. Show that

I(AA′ : B|R)_ρ ⩾ I(A : B|R)_ρ .        (13.181)

That is, tracing out a local subsystem cannot increase the CMI.
Exercise 13.2.30. Show that for any state of the form

σ^{ABRA′} = Σ_{x∈[n]} p_x σ_x^{ABR} ⊗ |x⟩⟨x|^{A′} ,        (13.182)

we have that

I(A : B|RA′)_σ = Σ_{x∈[n]} p_x I(A : B|R)_{σ_x} .        (13.183)
Figure 13.2: Venn Diagrams. (a) The intersection area (white) illustrates the mutual information.
(b) The white area illustrates the conditional mutual information.
The mutual information measures the overall correlations between Alice and Bob, and
only takes on the value zero for product states. However, it does not necessarily vanish for
separable states that are not products. In contrast, the CMI can be zero even for separable
states. For instance, consider the state

ρ^{ABR} := Σ_{x∈[n]} p_x ρ_x^A ⊗ ρ_x^B ⊗ |x⟩⟨x|^R ,        (13.184)

where |x⟩⟨x|^R represents a pure state in a third system R that depends on the discrete variable x. Although ρ^{AB} is a separable state, its CMI is zero. This is because knowing the value of x through system R allows Alice and Bob to share the product state ρ_x^A ⊗ ρ_x^B. The state ρ^{ABR} above belongs to a special type of states known as quantum Markov states.
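The vanishing of the CMI on states of the form (13.184) is easy to verify numerically from the entropic expression I(A:B|R) = H(AR) + H(BR) − H(ABR) − H(R). The sketch below (not part of the text; it assumes NumPy, and the helper names are illustrative) computes the CMI of a small quantum Markov state:

    import numpy as np

    def entropy(rho):
        ev = np.linalg.eigvalsh(rho)
        ev = ev[ev > 1e-12]
        return float(-np.sum(ev * np.log2(ev)))

    def ptrace(rho, dims, keep):
        # Partial trace keeping the subsystems listed in `keep`.
        n = len(dims)
        r = rho.reshape(dims + dims)
        for sub in sorted(set(range(n)) - set(keep), reverse=True):
            r = np.trace(r, axis1=sub, axis2=sub + r.ndim // 2)
        d = int(np.prod([dims[k] for k in keep]))
        return r.reshape(d, d)

    def cmi(rho_abr, dims):
        # I(A:B|R) = H(AR) + H(BR) - H(ABR) - H(R), dims = [dA, dB, dR].
        return (entropy(ptrace(rho_abr, dims, [0, 2]))
                + entropy(ptrace(rho_abr, dims, [1, 2]))
                - entropy(rho_abr)
                - entropy(ptrace(rho_abr, dims, [2])))

    # A state of the form (13.184): rho_x^A (x) rho_x^B (x) |x><x|^R.
    rho0 = np.diag([1.0, 0.0]); rho1 = np.diag([0.3, 0.7])
    rho = 0.5 * np.kron(np.kron(rho0, rho1), np.diag([1.0, 0.0])) \
        + 0.5 * np.kron(np.kron(rho1, rho0), np.diag([0.0, 1.0]))
    print(cmi(rho, [2, 2, 2]))   # ≈ 0 for this quantum Markov state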
Quantum Markov states are a type of quantum state that exhibit a special type of corre-
lation structure between different subsystems. In a quantum Markov state, the correlation
between two subsystems is entirely mediated by a third subsystem R, which serves as a kind
of “bridge” or mediator between A and B. This correlation structure is analogous to the
Markov property in classical probability theory, where the future state of a system depends
only on its present state and not on its past states. A quantum Markov state ρ ∈ D(ABR)
is defined as follows.
Exercise 13.2.31.
Remark: The converse of this statement is also true! That is, any density matrix
ρ ∈ D(ABR) with zero CMI is necessarily a quantum Markov state.
The exercise above demonstrates that if ρ^{AB} is a separable state then it has a tripartite extension ρ^{ABR} as in (13.184) for which the conditional mutual information is zero. This observation motivates the following definition of a measure of entanglement known as the squashed entanglement.
For pure states, the squashed entanglement reduces to the entropy of entanglement:

E_sq(ψ^{AB}) = H(A)_ψ ,        (13.188)

where H(A)_ψ is the von Neumann entropy of the reduced density matrix on system A.
Proof. Let ρ ∈ D(AB) and let ω^{ABR} be an extension of ρ^{AB}; i.e. ω^{AB} = ρ^{AB}. Suppose Alice applies on her system A a quantum instrument E ∈ CPTP(A → A′A′′) of the form

E^{A→A′A′′} = Σ_{x∈[n]} E_x^{A→A′} ⊗ |x⟩⟨x|^{A′′} .        (13.189)

Then, the state of the composite system ABR after Alice's measurement is given by

σ^{A′A′′BR} := E^{A→A′A′′}(ω^{ABR}) .        (13.190)

From Stinespring's dilation theorem there exists an isometry V : A → A′A′′E such that

σ^{A′A′′BR} = Tr_E[V ω^{ABR} V^*] .        (13.191)

Since the CMI is invariant under local isometries,

½ I(A : B|R)_ω = ½ I(A′A′′E : B|R)_{VωV^*} .        (13.192)

Combining the equality above with the fact that tracing out a local subsystem cannot increase the CMI (see (13.181)), we get by tracing out system E

½ I(A : B|R)_ω ⩾ ½ I(A′A′′ : B|R)_σ .        (13.193)

Combining this with the chain rule (13.180) gives

½ I(A : B|R)_ω ⩾ ½ I(A′′ : B|R)_σ + ½ I(A′ : B|RA′′)_σ
   I(A′′ : B|R)_σ ⩾ 0 → ⩾ ½ I(A′ : B|RA′′)_σ        (13.194)
   Exercise 13.2.30→ = ½ Σ_{x∈[n]} p_x I(A′ : B|R)_{σ_x} ,

where p_x := Tr[E_x^{A→A′}(ω^{ABR})] and σ_x^{A′BR} := (1/p_x) E_x^{A→A′}(ω^{ABR}). Finally, by definition, for any x ∈ [n] we have ½ I(A′ : B|R)_{σ_x} ⩾ E_sq(σ_x^{A′B}), so that the inequality above gives

½ I(A : B|R)_ω ⩾ Σ_{x∈[n]} p_x E_sq(σ_x^{A′B}) .        (13.195)
In other words, the squashed entanglement does not increase on average under any local quan-
tum instrument on system A. From symmetry, the same holds for any quantum instrument
on Bob’s system. Therefore, the squashed entanglement satisfies the strong monotonicity
property of an entanglement monotone.
To prove the convexity of E_sq, let ρ_1, ρ_2 ∈ D(AB) and t ∈ [0, 1]. Let ω_1^{ABR} and ω_2^{ABR} be extensions of ρ_1^{AB} and ρ_2^{AB}, respectively. Note that we can always assume that these extensions have the same reference system R, as otherwise we embed the lower-dimensional reference system in the higher-dimensional one. Finally, let R′ be a qubit system and denote

ω^{ABRR′} := t ω_1^{ABR} ⊗ |0⟩⟨0|^{R′} + (1 − t) ω_2^{ABR} ⊗ |1⟩⟨1|^{R′} .        (13.197)

Since ω^{AB} = t ρ_1^{AB} + (1 − t) ρ_2^{AB}, the state ω^{ABRR′} is an extension of this mixture, and by Exercise 13.2.30,

E_sq(t ρ_1^{AB} + (1 − t) ρ_2^{AB}) ⩽ ½ I(A : B|RR′)_ω = t · ½ I(A : B|R)_{ω_1} + (1 − t) · ½ I(A : B|R)_{ω_2} .

Since the extensions ω_1^{ABR} and ω_2^{ABR} were arbitrary, we conclude that E_sq is convex.
with equality if ρ^{AA′BB′} = ρ^{AB} ⊗ ρ^{A′B′}.

Proof. Let ω ∈ D(AA′BB′R) be a quantum extension of ρ^{AA′BB′}. Then, applying the chain rule in (13.180) once with respect to Alice's systems and once with respect to Bob's systems yields the stated inequality. Combining this with (13.200) gives equality in the product case. This completes the proof.
Exercise 13.2.33. Prove (13.202) and (13.204).
The most general one-way LOCC operation that Alice and Bob can perform is for Alice to apply a quantum instrument {E_x}_{x∈[m]}, with E_x ∈ CP(A → A′) and Σ_{x∈[m]} E_x ∈ CPTP(A → A′), send the outcome x to Bob, who then applies a quantum channel F_x ∈ CPTP(B → B′) that depends on the outcome x received from Alice. The overall operation can be described by the quantum channel

N^{AB→A′B′} := Σ_{x∈[m]} E_x^{A→A′} ⊗ F_x^{B→B′} ,        (13.207)

where the second equality is due to the duality relation of the conditional von Neumann entropy given in (7.158). Moreover, the coherent information of entanglement of the state ρ^{AB} is defined as

E_→(ρ^{AB}) := sup_{E∈CPTP(A→AX)} I(A⟩BX)_{E(ρ)} ,        (13.209)

where the supremum is also over all finite dimensions of the classical system X.

2. Show that the coherent information is convex. That is, prove that

I(A⟩B)_ρ ⩽ Σ_{x∈[m]} p_x I(A⟩B)_{ρ_x} .        (13.211)

Hint: Either use the DPI directly, or recall that a single channel on Bob's system is a conditionally mixing operation.

Exercise 13.2.35. Show that the supremum in (13.209) can be restricted to quantum channels of the form E^{A→AX} = Σ_{x∈[n]} E_x^{A→A} ⊗ |x⟩⟨x|^X, where each E_x^{A→A} is a CP map with a single Kraus operator. Hint: Use the joint convexity of the relative entropy.
with the supremum extending over all dimensions of system A′. To understand this, first consider that if |A′| ⩽ |A|, every channel E ∈ CPTP(A → A′X) can be embedded in CPTP(A → AX), as the coherent information is invariant under local isometries (a property shared by all conditional entropies). Consequently, in this case, the supremum over CPTP(A → AX) is at least as great as that over CPTP(A → A′X).

Conversely, if |A′| > |A|, consider a quantum instrument E^{A→A′X} = Σ_{x∈[n]} E_x^{A→A′} ⊗ |x⟩⟨x|^X, where each E_x^{A→A′}(·) = M_x(·)M_x^* is a CP map with a single Kraus operator M_x : A → A′. Through the polar decomposition, each M_x can be written as M_x = V_x N_x, with each N_x : A → A being part of a generalized measurement, and each V_x : A → A′ an isometry. Due to the invariance of the coherent information under isometries, the CP maps E_x^{A→A′}(·) = M_x(·)M_x^* can be substituted with N_x^{A→A}(·) = N_x(·)N_x^*, allowing the optimization over all channels in CPTP(A → A′X) to be replaced with an optimization over all quantum instruments in CPTP(A → AX).
This observation is significant as it can be used to prove that the coherent information
of entanglement exhibits monotonic behavior under one-way LOCC.
where we replaced E ◦ N with an arbitrary M ∈ LOCC_1(AB → A′B′X). Now, recall that every element of LOCC_1(AB → A′B′X) can be expressed as

M^{AB→A′B′X} := Σ_{y∈[n]} E_y^{A→A′X} ⊗ F_y^{B→B′} ,        (13.216)

with each E_y^{A→A′X} being a CP map such that Σ_{y∈[n]} E_y ∈ CPTP(A → A′X), and each F_y ∈ CPTP(B → B′). Thus,

M^{AB→A′B′X}(ρ^{AB}) = Σ_{y∈[n]} q_y F_y^{B→B′}(σ_y^{A′BX}) ,        (13.217)

where σ_y^{A′BX} := (1/q_y) E_y^{A→A′X}(ρ^{AB}) and q_y := Tr[E_y^{A→A′X}(ρ^{AB})]. Combining this with the convexity of the coherent information (see (13.211)), we get from (13.215) that

E_→(N^{AB→A′B′}(ρ^{AB})) ⩽ sup_{M∈LOCC_1} Σ_{y∈[n]} q_y I(A′⟩B′X)_{F_y(σ_y)}
   cf. (13.212)→ ⩽ sup_{M∈LOCC_1} Σ_{y∈[n]} q_y I(A′⟩BX)_{σ_y} .        (13.218)

Finally, denoting Z := XY and E^{A→A′Z} := Σ_{y∈[n]} E_y^{A→A′X} ⊗ |y⟩⟨y|^Y, we conclude that

E_→(N^{AB→A′B′}(ρ^{AB})) ⩽ sup_{E∈CPTP(A→A′Z)} I(A′⟩BZ)_{E(ρ)}        (13.219)
   (13.213)→ = E_→(ρ^{AB}) .
Exercise 13.2.37. Show that E→ is an entanglement monotone under LOCC1 . That is,
prove the strong monotonicity property and convexity.
Exercise 13.2.38. Let Φ_m ∈ D(AB) be the maximally entangled state with m := |A| = |B|. Show that

E_→(Φ_m^{AB}) = log(m) .        (13.220)

The coherent information of entanglement is superadditive. That is, for any ρ ∈ D(AB) and σ ∈ D(A′B′) we have (see Exercise 13.2.39)

E_→(ρ^{AB} ⊗ σ^{A′B′}) ⩾ E_→(ρ^{AB}) + E_→(σ^{A′B′}) .        (13.221)
exists. We will see in the next section that this regularized coherent information of entanglement has an operational meaning as the one-way distillable entanglement of ρ^{AB}.
Computing the above quantity in general is a highly challenging task, so we often rely on
establishing lower and upper bounds. In this section, we will narrow our focus to the special
cases where either ρ or σ is maximally entangled. Recall that these cases are particularly
relevant for calculating entanglement distillation and entanglement cost.
Remark. In Sec. 12.5.1, we explored various conversion distances among pure bipartite states.
It was established that the P⋆ -conversion distance is equal to the P -conversion distance.
Additionally, we speculated, albeit without formal proof, that the T -conversion distance
might be strictly smaller than the P -conversion distance. The lemma above confirms this
speculation by demonstrating that, when the target state is Φm , the T -conversion distance
actually aligns with the P 2 -conversion distance, which is indeed strictly smaller than the P -
conversion distance. This outcome is based on the understanding that the purified distance
is no greater than one; thus, squaring it effectively reduces its magnitude.
Proof. Let G ∈ LOCC(A′B′ → A′B′) be the twirling map given in (3.251). That is, for any ω ∈ D(A′B′)

G(ω) := ∫_{U(m)} dU (U ⊗ Ū) ω (U ⊗ Ū)^*
   (3.255)→ = (1 − Tr[Φ_m ω]) τ + Tr[Φ_m ω] Φ_m ,        (13.225)

where τ ∈ D(A′B′) is given by τ = (I − Φ_m)/(m² − 1). In particular, observe that for all ω ∈ D(A′B′)

½ ∥G(ω) − Φ_m∥_1 = (1 − Tr[Φ_m ω]) ½ ∥τ − Φ_m∥_1 = 1 − Tr[Φ_m ω] ,        (13.226)

where the last equality follows from the fact that τ Φ_m = Φ_m τ = 0. From the DPI of the trace distance, and the invariance of Φ_m under the twirling map G, it follows that for all N ∈ LOCC(AB → A′B′) and all ρ ∈ D(AB)

½ ∥N(ρ) − Φ_m∥_1 ⩾ ½ ∥G ◦ N(ρ) − Φ_m∥_1 .        (13.227)

Since G ◦ N is also an LOCC channel, it follows from the inequality above that the conversion distance can be expressed as

T(ρ −LOCC→ Φ_m) = inf_{N∈LOCC} ½ ∥G ◦ N(ρ) − Φ_m∥_1
   (13.226)→ = 1 − sup_{N∈LOCC} Tr[Φ_m N(ρ)] .        (13.228)
Exercise 13.3.1. Let k := |A| = |B|, m := |A′| = |B′|, and ρ ∈ D(AB). Show that

T(ρ^{AB} −LOCC→ Φ_m^{A′B′}) ⩾ 1 − k/m .        (13.229)

Note that this bound is non-trivial for k < m. Hint: Estimate T(Φ_k^{AB} −LOCC→ Φ_m^{A′B′}).
where the supremum is over all N ∈ LOCC_1(AB → A′B′). It is evident that since LOCC_1 is a subset of LOCC, we have the following inequality for all ρ ∈ D(AB):

T(ρ −LOCC→ Φ_m) ⩽ T(ρ −LOCC_1→ Φ_m) .        (13.231)
Given that one-way LOCC is significantly easier to characterize than LOCC, we can rep-
LOCC1
resent the conversion distance T (ρ −−−→ Φm ) as an optimization problem over quantum
instruments (see the lemma below). This simplification is advantageous because it reduces
the complexity involved in the calculation and allows for a more straightforward analysis
of the conversion distance. By focusing on one-way LOCC, we limit the operations to a
sequence where one party, say Alice, performs a quantum operation and communicates the
outcome classically to the other party (Bob), who then performs a quantum operation based
on that information. This constraint narrows down the set of operations to be considered
in the optimization problem, making the task of determining the conversion distance more
manageable and conceptually clearer.
In the following lemma we relate the conversion distance under one-way LOCC to the optimized conditional min-entropy H_min^↑ as defined in (7.143). The relation will be given in terms of the function

Q_min(A′|BX)_τ := 2^{−H_min^↑(A′|BX)_τ}   ∀ τ ∈ D(A′BX) ,        (13.232)

where we extend the definition of Q_min to subnormalized states such that for any σ ∈ D_⩽(A′B)

Q_min(A′|B)_σ := 2^{−H_min^↑(A′|B)_σ}
              := min { Tr[Λ^B] : I^{A′} ⊗ Λ^B ⩾ σ^{A′B} , Λ ∈ Pos(B) } .        (13.234)
Remark. Note that Q_min(A′|BX)_{E(ρ)} depends on m, as m := |A′|. Replacing CPTP(A → A′X) in (13.235) with CPTP(A → AX), we obtain a lower bound (see Exercise 13.3.3):

T(ρ −LOCC_1→ Φ_m) ⩾ 1 − (1/m) sup_{E∈CPTP(A→AX)} Q_min(A|BX)_{E(ρ)} .        (13.236)

Exercise 13.3.3. Prove (13.236). Hint: Use the property that any conditional entropy is invariant under local isometries (particularly on Alice's system).
Building on Lemma 13.3.2 and Exercise 13.3.2, the conversion distance can be re-expressed as follows:

T(ρ −LOCC_1→ Φ_m) = 1 − (1/m) sup_{E∈CPTP(A→A′X)} Σ_{x∈[k]} Q_min(A′|B)_{E_x(ρ)} .        (13.239)

Combining this with the relation P² = 1 − F² between the purified distance and the fidelity, we conclude that (13.235) can also be rewritten as

T(ρ^{AB} −LOCC_1→ Φ_m) = inf_{{E_x}} min_{τ∈D(E)} Σ_{x∈[k]} P²(u^{A′} ⊗ τ^E , E_x^{A→A′}(ρ^{AE})) ,        (13.241)

where the infimum is over all k ∈ N and all quantum instruments {E_x^{A→A′}}_{x∈[k]}. We will use this form of the conversion distance to get the following upper bound.
Upper Bound
The main result of this subsection is the following upper bound on the right-hand side of
the equation above.
Upper Bound

Theorem 13.3.1. Let ρ ∈ Pure(ABE) and m ∈ N. Then,

T(ρ^{AB} −LOCC_1→ Φ_m) ⩽ √m · 2^{−½ H̃_2^↑(A|E)_ρ} .        (13.242)

Remark. When m ⩾ |A|, the upper bound above is trivial, since in this case

√m · 2^{−½ H̃_2^↑(A|E)_ρ} = 2^{½(log(m) − H̃_2^↑(A|E)_ρ)} ⩾ 1 ,        (13.243)

where we used the fact that H̃_2^↑(A|E)_ρ ⩽ log |A| ⩽ log(m). However, as we will soon see, this upper bound is very useful when |A| > m. Specifically, we will use it to derive a tight lower bound on the distillable entanglement.
Proof. We get the upper bound on the conversion distance in two stages. First, by taking τ^E = ρ^E in (13.241) we get

T(ρ^{AB} −LOCC_1→ Φ_m^{A′B′}) ⩽ inf_{{E_x}} Σ_{x∈[k]} P²(u^{A′} ⊗ ρ^E , E_x^{A→A′}(ρ^{AE})) .        (13.244)
Second, we replace the infimum above over all quantum instruments {E_x}_{x∈[k]} with a specific choice of a quantum instrument to get a simpler upper bound. We will denote n := |A| and assume that m ⩽ n (see the remark above).

Observe that the expression in (13.244) has a form that is somewhat similar to the decoupling theorem studied in Sec. 7.7. Therefore, our strategy is to choose {E_x}_{x∈[k]} in such a way that we will be able to use the upper bound given in the decoupling theorem. For this purpose, recall that the twirling operation G ∈ CPTP(AÃ → AÃ) as defined in (7.199) can be expressed as a finite convex combination of product unitary channels as given in (7.211). With these k ∈ N, p ∈ Prob(k), and {U_x}_{x∈[k]} ⊂ U(A), we define

E_x^{A→A′} := p_x N^{A→A′} ◦ U_x^{A→A} ,        (13.245)

where U_x(·) = U_x(·)U_x^*, and N^{A→A′}(·) := (n/m) V^*(·)V, where V : A′ → A is some isometry.

We now discuss the properties of the set {E_x^{A→A′}}_{x∈[k]}. First, observe that by definition, the channel

R^{A→A} := Σ_{x∈[k]} p_x U_x^{A→A}        (13.246)

corresponds to the completely randomizing channel that outputs the maximally mixed state irrespective of the input state. This follows from the fact that both (7.199) and (7.211) correspond to the same twirling channel, so their marginal channels are also the same (see (3.242)). With this at hand, we get that

E^{A→A′} := Σ_{x∈[k]} E_x^{A→A′} = N^{A→A′} ◦ R^{A→A} .        (13.247)
Next, we argue that E^{A→A′} is trace preserving, so that {E_x^{A→A′}}_{x∈[k]} as defined above is indeed a quantum instrument. To see it, observe that for all ω ∈ L(A) we have

Tr[E^{A→A′}(ω^A)] = Tr[N^{A→A′} ◦ R^{A→A}(ω^A)]
   R(ω^A) = Tr[ω^A] u^A → = Tr[ω^A] Tr[N^{A→A′}(u^A)]
   N(u^A) = (1/n) N(I^A) = (1/m) V^*V → = (1/m) Tr[ω^A] Tr[V^*V]        (13.248)
   = Tr[ω^A] ,

where in the last line we used the fact that Tr[V^*V] = Tr[VV^*] = m, since VV^* is a projection of rank m = |A′| (recall V : A′ → A is an isometry).
Therefore, with this choice of quantum instrument, Eq. (13.244) becomes

T(ρ^{AB} −LOCC_1→ Φ_m^{A′B′}) ⩽ Σ_{x∈[k]} p_x P²(u^{A′} ⊗ ρ^E , N^{A→A′} ◦ U_x^{A→A}(ρ^{AE}))
   P²(ρ, σ) ⩽ ∥ρ − σ∥_1 → ⩽ Σ_{x∈[k]} p_x ∥N^{A→A′}(U_x^A ρ^{AE} U_x^{A*}) − u^{A′} ⊗ ρ^E∥_1 ,        (13.249)

where the last line follows from (5.202). Finally, to apply the decoupling theorem as given in (7.221), we define

τ^{AA′} := (1/n) J_N^{AA′} = (1/n) N^{Ã→A′}(Ω^{AÃ}) = (1/m)(I^A ⊗ V^*) Ω^{AÃ} (I^A ⊗ V) .        (13.250)

Note in particular that the marginal τ^{A′} = u^{A′}, so that the right-hand side of Eq. (13.249) has the exact same form as given on the left-hand side of (7.221) (in the decoupling theorem). Hence, we can apply the decoupling theorem to get

T(ρ^{AB} −LOCC_1→ Φ_m^{A′B′}) ⩽ 2^{−½(H̃_2^↑(A|E)_ρ + H̃_2^↑(A|A′)_τ)} .        (13.251)

Moreover, since τ^{AA′} is maximally entangled we get that H̃_2^↑(A|A′)_τ = −log(m). Substituting this into the previous equation gives (13.242). This completes the proof.
In what follows we take the source state σ^{A′B′} to be the maximally entangled state Φ_m.

Exercise 13.3.4. Show that for any ρ ∈ D(AB) and any σ ∈ D(A′B′) we have

T²(σ −LOCC→ ρ) ⩽ P²(σ −LOCC→ ρ) ⩽ 2 T(σ −LOCC→ ρ) .        (13.254)
Proof. Recall first that for the case that ρ^{AB} is a pure state, the theorem follows from Corollary 12.5.1. We therefore need to generalize this result to the case that ρ^{AB} is a mixed state. We start by showing that

{ E(Φ_m) : E ∈ LOCC(A′B′ → AB) } = { ω ∈ D(AB) : SR(ω^{AB}) ⩽ m } ,        (13.256)

where SR(ω^{AB}) is the Schmidt rank as defined in (13.120). Indeed, since the Schmidt rank SR as defined in (13.120) is a measure of entanglement, it follows that for ω^{AB} = E(Φ_m)

SR(ω^{AB}) ⩽ SR(Φ_m) = m .        (13.257)

Therefore, the left-hand side of (13.256) is contained in the right-hand side. On the other hand, from Exercise 13.2.14 it follows that every state ω^{AB} with Schmidt rank no greater than m has a pure-state decomposition with all states having Schmidt rank no greater than m. As a consequence of Nielsen's theorem, such a state ω^{AB} can be generated by LOCC from Φ_m. That is, the right-hand side of (13.256) is contained in the left-hand side. This completes the proof of the equality in (13.256).
Let E be a purifying system of dimension n := |E| ⩽ |AB|. From the equivalence of the two sets in (13.256) we get

max_{E∈LOCC(A′B′→AB)} F²(E(Φ_m), ρ^{AB}) = max_{ω∈D(AB), SR(ω)⩽m} F²(ω^{AB}, ρ^{AB})
   Uhlmann's theorem→ = max_{ψ,ϕ∈Pure(ABE); SR(ϕ^{AB})⩽m, ψ^{AB}=ρ^{AB}} |⟨ϕ^{ABE}|ψ^{ABE}⟩|² .        (13.258)
Let {|x⟩^E}_{x∈[n]} be a fixed orthonormal basis of the purifying system E, and observe that every purification ψ^{ABE} of ρ^{AB} can be expressed as

|ψ^{ABE}⟩ := Σ_{x∈[n]} √p_x |ψ_x^{AB}⟩|x⟩^E ,        (13.259)

where {p_x, ψ_x^{AB}}_{x∈[n]} is a pure-state decomposition of ρ^{AB}. Specifically, there is a one-to-one correspondence between all purifications ψ^{ABE} of ρ^{AB} that have the form (13.259), and all pure-state decompositions {p_x, ψ_x^{AB}}_{x∈[n]} of ρ^{AB} = Σ_{x∈[n]} p_x ψ_x^{AB}.
Similarly, for every ω^{AB} with Schmidt rank SR(ω^{AB}) ⩽ m let

|ϕ^{ABE}⟩ := Σ_{x∈[n]} √q_x |ϕ_x⟩^{AB} |x⟩^E        (13.260)

be a purification of ω^{AB} with the property that each |ϕ_x⟩^{AB} has Schmidt rank no greater than m (see Exercise 13.2.14). With this at hand, we get from (13.258) that

max_{E∈LOCC(A′B′→AB)} F²(E(Φ_m), ρ^{AB}) = max |Σ_{x∈[n]} √(q_x p_x) ⟨ϕ_x^{AB}|ψ_x^{AB}⟩|² ,        (13.261)

where the maximum on the right-hand side is over all pure-state decompositions of ρ^{AB} = Σ_x p_x ψ_x^{AB}, all probability vectors q ∈ Prob(n), and all pure states {ϕ_x^{AB}}_{x∈[n]} with Schmidt rank no greater than m. Now, from Corollary 12.5.1 it follows that (see Exercise 13.3.5) for every ψ ∈ Pure(AB)

max_{ϕ∈Pure(AB), SR(ϕ)⩽m} |⟨ϕ^{AB}|ψ^{AB}⟩|² = ∥ψ^A∥_{(m)} .        (13.262)
Combining this with (13.261) (optimizing over q via the Cauchy–Schwarz inequality and then over each ϕ_x) gives

max_{E∈LOCC(A′B′→AB)} F²(E(Φ_m), ρ^{AB}) = max Σ_{x∈[n]} p_x ∥ψ_x^A∥_{(m)} ,

where the maximum on the right-hand side stands for a maximum over all pure-state decompositions {p_x, ψ_x^{AB}}_{x∈[n]} of ρ^{AB} = Σ_{x∈[n]} p_x ψ_x^{AB}. In terms of the purified distance we have

min_{E∈LOCC(A′B′→AB)} P²(E(Φ_m), ρ^{AB}) = 1 − max Σ_{x∈[n]} p_x ∥ψ_x^A∥_{(m)}
   = min Σ_{x∈[n]} p_x (1 − ∥ψ_x^A∥_{(m)})        (13.265)
   = E_{(m)}(ρ^{AB}) ,

where both the min and max above are over all pure-state decompositions {p_x, ψ_x^{AB}}_{x∈[n]} of ρ^{AB}. This completes the proof.
Since the computation of the distillable entanglement is hard, we start with the single-shot
one-way distillable entanglement. As LOCC1 is a subset of LOCC, any lower bound on the
one-way distillable entanglement will automatically provide a lower bound on the distillable
entanglement defined above.
Exercise 13.4.1. Let k := |A| = |B|, m = |A′ | = |B ′ |, and ρ ∈ D(AB). Show that
A simple formula for the above expression is not presently available. However, we can provide
some useful lower and upper bounds.
In the following theorem, we present an upper bound on the single-shot one-way distillable
entanglement. It is worth noting that the upper bound given in (11.38) will not be helpful
in this case, as we are considering a subset of LOCC. Therefore, we can expect to obtain a
tighter upper bound, particularly since the upper bound given in (11.38) remains valid even
if we replace LOCC with non-entangling operations.
The upper bound presented in the following theorem is expressed in terms of the coherent
information of entanglement, denoted as E→ . This particular measure of entanglement has
been defined and extensively examined in Sec. 13.2.6.
Theorem 13.4.1. Let ρ ∈ D(AB) and ε ∈ (0, 1/2). Then, the one-way ε-single-shot distillable entanglement is bounded by

Distill_→^ε(ρ^{AB}) ⩽ (1/(1 − 2ε)) E_→(ρ^{AB}) + ((1 + ε)/(1 − 2ε)) h(ε/(1 + ε)) .        (13.269)

Proof. Let m ∈ N be such that Distill_→^ε(ρ^{AB}) = log m, so that T(ρ^{AB} −LOCC_1→ Φ_m^{A′B′}) ⩽ ε. This means that ρ^{AB} −LOCC_1→ σ^{A′B′} for some state σ ∈ D(A′B′) that is ε-close to Φ_m^{A′B′}. Therefore, from the monotonicity of E_→ under one-way LOCC we get that

E_→(ρ^{AB}) ⩾ E_→(σ^{A′B′}) .        (13.270)

Next, we use the fact that σ^{A′B′} is ε-close to Φ_m^{A′B′} to show that the right-hand side of the equation above cannot be much smaller than log(m). Indeed, combining the continuity property of the function I(A′⟩B′)_ρ := −H(A′|B′)_ρ (see (10.50)) with the second part of Exercise 13.2.36 gives

E_→(σ^{A′B′}) ⩾ I(A′⟩B′)_σ
   (10.50)→ ⩾ I(A′⟩B′)_{Φ_m} − 2ε log m − (1 + ε) h(ε/(1 + ε))        (13.271)
   = (1 − 2ε) log(m) − (1 + ε) h(ε/(1 + ε)) .

The proof is concluded by noting that the inequality above, in conjunction with the inequality (13.270), yields the desired inequality (13.269).
Exercise 13.4.2. Use similar lines as in the proof above to prove the following bound on the ε-single-shot distillable entanglement:

Distill^ε(ρ^{AB}) ⩽ (1/(1 − 2ε)) sup_{E∈LOCC(AB→A′B′)} I(A′⟩B′)_{E(ρ)} + ((1 + ε)/(1 − 2ε)) h(ε/(1 + ε)) .        (13.272)
In the context where the single-shot distillable entanglement (and the entanglement cost) is expressed as log(m) for some integer m ∈ N, it is useful to introduce a specific notation, ⪆, to denote a particular type of inequality between two real numbers a, b ∈ R. This notation is defined as follows:

a ⪆ b  ⟺  a ⩾ log⌊2^b⌋ .        (13.273)
This definition provides a convenient way to express inequalities that are relevant in the quantification of entanglement, especially in scenarios involving logarithmic expressions and integer values. With this in mind, our next goal is a lower bound on Distill_→^ε(ρ^{AB}). The lower bound given below is known as the single-shot hashing bound.

Theorem 13.4.2. Let ρ ∈ Pure(ABE) and ε ∈ (0, 1). Then, for every 0 < δ < ε we have

Distill_→^ε(ρ^{AB}) ⪆ H_min^δ(A|E)_ρ + log(ε − δ)² .        (13.274)
(13.274)
Remark. Observe that unlike the upper bound which is given in terms of the coherent in-
formation of entanglement, the lower bound above does not involve an optimization over
channels in CPTP(A → AX).
Proof. The main strategy of the proof is to use the upper bound given in Theorem 13.3.1 for the one-way conversion distance. Specifically, let δ ∈ (0, 1) and let ρ̃ ∈ B^δ(ρ^{AE}) be such that H_min^{↑,δ}(A|E)_ρ = H_min^↑(A|E)_{ρ̃}. With these definitions we get

√m · 2^{−½ H_min^{↑,δ}(A|E)_ρ} = √m · 2^{−½ H_min^↑(A|E)_{ρ̃}}
   H_min^↑ ⩽ H̃_2^↑ → ⩾ √m · 2^{−½ H̃_2^↑(A|E)_{ρ̃}}
   (13.242)→ ⩾ T(ρ̃^{AB} −LOCC_1→ Φ_m^{A′B′})        (13.275)
   Lemma 11.1.2→ ⩾ T(ρ^{AB} −LOCC_1→ Φ_m^{A′B′}) − δ .

That is,

T(ρ^{AB} −LOCC_1→ Φ_m^{A′B′}) ⩽ √m · 2^{−½ H_min^{↑,δ}(A|E)_ρ} + δ .        (13.276)

Therefore, for any ε ∈ (0, 1) and 0 < δ < ε we get that the one-way ε-distillable entanglement satisfies

Distill_→^ε(ρ^{AB}) := max { log m : T(ρ^{AB} −LOCC_1→ Φ_m^{A′B′}) ⩽ ε }
   (13.276)→ ⩾ max { log m : √m · 2^{−½ H_min^δ(A|E)_ρ} + δ ⩽ ε }        (13.277)
   = max { log m : log m ⩽ H_min^δ(A|E)_ρ + log(ε − δ)² }
   = log⌊(ε − δ)² 2^{H_min^δ(A|E)_ρ}⌋ .
Exercise 13.5.1. Show that for any n ∈ N and any ρ ∈ D(AB) we have

Distill(ρ) ⩾ (1/n) Distill(ρ^{⊗n})   and   Distill_→(ρ) ⩾ (1/n) Distill_→(ρ^{⊗n}) .        (13.282)

Remark. Since the distillable entanglement is always no smaller than the one-way distillable entanglement, we also have

Distill(ρ^{AB}) ⩾ I(A⟩B)_ρ .        (13.284)
Proof. Let ρ^{ABE} ∈ Pure(ABE) be a purification of ρ^{AB}, and let ε, δ ∈ (0, 1) be such that δ < ε. From the lower bound in (13.274) we get

lim inf_{n→∞} (1/n) Distill_→^ε(ρ^{⊗n}) ⩾ lim inf_{n→∞} (1/n) [ H_min^δ(A^n|E^n)_{ρ^{⊗n}} + 2 log(ε − δ) ]
   = lim inf_{n→∞} (1/n) H_min^δ(A^n|E^n)_{ρ^{⊗n}}        (13.285)
   Theorem 11.2.2→ = H(A|E)_ρ
   Duality relation (7.158)→ = −H(A|B)_ρ = I(A⟩B)_ρ .

Since the equation above holds for all ε ∈ (0, 1), it also holds if we take the limit ε → 0^+. This completes the proof.
It is worth noting that the Hashing bound reveals that the distillable entanglement is
non-zero whenever the conditional entropy of ρAB is negative. As we discussed earlier, the
conditional entropy can only be negative for entangled states, which aligns with the fact
that only entangled states can possess non-zero distillable entanglement. However, in the
upcoming sections, we will discover that the converse statement is not true. Specifically,
there exist entangled states with zero distillable entanglement.
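The hashing rate I(A⟩B)_ρ = H(B)_ρ − H(AB)_ρ is straightforward to evaluate numerically. The sketch below (not part of the text; it assumes NumPy, with illustrative helper names) computes this lower bound on the (one-way) distillable entanglement for a family of isotropic two-qubit states; the bound becomes positive only once t is large enough (around t ≈ 0.81 for two qubits):

    import numpy as np

    def entropy(rho):
        ev = np.linalg.eigvalsh(rho)
        ev = ev[ev > 1e-12]
        return float(-np.sum(ev * np.log2(ev)))

    def coherent_information(rho_ab, dA, dB):
        # I(A>B) = H(B) - H(AB) = -H(A|B); by the hashing bound this is a
        # lower bound on the (one-way) distillable entanglement.
        r = rho_ab.reshape(dA, dB, dA, dB)
        rho_b = np.trace(r, axis1=0, axis2=2)
        return entropy(rho_b) - entropy(rho_ab)

    phi = np.zeros(4); phi[0] = phi[3] = 1 / np.sqrt(2)
    bell = np.outer(phi, phi)
    for t in (0.7, 0.85, 0.95):
        rho = t * bell + (1 - t) * (np.eye(4) - bell) / 3   # isotropic state
        print(t, coherent_information(rho, 2, 2))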
Remark. It is worth noting that the theorem provides an operational interpretation for the
coherent information of entanglement as the one-way distillable entanglement.
Proof. For the direct part (i.e. achievability), observe that any quantum instrument E ∈ CPTP(A → AX) can be considered as a special type of LOCC_1 (i.e. Alice implements the instrument {E_x}_{x∈[m]} on her system and sends the outcome x to Bob). Since the single-shot one-way distillable entanglement behaves monotonically under such LOCC_1, we get that for all ε ∈ (0, 1)

Distill_→^ε(ρ^{AB}) ⩾ Distill_→^ε(σ^{ABX}) ,        (13.288)

where σ^{ABX} := E^{A→AX}(ρ^{AB}). Since for every n ∈ N the inequality above also holds with n copies of ρ and σ, we get

lim inf_{n→∞} (1/n) Distill_→^ε(ρ^{⊗n}) ⩾ lim inf_{n→∞} (1/n) Distill_→^ε(σ^{⊗n})
   (13.285)→ ⩾ I(A⟩BX)_σ .        (13.289)

Since the inequality above holds for all E ∈ CPTP(A → AX), we conclude that

lim inf_{n→∞} (1/n) Distill_→^ε(ρ^{⊗n}) ⩾ E_→(ρ^{AB}) .        (13.290)
We would like to replace the right-hand side above with the regularized version of E_→. For this purpose, fix k ∈ N and observe that by applying the above inequality to ρ^{⊗k} ∈ D(A^k B^k) we get

lim inf_{n→∞} (1/(kn)) Distill_→^ε(ρ^{⊗kn}) ⩾ (1/k) E_→(ρ^{⊗k}) .        (13.291)

We next show that the left-hand sides of the two equations above coincide. Indeed, by definition of the lim inf, the left-hand side of (13.290) is no greater than the left-hand side of (13.291). For the converse, let {n_j}_{j∈N} be a subsequence of integers such that

lim inf_{n→∞} (1/n) Distill_→^ε(ρ^{⊗n}) = lim_{j→∞} (1/n_j) Distill_→^ε(ρ^{⊗n_j}) .        (13.292)
Now, for any j ∈ N, set m_j := k⌊n_j/k⌋; i.e. m_j is the largest multiple of k that is no greater than n_j. In particular, note that n_j − k < m_j ⩽ n_j. Then,

lim inf_{n→∞} (1/(kn)) Distill_→^ε(ρ^{⊗kn}) ⩽ lim inf_{j→∞} (1/m_j) Distill_→^ε(ρ^{⊗m_j})
   m_j ⩽ n_j → ⩽ lim inf_{j→∞} (1/m_j) Distill_→^ε(ρ^{⊗n_j})
   n_j − k < m_j ⩽ n_j → = lim_{j→∞} (1/n_j) Distill_→^ε(ρ^{⊗n_j})        (13.293)
   (13.292)→ = lim inf_{n→∞} (1/n) Distill_→^ε(ρ^{⊗n}) ,

where the first inequality follows from the fact that {m_j}_{j∈N} is a subset of {kn}_{n∈N}, the second inequality from the fact that Distill_→^ε(ρ^{⊗m_j}) ⩽ Distill_→^ε(ρ^{⊗n_j}) as m_j ⩽ n_j, and the third equality from the fact that 1/m_j = (n_j/m_j)(1/n_j) and lim_{j→∞} n_j/m_j = 1 (since n_j − k < m_j ⩽ n_j). This completes the proof that the left-hand side of (13.290) equals the left-hand side of (13.291), so that

lim inf_{n→∞} (1/n) Distill_→^ε(ρ^{⊗n}) ⩾ (1/k) E_→(ρ^{⊗k}) .        (13.294)
Since the equation above holds for all k ∈ N, it must also hold in the limit k → ∞; that is, we conclude that

lim inf_{n→∞} (1/n) Distill_→^ε(ρ^{⊗n}) ⩾ E_→^reg(ρ^{AB}) .        (13.295)
For the converse inequality we apply the upper bound (13.269) with ρ^{⊗n} instead of ρ. Explicitly, observe that for any ε ∈ (0, 1)

lim sup_{n→∞} (1/n) Distill_→^ε(ρ^{⊗n}) ⩽ lim sup_{n→∞} (1/n) [ (1/(1 − 2ε)) E_→(ρ^{⊗n}) + ((1 + ε)/(1 − 2ε)) h(ε/(1 + ε)) ]
   = (1/(1 − 2ε)) E_→^reg(ρ^{AB}) .        (13.296)

Taking the limit ε → 0^+ and combining the resulting inequality with (13.295), we conclude that the equality in (13.287) holds.
In the final step of the proof just discussed, we had to consider the limit as ε approaches
0 (from the positive side). It remains unclear to the author whether this step is essential,
and whether the following equality holds true:
lim_{n→∞} (1/n) Distill_→^ε(ρ^{⊗n}) = E_→^reg(ρ^{AB})        (13.297)
for all ε within the interval (0, 1). This point raises an interesting question in the study
of quantum information theory, particularly regarding the behavior of the distillable en-
tanglement under asymptotic conditions. The uncertainty here revolves around whether
reg
the regularized entanglement measure, E→ , aligns with the distillable entanglement rate,
1 ε ⊗n
n
Distill → (ρ ), for any non-zero ε. Resolving this would contribute to a deeper understand-
ing of entanglement properties in quantum systems.
for some two-qubit entangled state σ ∈ D(A′B′) with |A′| = |B′| = 2. However, since the logarithmic negativity is additive, we get

LN(ρ^{⊗n}) = n LN(ρ^{AB}) = 0 ,        (13.300)

whereas

LN(σ^{A′B′}) > 0 ,        (13.301)

since σ^{A′B′} is a two-qubit entangled state, and from Theorem 13.1.4 it follows that it is NPT (recall that the logarithmic negativity is strictly positive on NPT states). We therefore get that

LN(ρ^{⊗n}) < LN(σ^{A′B′}) ,        (13.302)

in contradiction with (13.299) and the fact that the logarithmic negativity is a measure of entanglement and therefore cannot increase under LOCC. This completes the proof.
By applying the hashing bound, we can deduce that if H(A|B)ρ < 0, then the distillable
entanglement of ρAB is strictly positive. Combining this with the theorem mentioned above,
we can conclude that if ρ ∈ D(AB) is PPT, then its conditional entropy H(A|B)ρ ⩾ 0. This
observation is consistent with the reduction criterion discussed in Section 13.1.3, as Corol-
lary 7.3.1 and Theorem 7.3.1 show that states satisfying the reduction criterion (particularly
PPT states) have non-negative conditional entropy.
Exercise 13.5.2. Let ρ ∈ Pure(ABC) be a tripartite pure state. Show that if its marginals
satisfy I A ⊗ ρC > ρAC then Distill ρAB > 0.
Theorem 13.5.3 states that PPT entangled states have zero distillable entanglement. This
result is an important insight into the relationship between entanglement and the partial
transpose operation. However, it raises the question of whether the converse of this property
also holds. That is, are all entangled states with zero distillable entanglement (i.e., bound
entangled states) necessarily PPT? This is one of the most challenging and long-standing
open problems in quantum information theory, and despite significant efforts over the past
two decades, the answer is still unknown. Despite the current lack of a definitive answer to
this question, research in this area continues to progress, with new insights and techniques
being developed to study the properties of entangled states and their relation to the partial
transpose operation.
where Φm is the maximally entangled state in D(A′ B ′ ) with m := |A′ | = |B ′ |. Since all
metrics in finite dimensional Hilbert spaces are topologically equivalent, we chose the square
of the purified distance as it is easier to work with, and in particular, has the form given in
Theorem 13.3.2.
That is, the logarithm of the Schmidt rank of ρAB offers an operational interpretation as the
zero-error entanglement cost of ρAB . Following this, we demonstrate that SR(ρAB ) bears a
close relationship to the conditional max-entropy. To elaborate further, let’s first present an
alternative method for describing the convex roof extension.
ρ^{XAB} := Σ_{x∈[k]} p_x |x⟩⟨x|^X ⊗ ψ_x^{AB} .        (13.305)
Note that if {ψxAB }x∈[k] in the definition above were composed of mixed states instead of
pure states, then ρXAB would not necessarily qualify as a regular extension, even though it
would still be an extension of ρAB . Moreover, we show now that the Schmidt rank of ρAB can
be expressed as an optimization problem over all marginal cq-states ρXA that results from
regular extensions. Explicitly, if ρXAB is a regular extension of ρAB , as given in (13.305),
then the marginal cq-state, ρXA , has the form
ρ^{XA} = Σ_{x∈[k]} p_x |x⟩⟨x|^X ⊗ ρ_x^A ,        (13.306)

where ρ_x^A := Tr_B[ψ_x^{AB}]. By definition, SR(ψ_x^{AB}) = Tr[Π_{ρ_x}^A], where Π_{ρ_x} ∈ Pos(A) is the projection in A onto the support of ρ_x^A. Combining this with the relation (13.122), we can express the Schmidt rank of ρ^{AB} as
where the maximum is over all regular extensions ρABX of ρAB . The expression above can
be rewritten in terms of the conditional max-entropy of the state ρAX . To see this, let ΠXA
ρ
where in the last line we replaced the maximum over all τ ∈ D(X) with a maximum over all
x ∈ [k]. Using this observation in conjunction with (13.307), we can express the logarithm
of the Schmidt rank of ρAB as
where the infimum is over all classical systems X and all regular extensions ρABX of ρAB . In
the following exercise you show that we can remove the restriction to regular extensions.
Exercise 13.6.1. Show that (13.311) still holds even if we take the infimum over all classical
systems X and all extensions ρABX of ρAB .
Exercise 13.6.2. Let ρ ∈ D(AB). Show that the entanglement of formation of ρ^{AB} can be expressed as

E_F(ρ^{AB}) := inf_{ρ^{ABX}} H(A|X)_ρ ,        (13.312)

where H(A|X)_ρ is the conditional von Neumann entropy, and the infimum is over all classical systems X and all extensions ρ^{ABX} of ρ^{AB}.
where the second infimum is over all density matrices ω AB , all classical systems X, and
over all regular extensions ω ABX of ω AB . Given the complexity of the above expression,
we’ll transition to an alternative approach, specifically employing the formula presented in
Theorem 13.3.2 for the conversion distance.
Remark. Exercise 13.6.1 allows us to limit the infimum in (13.314) to regular extensions
ρABX of ρAB . Additionally, from (13.309), the smoothed version of Hmax (A|X)ρ is expressed
as:
H_max^ε(A|X)_ρ = min_{ω∈B^ε(ρ^{XA})} max_{x∈[k]} log Tr[Π_{ω_x}^A] .        (13.315)

For every m ∈ [d], with d := |A|, we define the pruned cq-state

ρ^{(m)} = Σ_{x∈[k]} p_x |x⟩⟨x| ⊗ ρ_x^{(m)} ,        (13.316)

where each ρ_x^{(m)} is the m-pruned version of ρ_x as defined in (5.156).
Exercise 13.6.4. Use (12.60) to show that if ρ^{(m)} ≠ ρ^{XA} then H_max(A|X)_{ρ^{(m)}} = log m.

Lemma 13.6.1. Let ε ∈ (0, 1), d := |A|, ρ ∈ D(XA), and for all m ∈ [d] let ρ^{(m)} ∈ D(XA) be the m-pruned version of ρ^{XA} as defined in (13.316). Then,

H_max^ε(A|X)_ρ = min_{m∈[d]} { log m : ρ^{(m)} ≈_ε ρ^{XA} } .        (13.317)
Remark. Observe that the trace distance between ρ^{XA} and its m-pruned version is given by

½ ∥ρ^{XA} − ρ^{(m)}∥_1 = Σ_{x∈[k]} p_x ½ ∥ρ_x − ρ_x^{(m)}∥_1
   (5.157)→ = Σ_{x∈[k]} p_x (1 − ∥ρ_x∥_{(m)})        (13.318)
   = 1 − Σ_{x∈[k]} p_x ∥ρ_x∥_{(m)} .
Proof. By definition,

H_max^ε(A|X)_ρ = min { H_max(A|X)_ω : ω^{XA} ≈_ε ρ^{XA} }
   Restricting ω = ρ^{(m)} → ⩽ min_{m∈[d]} { H_max(A|X)_{ρ^{(m)}} : ρ^{(m)} ≈_ε ρ^{XA} }        (13.320)
   Exercise 13.6.4→ = min_{m∈[d]} { log m : ρ^{(m)} ≈_ε ρ^{XA} } .

For the opposite inequality, compute

Tr[ρ^{XA} Π_ω^{XA}]
   Tr[ω^{XA} Π_ω^{XA}] = 1 → = 1 + Tr[(ρ^{XA} − ω^{XA}) Π_ω^{XA}]
   η ⩾ −(η)_− ∀ η ∈ Herm(XA) → ⩾ 1 − Tr[(ρ^{XA} − ω^{XA})_− Π_ω^{XA}]        (13.323)
   Π_ω^{XA} ⩽ I^{XA} → ⩾ 1 − Tr[(ρ^{XA} − ω^{XA})_−]
   ⩾ 1 − ε ,

where we used the fact that Tr[(ρ^{XA} − ω^{XA})_−] = ½ ∥ω^{XA} − ρ^{XA}∥_1 ⩽ ε. Therefore, we get a contradiction with the assumption that m was the minimizer of the right-hand side of (13.321). This completes the proof.
We are now ready to prove Theorem 13.6.1.
Proof of Theorem 13.6.1. From Theorem 13.3.2 it follows that the conversion distance that
appears in (13.303) can be expressed as
X
LOCC
P 2 Φm −−−→ ρAB = min px 1 − ρA x (m) (13.324)
ρXAB
x∈[k]
where the minimum is over all regular extensions ρXAB of ρAB , with the same notations as
in (13.306). We therefore get that the ε-single-shot entanglement cost as defined in (13.303)
can be expressed as
n X o
Costε ρAB = min log m : max px ρ A
x (m) ⩾ 1 − ε
m∈[d] ρXAB
x∈[k]
n X o
Exercise 13.6.5→ = min min log m : px ρA
x (m) ⩾1−ε (13.325)
ρXAB m∈[d]
x∈[k]
ε
(13.319)→ = min Hmax (A|X)ρ ,
ρXAB
where the maximum is overl all regular extensions ρXAB of ρAB . This completes the proof.
2. Without assuming (13.314), use Theorem 5.4.3 to show (by direct calculation) that for
the pure state case, the formula in (12.101) can be expressed as
Costε ψ AB = Hmaxε
(ρA ) := inf Hmax (σ A ) ,
(13.327)
σ∈Bε (ρ)
where ρA := TrB ψ AB .
Recall from Exercise 11.5.2 that the above cost does not change if we replace the trace dis-
tance with the square of the purified distance, as the two metrics are topologically equivalent.
Thus, from (11.111) it follows that the asymptotic entanglement cost can be expressed as
1
Cost ρAB := lim+ lim inf Costε ρ⊗n .
(13.329)
ε→0 n→∞ n
Theorem 13.7.1. Let ρ ∈ D(AB). Then, the entanglement cost of ρAB can be
expressed as
1
Cost ρAB = EFreg ρAB := lim EF ρ⊗n
(13.330)
n→∞ n
Proof. We first prove that Cost ρAB ⩽ EFreg ρAB . From Theorem 13.6.1 we have
⊗n
Costε ρAB ε
= inf Hmax (An |Yn )ρn
ρn
⊗n (13.331)
ε
(An |X n )ρ⊗n
taking ρn = ρXAB −−−−→ ⩽ inf Hmax
ρXAB
where the first infimum is over all classical systems Yn and all regular extensions ρn ∈
AB ⊗n
n n
D(Yn A B ) of ρ , and the second infimum is over all regular extensions ρ ∈ D(XAB)
AB XAB
of ρ . For theinequality above we used the fact that if ρ is a regular extension of
AB XAB ⊗n AB ⊗n
ρ then ρ is a regular extension of ρ (see Exercise 13.6.3). Therefore, the
entanglement cost satisfies
1 ε
Cost ρAB ⩽ lim+ inf lim inf Hmax (An |X n )ρ⊗n
ε→0 ρXAB n→∞ n
Thus, EF ρAB ⩾ Cost ρAB . Repeating the same argument with m ∈ N copies of ρAB
gives EF (ρ⊗m ) ⩾ Cost (ρ⊗m ). Combining this with (11.120) we get
1 1
Cost ρAB ⩽ Cost ρ⊗m ⩽ EF ρ⊗m .
(13.333)
m m
Since the inequality
AB
reg AB
holds for all integers m, it also holds in the limit m → ∞. Hence,
above
Cost ρ ⩽ EF ρ .
Now, the asymptotic continuity property (10.51) of the conditional entropy gives for any
n
ω ∈ Bε ρYnn A
H(An |Yn )ω ⩾ H(An |Yn )ρn − n log |A|f (ε) . (13.335)
Substituting this into (13.334) gives
Costε ρ⊗n ⩾ inf H(An |Yn )ρn − n log |A|f (ε)
ρn
(13.336)
Exercise 13.6.2→ = EF ρ⊗n − n log |A|f (ε) .
Dividing both sides by n and taking the limit n → ∞ followed by ε → 0+ gives Cost ρAB ⩾
EFreg ρAB . This completes the proof.
If the entanglement of formation (EF ) were additive under tensor products, determining
EFreg ρAB would be a more straightforward task. For quite some time, a prevalent belief
among researchers in the field was that EF is indeed additive, implying its equivalence to
the entanglement cost. However, in a pivotal development in 2008, Hastings refuted this
additivity conjecture, demonstrating that the entanglement of formation is generally not
additive. From its definition, for any states ρ ∈ D(AB) and σ ∈ D(A′ B ′ ), the following
inequality holds: ′ ′
′ ′
EF ρAB ⊗ σ A B ⩽ EF ρAB + EF σ A B .
(13.337)
Hastings’ result indicates that this inequality can be strict, even when ρ is equal to σ.
Notably, Hastings’ proof is existential, meaning it establishes the existence of such non-
additivity without providing an explicit counterexample. To date, an explicit example where
EF is not additive has not been identified, but Hastings’ contribution significantly altered our
understanding on this problem (further details can be found in the ”Notes and References”
section at the end of this chapter).
While the entanglement of formation (EoF) is not generally additive, it can be additive
for certain specific states, allowing for efficient computation of their entanglement cost. An
interesting concept relevant in this context is that of an “entanglement breaking subspace”.
2. The reduced density matrix of every ϕAB ∈ Pure (K) of the form |ϕAB ⟩ = x∈[3] λx |χAB
P
x ⟩
(with λx ∈ C) can be expressed as
1 1
TrB [ϕAB ] = I A − φT (13.342)
2 2
where |φA ⟩ := λx |x⟩A .
P
x∈[3]
Denote by K := supp ρAB . The key idea is to first show that for every ψ ∈ Pure (K ⊗ A′ B ′ )
we have ′ ′
′ ′
E ψ ABA B ⩾ EF ψ AB + EF ψ A B ,
(13.345)
where E on the left-hand side is the entropy of entanglement between systems AA′ and BB ′ ,
′ ′
and for simplicity of notations we use ψ AB and ψ A B to denote the mixed marginal states of
′ ′ ′ ′
ψ ABA B . To see why the above inequality holds, recall that ψ ABA B belongs to an EBS so
that we can express it as
′ ′
X√
A′ B ′
|ψ ABA B ⟩ := px |x⟩B ⊗ |ϕA
x ⟩ ⊗ |φx ⟩ (13.346)
x∈[m]
Then, from the strong subadditivity as given in (7.134) and the fact that the marginal
′ ′
σ AA = ψ AA we get
H(AA′ )ψ = H(AA′ )σ
Strong subadditivity (7.134)→ ⩾ H(A)σ + H(AA′ C)σ − H(AC)σ
X ′ (13.350)
Exercise 13.7.2→ = H ψ A + px H φAx .
x∈[m]
A
AB
Now, from (13.69) we have H ψ ⩾ E F ψ . Moreover, by definition, for every x ∈ [m]
′ A′ B ′
we have H φA x = E φ x . Combining this with the equation above and with (13.347)
gives ′ ′
′ ′ X
E ψ ABA B ⩾ EF ψ AB + px E φA
x
B
x∈[m] (13.351)
′ ′
AB AB
⩾ EF ψ + EF ψ .
This completes the proof.
Exercise 13.7.2. Prove the last equality in (13.350).
Exercise 13.7.3. Compute the entanglement cost of the state
ρAB := pΦAB AB
+ + (1 − p)Φ− , (13.352)
for all p ∈ [0, 1], where
1
|ΦAB
± ⟩ := √ (|00⟩ ± |11⟩) . (13.353)
2
′ ′
Since ΦA
m
B
is invariant under the action of the (self-adjoint) twirling map
′ ′ Z ′ ′ ∗
G ω A B = dU U ⊗ U ω A B U ⊗ U ∀ ω ∈ L(A′ B ′ ) , (13.355)
we can replace E in (13.354) with E ◦ G, or in other words, we can assume without loss
of generality that E = E ◦ G. Any such non-entangling (RNG) operation has the form
(see (3.256))
E(ω) = Tr [(I − Φm )ω] σ AB + Tr [Φm ω] η AB ∀ ω ∈ D(A′ B ′ ) , (13.356)
where σ, η ∈ D(AB). Note that the channel E is RNG (i.e., non-entangling) if and only if
Tr [(I − Φm )ω] σ + Tr [Φm ω] η ∈ SEP(AB) ∀ ω ∈ SEP(A′ B ′ ) . (13.357)
Now, recall that the density state τ := (I − Φm )/(m2 − 1) is a separable isotropic state
(see (13.19)). Taking ω = τ above we get E(τ ) = σ. Therefore, since τ is a separable
state we getthat σ must be separable as well. More generally, from (13.26) we have that
Tr Φm ω AB ⩽ m1 for all separable states ω ∈ SEP(AB). Therefore, the condition in (13.357)
AB
In other words, the conversion distance above can be interpreted as the distance of ρAB to
the set of states with robustness no greater than m − 1.
Using the compcat expression for the conversion distance above, we get that for any
ε ∈ (0, 1), the ε-single-shot entanglement cost under non-entangling operations is given by
n 1 AB o
Costε (ρAB ) = min log m : ρ − η AB 1 ⩽ ε , R η AB ⩽ m − 1 .
(13.362)
m∈N 2
That is,
Costε ρAB = log 1 + Rε ρAB = LRε ρAB .
(13.363)
The formula above provides an operational interpretation for the smoothed logarithmic ro-
bustness as the single-shot entanglement cost under non-entangling operations.
Due to the symmetry of Φm we can assume without loss of generality that E = G ◦ E so that
the non-entangling operation E has the form (see (3.257))
′ ′ ′B′
E(ω) = 1 − Tr [Λω] τ A B + Tr [Λω] ΦA
m ∀ ω ∈ L(AB) , (13.365)
Hint: Observe that the optimal m in (13.368) is the floor of the reciprocal of maxσ∈SEP(AB) Tr [Λσ].
The results obtained in the single-shot regime lead directly to the following expressions for
the cost and distillation of a bipartite state ρ ∈ D(AB) under non-entangling operations:
1
Cost ρAB = lim lim inf LRε ρ⊗n
ε→0 n→∞ n
1 ε (13.371)
Distill ρAB = lim lim sup Dmin ρ⊗n ∥SEP .
ε→0 n→∞ n
From these expressions, it becomes evident that if the generalized quantum Stein’s lemma
(as proposed in Conjecture 11.3.1) is valid, then the distillable entanglement under non-
entangling operations would be equal to the regularized relative entropy of entanglement.
In contrast, regarding the entanglement cost, it is known (as detailed in the section
on ‘Notes and References’ at the end of this chapter) that there are states for which the
entanglement cost is strictly greater than the distillable entanglement. This implies that
even under a broad range of non-entangling operations, the reversibility of mixed state
entanglement is not guaranteed. In essence, this reflects a fundamental asymmetry in the
processes of creating and extracting entanglement from quantum systems.
In this subsection, we delve into the quantum resource theory where F(AB) is defined as the
set of PPT states within D(AB), and F(AB → A′ B ′ ) as the set of completely PPT preserving
quantum channels. The exploration of this resource theory is not only intriguing from a
theoretical standpoint but is also driven by the fact that the set of completely PPT preserving
quantum channels encompasses LOCC. Consequently, the entanglement cost and distillation
rates determined under these operations offer lower and upper bounds, respectively, on the
corresponding rates under LOCC.
In the framework of completely PPT-preserving operations, entangled states that exhibit
a positive partial transpose are regarded as free resources. Accordingly, in this resource
theory, the focus is on NPT-entanglement, emphasizing interest in entangled states with a
negative partial transpose (NPT), which are states whose partial transpose is not positive
semidefinite. In the rest of this section, we will use the notation PPT(AB) to refer to the
set of all density matrices in D(AB) that have a positive partial transpose. Thus, we obtain
the following relation:
where the superscript Γ indicates partial transpose w.r.t. Bob’s systems (see (13.148)). In the
following exercise you prove some of the key properties of this extension of partial transpose
to linear maps.
Exercise 13.9.1. Let E ∈ L(AB → A′ B ′ ) be a bipartite linear map, E Γ be its partial
′ ′
transpose, and JE := JEABA B be its Choi matrix. Prove the following statements:
1. E is PPT preserving if and only if E Γ is PPT preserving.
2. E is completely PPT preserving if and only if E Γ is completely PPT preserving.
3. E satisfies:
JE Γ = JEΓ , (13.375)
′ ′
where on the right-hand side the superscript Γ denotes the partial transpose of JEABA B
with respect to both B and B ′ .
4. E satisfies: ∗
(E ∗ )Γ = E Γ . (13.376)
2. E Γ is a quantum channel.
Proof. We can use 13.375 to observe that E Γ is a quantum channel if and only if JEΓ ⩾ 0.
Thus, it suffices to show that E is completely PPT preserving if and only if its Choi matrix
′ ′
is PPT. To see this, note that the Choi matrix of E AB→A B can be expressed as
′ ′ ′ ′
JEABA B = E ÃB̃→A B Ω(AB)(ÃB̃) , (13.377)
where Ω(AB)(ÃB̃) is an unnormalized maximally entangled state between system ÃB̃ and
system AB. Furthermore, we can write
where ΩAÃ and ΩB B̃ are unnormalized maximally entangled states between the respective
systems. Since we take the partial transpose with respect to system B B̃, we get from the
equation above that Ω(AB)(ÃB̃) is PPT with respect to B B̃. Therefore, if E is completely
′ ′ ′ ′
PPT preserving, then its Choi matrix JEABA B must be PPT, since ΩABA B is PPT.
′′ ′′
Conversely, suppose JEΓ ⩾ 0 (i.e. E Γ is a quantum channel) and let ρA B AB be a PPT
state with respect to system B ′′ B. For simplicity of the exposition here, we use the su-
perscript Γ to indicate partial transpose on all systems on Bob’s side (i.e. for ρAB , the
′′ ′′
superscript Γ in ρΓ stands for partial transpose w.r.t. B, and for ρA B AB , the superscript
in ρΓ stands for partial transpose w.r.t. to system B ′′ B). Then,
AB→A′ B ′ A′′ B ′′ AB
Γ
= E Γ ρΓ ⩾ 0 ,
E ρ (13.379)
′ ′ ′′ ′′
since ρΓ ⩾ 0 and E Γ is completely positive. Therefore, the state E AB→A B ρA B AB is
′′ ′′
PPT, and since ρA B AB was an arbitrary PPT state in D(A′′ B ′′ AB) we conclude that E is
completely PPT preserving. This completes the proof.
We denote by PPT(AB → A′ B ′ ) the set of all completely PPT preserving channels in
CPTP(AB → A′ B ′ ).
since E Γ is CP. Since (E(ρ))Γ = E Γ (ρΓ ) we get that the above equation is equivalent to
Γ
−Λ′Γ ⩽ E(ρ) ⩽ Λ′Γ where Λ′ := E(Λ) . (13.382)
Exercise 13.9.4. Consider the linear map E ∈ CPTP(AB → AB) with m := |A| = |B|
defined for all ω ∈ L(AB) as
where
1 Γ
Λ := I AB − ΦAB
m . (13.385)
m+1
1. Show that Λ ∈ Eff(AB).
2. Show that E is PPT preserving (but not necessarily completely PPT preserving).
3. Show that for m > 3 we have N E ρAB W > N ρABW , where ρAB
W is the maximally
entangled Werner state (see (13.30) with α = 1)
1
ρAB I AB − F AB .
W = (13.386)
d(d − 1)
The exercise above demonstrates that the negativity measure, in general, does not exhibit
monotonic behavior under PPT-preserving operations, but as we saw earlier, it does exhibit
monotonic behavior under completely PPT-preserving operations.
In order to simplify the expression above for the conversion distance, we will need the
following lemma.
Proof. Suppose first that Λ = E ∗ (Φm ) for some E ∈ PPT(AB → A′ B ′ ). Since E is a CPTP
′ ′
map it follows that E ∗ is a unital CP map. Therefore, since 0 ⩽ Φm ⩽ I A B we have
0 ⩽ Λ ⩽ I AB . Moreover, observe that
Γ
Λ = E (Φm ) = (E ∗ )Γ ΦΓm
Γ ∗
∗ Γ
(13.376)→ = E Γ Φm (13.388)
1 ∗ AB
= EΓ F ,
m
where F AB is the flip operator. Since E Γ is also a CPTP map (see Theorem 13.9.1) it follows
∗
that E Γ is a unital CP map. Combining this with the fact that −I AB ⩽ F AB ⩽ I AB
(recall F 2 = I AB ) we conclude that
1 AB 1
− I ⩽ ΛΓ ⩽ I AB , (13.389)
m m
which is equivalent to ΛΓ ∞ ⩽ m1 .
1
Conversely, suppose Λ ∈ Eff(AB) and ΛΓ ∞
⩽ m
. Define the measurement-prepare
channel E ∈ CPTP(AB → A′ B ′ ) as
where τ := (I − Φm )/(m2 − 1) ∈ D(A′ B ′ ). Observe that Λ = E ∗ (Φm ) (can you see why?).
The intuition behind the definition above comes from the observation that the optimization
over the PPT channels in (13.387) can be further restricted to channels that satisfy E = G ◦E
which according to (3.257) have the form of the channel above. It is therefore left to show
that E as defined above is a PPT quantum channel.
Indeed, observe that by definition for all ω ∈ L(AB) we have
Γ
E Γ (ω) = E ω Γ
= Tr Λω Γ ΦΓm + Tr (I − Λ)ω Γ τ Γ
(13.391)
The partial transpose
→ = Tr ΛΓ ω ΦΓm + Tr I − ΛΓ ω τ Γ .
is self-adjoint
1 A′ B ′ 1
ΦΓm = F = (ΠSym − ΠAsy ) , (13.392)
m m
and (recall I = ΠSym + ΠAsy )
I − ΦΓm 1 1
τΓ = = ΠSym + ΠAsy . (13.393)
m2 − 1 m(m + 1) m(m − 1)
Substituting these expressions for ΦΓm and τ Γ into (13.391) (and rearranging terms) gives
(see Exercise 13.9.5)
where
1 2 2
Λ′ :=
I + mΛ , σSym := ΠSym , σAsy := ΠAsy . (13.395)
2 m(m + 1) m(m − 1)
Hence, since ∥Λ∥∞ ⩽ 1/m we get that Λ′ ∈ Eff(A′ B ′ ) so that E Γ is itself a measurement-
prepare quantum channel. That is, E is indeed a PPT channel. This completes the proof.
Corollary 13.9.1. Using the same notations as above, the PPT conversion distance
to a maximally entangled state is given by
′B′
PPT
T ρAB −−→ ΦA = 1 − max Tr ΛAB ρAB .
m (13.396)
Λ∈Eff(AB)
∥ΛΓ ∥∞ ⩽ m1
′ ′ ′B′
In the next lemma we provide a characterization of the density matrix E A B →AB ΦA
m .
Remark. The condition presented in equation (13.398) leads to the implication that (m +
1)ω Γ ⩾ (1 − m)ω Γ . This inequality can be simplified and equivalently restated as mω Γ ⩾ 0.
Thus, the partial transpose of ω is positive semidefinite, so that ω ∈ PPT(AB).
Proof. Let G ∈ CPTP(A′ B ′ → A′ B ′ ) be the twirling channel as defined in (3.251). Recall
that G is an LOCC channel, and thus it is completely PPT preserving. If σ = E(Φm ) for
some PPT channel E, we can assume without loss of generality that E = E ◦ G. Otherwise,
we can replace E with E ′ = E ◦ G, which has the desired property. From (3.256) it then
follows that for all η ∈ L(AB)
where we replaced ω1 in (3.256) with E(Φm ) = σ and renamed the density matrix ω2 as
ω ∈ D(AB). The partial transpose of E is given for all η ∈ L(AB) as
E Γ (η) = Tr η Γ Φm σ Γ + Tr (I − Φm ) η Γ ω Γ
Partial transpose
Γ Γ (13.400)
I − ΦΓm η ω Γ .
is self-adjoint → = Tr ηΦm σ + Tr
We next use (13.392) to express ΦΓm in terms of the symmetric and antisymmetric projectors.
Hence,
1 1
E Γ (η) = Tr [η (ΠSym − ΠAsy )] σ Γ + Tr (m − 1)ΠSym + (m + 1)ΠAsy η ω Γ
m m (13.401)
1 1
= Tr [ηΠSym ] σ Γ + (m − 1) ω Γ + Tr [ηΠAsy ] (1 + m) ω Γ − σ Γ .
m m
Hence, E Γ is completely positive if and only if the matrices σ and ω satisfy (13.398). This
completes the proof.
From the lemma above it follows that the conversion distance can be expressed as
′ ′
A B PPT AB
1 Γ Γ Γ
T Φm −−→ ρ = min ∥ρ − σ∥1 : (1 − m)ω ⩽ σ ⩽ (1 + m)ω . (13.402)
ω,σ∈D(AB) 2
Now that we have obtained formulas for the two types of PPT-conversion distances in
this subsection, we can use them for the operational tasks of entanglement distillation and
entanglement cost.
For every ε ∈ (0, 1), the ε-single-shot distillable NPT-entanglement of a bipartite state
ρ ∈ D(AB) is defined as
n ′B′
o
PPT
Distillε ρAB = sup log m : T ρAB −−→ ΦA
m ⩽ ε . (13.403)
m∈N
Theorem 13.9.2. Let ρ ∈ D(AB) and ε ∈ (0, 1). Then, the ε-single-shot distillable
NPT-entanglement is given by
j ε AB AB k
Distillε ρAB = min log 2Dmin (ρ ∥η ) ,
(13.404)
∥η Γ ∥1 ⩽1
η∈Herm(AB)
where the definition of the hypothesis testing divergence above has been extended to
operators that are not necessarily density matrices.
Proof. Using the expression for the conversion distance given in (13.396) we get
n 1 o
Distillε (ρ) = sup
log m : Tr [Λρ] ⩾ 1 − ε, ΛΓ ∞ ⩽ , Λ ∈ Eff(AB)
m∈N m
1
n 1 o
m= −−−−→ = max log : Tr [Λρ] ⩾ 1 − ε, Λ ∈ Eff(AB)
∥ΛΓ ∥∞ ∥ΛΓ ∥∞
(13.405)
Now, from Exercise 2.3.22 we have that
ΛΓ Tr ΛΓ η
∞
= max
∥η∥1 ⩽1
η∈Herm(AB)
(13.406)
Tr Λη Γ
Γ is self-adjoint→ = max
∥η∥1 ⩽1
η∈Herm(AB)
Tr Λη Γ
minmax theorem→ = max min
∥η∥1 ⩽1 Tr[Λρ]⩾1−ε
η∈Herm(AB) Λ∈Eff(AB)
(13.407)
2−Dmin (ρ∥η )
ε Γ
= max
∥η∥1 ⩽1
η∈Herm(AB)
ε
= max 2−Dmin (ρ∥η) .
∥η Γ ∥1 ⩽1
η∈Herm(AB)
Substituting this into (13.405) we obtain (13.404). This completes the proof.
Exercise 13.9.7. Show that the ε-single shot distillable NPT-entanglement can be be com-
puted with an SDP program and find the dual problem of (13.404).
Exercise 13.9.8. Let ε ∈ (0, 1), ρ ∈ D(AB), and η ∈ Herm(AB). Show that for all
E ∈ CPTP(AB → A′ B ′ ) we have
ε ε
Dmin E(ρ) E(η) ⩽ Dmin (ρ∥η) . (13.408)
ε
That is, the DPI with Dmin still holds even if η is not positive semidefinite.
For states that have a certain symmetry, the optimization problem given in (13.404)
can be performed analytically. For example, consider the Werner state, ρAB
W , as defined
in (13.29). This state is invariant under the twirling map G ∈ CPTP(AB → AB) as defined
in (7.199). Let η ∈ Herm(AB) be an optimal matrix such that
Distillε ρAB ε
ρAB η AB .
W = Dmin W (13.409)
Due to the invariance property of ρAB
W we have
ε ε
Dmin (ρW ∥η) ⩾ Dmin G(ρW ) G(η)
ε
(13.410)
= Dmin ρW G(η) .
Moreover, since G ∈ PPT(AB → AB) and ∥η Γ ∥1 ⩽ 1 we get that also ζ := G(η) satisfies
ζ Γ 1 = G Γ ηΓ 1
(13.411)
DPI →→ ⩽ η Γ 1 ⩽ 1 .
ε ε
Therefore, since η was optimal, we must have Dmin (ρW ∥η) = Dmin (ρW ∥ζ), which means that
ζ is also optimal. To summarize, without loss of generality, we can restrict the optimization
in (13.404) to Hermitian matrices η ∈ Herm(AB) that satisfy both ∥η Γ ∥1 ⩽ 1 and G(η) = η.
This additional condition implies that η can be written as a linear combination of I AB and
F AB , or equivalently, η Γ can be expressed as
η Γ = aΦAB AB
m + bτm , (13.412)
AB :=
for some a, b ∈ R, and τm (I AB − ΦAB 2
m )/(m − 1).
Corollary 13.9.2. Let ρ ∈ D(AB) and ε ∈ (0, 1). Then, the ε-single-shot distillable
NPT-entanglement is bounded from above by
n o
Distillε ρAB ⩽ min ε
Dmin (ρ∥σ) + LN(σ) , (13.414)
σ∈D(AB)
Proof. By removing the floor function, and replacing Herm(AB) in the right-hand side
of (13.404) with the smaller set Pos(AB), we get
Distillε (ρ) ⩽ min ε
Dmin (ρ∥η)
∥η Γ ∥1 ⩽1
η∈Pos(AB)
n o
ε
η = tσ −−−−→ = min Dmin (ρ∥σ) − log t (13.415)
t∥σ Γ ∥1 ⩽1
σ∈D(AB), t⩾0
n o
Taking the largest ε Γ
possible t=1/∥σ Γ ∥1 → = min Dmin (ρ∥σ) + log ∥σ ∥1 .
σ∈D(AB)
Finally, observe that the second term on the right-hand side of the equation above is the
logarithmic negativity of σ AB as defined in (13.164). This completes the proof.
Proof. The proof follows directly from a combination of Corollary 13.9.2 and the quantum
Steins’ lemma given in (8.211). Explicitly, let ε ∈ (0, 1) and σ ∈ D(AB). Then, from
Corollary 13.9.2 we get
1 ε ⊗n
1 ε ⊗n ⊗n
1 ⊗n
lim sup Distill ρ ⩽ lim sup D ρ σ + LN σ
n→∞ n n→∞ n min n (13.417)
(8.211) + Additivity of LN→ = D(ρ∥σ) + LN(σ) .
Since the inequality above holds for all σ ∈ D(AB) we can take the minimum over all density
matrices so that
1 n o
lim sup Distillε ρ⊗n ⩽ min
D (ρ∥σ) + LN(σ) . (13.418)
n→∞ n σ∈D(AB)
Note that the inequality above is in fact stronger than (13.416) in the sense that it holds for
all ε ∈ (0, 1). This completes the proof.
Exercise 13.9.10. Show that the Rains bound is a measure of entanglement and in particular
does not increase under completely PPT preserving operations.
Exercise 13.9.11. Prove that the Rains’ bound for a pure bipartite state ψ ∈ Pure(AB)
is equal to the entropy of entanglement E(ψ AB ), which means that on pure bipartite states,
the Rains’ bound is equal to the distillable entanglement. Hint: Take a look at the proof of
Theorem 13.2.3.
where
Eκε ρAB = ′ min Eκε ρ′AB
(13.421)
ρ ∈Bε (ρ)
(this can also be verified directly from the expression in (13.419)). Therefore, it is sufficient
to prove the lemma for the case ε = 0. For ε = 0 the entanglement cost given in (13.419)
takes the form
Now, observe that every m ∈ N and ω ∈ PPT(AB) that satisfy (1 − m)ω Γ ⩽ ρΓ also satisfy
−(1 + m)ω Γ ⩽ ρΓ . Therefore, we get that
While it is true that the lower and upper bounds above may appear to be simpler than
computing the Eκε directly using an SDP program, it is important to note that this may not
always be the case. In fact, in many instances, computing the bounds may require solving
non-trivial optimization problems themselves, and as such may not necessarily be any easier
to compute than the original quantity Eκϵ . Therefore, while the bounds can be a useful tool
for gaining insight into the behavior of Eκϵ , they should not be relied upon exclusively as a
substitute for computing the quantity directly using an SDP program. Moreover, it is not
clear to the author how these bounds can be used in deriving computable bounds for the
asymptotic NPT-entanglement cost.
Exercise 13.9.12. Prove the theorem above, and in particular show that the limit
1 ϵ ⊗n
lim Eκ ρ (13.428)
n→∞ n
Therefore, Eq. (13.429) provides a computable upper bound on the NPT-entanglement cost.
entanglement theory is not reversible even under the broad set of non-entangling operations.
Completely PPT preserving operations (sometimes referred to as PPT operations) were
introduced in [185]. In the same work, among many other findings, the Rains’ bound (13.416)
on distillable NPT-entanglement was discovered. The monotonicity of negativity and log-
arithmic negativity under PPT operations was proven in [180]. The expression presented
in (13.404) for the one-shot distillable NPT-entanglement was initially discovered in [76] (see
also [188] for additional results on distillation beyond LOCC). Lastly, the NPT-entanglement
cost was first studied in [8] and developed further in [229].
Multipartite Entanglement
Thus far, our focus has been on entanglement that is shared solely between two parties.
However, entanglement is not restricted to bipartite systems and can exist among any number
of parties. In this section, we will examine the properties of multipartite entanglement,
comparing and contrasting it with bipartite entanglement. It’s important to note that the
theory of multipartite entanglement can be quite complex. Therefore, in this chapter, we
will restrict ourselves to pure multipartite states, and concentrate on simpler cases involving
three and four qubits in greater detail.
The relation above implies that ψ can be converted into ϕ through local measurements with
some non-zero probability, since each Mx can be considered as one Kraus element of a local
generalized measurement on system Ax .
The set of all states |ϕ⟩ in An that can be obtained from |ψ⟩ as in (14.1) is called the
SLOCC class of ψ. The SLOCC class of ψ comprises two types of states: those that can be
645
646 CHAPTER 14. MULTIPARTITE ENTANGLEMENT
written in the form (14.1) with invertible matrices M1 , . . . , Mn , and those in which at least
one of the matrices Mx is non-invertible. In the former case, ψ can be converted to ϕ and
vice versa using SLOCC, whereas in the latter, the resulting state |ϕ⟩ cannot be converted
back to ψ via SLOCC.
If |Ax | = 2 for some x ∈ [n] (i.e. the x-th subsystem is a qubit) and Mx is non-invertible,
then the resulting state |ϕ⟩ is a product state between the qubit system Ax and the remaining
n − 1 subsystems. To demonstrate this, assume x = 1 for simplicity, so that M1 is a 2 × 2
non-invertible matrix. Since M1 is a rank-one matrix, it can be written as M1 = |u⟩⟨v| where
|u⟩ is an unnormalized vector in A1 , and |v⟩ is a normalized vector in A1 . The state |ψ⟩ can
be expressed as
n
|ψ⟩A = a|v⟩A1 |ψ1 ⟩A2 ···An + b|v ⊥ ⟩A1 |ψ2 ⟩A2 ···An ; , (14.2)
where a, b ∈ C, |v ⊥ ⟩ ∈ A1 is an orthogonal vector to |v⟩, and ψ1 and ψ2 are some pure states
in A2 · · · An . Thus,
n
M1 ⊗ M2 ⊗ · · · ⊗ Mn |ψ⟩A = a|u⟩A1 ⊗ M2 ⊗ · · · ⊗ Mn |ψ1 ⟩A2 ···An ; ,
(14.3)
Exercise 14.1.1. Let ψ, ϕ ∈ Pure(An ) and suppose there exists matrices M1 , . . . , Mn such
that (14.1) holds. Let B := A2 · · · An , d := |A1 |, and suppose det(M1 ) = 0. Show that
SR ϕA1 B ⩽ d − 1 ,
(14.4)
1. Determine the n Schmidt ranks between each subsystem and the other n−1 subsystems.
2. Classify the n-partite entanglement based on a fixed set of n Schmidt ranks obtained
in the first step.
It is worth noting that for the second step mentioned above, we only need to consider
reversible SLOCC conversions where all matrices M1 , . . . , Mn in (14.1) are invertible. Hence,
states ψ, ϕ ∈ Pure(An ) belong to the same reversible SLOCC class if and only if there exists
a matrix
M ∈ GLn := GL(A1 ) × · · · × GL(An ) , (14.5)
such that |ϕ⟩ = M |ψ⟩. Here, for each x ∈ [n], the set GL(Ax ) represents the group of
invertible matrices in L(Ax ). It is also noteworthy that M takes the form of M1 ⊗ · · · ⊗ Mn ,
where Mx ∈ GL(Ax ) for each x ∈ [n]. Please note that our notation GLn does not explicitly
specify the system An = A1 · · · An . However, in the rest of this chapter we will assume that
the context makes it clear which system we are referring to.
With these observations, we can use certain tools from representation theory to charac-
terize the reversible SLOCC class of |ψ⟩; i.e., the set of states M |ψ⟩. To achieve this, we
begin by relaxing the normalization condition that M |ψ⟩ = 1 and allowing each Mx to
vary over any element of SL(Ax ). The group SL(Ax ) is a subgroup of GL(Ax ) with the
property that the determinant of its elements is one. The limitation to this group can only
affect the normalization of the states in M |ψ⟩. Therefore, we will consider the “orbit” of ψ
with respect to the group
By working with SL-orbits rather than GL-orbits, we can classify multipartite entanglement
using SL-invariant polynomials.
Remark. The condition that f (0) = 0 is a convention that we will adopt to eliminate trivial
SLIPs that are constant for all vectors in An . Furthermore, we will see shortly that this
convention implies that SLIPs vanish on product states.
The set of all SLIPs forms a vector space over C. Additionally, the following exercise
reveals that this vector space has a basis consisting of homogeneous SLIPs. Therefore, we
will concentrate on homogeneous SLIPs of some fixed degree k ∈ N. For instance, the SLIP
in (14.10) is homogeneous of degree 2 since it satisfies f (c|ψ⟩) = c2 f (|ψ⟩) for any c ∈ C. The
dimension of the space of all homogeneous SLIPs of a fixed degree k is finite, but, as we will
see, it grows exponentially with n.
Exercise 14.2.1. Show that the vector space of SLIPs has a basis consisting of homogeneous
SLIPs.
Exercise 14.2.2. Let AB be a bipartite system with d := |A| = |B|. Show that the function
The degrees of homogeneous SLIPs have a close relationship with the local dimensions
of the subsystems. To understand this connection, consider a multipartite system An =
A1 · · · An , where mx := |Ax | for each x ∈ [n]. Suppose fk : An → C is a homogeneous SLIP
of degree k ∈ N. Note that if c ∈ C satisfies cmx = 1 for some x ∈ [n], then the matrix
n
cI Ax has a determinant of one, implying that cI A ∈ SLn . Hence, we obtain the following
relationship: for every |ψ⟩ ∈ An
n
fk (|ψ⟩) = fk cI A |ψ⟩ = ck fk (|ψ⟩) ,
(14.12)
where the first equality is due to the SL-invariance property of fk , and the second equality is
due to the homogeneity of fk . Therefore, as long as fk is not the zero polynomial it follows
that ck = 1 for any complex number c that satisfies cmx = 1. This means that mx must
divide k. Since this holds for all x we can conclude that k is divisible by the least common
multiple r := lcm(m1 , . . . , mn ).
n
Exercise 14.2.3. Let |ψ⟩ ∈ An be a product state; i.e. |ψ⟩A = |ψ1 ⟩A1 ⊗ · · · ⊗ |ψn ⟩An . Show
that for any f ∈ SLIP(An ) we have f (|ψ⟩) = 0.
SLOCC
Remember that for ψ, ϕ ∈ Pure(An ), the relation ψ −−−−→ ϕ holds if there exists a
matrix Mx ∈ L(Ax ) for every x ∈ [n] such that both Mx∗ Mx ⩽ I Ax and |ϕ⟩ = M |ψ⟩ are
satisfied, where M := M1 ⊗ · · · ⊗ Mn . The subsequent lemma establishes that if any of the
matrices in the set {Mx }x∈[n] exhibits rank deficiency, then any SLIP will be nullified for |ϕ⟩.
Lemma 14.2.1. Let f : An → C be a SLIP and |ψ⟩ and |ϕ⟩ be as above. If there
exists x ∈ [n] such that det(Mx ) = 0 then f (|ϕ⟩) = 0.
Proof. Since every SLIP can be expressed as a linear combination of homogeneous SLIPs,
we will assume without loss of generality that f is a homogeneous SLIP of degree k ∈ N.
Denote by d := |An |, and for every ε ∈ [0, 1), define Mε := M1 (ε) ⊗ · · · ⊗ Mn (ε), where each
Mx (ε) is a slight perturbation of Mx ensuring that det(Mx (ε)) ̸= 0 for all ε ∈ (0, 1). As a
1/d
consequence, µε := det(Mε ) ̸= 0 for all ε ∈ (0, 1), which implies Nε := Mε /µε is an element
of SLn . By definition, we can express:
Upon taking the limit as ε → 0+ on both sides and noting that M = limε→0+ Mε and
limε→0+ µε = det(M ) = 0, we infer that f (M |ψ⟩) = 0, or equivalently, f (|ϕ⟩) = 0. This
concludes the proof.
where for convenience we replaced the second Pauli matrix σ2 that appear in (13.91) with
J2 := −iσ2 = |0⟩⟨1| − |1⟩⟨0|.
1. Use the relation (C.13) to show that for any M ∈ SLn , and any vectors |ψ⟩, |ϕ⟩ ∈ An
M |ψ⟩, M |ϕ⟩ n
= |ψ⟩, |ϕ⟩ n
. (14.15)
|ψ⟩, |ψ⟩ n
=0. (14.16)
The bilinear form defined above can be used to define SLIPs on systems of n qubits. For
an even number of qubits, it follows from the exercise above that
is an homogeneous SLIP of degree 4. To see that the above function is a SLIP, observe that
A2 ···An
(n) (n−1) n (n)
for any M = M1 ⊗ M ∈ SL , we have g4 M |ψ⟩ =g4 M 1⊗I |ψ⟩ since
a b
·, · n−1
is invariant under the action of M (n−1) . Now, let M1 = and observe that:
c d
g4 M1 ⊗ I A2 ···An |ψ⟩
a|ψ0 ⟩ + b|ψ1 ⟩, a|ψ0 ⟩ + b|ψ1 aψ0 ⟩ + b|ψ1 ⟩, c|ψ0 ⟩ + d|ψ1 ⟩
= det
c|ψ0 ⟩ + d|ψ1 ⟩, a|ψ0 ⟩ + b|ψ1 ⟩ c|ψ0 ⟩ + d|ψ1 ⟩, c|ψ0 ⟩ + d|ψ1 ⟩
2
= (a2 µ00 + b2 µ11 + 2abµ01 )(c2 µ00 + d2 µ11 + 2cdµ10 ) − acµ00 + bdµ11 + (ad + cb)µ01
= (ad − bc)2 µ00 µ11 − µ201
(14.21)
where the last line follows from direct algebraic simplification of all the terms involved. Since
M1 ∈ SL(2, C) we have ad − bc = 1 so that
To get other SLIPs, let An be a system of n qubits, and for any choice of m < n of its
qubits, we associate a bipartite cut denoted as Am ⊗ Bn−m , where Am is a system of m qubits
of An , and Bn−m is the system comprising of the remaining n − m qubits of An . With respect
to this bipartite cut, any vector |ψ⟩ ∈ An can be expressed as
n
X X
|ψ A ⟩ = λxy |ux ⟩Am |vy ⟩Bn−m
x∈[2m ] y∈[2n−m ] (14.23)
= Λ ⊗ I B̃n−m ΩBn−m B̃n−m
where {|ux ⟩Am } and {|vy ⟩Bn−m } are orthonormal bases of Am and Bn−m , respectively, and
each coefficient λxy ∈ C. Therefore, and vector |ψ⟩ ∈ An , and every bipartite cut Am ⊗ Bn−m
of An defines a matrix Λ := (λxy ) with x ∈ [2m ] and y ∈ [2n−m ].
Proof. Let M ∈ SLn . We need to show that f (M |ψ⟩) = f (|ψ⟩). Let N ∈ SLm and
L ∈ SLn−m be such that M = N ⊗ L. Then,
n
M |ψ A ⟩ = N Λ ⊗ L ΩBn−m B̃n−m
(14.25)
= N ΛLT ⊗ I B̃n−m ΩBn−m B̃n−m
We therefore get that
ℓ
An ⊗(n−m)
J2⊗m N ΛLT J2 LΛT N T
fℓ M |ψ ⟩ = Tr
ℓ (14.26)
⊗(n−m)
Cyclic permutation→ = Tr N T
J2⊗m N ΛLT J2 LΛT ,
where we have used the invariance of the trace under cyclic permutation (note that the
power over ℓ does not effect this property). To complete the proof we now argue that
⊗(n−m) ⊗(n−m)
N T J2⊗m N = J2⊗m and LT J2 L = J2 so that the right-hand side above equals
A n
f |ψ ⟩ . Indeed, observe that N = N1 ⊗ · · · ⊗ Nm , where Nx ∈ SL(2, C) so that
N T J2⊗m N = N1T J2 N1 ⊗ · · · ⊗ Nm J2 Nm
(14.27)
(C.13)→ = J2 ⊗ · · · ⊗ J2 = J2⊗m .
⊗(n−m) ⊗(n−m)
In the same way, one can prove that LT J2 L = J2 . This completes the proof.
Examples:
1. The case n = 2. In this
P case the only non-trivial m is m = 1. In this case for any
two-qubit state |ψ⟩ = x,y∈{0,1} λxy |xy⟩ we get
h i
T ℓ
fℓ (|ψ⟩) = Tr J2 ΛJ2 Λ
h i
(C.13)→ = Tr (J2 det(Λ)J2 )ℓ (14.28)
ℓ
J22 = I −−−−→ = 2 det(Λ)
The term
T T
Λ1 Λ1 J2 Λ1 Λ1 J2 Λ2
ΛJ2 ΛT = J2 [ΛT1 ΛT2 ] = . (14.32)
T T
Λ2 Λ2 J2 Λ1 Λ2 J2 Λ2
0 J2
Hence, combining this with J2⊗2 = gives
−J2
J2 Λ2 J2 ΛT1 J2 Λ2 J2 ΛT2
J2⊗2 ΛJ2 ΛT = . (14.33)
T T
−J2 Λ1 J2 Λ1 −J2 Λ1 J2 Λ2
Since the trace of the matrix above is zero the case ℓ = 1 is trivial. For the case ℓ = 2
we have h 2 i
Tr J2⊗2 ΛJ2 ΛT = Tr J2 Λ2 J2 ΛT1 J2 Λ2 J2 ΛT1
− Tr J2 Λ2 J2 ΛT2 J2 Λ1 J2 ΛT1
(14.34)
+ Tr J2 Λ1 J2 ΛT2 J2 Λ1 J2 ΛT2
− Tr J2 Λ1 J2 ΛT1 J2 Λ2 J2 ΛT2 .
The above expression is an homogeneous SLIP of degree 4. Since for three qubits there
are no homogeneous SLIPs of degree two, any other SLIP must be proportional to
some power of fℓ=2 . Hence, for three qubits, the above SLIP is essentially the only
one.
3. The case n = 4. Let A2 ⊗ B2 be a bipartite cut with exactly two qubits on each side
and let |ψ⟩ ∈ A4 be given as |ψ⟩ = Λ ⊗ I B2 |ΩA2 B2 ⟩, where Λ is a 4 × 4 matrix. Then,
the function h ℓ i
fℓ (|ψ⟩) = Tr J2⊗2 ΛJ2⊗2 ΛT , (14.36)
is an homogeneous SLIP of degree 2ℓ. Specifically, consider the four-qubit state
|ψ⟩ = λ1 |Ψ+ ⟩|Ψ+ ⟩ + λ2 |Ψ− ⟩|Ψ− ⟩ + λ3 |Φ+ ⟩|Φ+ ⟩ + λ4 |Φ− ⟩|Φ− ⟩ , (14.37)
where λx ∈ C for all x ∈ [4] and the two-qubit states |Ψ± ⟩ and |Φ± ⟩ form the Bell
basis of C2 ⊗ C2 . For this states it follows that
fℓ (|ψ⟩) = λ2ℓ 2ℓ 2ℓ 2ℓ
1 + λ2 + λ3 + λ4 . (14.38)
fχ (ψ) = χ ψ ⊗k ∀ ψ ∈ An , (14.39)
where the coefficient vector |χ⟩ ∈ (An )⊗k . However, this relation is not one-to-one; fχ is
equal to fχ′ if the coefficient vectors |χ⟩ and |χ′ ⟩ are related by a permutation matrix. This
permutation is with respect to the k copies of An . As a result, there exists an isomorphism
between the space of homogeneous polynomials of degree k and the subspace Symk (An ) of
(An )⊗k (see definition in (C.161)).
The polynomial fχ above is SLIP if and only if for any M ∈ SLn for all |ψ⟩ ∈ An we
have fχ (M |ψ⟩) = fχ (|ψ⟩), which is equivalent to
χ|M ⊗k ψ ⊗k = χ ψ ⊗k . (14.40)
Since the above equation has to hold for all |ψ⟩, and since if M ∈ SLn then M ∗ ∈ SLn we
conclude that fχ is SLIP if and only if (see Exercise 14.2.6)
n
where we used the same notations as in (C.175). Our task is therefore to characterize V SL .
For this purpose, observe that the vector space V is isomorphic to
V = (An )⊗k ∼
= A⊗k ⊗k
1 ⊗ · · · ⊗ An . (14.43)
Let P : V → A⊗k ⊗k
1 ⊗ · · · ⊗ An be this isomorphism (permutation) map. Under this isomor-
⊗k
phism any matrix M , with M := M1 ⊗ · · · ⊗ Mn ∈ SLn , goes to
⊗k
P M ⊗k P −1 = P M1 ⊗ · · · ⊗ Mn P −1 = M1⊗k ⊗ · · · ⊗ Mn⊗k . (14.44)
Therefore, for any |χ⟩ ∈ V that satisfies (14.41) the vector |ϕ⟩ := P |χ⟩ satisfies
Exercise 14.2.6. Let |χ⟩ ∈ (An )⊗k . Show that fχ as defined in (14.39) is an homogeneous
SLIP of degree k if and only if (14.41) holds.
SL(A1 ) SL(An )
|ϕ⟩ ∈ W := A⊗k
1 ⊗ · · · ⊗ A⊗k
n , (14.46)
Proof. Clearly, by definition, if |ϕ⟩ ∈ W then |ϕ⟩ satisfies (14.45). Conversely, suppose that
SL(A1 )
|ϕ⟩ satisfies (14.45), and let {|vx ⟩} be an orthonormal basis of A⊗k 1 , and {|uy ⟩} be an
SL(A )
orthonormal basis of the orthogonal complement of A⊗k in A⊗k
1
1 1 . Finally, let {|φz ⟩}
be an orthonormal basis of A2 ⊗· · ·⊗An . With these notations, since |ϕ⟩ ∈ A⊗k
⊗k ⊗k
1 ⊗· · ·⊗An
⊗k
it can be expressed as
X X
|ϕ⟩ = λxz |vx ⟩|φz ⟩ + µyz |uy ⟩|φz ⟩ , (14.47)
x,z y,z
where λxz , µyz ∈ C. Using this expression in (14.45), and taking a special case in which
M2 = I A2 ,. . . ,Mn = I An , gives
X X X X
λxz M1⊗k |vx ⟩|φz ⟩ + µyz M1⊗k |uy ⟩|φz ⟩ = λxz |vx ⟩|φz ⟩ + µyz |uy ⟩|φz ⟩ . (14.48)
x,z y,z x,z y,z
Since M1 ∈ SL(A1 ) we have for all x, M1⊗k |vx ⟩ = |vx ⟩ (by definition of |vx ⟩). Hence, the
above equation can be simplified to
X X
µyz M1⊗k |uy ⟩|φz ⟩ = µyz |uy ⟩|φz ⟩ . (14.49)
y,z y,z
Since the vectors {|φz ⟩} are orthonormal, it follows that for all z and all M1 ∈ SL(A1 ) we
have X X
µyz M1⊗k |uy ⟩ = µyz |uy ⟩ . (14.50)
y y
P
The above equation implies that for each z the vector y µyz |uy ⟩ belongs to the subspace
SL(A1 )
A⊗k
1 . However, by definition, the vectors {|uy ⟩} belong to the orthogonal complement
⊗k SL(A1 )
of A1 . Therefore, the coefficients {µyz } must be zero, so that
X
|ϕ⟩ = λxz |vx ⟩|φz ⟩ . (14.51)
x,z
Denoting by C := A⊗k ⊗k C
2 ⊗ · · · ⊗ An , the above equation can be expressed as Π1 ⊗ I |ϕ⟩ = |ϕ⟩,
where X
Π1 := |vx ⟩⟨vx | (14.52)
x
SL(A1 )
is the orthogonal projection to the subspace A⊗k 1 . Denoting by the Πx the orthogonal
⊗k SL(Ax )
projection to the subspace Ax , and repeating the same argument for any x ∈ [n]
we conclude that
Π1 ⊗ · · · ⊗ Πn |ϕ⟩ = |ϕ⟩ . (14.53)
That is, |ϕ⟩ ∈ W . This completes the proof.
n SL(Ax )
The lemma above shows that characterizing V SL can be done by characterizing A⊗k x
or the orthogonal projection Πx . Therefore, the problem of characterizing all SLIPs of degree
SL(A)
k can be reduced to characterizing A⊗k , which is a classic representation theory prob-
lem that uses the Schur-Weyl duality. This duality connects the irreducible representations
(irreps) of SL(A) to the symmetric group on k elements, with a natural action. Further
information can be found in the ‘Notes and References’ section at the end of this chapter.
Proof. From Exercise 14.2.3, we deduce that E vanishes on product states. Consider an
n
arbitrary m ∈ N and ψ ∈ Pure(An ). If there exists an LOCC protocol that transforms ψ A
n An
to ϕA n
x ∈ Pure(A ) with a probability px , where x ∈ [m], then each ϕx can be represented
as:
n 1 n
ϕA
x = √ Mx ψ A , (14.55)
px
where each matrix Mx is a tensor product of the form MP x = Λx1 ⊗ · · · ⊗ Λxn , and for every
n
y ∈ [n], Λxy ∈ L(Ay ). Additionally, we have the relation x∈[m] Mx∗ Mx = I A .
Leveraging Lemma 14.2.1, we observe that f (Mx |ψ⟩) = 0 when det(Mx ) = 0. We can
then categorize the set {Mx }x∈[m] into two subsets: the matrices that are rank deficient and
those that possess full rank. Without loss of generality, let’s assume the first r ∈ [m] matrices
{Mx }x∈[r] are all of full rank, while the subsequent matrices, for all x = r + 1, . . . , m, satisfy
the condition det(Mx ) = 0.
Thus, for each x ∈ [r], aside from a scalar coefficient, Mx can be interpreted as a member
of SLn . More precisely, we can express Mx as Mx = µx Nx , where µx := (det(Mx ))1/d ,
d := |An |, and the normalized matrix Nx := µ1x Mx belongs to SLn . With these notations,
we can proceed as follows:
2/k X 2/k
X
An
X 1 An µx An
px E ϕx = px fk √ Mx ψ = px fk √ Nx ψ
px px
x∈[m] x∈[r] x∈[r]
n 2/k
X
fk is homogenous of degree k → = |µx |2 fk Nx ψ A
x∈[r]
(14.56)
An 2/k
X
n 2
SL invariance→ = |µx | fk ψ
x∈[r]
n X
= E ψA |µx |2 .
x∈[r]
where we removed the absolute value since Mx∗ Mx ⩾ 0. From the geometric-arithmetic
1/d
∗
inequality we have that det (Mx Mx ) ⩽ d1 Tr [Mx∗ Mx ]. Hence, substituting this into the
equation above gives
X X 1 1
|µx |2 ⩽ Tr [Mx∗ Mx ] = Tr I An = 1 .
d d (14.58)
x∈[r] x∈[m]
n n
where the minimum is over all pure state decompositions of ρA = x∈[m] px ψ A . The above
P
theorem implies that E is an entanglement monotone on mixed states.
Hint: Follow the exact same lines as in the proof of Theorem 13.2.2 by with ·, · n
replacing
the bilinear form given in (13.91).
The reason for this terminology, is that any state |ψ⟩ ∈ Crit(An ) is a critical point of the
function f : SLn |ψ⟩ → R+ defined by f (|ϕ⟩) := ∥|ϕ⟩∥. In fact, we have something that is a
bit stronger.
Proof. We start by proving that 1 ⇒ 2. Suppose |ψ⟩ ∈ Crit(An ) and observe that for any
M ∈ SLn also M ∗ M ∈ SLn . We can therefore write M ∗ M = eX for some X ∈ Lie (SLn ).
Hence,
∥M |ψ⟩∥2 = ⟨ψ|eX |ψ⟩
eX ⩾ I + X −−−−→ ⩾ ⟨ψ| (I + X) |ψ⟩ (14.64)
2
|ψ⟩ is critical→ = ⟨ψ|ψ⟩ = ∥|ψ⟩∥ .
To prove that 2 ⇒ 1, suppose that for any M ∈ SLn we have M |ψ⟩ ⩾ |ψ⟩ . Then, for
any X ∈ Lie (SLn ) and t ∈ R we have
1 2
f (t) := e 2 tX |ψ⟩ = ⟨ψ|etX |ψ⟩ ⩾ ⟨ψ|ψ⟩ = f (0) . (14.65)
Hence, t = 0 must be a critical point of the function f (t) so that f ′ (0) = ⟨ψ|X|ψ⟩ = 0. Since
this holds for any X ∈ Lie (SLn ) we conclude that |ψ⟩ ∈ Crit(An ).
We next prove the equivalence of 1 and 3. Recall that any X ∈ Lie (SLn ) can be written
as a linear combination of matrices, that up to a permutation of the subsystems of An , have
the form X1 ⊗ I A2 ⊗ · · · ⊗ I An . Now, if X = X1 ⊗ I A2 ⊗ · · · ⊗ I An then the condition
⟨ψ|X|ψ⟩ = 0 is equivalent to
Tr ρA1 X1 = 0,
(14.66)
n
where ρA1 := TrA2 ···An ψ A . Since the above condition has to hold for all X1 ∈ Lie SL(A1 ) ,
we conclude that ρA1 is proportional to the identity matrix. In other words, |ψ⟩ ∈ Crit(An )
n
if and only if for any x ∈ [n], the reduced density matrix of ψ A on the xth-subsystem is
proportional to the identity matrix I Ax . This completes the proof.
Exercise 14.3.1. Let |ψ⟩ ∈ Crit(An ) and let M ∈ SLn be such that M |ψ⟩ = |ψ⟩ .
1. Show that there exists a local unitary matrix; i.e., U ∈ SU (d1 ) × · · · × SU (dn ) such
that M |ψ⟩ = U |ψ⟩.
For two qubit states the null cone consists only of product states. This can be easily
verified by noting that the SLIP given by f (|ψ⟩) := |ψ⟩, |ψ⟩ 2 is zero if and only if |ψ⟩ ∈
C2 ⊗ C2 is a product state. For higher number of qubits the null cone is not trivial. As an
example, consider the three-qubit state, known as the W-state,
1
|W ⟩ := √ |100⟩ + |010⟩ + |001⟩ . (14.68)
3
⊗3
t 0
This state has the property that for any 0 ̸= t ∈ C the matrix Mt := satisfies
0 t−1
Mt |W ⟩ = t|W ⟩ . (14.69)
Since t ̸= 0 this means fk (|W ⟩) = 0. Since fk is an arbitrary homogenous SLIP, this implies
that for any f ∈ SLIP(A3 ) (where A is a qubit; i.e. |A| = 2) we have f (|W ⟩) = 0. Therefore,
the W-state belong to the null cone of three qubits.
From (14.69) of the example above it follows that
That is, the orbit SL3 |W ⟩ contains a sequence of vectors approaching the zero vector. This
is precisely the key property of states in the null cone.
2. There exists a sequence of vectors {|ψk ⟩}k∈N ⊂ SLn |ψ⟩ such that
The direction that 2 ⇒ 1 is relatively simple to show. Indeed, suppose there is a sequence
of vectors |ψk ⟩k∈N ⊂ SLn |ψ⟩ that approaches the zero vector in the limit k → ∞. Let
f ∈ SLIP(An ). Then, for any k ∈ N we have f (|ψk ⟩) = f (|ψ⟩). Since this holds for any
integer k it must hold also for the limit k → ∞. Combining this with the continuity of
polynomial functions we get
f (|ψ⟩) = lim f (|ψk ⟩) = f (0) = 0 . (14.73)
k→∞
As f was an arbitrary SLIP we conclude that |ψ⟩ ∈ Null(An ). The other direction can be
found in Theorem 43 of [227].
Exercise 14.3.2. Let λ1 , . . . , λn ∈ C, and let
|ψ⟩ := λ1 |10 . . . 0⟩ + λ2 |01 . . . 0⟩ + · · · + λn |00 . . . 1⟩ ∈ An . (14.74)
Show that |ψ⟩ ∈ Null(An ).
Note that the orbit SLn |ψ⟩ is closed if for any sequence of states {|ϕk ⟩}n∈N ⊂ SLn |ψ⟩
with a limit limk→∞ |ϕk ⟩ = |ϕ⟩ we have that the limit |ϕ⟩ is also in SLn |ψ⟩. Therefore, states
in the null cone are not stable since if |ψ⟩ ∈ Null(An ) is a non-zero vector then SLn |ψ⟩ does
not contain the zero vector. Still, SLn |ψ⟩ contains a sequence of vectors with zero limit since
|ψ⟩ is in the null cone. Hence, the null cone and the set of stable states forms two disjoint
set of states in An . The following theorem shows that any state in An can be written as a
linear combination of these two set of states.
The above result follows from a variant of the Hilbert-Mumford theorem given in Theo-
rem 45 of [227]. The theorem above states that the vector space An can be decomposed into
the direct sum
An = Stable(An ) ⊕ Null(An ) . (14.76)
In addition, it can be shown that almost all vectors in An are stable in the sense that the
closure of Stable(An ) is the whole space; i.e.
An = Stable(An ) . (14.77)
Proof. Suppose |ψ⟩ is stable so that SLn |ψ⟩ is closed. Then, there exists a state |ϕ⟩ ∈ SLn |ψ⟩
with minimal norm; that is, for any M ∈ SLn
But since |ϕ⟩ = N |ψ⟩ for some N ∈ SLn we can express the above equation as
Since any M ′ ∈ SLn can be expressed as M ′ = M N −1 for some M ∈ SLn we conclude that
M ′ |ϕ⟩ ⩾ |ϕ⟩ for all M ′ ∈ SLn . From Theorem 14.3.1 it then follows that |ϕ⟩ is a critical
state. That is, the orbit SLn |ψ⟩ contains a critical state. The proof of the converse part can
be found in Theorem 47 of [227].
n
Therefore, if h ∈ SLIPk (An ) is another homogenous SLIP of degree k such that h ψ A ̸= 0
then the above equation is equivalent to
n n
f ϕA f ψA
= . (14.82)
h ϕ An h ψ An
n n
That is, if ψ A and ϕA belong to the same reversible SLOCC then the above equation must
hold. The following theorem demonstrates that the converse is also true for almost all states
in An .
Theorem 14.3.5. Let |ψ⟩, |ϕ⟩ ∈ Stable(An ). Then, there exists θ ∈ [0, 2π) and
M ∈ SLn such that (14.80) holds
if and only if (14.82) holds for all k ∈ N and all
n An
f, h ∈ SLIPk (A ) with h ψ ̸= 0.
Remark. Note that in the theorem above, k is unbounded. However, since it is known that
the space of SLIPs has a finite dimension, it is possible to restrict k, although the best upper
bound is unknown.
Proof. We showed above that (14.80) implies (14.82). It is therefore left to show the converse.
n n n
If there exists h ∈ SLIPk (An ) such that h ψ A ̸= 0 but h ϕA = 0 then clearly ψ A
n
and ϕA are not in the same invertible SLOCC class. We therefore assume without loss of
n An
generality that there exists k ∈ N and h ∈ SLIP k (A ) such that both h ψ ̸= 0 and
An
h ϕ ̸= 0, and denote by
n
h ϕA
λ := ̸= 0 . (14.83)
h ψ An
With this notation, our assumption in (14.82) implies that for all f ∈ SLIPk (An ),
n n n
f ϕA = λf ψ A = f λ1/k ψ A . (14.84)
Our first goal is to show that up to some phase factors, f above can be replaced with any
SLIP (even not homogeneous). For this purpose, consider the subgroup Gn,k ⊂ GL(An )
defined by
Gn,k := µM : µk = 1 , M ∈ SLn , µ ∈ C ,
(14.85)
and observe that in addition for being SLn -invariant polynomial (i.e., SLIP), h is also Gn,k -
invariant polynomial. Moreover, the degree of any homogeneous Gn,k -invariant polynomial
must be divisible by k. To see this, let g be a Gn,k -invariant polynomial of degree m. Then,
since for any µ ∈ C such that µk = 1 we have µI ∈ Gn,k , it follows that g(|ψ⟩) = g(µI|ψ⟩) =
µm g(|ψ⟩), so that µm = 1. Since m satisfies this property for any such µ (i.e. any k-th root
of unity), we conclude that m = kr for some r ∈ N.
Now, fix k, and let g be a homogenous Gn,k -invariant polynomial of degree kr for some
r ∈ N. Since g is a SLIP, from the assumption of the theorem
n n
g ϕA g ψA
= . (14.86)
hr ϕAn hr ψ An
where in the last equality we used the fact that g is homogeneous of degree kr. Since the above
equation holds for any homogenous SLn -invariant polynomial g (recall that r was arbitrary),
it must also hold for all (possibly non-homogeneous) SLn -invariant polynomials. Hence,
using a result from invariant theory that closed orbits of a reductive algebraic subgroup of
GL(An ) are separated by their invariant polynomials, we conclude that there exists µ ∈ C
n n n n
with µk = 1 and M ∈ SLn such that ϕA = λ1/k µM ψ A . The upshot is ϕA = cM ψ A
n n
for some c ∈ C, and the normalization ϕA = 1 gives c = eiθ / M ψ A . This completes
the proof.
The theorem above demonstrates that SLIPs can be used to classify multipartite entan-
glement. We give two examples of such classifications in three and four qubits systems.
In the previous chapters, we learned that pure bipartite states can always be represented in
their Schmidt form. Specifically, for a two-qubit system AB, any state ψ ∈ Pure(AB) can
be expressed, up to local unitaries, as
√ p
|ψ AB ⟩ = p|00⟩ + 1 − p|11⟩, (14.88)
where p ∈ [0, 1]. We refer to this representation as the canonical form of the state ψ AB .
Now, our goal is to find a canonical form for any three-qubit state in ABC where |A| =
|B| = |C| = 2. To achieve this, we will utilize the following property presented in the
following exercise.
Exercise 14.4.1. Let AB be a two-qubit system and let |ψ0 ⟩, |ψ1 ⟩ ∈ AB be two pure bipartite
vectors. Show that if the vectors |ψ0AB ⟩ and |ψ1AB ⟩ are linearly independent then there exists
numbers a, b ∈ C such that a|ψ0AB ⟩ + b|ψ1AB ⟩ is a product (i.e. non-entangled) state. Hint:
Denote by c := ab and view the determinant of the reduced density matrix of the (non-
normalized) state c|ψ0AB ⟩+|ψ1AB ⟩ as a quadratic polynomial in c. Recall that over the complex
field, all quadratic polynomials have roots.
Remark. The normalization of ψ ABC implies that λ20 + · · · + λ24 = 1. In the proof below, the
fact that we can restrict θ to the domain [0, π] will be left as an exercise.
Proof. Every three-qubit state |ψ⟩ ∈ ABC can be expressed as
ψ ABC = |0⟩A |ψ0BC ⟩ + |1⟩A |ψ1BC ⟩ , (14.90)
where |ψ0 ⟩, |ψ1 ⟩ ∈ BC are two orthogonal (possibly unnormalized) vectors. From the exercise
above it follows that there exists two complex numbers a, b ∈ C such that a|ψ0BC ⟩ + b|ψ1BC ⟩
2 2
is a product state. Note that loss of generality we can assume that |a| + |b| = 1.
without
a b
Therefore, the matrix U = is a unitary matrix, so that by applying U to the first
−b̄ ā
qubit of |ψ ABC ⟩ we get
U A ⊗ I BC ψ ABC = a|0⟩A − b̄|1⟩A ψ0BC + b|0⟩A + ā|1⟩A |ψ1BC ⟩
(14.91)
= |0⟩A a ψ0BC + b ψ1BC + |1⟩A ā ψ1BC − b̄ ψ0BC .
Since a ψ0BC +b ψ1BC is a (possibly unnormalized) product state, there exists a local unitary
on BC that transform it to the state λ0 |00⟩BC where λ0 ∈ C is some normalization factor.
We therefore conclude that, up to local unitaries, the state |ψ ABC ⟩ can be expressed as
|ψ ABC = λ0 |000⟩ + |1⟩|ϕBC ⟩ (14.92)
where |ϕBC ⟩ is some vector in BC. Let λ1 , . . . , λ4 ∈ C be such that
|ϕBC ⟩ = λ1 |00⟩ + λ2 |01⟩ + λ3 |10⟩ + λ4 |11⟩ . (14.93)
Note that by applying to the state above, the local unitary
iθ1 iθ3
e 0 e 0
U BC := ⊗ , (14.94)
iθ2 iθ4
0 e 0 e
we get
U BC |ϕBC ⟩ = λ1 ei(θ1 +θ3 ) |00⟩ + λ2 ei(θ1 +θ4 ) |01⟩ + λ3 ei(θ2 +θ3 ) |10⟩ + λ4 ei(θ2 +θ4 ) |11⟩ . (14.95)
Therefore, by choosing appropriately the four phases θ1 , θ2 , θ3 , θ4 we can make three of the λs
non-negative real numbers. We choose them to be λ2 , λ3 , λ4 ∈ R+ . Moreover, observe that
by applying eiθ |0⟩⟨0| + |1⟩⟨1| to system A in (14.92) we can add a phase to λ0 . Therefore,
we can assume without loss of generality that λ0 is a real non-negative number.
Exercise 14.4.2. Complete the proof above by showing that θ in (14.89) can be restricted to
[0, π].
Exercise 14.4.3. Let ψ ABC be the three-qubit state given in (14.89). Show that its three
local marginals (i.e. reduced density matrices) are given by
2 −iθ 2 2 2 iθ
λ0 λ0 λ1 e λ + λ1 + λ2 λ1 λ3 e + λ2 λ4
ψA = , ψB = 0 (14.96)
λ0 λ1 e−iθ 1 − λ20 λ1 λ3 e−iθ + λ2 λ4 λ23 + λ24
and
+ +λ20 λ21 λ23
λ1 λ2 e + λ3 λ4 iθ
ψC = . (14.97)
λ1 λ2 e−iθ + λ3 λ4 λ22 + λ24
From Theorem 14.4.1 and the exercise above we get that up to local unitaries, there is
only one normalized critical state given by the GHZ state
1
|GHZ⟩ := √ |000⟩ + |111⟩ . (14.98)
2
Proof. From the properties of critical states (see Theorem 14.3.1) we know that if |ψ⟩ ∈
Crit(ABC) is normalized then all three local marginals ψ A , ψ B , and ψ C must be maximally
mixed. Now, from Theorem 14.4.1 we know that up to local unitaries the state ψ ABC can
be expressed as in (14.89). Hence, using this form, we get from Exercise 14.4.3 that the
condition ψ A = 21 I A holds if and only if λ20 = 21 and λ1 = 0. The condition ψ B = 12 I B gives
in particular λ20 + λ21 + λ22 = 21 . Therefore, also λ2 = 0. Finally, the condition ψ C = 12 I C gives
λ3 = 0 and λ24 = 12 . Hence, the state ψ ABC as given in (14.89) is critical if and only in it is
the GHZ state. This concludes the proof.
Recall the homogeneous SLIP of degree four as defined in (14.19) for odd number of
qubits. For three qubit system ABC (with |A| = |B| = |C| = 2), its absolute value is called
the 3-tangle, and it is given for any vector
by
|ψ0 ⟩, |ψ0 ⟩ |ψ0 ⟩, |ψ1 ⟩
Tangle ψ ABC
:= det (14.101)
|ψ1 ⟩, |ψ0 ⟩ |ψ1 ⟩, |ψ1 ⟩
where |ψx ⟩, |ψy ⟩ := ⟨ψ̄xBC |J2 ⊗J2 |ψyBC ⟩ for each x, y ∈ {0, 1}; recall that J2 := |0⟩⟨1|−|1⟩⟨0|.
From the corollary discussed earlier, it follows that all stable vectors in ABC are, up to
normalization, contained in the G3 orbit of the GHZ state |GHZ⟩. In other words, almost
all three-qubit normalized states are in the SLOCC class of the GHZ state. This, in turn,
implies that almost all three-qubit states have a non-zero 3-tangle, which is consistent with
the formula for the 3-tangle given below.
Exercise 14.4.4. Show that the 3-tangle of the state ψ ABC in (14.89) is given by
Tangle ψ ABC = λ0 λ4 .
(14.102)
The formula presented in the exercise above shows that the 3-tangle is zero when λ0 = 0,
which makes sense because in this case, the state ψ ABC is a product state between system
A and system BC. This implies that the state is in the null cone, i.e., it has no genuine
tripartite entanglement. On the other hand, if λ4 = 0, then the 3-tangle is also zero. In this
case, the state ψ ABC can be expressed as
ψ ABC = λ0 |000⟩ + λ1 eiθ |100⟩ + λ2 |101⟩ + λ3 |110⟩ . (14.103)
If we apply the flip operator |10| + |0⟩⟨1| to the first qubit of the state ψ ABC , the resulting
state takes the form:
ψ ABC = λ1 eiθ |000⟩ + λ0 |100⟩ + λ2 |001⟩ + λ3 |010⟩ . (14.104)
⊗3
Moreover, observe that by applying the local unitary matrix e−iθ/3 |0⟩⟨0| + e2iθ/3 |1⟩⟨1| we
can eliminate the phase attached to the |000⟩ term. Therefore, after renaming the coefficients
λ0 , . . . , λ3 we conclude that unless the state ψ ABC is a product state between A and BC, its
3-tangle is zero if and only if, up to local unitaries, it can be expressed as
|ψ ABC ⟩ = λ0 |000⟩ + λ1 |100⟩ + λ2 |010⟩ + λ3 |001⟩ , (14.105)
with λ0 , . . . , λ3 ∈ R+ .
Exercise 14.4.5. Show that for any three-qubit pure state ψ ABC of the form (14.105), there
exists three matrices M, N, L ∈ GL(2, C) such that
ψ ABC = M ⊗ N ⊗ L W (14.106)
where |W ⟩ is the W-state as defined in (14.68).
The preceding discussion and exercise demonstrate that the SLOCC class of the W -state
consists of all states whose 3-tangle vanishes. Furthermore, since the W -state lies in the
null cone (as shown in the discussion below equation (14.68)), we can conclude that the null
cone precisely consists of the SLOCC class of the W -state. This, in turn, implies that a
three-qubit vector lies in the null cone if and only if its 3-tangle vanishes. This also implies
that any other SLIP must be proportional to a power of the 3-tangle. Therefore, the 3-tangle
is essentially the absolute value of the only SLIP in three qubits.
In summary, we can divide the space of three qubits into six invertible SLOCC classes:
• The “genuine” tripartite entanglement classes: the GHZ class and the W-class.
• Three bipartite entanglement classes: the three SL3 -orbits generated by |0⟩A |ΦBC ⟩,
|0⟩B |ΦAC ⟩, and |ΦAB ⟩|0⟩C .
SU (2) ⊗ SU (2) ∼
= SO(4) . (14.107)
T U1 ⊗ U2 T ∗ ∈ SO(4) .
(14.109)
Hint: Show that T T T = J ⊗ J, where J = |0⟩⟨1| − |1⟩⟨0| is the matrix that satisfies (C.13).
We can use the above isomorphism to get the canonical form of a four-qubit state. Let
ψ ABCD ∈ ABCD be a four qubit state, and let M : AB → AB be the 4 × 4 complex
matrix defined via
ψ ABCD = M ⊗ I CD Ω(AB)(CD) (14.110)
where X
Ω(AB)(CD) = |xy⟩AB |xy⟩CD . (14.111)
x,y∈{0,1}
In other words, we view four-qubit states as 4 × 4 complex matrices. Consider now a state
|ϕ^{ABCD}⟩ = (U_1 ⊗ U_2 ⊗ U_3 ⊗ U_4)|ψ^{ABCD}⟩  (14.112)
with each U_x ∈ SU(2). That is, |ψ⟩ and |ϕ⟩ are related by local unitaries. Let N be the
4 × 4 matrix representing ϕ^{ABCD} similarly to (14.110). Then, from the second part of
Exercise 2.3.26 we get that M and N are related by
N = (U_1 ⊗ U_2) M (U_3 ⊗ U_4)^T
  = T^* O_1 T M T^* O_2 T ,  (14.113)
M^* M = D + iΛ ,  (14.115)
where D, Λ ∈ R^{4×4}, with D being a diagonal matrix with non-negative diagonal elements and
Λ being a skew-symmetric matrix.
Exercise 14.4.8. Prove that if Λ_1, Λ_2 ∈ SL(2, C) then
T(Λ_1 ⊗ Λ_2)T^* ∈ SO(4, C) ,  (14.116)
where SO(4, C) is the (non-compact) special orthogonal group over C; i.e., O ∈ SO(4, C) if
and only if O ∈ C^{4×4}, O^T O = I_4, and det(O) = 1. Here T is the same matrix that was used
in Exercise 14.4.6.
Critical States
In this subsection, we will characterize the set Crit(ABCD) of critical states in the four-
qubit system by leveraging the isomorphism described in (14.116). Specifically, we begin
by considering Λ = Λ1 ⊗ Λ2 ⊗ Λ3 ⊗ Λ4 ∈ G4 , ψ ∈ ABCD, and the matrix M defined
in (14.110). We observe that
Λ|ψ^{ABCD}⟩ = (N ⊗ I^{CD})|Ω^{(AB)(CD)}⟩   where   N = (Λ_1 ⊗ Λ_2) M (Λ_3 ⊗ Λ_4)^T .  (14.117)
Next, under the isomorphism in (14.116), the matrix M is transformed into M̃ = TMT^* and
N into Ñ = TNT^*. We can then express the relation between M̃ and Ñ as Ñ = O_1 M̃ O_2,
where
O_1 := T(Λ_1 ⊗ Λ_2)T^*   and   O_2 := T(Λ_3 ⊗ Λ_4)^T T^* .  (14.118)
Note that if O1 and O2 were unitaries, we could have diagonalized M̃ using the singular value
decomposition. However, since they are orthogonal, this is not always possible. Nevertheless,
a somewhat cumbersome canonical form does exist (see, e.g., [218]).
We now focus on four-qubit states in ABCD whose corresponding M̃ matrix has the form
O_1′ D O_2′, where D is a 4 × 4 complex diagonal matrix, and O_1′ and O_2′ are 4 × 4 complex
orthogonal matrices. We will show that all critical states in four qubits belong to this class.
Therefore, by the Kempf-Ness theorem (Theorem 14.3.4) in conjunction with (14.77), this
class of states is dense in ABCD. In other words, almost all four-qubit pure states have this
property.
We begin by noting that the diagonalizable property of M̃ remains invariant under the
action of G4 . This is because we have already shown that for every |ψ⟩ ∈ ABCD, the
transformation |ψ⟩ → Λ|ψ⟩ translates, under the isomorphism, to the transformation of M̃
to O1 M̃ O2 = O1 O1′ DO2′ O2 , which is of the form Q1 DQ2 , where Q1 = O1 O1′ and Q2 = O2 O2′
are two orthogonal matrices.
Next, for a fixed diagonal matrix D = Diag(λ_1, λ_2, λ_3, λ_4), where each λ_x ∈ C (x ∈ [4]),
we take the state corresponding to M̃ = D to represent this G_4 orbit. Note that M =
T^* M̃ T = T^* D T, so the representative state has the form
|ψ^{ABCD}⟩ = (T^* D T ⊗ I^{CD})|Ω^{(AB)(CD)}⟩
           = (T^* D ⊗ T^T)|Ω^{(AB)(CD)}⟩ .  (14.119)
For any j ∈ [4] with binary representation (x, y) (with x, y ∈ {0, 1}), we define |u_j^{AB}⟩ :=
T^*|xy⟩^{AB} and |v_j^{CD}⟩ := T^T|xy⟩^{CD}. With these notations,
|ψ^{ABCD}⟩ = Σ_{j∈[4]} λ_j |u_j^{AB}⟩|v_j^{CD}⟩ .  (14.120)
Since the vectors |u_j⟩ and |v_j⟩ are (up to phases) the four Bell states, the representative state can be written, after absorbing these phases into the coefficients λ_j, as
|ψ_λ^{ABCD}⟩ = λ_1|Φ_+^{AB}⟩|Φ_+^{CD}⟩ + λ_2|Φ_−^{AB}⟩|Φ_−^{CD}⟩ + λ_3|Ψ_+^{AB}⟩|Ψ_+^{CD}⟩ + λ_4|Ψ_−^{AB}⟩|Ψ_−^{CD}⟩ ,  (14.121)
where {|Φ_±⟩, |Ψ_±⟩} denotes the Bell basis of two-qubit maximally entangled states.
Note that if there exists another diagonal matrix D′ = Diag(λ_1′, λ_2′, λ_3′, λ_4′) such that
D′ = O_1 D O_2, we then must have
(D′)^2 = (D′)^T D′ = O_2^T D^2 O_2 .  (14.122)
Theorem 14.4.2. Let |ψλ ⟩ and |ψλ′ ⟩ be two four qubit states as given in (14.121),
with the coefficients λ1 and λ′1 being real positive such that for all x = 2, 3, 4 we have
λ1 ⩾ |λx | and λ′1 ⩾ |λ′x |. Then, the two states |ψλ ⟩ and |ψλ′ ⟩ belong to the same
SLOCC class if and only if λ1 = λ′1 and there exists a permutation, π, on three
elements such that for each x ∈ {2, 3, 4} we have λ′x = λπ(x) or λ′x = −λπ(x) .
The theorem above highlights a stark contrast between three-qubit systems and four-
qubit systems. While three-qubit systems have a finite number of SLOCC classes, the same
cannot be said for four-qubit systems. In fact, the theorem demonstrates that four-qubit
systems have an uncountable number of SLOCC classes.
This has significant implications, as it means that converting |ψλ ⟩ to |ψλ′ ⟩ by LOCC is
impossible, even with a probability less than one, unless λ′ = λ up to a permutation and a
sign change of the components of λ′ and λ. In simpler terms, the components of λ′ and λ
must be identical except for a rearrangement and possibly a change in sign.
Exercise 14.4.9. Show that the state |ψ_λ⟩ in (14.121) is a critical state. Specifically, show
that if ψ_λ^{ABCD} is normalized then its four local marginals are maximally mixed; i.e., show
that
ψ_λ^A = ψ_λ^B = ψ_λ^C = ψ_λ^D = (1/2) I_2 .  (14.123)
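A direct numerical check of this exercise is straightforward. The sketch below (a NumPy illustration with the qubit ordering A, B, C, D and a random normalized λ ∈ C^4 chosen as assumptions) builds the state (14.121) from the Bell basis and verifies that all four single-qubit marginals equal I_2/2.

import numpy as np

rng = np.random.default_rng(2)
lam = rng.normal(size=4) + 1j*rng.normal(size=4)
lam /= np.linalg.norm(lam)

s = 1/np.sqrt(2)
bell = [s*np.array([1,0,0,1]),    # |Φ+⟩
        s*np.array([1,0,0,-1]),   # |Φ−⟩
        s*np.array([0,1,1,0]),    # |Ψ+⟩
        s*np.array([0,1,-1,0])]   # |Ψ−⟩

# |ψ_λ⟩ = Σ_j λ_j |Bell_j⟩^{AB} |Bell_j⟩^{CD}, stored as a (2,2,2,2) tensor over A,B,C,D.
psi = sum(l*np.kron(b, b) for l, b in zip(lam, bell)).reshape(2, 2, 2, 2)

for axis, name in enumerate("ABCD"):
    vec = np.moveaxis(psi, axis, 0).reshape(2, -1)   # qubit `name` versus the rest
    marginal = vec @ vec.conj().T                    # trace over the other three qubits
    print(name, np.round(marginal, 8))               # each marginal is I_2/2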
It is worth noting that in [220, 218, 226] it has been shown that, up to local unitaries, the
set
C := { ψ_λ^{ABCD} : λ_1, λ_2, λ_3, λ_4 ∈ C , Σ_{x∈[4]} |λ_x|^2 = 1 }  (14.124)
contains all the normalized critical states of four qubits; that is, every state in Crit(ABCD) is, up to local unitaries, of the form ψ_λ^{ABCD}.
and for each j ∈ [n], N_j^{(k)} ∈ L(A_j). We denote the set of all such channels by SEP(A^n →
A^n). While the matrices N_j^{(k)} (and, by extension, M_k) might not always be invertible,
in this section our attention is specifically on the conversion of one pure state to
another under the assumption that all {M_k}_{k∈[m]} are non-singular. We use the notation
SEP_1(A^n → A^n) ⊂ SEP(A^n → A^n) to represent all separable channels of this kind. In
essence, our focus is restricted to separable operations as defined earlier, with each M_k being
an element of GL_n. For further insights and references on the relations between LOCC,
SEP_1, and SEP, interested readers are referred to the concluding section of this chapter,
titled "Notes and References."
Definition 14.5.1. Let ψ ∈ Pure(A^n). The stabilizer group of ψ^{A^n} is the subgroup of
GL_n defined by
Stab(ψ) := { Λ ∈ GL_n : Λ|ψ⟩ = |ψ⟩ } .  (14.126)
Note that the set Stab(ψ) is not empty since the identity matrix belongs to it.
Exercise 14.5.1. Let ψ ∈ Pure(An ) and consider the stabilizer group Stab(|ψ⟩).
Exercise 14.5.2. Let AB be a bipartite system with |A| = |B|. Find the stabilizer group of
the maximally entangled state |ΦAB ⟩.
The stabilizer group for ψ is a subgroup of GLn . One may naturally wonder how this
group is related to the same group, but with GLn replaced by SLn . The following theorem
demonstrates that, unless ψ is in the null cone of An , every element in the stabilizer group
of ψ lies in SLn up to a factor given by a root of unity.
Theorem 14.5.1. Let ψ ∈ Pure(An ) and suppose that |ψ⟩ ̸∈ Null(An ). Then, there
exists m ∈ N such that
Stab(ψ) ⊂ G_m := { e^{i2πk/m} Λ : k ∈ [m] , Λ ∈ SL_n } .  (14.128)
Proof. By definition, since |ψ⟩ ̸∈ Null(An ) there exists a homogeneous SLIP, f , with the
property that f (|ψ⟩) ̸= 0. Let m be the degree of f . Now, let Λ′ ∈ Stab(ψ) and observe
that since Stab(ψ) ⊂ GLn there exists a ∈ C such that Λ′ = aΛ, where Λ ∈ SLn . Thus, the
property Λ′ |ψ⟩ = |ψ⟩ gives
Corollary 14.5.2. Let ψ ∈ Crit(An ) be such that Stab(ψ) is a finite group. Then,
there exists m ∈ N such that
Stab(ψ) ⊂ Km . (14.133)
Proof. Let M ∈ Stab(ψ). Since ψ is a critical state it is not in the null cone of An , so that
from Theorem 14.5.1 there exists N ∈ SLn , and a ∈ C with am = 1, such that M = aN .
Moreover, using the polar decomposition we can further express N as N = U Λ, where
U ∈ SU_n and Λ > 0 is a positive definite matrix in SL_n. Hence,
|ψ⟩ = M|ψ⟩ = aUΛ|ψ⟩ ,   and consequently   ∥Λ|ψ⟩∥ = ∥|ψ⟩∥ .  (14.134)
As Λ ∈ SL_n is positive definite, the Kempf-Ness theorem (as described in Exercise 14.3.1)
implies that Λ|ψ⟩ = |ψ⟩. In other words, Λ belongs to the stabilizer group Stab(ψ). Since
Stab(ψ) is a finite group, the sequence {Λ^k}_{k∈N} must contain elements that are equal to each
other, and therefore there exists k ∈ N such that Λ^k = I^{A^n}. Since Λ > 0, we must have
Λ = I^{A^n}. Hence, M = aU ∈ K_m. Since M was an arbitrary element of Stab(ψ), we conclude
that all the elements of Stab(ψ) belong to K_m. This completes the proof.
Now, recall that in three qubits, the 3-tangle is defined in terms of a homogeneous SLIP
of degree 4. Therefore, from Corollary 14.5.1 we get that the quotient group Stab(ϕ)/G is
a group of order at most four. However, note that if Λ ∈ SL3 then also −Λ ∈ SL3 so that
G4 = G2 , where the groups G2 and G4 are defined in (14.128). We therefore conclude that
Stab(ϕ)/G is a group of order at most two. Since X ⊗ X ⊗ X ∈ Stab(ϕ)/G we conclude
that Stab(ϕ)/G contains only the identity matrix and the flip matrix X ⊗ X ⊗ X so that
Stab(ϕ) is the union of G and the coset (X ⊗ X ⊗ X)G.
Exercise 14.5.4. Find the stabilizer group of the W-state of three qubits. Is it compact?
|ψ⟩ = λ1 |Φ+ ⟩|Φ+ ⟩ + λ2 |Φ− ⟩|Φ− ⟩ + λ3 |Ψ+ ⟩|Ψ+ ⟩ + λ4 |Ψ− ⟩|Ψ− ⟩ , (14.136)
where {|Φ± ⟩, |Ψ± ⟩} is the Bell basis of two qubits, λ1 , λ2 , λ3 , λ4 ∈ C, and λ2x ̸= λ2x′ for all
x ̸= x′ ∈ [4]. For this four-qubit state, it can be shown (see [?] and [226]) that the group
G = Stab(ψ) ∩ SL4 is the Klein group consisting of only four elements:
G = {I , X ⊗ X ⊗ X ⊗ X , Y ⊗ Y ⊗ Y ⊗ Y , Z ⊗ Z ⊗ Z ⊗ Z} (14.137)
|ψ⟩ := (1/√3)( |Φ_+⟩|Φ_+⟩ + ω|Φ_−⟩|Φ_−⟩ + ω|Ψ_+⟩|Ψ_+⟩ ) ,  (14.139)
where ω = e^{i2π/3}.
for some c_x ∈ C. After combining the relation above with (14.143), and performing some
algebra, we obtain:
(1/c_x) (∥N_2|χ⟩∥ / ∥N_1|χ⟩∥) N_2^{-1} M_x N_1 |χ⟩ = |χ⟩ ,  (14.146)
with p_x := |c_x|^2. Hence, ψ_1 →_{SEP_1} ψ_2 if and only if Λ_1 and Λ_2 satisfy the relation above. This
completes the proof.
Let χ ∈ Crit(A^n) have a finite stabilizer, i.e., Stab(χ) = {U_x}_{x∈[m]} is a finite set of unitaries,
as established by Corollary 14.5.2. In this situation, we can define the Stab(χ)-twirling
operation as:
G(ω^{A^n}) = (1/m) Σ_{x∈[m]} U_x ω^{A^n} U_x^*   ∀ ω ∈ L(A^n) .  (14.150)
Now, according to Theorem 14.5.2, ψ_1 →_{SEP_1} ψ_2 if and only if there exists a probability
distribution {p_x}_{x∈[m]} such that
Λ_1 = Σ_{x∈[m]} p_x U_x^* Λ_2 U_x .  (14.151)
By applying the twirling map G to both sides of the equation above we get that if ψ_1 →_{SEP_1} ψ_2
then
G(Λ_1) = G(Λ_2) .  (14.152)
In other words, the condition above is a necessary condition for the conversion ψ_1 →_{SEP_1} ψ_2
(but it is not always sufficient). Moreover, if Λ_1 is symmetric, meaning G(Λ_1) = Λ_1, then ψ_1 →_{SEP_1}
ψ_2 if and only if Λ_1 = G(Λ_2).
One special case in which Λ_1 is symmetric is the case ψ_1 = χ. In this case, Λ_1 = I^{A^n},
and χ →_{SEP_1} ψ_2 if and only if G(Λ_2) = I^{A^n}. Conversely, if ψ_2 = χ then Λ_2 = I^{A^n}, so the
condition (14.151) becomes Λ_1 = I^{A^n}. In other words, ψ_1 →_{SEP_1} χ if and only if, up to local
unitaries, ψ_1 = χ. This is consistent with the intuition that the critical state is the maximally
entangled state of its SLOCC orbit.
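For the four-qubit example (14.136), whose stabilizer is (up to phases that cancel under conjugation) the Klein group (14.137), the Stab(χ)-twirling (14.150) is a simple average over four Pauli strings. The following minimal sketch (illustrative only; the random Hermitian operator merely stands in for an APO) implements this twirl and checks that it is idempotent and that it leaves the identity invariant, in line with the necessary condition G(Λ_1) = G(Λ_2) and the special case G(Λ_2) = I discussed above.

import numpy as np

I2 = np.eye(2)
X = np.array([[0,1],[1,0]]); Y = np.array([[0,-1j],[1j,0]]); Z = np.diag([1,-1])

def kron_all(ops):
    out = np.array([[1.0+0j]])
    for op in ops:
        out = np.kron(out, op)
    return out

# The four commuting unitaries of (14.137); stabilizer phases cancel in U(.)U†.
kleins = [kron_all([P]*4) for P in (I2, X, Y, Z)]

def twirl(omega):
    # Stab(χ)-twirling (14.150) for the Klein group: G(ω) = (1/4) Σ_x U_x ω U_x†.
    return sum(U @ omega @ U.conj().T for U in kleins) / len(kleins)

rng = np.random.default_rng(3)
A = rng.normal(size=(16,16)) + 1j*rng.normal(size=(16,16))
Lam = A + A.conj().T                       # a random Hermitian operator

G1 = twirl(Lam)
print(np.max(np.abs(twirl(G1) - G1)))                    # ~0: G ∘ G = G
print(np.max(np.abs(twirl(np.eye(16)) - np.eye(16))))    # ~0: the identity is symmetric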
To demonstrate how the theorem mentioned above generalizes Nielsen’s majorization
theorem, we now apply it to the bipartite case. Let us consider a bipartite system AB with
m := |A| = |B|. The only critical state of this system is the maximally entangled state Φm ,
and its stabilizer is given by:
Stab(Φ_m) = { S^{-1} ⊗ S^T : S ∈ GL(m) } .  (14.153)
The stabilizer group mentioned above is clearly not compact. However, in the derivation
of Nielsen’s majorization theorem, we employed Lo-Popescu’s Theorem (Theorem 12.2.1)
which limits Bob’s operations to be unitary operations. Thus, we can limit S to be a unitary
matrix without loss of generality.
Consider two bipartite states ψ_1, ψ_2 ∈ Pure(AB). As per Exercise 14.5.7, an APO of
ψ_1 has the form Λ_1 = ρ_1^A ⊗ I^B, where ρ_1^A is the reduced density matrix of ψ_1^{AB}. Similarly,
an APO of ψ_2 has the form Λ_2 = ρ_2^A ⊗ I^B, where ρ_2^A is the reduced density matrix of ψ_2^{AB}.
Therefore, from Theorem 14.5.2, we can infer that ψ_1^{AB} →_{SEP_1} ψ_2^{AB} if and only if there exists a
probability distribution {p_x}_{x∈[k]} along with k unitary matrices {U_x}_{x∈[k]} ⊂ U(m) such that:
ρ_1^A ⊗ I^B = Σ_{x∈[k]} p_x (U_x^* ρ_2^A U_x) ⊗ I^B .  (14.154)
This condition is precisely the same as the condition we obtained in (12.26) for Nielsen’s
majorization criterion.
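Condition (14.154) states that ρ_1^A is a mixture of unitary conjugations of ρ_2^A, which forces the spectrum of ρ_1^A to be majorized by that of ρ_2^A — precisely Nielsen's criterion. The following small Monte Carlo sanity check (not taken from the text; the dimensions, seed, and number of unitaries are arbitrary choices) illustrates this implication numerically.

import numpy as np

def haar_unitary(d, rng):
    z = rng.normal(size=(d,d)) + 1j*rng.normal(size=(d,d))
    q, r = np.linalg.qr(z)
    return q @ np.diag(np.diag(r)/np.abs(np.diag(r)))

def majorized_by(a, b, tol=1e-10):
    # True if the spectrum a is majorized by the spectrum b.
    a, b = np.sort(a)[::-1], np.sort(b)[::-1]
    return bool(np.all(np.cumsum(a) <= np.cumsum(b) + tol))

rng = np.random.default_rng(4)
d, k = 4, 6
p = rng.dirichlet(np.ones(k))
rho2 = np.diag(rng.dirichlet(np.ones(d)))          # target marginal ρ_2^A
rho1 = np.zeros((d, d), dtype=complex)
for x in range(k):
    U = haar_unitary(d, rng)
    rho1 += p[x] * U.conj().T @ rho2 @ U           # mixture as in (14.154)

print(majorized_by(np.linalg.eigvalsh(rho1),
                   np.linalg.eigvalsh(rho2)))      # True: spec(ρ_1) ≺ spec(ρ_2)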
Exercise 14.5.8. Let ψ be the 4-qubit state (14.136) and suppose it satisfies (14.138). Classify all the 4-qubit states ϕ for which ψ →_{SEP_1} ϕ holds.
Lemma 14.6.1. Let ψ ∈ Pure(AB) and let ρ^B := Tr_A[ψ^{AB}] be its reduced density
matrix on system B. Then, for every pure-state decomposition ρ^B = Σ_{x∈[m]} p_x ϕ_x^B,
there exists a POVM on Alice's system, {Λ_x}_{x∈[m]} ⊂ Eff(A), such that for all x ∈ [m]
ϕ_x^B = (1/p_x) Tr_A[(Λ_x^A ⊗ I^B) ψ^{AB}]   and   p_x = Tr[(Λ_x^A ⊗ I^B) ψ^{AB}] .  (14.156)
where the supremum is taken over all pure-state decompositions of ρ^{AB} = Σ_{x∈[m]} p_x ϕ_x^{AB}.
Note that this definition is similar to the definition of the entanglement of formation given
in (13.68), except that we take the supremum instead of the infimum as taken in (13.68).
Exercise 14.6.1. Compute the entanglement of assistance of the maximally mixed state uAB
and conclude that the entanglement of assistance is not a measure of entanglement.
Exercise 14.6.2. Let E be a measure of pure bipartite entanglement, and let E_F and E_a be its
corresponding entanglement of formation and entanglement of assistance, respectively. Let ψ ∈ Pure(AB_1B_2)
be a tripartite pure state with marginal ρ^{AB_1} := Tr_{B_2}[ψ^{AB_1B_2}]. Show that if ψ^{AB_1B_2} satisfies
the disentangling condition (14.184) then
E_F(ρ^{AB_1}) = E_a(ρ^{AB_1}) .  (14.161)
However, we now show that for certain choices of maximally entangled states {Φ_x^{A_2B}}_{x∈{0,1,2,3}},
the transformation above cannot be achieved (even with probability less than one) if we only
allow system R to perform a measurement.
The reduced density matrix ρ^{AB} := Tr_R[ψ^{ABR}] of the state above can be expressed as
ρ^{AB} = φ_0^{AB} + φ_1^{AB} ,  (14.165)
where
|φ_0^{AB}⟩ := (1/2)|0⟩^{A_1}|Φ_0^{A_2B}⟩ + (1/(2√2))|1⟩^{A_1}( |Φ_2^{A_2B}⟩ + |Φ_3^{A_2B}⟩ )  (14.166)
and
|φ_1^{AB}⟩ := (1/2)|0⟩^{A_1}|Φ_1^{A_2B}⟩ + (1/(2√2))|1⟩^{A_1}( |Φ_2^{A_2B}⟩ − |Φ_3^{A_2B}⟩ ) .  (14.167)
We argue that there exist four maximally entangled states {Φ_x^{A_2B}}_{x∈{0,1,2,3}} such that no
linear combination of |φ_0^{AB}⟩ and |φ_1^{AB}⟩ is maximally entangled. Indeed, take
|Φ_0^{A_2B}⟩ = |Φ_2^{A_2B}⟩ = (1/2)( |00⟩ + |11⟩ + |22⟩ + |33⟩ )  (14.168)
and
|Φ_1^{A_2B}⟩ = (1/2)( |00⟩ − i|11⟩ − |22⟩ + i|33⟩ ) ,
|Φ_3^{A_2B}⟩ = (1/2)( |00⟩ − i|11⟩ + |22⟩ − i|33⟩ ) .  (14.169)
With these choices we get by direct calculation that for any a, b ∈ C the linear combination
a|φ_0^{AB}⟩ + b|φ_1^{AB}⟩  (14.170)
is not proportional to the maximally entangled state (see Exercise 14.6.3). Therefore, the
state ψ ABR cannot be converted to ΦAB (even with probability less than one) by a local
measurement on system R. Alternatively, none of the pure-state decompositions of ρAB
contains a maximally entangled state (i.e. 2-ebits).
Exercise 14.6.3. Show that for any choice of a, b ∈ C with |a|^2 + |b|^2 = 1 the state in (14.170)
is not maximally entangled. Hint: Write the state in (14.170) as a linear combination
Σ_{x=0}^{3} |ϕ_x^A⟩|x⟩^B and show that the vectors {|ϕ_x^A⟩}_x cannot all have the same norm and
simultaneously be orthogonal to each other.
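The claim of this exercise can also be explored numerically. The sketch below constructs |φ_0^{AB}⟩ and |φ_1^{AB}⟩ from (14.166)–(14.169) and samples random coefficients a, b; for every sample, the squared Schmidt coefficients across the A : B cut deviate from the maximally entangled value 1/4 by a finite amount (a sampling illustration under the stated basis ordering, not a proof).

import numpy as np

# Basis ordering: A = (A1, A2) with |A1| = 2, |A2| = 4, and |B| = 4.
def ket_A2B(coeffs):                       # Σ_j coeffs[j] |jj⟩ on A2 B, as a 16-vector
    v = np.zeros(16, dtype=complex)
    for j, c in enumerate(coeffs):
        v[4*j + j] = c
    return v

Phi0 = ket_A2B([1, 1, 1, 1]) / 2
Phi1 = ket_A2B([1, -1j, -1, 1j]) / 2
Phi2 = Phi0.copy()
Phi3 = ket_A2B([1, -1j, 1, -1j]) / 2

e0, e1 = np.eye(2)
phi0 = 0.5*np.kron(e0, Phi0) + (1/(2*np.sqrt(2)))*np.kron(e1, Phi2 + Phi3)
phi1 = 0.5*np.kron(e0, Phi1) + (1/(2*np.sqrt(2)))*np.kron(e1, Phi2 - Phi3)

rng = np.random.default_rng(5)
worst = np.inf
for _ in range(2000):
    a, b = rng.normal(size=2) + 1j*rng.normal(size=2)
    v = a*phi0 + b*phi1
    v /= np.linalg.norm(v)
    # Schmidt coefficients across the cut A = (A1 A2) versus B.
    sv = np.linalg.svd(v.reshape(8, 4), compute_uv=False)
    worst = min(worst, np.max(np.abs(sv**2 - 0.25)))
print(worst)   # stays bounded away from 0: no sampled combination is maximally entangled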
Entanglement of Collaboration
Definition 14.6.1. Let E be a measure of bipartite entanglement for mixed states.
Its corresponding measure of tripartite entanglement, known as the "entanglement of
collaboration" and denoted by E_c, is defined as:
E_c(ρ^{ABR}) := sup_{N∈LOCC} E( N^{ABR→A′B′}(ρ^{ABR}) )   ∀ ρ ∈ D(ABR) ,  (14.171)
where the supremum is over all quantum systems A′ and B′, and all LOCC channels
N^{ABR→A′B′}.
Therefore, since the entanglement of collaboration is defined as a supremum over all such LOCC
channels N^{ABR→A′B′}, it must be no smaller than the entanglement of assistance. Furthermore,
unlike the entanglement of assistance, the entanglement of collaboration is a measure of tripartite
entanglement.
where the parenthesis in ρA(BR) indicates that the entanglement is computed between
system A and the composite system BR.
One of the most fundamental questions in entanglement theory is the distillation of Bell
states from multiple copies of a bipartite entangled state. As we saw earlier, for a given
pure state ψ AB , the distillable entanglement is determined by the von-Neumann entropy of
the reduced density matrix ψ A . A similar question arises in the multipartite regime: given
many copies of a tripartite pure entangled state ψ ABR , how many Bell states can be distilled
between Alice and Bob by LOCC of all three parties sharing the state?
The above lemma asserts that the optimal distillation rate cannot exceed the minimum
between the entropies of system A and system B. Remarkably, it has been shown that this
upper bound can be attained. However, we will present the proof of this statement in volume
2 of this book, after introducing the quantum state merging protocol from quantum Shannon
theory.
Proof. Let ρ^{ABR} be an extension of the state ρ^{AB}. Using the chain rule of the conditional
mutual information (see the second equality in (13.180)) we get
(1/2) I(A : B|R)_ρ = (1/2) I(A : B_1|R)_ρ + (1/2) I(A : B_2|RB_1)_ρ
By definition→  ⩾ E_sq(ρ^{AB_1}) + E_sq(ρ^{AB_2}) .  (14.176)
Since the above inequality holds for all extensions ρABR of ρAB we conclude that
It is important to recognize that not all measures of entanglement adhere to the monogamy
condition specified by (14.175). Nevertheless, the squashed entanglement is currently the only
known measure of entanglement that satisfies (14.175) in all finite dimensions, which is a
remarkable property that highlights the unique nature of this measure. Other measures
satisfy (14.175) only in fixed dimensions. For example, on qubit systems, the square of the
concurrence is also a monogamous measure of entanglement that satisfies (14.175) when
|A| = |B_1| = |B_2| = 2.
The equality above implies that for the pure state ψ^{ABC} with marginals ρ^{AB} and ρ^{AC} we
have
C_a(ρ^{AB})^2 + C_a(ρ^{AC})^2 = 4 det(ρ^A) = C(ψ^{A(BC)})^2 .  (14.179)
Denoting by τ(ρ^{AB}) := C(ρ^{AB})^2 (the square of the concurrence of formation, also known
as the 2-tangle) and using the fact that C(ρ^{AB}) ⩽ C_a(ρ^{AB}), we arrive at the following
monogamy inequality
τ(ψ^{A(BC)}) ⩾ τ(ρ^{AB}) + τ(ρ^{AC}) ,  (14.180)
where
τ(ψ^{A(BC)}) := 2[ 1 − Tr[(ρ^A)^2] ] = 4 det(ρ^A) .  (14.181)
where f is a function of two variables that satisfies certain conditions. While this family of
monogamy relations may be more flexible than the original definition, it still lacks a clear
theoretical foundation. Thus, a more desirable solution would be to derive the monogamy
relations from more basic principles, which would provide a deeper understanding of the
nature of this phenomenon.
Recently, such an approach to the monogamy of entanglement has been proposed, one that is more
"fine-grained" in nature and avoids the need to introduce a function f. This approach does
not involve monogamy relations such as (14.175) or (14.183). Instead, it defines a measure
of entanglement E to be monogamous if it satisfies a certain condition that does not involve
inequalities. In particular, this approach takes into account the fact that different measures
of entanglement have varying properties and limitations, rather than attempting to impose
a one-size-fits-all definition. By adopting this more nuanced approach, we can gain a deeper
understanding of the monogamy of entanglement and how it manifests itself across different
measures.
E(ρ^{AB}) = E(ρ^{AB_1})  (14.184)
we have that E(ρ^{AB_2}) = 0.
It is important to note that the relation given in (14.185) is not of the form given
in (14.183), since the monogamy exponent α in (14.185) depends on the dimension d, whereas
f is considered universal in the sense that it does not depend on the dimension. Therefore,
if a measure of entanglement such as the entanglement of formation is not monogamous
according to the class of relations given in (14.183), it does not necessarily mean that it is
not monogamous according to Definition 14.7.1.
In the next theorem we show that all quantum Markov states satisfy the disentangling
condition. For this purpose, we will rename B1 as B and B2 as B ′ , since the theorem
involves further decomposition of system B into subsystems. Specifically, an entangled
Markov quantum state ρ ∈ D(ABB ′ ) is a state of the form (cf. (13.185))
ρ^{ABB′} = ⊕_{x∈[m]} p_x ρ_x^{AB_x^{(1)}} ⊗ ρ_x^{B_x^{(2)}B′}  (14.190)
where
B = ⊕_{x∈[m]} B_x^{(1)} ⊗ B_x^{(2)} ,  (14.191)
and for each x ∈ [m], ρ_x^{AB_x^{(1)}} and ρ_x^{B_x^{(2)}B′} are density matrices in D(AB_x^{(1)}) and D(B_x^{(2)}B′),
respectively.
Proof. Since local ancillary systems are free in entanglement theory, one can append a classical
ancillary system X that encodes the orthogonality of the subspaces B_x^{(1)} ⊗ B_x^{(2)}. This can
be done with an isometry that maps states in B_x^{(1)} ⊗ B_x^{(2)} to states in B^{(1)} ⊗ B^{(2)} ⊗ |x⟩⟨x|^X,
where systems B^{(1)} and B^{(2)} have dimensions max_x |B_x^{(1)}| and max_x |B_x^{(2)}|, respectively.
Therefore, without loss of generality we can write the above Markov state as
σ^{ABB′X} = Σ_{x∈[m]} p_x ρ_x^{AB^{(1)}} ⊗ ρ_x^{B^{(2)}B′} ⊗ |x⟩⟨x|^X .  (14.192)
Now, note that for any entanglement monotone E, the entanglement between A and BB′
is measured by
E(ρ^{ABB′}) = E(σ^{ABB′X})
(13.65)→ = Σ_{x∈[m]} p_x E( ρ_x^{AB^{(1)}} ⊗ ρ_x^{B^{(2)}B′} )  (14.193)
         = Σ_{x∈[m]} p_x E( ρ_x^{AB^{(1)}} ) .
Similarly,
E(ρ^{AB}) = E(σ^{ABX})
         = Σ_{x∈[m]} p_x E( ρ_x^{AB^{(1)}} ⊗ ρ_x^{B^{(2)}} )  (14.194)
         = Σ_{x∈[m]} p_x E( ρ_x^{AB^{(1)}} ) .
We therefore obtain E(ρ^{ABB′}) = E(ρ^{AB}). This completes the proof.
The Markov state mentioned in the theorem has an important property: the marginal
state ρ^{AB′} is separable, which implies E(ρ^{AB′}) = 0. Therefore, Markov states always satisfy
the condition given in Definition 14.7.1. However, one might question whether the converse of
the statement in the theorem holds true. In other words, if a state ρ^{ABB′} satisfies E(ρ^{ABB′}) =
E(ρ^{AB}), is it necessarily a Markov state? For mixed tripartite states, the answer is obviously
"no" because all states that are separable between system A and BB′ satisfy E(ρ^{ABB′}) = E(ρ^{AB}),
but not all separable states are Markov states. Nonetheless, in the following theorem, we will
see that under mild assumptions, the converse of the above theorem holds for pure tripartite
states.
In Section 13.2.1, we observed that every entanglement monotone takes the form (13.70)
when evaluated on pure states. Specifically, the entanglement monotone E can be expressed
as follows:
E(ψ^{AB}) = g(ρ^A)   with   ρ^A := Tr_B[ψ^{AB}] ,  (14.195)
where the function g : D(A) → R+ is Schur concave. Furthermore, we noted that if g
is symmetric (i.e., invariant under unitary channels) and concave, then the convex roof
extension of E corresponds to an entanglement monotone. As a reminder, given any measure
of entanglement E on mixed states, we can construct its convex roof extension as
E_F(ρ^{AB}) := min Σ_{x∈[m]} p_x E(ψ_x^{AB})   ∀ ρ ∈ D(AB) ,  (14.196)
where the minimum is over all pure-state decompositions of ρ^{AB} = Σ_{x∈[m]} p_x ψ_x^{AB}. Moreover,
if E is convex (e.g., an entanglement monotone) then E(ρ^{AB}) ⩽ E_F(ρ^{AB}) for all ρ ∈ D(AB).
Remark. The above theorem states that if the pure state ψ^{ABB′} satisfies the disentangling
condition, then it is a Markov state (up to a local unitary on system B). Additionally, keep
in mind that since E measures entanglement, the function g defined in (14.195) is invariant
under unitary channels. As g is also strictly concave, the convex roof extension of E yields
an entanglement monotone, as stated in Theorem 13.2.1. However, we do not assume E to be
equal to its convex roof extension. Instead, we observe that since E is convex, it is always
less than or equal to its convex roof extension.
Proof. We only prove the implication from the first statement to the second, as the converse
is straightforward and left as an exercise for the reader. The first statement implies that
E(ψ^{ABB′}) = E(ρ^{AB}) ⩽ E_F(ρ^{AB}) ,  (14.197)
where we used the fact that E is no greater than its convex roof extension. On the other
hand, from Lemma 14.6.1 we get that every pure-state decomposition of ρ^{AB} = Σ_{x∈[m]} p_x ψ_x^{AB}
has a corresponding measurement on system B′ of ψ^{ABB′}, where the outcome x occurs with
probability p_x and the post-measurement state on system AB is ψ_x^{AB}. When combined with
the fact that E is an entanglement monotone, this implies that
E(ψ^{ABB′}) ⩾ Σ_{x∈[m]} p_x E(ψ_x^{AB}) .  (14.198)
The two equations above lead to the very strong conclusion that all pure-state decompositions
of ρ^{AB} have the same average entanglement, which equals E(ψ^{ABB′}). In other
words, the inequality in the above equation is in fact an equality, and it holds for every
pure-state decomposition {p_x, ψ_x^{AB}}_{x∈[m]} of ρ^{AB}. This equality can be expressed in terms of
the function g as follows:
g(ρ^A) = Σ_{x∈[m]} p_x g(ρ_x^A) ,  (14.199)
where ρ_x^A := Tr_B[ψ_x^{AB}]. Since g is strictly concave, the equation above holds if and only if
ρ^A = ρ_x^A for all x ∈ [m]. Let B_1 be a system of dimension r := |B_1| = Rank(ρ^A), and let
Our first goal is to show that if {|ψ_x^{AB}⟩}_{x∈[m]} are the eigenvectors of ρ^{AB}, then V_{x′}^* V_x = δ_{xx′} I^{B_1}.
To prove it, let {q_y, ϕ_y^{AB}}_{y∈[m]} be another pure-state decomposition of ρ^{AB} (also with m
elements). Then, for the exact same reasons as stated above, for each y ∈ [m] there exists
an isometry W_y : B_1 → B such that
|ϕ_y^{AB}⟩ = (I^A ⊗ W_y^{B_1→B}) |χ^{AB_1}⟩ .  (14.201)
(I^A ⊗ √q_y W_y) |χ^{AB_1}⟩ = (I^A ⊗ Σ_{x∈[m]} u_{yx} √p_x V_x) |χ^{AB_1}⟩ .  (14.203)
y∈[m]
−1/2
By multiplying both sides by ρA we can replace |χAB1 ⟩ on both sides of the equation
above with the (unnormalized) maximally entangled state |ΩB̃1 B1 ⟩. Therefore, the equation
above gives
√ X √
qy Wy = uyx px Vx . (14.204)
x∈[m]
Now, using the fact that {|ψ_x^{AB}⟩}_{x∈[m]} forms an orthonormal set of vectors, we get from (14.202)
that
q_y = ∥√q_y |ϕ_y^{AB}⟩∥^2 = Σ_{x∈[m]} p_x |u_{yx}|^2 .  (14.206)
The equation above holds for all unitary matrices U = (u_{yx}) and all y ∈ [m]. Setting y = 1
and choosing U to be a unitary matrix with its first row equal to (1/√2)(1, 1, 0, . . . , 0) gives
V_1^*V_2 + V_2^*V_1 = 0. Similarly, choosing the first row of U to be (1/√2)(1, i, 0, . . . , 0) gives V_1^*V_2 − V_2^*V_1 = 0.
Thus, we obtain V_1^*V_2 = V_2^*V_1 = 0. By repeating the same argument with permuted versions
of (1/√2)(1, 1, 0, . . . , 0) and (1/√2)(1, i, 0, . . . , 0), we conclude that for all x, x′ ∈ [m] such that x ̸= x′,
we have V_{x′}^*V_x = 0.
Let {|z⟩}_{z∈[r]} be an orthonormal basis of B_1, and define |φ_{xz}^B⟩ := V_x|z⟩ for all x ∈ [m] and
z ∈ [r]. Using the fact that V_{x′}^*V_x = δ_{xx′} I^{B_1}, we can derive that ⟨φ_{x′z′}^B|φ_{xz}^B⟩ = δ_{xx′}δ_{zz′} for all
x, x′ ∈ [m] and z, z′ ∈ [r]. Let K be the subspace spanned by the orthonormal vectors {|φ_{xz}^B⟩}, with
x ∈ [m] and z ∈ [r], and note that the dimension of K is mr. Thus, there exists a subspace
B_2 of B with |B_2| = m such that K is isomorphic to B_1 ⊗ B_2. This isomorphism implies that
there exists an isometry U^{B_1B_2→B} : B_1B_2 → B such that |φ_{xz}^B⟩ = U^{B_1B_2→B}|z⟩^{B_1}|x⟩^{B_2}. Combining
this with the definition |φ_{xz}^B⟩ := V_x|z⟩^{B_1} gives V_x|z⟩^{B_1} = U^{B_1B_2→B}|z⟩^{B_1}|x⟩^{B_2} for all x ∈ [m] and all
z ∈ [r]. Hence, V_x^{B_1→B} = U^{B_1B_2→B}( I^{B_1} ⊗ |x⟩^{B_2} ), so that |ψ_x^{AB}⟩ = U^{B_1B_2→B}|χ^{AB_1}⟩|x⟩^{B_2} and
ρ^{AB} = U( χ^{AB_1} ⊗ σ^{B_2} )U^*   where   σ^{B_2} := Σ_{x∈[m]} p_x |x⟩⟨x|^{B_2} .  (14.208)
Observe that the state in (14.208) has a purification of the form U^{B_1B_2→B}|χ^{AB_1}⟩|ϕ^{B_2B′}⟩,
where
|ϕ^{B_2B′}⟩ = Σ_{x∈[m]} √p_x |x⟩^{B_2}|x⟩^{B′} .  (14.209)
Therefore, since |ψ^{ABB′}⟩ is also a purification of ρ^{AB}, we conclude that up to a local unitary
on system B and on system B′, the state ψ^{ABB′} has the form |χ^{AB_1}⟩|ϕ^{B_2B′}⟩.
where the minimum is over all pure-state decompositions of ρ^{AB} = Σ_{x∈[m]} p_x ψ_x^{AB}. Show that
if E_h is monogamous on pure tripartite states, then E is also monogamous on pure tripartite
states.
Proof. Let {p_x, |ψ_x^{ABB′}⟩}_{x∈[m]} be the optimal pure-state decomposition of ρ^{ABB′} satisfying
E_F(ρ^{ABB′}) = Σ_{x∈[m]} p_x E(ψ_x^{ABB′}) .  (14.211)
Suppose E_F(ρ^{ABB′}) = E_F(ρ^{AB}). Combining this with the two equations above gives
Σ_{x∈[m]} p_x E(ψ_x^{ABB′}) = E_F(ρ^{AB})
Convexity of E_F→  ⩽ Σ_{x∈[m]} p_x E_F(ρ_x^{AB}) .  (14.213)
since E_F is a measure of entanglement (in fact, an entanglement monotone) and does not
increase under tracing out the local system B′. Hence, from the two inequalities above
we get that for all x ∈ [m] we have
E(ψ_x^{ABB′}) = E_F(ρ_x^{AB}) .  (14.215)
Now, from Theorem 14.7.4 we get that for each x ∈ [m] there exist systems B_x^{(1)} and B_x^{(2)},
and an isometry V_x : B_x^{(1)}B_x^{(2)} → B, such that
|ψ_x^{ABB′}⟩ = V_x^{B_x^{(1)}B_x^{(2)}→B} |χ_x^{AB_x^{(1)}}⟩ |ϕ_x^{B_x^{(2)}B′}⟩ ,  (14.216)
for some χ_x ∈ Pure(AB_x^{(1)}) and ϕ_x ∈ Pure(B_x^{(2)}B′). Tracing out system B on both sides of
the equation above gives
ψ_x^{AB′} = χ_x^A ⊗ ϕ_x^{B′} .  (14.217)
Therefore, the marginal state ρ^{AB′} = Σ_{x∈[m]} p_x ψ_x^{AB′} is separable, so that E(ρ^{AB′}) = 0. This
completes the proof.
while that for four qubits was done by [218]. A detailed analysis of all 4-qubit maximally
entangled states can be found in [100, 203]. The classification of maximally entangled sets
of multipartite systems is presented in [59].
The generalization of Nielsen’s majorization theorem to the multipartite case, as pre-
sented in Theorem 14.5.2, is due to [101]. The generalization of this theorem to the full
set SEP can be found in [117]. Several other generalizations of this result, particularly de-
terministic interconversions of multipartite entanglement under various operations including
LOCC can be found in [204] and [60]. It is worth mentioning that in [92] and [196], it was
shown that the stabilizer group of almost all multipartite entangled states is trivial, and con-
sequently, LOCC conversion between two states in the same SLOCC class is almost never
possible. Nevertheless, certain multipartite states that have symmetry (e.g., GHZ states,
graph states, stabilizer states, etc.) have a non-trivial stabilizer group and consequently rich
entanglement properties [147].
In [117], it was demonstrated that any pure-state transformation attainable by LOCC
using a finite number of communication rounds can also be accomplished using SEP1 . How-
ever, not every pure-state transformation possible with SEP is achievable with SEP1 . These
findings underscore that SEP1 serves as a robust outer approximation of LOCC, particularly
given that infinite rounds of classical communication are less feasible in practice.
The concept of entanglement of assistance was first introduced in [65], and the example
given in (14.163) that demonstrates that it is not a tripartite entanglement monotone was
taken from [96]. The result that asymptotic entanglement of assistance is equal to the smaller
of its two local entropies was discovered in [201]. The concept of localizable entanglement was
first introduced in the context of spin chains in [219], and its comparison with entanglement
of collaboration can be found in [87].
Monogamy of entanglement was first introduced in [53], in which the CKW monogamy
relation was discovered. The monogamy of the squashed entanglement was discovered
in [50]. The concept of "monogamy of entanglement without inequalities" was first introduced
in [104] and developed further in [106]. Additional references on monogamy of
entanglement can be found in those papers.
CHAPTER 15

THE RESOURCE THEORY OF ASYMMETRY
G-Invariant States
In the resource theory of asymmetry, each element g ∈ G is represented by a unitary matrix
U_g. If ρ ∈ D(A) represents the density matrix of a quantum system with respect to Alice's
reference frame, then the state of the same physical system with respect to Bob’s reference
frame is given by
Ug (ρ) := Ug ρUg∗ (15.1)
If Alice and Bob are unaware of the element g ∈ G that establishes the relation between
their reference frames, then the states that Alice can prepare relative to Bob’s reference
frame are those satisfying ρ = Ug (ρ) for all g ∈ G. Such states are referred to as G-invariant
and satisfy [ρ, Ug ] = 0, as indicated by Definition C.3.2.
The absence of a shared reference frame places a limitation on the types of states that
Alice can generate relative to Bob’s reference frame. She is only capable of creating G-
invariant states, which comprise the free states in the QRT of reference frames, denoted
as
F(A) = INVG (A) := {ρ ∈ D(A) : Ug (ρ) = ρ ∀g ∈ G} . (15.2)
For instance, suppose the group G = U (1) corresponds to an optical phase reference
or to dynamics with rotational symmetry around a fixed axis (in which case the group is
SO(2), which is known to be isomorphic to the group U (1)). In this scenario, a unitary
representation of G is provided by Uθ = eiN̂ θ , where θ ∈ U (1) and N̂ is the total number
operator (or in the case of rotational symmetry, N̂ can be replaced with Ln , the angular
momentum P operator in the n direction). In this instance, the free states are given by states
of the form n pn |n⟩⟨n|, where |n⟩ corresponds to the eigenvectors of N̂ .
More generally, it will be observed that the absence of a shared reference frame enforces
a superselection rule regarding the types of states that Alice can generate. This superselec-
tion rule is characterized by the fact that coherent superpositions between states in specific
subspaces are not feasible. For instance, in the U(1) case, coherent superpositions among
eigenstates of the number operator with different eigenvalues are not free and cannot be prepared by Alice.
G-Covariant Channels
The set of free operations in the QRT of reference frames can be defined similarly to the
free states. Let σ ∈ D(B) be an arbitrary density matrix of system B described in Bob’s
reference frame. Suppose Alice performs a quantum operation on this system described
by the channel E ∈ CPTP(A → A) in her reference frame. How would this operation be
described in Bob’s reference frame? If Bob knows that their reference frames are linked by
an element g ∈ G, then Ug∗ (σ) is Alice’s description of the initial state, and E(Ug∗ (σ)) is her
description of the final state. Therefore, the final state in Bob’s reference frame is given by
Ug ◦ E ◦ Ug∗ (σ), and his description of Alice’s operation is Ug ◦ E ◦ Ug∗ .
Hence, if Alice and Bob are unaware of the value of g ∈ G, they will have the same
description of the CPTP map E only if E satisfies
E = U_g ◦ E ◦ U_g^*   ∀ g ∈ G .  (15.3)
Quantum channels of this kind are referred to as G-covariant, and they represent the free
operations in the QRT of asymmetry. Similar to G-invariant states, a quantum channel is
G-covariant if and only if it commutes with Ug for all g ∈ G.
Therefore, the set of free operations in the QRT of reference frames can be expressed as
F(A → A) = { E ∈ CPTP(A → A) : [E, U_g] = 0   ∀ g ∈ G } ,  (15.4)
where [E, U_g] := E ◦ U_g − U_g ◦ E (see Fig. 15.1 below). For instance, for G = U(1), a
G-covariant quantum channel E ∈ CPTP(A → A) satisfies, for all θ ∈ [0, 2π) and all ρ ∈ D(A),
E( e^{iθN̂} ρ e^{−iθN̂} ) = e^{iθN̂} E(ρ) e^{−iθN̂} .  (15.5)
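As a simple sanity check of (15.5), the completely dephasing channel in the number basis is manifestly U(1)-covariant. The following minimal NumPy sketch (which truncates the number basis to a finite dimension purely for illustration) verifies the covariance relation for a random state and a random phase.

import numpy as np

d = 5                                      # truncation of the number basis (illustration only)

def dephase(rho):                          # E(ρ) = Σ_n |n⟩⟨n| ρ |n⟩⟨n|
    return np.diag(np.diag(rho))

rng = np.random.default_rng(6)
A = rng.normal(size=(d,d)) + 1j*rng.normal(size=(d,d))
rho = A @ A.conj().T; rho /= np.trace(rho)

theta = 0.7
U = np.diag(np.exp(1j*theta*np.arange(d)))        # e^{iθN̂} on the truncated space
lhs = dephase(U @ rho @ U.conj().T)
rhs = U @ dephase(rho) @ U.conj().T
print(np.max(np.abs(lhs - rhs)))                  # ~0: E satisfies (15.5)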
If the G-covariant channel is a unitary channel, E(·) = V(·)V^*, covariance implies that for every g ∈ G there exists a phase ω_g ∈ C with |ω_g| = 1 such that
Ug V Ug∗ = ωg V . (15.7)
Since this equation holds for all g ∈ G, it follows that the map g 7→ ωg is a 1-dimensional
representation of G. Specifically, when g = e is the identity element, we have Ug = I which
gives ωg = 1. Furthermore, for g, h ∈ G, we have
∗
ωgh V = Ugh V Ugh
= Ug Uh V Uh∗ Ug∗
(15.8)
= ωh Ug V Ug∗
= ωh ωg V ,
which implies that ωgh = ωg ωh . In other words, the set of all G-covariant unitary chan-
nels can be characterized by unitary matrices that are “almost” G-invariant, meaning they
commute with the elements of the group up to a phase, where this phase itself forms a
1-dimensional representation of G.
In conclusion, we have observed that in the QRT of reference frames, the set of free
states is the set of symmetric states (i.e., those states that commute with Ug for all g ∈ G),
and the set of free operations is the set of symmetric operations (i.e., those operations that
commute with Ug for all g ∈ G). Symmetric evolutions are prevalent in physics and may arise
in various contexts, not just from the absence of a shared reference frame. Therefore, the
set of G-covariant operations defines a resource theory with applications extending beyond
quantum reference frames. It may be referred to as a QRT of asymmetry because in any
QRT in which F specifies a set of G-covariant operations, asymmetric states and asymmetric
operations are the resources of the theory.
Thus far, we have only examined G-covariant channels with the same input and output
dimensions. More generally, a quantum channel E ∈ CPTP(A → B) is G-covariant with
respect to two (unitary) representations of G, {U_g^A}_{g∈G} and {U_g^B}_{g∈G}, if
E^{A→B} ◦ U_g^A = U_g^B ◦ E^{A→B}   ∀ g ∈ G .  (15.9)
Refer to Fig. 15.1 for an illustrative depiction of G-covariant operations. The set of all
G-covariant quantum channels in CPTP(A → B) will be denoted by COVG (A → B). It is
worth noting that this notation does not explicitly specify the two unitary representations
of G, {UgA }g∈G and {UgB }g∈G . The representations used will be clear from the context.
Figure 15.1: Heuristic description of G-covariant operations. The channel E is G-covariant if for
every choice of group element g ∈ G, the blue and purple pathways yield the same outcome.
G-Covariant Measurements
The result obtained from a quantum measurement, often referred to as the classical outcome,
provides a form of information known as “speakable information.” This type of information
can be effectively communicated between parties who do not share a common reference
frame. Let’s consider an example where Alice and Bob do not have a shared Cartesian
reference frame. Suppose Alice performs a measurement on the spin of an electron in the z-
direction relative to her reference frame and obtains an outcome of “up” (indicating that the
electron’s spin is pointing in the positive z-direction). Alice can then transmit this outcome
to Bob, allowing him to determine that the electron’s spin is aligned with the positive z-
direction in relation to Alice’s frame. Therefore, even though the specific information about
the z-direction itself cannot be conveyed, the measurement outcome, i.e., the “up”/“down”
information, can be effectively communicated between the parties involved.
Consequently, we make the assumption that the group G associated with the resource
theory of asymmetry has a trivial action on classical systems that represent measurement
outcomes. Moving forward, in Section 3.5.10, we observed that a general quantum mea-
surement can be characterized by a quantum instrument denoted as E ∈ CPTP(A → BX),
where X represents the classical outcome of the measurement. We refer to E as a G-covariant
quantum instrument if it satisfies the condition:
E A→BX ◦ UgA→A = UgB→B ◦ E A→BX ∀g∈G. (15.10)
The collection of all such G-covariant quantum instruments is denoted by COVG (A → BX).
Every quantum instrument E A→BX as discussed above can be expressed as
E^{A→BX} = Σ_{x∈[m]} E_x^{A→B} ⊗ |x⟩⟨x|^X ,  (15.11)
where m ∈ N, and each Ex ∈ CP(A → B). If E A→BX is G-covariant the relation above in
conjunction with (15.10) implies that for all x ∈ [m] we have
ExA→B ◦ UgA→A = UgB→B ◦ ExA→B ∀g∈G. (15.12)
In other words, the quantum instrument E A→BX is G-covariant if and only if each CP map
ExA→B is G-covariant.
A special type of G-covariant quantum instrument is a G-covariant POVM. We obtain a G-covariant
POVM by taking the system B above to be trivial (i.e., |B| = 1), so that for each
x ∈ [m] and every ρ ∈ L(A), E_x^{A→B}(ρ^A) = Tr[Λ_x^A ρ^A] for some Λ_x^A ∈ Eff(A), and the set
{Λ_x^A}_{x∈[m]} is a POVM. Now, for a trivial system B, the condition given in (15.12) becomes
equivalent to
Tr[Λx ρ] = Tr[Λx Ug (ρ)] = Tr[Ug∗ (Λx ) ρ] ∀ g ∈ G. (15.13)
Since the condition above holds for all ρ ∈ L(A) we must have Λx = Ug∗ (Λx ) for all g ∈ G
and x ∈ [m]. In other words, a POVM {Λx }x∈[m] is G-covariant if and only if each element
Λx is G-invariant; i.e., each Λx satisfies [Λx , Ug ] = 0 for all g ∈ G.
The averaging CPTP map is known as the G-twirling map (see Sec. C.4). If the group G is
finite, the integral is replaced by a discrete sum over the |G| elements of the group, that is,
G(ρ) = (1/|G|) Σ_{g∈G} U_g(ρ).
The free states in this QRT have a very particular structure. First, note that ρ ∈ F(A)
if and only if it is G-invariant, meaning that Ug (ρ) = ρ for all g. In particular, G(ρ) = ρ
for all ρ ∈ F(A). Combining this with the definition of F(A) implies that G-twirling is a
resource-destroying map (see Definition 9.3.3). Additionally, one can characterize the free
states using techniques from representation theory. In particular, Theorem C.3.3 states that
ρ ∈ D(A) is free if and only if ρA has the following form:
ρ^A = ⊕_{λ∈Irr(U)} u^{B_λ} ⊗ ρ_λ^{C_λ}   where   ρ_λ^{C_λ} := Tr_{B_λ}[ Π_λ ρ^A Π_λ ] .  (15.15)
Moreover, note that the above expression implies the following corollary.
The above corollary demonstrates that the G-twirling operation eliminates any correla-
tions among distinct irreducible representations. For example, let us consider the case where
G = U(1). As this group is Abelian, it has only one-dimensional irreps (i.e., |Bλ | = 1). The
irreps of U(1) are labeled by integers λ = k ∈ Z, and the k-th irrep uk : U(1) 7→ C is of the
form:
uk (θ) = eikθ ∀θ ∈ U(1) . (15.17)
In this context, we will consider an infinite dimensional (separable) Hilbert space denoted
by A with basis vectors |n⟩ where n belongs to the set of integers Z. The “number” operator
which generates the U(1) symmetry can be defined as follows:
N̂ := Σ_{n∈Z} n|n⟩⟨n| ,  (15.18)
Note that we allow negative values of n and work with the representation θ 7→ eiN̂ θ .
For each irrep on a single copy of A, the multiplicity space is trivial (i.e., |Cλ | = 1), and
the G-twirling operation can be easily represented as
G(·) = Σ_{k∈Z} |k⟩⟨k|(·)|k⟩⟨k| ,  (15.19)
which means that G is the completely dephasing channel with respect to the basis {|k⟩}_{k∈Z} in
this case.
However, when considering ℓ copies of A, the multiplicity space of a given irrep is usually
not trivial, and as a result, the G-twirling operation is not equivalent to the dephasing
channel. Specifically, let N̂x be the number operators associated with system Ax for each
x ∈ [ℓ]. Consider the unitary representation on A^ℓ = (A_1, . . . , A_ℓ) defined by
θ ↦ ⊗_{x∈[ℓ]} e^{iN̂_x θ} = e^{iN̂_tot θ} ,   where   N̂_tot := Σ_{x∈[ℓ]} N̂_x .  (15.20)
In this case, the irreps are labeled by the eigenvalues n ∈ Z of the total number operator
N̂_tot. While the representation space B_n (i.e., B_λ with λ = n) is trivial (i.e., one dimensional)
for every irrep λ = n, the multiplicity space C_n is not. Let
Π_n^{(ℓ)} := Σ_{k_1+···+k_ℓ=n, k_1,...,k_ℓ∈Z} |k_1⟩⟨k_1| ⊗ · · · ⊗ |k_ℓ⟩⟨k_ℓ|  (15.21)
be the projection onto the eigenspace of N̂_tot corresponding to the eigenvalue n. Using this
notation, the G-twirling operation can be expressed as
G_ℓ(·) = Σ_{n∈Z} Π_n^{(ℓ)} (·) Π_n^{(ℓ)} .  (15.22)
Note that Π_n^{(ℓ)} is the projection onto the multiplicity space C_n. This space is often referred
to as a decoherence-free subspace, as any pure state ψ ∈ Pure(C_n) is U(1)-invariant, i.e.,
G_ℓ(ψ) = ψ. For example, if ℓ = 3, any linear combination of |011⟩, |101⟩, and |110⟩ is an
eigenvector of N̂_tot corresponding to the eigenvalue 2. Therefore, the coherence of any state
in the span of these three vectors remains unaffected by the G-twirling operation.
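The ℓ-copy twirling (15.21)–(15.22) is easy to implement when each mode is truncated to occupations {0, 1} (a qubit truncation, assumed here only for illustration). The sketch below verifies that a coherent superposition inside the multiplicity space C_2 of three modes is left untouched by G_3, while coherence between different total occupation numbers is removed.

import numpy as np
from itertools import product

ell = 3
dim = 2**ell
# Projectors Π_n^{(ℓ)} of (15.21), each mode truncated to occupations {0,1}.
projs = {}
for bits in product([0, 1], repeat=ell):
    n = sum(bits)
    idx = int("".join(map(str, bits)), 2)
    P = projs.setdefault(n, np.zeros((dim, dim)))
    P[idx, idx] = 1.0

def twirl(rho):                           # G_ℓ(ρ) = Σ_n Π_n ρ Π_n, cf. (15.22)
    return sum(P @ rho @ P for P in projs.values())

# A coherent superposition inside the eigenvalue-2 multiplicity space C_2.
v = np.zeros(dim, dtype=complex)
v[0b011], v[0b101], v[0b110] = 1/np.sqrt(3), 1j/np.sqrt(3), -1/np.sqrt(3)
rho = np.outer(v, v.conj())
print(np.max(np.abs(twirl(rho) - rho)))        # ~0: states in C_2 are invariant

# A superposition of different total occupation numbers is dephased.
w = (np.eye(dim)[0b000] + np.eye(dim)[0b111]) / np.sqrt(2)
sigma = np.outer(w, w)
print(abs(twirl(sigma)[0b000, 0b111]))         # 0: coherence between n = 0 and n = 3 is removed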
The G-twirling operation can be applied to quantum channels as well. Suppose Alice
applies a quantum operation E ∈ L(A → A) to her system. If Bob knows the relation
between their reference frames, then he can describe the operation relative to his own system
as U_g ◦ E ◦ U_g^*, where U_g is the unitary channel that relates Alice's and Bob's frames. However,
in the absence of a shared reference frame, Bob cannot use this description. Instead, the
channel E appears to him as a mixture of the form ∫_G dg U_g ◦ E ◦ U_g^*. In order for Alice and
Bob to have the same description of the channel, the condition E = ∫_G dg U_g ◦ E ◦ U_g^* must
be satisfied. This integral is a type of twirling operation applied to the channel E.
Exercise 15.2.1. Show that the G-twirling map is unital and idempotent; i.e. G ◦ G = G.
For compact Lie groups the G-twirling is defined in terms of an integral over the group.
From the Carathéodory theorem (see Theorem A.3.2) it follows that the G-twirling can be
expressed as a finite convex combination of unitary channels U_g. To see why, for every
g ∈ G let |ψ_g^{AÃ}⟩ := (U_g^A ⊗ I^Ã)|Ω^{AÃ}⟩, and let C be the convex hull of the set {ψ_g^{AÃ}}_{g∈G}. Note
that C ⊂ R, where R is a subspace of Herm(AÃ) given by
R := { Λ^{AÃ} ∈ Herm(AÃ) : Λ^A ∝ I^A , Λ^Ã ∝ I^Ã } ,  (15.23)
Show that
G_1^{⊗k} ◦ G_k = G_1^{⊗k} .  (15.28)
Exercise 15.2.4. Let ρ ∈ D(A) and α ∈ R_+. Show that if ρ is G-invariant then so is
ρ^α/Tr[ρ^α]. Hint: Use (15.15).
Exercise 15.2.5. Let g ↦ U_g be a projective unitary representation of a finite or compact
Lie group G. For each λ ∈ Irr(U), let U_g^{(λ)} be the reduction of U_g to the space B_λ as given
in (C.45). Show that for every ρ ∈ L(B_λ) we have
∫_G dg U_g^{(λ)} ρ^{B_λ} U_g^{*(λ)} = Tr[ρ^{B_λ}] u^{B_λ} .  (15.29)
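Equation (15.29) is simply Schur's lemma in integral form, and it can be checked approximately by Monte Carlo sampling of the Haar measure. The sketch below does this for the two-dimensional irrep of SU(2); sampling Haar-random U(2) suffices, since the global phase cancels under conjugation (the sample size and seed are arbitrary choices).

import numpy as np

def haar_u2(rng):
    z = rng.normal(size=(2,2)) + 1j*rng.normal(size=(2,2))
    q, r = np.linalg.qr(z)
    return q @ np.diag(np.diag(r)/np.abs(np.diag(r)))

rng = np.random.default_rng(7)
rho = np.array([[0.8, 0.3-0.2j], [0.3+0.2j, 0.2]])   # an arbitrary ρ on B_λ with d_λ = 2

avg = np.zeros((2,2), dtype=complex)
n_samples = 20000
for _ in range(n_samples):
    U = haar_u2(rng)               # global phase cancels in U ρ U*, so U(2) sampling suffices
    avg += U @ rho @ U.conj().T
avg /= n_samples

print(np.round(avg, 3))            # ≈ Tr[ρ] · I/2, i.e. Tr[ρ] times the maximally mixed u^{B_λ}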
g′ := hgh^{-1} →  = ∫_G dg′ p(hg′h^{-1}) U_{g′} ρ U_{g′}^*  (15.31)
p is a class function →  = ∫_G dg′ p(g′) U_{g′} ρ U_{g′}^* = G_p(ρ) .
Since ρ ∈ L(A) was arbitrary, we conclude that U_h^* ◦ G_p ◦ U_h = G_p for all h ∈ G. This
completes the proof.
In the definition of the weighted G-twirling we assumed that p(g) is an arbitrary prob-
ability density over G. However, as we saw earlier, thanks to Carathéodory’s theorem, the
G-twirling can be expressed as a finite convex combination of unitary channels of the form
Ug (·)Ug∗ . The same arguments can be applied to Gp , so we can assume, without loss of
generality, that p is a discrete probability distribution, i.e., p ∈ Prob(d) for some d ⩽ m^4,
where m := |A|. Hence, the weighted G-twirling takes the form:
G_p(·) = Σ_{x∈[d]} p_x U_{g_x}(·)U_{g_x}^* .  (15.32)
Exercise 15.2.6. Give an example of a group G, a probability distribution p(g) ̸= δ(g), and
a state ρ ∈ D(A) such that Gp (ρ) = ρ but ρ ̸∈ INVG (A).
U_g^E ≅ ⊕_{λ∈Irr(U^E)} U_g^{(λ)} ⊗ I^{C_λ} ,  (15.34)
where each U_g^{(λ)} acts irreducibly on B_λ. We will denote by u_{m′m}^{(λ)}(g) the (m′, m)-component of
U_g^{(λ)}, and use the indices λ, m, and x to label the basis of E as given in (C.46). The index
x corresponds to the multiplicity index.
Definition 15.2.1. Let g 7→ UgE be as above, and let g 7→ UgA and g 7→ UgB be two
additional projective unitary representations of G. We say that a set of operators
{Kλ,m,x }λ,m,x ⊂ L(A, B) is an irreducible tensor operator with respect to the three
projective unitary representations on systems A, B, and E, if its elements are
orthonormal and satisfy for all λ, m, and x,
U_g^B K_{λ,m,x} U_g^{*A} = Σ_{m′} u_{m′m}^{(λ)}(g) K_{λ,m′,x}   ∀ g ∈ G ,  (15.35)
where u_{m′m}^{(λ)}(g) is the (m′, m)-component of the matrix U_g^{(λ)} that appears in (15.34).
Remark. The orthonormality condition for an irreducible tensor operator is defined in terms
of the Hilbert-Schmidt inner product. Specifically, we make the assumption that:
Tr[ K_{λ′,m′,x′}^* K_{λ,m,x} ] = δ_{λλ′} δ_{mm′} δ_{xx′} .  (15.36)
The condition stated in Equation (15.35) imposes constraints not only on the elements
of the irreducible tensor operator, but also on the three representations UgA , UgB , and UgE .
This can be illustrated by the following exercise, where it is shown that the cocycle of the
map g 7→ UgE is entirely determined by the cocycles of g 7→ UgA and g 7→ UgB .
Exercise 15.2.7. Suppose the representations g ↦ U_g^A and g ↦ U_g^B have cocycles
{ e^{iθ^A(g,h)} }_{g,h∈G}   and   { e^{iθ^B(g,h)} }_{g,h∈G} ,  (15.37)
It is important to note that if A = B in the definition given above, then the exercise
shows that the representation g 7→ UgE is non-projective. Additionally, we emphasize that
the irreps λ ∈ Irr(U E ) used in the definition of the irreducible tensor operator may not
necessarily be the same irreps that appear in the decompositions of UgA or UgB . Therefore,
the dimension of the system E = span{|λ, m, x⟩^E} depends on the irreps λ ∈ Irr(U^E) that
appear in the decomposition of U_g^E. Specifically, the components u_{mm′}^{(λ)}(g) of U_g^{(λ)} appear in
the decomposition of U_g^E and not in the decompositions of U_g^A or U_g^B.
Proof. We first prove that the channel given in (15.39) is G-covariant. Indeed, for any
ρ ∈ L(A) we have
U_g^B ◦ E ◦ U_g^{*A}(ρ) = Σ_{λ,m,x} (U_g^B K_{λ,m,x} U_g^{*A}) ρ (U_g^B K_{λ,m,x} U_g^{*A})^*
(15.35)→ = Σ_{λ,m,x,k,k′} u_{km}^{(λ)}(g) ū_{k′m}^{(λ)}(g) K_{λ,k,x} ρ K_{λ,k′,x}^*  (15.40)
U_g^{(λ)} is a unitary matrix→ = Σ_{λ,x,k,k′} δ_{kk′} K_{λ,k,x} ρ K_{λ,k′,x}^* = E(ρ) .
Hence, E is G-covariant.
Conversely, suppose E is a G-covariant quantum channel. Let {K_x}_{x∈[n]} ⊂ L(A, B) be a
canonical Kraus decomposition of E (see Corollary 3.4.2). Since E is G-covariant, it follows
that for all g ∈ G and ρ ∈ L(A)
E(ρ) = U_g^B ◦ E ◦ U_g^{*A}(ρ) = Σ_{x∈[n]} (U_g^B K_x U_g^{*A}) ρ (U_g^B K_x U_g^{*A})^* .  (15.41)
Therefore, the set {U_g^B K_x U_g^{*A}}_{x∈[n]} also forms a canonical Kraus decomposition of E. Now,
recall from Sec. 3.4.4 that every two operator-sum representations of E that have the same
number of elements are related by a unitary matrix. Therefore, for any g ∈ G there exists
an n × n unitary matrix U_g^E = (u_{xz}(g)) ∈ L(E), with n := |E|, such that for all x ∈ [n] we
have
U_g^B K_x U_g^{*A} = Σ_{z∈[n]} u_{zx}(g) K_z .  (15.42)
Furthermore, since {K_z}_{z∈[n]} are linearly independent (as they are orthonormal in the Hilbert-Schmidt
inner product), it follows that for every g ∈ G there is a unique U_g^E that satisfies
equation (15.42). Additionally, using the notation given in (15.37) for the cocycles, we
get that for all g, h ∈ G
Σ_{z∈[n]} (U_{gh}^E)_{zx} K_z = U_{gh}^B K_x U_{gh}^{*A} = e^{i(θ^B(g,h)−θ^A(g,h))} Σ_{z′∈[n]} (U_g^E U_h^E)_{z′x} K_{z′} ,
and hence, by the linear independence of the K_z,
U_{gh}^E = e^{i(θ^B(g,h)−θ^A(g,h))} U_g^E U_h^E .  (15.44)
That is, the mapping g ↦ U_g^E is a projective unitary representation of the group G. Finally,
using the unitary freedom in the choice of the canonical Kraus decomposition {K_z}_{z∈[n]} of
E (see Exercise 3.4.21), we choose it in such a way that U_g^E is block-diagonal with respect
to the irreps of G. In this basis, U_g^E = ⊕_λ U_g^{(λ)} ⊗ I^{C_λ}, so we can denote the Kraus operators
by {K_{λ,m,x}} (with λ the irrep label and x the multiplicity index). This completes the
proof.
Exercise 15.2.8. Extend the theorem above to CP maps that are not necessarily trace pre-
serving. That is, show that E ∈ CP(A → B) is G-covariant if and only if it can be expressed
as in (15.39).
To illustrate the theorem above, we will provide a few examples. Let’s begin with the case
of a covariant unitary channel. As we mentioned earlier, a unitary channel E(·) = V (·)V ∗ ,
where V : A → A is a unitary matrix, is covariant if and only if (15.7) holds for all g ∈ G.
Here, g 7→ ωg is a 1-dimensional unitary representation of G. As we will illustrate now, the
theorem mentioned above can be used to derive the same conclusion.
Indeed, since a unitary channel has only one Kraus operator, the unitary representation
g 7→ UgE must be an irreducible representation (therefore, a single λ), one-dimensional (thus,
a single m), and with no multiplicity (a single x). This implies that |E| = 1, so we have
UgE = ωg for some ωg ∈ C where |ωg | = 1. Let us denote the single Kraus operator of E by
V = K_{λ,m,x}. In this case, the relation (15.35) can be expressed as follows:
U_g V U_g^* = ω_g V   ∀ g ∈ G .  (15.45)
It is worth noting that if there exists g ∈ G such that ω_g ≠ 1 (i.e., V is not G-invariant),
then G(V) = 0. To see why, consider taking the integral over G (with respect to the Haar
measure) on both sides of the equation above:
G(V) = cV   where   c := ∫_G dg ω_g .  (15.46)
Since g ↦ ω_g is a one-dimensional representation of G, the orthogonality of irreducible characters
implies that c = 0 unless ω_g = 1 for all g ∈ G, and c = 1 in this case. Therefore, if V is not G-invariant, we must have c = 0,
and consequently G(V) = 0.
As another example, let's consider the group U(1). The Kraus operators of a U(1)-covariant
channel can be labeled as K_{k,α} ∈ L(A), where α is the multiplicity index. Then,
from (15.35) we get that
e^{iθN̂} K_{k,α} e^{−iθN̂} = e^{ikθ} K_{k,α}   ∀ θ ∈ [0, 2π) .  (15.47)
Note that the irreducible representations of U (1) are one-dimensional. As a result, the
Kraus operators are not mixed with one another under the action of U (1). This provides a
significant simplification compared to the non-Abelian case.
Any Kraus operator K_{k,α} that satisfies (15.47) must have the form (see Exercise 15.2.9)
K_{k,α} = S_k D_{k,α} ,  (15.48)
where
S_k := Σ_{n∈Z} |n + k⟩⟨n|  (15.49)
is the “shift” operator, and Dk,α are diagonal operators in L(A); i.e., ⟨n|Dk,α |n′ ⟩ = 0 for
n ̸= n′ . Note that in the infinite-dimensional Hilbert space A, the shift operator Sk is unitary.
Therefore, in the QRT of U (1)-asymmetry, the set of free unitary operations consists of
diagonal unitaries, shift operators, and combinations of the two.
Exercise 15.2.9. Prove (15.48). Hint: Substitute K_{k,α} = Σ_{n,n′} c_{nn′}^{(k,α)} |n⟩⟨n′| in (15.47).
P
Exercise 15.2.10. Let G be the U(1)-twirling map, and Sk the shift operator. Show that Sk
is not U(1)-invariant by showing that G(Sk ) = 0. Still, we emphasize that E(·) = Sk (·)Sk∗ is
U(1)-covariant.
Find the general form of the Kraus operators constituting the operator sum representation of
a Zn -covariant channel with respect to the representation k 7→ Gk .
Remark. If E is G-covariant then in the proof below we will see that the representation g ↦
U_g^E is given by the induced representation of E. The components of the matrix Ū_g^E = (U_g^{*E})^T
equal the complex conjugates of the corresponding components of U_g^E. Recall that g ↦ Ū_g^E
is also a projective unitary representation (see Exercise C.3.1).
Proof. Suppose E has the form E(ρ) = Tr_E[V ρV^*], where the isometry V satisfies (15.51).
Using the standard Stinespring dilation theorem, we know that E ∈ CPTP(A → B).
where Ū_g^E(·) := Ū_g^E (·)(U_g^E)^T .
Conversely, suppose E ∈ COV_G(A → B). Let {K_{λ,m,x}} be its canonical covariant Kraus
decomposition as given in Theorem 15.2.2, and define the isometry V : A → BE as
V := Σ_{λ,m,x} K_{λ,m,x} ⊗ |λ, m, x⟩^E ,  (15.53)
where E := span{|λ, m, x⟩^E} is the induced space of E, decomposed according to the irreps
of G that appear in the induced representation of E. By definition, for all ρ ∈ L(A) we have
Tr_E[V ρV^*] = Σ_{λ,m,x} K_{λ,m,x} ρ K_{λ,m,x}^*
(15.39)→ = E(ρ) .  (15.54)
Therefore, it is left to show that V satisfies (15.51). Indeed, taking g ↦ U_g^E to be the induced
representation of E we get
(U_g^B ⊗ Ū_g^E) V U_g^{*A} = Σ_{λ,m,x} (U_g^B K_{λ,m,x} U_g^{*A}) ⊗ Ū_g^E |λ, m, x⟩^E
(15.35)→ = Σ_{λ,x} Σ_{m,m′} u_{m′m}^{(λ)}(g) K_{λ,m′,x} ⊗ Ū_g^E |λ, m, x⟩^E  (15.55)
Σ_m u_{m′m}^{(λ)}(g)|λ,m,x⟩ = (U_g^E)^T |λ,m′,x⟩ →  = Σ_{λ,x} Σ_{m′} K_{λ,m′,x} ⊗ Ū_g^E (U_g^E)^T |λ, m′, x⟩^E
Ū_g^E (U_g^E)^T = I^E →  = V .
When considering a channel E ∈ COVG (A → A), a slightly different version of the covari-
ant Stinespring dilation theorem is obtained. Note that in this case, not only is the output
system B replaced with the input system A, but the same projective unitary representation
is also considered on both the input and output systems of E. This enables us to obtain a
covariant Stinespring dilation theorem that involves a G-invariant unitary matrix.
Remark. The matrix W^{AE} in the theorem above is G-invariant with respect to the projective
unitary representation g ↦ U_g^A ⊗ Ū_g^E, where the representation g ↦ U_g^E is the induced
representation of E. Moreover, from Theorem C.4.2 it follows that a G-invariant pure state
always exists. To see this, using the same notation as in Theorem C.4.2, we note that for every ψ ∈
Pure(E) the vector Π|ψ⟩ is proportional to a G-invariant state.
Proof. The proof that (15.56) implies that E is G-covariant follows similar lines to the ones
appearing in the proof of Theorem 15.2.3 (we leave the details to Exercise 15.2.12). For the
converse, if E ∈ COV_G(A → A), Theorem 15.2.3 states that there exists an intertwining
isometry V : A → AE such that
(U_g^A ⊗ Ū_g^E) V = V U_g^A   and   E(ρ) = Tr_E[V ρV^*] .  (15.57)
Now, let |0⟩ ∈ E be a G-invariant state and define à := {|ψ A ⟩ ⊗ |0⟩E : |ψ⟩ ∈ A}. Clearly
à is a subspace of AE and we define the isometry Ṽ : à → AE via
such as communication and computation, it does impose limitations, reducing their practical
efficiency. This often requires more advanced encodings. Hence, parties may prioritize
allocating communication resources to establish a shared reference frame initially. Later,
they can utilize a standard encoding instead of continuously circumventing its absence with
a relational encoding.
In tasks aimed at establishing a shared reference frame, parties can employ quantum par-
ticles to encode information regarding the relative orientation of their frames. For instance,
spin-1/2 particles, like electrons, can encode the orientation of Cartesian frames, while ex-
changing quantum states of an optical mode can align phase references. Hence, in the realm
of quantum reference frames, which involve quantum particles holding information about a
shared reference frame, the usefulness of a quantum state is determined by the amount of
information that can be extracted from it to establish such a reference frame.
The above discussion illustrates that the resource theory of quantum reference frames
introduces certain aspects that differ from what we have encountered thus far. Specifically,
when Alice and Bob do not share a reference frame, Alice can gain at least partial information
about Bob’s reference frame by receiving a resource in the form of a quantum state that
encodes it. As a result, the set of free operations (i.e., G-covariant operations) needs to
be updated to incorporate this partial information. For instance, instead of using regular
G-twirling operations, weighted G-twirling operations can be employed, taking into account
that the parties have partial knowledge about the element g ∈ G that relates their reference
frames.
We now discuss the general approach to aligning reference frames, making the notions discussed
above rigorous. Consider two parties, Alice and Bob, who do not share a reference
frame, with G being the corresponding group describing the reference frame. The goal is for
Bob to learn the element g ∈ G that relates his reference frame to Alice's reference
frame. To accomplish this, Alice sends Bob a quantum reference frame (e.g., spin-1/2
particles pointing in the z-direction of her Cartesian reference frame) in the form of a quantum
state ρ ∈ D(A). From Bob's perspective, he has received one of the states {U_g ρ U_g^*}_{g∈G}, all
occurring with uniform prior.
To determine the specific state he possesses, i.e., to identify the group element g ∈ G,
Bob conducts a POVM, {Λg }g∈G , on his system. Consequently, the probability that Bob
guesses the group element to be g′, given that the actual element is g, is given by
q(g′|g) := Tr[ Λ_{g′} U_g ρ U_g^* ] .  (15.60)
In order to quantify how much information Bob gained after the measurement, consider first
the case that G is a finite group. In this senario, we can use the probability that Bob guess
g correctly as our figure of merit and maximize this function over all states and all POVMs.
For a given state ρ ∈ D(A) and a POVM {Λg }g∈G , this probability is given by:
1 X 1 X
Tr Λg Ug ρUg∗ .
Prguess (ρ, {Λg }g∈G ) := p(g|g) = (15.61)
|G| g∈G |G| g∈G
Thus, Alice and Bob’s objective is to maximize this guessing probability across all possible
ρ (referred to as the fiducial state) and all POVMs {Λg }g∈G .
Conversely, if G is a compact Lie group, the chance of Bob correctly inferring Alice’s
reference frame becomes infinitesimally small. In such instances, the direct likelihood or
guessing probability cannot serve as an effective figure of merit. Instead, the maximum
likelihood of a correct guess is adopted as the figure of merit. This maximum likelihood, akin
to the formula above but integrated over the group, is defined as:
Z Z
dg Tr Λg Ug ρUg∗ ,
µmax := max dg p(g|g) = max (15.62)
G G
with the maximization conducted over all fiducial states ρ ∈ D(A) and all POVMs {Λg }g∈G .
Given that the guessing probability in (15.62) is linear in ρ, the maximal value can always
be achieved with a pure state, allowing us to assume, for simplification, that the fiducial
state ρ = ψ is pure.
L From Theorem C.3.2 it follows that the Hilbert space A can be decomposed as A =
λ∈Irr(U ) Bλ ⊗ Cλ , where for each irrep λ, Bλ denotes the representation space, and Cλ
denotes the multiplicity space. We will denote by dλ := |Bλ | and mλ := |Cλ |. With these
notations we can write any pure state ψ ∈ Pure(A) as
M
|ψ⟩ = cλ |ψλ ⟩ (15.63)
λ∈Irr(U )
where each ψλ ∈ Pure(Bλ Cλ ), and cλ ∈ C with λ∈Irr(U ) |cλ |2 = 1. In the following theorem
P
Theorem 15.2.5. Using the notations above, the maximum likelihood µmax as
defined in (15.62) is given by
X
µmax = dλ nλ . (15.64)
λ∈Irr(U )
Proof. Let ψ and {Λg }g∈G be the optimal state and POVM that maximizes the maximum
likelihood. Expressing ψ as in (15.63), observe that due i Schmidt decomposition of |ψλ ⟩
h to the
there exists an orthogonal projector Πλ such that Tr Πλ = nλ and I Bλ ⊗ ΠC
Cλ Cλ
λ |ψλ ⟩ = |ψλ ⟩.
λ
Substituting the above inequality into (15.62) we get that the maximum likelihood is bounded
from above by: Z
µmax ⩽ max dg Tr [Λg Π]
G
Z X (15.67)
dg Λg = I −−−−→ = Tr[Π] = dλ nλ .
G
λ∈Irr(U )
λ∈Irr(U )
where ΠC Cλ
P
λ := x∈[nλ ] |x⟩⟨x| . Let Π be the projector appearing on the right hand side of
λ
the equation above and observe that it satisfies Π|ψ⟩ = |ψ⟩ and [Ug , Π] = 0. Therefore, the
set {I A − Π} ∪ {Λg }g∈G is a POVM. Moreover, the measurement outcome corresponding to
the element I A − Π occur with probability
Tr I A − Π Ug ψUg∗ = Tr Ug I A − Π ψUg∗ = 0
∀ g ∈ G. (15.70)
Hence µmax ⩾ ν and since we already saw that µmax ⩽ ν we conclude that µmax = ν. This
completes the proof.
⊗n
Here we are interested in the representation of SO(3) on the space An := (C2 ) for
some integer n. We extend here the group of rotations SO(3) to the group SU (2) to allow
for spinor representations. For simplicity, we will assume that n is even, and use some well-
known results from representation theory. Specifically, the representation g 7→ Ug⊗n can be
decomposed into a direct sum of SU (2) irreps, labeled by the total angular momentum j
ranging from 0 to n/2. The decomposition (C.44), for the representation of SU (2) on An ,
has been extensively studied in representation theory, and is given by
n/2
M
n
A = Bj ⊗ Cj (15.72)
j=0
Moreover, in this case the optimal state (15.68) that achieves the maximum likelihood above
is given by
n/2−1
1 X √
|ψ⟩ = √ (2j + 1)|Φj ⟩ + n + 1|n/2, n/2⟩ . (15.75)
µmax j=0
where
j
1 X
|Φj ⟩ := √ |j, m⟩Bj |ϕC
m⟩
j
(15.76)
2j + 1 m=−j
C
and {|ϕmj ⟩}m∈{−j,...,j} is an orthonormal set of vectors in Cj .
As a specific example, suppose n = 2. In this case we have two irreps, corresponding to
total angular momentum j = 0 and j = 1. In this case |Cj | = 1 for both j = 0 and j = 1,
and the representation space B0 = span{|0, 0⟩} is one dimensional spanned by the singlet
state
1
|0, 0⟩B0 := |ΨA Ã
− ⟩ := √ (|01⟩ − |10⟩) (15.77)
2
whereas B1 is three dimensional spanned by the triplet states |1, 1⟩B1 := |0⟩A |0⟩A , |1, 0⟩B1 =
|ΨA Ã A A
+ ⟩, and |1, −1⟩ := |1⟩ |1⟩ . Therefore, the formula above implies to the two 1/2-spin
particle state
1 √ 1 √
|ψ⟩ = |0, 0⟩B0 + 3|1, 1⟩B1 = |ΨA−
Ã
⟩ + 3|11⟩AÃ
(15.78)
2 2
achieves the largest maximum likelihood µmax = 4 as defined in (15.62). It is worth pointing
out that the state above is not unique, and replacing |1, 1⟩B1 with any other normalized state
in B1 would still give the maximum likelihood µmax = 4.
Exercise 15.2.14. Let φ ∈ Pure(B1 ) be a pure state in the triplet space of two spin-1/2
particles. Show that there exists a POVM {Λg }g∈G such that the state
1 √ B Z
B0
dg Tr Λg Ug ρUg∗ = 4 .
|ψ⟩ = |0, 0⟩ + 3|φ ⟩ 1
satisfies (15.79)
2 G
2. Relative Entropies of Asymmetry: Measures that are derived from the general
framework of resource theories using different choices of relative entropies.
While some measures, like the relative entropy of asymmetry, are derived from the gen-
eral framework of quantum resource theories, they have certain drawbacks, such as not being
additive under tensor products and having zero regularized versions. To overcome these lim-
itations, we introduce a new technique to construct measures of asymmetry that involve
taking derivatives of quantum divergences. We refer to these measures as derivatives of
asymmetry. The concept of derivatives of asymmetry encompasses significant measures like
the quantum Fisher information and the Wigner-Yanase-Dyson skew information. These
measures play pivotal roles in fields like quantum metrology, where precision and sensitivity
are paramount. By exploring the derivatives of asymmetry, we can gain a deeper under-
standing of asymmetry in quantum systems and their applications in diverse fields beyond
quantum information.
In order to quantify how much information Bob gained after the measurement, let X be the
random variable corresponding to the element g ∈ G that relates between Alice and Bob’s
reference, and let Y be the random variable associated with Bob’s measurement outcome
g ′ ∈ G. With these notations, any measure of conditional uncertainty can be used to
quantify the uncertainty of X given that Bob has access to Y . Let S(X|Y )q , with q :=
{q(g ′ |g)}g,g′ ∈G denotes the probability distribution, be some measure of conditional certainty
such as the negative of the conditional entropy H(X|Y )q . Then, a measure of quantum
frameness associated with S(X|Y )q is defined for all ρ ∈ D(A) as
where the maximum is over all POVMs that Bob can perform on his system. In other words,
Bob chooses a POVM that maximizes his certainty about X. We say that F is a measure
of quantum reference frame only if it can be written in this way.
Proof. Let ρ ∈ D(A) and {Γg }g∈G ⊂ Eff(A). Let N ∈ COVG (A → B) and observe that
since N is G-covariant we get that
Therefore,
F N (ρ) = max
∗
S(X|Y )q ⩽ max S(X|Y )q = F(ρ) , (15.83)
{N (Γg )} {Λg }
where the first maximum is over all POVMs of the form {N ∗ (Γg )}g∈G which is a subset of
all possible POVMs {Λg }g∈G . This completes the proof.
Note that in the proof above we did not need to use any of properties of the function
S(X|Y )q . However, since S(X|Y )q measures the conditional certainty of X given that Bob
has access to Y , measures of quantum frameness has additional properties. In Chapter 7
we saw that all measures of conditional uncertainty has to behaves monotonically under
conditional majorization. However, in Chapter 7 we only considered finite dimensional,
discrete probability distributions. Therefore, for finite groups in which X and Y are discrete
random variables with |X| = |Y | = |G|, the function S(X|Y )q must behaves monotonically
under conditional majorization. We called such functions in Sec. 4.6.5 conditionally Schur
convex functions.
The extension of conditional majorization to continuous probability distributions is a
complex and currently unresolved issue in the field. However, various functions, such as the
family of conditional R’enyi entropies (including the conditional von-Neumann entropy), can
be utilized to measure the conditional entropy of continuous distributions. For the purpose
of our discussion here, we only need to focus on one common property shared by all such
functions that quantify conditional uncertainty: their invariance under the action of the
group, both from the left and from the right.
Let’s recall that S(X|Y )q represents a function of the conditional probability distribu-
tion q(g ′ |g). Now, suppose Bob rotates his reference frame by an element h ∈ G, causing
the corresponding element g ∈ G relating his frame to Alice’s frame to change to h−1 g.
Consequently, the outcome of the measurement g ′ transforms to h−1 g ′ under this change
in Bob’s reference frame. Since such a transformation should not affect Bob’s uncertainty
about g, we deduce that both distributions q(g ′ |g) and r(g ′ |g) := q(h−1 g ′ |h−1 g) represent the
same conditional uncertainty. Hence, any function S(X|Y )p that quantifies Bob’s certainty
about X must be left-invariant, meaning that S(X|Y )q = S(X|Y )r holds for all conditional
distributions q and all h ∈ G.
Similarly, let’s consider the scenario where Bob changes his reference frame such that
g 7→ gh and g ′ 7→ g ′ h. As before, such a transformation should not affect Bob’s uncertainty
about g. Consequently, both distributions q(g ′ |g) and r(g ′ |g) := q(g ′ h|gh) represent the same
conditional uncertainty. Thus, any function S(X|Y )q that quantifies Bob’s certainty about
X must also be right-invariant, meaning that S(X|Y )q = S(X|Y )r holds for all conditional
distributions q and all h ∈ G.
Many of the functions S(X|Y )q are not linear in q which makes the optimization in (15.81)
very difficult for such choices. We therefore focus here on measure of conditional certainty
that are linear in q. We start with the maximum likelihood that we already encountered in
Sec. 15.2.3
where the maximum is over all POVMs {Λg }g∈G ⊂ Eff(A). Observe that µmax := maxρ∈D(A) µ(ρ)
is the maximum likelihood for Bob’s correct guess of Alice’s reference frame.
In the theorem below we will see that the maximum likelihood µ(ρ) can be expresses
in terms of the max relative entropy. Among other things, this result demonstrates that
the function µ(ρ) behaves monotonically under G-covariant operations and therefore can be
used to define a measure for asymmetry. To be more precise, since for ρ ∈ INVG (A) we have
µ(ρ) = 1, the function ρ 7→ µ(ρ) − 1 is a measure of asymmetry (in fact, it can be shown to
be an asymmetry monotone, see the exercise below).
Theorem 15.3.2. Using the same notations as above, for any ρ ∈ D(A)
so that Z
dg Tr Λg Ug ρUg∗ = Tr [Λρ] .
(15.88)
G
Conversely, let RΛ ∈ Pos(A) be such that G(Λ) = I A , and define , Λg := Ug ΛUg∗ for every
g ∈ G, so that G dg Λg = G(Λ) = I A . By definition, this POVM {Λg }g∈G satisfies (15.88).
We can therefore express µ(ρ) as the following SDP:
The above optimization problem is an SDP. As such, it has a dual given by (see Sec. A.9)
n o
µ(ρ) = min t ⩾ 0 : tσ ⩾ ρ , σ ∈ INVG (A) . (15.91)
That is,
log µ(ρ) = min Dmax (ρ∥σ) . (15.92)
σ∈INVG (A)
Exercise 15.3.2. Use the duality relations discussed in Sec. A.9 to show that the dual
of (15.90) is given by the expression in (15.91).
Exercise 15.3.3. Show that the maximum likelihood µ(ρ) is an asymmetry monotone.
We now use the expression in the theorem above to compute the maximum likelihood
of a pure state ψ ∈ D(A). For this purpose, we will use the fact that any quantum state
σ ∈ INVG (A) has the form M
σA = I Bλ ⊗ σλCλ , (15.93)
λ∈Irr(U )
for some σλ ∈ Pos(Cλ ). Let sλ := Tr [σλ ] and observe that since σ is normalized we must
have X
dλ s λ = 1 , (15.94)
λ∈Irr(U )
where dλ := |Bλ |. We will also use the fact that any pure state ψ ∈ Pure(A) can be expressed
as M √
|ψ⟩ = pλ |ψλ ⟩ (15.95)
λ∈Irr(U )
where {pλ }λ∈Irr(U ) is a probability distribution, and each ψλ ∈ Pure(Bλ Cλ ) is a pure state in
the tensor product of the representation and multiplicity spaces.
Proof. Let t ∈ R+ and σ ∈ D(A) be such that tσ ⩾ ψ. This condition holds if and only if
tI A ⩾ σ −1/2 |ψ⟩⟨ψ|σ −1/2 , (15.97)
(where all inverses are understood as generalized inverses). The above condition holds if and
only if t ⩾ ⟨ψ|σ −1 |ψ⟩. Therefore,
µ(ψ) = min ⟨ψ|σ −1 |ψ⟩ . (15.98)
σ∈INVG (A)
Now, a density matrix σ ∈ INVG (A) if and only if it has the form (15.93). Therefore, the
maximum likelihood of ψ is given by
X
µ(ψ) = min pλ ψλ I Bλ ⊗ σλ−1 ψλ (15.99)
λ∈Irr(U )
where the minimum is over all σλ ∈ Pos(Cλ ) whose traces satisfy (15.94). Using the notations
sλ := Tr [σλ ] and ηλ := s1λ σλ , we split the minimization into two parts: first, we fix the
numbers {sλ }λ∈Irr(U ) and minimize the expression over all ηλ ∈ D(Cλ ), and then we minimize
the resulting expression over all {sλ }λ∈Irr(U ) that satisfy (15.94). That is,
X
µ(ψ) = min pλ s−1
λ min ψλ I Bλ ⊗ ηλ−1 ψλ . (15.100)
{sλ } ηλ ∈D(Cλ )
λ∈Irr(U )
h i
Denote the reduced density matrix of ψλ by ρCλ
λ
:= Tr Bλ ψ Bλ Cλ
λ , and observe that
ψλ I Bλ ⊗ ηλ−1 ψλ = Tr ρλ ηλ−1
√
ρλ √ 2 2 −1
τλ := √ −−−−→ = (Tr [ ρλ ]) Tr τλ ηλ (15.101)
Tr ρλ
√
Definition 6.3.2 with α = 2→ = (Tr [ ρλ ])2 2D2 (τλ ∥ηλ ) .
Therefore, the minimum of the expression above over all η ∈ D(Cλ ) is obtained when η = τλ .
Hence, X √ 2
µ(ψ) = min pλ s−1
λ (Tr [ ρλ ]) . (15.102)
{sλ }
λ∈Irr(U )
For the remaining of the optimization problem, for each λ ∈ Irr(U ) we denote by rλ :=
√ 2
and qλ := 1s sλ , where s := λ∈Irr(U ) sλ . Observe that from (15.94) we get
P
pλ Tr ρλ
that s−1 = λ∈Irr(U ) dλ qλ . Therefore, with these notations we get that
P
X X
µ(ψ) = min dλ qλ rλ qλ−1 . (15.103)
λ∈Irr(U ) λ∈Irr(U )
where the minimum is over all probability distributions {qλ }λ∈Irr(U ) . From the Cauchy-
Schwarz inequality we get that the minimum is given by
X p 2
µ(ψ) = dλ rλ . (15.104)
λ∈Irr(U )
Finally, observe
hp the iexpression inside the log on the right-hand side of the equation above is
given by Tr G(ψ) . Hence, log µ(ψ) is given by (15.96). This completes the proof.
where the maximum is over all POVMs {Λg′ }g′ ∈G ⊂ Eff(A), and q(g ′ |g) := Tr Λg′ Ug ρUg∗ is
the probability of guessing g ′ given that the actual element that relates between the parties’
reference frames is g. Note that by taking f (g ′ , g) = δ(g ′ g −1 ) to be the Dirac delta function
we can get back the maximum likelihood function. We therefore call the function above the
weighted maximum likelihood.
Note that we can write µf (ρ) = max S(X|Y )q as given in (15.81), where q := {q(g ′ |g)}g,g′ ∈G ,
the maximum is over all POVM as above, and S(X|Y )q = Lf (q) is taken to be the linear
functional Z Z
Lf (q) := dg dg ′ f (g, g ′ )q(g ′ |g) . (15.107)
G G
Since the function Lf (q) represents the certainty that Bob has about g, it has to be (see the
discussion above) both left and right invariant. Fix h ∈ G and denote by r(g ′ |g) := q(hg ′ |hg).
Since Lf is left invariant we have Lf (r) = Lf (q) for all conditional distributions p. Since
Z Z Z Z
′ ′ −1 ′ −1
Lf (r) := dg dg f (g, g )q(h g |h g) = dg dg ′ f (hg, hg ′ )q(g ′ |g) , (15.108)
G G G G
As the above condition holds for all conditional probability distributions q(g ′ |g) , we conclude
that f itself is left-invariant. That is,
The left-invariance property of f is consistent with the intuition that the payoff function
should exclusively depend on the relative transformation between the transmitted state,
characterized by the group element g, and the measurement outcome, represented by the
group element g ′ .
Following similar arguments as above, the right-invariance property of Lf implies that
the function f itself is also right-invariant, that is,
The fact that f is both right and left invariant has the following consequences.
First, by taking h = g −1 in (15.110) we get that
That is, f (g, g ′ ) can be viewed as a function of g −1 g ′ . We will denote this function by p so
that f (g, g ′ ) = p(g −1 g ′ ). Now, since f (g, g ′ ) is also right invariant we get that for all h, g ∈ G
we have p(hgh−1 ) = p(g). That is, p is a class function as introduced in Definition C.6.2.
R Since f is non-negative so is p, and consequently, it is natural to normalize p such that
G
dg p(g) = 1. That is, {p(g)}g∈G is a probability distribution over the group. Moreover,
for a function f that is both left and right invariant we have
Z Z
dg ′ p(g −1 g ′ )Tr Λg′ Ug ρUg∗
µf (ρ) = max dg
ZG ZG
dg ′ dh p(h)Tr Λg′ Ug′ Uh∗ ρUh Ug∗′
h := g −1 g ′ −−−−→ = max (15.113)
ZG G
Exercise 15.3.5. Explain why for f that is not left and right invariant, the function µf is
not necessarily a measure of asymmetry.
is a measure of asymmetry. For certain choices of the divergence D, the function above can
be hard to compute. However, for the relative entropy it has a very simple form.
For any α ∈ [0, 2], the α-Rényi relative entropy of asymmetry is defined as
In Exercise 15.2.4 you showed that if ρ is G-invariant then for all α ∈ [0, 2] the state
ρα := ρα /Tr[ρα ] is also G invariant. Therefore, from Theorem 10.105 it follows that the
α-Rényi relative entropy of asymmetry is given by
1
Asyα (ρ) = log ∥G (ρα )∥1/α
α−1 (15.117)
(10.108)→ = H1/α G(ρα ) − Hα (ρ) ,
where Hα is the α-Rényi entropy, and G is the G-twirling map that is also the resource
destroying map of the QRT of asymmetry. The special case of α = 1 is also known as the
G-asymmetry of the state ρ ∈ D(A) and is given by
Asy(ρ) = H G(ρ) − H(ρ) . (15.118)
From Theorem 15.3.2 it follows that the log of the maximum likelihood, log µ(ρ), can
be viewed as the max relative entropy of asymmetry, in which Dα in (15.116) is replace by
Dmax . Since Dmax is the largest relative entropy, the formula in (15.117) with α = 2 can be
used to provide a lower bound for log µ(ρ). Specifically, we have
2
ρ
log µ(ρ) ⩾ H1/2 G − H2 (ρ) . (15.119)
Tr[ρ2 ]
Remarkably, due to Theorem 15.3.3, the inequality above becomes an equality on all pure
states.
Despite the elegant expression above for the G-asymmetry, in general, the G-asymmetry
is not additive under tensor products. In fact, in the following theorem we show that its
regularization is zero!
Theorem 15.3.4. Let G be a finite or compact Lie group and let ρ ∈ D(A). Then,
1
Asy ρ⊗n = 0 .
lim (15.120)
n→∞ n
Remark. The theorem above underscores a notable constraint associated with using the
G-asymmetry as a measure of asymmetry in quantum systems. It signals the necessity
to investigate other measures capable of surmounting this limitation, particularly in the
asymptotic regime where numerous copies of asymmetric states are considered. We will
see that venturing into alternative measures will pave the way for a broader and more
nuanced comprehension of asymmetry’s nature and characteristics when approached from
the perspective of the asymptotic domain.
Proof. According to (15.25) the action of the G-twirling on the state ρ is given by
X
G(ρ) = px Ugx (ρ)Ug∗x , (15.121)
x∈[d]
where d is an integer satisfying d ⩽ m4 (see Exercise 15.2.2). Therefore, from the von-
Neumman property (7.120) we get that
H G(ρ) ⩽ H(ρ) + H(p) ⩽ H(ρ) + log(d) . (15.122)
Thus, combining this with the definition in (15.118) gives Asy(ρ) ⩽ log(d).
Now, fix n ∈ N and consider the action of the G-twirling on ρ⊗n :
Z
⊗n
dg Ug⊗n ρ⊗n Ug⊗n .
Gn ρ := (15.123)
G
Observe that the support of ρ⊗n is a subspace of the symmetric subspace Symn (A). Thus,
we can view ρ⊗n as a positive semidefinite operator acting on Symn (A). Moreover, the map
g 7→ Ug⊗n can also be seen as a projective unitary representation of G on the space Symn (A).
Therefore, if we repeat the same steps that led to the inequality Asy(ρ) ⩽ log(d) but with
ρ⊗n instead of ρ, we obtain:
Asy ρ⊗n ⩽ log(dn ) ,
(15.124)
where dn is an integer no greater than the dimension of Symn (A) to the power four (see (15.26)).
Combining this with the formula (C.166) for the dimension of the symmetric subspace we
arrive at
⊗n
n+m−1
Asy ρ ⩽ 4 log
n (15.125)
(8.87)→ ⩽ 4m log(n + 1) .
Hence,
1 1
Asy ρ⊗n ⩽ 4m lim log(n + 1) = 0 .
lim (15.126)
n→∞ n n→∞ n
where {|x⟩}x∈[m] are the eigenvectors of the number operator N̂ , and each cx ∈ C. We can
express n copies of ψ as
n
X
|ψ ⊗n ⟩ = aj |ϕA
j ⟩ (15.128)
j∈[mn]
n
where aj ∈ C and |ϕAj ⟩ is the eigenvector of the total number operator N̂tot corresponding
to the eigenvalue j, for each j ∈ [mn]. By applying the G-twirling to ψ ⊗n , we obtain
(see (15.22))
n
X
Gn ψ ⊗n = |aj |2 ϕA
j . (15.129)
j∈[mn]
Denoting by p ∈ Prob(mn) with components pj := |aj |2 for each j ∈ [mn], we conclude that
H Gn ψ ⊗n = H(p) ⩽ log(mn) .
(15.130)
Therefore,
1 1
H Gn ψ ⊗n ⩽ lim log(nm) = 0 .
lim (15.131)
n→∞ n n→∞ n
The key observation in this example is that the rank of G (ψ ⊗n ) grows linearly with n.
Proof. We begin by expressing Asyp (ρ) as the mutual information of the state σ XA , defined
as X
σ XA := px |x⟩⟨x|X ⊗ Ugx ρA Ug∗x , (15.133)
x∈[d]
D σ XA σ A ⊗ σ X = H σ X + H σ A − H σ XA ,
(15.134)
where σ A = Gp ρA and H σ X = H(p). From Exercise 7.5.1, we have
X
H σ XA = H(p) + px H Ugx ρA Ug∗x
x (15.135)
−−−−→ = H(p) + H ρA .
H Ugx ρA Ug∗x = H ρA
Asyp ρA = D σ XA σ X ⊗ σ A .
(15.136)
Next, let N ∈ COVG (A → B) and observe that since N A→B ◦ UgA = UgB ◦ N A→B for all
g ∈ G, we have
(15.137)
X
px |x⟩⟨x|X ⊗ UgBx ◦ N A→B ρA .
=
x∈[d]
Therefore,
DPI→ ⩽ D σ XA σ X ⊗ σ A
(15.138)
= Asyp ρA .
Exercise 15.3.6. Show that for every divergence D, all ρ ∈ D(A), and all p ∈ Prob(d) we
have
Ap (ρ) ⩽ Ip (ρ) . (15.142)
Ag (ρ) := D ρ Ug ρUg∗
∀ ρ ∈ D(A) . (15.143)
As we will see shortly that quite often the derivative on the right-hand side yields the constant
zero function. In such cases, DΛ (ρ) is defined in terms of the second derivative as
1 d2 itΛ −itΛ
DΛ (ρ) := D ρ e ρe . (15.145)
2 dt2 t=0
Exercise 15.3.7. Show that the derivative of asymmetry as defined above is a measure
of asymmetry. Hint: Use the fact that for each g ∈ G the function Ag is a measure of
asymmetry.
In order to compute the derivatives above, we will use the expension
1
eitΛ ρe−itΛ = ρ + it[Λ, ρ] − t2 Λ, [Λ, ρ] + O(t3 ) .
(15.146)
2
It will be convenient to use the notations σ := i[Λ, ρ] and η := − 21 Λ, [Λ, ρ] so that
We now use this expansion to compute the derivatives of asymmetry for several examples.
The differential trace distance measures the asymmetry in a state ρ relative to a subgroup
of G associated with a generator Λ. This measure depends on the coherence of ρ over the
eigenspaces of Λ, which is indicated by the non-zero commutator [ρ, Λ]. The question then
arises as to which operator norm should be used to measure the commutator [ρ, Λ] and thus,
the asymmetry of ρ. While the answer to this question may not be immediately apparent,
the above discussion indicated that the trace norm is the most appropriate measure for this
purpose.
Exercise 15.3.8. Let ψ ∈ Pure(A) and Λ ∈ Herm(A). Show that
p
TΛ (ψ) = ⟨ψ|Λ2 |ψ⟩ − ⟨ψ|Λ|ψ⟩2 . (15.149)
That is, on pure states, the differential trace distance of asymmetry reduces to the variance
of the observable Λ.
{|x⟩}x∈[m] is the basis of A consisting of the eigenvectors of ρA ), and make use of the divided
difference approach discussed in Appendix D.1. Particularly, the trace above has the form
given in Corollary D.1.1 with g(t) := tα and f (t) := t1−α . Therefore, the function h(t) as
defined in Corollary D.1.1 is given by
h(t) := g(t)f ′ (t) = 1 − α , (15.151)
so that h(ρ) = (1 − α)I A is a constant function. As such, Tr [h(ρ)σ] = (1 − α)Tr[σ] = 0 and
similarly Tr [h(ρ)η] = (1 − α)Tr[η] = 0. Observe further that since h is a constant function,
Lh (σ) = 0. Substituting all this into Corollary D.1.1 we conclude that
2 1 1 2
log 1 − t Tr [Lf (σ)Lg (σ)] + O(t3 )
Dα ρ ρ + tσ + t η =
α−1 2
2
(15.152)
1 t
= Tr [Lf (σ)Lg (σ)] + O(t3 ) ,
21−α
where the self-adjoint linear maps Lf , Lg ∈ Herm(A → A) are defined by (see Appendix D.1
for more details)
px1−α − py1−α
⟨x|Lf (σ)|y⟩ = ⟨x|σ|y⟩
px − py (15.153)
1−α 1−α
= −i px − py ⟨x|Λ|y⟩ ,
and similarly
⟨y|Lg (σ)|x⟩ = −i pαy − pαx ⟨y|Λ|x⟩ .
(15.154)
Therefore,
X
py1−α − p1−α pαy − pαx |⟨x|Λ|y⟩|2
Tr [Lf (σ)Lg (σ)] = x
x,y∈[m]
X X
=2 px |⟨x|Λ|y⟩|2 − 2 p1−α
x pαy |⟨x|Λ|y⟩|2 (15.155)
x,y∈[m] x,y∈[m]
Hence, for all α ∈ [0, 2], ρ ∈ D(A) and Λ ∈ Herm(A), the differential α-Rényi divergence of
asymmetry is given by
1
Tr ρΛ2 − Tr ρ1−α Λρα Λ .
DΛ,α (ρ) = (15.156)
1−α
The expression in the parenthesis above (i.e. without the factor 1/(1 − α)) is known as the
Wigner-Yanase-Dyson skew information. Note that as the previous example, on pure states
the Wigner-Yanase-Dyson skew information reduces to the variance of Λ.
Exercise 15.3.9. Let ρ = ψ ∈ Pure(A) and Λ ∈ Herm(A). Show that
1
⟨ψ|Λ2 |ψ⟩ − ⟨ψ|Λ|ψ⟩2 .
DΛ,α (ψ) = (15.157)
1−α
Exercise 15.3.10. Show that for α ∈ (0, 1) the function DΛ,α (ρ) is concave in ρ. Hint: Use
Lieb’s Concavity Theorem (see Theorem B.6.1).
Exercise 15.3.11. Show that for α = 1 and ρ ∈ D(A) we have
DΛ (ρ) := lim DΛ,α (ρ) = Tr Λ2 ρ log ρ − Tr [ΛρΛ log ρ] .
(15.158)
α→1
As our third example, we take D = D̃α to be the minimal quantum divergence. In this case,
we use the invariance of every relative entropy under unitary operations to get that
D̃α ρ∥Ug ρUg∗ = D̃α Ug∗ ρUg ∥ρ .
(15.159)
Since Ug∗ ρUg = ρ − tσ + t2 η + O(t3 ) we get that
where
1−α 1−α 1−α 1−α 1−α 1−α
ρ̃ := ρ 2α ρρ 2α = ρ1/α , σ̃ := ρ 2α σρ 2α , and η̃ := ρ 2α ηρ 2α . (15.162)
α
The trace Tr (ρ̃ − tσ̃ + t2 η̃) has the form given in Corollary D.1.1 with g(t) := 1 and
f (t) := tα . Therefore, h(t) := g(t)f ′ (t) = αtα−1 , so that
and similarly Tr [h(ρ̃)η̃] = αTr[η] = 0. Observe further that since g is a constant function,
Lg (σ) = 0. Substituting all this into Corollary D.1.1 we conclude that
α 1
ρ̃ − tσ̃ + t2 η̃ = 1 + t2 Tr [σ̃Lh (σ̃)] + O(t3 ) .
Tr (15.164)
2
It will be convenient to denote s := α1 . Since we assume that α ∈ [1/2, ∞] we have that
s ∈ [0, 2]. Working with the eigenbasis of ρ we get for all x, y ∈ [m]
1−α 1−α
⟨x|σ̃|y⟩ = px2α py2α ⟨x|σ|y⟩
(15.165)
s := 1/α, σ := i[Λ, ρ] −−−−→ = ipx(s−1)/2 p(s−1)/2
y (py − px )⟨x|Λ|y⟩ .
Furthermore, since the eigenvalues of ρ̃ are {psx }x∈[m] we get by definition of Lh that
h psy − h (psx ) 1 p1−s
y − px1−s
⟨y|Lh (σ̃)|x⟩ = ⟨y|σ̃|x⟩ = ⟨y|σ̃|x⟩ , (15.166)
psy − psx s psy − psx
p1−s
y − p1−s
x 1 − s 1−2s
lim = p . (15.167)
py →px s s
py − px s x
Note that since the limit py → px of the components in the sum above is zero, we can restrict
the sum above to all x, y ∈ [m] that satisfies px ̸= py . Hence, we conclude that
1 X ps−1
x − ps−1
y
D̃Λ,s (ρ) := D̃Λ,α (ρ) = (px − py )2 |⟨x|Λ|y⟩|2 . (15.169)
2(s − 1) px − psy
s
x,y∈[m]
px ̸=py
If ρ is given by the pure state ψ = |1⟩⟨1| ∈ Pure(A) then px = δ1x for all x ∈ [m]. In this
case, for all s ∈ [0, 2]
m m
1 X 1 X
D̃Λ,s (ψ) = |⟨ψ|Λ|y⟩|2 + |⟨x|Λ|ψ⟩|2
s − 1 y=2 s − 1 x=2
m
2 X
= |⟨x|Λ|ψ⟩|2
s − 1 x=2 (15.170)
m 2
ψA Λ I A − ψ Λ ψA A
−−−−→ =
X
|x⟩⟨x|A = I A − ψ A
x=2 s−1
2
⟨ψ|Λ2 |ψ⟩ − ⟨ψ|Λ|ψ⟩2 .
=
s−1
The Fisher information is a measure of asymmetry that is obtained by setting s = 2 in
the family of asymmetry monotones given in equation (15.169), which yields:
X (px − py )2
FΛ (ρ) := 4D̃Λ,2 (ρ) = 2 |⟨x|Λ|y⟩|2 . (15.171)
px + py
x,y∈[m]
The Fisher information is a fundamental concept in statistics and information theory with
numerous applications in quantum metrology and quantum information. It plays a crucial
role in studying the ultimate limits of precision in quantum measurements, commonly re-
ferred to as the quantum Cramér-Rao bound. Moreover, the Fisher information is employed
to measure the distinguishability of quantum states, to characterize the entanglement prop-
erties of multipartite systems, and to devise optimal quantum measurement strategies. In the
field of quantum thermodynamics, it has an operational interpretation as the coherence cost
of preparing a system in a particular state without any restrictions on work consumption.
Exercise 15.3.12. Show that for s = α = 1
X
D̃Λ (ρ) := lim D̃Λ,α (ρ) = (log px − log py )(px − py )|⟨x|Λ|y⟩|2 . (15.172)
α→1
x,y∈[m]
2. Two pure states ψ, ϕ ∈ Pure(A) are called unitarily G-equivalent if there exists
a G-invariant unitary matrix V : A → A such that V |ψ⟩ = |ϕ⟩.
We will also refer to the set of all states σ ∈ D(A) that are G-equivalent to ρ as the
G-equivalence class of ρ. In this subsection, our focus is on characterizing the G equivalence
class of a pure state. To achieve this goal, we begin by characterizing unitarily G-equivalent
states. Note that if [V, Ug ] = 0 holds for all g, then [V ∗ , Ug ] = 0 holds for all g as well.
Therefore, we can replace the condition V |ψ⟩ = |ϕ⟩ in the definition of unitarily G-equivalent
states with |ψ⟩ = V |ϕ⟩.
Exercise 15.4.1. Let g 7→ G be a projective unitary representation of G and for every
ρ ∈ D(A) let
SymG (ρ) := {g ∈ G : Ug ρUg∗ = ρ} . (15.175)
1. Show that SymG (ρ) is a subgroup of G.
G−COV
2. Show that if ρ −−−−−→ σ for some ρ, σ ∈ D(A) then SymG (ρ) is a subgroup of SymG (σ).
3. Show that if ρ and σ are G-equivalent then SymG (ρ) = SymG (σ).
Every projective unitary representation, g 7→ UgA , corresponds
L to a decomposition of the
Hilbert space A as given in (C.44). Specifically, A = λ∈Irr(U ) Aλ , where Aλ = Bλ ⊗ Cλ .
Accordingly, every two pure states ψ, ϕ ∈ Pure(A) can be expressed as
X X
|ψ A ⟩ = |ψλBλ Cλ ⟩ and |ϕA ⟩ = |ϕBλ
λ Cλ
⟩, (15.176)
λ∈Irr(U ) λ∈Irr(U )
where |ψλ ⟩, |ϕλ ⟩ ∈ Bλ Cλ are subnormalized states in Aλ . In the following theorem we show
that ψ A and ϕA are unitarily G equivalent if the marginals of ϕBλ
λ Cλ
and ϕB
λ
λ Cλ
on Bλ are the
same. In addition, the theorem characterizes states that are unitarily G-equivalent in terms
of their characteristic functions. In Sec. C.7, we discuss various properties of characteristic
functions, and we encourage readers who are unfamiliar with this material to read Sec.C.7
before proceeding to the theorem below.
Proof. The implication 1 ⇒ 2: Suppose that ψ and ϕ are unitarily G-equivalent. Then there
exists a G-invariant unitary matrix V : A → A such that |ϕ⟩ = V |ψ⟩. Since V is G-invariant,
after multiplying both sides by Ug from the left we get Ug |ϕ⟩ = Ug V |ψ⟩ = V Ug |ψ⟩.
The implication 2 ⇒ 3: For all g ∈ G we have
⟨ψ|Ug |ψ⟩ = ⟨ψ|V ∗ V Ug |ψ⟩
V |ψ⟩ = |ϕ⟩ −−−−→ = ⟨ϕ|V Ug |ψ⟩ (15.177)
V Ug |ψ⟩ = Ug |ϕ⟩ −−−−→ = ⟨ϕ|Ug |ϕ⟩ .
The implication 3 ⇒ 4: Since we assume that χψ (g) = χϕ (g) for all g ∈ G, we get from
Theorem C.7.1 h i Z
Bλ Cλ
TrCλ ψλ = |Bλ | dg χψ (g −1 )Ug(λ)
ZG
χψ (g −1 ) = χϕ (g −1 ) −−−−→ = |Bλ | dg χϕ (g −1 )Ug(λ) (15.178)
G
h i
(C.129)→ = TrCλ ϕB λ
λ Cλ
.
The implication 4 ⇒ 1: Since for each λ ∈ Irr(U ) the states ψλBλ Cλ and ϕBλ
λ Cλ
have the
same marginal on representation space Bλ , there exists a unitary matrix Vλ : Cλ → Cλ such
that
I Bλ ⊗ VλCλ ψλBλ Cλ = ϕB λ
λ Cλ
. (15.179)
Let V : A → A be the unitary matrix
M
V := I Bλ ⊗ VλCλ . (15.180)
λ∈Irr(U )
Then, by definition, |ϕA ⟩ = V |ψ A ⟩, and since for each g ∈ G the unitary matrix Ug has the
form (C.45) we get that [V, Ug ] = 0. Hence, ψ A and ϕA are unitarily G-equivalent. This
completes the proof.
Exercise 15.4.2. Let ψ ∈ Pure(A) and ϕ ∈ Pure(B), where |B| = ̸ |A|, and consider two
projective unitary representations g 7→ UgA and g 7→ UgB in A and B, respectively. Show that
if χψ (g) = χϕ (g) for all g ∈ G then ψ and ϕ are G-equivalent.
The above theorem characterizes unitarily G-equivalent states. However, from a resource
theory perspective, two states belong to the same resource equivalence class if they are G-
equivalent (not necessarily unitarily). Our next theorem characterize this G-equivalence
class, assuming that the states involved are G-regular.
Definition 15.4.2. Let ψ, ϕ ∈ Pure(A) and G be a group. We say that ψ and ϕ are
G-regular with respect to a representation g 7→ Ug if one of the following two
conditions holds:
1. The group G is finite and there is no g ∈ G such that χψ (g) = χϕ (g) = 0; that
is, the functions χψ , χϕ : G → C cannot take the zero value simultaneously.
2. The group G is a compact Lie group and there is no open set (other than the
trivial one) C of G for which χψ (g) = χϕ (g) = 0 for all g ∈ C.
It is worth pointing out that every connected compact Lie group G satisfies the second
condition above. In fact, if G is connected, for any state ψ ∈ Pure(A), there cannot be an
open neighbourhood C of G for which χψ (g) = 0 for all g ∈ C. To see why, by contradiction,
suppose that χψ (g) = 0 for all g ∈ C. Since the function χψ : G → C is analytic, the identity
theorem in complex analysis implies that χψ is the zero function, which contradicts the fact
that χψ (e) = 1 for the identity element e ∈ G.
Exercise 15.4.3. Let G be a compact Lie group and let ψ, ϕ ∈ Pure(A) be such that for all
g ∈ G there exists elements h, h′ ∈ H of a connected subgroup H of G for which |χψ (hgh′ )| +
|χϕ (hgh′ )| =
̸ 0. Show that ψ and ϕ are G-regular.
To clarify the notion of G-regular states, let’s consider the group O(2) of 2 × 2 real
orthogonal matrices. This group is a compact Lie group, but it is not connected because
matrices with determinant one are not continuously connected to matrices with determinant
minus one. Let H := SO(2) be the subgroup of O(2) consisting of all the elements of O(2)
with determinant one. The question we want to answer is: Is there a state ψ ∈ Pure(C2 )
such that χψ (g) = 0 for all g ̸∈ H?
question, wefirst observe that all the matrices g ∈ O(2) with det(g) = −1
To answer this
cos θ sin θ
have the form for some θ ∈ [0, 2π]. Therefore, χψ (g) = 0 for all g ̸∈ H if
sin θ − cos θ
and only if
cos θ sin θ
ψ ψ =0 ∀ θ ∈ [0, 2π] . (15.181)
sin θ − cos θ
The only pure state that satisfies the above equation is |ψ⟩ = √12 (|0⟩ + i|1⟩). Therefore, in
this example, the second condition in Definition 15.4.2 is satisfied except in the case where
|ψ⟩ = |ϕ⟩ = √12 (|0⟩ + i|1⟩).
Remark. We will see in the proof below that for any finite or compact (not necessarily
connected) Lie group G, if (15.182) holds, then ϕ and ψ are G-equivalent. Therefore, we
only need the assumption that ψ and ϕ are G-regular for the converse part. In Sec. D.6
of the appendix we provide additional observations for the case that ψ and ϕ are not G-
regular. Moreover, it is worth noting that semi-simple compact Lie groups, such as SU (2),
do not have any non-trivial 1-dimensional representation. Therefore, it follows from the
theorem above and the preceding theorem that for such groups, the following statements are
all equivalent:
1. ψ and ϕ are G-equivalent.
UgE := φE iθg E
1 + e φ2 , (15.183)
For the converse part of the proof, suppose there exists a G-covariant channel mapping
ψ to ϕ and another G-covariant channel that maps ϕ to ψ. From the covariant version of
Stinespring delation theorem (see Theorem 15.2.3) there exists two isometries V1 : A → AE
and V2 : A → AẼ, each satisfying (15.51) for all g ∈ G, and with the property that
for some φ1 , φ2 ∈ Pure(E). Since, V1 and V2 satisfy (15.51) for all g ∈ G, the two equations
above imply that for all g ∈ G
First, if for all g ∈ G χψ (g) ̸= 0 and/or χϕ (g) ̸= 0 then χφ1 (g)χφ2 (g) = 1. Since the absolute
value of characteristic functions cannot exceed one, it follows that |χφ1 (g)| = |χφ2 (g)| = 1 for
all g ∈ G. Therefore, from Lemma C.7.1 we get that the states φ1 and φ2 are G-invariant
in this case. Second, suppose G is a compact Lie group and suppose by contradiction
that there exists g ∈ G such that χφ1 (g)χφ2 (g) ̸= 1. Then, from the continuity of the
characteristic function, there exists a neighbourhood C ⊂ G of g such that for all g ′ ∈ C
we have χφ1 (g ′ )χφ2 (g ′ ) ̸= 1. From (15.188) it then follows that χψ (g ′ ) = χϕ (g ′ ) = 0 for all
g ′ ∈ C in contradiction with the assumption that ψ and ϕ are G-regular. Therefore, also in
this case χφ1 (g)χφ2 (g) = 1 for all g ∈ G, so that φ1 and φ2 are G-invariant.
To summarize, in both cases we can express the characteristic functions of φ1 and φ2 as
χφ1 (g) = eiθg and χφ2 (g) = e−iθg , where g 7→ eiθg is a 1-dimensional unitary representations
of G (see Exercise C.7.1). Substituting this into (15.187) completes the proof.
where {|n⟩}n∈Z is the eigenbasis of the number operator. The characteristic function of ψ̃ is
given by X
χψ̃ (θ) = ψ̃ eiθN̂ ψ̃ = |λn |2 eiθn . (15.190)
n∈[m]
Since the characteristic function of ψ̃ depends only on the absolute values of the coefficients
{λn }n∈[m] , we get from Theorem 15.4.1 (particularly, the equivalence of the first and third
where pψ , pϕ : Z → [0, 1] are the probability distributions associated with ψ and ϕ, respec-
tively. Using the Fourier transform (see Exercise 15.4.4) we get that the above condition can
be expressed as
pψ (n) = pϕ (n + k) . (15.193)
As a specific example, observe that the states |ψ⟩ = √1 (|0⟩ + |1⟩) and |ϕ⟩ = √1 (|1⟩ + |2⟩) are
2 2
G-equivalent since in this case pψ (n) = pϕ (n − 1).
Exercise 15.4.4. Show that the condition in (15.192) is equivalent to one in (15.193). Hint:
Apply a Fourier transform on both sides of (15.192).
Remark. If χϕ (g) ̸= 0 for all g ∈ G then the theorem above states in this case that ψ can
be converted to ϕ by symmetric operations if and only if χψ (g)/χϕ (g) is a positive definite
function over G.
G−COV
Proof. Suppose first that ψ −−−−−→ ϕ. In the derivation of the relation in (15.187), using
G−COV
the covariant Stinespring dilation theorem we showed that the condition ψ −−−−−→ ϕ implies
that there exists a pure state φ ∈ Pure(A) such that
Hence, taking f (g) := χφ (g) we get χψ (g) = χϕ (g)f (g). Finally, observe that the character-
istic function χφ : GC is a normalized positive definite function over G (see Theorem C.8.1).
Conversely, suppose χψ (g) = χϕ (g)f (g) for some positive definite function f . Since
for g = e we get f (e) = χψ (e)/χϕ (e) = 1 the function f is normalized so that according
to Theorem C.8.1 it corresponds to some characteristic function f (g) = ⟨φ|UgE |φ⟩, where
g 7→ UgE is some unitary representation of G on some Hilbert space E. Moreover, there
exists a G-invariant state |0⟩ ∈ E whose characteristic function is constant and equal to
one for all group elements. Therefore, from the relation χψ (g) = χϕ (g)f (g) we get that
the states |ψ⟩A |0⟩E and |ϕ⟩A |φ⟩E have the same characteristic function. Therefore, there
exists a G-invariant unitary V : AE → AE such that V |ψ⟩ |0⟩ = |ϕ⟩A |φ⟩E . Taking
A E
the trace over E on both sides demonstrates that ψ can be converted to ϕ by a G-covariant
channel.
Exercise 15.4.5. Consider two states ψ, ϕ ∈ Pure(A) and suppose ψ has the property that
G−COV
χψ (g) = 0 for all g ∈ G such that g ̸= e. Show that ψ −−−−−→ ϕ. In other words, ψ with
such a property is a maximal resource state.
Note that the above unitary representation of Zn composed of a direct sum of its irreps, each
occurring with multiplicity one. Pn−1 √
We would like to find the conditions under which the quantum pure state |ψ⟩ = x=0 px |x⟩
Pn−1 √
can be converted to another pure state |ϕ⟩ := x=0 qx |x⟩ by Zn -covariant operations. Ob-
serve that the characteristic function of |ψ⟩ is given for any x ∈ Zn by
X 2πyx
χψ (x) = ⟨ψ|Ux |ψ⟩ = py e i n . (15.196)
y∈Zn
Similarly, χϕ (x) can be expressed as above with qy replacing py . The above equation demon-
strates that the characteristic function is nothing but the discrete Fourier transform of the
sequence {p0 , . . . , pn−1 }.
The theorem above implies that ψ can be converted to ϕ by Zn -covariant operations if
and only if the function x 7→ χψ (x)/χϕ (x) is a positive definite function over Zn . From
Exercise C.8.2 we have that a function f : Zn → C is positive definite if and only if its
Zn −COV
(discrete) Fourier transform is positive. We therefore conclude that ψ −− −−−→ ϕ if and only
if
X χψ (x) 2πxy
ei n ⩾ 0 ∀ y ∈ Zn . (15.197)
x∈Z
χ ϕ (x)
n
To illustrate the condition above, we consider now the case n = 2. For n = 2 the condition
above gives for y ∈ Z2 = {0, 1}
χψ (0) χψ (1) p0 − p1
0⩽ + (−1)y = 1 + (−1)y . (15.198)
χϕ (0) χϕ (1) q0 − q1
|p0 − p1 |
⩽1 (15.199)
|q0 − q1 |
which is equivalent to
max{p0 , p1 } ⩽ max{q0 , q1 } . (15.200)
2 Z −COV
The condition we obtained for the case n = 2 can be expressed also as ψ −− −−−→ ϕ if
and only if q ≻ p where p := (p0 , p1 )T and q := (q0 , q1 )T . More generally, for arbitrary
Z2 −COV
integer n ∈ N we have that if ψ −− −−−→ ϕ then necessarily q ≻ p. To see why, observe that
the relation χψ (x) = χϕ (x)f (x) implies that f (0) = 1 and f (x) itself can be expressed as a
Fourier series
i2πzx
X
f (x) = rz e n (15.201)
z∈Zn
where rz ∈ R. Since f is positive definition over Zn , we must have that rz ⩾ 0 for all z ∈ Zn .
Since f (0) = 1 we conclude that {rz }z∈Zn is a probability distribution. Substituting the
above expression for f (x) into the relation χψ (x) = χϕ (x)f (x) gives
X 2πyx X 2π(z+w)x
py e i n = qw rz ei n . (15.202)
y∈Zn w,z∈Zn
Next, observe that the equation above can be expressed simply as p = Qr, where Q is an
n × n matrix whose (y, z) component is qy−z . Hence, assuming Q is invertible we conclude
Zn −COV
that ψ −− −−−→ ϕ if and only if Q−1 p ⩾ 0, where the inequality is entry-wise. In order to
avoid the computation of Q−1 we can also use the Cramer’s rule as we discuss now.
The matrix Q as defined above is known as a circulant matrix. The eigenvalues of such
matrices are given by the discrete Fourier transforms. Specifically, the x-th eigenvalue of Q
is given by X 2πyx
λx (Q) = χϕ (x) = py e i n ∀ x ∈ Zn . (15.206)
y∈Zn
The matrix Q is also doubly stochastic so its determinant is in the interval [0, 1]. Therefore,
as long as χϕ (x) ̸= 0 for all x ∈ Zn we have det(Q) > 0. Next, for any x ∈ Zn let Qx be the
matrix obtained from Q by replacing the x-th column with the column (p0 , p1 , . . . , pn−1 )T .
Zn −COV
Then, assuming det(Q) > 0 we get from Cramer’s rule that ψ −− −−−→ ϕ if and only if
det(Qx ) ⩾ 0 for all x ∈ Zn .
As a specific example, consider the case n = 3. For this case the matrix Q has the form
q q q
0 2 1
Q = q1 q0 q2 . (15.207)
q2 q1 q0
Observe that det(Q) ⩾ 0 with equality if and only if q0 = q1 = q2 . That is, if |ϕ⟩ ̸=
Z3 −COV
√1 (|0⟩ + |1⟩ + |2⟩) then det(Q) > 0. Hence, ψ −−−−−→ ϕ if and only if the following three
3
conditions hold:
det(Q0 ) = p0 (q02 − q1 q2 ) + p1 (q12 − q0 q2 ) + p2 (q22 − q0 q1 ) ⩾ 0
det(Q1 ) = p0 (q22 − q0 q1 ) + p1 (q02 − q1 q2 ) + p2 (q12 − q0 q2 ) ⩾ 0 (15.208)
det(Q2 ) = p0 (q12 − q0 q2 ) + p1 (q22 − q0 q1 ) + p2 (q02 − q1 q2 ) ⩾ 0 .
n Z −COV
Exercise 15.4.6. Show that for every ψ ∈ Pure(Cn ) we have Φ −− −−−→ ψ, where |Φ⟩ :=
n−1
√1
P
n x=0 |x⟩. In other words, Φ is a state with maximal Zn -asymmetry.
P3 √ √
px |x⟩ and |ϕ⟩ = 3x=0 qx |x⟩.
P
Exercise 15.4.7. Consider the case n = 3, and let |ψ⟩ = x=0
3 Z −COV
1. Show that if q1 = q2 then ψ −−−−−→ ϕ if and only if q ≻ p.
2. Show that for p = (5/12, 7/24, 7/24)T and q = (5/12, 1/3, 1/4)T it is not possible to
convert ψ to ϕ by Z3 -covariant operations even though q ≻ p.
15.4.3 Catalysis
In every resource theory, if the state ψ cannot be deterministically transformed into the
state ϕ using the limited set of operations, the use of a catalyst provides a potential solution.
As we explored in earlier chapters, a catalyst refers to an additional system that is initially
prepared in a state not compatible with the constraints of the resource theory but must be
restored to its original state at the conclusion of the process. An illustrative example can be
found in the resource theory of entanglement, as discussed in Sec. 12.2.3, where we observed
that certain conversions between states are prohibited under LOCC. However, by employing
LOCC alongside a suitable catalyst, such conversions become achievable.
This notion of catalysis vividly demonstrates the significant variations encountered within
the resource theory of asymmetry, contingent upon the choice of groups involved. Specifically,
we will demonstrate that a catalyst holds no utility for a connected compact Lie group,
whereas for a finite group, a catalyst always exists.
G−COV
ψ A ⊗ φC −−−−−→ ϕA ⊗ φC . (15.209)
Proof. Suppose first that G is a finite group, and let g 7→ UgC be the regular representation
of G on the space C := C|G| = span{|g⟩ : g ∈ G} (see Sec. C.6). Fix an element h ∈ G and
let |φC ⟩ := |h⟩C . By the definition of the regular representation, we have that χφ (g) = δe,g , so
that (15.210) holds trivially. Therefore, for any h ∈ G, the state |φC ⟩ = |h⟩ satisfies (15.209).
We next prove that if G is a connected compact Lie group then the relation (15.209)
never holds. Suppose by contradiction that (15.209) does hold. Then, from Theorem 15.4.3
there exists a positive-definite function f : G → C such that χψ⊗φ (g) = χϕ⊗φ (g)f (g) for
As discussed below Definition 15.4.2, since G is a connected compact Lie group, there exists
a neighbourhood, C, around the identity element of the group such that χφ (g) ̸= 0 for all
elements g ∈ C. Combining this with the equation above gives
However, since the functions χψ , χϕ , and f , are all analytic, the identity theorem in complex
analysis implies that the equality above holds for all g ∈ G. Hence, from Theorem 15.4.3
G−COV
we get that ψ −−−−−→ ϕ in contradiction with the asumption of the theorem that ψ cannot
be converted to ϕ by G-covariant operations. Hence, the relation (15.209) cannot hold if G
is a connected compact Lie group.
The existence of a catalyst for finite groups is a consequence of the fact that for finite
groups, it is possible to completely overcome the lack of a shared reference frame by sending
a single resource from Alice to Bob. In the proof presented above, the state |φC ⟩ := |h⟩C
serves as an “ultimate” resource that removes the restriction to G-covariant operations. To
understand why, let’s revisit the guessing probability given in (15.61).
Taking ρ = φC with the regular representation Ug |h⟩⟨h|Ug∗ = |gh⟩⟨gh| yields
1 X
Prguess (ρ, {Λg }g∈G ) = Tr [Λg |gh⟩⟨gh|] . (15.212)
|G| g∈G
Given a pure state ψ ∈ Pure(A) we want to find the conditions under which the conversion
G−COV
ψ A −−−−−→ σ AX is posible.
G−COV
Theorem 15.4.5. Using the same notations as above, ψ A −−−−−→ σ AX if and only if
there exists normalized positive-definite and continuous (in the case of Lie group)
functions fx : G → C such that
X
χψ (g) = px fx (g)χϕx (g) . (15.214)
x∈[n]
Proof. From the covariant version of Stinespring dilation theorem, E ∈ COVG (A → AX)
if and only if there exists a system E, a projective unitary representation g 7→ UgE , and an
intertwiner isometry V : A → AXE such that for all η ∈ L(A) we have E(η) = TrE (V ηV ∗ ).
G−COV
Therefore, ψ A −−−−−→ σ AX if and only if there exists an intertwiner isometry V : A → AXE
such that
σ AX = E A→AX ψ A = TrE V ψ A V ∗ .
(15.215)
We first assume that such a covariant channel E A→AX exists, and prove the relation (15.214).
Indeed, the equation above implies that V |ψ A ⟩ is a purification of σ AX and therefore have
the form X√
V |ψ A ⟩ = px |ϕA X E
x ⟩|x⟩ |φx ⟩ , (15.216)
x∈[m]
for some orthonormal set {|φE x ⟩}x∈[m] in E. Since G acts trivially on system X, and since V
is an intertwiner we get that
V UgA |ψ A ⟩ = UgB ⊗ I X ⊗ UgE V |ψ A ⟩
X√
px UgA |ϕA X E E (15.217)
= x ⟩ ⊗ |x⟩ ⊗ Ug |φx ⟩
x∈[m]
Finally, taking the inner product between the two states in (15.216) and (15.217) gives
X
χψ (g) = px χϕx (g)χφx (g) . (15.218)
x∈[m]
Since fx (g) := χφx (g) is a positive-definitive function (see Theorem C.8.1) we get that (15.214)
holds.
Conversely, suppose (15.214) holds. From Theorem C.8.1 fx can be expressed as the
characteristic function of some state φE x . Without loss of generality we can assume that the
E′
states {|φx ⟩}x∈[m] are orthonormal since otherwise we can replace each |φE
E E
x ⟩ with |φx ⟩|x⟩ ,
where E ′ is another ancillary system upon which the group G acts trivially (so that |φE x⟩
E E′
and |φx ⟩|x⟩ have the same characteristic function). With this in mind, let
X√
|ϕAXE ⟩ := px |ϕA X E
x ⟩|x⟩ |φx ⟩ . (15.219)
x∈[m]
Then, from (15.214) we get that χψ (g) = χϕ (g) for all g ∈ G. Moreover, there exists
a G-invariant state |0⟩ ∈ XE whose characteristic function is constant and equal to one
for all group elements. Therefore, from the relation χψ (g) = χϕ (g) we get that the states
|ψ A ⟩|0⟩XE and |ϕAXE ⟩ have the same characteristic function. Therefore, there exists a G-
invariant unitary V : AXE → AXE such that V |ψ A ⟩|0⟩XE = |ϕAXE ⟩. Taking the trace
over E on both sides demonstrates that ψ A can be converted to σ AX by a G-covariant
channel. This completes the proof.
Exercise 15.4.8. Prove the following corollary to the theorem above: The conversion
G−COV
ψ A −−−−−→ ϕA can be achieved with probability q if and only if there exists a normalized
positive definition function f : G → C such that χψ (g) − qf (g)χϕ (g) is positive definite.
Applying both sides of this equation to the maximally entangled state |ΩAÃ ⟩ gives
JEAB = UgB ◦ E Ã→B ◦ Ug∗Ã ΩAÃ
(2.91)→ = ŪgA ⊗ UgB ◦ E Ã→B ΩAÃ (15.221)
Therefore, the matrix JEAB is a Choi matrix of a G-covariant channel if and only if
∗
ŪgA ⊗ UgB JEAB ŪgA ⊗ UgB = JEAB
∀ g ∈ G. (15.222)
That is, the Choi matrix JEAB is symmetric with respect to the projective unitary repre-
sentation g 7→ ŪgA ⊗ UgB . In this section, we will denote by G ∈ CPTP(AB → AB) the
G-twirling operation with respect to this representation, so that E is G-covariant if and only
AB AB
if G JE = JE .
With this property we can use Theorem 11.1.1 to get necessary and sufficient condi-
tions for a conversion of one mixed state to another by G-covariant operations. To apply
Theorem 11.1.1 for the case that F(A → B) = COVG (A → B), observe that
Tr η B E A→B ρA = Tr JEAB ρT ⊗ η B
sup sup
E∈COVG (A→B) E∈COVG (A→B)
Tr J AB G AB→AB ρT ⊗ η B
= sup
J∈Pos(AB)
(15.223)
J A =I A
↑
−Hmin (B|A)G (ρT ⊗η)
(7.147)→ = 2 .
G−COV
Therefore, Theorem 11.1.1 implies the following characterization of ρA −−−−−→ σ B .
Corollary 15.5.1. Let ρ ∈ D(A) and σ ∈ D(B). The following are equivalent:
G−COV
1. ρA −−−−−→ σ B .
↑ ↑
2. For all η ∈ D(B) we have Hmin (B|A)G(ρT ⊗η) ⩽ Hmin (B|B̃)G(σT ⊗η) .
While the condition outlined in the corollary holds theoretical significance, it falls short of
offering a practical methodology for assessing whether a quantum state ρA can be transformed
into another state σ B through G-covariant operations. To address this gap, a more applicable
criterion is derived from the condition presented in (11.6). For the context at hand, this
criterion is articulated in a specific format, which we encapsulate as a theorem for clarity
and ease of application.
G−COV
Then, ρA −−−−−→ σ B if and only if f (ρ, σ) ⩾ 0.
Remark. The optimization of the function f (ρ, σ) can be solved efficiently and algorithmically
with an SDP program.
The proof of the theorem above is based on the fact that σ = E(ρ) if and only if for all
Λ ∈ Herm(B) we have Tr[Λσ] = Tr[ΛE(ρ)]. This relation can be expressed as
Tr[Λσ] = Tr JEAB ρT ⊗ ΛB .
(15.225)
In the following exercise you use this to complete the proof.
Exercise 15.5.1. Use (11.6) and (15.223) to prove the theorem above.
G−COV
Exercise 15.5.2. Let ρ ∈ D(A) and σ ∈ D(B). Show that ρA −−−−−→ σ B if and only if
there exists F ∈ CPTP(A → B) such that for all g ∈ G
F Ug (ρ) = Ug (σ) . (15.226)
1 B
σ − E A→B (ρA ) 1
= min Tr [Λ] , (15.228)
2 Λ∈Pos(B)
Λ⩾σ B −E A→B (ρA )
to show that the conversion distance can be expressed as the following SDP:
F
T ρ→
− σ = min Tr [Λ] (15.229)
subject to:
h i
B B AÃ T Ã
1. Λ ⩾ σ − TrA J ρ ⊗I .
2. J A = I A .
4. Λ ∈ Pos(B), J ∈ Pos(AÃ).
where {ax }x∈[m] is the set of distinct eigenvalues of H A and each ΠA x is a projection to the
eigenspace of ax . Without loss of generality, we will assume that a1 < a2 < · · · < am (noting
that they are all distinct, allowing us to arrange {ax }x∈[m] in increasing order). With the
above form of H A , the state ρA is time-translation invariant, if and only if it takes the form:
X
ρ= px ρx , (15.232)
x∈[m]
where p ∈ Prob(m), ρx ∈ D(A), and ρx ρy = 0 for every x ̸= y. We will use the notation
INV(A) to denote the set of states in D(A) that are time-translation invariant.
Exercise 15.6.1. Prove the above form of ρ. Hint: supp(ρx ) ⊆ supp(Πx ).
Consider a quantum channel N ∈ CPTP(A → B), where systems A and B have cor-
responding Hamiltonians H A ∈ Pos(A) and H B ∈ Pos(B). The channel N is said to be
time-translation covariant if for all t ∈ R
A A
B B
N A→B e−iH t ρA eiH t = e−iH t N A→B (ρB )eiH t ∀t∈R. (15.233)
We will use the notation COV(A → B) to denote the set of all time-translation covariant
channels in CPTP(A → B).
In the Choi representation, the property given in (15.233) can be expressed as (see the
relation (15.222))
A B
A B
∗
e−iH̄ t ⊗ eiH t JNAB e−iH̄ t ⊗ eiH t = JNAB ∀t∈R. (15.234)
Note that H̄ A has the same eigenvalues as H A . For our purposes, we can replace H̄ A (in the
equation above) with H A , since it will not make any difference in our analysis. Therefore,
N ∈ COV(A → B) if and only if JNAB commutes with the operator
ξ AB := H A ⊗ I B − I A ⊗ H B . (15.235)
Therefore, the degeneracy of the energy levels of the operator ξ AB will play a key role in the
resource theory of time-translation asymmetry.
This pinching channel, also known as the “twirling channel” (as it is the G-twirling map
A
with respect to the group G = {eiH t }t∈R ), has the property that a state ρ ∈ D(A) is
quasi-classical if and only if PH (ρ) = ρ.
Exercise 15.6.2. Show that the condition PH (ρ) = ρ is equivalent to the condition that ρ
has the form given in (15.232).
Exercise 15.6.3. Let P ∈ CPTP(A → A) and P ′ ∈ CPTP(A′ → A′ ) be the pinching chan-
′
nel associated with the Hamiltonians H A and H A , respectively. Further, let N ∈ CPTP(A →
A′ ).
1. Show that P ′ ◦ N ◦ P ∈ COV(A → A′ ).
3. Show that if the Hamiltonian H A is non-degenerate then PHA→A = ∆A→A , where ∆A→A
is the completely dephasing channel as defined in Sec. 3.5.2.
Covariant channels can also be characterized in terms of the pinching channel. Consider
N ∈ CPTP(A → B) and let Pξ ∈ COV(AB → AB) be the pinching channel associated with
the operator ξ AB given in (15.235). Then, the quantum channel N A→B is time-translation
covariant if and only if its Choi matrix satisfies
This follows from Exercise 3.5.20 and our earlier observation that N ∈ COV(A → B) if and
only if its Choi matrix commutes with ξ AB .
The twirling channel can also be used to quantify time-translation asymmetry. For
example, the relative entropy distance of a quantum state ρ ∈ D(A) to its twirled state P(ρ)
is a time-translation asymmetry (sometimes referred to as coherence) measure given by
C(ρ) := D ρ PH (ρ) = H PH (ρ) − H(ρ) , (15.238)
where D(ρ∥σ) := Tr[ρ log ρ] − Tr[ρ log σ] is the Umegaki relative entropy and H(ρ) :=
−Tr[ρ log ρ] is the von-Neumann entropy. The above function is non-increasing under time-
translation covariant operations, and achieves its maximal value of log d (where d := |A|) on
the maximally coherent state |+⟩ := √1d x∈[d] |x⟩, where {|x⟩}x∈[d] is the energy eigenbasis.
P
We next move to characterize the set COV(A → B) in three different cases that depends on
the level of degeneracy of the Hamiltonians involved.
where {ax } and {by } are the energy eigenvalues of H A and H B , respectively.
ax − ax ′ = b y − b y ′ ⇒ x = x′ and y = y ′ . (15.240)
If the condition above does not hold we say that the Hamiltonians are relatively
degenerate.
Note that if H A and H B are relatively non-degenerate, then each of them is also non-
degenerate. For example, suppose H A is degenerate with ax = ax′ for some x ̸= x′ ∈
[m]. Then, for y = y ′ we get ax − ax′ = 0 = by − by′ even though x ̸= x′ . Therefore,
relative non-degeneracy is a stronger notion than non-degeneracy. In fact, relative non-
degeneracy of H A and H B is equivalent to the non-degeneracy of the operator ξ AB as defined
in (15.235). Moreover, in the generic case in which H A and H B are arbitrary (chosen at
random) the Hamiltonians are relatively non-degenerate. For this case, time-translation
covariant channels have a very simple characterization.
where ∆A→A and ∆B→B are the completely dephasing channels of systems A and B,
respectively. In other words, for physical systems with relatively non-degenerate
Hamiltonians only classical channels are time-translation covariant.
Proof. Since we assume that the Hamiltonians H A and H B are relatively non-degenerate
we get that the joint operator, ξ AB , is non-degenerate. Hence, JNAB is diagonal in the same
eigenbasis {|x⟩A |y⟩B }x∈[m],y∈[n] of ξ AB , so that
∆A→A ⊗ ∆B→B JNAB = JNAB .
(15.242)
The above equation describes the same relation as the one given in (15.241). Hence, N A→B
is a classical channel. This completes the proof.
that is, there are no degeneracies in the nonzero differences of the energy levels of
H A.
Exercise 15.6.4. Show that H A has a non-degenerate Bohr spectrum if and only if all the
non-zero eigenvalues of the operator
ξ^{AÃ} := H^A ⊗ I^Ã − I^A ⊗ H^Ã   (15.244)
are distinct. In other words, H A has a non-degenerate Bohr spectrum if and only if the zero
eigenvalue of ξ AÃ is the sole eigenvalue with a multiplicity greater than one.
Remark. We will see below that even if the spectrum of the Hamiltonian H A has degeneracies,
any quantum channel N ∈ CPTP(A → A) whose Choi matrix has the form (15.249) is
necessarily time-translation covariant.
Proof. Following the same lines as in Theorem 15.6.1, by replacing H^B with H^A everywhere, we get that a quantum channel N ∈ CPTP(A → A) is time-translation covariant if and only if its Choi matrix satisfies P_ξ( J_N^{AÃ} ) = J_N^{AÃ}, where P_ξ is the pinching channel associated with the operator ξ^{AÃ} defined in (15.244). Since H^A has a non-degenerate Bohr spectrum, the set {a_x − a_y} with x ≠ y consists of distinct eigenvalues of ξ^{AÃ}. We therefore conclude that the pinching channel P_ξ ∈ CPTP(AÃ → AÃ) associated
with the operator ξ AÃ is given by
P_ξ(·) = Π(·)Π + Σ_{x,y∈[m], x≠y} P_{xy}(·)P_{xy} ,   (15.246)
where
P_{xy} := |xy⟩⟨xy|   and   Π := Σ_{x∈[m]} |xx⟩⟨xx| .   (15.247)
Observe that the Choi matrix J_N^{AÃ} satisfies the condition above if and only if ⟨xx′|J_N^{AÃ}|yy′⟩ = 0
unless x = x′ and y = y ′ , or x = y and x′ = y ′ . This completes the proof.
The condition in the theorem is equivalent to the statement that the Choi matrix has
the form
J_N^{AÃ} = Σ_{x,y∈[m]} [ p_{y|x} |xy⟩⟨xy|^{AÃ} + (1 − δ_{xy}) q_{xy} |xx⟩⟨yy|^{AÃ} ] ,   (15.249)
where q_{xy} := ⟨xx|J_N^{AÃ}|yy⟩ and p_{y|x} := ⟨xy|J_N^{AÃ}|xy⟩. Observe that by definition p_{x|x} = q_{xx}
for all x ∈ [m]. Given that J_N^{AÃ} is the Choi matrix of a quantum channel, it implies certain properties for the coefficients {p_{y|x}}_{x,y∈[m]} and the matrix Q_N, which consists of the components q_{xy}. Specifically, the first term on the right-hand side of the equation above corresponds to ΠJ_N^{AÃ}Π, and the second term is a sum over all P_{xy}J_N^{AÃ}P_{xy}. Therefore, from the condition ΠJ_N^{AÃ}Π ⩾ 0 we get that Q_N ⩾ 0, where Q_N is the matrix whose components are q_{xy}. Similarly, the condition P_{xy}J_N^{AÃ}P_{xy} ⩾ 0 implies that p_{y|x} ⩾ 0. Thus, we conclude that J_N^{AÃ} ⩾ 0 if and only if Q_N ⩾ 0 and each p_{y|x} ⩾ 0. Finally, the remaining condition J_N^A = I^A implies that for all x ∈ [m] we have Σ_{y∈[m]} p_{y|x} = 1. To summarize, the theorem above implies that N ∈ COV(A → A) if its Choi matrix has the form (15.249), with Q_N ⩾ 0 and {p_{y|x}}_{x,y∈[m]} being a conditional probability distribution.
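As a numerical illustration (a sketch in Python, not part of the text; all helper names are ours), one can sample a conditional probability distribution {p_{y|x}} together with a positive semidefinite matrix Q whose diagonal equals {p_{x|x}}, assemble a Choi matrix of the form (15.249), and confirm that the resulting map is completely positive and trace preserving.

import numpy as np

def choi_from_form(P, Q):
    # Assemble J of the form (15.249); P[y, x] = p_{y|x} (columns sum to 1),
    # Q is m x m with Q[x, x] = P[x, x].
    m = P.shape[0]
    J = np.zeros((m * m, m * m))
    for x in range(m):
        for y in range(m):
            ket_xy = np.zeros(m * m); ket_xy[x * m + y] = 1.0
            J += P[y, x] * np.outer(ket_xy, ket_xy)          # diagonal part
            if x != y:
                ket_xx = np.zeros(m * m); ket_xx[x * m + x] = 1.0
                ket_yy = np.zeros(m * m); ket_yy[y * m + y] = 1.0
                J += Q[x, y] * np.outer(ket_xx, ket_yy)      # coherence block
    return J

m = 3
rng = np.random.default_rng(0)
P = rng.random((m, m)); P /= P.sum(axis=0)       # column stochastic {p_{y|x}}
G = rng.normal(size=(m, m)); Q = G @ G.T         # random PSD matrix
D = np.diag(np.sqrt(np.diag(P) / np.diag(Q)))    # rescale so Q[x,x] = p_{x|x}
Q = D @ Q @ D

J = choi_from_form(P, Q)
print(np.linalg.eigvalsh(J).min() >= -1e-10)     # complete positivity: J >= 0
print(np.allclose(J.reshape(m, m, m, m).trace(axis1=1, axis2=3), np.eye(m)))  # trace preservation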
with components {rxx′ } and {sxx′ }, respectively. In the theorem below we assume that
rxx′ ̸= 0 for all x, x′ ∈ [m], and define the m × m matrix Q, with components
q_{xy} := min{ 1 , s_{xx}/r_{xx} }  if x = y ,   and   q_{xy} := s_{xy}/r_{xy}  otherwise .   (15.251)
Theorem 15.6.3. Let ρ, σ ∈ D(A) be as in (15.250) with rxx′ ̸= 0 for all x, x′ ∈ [m],
and suppose the Hamiltonian H A has a non-degenerate Bohr spectrum. Then, the
following statements are equivalent:
Remark. We will see in the proof below that the second statement implies the first statement
even if the Hamiltonian H A has a degenerate Bohr spectrum. Moreover, we will see that if
r_{xy} = 0 for some off-diagonal terms (i.e. x ≠ y) then s_{xy} must also be zero. However, in this case, for any x ≠ y ∈ [m] with r_{xy} = 0, the components q_{xy} can be arbitrary. This means that in this case the condition becomes cumbersome, as we will need to require that there exists Q as defined above but with no restriction on the components q_{xy} for which r_{xy} = 0.
Proof. From Theorem 15.6.2 and the discussion following (15.249), it follows that there exists N ∈ COV(A → A) such that σ = N(ρ) if and only if there exists a conditional probability distribution {p_{y|x}}_{x,y∈[m]}, and an m×m positive semidefinite matrix Q, such that
σ = N(ρ) = Tr_A[ J_N^{AÃ} ( ρ^T ⊗ I^Ã ) ]
  = Σ_{x,y∈[m]} p_{y|x} r_{xx} |y⟩⟨y| + Σ_{x≠y, x,y∈[m]} q_{xy} r_{xy} |x⟩⟨y| .   (15.252)
Hence, for the off-diagonal terms, s_{xy} = 0 whenever r_{xy} = 0. Since we assume that all the off-diagonal terms of ρ are non-zero, i.e. r_{xy} ≠ 0 for x ≠ y, there is no freedom left in the choice of the off-diagonal terms of Q_N and we must have q_{xy} = s_{xy}/r_{xy}. Since Q_N must be positive semidefinite we will maximize its diagonal terms {p_{x|x}}_{x∈[m]} given the constraint that s_{yy} = Σ_{x∈[m]} p_{y|x} r_{xx}. This constraint immediately gives s_{yy} ⩾ p_{y|y} r_{yy} so that we must have p_{y|y} ⩽ s_{yy}/r_{yy}. Clearly, we also have p_{y|y} ⩽ 1 so we conclude that
p_{y|y} ⩽ min{ 1 , s_{yy}/r_{yy} } .   (15.254)
where
μ := Σ_{y∈[m]} ( s_y − r_y )_+ = ½ ‖ s − r ‖₁ ,   (15.256)
Exercise 15.6.7. Show that J^{AÃ} as given in (15.249) is positive semidefinite if and only if both p_{y|x} ⩾ 0 for all x and y, and Q ⩾ 0.
Exercise 15.6.8. Show that the coefficients {p_{y|x}} as defined in (15.255) satisfy
Σ_{y∈[m]} p_{y|x} = 1   and   s_y = Σ_{x∈[m]} p_{y|x} r_x   ∀ y ∈ [m] .   (15.257)
be two qubit states. Without loss of generality suppose that a ⩾ b. In this case the matrix Q can be expressed as
Q = [ b/a   w/z ;  w̄/z̄   1 ] ,   (15.259)
and Q ⩾ 0 if and only if
b/a ⩾ | w/z |² .   (15.260)
Therefore, ρ −−COV−→ σ if and only if ν(ρ) ⩾ ν(σ), where ν : D(A) → R₊ is a measure of qubit time-translation asymmetry defined on every density matrix of the form (15.258) as
ν(ρ) := |z|²/a .   (15.261)
If ρ is a pure state, so that |z| = √(a(1 − a)), then ν(ρ) ⩾ ν(σ) holds if and only if |w|² ⩽ b(1 − a). Note that |w|² ⩽ b(1 − b) since σ ⩾ 0. Therefore, by taking
a ∈ [ b , 1 − |w|²/b ]   (15.262)
we get |w|² ⩽ b(1 − a) and also a ⩾ b. Hence, for any mixed state σ there exists a pure state ψ that can be converted to σ.
On the other hand, if σ is pure (i.e. |w|² = b(1 − b)) and ρ is an arbitrary qubit state, then the condition in (15.260) becomes
|z|² ⩾ a(1 − b) .   (15.263)
Since ρ ⩾ 0 we also have |z|² ⩽ a(1 − a). Combining both inequalities we find that the only way ρ can be converted to a pure qubit state σ is if b = a (since a ⩾ b was the initial assumption) and |z|² = a(1 − a). That is, ρ is a pure state itself, and up to a diagonal unitary it equals σ. Hence, pure coherence cannot be obtained from mixed coherence, and deterministic interconversion among inequivalent pure resources is not possible.
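The qubit criterion is easy to test numerically. The following sketch (our own code, using the parametrization ρ = [[a, z],[z̄, 1−a]], σ = [[b, w],[w̄, 1−b]] with a ⩾ b as above) evaluates ν and checks a few conversions.

import numpy as np

def nu(state):
    # nu(rho) = |z|^2 / a for a qubit state [[a, z], [z*, 1-a]], cf. (15.261)
    a, z = state[0, 0].real, state[0, 1]
    return abs(z) ** 2 / a

# A pure state with a = 0.7 can reach the mixed sigma below (b = 0.6 <= a).
a = 0.7
z = np.sqrt(a * (1 - a))                 # |z|^2 = a(1-a): rho is pure
rho = np.array([[a, z], [z, 1 - a]])
b = 0.6
w = 0.9 * np.sqrt(b * (1 - a))           # |w|^2 <= b(1-a), so conversion is possible
sigma = np.array([[b, w], [w, 1 - b]])
print(nu(rho) >= nu(sigma))              # True: rho -> sigma by COV

# A mixed state with the same diagonal cannot reach the pure state rho:
rho_mixed = np.array([[a, 0.5 * z], [0.5 * z, 1 - a]])
print(nu(rho_mixed) >= nu(rho))          # False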
The example above shows that there is no unique “golden unit” that can be used as the
ultimate resource in two dimensional systems. Instead, any pure resource (i.e. pure state
that is not an energy eigenstate) is maximal in the sense that there is no other resource that
can be converted into it. However, the set of all pure qubit resources is maximal (i.e. any
mixed state can be reached from some pure state by translation covariant operations). We
now show that this latter property holds in general.
Proof. Observe that the diagonal elements of Q are all 1, and the off-diagonal terms are given by
q_{xy} = σ_{xy} / √(p_x p_y)   ∀ x, y ∈ [m] , x ≠ y .   (15.265)
Therefore, we can express Q = D_p^{−1} σ D_p^{−1}, where D_p is the diagonal matrix whose diagonal is (√p₁, ..., √p_m). Since D_p > 0 and σ ⩾ 0 it follows that Q ⩾ 0. This completes the proof.
Exercise 15.6.9. Show that if ρ and σ are two distinct pure states and both have non-zero
off-diagonal terms (with respect to the energy eigenbasis) then the matrix Q is not positive
semidefinite.
CHAPTER 16. THE RESOURCE THEORY OF NONUNIFORMITY
Note that a completely factorizable channel N^{A→A′} has the property that N^{A→A′}(u^A) = u^{A′}. That is, factorizable channels take maximally mixed states to maximally mixed states. In particular, if |A| = |A′| then a completely factorizable channel is unital. However, as we will see shortly, not all unital channels are completely factorizable.
Proof. Since all the {p_x}_{x∈[ℓ]} are rational, there exists a common denominator m ∈ N and ℓ integers {m_x}_{x∈[ℓ]} such that p_x = m_x/m, and in particular Σ_{x∈[ℓ]} m_x = m since Σ_{x∈[ℓ]} p_x = 1. Set n := |A|, and let B be a system with dimension |B| = m. Define a unitary matrix U : AB → AB via its action on a basis element |xy⟩ ∈ AB with x ∈ [n] and y ∈ [m] as
(note that k_y depends on y). That is, U^{AB} is a controlled unitary whose action on A depends on the input of system B. Using the notation U^{AB→AB} := U^{AB}(·)U^{∗AB} we get that for all ω ∈ L(A)
all ω ∈ L(A)
Tr_B[ U^{AB→AB}( ω^A ⊗ u^B ) ] = (1/m) Σ_{y∈[m]} Tr_B[ U^{AB→AB}( ω^A ⊗ |y⟩⟨y|^B ) ]
  = (1/m) Σ_{y∈[m]} U_{k_y}^{A→A}( ω^A ) ,   (16.5)
where we used the definition of U AB above. Now, observe that from the definition of ky , for
any x ∈ [ℓ] there exists mx values of y ∈ [m] for which ky = x. Therefore, continuing from
the last line above we get
Tr_B[ U^{AB→AB}( ω^A ⊗ u^B ) ] = Σ_{x∈[ℓ]} (m_x/m) U_x^{A→A}( ω^A ) = Σ_{x∈[ℓ]} p_x U_x^{A→A}( ω^A ) .   (16.6)
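The construction in the proof is simple to reproduce numerically. The sketch below (our own code, not from the text) builds the controlled unitary U^{AB} for rational probabilities and verifies that tracing out the maximally mixed ancilla reproduces the mixture of unitaries in (16.6).

import numpy as np
from scipy.stats import unitary_group

def basis(d, j):
    e = np.zeros(d); e[j] = 1.0
    return e

def controlled_unitary(unitaries, multiplicities):
    # U^{AB}|x>|y> = (U_{k_y}|x>)|y>, where k_y = x whenever y falls in the
    # x-th block of size m_x (rational probabilities p_x = m_x / m)
    n = unitaries[0].shape[0]
    m = sum(multiplicities)
    U = np.zeros((n * m, n * m), dtype=complex)
    y = 0
    for Ux, mx in zip(unitaries, multiplicities):
        for _ in range(mx):
            U += np.kron(Ux, np.outer(basis(m, y), basis(m, y)))
            y += 1
    return U

n, mults = 2, [1, 2, 3]                       # p = (1/6, 2/6, 3/6)
m = sum(mults)
Us = [unitary_group.rvs(n, random_state=k) for k in range(len(mults))]
U = controlled_unitary(Us, mults)

rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
full = U @ np.kron(rho, np.eye(m) / m) @ U.conj().T
out = full.reshape(n, m, n, m).trace(axis1=1, axis2=3)   # partial trace over B
expected = sum((mx / m) * Ux @ rho @ Ux.conj().T for Ux, mx in zip(Us, mults))
print(np.allclose(out, expected))             # True, cf. (16.6)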
Remark. The limit (16.7) is understood in terms of the Choi matrices. That is, the rela-
tion (16.7) means that
lim_{k→∞} ‖ J_{N_k}^{AA′} − J_N^{AA′} ‖₁ = 0 .   (16.8)
Note that by definition the set of noisy operations is closed. Moreover, the set of noisy op-
erations in CPTP(A → A) forms a subset of unital channels (see Exercise 16.1.1). However,
it can be shown that not every unital channel is a noisy operation, so that noisy operations form a strict subset of unital channels.
Exercise 16.1.2. Use Theorem 16.1.1 and the definition of noisy operations to prove the
theorem above.
σ^B = N^{A→B}( ρ^A )
  = Δ^B ∘ N^{A→B}( ρ^A )             (σ is diagonal)
  = Δ^B ∘ N^{A→B} ∘ Δ^A( ρ^A ) .     (ρ is diagonal)   (16.9)
that is non-increasing under noisy operations and takes the value zero on free states. To see
that both definitions are equivalent, observe first that since we consider only diagonal states
(in the same basis) we can replace D(A) above with the classical set Prob(d), where d := |A|.
Due to Corollary 16.3.1, the monotonicity of g under noisy operation is equivalent to
the Schur concavity of g and to the third condition in Def. 5.1.3. The only additional
assumption that we added in Def. 5.1.3 is that g is continuous. This assumption is crucial
for the bijection between divergences and measures of non-uniformity (see Theorem 5.1.3),
and we will assume it also here.
The bijection given in Theorem 5.1.3 demonstrates that all measures of nonuniformity
can be expressed as
g(p) = D( p ‖ u^{(d)} )   ∀ d ∈ N , ∀ p ∈ Prob(d) ,   (16.11)
where D is a classical divergence. Therefore, all the divergences and relative entropies that were introduced in Chapters 5 and 6 can be used to quantify nonuniformity. A particularly useful one is the nonuniformity measure obtained by taking D to be the KL-divergence. In this case, for all p ∈ Prob(d) we have
g(p) = D( p ‖ u^{(d)} ) = log(d) − H(p) ,
where H is the Shannon entropy. Similarly, for the Rényi divergences we have for all α ∈ [0, ∞]
g_α(p) = D_α( p ‖ u^{(d)} ) = log(d) − H_α(p) .   (16.13)
It is worth mentioning that for pure states, specifically when taking p = (1, 0, . . . , 0)T , we
get that gα (p) = log(d). This implies that the nonuniformity of pure states increases with
the dimension d.
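As a quick numerical companion (a sketch; the function names are ours), the Rényi nonuniformity measures g_α(p) = log d − H_α(p) can be evaluated directly; for a pure distribution every g_α equals log d.

import numpy as np

def renyi_entropy(p, alpha):
    # H_alpha(p) in bits; alpha = 1 gives the Shannon entropy
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return float(-np.sum(p * np.log2(p)))
    return float(np.log2(np.sum(p ** alpha)) / (1 - alpha))

def nonuniformity(p, alpha=1.0):
    # g_alpha(p) = log2(d) - H_alpha(p), cf. (16.13)
    return np.log2(len(p)) - renyi_entropy(p, alpha)

p = [0.5, 0.25, 0.125, 0.125]
print(nonuniformity(p))             # 2 - 1.75 = 0.25
print(nonuniformity([1, 0, 0, 0]))  # 2.0 = log2(4), maximal for d = 4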
Theorem 16.3.1. Let ρ, σ ∈ D(A). Then, ρ −−Noisy−→ σ if and only if ρ ≻ σ.
Proof. Suppose σ = N(ρ) for some noisy operation N ∈ Noisy(A → A). Since a noisy operation N ∈ Noisy(A → A) is also a unital channel, from Section 3.5.9 it follows that ρ ≻ σ. In the same subsection we also proved that ρ ≻ σ if and only if there exists a random unitary channel that takes ρ to σ. From Theorem 16.1.2, this random unitary channel is also a noisy operation, so the proof is concluded.
The theorem above can be slightly modified to accommodate systems of different di-
mensions. In particular, if ρ ∈ D(A) and σ ∈ D(B) then σ B = N A→B (ρA ) for some noisy
operation N ∈ Noisy(A → B) if and only if ρA ⊗ uB ≻ uA ⊗ σ B . This is because appending
a maximally mixed state is a reversible free operation.
From here onward we consider the ‘states’ of the QRT of nonuniformity to be probability
vectors in Prob(d). Therefore, from the theorem and the discussion above it follows that for
two given states p ∈ Prob(d) and q ∈ Prob(d′ ) we have
p −−Noisy−→ q   ⟺   ( p , u^{(d)} ) ≻ ( q , u^{(d′)} ) .   (16.14)
That is, conversion under noisy operations induces a pre-order that can be characterized with relative majorization. Note that if d = d′ this pre-order reduces to the standard definition of majorization, however, for d ≠ d′ it is not equivalent to majorization between p and q.
In particular, embedding a state, say p ∈ Prob(d), in a higher dimensional space Prob(d′ )
with d′ > d (by adding zero components) can increase the resourcefulness of p. Therefore,
such embeddings are not free.
Exercise 16.3.1. Let q = (1/2, 1/2, 0, 0)T be the vector obtained from the uniform state u(2)
by adding two zeros. Show that q can be converted by noisy operations to any state in D(2).
From the properties of the conversion distance (see for example Lemma 11.1.1), it follows that T( p −−Noisy−→ q ) remains invariant under any permutation of the components of p or q. Therefore, in the rest of this chapter we will always assume without loss of generality that p = p↓ and q = q↓.
Remark. The case that p ∈ Prob(d) and q ∈ Prob(d′) with d ≠ d′ can be solved by applying the theorem above to the vectors p ⊗ u^{(d′)} and u^{(d)} ⊗ q. Specifically,
T( p −−Noisy−→ q ) = max_{ℓ∈[dd′]} { ‖ u^{(d)} ⊗ q ‖_(ℓ) − ‖ p ⊗ u^{(d′)} ‖_(ℓ) } .   (16.17)
Proof. Since we consider the case that both p and q are d-dimensional, the conversion distance can be expressed as
T( p −−Noisy−→ q ) = min_{r∈Prob(d)} { ½ ‖ q − r ‖₁ : p ≻ r } .   (16.18)
The above expression for the conversion distance represents the distance of q to the set majo(p) as defined in (4.96) (with p replacing q). Hence,
T( p −−Noisy−→ q ) = T( q , majo(p) )
  = max_{ℓ∈[d]} { ‖ q ‖_(ℓ) − ‖ p ‖_(ℓ) } ,   (16.19)
where the last equality follows from Theorem 4.2.4.
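Equation (16.19) gives a one-line recipe that is worth seeing in code. The sketch below (our own helpers) evaluates T(p → q) under noisy operations for two d-dimensional probability vectors via the Ky Fan norms ‖·‖_(ℓ).

import numpy as np

def ky_fan(p, ell):
    # ||p||_(ell): sum of the ell largest entries of p
    return np.sort(np.asarray(p, dtype=float))[::-1][:ell].sum()

def conversion_distance_noisy(p, q):
    # T(p -> q) under noisy operations for equal dimensions, cf. (16.19)
    d = len(p)
    return max(ky_fan(q, ell) - ky_fan(p, ell) for ell in range(1, d + 1))

p = [0.5, 0.3, 0.2]
q = [0.7, 0.2, 0.1]
print(conversion_distance_noisy(p, q))   # 0.2 > 0: q is more nonuniform than p
print(conversion_distance_noisy(q, p))   # 0.0: q majorizes p, exact conversion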
Theorem 16.3.3. Let ε ∈ (0, 1) and p ∈ Prob↓(d). The ε-nonuniformity cost of p is given by
Cost^ε(p) = ⌈ log( d 2^{−H^ε_min(p)} ) ⌉ .   (16.22)
Proof. We first prove the theorem for the case ε = 0. In this case,
Cost^{ε=0}(p) := min{ log m : e₁^{(m)} −−Noisy−→ p } .   (16.23)
This completes the proof for the case ε = 0. For ε > 0 we use (11.34) to get
Cost^ε(p) = min_{p′∈B_ε(p)} Cost^{ε=0}(p′)
  = min_{p′∈B_ε(p)} ⌈ log( d 2^{−H_min(p′)} ) ⌉   (by (16.24))
  = ⌈ log( d 2^{−H^ε_min(p)} ) ⌉ ,   (16.25)
where the last line follows from the definition of H^ε_min(p). This completes the proof.
Exercise 16.3.2. Show that the condition ( e₁^{(m)} , u^{(m)} ) ≻ ( p , u^{(d)} ) is equivalent to p₁ ⩽ m/d.
Exercise 16.3.3. Let p ∈ Prob↓(d) and m ∈ [d].
1. Show that
T( e₁^{(m)} −−Noisy−→ p ) = f_p( m/d ) ,   (16.26)
where f_p(t) := Σ_{x∈[d]} ( p_x − t )_+ is the function studied at the end of Sec. 4.2.2.
2. Provide a direct proof of the theorem above using the above conversion distance and the explicit expression given in (4.107) for f_p^{−1}.
3. Show that the conversion distance above can also be expressed as
T( e₁^{(m)} −−Noisy−→ p ) = ½ ‖ p − m u^{(d)} ‖₁ − (m − 1)/2 .   (16.27)
Exercise 16.3.4. Show that the single-shot ε-nonuniformity cost of p is bounded by
log( ‖p‖_(k) − ε ) ⩽ Cost^ε(p) − log(d/k) ⩽ log( ‖p‖_(k) − ε + k/d ) ,   (16.28)
where k ∈ [d] is the integer satisfying ε ∈ (r_k , r_{k+1}], where r_k is defined in (4.83).
Unlike the case for resource cost, an analogous formula to (11.34) does not exist for resource distillation. Therefore, the calculation of the single-shot distillable nonuniformity necessitates a direct computation of the conversion distance T( p −−Noisy−→ e₁^{(m)} ). In the following lemma we provide a closed formula for this conversion distance in terms of the coefficient μ_m, which is defined for all m ∈ N as
Proof. The case m > d is left as an exercise, and we assume here that m ⩽ d. From the previous section, the conversion distance can be expressed as
T( p −−Noisy−→ e₁^{(m)} ) = max_{k∈[dm]} Σ_{j∈[k]} [ ( e₁^{(m)} ⊗ u^{(d)} )↓_j − ( u^{(m)} ⊗ p )↓_j ] .   (16.32)
Since the vector e₁^{(m)} ⊗ u^{(d)} has exactly d non-zero components (all equal to 1/d), we get that the optimizer k above must satisfy k ⩽ d. Moreover, the j-th term in the sum above has the form
( e₁^{(m)} ⊗ u^{(d)} )↓_j − ( u^{(m)} ⊗ p )↓_j = 1/d − p_x/m ,   (16.33)
where x = ⌈j/m⌉. Since p = p↓ the terms in the equation above are non-decreasing with j. We therefore conclude that the optimal k in (16.32) must be k = d. Denoting a := ⌊d/m⌋ and b := d − am (hence d = am + b) we get
T( p −−Noisy−→ e₁^{(m)} ) = 1 − Σ_{x∈[a]} p_x − (b/m) p_{a+1}
  = 1 − ‖p‖_(a) − ( d/m − a ) p_{a+1}   (using b = d − am)
  = 1 − μ_m .   (16.34)
Combining the definition of the ε-single-shot distillable nonuniformity with the lemma
above we obtain the following closed form for Distillε (p).
Theorem 16.3.4. Let ε ∈ (0, 1), p ∈ Prob↓ (d), m ∈ [d], and µm as defined
in (16.30). If p1 > 1 − ε then Distillε (p) := ⌊dp1 /(1 − ε)⌋. Otherwise, the
ε-single-shot distillable nonuniformity is given by
Exercise 16.3.5. Use the closed form in (16.31) to prove Theorem 16.3.4.
Exercise 16.3.6. Show that for ε = 0 the single-shot distillable nonuniformity of p ∈
Prob(d) is given by
Distillε=0 (p) = log(d) − Hmax (p) , (16.36)
where Hmax is the max-entropy given by Hmax (p) := log(k), where k is the number of non-zero
components of p.
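A direct numerical evaluation of the single-shot distillable nonuniformity is straightforward: compute μ_m for each m and take the largest m with 1 − μ_m ⩽ ε. The sketch below does exactly this (our own code; the expression for μ_m follows the closed form used in the proof above, with a := ⌊d/m⌋).

import numpy as np

def mu(p, m):
    # mu_m = ||p||_(a) + (d/m - a) * p_{a+1}, with a = floor(d/m) and p sorted
    # in decreasing order (the closed form used in the proof of the lemma)
    p = np.sort(np.asarray(p, dtype=float))[::-1]
    d = len(p)
    a = d // m
    tail = (d / m - a) * p[a] if a < d else 0.0
    return p[:a].sum() + tail

def distill_single_shot(p, eps):
    # log2 of the largest m with T(p -> e_1^{(m)}) = 1 - mu_m <= eps
    d = len(p)
    best = max(m for m in range(1, d + 1) if 1 - mu(p, m) <= eps + 1e-12)
    return np.log2(best)

p = [0.5, 0.25, 0.125, 0.125]
print(distill_single_shot(p, 0.0))   # 0.0: all four entries are nonzero
print(distill_single_shot(p, 0.3))   # 1.0: smoothing allows distilling more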
The formula in Theorem 16.3.4 is somewhat cumbersome. One can get somewhat simpler bounds on the single-shot distillable nonuniformity by removing the floor functions that appear in the definition of μ_m. These simpler bounds can be expressed in terms of the formula
for the smoothed max-entropy given in Lemma 10.4.2. Specifically, from Lemma 10.4.2 it
follows that the smoothed max-entropy can be expressed as the logarithm of an integer k
satisfying
∥p∥(k−1) < 1 − ε ⩽ ∥p∥(k) (16.37)
with the convention ∥p∥(0) := 0.
Corollary 16.3.1. Let ε ∈ (0, 1), p ∈ Prob(d), and set k := 2^{H^ε_max(p)}. Then,
Now, for the lower bound we cannot replace ⌊d/m⌋ with arbitrary integer ℓ ∈ [d] since this
will increase the right-hand side above. Instead, we use the fact that for any s ∈ [ d1 , 1] there
exists a unique m ∈ [d] such that
s − 1/d < m/d ⩽ s .   (16.43)
Observe further that for any such s ∈ [1/d, 1] and m ∈ [d], if in addition ‖p‖_(⌊s^{−1}⌋) ⩾ 1 − ε then also ‖p‖_(⌊d/m⌋) ⩾ 1 − ε since s^{−1} ⩽ d/m. Moreover, since such m and s also satisfy log m ⩾ log(ds − 1) we get that
In the last step, denote ℓ := ⌊s^{−1}⌋ ∈ [d] and use the fact that s^{−1} ⩽ ℓ + 1 to get s ⩾ 1/(ℓ+1). Substituting this into the right-hand side of the equation above gives
Distill^ε(p) ⩾ max_{ℓ∈[d]} { log( d/(1+ℓ) − 1 ) : ‖p‖_(ℓ) ⩾ 1 − ε }
  = log( d/(1+k) − 1 ) .   (16.45)
Remark. Note that the formula for the asymptotic conversion rate demonstrates that the
resource theory of nonuniformity is reversible. Specifically, note that for any p and q as
above, Distill(p → q)Distill(q → p) = 1.
We prove the theorem above by computing separately the nonuniformity cost and the
distillable nonuniformity. Recall from the discussion in Sec. 11.5.1, specifically (11.111), that
the asymptotic cost of a nonuniformity state p ∈ Prob(k) is given by
Cost(p) := lim_{ε→0⁺} lim inf_{n→∞} (1/n) Cost^ε( p^{⊗n} ) .   (16.48)
Therefore, we can use the results from the single-shot case to compute this asymptotic rate.
Lemma 16.4.1. Let p ∈ Prob(d) and ε ∈ (0, 1). Then, the asymptotic
nonuniformity cost of p is given by
Cost(p) = lim_{n→∞} (1/n) Cost^ε( p^{⊗n} ) = log(d) − H(p) .   (16.49)
Proof. From the result in the single-shot case, specifically (16.22), we obtain
lim_{n→∞} (1/n) Cost^ε( p^{⊗n} ) = lim_{n→∞} (1/n) ⌈ log( dⁿ 2^{−H^ε_min(p^{⊗n})} ) ⌉
  = log(d) − H(p) ,   (16.50)
where the last equality follows from the AEP (11.63).
As before, we can use the results from the single-shot regime to compute this expression.
Lemma 16.4.2. Let p ∈ Prob(d) and ε ∈ (0, 1). Then, the asymptotic distillable
nonuniformity is given by
Distill(p) = lim_{n→∞} (1/n) Distill^ε( p^{⊗n} ) = log(d) − H(p) .   (16.52)
lim sup_{n→∞} (1/n) Distill^ε( p^{⊗n} ) ⩽ lim sup_{n→∞} (1/n) log( dⁿ / 2^{H^ε_max(p^{⊗n})} − 1 )
  = log(d) − H(p) ,   (16.53)
where the equality follows from the AEP (10.171). Similarly,
lim inf_{n→∞} (1/n) Distill^ε( p^{⊗n} ) ⩾ lim inf_{n→∞} (1/n) log( dⁿ / ( 1 + 2^{H^ε_max(p^{⊗n})} ) − 1 )
  = log(d) − H(p) .   (16.54)
Comparing the two inequalities in the two equations above we conclude that
lim_{n→∞} (1/n) Distill^ε( p^{⊗n} ) = log(d) − H(p) .   (16.55)
Quantum Thermodynamics
Thermodynamics stands as one of the most influential theories in physics, finding applica-
tions across a wide range of disciplines. Initially focused on steam engines, its relevance has
expanded to encompass fields such as biochemistry, nanotechnology, and black hole physics,
among others [83, 26, 61]. Despite its immense success, the foundational aspects of thermo-
dynamics continue to be a subject of controversy. There persists a pervasive confusion re-
garding the relationship between macroscopic and microscopic laws, particularly concerning
reversibility and time-symmetry. Furthermore, there is a lack of consensus on the optimal
formulation of the second law. As early as 1941, Nobel laureate Percy Bridgman noted,
“there are almost as many formulations of the Second Law as there have been discussions of
it,” and unfortunately, little progress has been made in resolving this situation since then.
In recent years, researchers have taken a fresh perspective on these fundamental issues by
approaching thermodynamics as a resource theory. This viewpoint considers a system that
is not in equilibrium with its environment as a valuable resource known as “athermality.”
Athermality serves as the fuel utilized in work extraction, computational erasure operations,
and other thermodynamic tasks.
The resource-theoretic approach to thermodynamics delves into the quantification of
a state’s deviation from equilibrium and explores its utility in quantum thermodynamics.
It also investigates the necessary and sufficient conditions for transforming one state into
another. Within this framework, different notions of state conversion can be examined,
including exact and approximate conversions, single-copy and multiple-copy scenarios, and
conversions with or without the aid of a catalyst.
These quantum-information techniques have brought forth numerous novel insights, par-
ticularly considering the historical importance of information in foundational topics such as
Maxwell’s demon [153], the thermodynamic reversibility of computation [17, 18], Landauer’s
principle regarding the work cost of erasure [145, 137], and Jaynes’s utilization of maximum
entropy principles in deriving statistical mechanics [138, 139].
Furthermore, the resource-theoretic approach to thermodynamics reveals that the con-
ventional formulation of the second law of thermodynamics, which focuses on entropy non-
decrease, is insufficient as a criterion for determining the feasibility of a given state conversion.
However, we will discover that it is possible to identify a set of measures quantifying the
degree of nonequilibrium (including entropy) such that a state conversion is feasible if and
only if all of these measures do not increase.
is the thermal equilibrium state known as the Gibbs state. The Gibbs state, γ B , is also
referred to as the thermal state of the system B, and the normalization factor
Z^B := Tr[ e^{−βH^B} ]   (17.2)
over all ρ ∈ D(A), where λ is a Lagrange multiplier. Let ρ be the optimal density matrix
that minimizes the Lagrangian above. Then, any other state in D(A) can be written as
ρ + tY for some t ∈ R and Y ∈ Herm(A) is a traceless matrix (we can also assume without
loss of generality that ∥Y ∥∞ ⩽ 1 although we will not need it). Since ρ is optimal we must
have for any such Y
0 = d/dt L( ρ + tY , λ ) |_{t=0} = Tr[HY] + λ Tr[Y log ρ] ,   (17.4)
where the second equality follows from Exercise 17.1.2. Since this holds
for all traceless matrices Y ∈ Herm(A), so that H + λ log ρ is orthogonal (in the Hilbert-
Schmidt inner product) to the subspace of all traceless matrices in Herm(A). Consequently,
H + λ log ρ must be proportional to the identity matrix; i.e., there exists c ∈ R such that
H + λ log ρ = cI. Hence, the optimal ρ has the form
ρ = e^{c/λ} e^{−H/λ} = e^{−βH} / Z ,   (17.6)
where in the last equality we denoted β := 1/λ, and used the fact that Tr[ρ] = 1 so that e^{c/λ} = 1/Tr[e^{−βH}].
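The variational characterization can be checked numerically: among all density matrices, the Gibbs state minimizes the Lagrangian Tr[Hρ] + λ Tr[ρ log ρ] with λ = 1/β. A small sketch (our own helper names, assuming NumPy/SciPy):

import numpy as np
from scipy.linalg import expm
from scipy.stats import unitary_group

def gibbs(H, beta):
    # gamma = exp(-beta H) / Tr[exp(-beta H)], cf. (17.6)
    G = expm(-beta * H)
    return G / np.trace(G)

def lagrangian(rho, H, lam):
    # L(rho, lambda) = Tr[H rho] + lambda * Tr[rho log rho]  (natural log)
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return np.trace(H @ rho).real + lam * np.sum(evals * np.log(evals))

H = np.diag([0.0, 1.0, 2.5])
beta = 0.7
gamma = gibbs(H, beta)

rng = np.random.default_rng(1)
vals = []
for k in range(200):
    U = unitary_group.rvs(3, random_state=k)      # Haar-random eigenbasis
    p = rng.dirichlet(np.ones(3))                 # random spectrum
    rho = U @ np.diag(p) @ U.conj().T
    vals.append(lagrangian(rho, H, 1 / beta))
print(lagrangian(gamma, H, 1 / beta) <= min(vals))   # True: the Gibbs state wins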
Exercise 17.1.2. Use Corollary D.1.1 to prove the expression for the directional derivative
given in (17.4).
Exercise 17.1.3. Let α ∈ [0, ∞]. Find the state ρα ∈ D(A) that minimizes Tr[H A ρA ] while
keeping the α-Rényi entropy fixed.
between the initial and final states. Therefore, we define the maximal extractable work from
a system in a state ρA as
W_max( ρ^A ) = max_U Tr[ H^A ( ρ^A − U ρ^A U^* ) ] ,   (17.7)
where the maximum is over all unitary matrices U ∈ U(A). Interestingly, the above opti-
mization problem can be solved analytically.
Proof. From a variant of the von-Neumann trace inequality, known as the Ruhe’s Trace
Inequality as given in Theorem B.3.3, it follows that
Tr[ H^A U ρ^A U^* ] ⩾ Σ_{x∈[m]} a_x p↓_x = Tr[ H^A σ_ρ^A ] ,   (17.10)
where we used the lower bound in (B.26) with N := H A and M := U ρA U ∗ . The proof is
then concluded with the observation that there exists a unitary matrix U satisfying U ρA U ∗ =
σρA .
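The optimizer in (17.7) is the unitary that reorders the eigenvalues of ρ so that the largest populations sit on the lowest energy levels, producing the passive state σ_ρ. A small numerical sketch (our own helpers, for a diagonal Hamiltonian):

import numpy as np

def max_extractable_work(rho, H):
    # W_max(rho) = Tr[H rho] - Tr[H sigma_rho]: sigma_rho places the eigenvalues
    # of rho in decreasing order on the energy levels in increasing order
    energies = np.diag(H)
    order = np.argsort(energies)
    probs = np.sort(np.linalg.eigvalsh(rho))[::-1]
    sigma_diag = np.zeros_like(energies, dtype=float)
    sigma_diag[order] = probs                 # largest population on smallest energy
    return float(np.trace(H @ rho).real - energies @ sigma_diag)

H = np.diag([0.0, 1.0, 2.0])
rho = np.diag([0.1, 0.2, 0.7])                # population-inverted state
print(max_extractable_work(rho, H))           # 1.6 - 0.4 = 1.2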
Exercise 17.1.4. Let n, m ∈ N and ρ, σ ∈ D(A).
1. Show that
Wmax (ρ ⊗ σ) ⩾ Wmax (ρ) + Wmax (σ) . (17.11)
2. Show that if n ⩾ m then
(1/n) W_max( ρ^{⊗n} ) ⩾ (1/m) W_max( ρ^{⊗m} ) .   (17.12)
In the following lemma, we demonstrate that the maximum extractable work can never
exceed the difference between the energy of the system and the energy of the system at
equilibrium.
W_max( ρ^A ) ⩽ Tr[ H^A ρ^A ] − Tr[ H^A γ^A ] .   (17.13)
Proof. The Gibbs state γ A is the state with the smallest energy that has an entropy H(ρA ) =
H(σρA ). Therefore, the state σρA has higher energy than γ A so that
Tr[ H^A γ^A ] ⩽ Tr[ H^A σ_ρ^A ] .   (17.14)
where we used the relations Tr[ H^{Aⁿ} ρ^{⊗n} ] = n Tr[ H^A ρ^A ] and Tr[ H^{Aⁿ} γ^{Aⁿ} ] = n Tr[ H^A γ^A ].
Let
f(ρ) := min_U Tr[ H^A U ρ^A U^* ] ,   (17.18)
Now, let ε > 0 and recall that the number of strongly ε-typical sequences drawn from an
i.i.d.∼ p source scales predominantly as 2nH(X) . Therefore, since H(X) < H(Y ) we get
that for sufficiently small ε > 0 and sufficiently large n we have |T^{st}_ε(Xⁿ)| < |T^{st}_ε(Yⁿ)|. In particular, there exists a one-to-one function π_n : [m]ⁿ → [m]ⁿ, with the property that for any xⁿ ∈ T^{st}_ε(Xⁿ) we have π_n(xⁿ) ∈ T^{st}_ε(Yⁿ). Define the unitary U_n ∈ L(Aⁿ) by its action on basis elements of Aⁿ as U_n|xⁿ⟩ := |π_n(xⁿ)⟩ for all xⁿ ∈ [m]ⁿ. Since U_n is not necessarily optimal we get
f^{reg}(ρ) ⩽ lim_{n→∞} (1/n) Tr[ H^{Aⁿ} U_n ρ^{⊗n} U_n^* ]
  = lim_{n→∞} (1/n) Σ_{xⁿ∈[m]ⁿ} p_{xⁿ} Tr[ H^{Aⁿ} |π_n(xⁿ)⟩⟨π_n(xⁿ)| ] ,   (17.21)
where t(xn ) ∈ Type(n, m) is the type of the sequence xn and a := (a1 , . . . , am )T . Substituting
this into the previous equation gives
f^{reg}(ρ) ⩽ lim_{n→∞} Σ_{xⁿ∈[m]ⁿ} p_{xⁿ} t( π_n(xⁿ) ) · a
  = lim_{n→∞} Σ_{xⁿ∈T^{st}_ε(Xⁿ)} p_{xⁿ} t( π_n(xⁿ) ) · a ,   (17.23)
where in the second line we restricted xn to the set of strongly ε-typical sequences. The
theorem of strongly typical sequences ensures that the contribution of non-typical sequences
vanishes in the limit n → ∞ (see Exercise 17.1.5). Since the above inequality holds for all
ε ∈ (0, 1), taking the limit ε → 0+ gives
X
f reg (ρ) ⩽ lim+ lim pxn t (πn (xn )) · a
ε→0 n→∞
xn ∈Tst (X n ) (17.24)
′
εA ′A
= g · a = Tr H γ ,
′
→ g′ as n → ∞. Finally,
where we used the fact that πn (xn )∈ Tε,stn (g ) so that t π n (x n
)
since we proved that f reg (ρ) ⩽ Tr H A γ ′A for any Gibbs state with inverse temperature
β ′ > β it follows that the inequality also hold for β ′ = β. This completes the proof.
Exercise 17.1.5. Prove the relation (17.23). Hint: Use Theorem 8.5.1 in conjunction with
the fact that t (πn (xn )) · a is bounded from above; e.g., t (πn (xn )) · a ⩽ am since t (πn (xn )) is
a probability vector.
A state ρ ∈ D(A) characterized by W^{reg}_max(ρ) = 0 is identified as a completely passive state. Such states are inherently unable to facilitate work extraction, irrespective of their quantity. As inferred from Exercise 17.1.4, if W^{reg}_max(ρ) = 0, then it follows that:
W_max( ρ^{⊗n} ) = 0   ∀ n ∈ N .   (17.25)
This insight, derived from Theorem 17.1.1, establishes the Gibbs state as the unique com-
pletely passive state. Consequently, this finding compellingly supports the designation of
the Gibbs state, or thermal state, as the exclusive free state in the domain of quantum
thermodynamics.
Exercise 17.1.6. Give full details why the only state that is completely passive is the Gibbs
state.
Proof. By definition, the Gibbs state γ^A commutes with U^A if and only if e^{−βH^A} commutes with U^A. Therefore, if [U^A, H^A] = 0 then clearly U^A commutes with γ^A. Conversely, suppose U^A commutes with e^{−βH^A} and express H^A = Σ_x λ_x P_x, where {P_x} are orthogonal projections satisfying P_x P_y = δ_{xy} P_x and {λ_x} are distinct eigenvalues of H^A. Then,
Σ_x e^{−βλ_x} U P_x = U e^{−βH^A} = e^{−βH^A} U = Σ_y e^{−βλ_y} P_y U .   (17.26)
Note that in the lemma above, the condition that U A commutes with γ A can be expressed
as
U A→A γ A := U A γ A U ∗A = γ A .
(17.28)
That is, the unitary matrix U A commutes with the Hamiltonian if and only if the unitary
channel U A→A preserves the Gibbs state.
Suppose now that system B is comprised of two subsystems B1 and B2 and that the total
Hamiltonian of system B can be expressed as
H B = H B1 ⊗ I B2 + I B1 ⊗ H B2 . (17.29)
In this case, the Gibbs state γ B of the composite system can be expressed as a tensor product
of the two Gibbs states of the subsystems. Indeed, we have
γ^B = e^{−β( H^{B₁}⊗I^{B₂} + I^{B₁}⊗H^{B₂} )} / Tr[ e^{−β( H^{B₁}⊗I^{B₂} + I^{B₁}⊗H^{B₂} )} ] ,   (17.30)
and since
e^{−β( H^{B₁}⊗I^{B₂} + I^{B₁}⊗H^{B₂} )} = e^{−βH^{B₁}} ⊗ e^{−βH^{B₂}} ,   (17.31)
we conclude that γ^B = γ^{B₁} ⊗ γ^{B₂}, where for each j = 1, 2, γ^{Bⱼ} := e^{−βH^{Bⱼ}} / Tr[ e^{−βH^{Bⱼ}} ] is the Gibbs state of subsystem Bⱼ.
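This factorization is easy to confirm numerically (a short sketch; the helper name is ours):

import numpy as np
from scipy.linalg import expm

def gibbs(H, beta):
    G = expm(-beta * H)
    return G / np.trace(G)

H1 = np.diag([0.0, 1.0])
H2 = np.diag([0.0, 0.5, 1.5])
beta = 1.3

# Total Hamiltonian H^B = H^{B1} (x) I + I (x) H^{B2}
H_total = np.kron(H1, np.eye(3)) + np.kron(np.eye(2), H2)
print(np.allclose(gibbs(H_total, beta),
                  np.kron(gibbs(H1, beta), gibbs(H2, beta))))   # True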
ρA → ρA ⊗ γ B (17.32)
is a free operation, where B is some ancillary system in the Gibbs state γ B and Hamiltonian
H B . The total Hamiltonian of system AB is given by H AB := H A ⊗ I B + I A ⊗ H B . Its
corresponding Gibbs state is given by γ AB := γ A ⊗ γ B . According to the second step above,
any unitary matrix U : AB → AB that commutes with the total Hamiltonian H AB yields a
permissible evolution of the system AB. Combining this with Lemma 17.1.3 we conclude that
a unitary evolution U ∈ CPTP(AB → AB) is free if and only if it preserves the Gibbs state
γ AB . For such a Gibbs preserving unitary channel U AB→AB we get that the transformation
ρA ⊗ γ B → U AB→AB ρA ⊗ γ B
(17.33)
is a thermal operation.
Proof. Let γ^{ABA′B′} := γ^{AB} ⊗ γ^{A′B′} and let V ∈ CPTP(ABA′B′ → ABA′B′) be the unitary channel given by
V := U^{AB→A′B′} ⊗ U^{∗A′B′→AB} .   (17.37)
In the exercise below you show that V preserves the joint Gibbs state γ^{ABA′B′}. Hence, the channel
Tr_{ABB′}[ V( ω^A ⊗ γ^{BA′B′} ) ] = Tr_{ABB′}[ U^{AB→A′B′}( ω^A ⊗ γ^B ) ⊗ U^{∗A′B′→AB}( γ^{A′B′} ) ]
  = Tr_{B′}[ U^{AB→A′B′}( ω^A ⊗ γ^B ) ]   (17.38)
is a thermal operation.
Exercise 17.2.1. Show that the matrix V as defined in the proof above is indeed Gibbs
preserving. Hint: Apply U ∗ to both sides of (17.35) to show that U ∗ is Gibbs preserving.
Exercise 17.2.2. Consider the unitary matrix U : AB → A′B′ associated with the unitary channel U^{AB→A′B′} mentioned in the lemma above (i.e. U^{AB→A′B′}(·) := U(·)U^*). Demonstrate that the condition (17.35) is satisfied if and only if
U H^{AB} = H^{A′B′} U .   (17.39)
Recall that a density matrix ρ ∈ D(A) can be viewed as an athermality state only
when the Hamiltonian or Gibbs state of system A is specified. Similarly, a quantum channel
N ∈ CPTP(A → A′ ) on its own cannot be considered a thermal operation without specifying
the Gibbs state associated with systems A and A′ . We will therefore view a thermal operation
′ ′
as a triple (N A→A , γ A , γ A ), where N ∈ CPTP(A → A′ ), γ A is the input Gibbs state, and
′
γ A is the output Gibbs state. We use this perspective in the following formal definition of
thermal operations.
′
Definition 17.2.1. Let N ∈ CPTP(A → A′ ) and γ A and γ A be two density
′ ′
matrices. The triple (N A→A , γ A , γ A ) is called a thermal operation if there exists a
unitary channel U ∈ CPTP(AB → A′ B ′ ) (with |AB| = |A′ B ′ |), and density matrices
′
γ B and γ B , such that both (17.35) and (17.36) hold.
In the following lemma we show that the three elements of the triple (N^{A→A′}, γ^A, γ^{A′}) in the definition above are not independent.
Lemma 17.2.2. Let (N^{A→A′}, γ^A, γ^{A′}) be a thermal operation. Then, N^{A→A′} is Gibbs preserving; i.e.,
N^{A→A′}( γ^A ) = γ^{A′} .   (17.40)
Proof. Observe that since U^{AB→A′B′} in (17.36) is Gibbs preserving it follows that
N^{A→A′}( γ^A ) = Tr_{B′}[ U^{AB→A′B′}( γ^{AB} ) ] = Tr_{B′}[ γ^{A′B′} ] = γ^{A′} .   (17.41)
Clearly, from their definitions and the lemma above it follows that
Proof. Let {N_x}_{x∈[m]} be a set of m channels in TO(A → A′), and consider a convex combination of these m channels:
N^{A→A′} := Σ_{x∈[m]} p_x N_x^{A→A′} ,   (17.44)
where for each x ∈ [m], B_x and B′_x are auxiliary thermal baths, and U_x is a Gibbs preserving unitary channel. Let
B := ⊕_{x∈[m]} B_x ,   B′ := ⊕_{x∈[m]} B′_x   and   γ^B := ⊕_{x∈[m]} p_x γ^{B_x} .   (17.46)
Therefore, N^{A→A′} is a thermal operation. This completes the proof.
Definition 17.2.2. Let A and A′ be two physical systems. The set of closed thermal
operations, denoted as CTO(A → A′ ), is defined as
Remark. By definition, N ∈ CTO(A → A′) if and only if there exists a sequence of thermal operations {N_n^{A→A′}}_{n∈N} ⊂ TO(A → A′) such that
lim_{n→∞} N_n^{A→A′} = N^{A→A′} .   (17.50)
Proof. The proof that 1 ⇒ 2 is left as an exercise, and we prove that 2 ⇒ 1. Let {ε_k}_{k∈N} be a sequence of positive numbers with zero limit, and for each k ∈ N, let (ρ_k^A, γ^A) be an athermality state that can be converted to a state that is ε_k-close to (σ^{A′}, γ^{A′}). That is, for each k there exists a thermal operation N_k ∈ TO(A → A′) with the property that
N_k^{A→A′}( ρ_k^A ) ≈_{ε_k} σ^{A′} .   (17.53)
Since the set CPTP(A → A′ ) is compact, there exists a converging subsequence of {Nk }k∈N .
For simplicity of the exposition here, we assume without loss of generality that the sequence
{Nk }k∈N itself is converging (otherwise, we have to replace k with a subsequence {nk }k∈N )
and set N := lim_{k→∞} N_k. By definition, N ∈ CTO(A → A′) since each (N_k^{A→A′}, γ^A, γ^{A′}) is a thermal operation. Moreover, observe that
N^{A→A′}( ρ^A ) = lim_{k→∞} N_k^{A→A′}( ρ_k^A ) = σ^{A′} ,   (17.54)
where we used (17.53). Hence, (ρ^A, γ^A) can be converted to (σ^{A′}, γ^{A′}) by CTO. This completes the proof.
Figure 17.2: The conversion of ρ to σ by CTO. For any ε > 0 there exists states ρ̃ and σ̃ that are
ε-close to ρ and σ, respectively, such that ρ̃ can be converted to σ̃ by thermal operation.
Proof. Suppose first that E ∈ TO(A → A′), and that it has the form (17.36) with AB ≅ A′B′. To see that E^{A→A′} is time-translation covariant, observe that
E^{A→A′}( e^{−itH^A} ρ^A e^{itH^A} ) = Tr_{B′}[ U^{AB→A′B′}( e^{−itH^A} ρ^A e^{itH^A} ⊗ γ^B ) ]
  = Tr_{B′}[ U^{AB→A′B′}( e^{−itH^A} ρ^A e^{itH^A} ⊗ e^{−itH^B} γ^B e^{itH^B} ) ]   (using [γ^B, H^B] = 0)
  = Tr_{B′}[ U^{AB→A′B′} ∘ V_t^{AB→AB}( ρ^A ⊗ γ^B ) ] ,   (17.56)
where V_t^{AB} := e^{−itH^A} ⊗ e^{−itH^B} = e^{−itH^{AB}}, H^{AB} := H^A ⊗ I^B + I^A ⊗ H^B is the total Hamiltonian, and V_t^{AB→AB} := V_t^{AB}(·)V_t^{∗AB}. From (17.39) we get that the unitary channel U^{AB→A′B′}(·) := U(·)U^* satisfies
U^{AB→A′B′} ∘ V_t^{AB→AB} = V_t^{A′B′→A′B′} ∘ U^{AB→A′B′} ,   (17.57)
where V_t^{A′B′→A′B′}(·) := e^{−itH^{A′B′}}(·)e^{itH^{A′B′}}. Combining this with (17.56) we get for any ρ ∈ D(A)
E^{A→A′}( e^{−itH^A} ρ^A e^{itH^A} ) = Tr_{B′}[ V_t^{A′B′→A′B′} ∘ U^{AB→A′B′}( ρ^A ⊗ γ^B ) ] .   (17.58)
Finally, observe that V_t^{A′B′→A′B′} = U_t^{A′→A′} ⊗ U_t^{B′→B′}, where U_t^{A′→A′}(·) := e^{−itH^{A′}}(·)e^{itH^{A′}} and U_t^{B′→B′}(·) := e^{−itH^{B′}}(·)e^{itH^{B′}}. Substituting this into the equation above gives
E^{A→A′}( e^{−itH^A} ρ^A e^{itH^A} ) = U_t^{A′→A′}( Tr_{B′}[ U^{AB→A′B′}( ρ^A ⊗ γ^B ) ] )
  = e^{−itH^{A′}} E^{A→A′}( ρ^A ) e^{itH^{A′}} .   (17.59)
This completes the proof for E ∈ TO(A → A′). The case E ∈ CTO(A → A′) follows from the fact that the limit of time-translation covariant channels is itself time-translation covariant (see Exercise 17.2.6).
Exercise 17.2.6. Let G be a group, and let {En }n∈N be a sequence of channels in COVG (A →
A′ ) (with respect to some unitary representations of G on A and A′ ). Show that if the limit
E := limn→∞ En exists then also E ∈ COVG (A → A′ ).
Definition 17.2.3. Let γ^A and γ^{A′} be two Gibbs states. A channel N ∈ GPO(A → A′) is called a Gibbs-preserving covariant operation (in short, GPC operation) if, in addition to being Gibbs preserving, it is also time-translation covariant, satisfying (17.55). We denote by GPC(A → A′) the set of all such GPC channels in GPO(A → A′).
Exercise 17.2.8. Let A, B, A′, and B′ be four physical systems with corresponding Hamiltonians H^A, H^B, H^{A′}, and H^{B′}, and let V^{AB→A′B′}(·) = V(·)V^* be a time-translation covariant isometry channel. Denote by
E^{A→A′}( ω^A ) := Tr_{B′}[ V^{AB→A′B′}( ω^A ⊗ γ^B ) ]   ∀ ω ∈ L(A) ,   (17.62)
and set t := Z^{AB}/Z^{A′B′}. Show that the map
N^{A→A′}( ω^A ) = t E^{A→A′}( ω^A ) + ( γ^{A′} − t E^{A→A′}( γ^A ) ) Tr[ ω^A ]   (17.63)
is a thermal operation (and in particular a quantum channel). Hint: Start with the covariance property e^{−βH^{AB}} = V^* e^{−βH^{A′B′}} V to get
Z^{AB} = Tr[ V V^* e^{−βH^{A′B′}} ] = Z^{A′B′} Tr[ V V^* γ^{A′B′} ] ⩽ Z^{A′B′} ,   (17.64)
with equality if and only if |AB| = |A′B′| (in which case V is a unitary matrix), and conclude that
τ^{A′} := ( γ^{A′} − t E^{A→A′}( γ^A ) ) / ( 1 − t )   (17.65)
is a density matrix.
Note that E corresponds to a Gibbs preserving channel. The relation above corresponds
precisely to the definition of relative majorization (see Section 4.3). Therefore, we conclude
that
( p , g ) −−GPO−→ ( p′ , g′ )   ⟺   ( p , g ) ≻ ( p′ , g′ ) .   (17.67)
Remarkably, the relation above remains unchanged even if we replace the set GPO with
CTO.
Theorem 17.3.1. Let (ρ, γ) and (ρ′ , γ ′ ) be two quasi-classical states of systems A
and A′ , respectively. The following statements are equivalent:
Remark. Note that the theorem above does not state that CTO=GPO, only that they have
the same conversion power. In general, we have CTO⊆GPO since GPO is a closed set of
operations containing thermal operations. Therefore, the implication 1 ⇒ 2 is trivial, and
we only need to prove the direction 2 ⇒ 1.
The proof of the theorem above is technically involved and extensive; it has been deferred
to Appendix D.7. It remains a compelling open challenge to discover a more concise and
straightforward proof for this theorem.
The theorem above, in conjunction with (17.67), implies that interconversions under CTO
can be characterized with relative majorization.
Corollary 17.3.1. Let (p, g) and (p′ , g′ ) be two athermality states (in the
quasi-classical regime) of systems A and A′ , respectively. Then,
( p , g ) −−CTO−→ ( p′ , g′ )   ⟺   ( p , g ) ≻ ( p′ , g′ ) .   (17.68)
We can therefore apply all the machinery of the theory of (relative) majorization to the
theory of athermality. In particular, one of the immediate consequences of the corollary
above is that in the quasi-classical regime, there exists a bijection between the resource
theory of athermality and the resource theory of nonuniformity. This remarkable connection
between the two theories essentially states that in the quasi-classical regime athermality is
nonuniformity. This equivalence follows from Theorem 4.3.2.
Specifically, suppose (p, g) is an athermality state in the quasi-classical regime, and
suppose that g has only rational components. Then, we can write the components of g as
g_x = n_x/n with x ∈ [m], n_x ∈ N, and n := Σ_{x∈[m]} n_x, and we have
( p , g ) ∼ ( r , u^{(n)} )   where   r := ⊕_{x=1}^{m} p_x u^{(n_x)} .   (17.69)
That is, there exists an n-dimensional system R in some state r ∈ Prob(n), with trivial
Hamiltonian (i.e. uniform Gibbs states), such that (p, g) ∼ (r, u(n) ). Combining this with
Theorem 17.3.1 we conclude that
( p , g ) −−CTO−→ ( r , u^{(n)} )   and   ( r , u^{(n)} ) −−CTO−→ ( p , g ) .   (17.70)
In other words, (p, g) and (r, u^{(n)}) correspond to the same resource, so that the athermality of (p, g) can be interpreted as the nonuniformity of (r, u^{(n)}).
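The embedding (17.69) is simple to implement when g has rational components (a sketch; the function name is ours):

import numpy as np
from fractions import Fraction

def embed(p, g_fractions):
    # Map (p, g) with rational g to (r, u^{(n)}): r = (+)_x p_x u^{(n_x)}, cf. (17.69)
    n = int(np.lcm.reduce([f.denominator for f in g_fractions]))
    r = []
    for px, gx in zip(p, g_fractions):
        nx = int(gx * n)                      # g_x = n_x / n
        r.extend([px / nx] * nx)
    return np.array(r), n

p = [0.6, 0.4]
g = [Fraction(1, 3), Fraction(2, 3)]           # a qubit Gibbs vector
r, n = embed(p, g)
print(n, r)   # n = 3, r = [0.6, 0.2, 0.2]: the athermality of (p, g) equals the
              # nonuniformity of (r, u^{(3)})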
Exercise 17.3.1. Let ε > 0 and (p, g) be an athermality state (in the quasi-classical regime). We do not assume that g has rational components. Show that there exists an n-dimensional system R with trivial Hamiltonian, and two states r₁, r₂ ∈ Prob(n) that satisfy ½‖r₁ − r₂‖₁ ⩽ ε and
( r₁ , u^{(n)} ) ≻ ( p , g ) ≻ ( r₂ , u^{(n)} ) .   (17.71)
Hint: Use Sec. 4.3.5.
Exercise 17.3.2. Prove that the relation (17.69) implies that for any k ∈ N we also have
( p^{⊗k} , g^{⊗k} ) ∼ ( r^{⊗k} , (u^{(n)})^{⊗k} ) .   (17.72)
The equivalence between athermality and non-uniformity gives rise to the following property.
1. For every ε > 0 there exists a thermal catalyst κ := (r, g̃) such that
( p_ε , g ) ⊗ κ −−CTO−→ ( p′_ε , g′ ) ⊗ κ ,   (17.73)
Remark. Very recently (see the notes and references at the end of this section) it was shown that the theorem above can be strengthened by replacing p_ε with p, so that (17.73) becomes
( p , g ) ⊗ κ −−CTO−→ ( p′_ε , g′ ) ⊗ κ .   (17.75)
This improvement makes the result somewhat more physical, and furthermore, provides a simple characterization of catalytic majorization (cf. Lemma 4.5.1): (p, g) ≻_c (p′, g′) if and only if for every ε > 0 there exists p′_ε ∈ B_ε(p′) such that (p, g) ≻_* (p′_ε, g′). The proof of this improvement involves techniques not covered in this book; the interested reader can find the relevant references in the last section of this chapter.
Hence, the equivalence of the two conditions in the theorem follows from Theorem 4.5.1.
This completes the proof.
We have chosen the symbol D to denote a measure of athermality, given that every
normalized quantum divergence D also serves as a measure of athermality. Such measures
of athermality behave monotonically under the larger set of Gibbs-preserving operations.
However, it’s worth noting that not all measures of athermality are quantum divergences, as
they only need to exhibit monotonic behavior under CTO.
Exercise 17.4.1. Show that every normalized quantum divergence is a measure of ather-
mality.
In the quasi-classical regime, athermality measures are applied to pairs of probability vec-
tors, with the stipulation that the second vector remains strictly positive due to the Gibbs
states’ inability to contain zero components (assuming finite energies). Furthermore, as pre-
CTO
viously discussed, two athermality states (p, g) and (p′ , g′ ) satisfy (p, g) −−−→ (p′ , g′ ) if and
only if (p, g) ≻ (p′ , g′ ). Thus, within the quasi-classical framework, the earlier definition
of an athermality measure essentially transforms into the definition of a divergence. This
implies that, in the quasi-classical domain, athermality measures are indeed divergences.
Additionally, the direct correlation between classical divergences and non-uniformity mea-
sures extends to form a bijection between nonuniformity measures and athermality measures,
further intertwining these concepts.
2. J A = I A .
The above optimization problem is an SDP, and consequently has a dual given by (see
Exercise 17.4.2)
′ ′
Gη (ρ, γ) = min Tr σ A − η0A + η1A
(17.82)
∞
If the Hamiltonians H^A and H^{A′} are non-degenerate then the operator ξ^{AA′} is non-degenerate so that P_ξ is the completely dephasing channel in the energy eigenbasis. In this case, ω^{AA′} is diagonal and therefore we can assume without loss of generality that also η₀ and η₁ are diagonal. For this case, for every choice of η we have G_η(ρ, γ) = G_η(Δ(ρ), γ), where Δ ∈ CPTP(A → A) is the energy dephasing channel. Therefore, for such a choice of system A′, G_η depends only on the diagonal elements of ρ.
Exercise 17.4.2. Express the optimization problem in (17.80) as a conic linear program-
ming of the form (A.57) (i.e., as a dual problem) and then use the primal problem A.52 to
obtain (17.82).
Exercise 17.4.3. Show that if A = A′ and H A has a non-degenerate Bohr spectrum, then
without loss of generality we can assume that η1 is diagonal in the energy eigenbasis (i.e.,
Gη depends only on the diagonal elements of η1 ).
Exercise 17.4.4. Let η₀, η₁ ∈ Pos(A′), ρ, γ ∈ D(A), and ω̃^{AA′} := γ^A ⊗ ( η₀^{A′} + η₁^{A′} ).
1. Show that
H↑_min(A′|A)_{ω̃} = H_min(A′)_ω = −log ‖ η₀^{A′} + η₁^{A′} ‖_∞ .   (17.86)
is a measure of athermality.
Exercise 17.4.5. Show that for the case F = GPO, the athermality monotones G_η(ρ, γ) are given as in (17.84), but with
ω^{AA′} := ρ^T ⊗ η₀^{A′} + γ^A ⊗ η₁^{A′} .   (17.88)
state. Free energy is denoted by the symbol “F” and for an athermality state (ρ, γ) of system A the free energy is defined as the energy available to do useful work; it is given by
F(ρ) := Tr[ ρ Ĥ ] − T H(ρ) ,   (17.89)
where T is the temperature, and we added here the ‘hat’ symbol to the Hamiltonian Ĥ of the system, in order to distinguish it from the entropy symbol H(ρ), which stands for the von-Neumann entropy of ρ.
Exercise 17.4.6. Show that the free energy of the Gibbs state γ := (1/Z) e^{−βĤ} is given by
F(γ) = −T log Z .   (17.90)
To see the relation of the free energy to the relative entropy, observe that the relative entropy of athermality is given by:
D(ρ‖γ) = −H(ρ) − Tr[ ρ log( (1/Z) e^{−βĤ} ) ]
  = log Z − H(ρ) + β Tr[ ρ Ĥ ]
  = β F(ρ) + log Z
  = β ( F(ρ) − F(γ) ) ,   (17.91)
where the last equality follows from (17.90).
Hence, the free energy is the key factor that directly governs the optimal rate of intercon-
versions of athermality.
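The identity (17.91) can be verified numerically for any full-rank state; a sketch in natural-log units, with our own helper names:

import numpy as np
from scipy.linalg import expm, logm

def relative_entropy(rho, sigma):
    # Umegaki relative entropy D(rho||sigma), in nats
    return float(np.trace(rho @ (logm(rho) - logm(sigma))).real)

def free_energy(rho, H, beta):
    # F(rho) = Tr[rho H] - T H(rho), with T = 1/beta and H(rho) in nats
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    S = -np.sum(evals * np.log(evals))
    return float(np.trace(rho @ H).real - S / beta)

beta = 0.8
H = np.diag([0.0, 1.0, 2.0])
gamma = expm(-beta * H) / np.trace(expm(-beta * H))
rho = np.array([[0.5, 0.2, 0.0], [0.2, 0.3, 0.1], [0.0, 0.1, 0.2]])

lhs = relative_entropy(rho, gamma)
rhs = beta * (free_energy(rho, H, beta) - free_energy(gamma, H, beta))
print(np.isclose(lhs, rhs))   # True, cf. (17.91)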
The Umegaki relative entropy of athermality has another interesting representation. For a quantum athermality state (ρ, γ), with Hamiltonian Ĥ, we can express D(ρ‖γ) as
D(ρ‖γ) = D( P_Ĥ(ρ) ‖ γ ) + C(ρ) ,
where P_Ĥ is the pinching channel associated with the Hamiltonian Ĥ, and C(ρ) is the
coherence measure defined in (15.238) (C(ρ) is also known as the G-asymmetry of the state
ρ as defined in 15.118, where G stands for the group of time-translation symmetry). That
is, the athermality of the state (ρ, γ) can be decomposed into two components:
1. Its nonuniformity that is quantified by D PĤ (ρ) γ .
2. Its asymmetry (or coherence between energy eigenspaces) that is quantified by the
coherence measure C(ρ).
We will see later on that this decomposition has an operational meaning, in which (roughly
speaking) D PĤ (ρ) γ is the cost to prepare the athermality state (PĤ (ρ), γ) and C(ρ) is
the cost to ‘rotate’ P_Ĥ(ρ) to ρ. Moreover, since the regularization of C(ρ) vanishes (see Theorem 15.3.4), we conclude that
lim_{n→∞} (1/n) D( P_n( ρ^{⊗n} ) ‖ γ^{⊗n} ) = D(ρ‖γ) ,   (17.93)
where P_n is the pinching channel associated with the total Hamiltonian of system Aⁿ.
Definition 17.5.1. Let ρ, γ ∈ D(A) and ρ′ , γ ′ ∈ D(A′ ). We say that the pair (ρ, γ)
relatively majorizes the pair (ρ′ , γ ′ ), and write
The two conditions in (17.95) are equivalent to the existence of a Choi matrix J ∈ Pos(AA′) that satisfies
Tr_A[ J^{AA′}( ρ^T ⊗ I^{A′} ) ] = ρ′   and   Tr_A[ J^{AA′}( γ^T ⊗ I^{A′} ) ] = γ′ .   (17.96)
This problem, of determining whether or not such a Choi matrix J^{AA′} exists, is an SDP
feasibility problem that can be solved efficiently and algorithmically using techniques from
semi-definite programming. However, unlike the classical case, where relative majorization
can be characterized with Lorenz curves, it is not known in the fully quantum case whether
a similar geometrical characterization exists.
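In practice, the feasibility problem (17.96) can be handed to any SDP solver. A minimal sketch using the cvxpy package (assumed installed, version ⩾ 1.2 for its partial_trace atom, together with an SDP solver; the function name is ours):

import numpy as np
import cvxpy as cp

def relatively_majorizes(rho, gamma, rho_p, gamma_p):
    # Feasibility test for (rho, gamma) >- (rho', gamma'): search for a Choi
    # matrix J >= 0 on AA' with Tr_{A'}[J] = I_A, Tr_A[J (rho^T (x) I)] = rho'
    # and Tr_A[J (gamma^T (x) I)] = gamma', cf. (17.96).
    dA, dAp = rho.shape[0], rho_p.shape[0]
    J = cp.Variable((dA * dAp, dA * dAp), hermitian=True)
    constraints = [
        J >> 0,
        cp.partial_trace(J, (dA, dAp), axis=1) == np.eye(dA),
        cp.partial_trace(np.kron(rho.T, np.eye(dAp)) @ J, (dA, dAp), axis=0) == rho_p,
        cp.partial_trace(np.kron(gamma.T, np.eye(dAp)) @ J, (dA, dAp), axis=0) == gamma_p,
    ]
    prob = cp.Problem(cp.Minimize(0), constraints)
    prob.solve()
    return prob.status == cp.OPTIMAL

# Degrading any state towards its Gibbs state is always feasible
# (the replacement channel N(X) = Tr[X] * gamma does the job).
gamma = np.diag([0.7, 0.3])
rho = np.array([[0.9, 0.1], [0.1, 0.1]])
print(relatively_majorizes(rho, gamma, gamma, gamma))   # True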
Observe that any quantum divergence behaves monotonically under quantum relative majorization. Specifically, if D is a quantum divergence and (ρ, γ) ≻ (ρ′, γ′), then D(ρ‖γ) ⩾ D(ρ′‖γ′).
The converse to the above property also holds. That is, if for any choice of a quantum divergence D we have D(ρ‖γ) ⩾ D(ρ′‖γ′) then we must have (ρ, γ) ≻ (ρ′, γ′). In fact, we show now that this assertion still holds even if we restrict D to have a very specific form. Recall the complete family of monotones given in (17.84) with ω^{AA′} := ρ^T ⊗ η₀^{A′} + γ^A ⊗ η₁^{A′} given as in Exercise 17.4.5. From the completeness of the family of monotones, it follows that (ρ, γ) ≻ (ρ′, γ′) if and only if G_η(ρ, γ) ⩾ G_η(ρ′, γ′) for all η₀, η₁ ∈ Pos(A′). Similar to (17.87), for every η₀, η₁ ∈ Pos(A′) we define
The above functions form a family of normalized quantum divergences that can be used to characterize quantum relative majorization.
Exercise 17.5.1. Show that for every η0 , η1 ∈ Pos(A′ ), the function Dη as defined above is
a quantum divergence.
Theorem 17.5.1. Let ρ, γ ∈ D(A) and ρ′ , γ ′ ∈ D(A′ ). Then, the following are
equivalent:
1. (ρ, γ) ≻ (ρ′ , γ ′ ).
2. Show that the theorem holds even if we restrict η0 and η1 to satisfy Tr[η0 + η1 ] = 1
′
(hence, we can assume without loss of generality that ω AA is a density matrix).
3. Show that the theorem holds even if we restrict η0 and η1 to satisfy Tr[η0 ] = Tr[η1 ] =
1/2.
fidelity function F(η, ζ) := ‖η^{1/2} ζ^{1/2}‖₁ for all η, ζ ∈ Pos(A), including unnormalized states, to provide a more accessible approach to understanding quantum relative majorization.
3. F( aγ′ − ρ′ , bρ′ − γ′ ) ⩾ F( aγ − ρ , bρ − γ ) .
The above invariant overlap implies that the matrix V : A → A′ Ã′ R defined by
To see that the channel above satisfies the desired properties, first observe that by isolating
ρ and γ from (17.99) we get
Exercise 17.5.3. Prove the assertion in the proof above that N (γ) = γ ′ .
Exercise 17.5.4. Let ρ, σ, γ ∈ D(A) be three qubit states (i.e. |A| = 2). Show that
( ρ , γ ) −−GPO−→ ( σ , γ )   (17.107)
if and only if
D_max( ρ ‖ γ ) ⩾ D_max( σ ‖ γ )   and   D_max( γ ‖ ρ ) ⩾ D_max( γ ‖ σ ) .   (17.108)
That is, the third fidelity condition of the theorem above is unnecessary in this case.
2. Tr_A[ J^{AA′}( γ^T ⊗ I^{A′} ) ] = γ′ .
3. P_ξ( J^{AA′} ) = J^{AA′}, where P_ξ is the pinching channel of ξ^{AA′} := H^A ⊗ I^{A′} − I^A ⊗ H^{A′} .
4. J^A = I^A .
Similar to the GPO case, this problem, of determining whether or not such a Choi matrix
′
J AA exists, is an SDP feasibility problem that can be solved efficiently and algorithmically
using techniques from semi-definite programming. However, for certain choices of Hamilto-
GPC
nians, there exists a much simpler way to characterize the conversion (ρ, γ) −−→ (ρ′ , γ ′ ).
Note that in the general case of relatively non-degenerate Hamiltonians, CTO and GPC
operations can only disrupt the coherence between the energy levels of the input state ρA .
In such scenarios, coherence cannot be manipulated, but only destroyed. Therefore, for the
remainder of this chapter, we will focus on Hamiltonians that exhibit relative degeneracy.
When considering the conversion of one athermality state (ρ, γ) to another athermality state (ρ′, γ′) we will use the properties
( ρ , γ ) ↔_F ( ρ ⊗ γ′ , γ ⊗ γ′ )   and   ( ρ′ , γ′ ) ↔_F ( γ ⊗ ρ′ , γ ⊗ γ′ ) .   (17.109)
The equivalence relations above follow from the fact that appending or removing a Gibbs state is a free operation in the theory of athermality. Therefore, the conversion (ρ, γ) →_F (ρ′, γ′) between a state of system A and a state of system A′ is equivalent to the conversion (ρ ⊗ γ′, γ ⊗ γ′) →_F (γ ⊗ ρ′, γ ⊗ γ′) between two states of system AA′; see Fig. 17.3. In other words, interconversion among states with the same dimensions (i.e. states with |A| = |A′|) is general enough to capture also interconversions with |A′| ≠ |A| (as long as we do not impose non-degeneracy constraints). We will therefore focus here on interconversions among states that are all in D(A).
Figure 17.3: Equivalence of a conversion from A to A′ and a conversion from AA′ to itself.
To establish the full set of necessary and sufficient conditions, let J^{AÃ} be the Choi matrix of a time-translation covariant channel E ∈ COV(A → A) that satisfies E(ρ) = σ and E(γ) = γ. Denoting by {r_{xy}}_{x,y∈[m]} and {s_{xy}}_{x,y∈[m]} the components of ρ and σ, respectively, we get that the Choi matrix of E has the form (cf. (15.249))
J^{AÃ} = Σ_{x,y} p_{y|x} |x⟩⟨x|^A ⊗ |y⟩⟨y|^Ã + Σ_{x≠y} ( s_{xy} / r_{xy} ) |x⟩⟨y|^A ⊗ |x⟩⟨y|^Ã ,   (17.110)
where P = (p_{y|x}) is some column stochastic matrix, and we assumed that the off-diagonal terms of ρ^A are non-zero. Let r and s be the probability vectors consisting of the diagonals of ρ and σ, and identify the diagonal matrix γ with the Gibbs vector g consisting of its diagonal. Then, the Choi matrix above corresponds to such a GPC channel E if and only if it is positive semidefinite and
P r = s   and   P g = g .   (17.111)
The above condition implies that (r, g) ≻ (s, g), however, it is not sufficient since we also
require that J AÃ ⩾ 0. This latter condition is equivalent to the requirement that the matrix
obtained by replacing the diagonal elements of Q (as defined in (15.251)) with {px|x }x∈[m] is
positive semidefinite. We summarize these considerations in the following exercise.
Exercise 17.5.6. Let (ρ, γ) and (σ, γ) be two athermality states of a system A, whose Hamil-
tonian Ĥ has a non-degenerate Bohr spectrum. Suppose also that the off diagonal terms of
ρ are non-zero. Show that
( ρ , γ ) −−GPC−→ ( σ , γ )   (17.112)
if and only if there exists a column stochastic matrix P that satisfies both (17.111) and the matrix inequality
Σ_{x∈[m]} p_{x|x} |x⟩⟨x| + Σ_{x≠y∈[m]} ( s_{xy} / r_{xy} ) |x⟩⟨y| ⩾ 0 .   (17.113)
The exercise above does not offer significant computational simplification compared to
the SDP feasibility problem discussed at the beginning of this section. This is because de-
termining the existence of a column stochastic matrix P itself constitutes an SDP problem.
However, the exercise’s significance lies in its ability to highlight the role of quantum co-
herence in converting athermality, as demonstrated by the following theorem. Furthermore,
we will observe later that in the qubit case, the exercise above provides a straightforward
criterion for exact inter-conversions under GPC.
Theorem 17.5.3. Let (ρ, γ) and (σ, γ) be two quantum athermality states of
dimension m := |A|. For any x, y ∈ [m] let rxy := ⟨x|ρ|y⟩ and sxy := ⟨x|σ|y⟩ be the
xy-component of ρ and σ, respectively. Suppose that rxy ̸= 0 for all x, y ∈ [m] and
that rxx = sxx for all x ∈ [m]. Then,
( ρ , γ ) −−GPC−→ ( σ , γ )   ⟺   Q := I + Σ_{x≠y∈[m]} ( s_{xy} / r_{xy} ) |x⟩⟨y| ⩾ 0 .   (17.114)
Proof. Since the diagonals of ρ and σ are the same, we get that if Q ⩾ 0 then by taking the stochastic matrix P to be the identity matrix, all the conditions in Exercise 17.5.6 are satisfied so that (ρ, γ) −−GPC−→ (σ, γ). Conversely, if (ρ, γ) −−GPC−→ (σ, γ) then by Exercise 17.5.6 there exists a stochastic matrix P with a diagonal {p_{x|x}} that satisfies (17.113). By adding the positive semidefinite matrix Σ_{x∈[m]} (1 − p_{x|x}) |x⟩⟨x| to the matrix in (17.113) we get that also Q ⩾ 0. This completes the proof.
In simple terms, the condition stated in the theorem above, that ρ and σ share the same
diagonals, implies that they have the same non-uniformity and only differ in their coherence
(asymmetry) properties. Interestingly, the condition Q ⩾ 0 turns out to be identical to the
condition given in Theorem 15.6.3 when ρ and σ have the same diagonal elements. Thus,
GPC
in this case, we can state that (ρ, γ) −−→ (σ, γ) if and only if ρ can be transformed into
σ through time-translation covariant operations. It is noteworthy that the Gibbs state, γ,
does not play a role in such conversions because ρ and σ share the same non-uniformity (i.e.,
same diagonal elements).
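A direct numerical check of the criterion (17.114) is immediate (a sketch; the helper name is ours):

import numpy as np

def gpc_convertible_same_diagonal(rho, sigma, tol=1e-10):
    # For rho, sigma with identical diagonals and nonzero off-diagonal r_xy,
    # (rho,gamma) -> (sigma,gamma) by GPC iff Q = I + sum_{x!=y}(s_xy/r_xy)|x><y| >= 0
    m = rho.shape[0]
    Q = np.eye(m, dtype=complex)
    for x in range(m):
        for y in range(m):
            if x != y:
                Q[x, y] = sigma[x, y] / rho[x, y]
    return np.linalg.eigvalsh(Q).min() >= -tol

rho = np.array([[0.5, 0.3], [0.3, 0.5]])
sigma = np.array([[0.5, 0.2], [0.2, 0.5]])       # same diagonal, weaker coherence
print(gpc_convertible_same_diagonal(rho, sigma))   # True
print(gpc_convertible_same_diagonal(sigma, rho))   # False: coherence cannot increase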
can be converted to σ by GPC. That is, (ψ, γ) −−GPC−→ (σ, γ).
Exercise 17.5.7. Prove the corollary above. Hint: See the proof of Corollary 15.6.1.
Exercise 17.5.8. Show that the corollary above still holds even if we replace GPC with
CTO. Hint: Use the fact that ψ can be converted to σ by time-translation covariant channel,
and then use Theorem 15.2.4.
We also denote the diagonals of the matrices above by r := (r, 1−r)^T, s := (s, 1−s)^T and g := (g, 1−g)^T, respectively. We would like to find the conditions under which (ρ, γ) −−GPC−→ (σ, γ).
Recall that if a = 0 then we must have b = 0 since GPC cannot generate coherence between
energy levels. Therefore, the case a = 0 has already been covered by the quasi-classical
regime. We will therefore assume in the rest of this subsection that a ̸= 0.
Theorem 17.5.4. Let ρ, σ, γ ∈ D(A) be three qubit states as above and suppose a ≠ 0 and γ ≠ u. Then, for r ≠ g, (ρ, γ) −−GPC−→ (σ, γ) if and only if (r, g) ≻ (s, g) and
|b|²/|a|² ⩽ (s − g)/(r − g) + ( (r − s)/(r − g) )² g(1 − g) .   (17.117)
For r = g, (ρ, γ) −−GPC−→ (σ, γ) if and only if s = g and |a| ⩾ |b|.
Proof. From Exercise 17.5.6 it follows that (ρ, γ) can be converted to (σ, γ) by GPC if and
only if there exists a 2 × 2 column stochastic matrix P = {py|x }x,y∈{0,1} that satisfies P r = s,
P g = g, and
[ p_{0|0}   b/a ;  b̄/ā   p_{1|1} ] ⩾ 0 .   (17.118)
Note that this last condition is equivalent to
|b|²/|a|² ⩽ p_{0|0} p_{1|1} .   (17.119)
The conditions P r = s and P g = g can be expressed as the following linear systems of equations
[ r  1−r ;  g  1−g ] [ p_{0|0} ;  p_{0|1} ] = [ s ;  g ]   and   [ r  1−r ;  g  1−g ] [ p_{1|0} ;  p_{1|1} ] = [ 1−s ;  1−g ] .   (17.120)
Note that the equations involving p_{1|0} and p_{1|1} follow trivially from the ones involving p_{0|0} and p_{0|1} since P is column stochastic. From Cramer’s rule it then follows that for the case that r ≠ g
p_{0|0} = det[ s  1−r ;  g  1−g ] / det[ r  1−r ;  g  1−g ]   and   p_{1|1} = det[ r  1−s ;  g  1−g ] / det[ r  1−r ;  g  1−g ] .   (17.121)
Finally, substituting the above expressions into (17.119) gives (after some simple algebra) the inequality (17.117).
For the case that r = g we also have s = g (otherwise, (r, g) ̸≻ (s, g)) and the linear
system of equations in (17.120) has a unique solution given by p0|0 = p1|1 = 1. Therefore, in
this case, (17.119) gives |b| ⩽ |a|. This completes the proof.
Exercise 17.5.9. Show that if s = g in (17.116) then (ρ, γ) can be converted to (σ, γ) by
GPC if and only if
|b|²/|a|² ⩽ det(γ) .   (17.122)
From the exercise above it follows that already in the qubit case, conversions under GPC
have a certain type of discontinuity. To see this, consider the case s = g, and observe that
the condition |a|2 det(γ) ⩾ |b|2 is stronger than the condition |a| ⩾ |b| that one obtains
if also r = g. In particular, observe that det(γ) ⩽ 1/4. Hence, there exists an ε > 0 and ρ, σ, γ ∈ D(A) such that for any ρ ∈ B_ε(σ) the state (ρ, γ) cannot be converted by GPC to (σ, γ) unless ρ = σ.
Exercise 17.5.10. Find an explicit example of three qubit states ρ, σ, γ, and ε > 0 such that for any ρ ∈ B_ε(σ), (ρ, γ) cannot be converted to (σ, γ) by GPC unless ρ = σ.
where the minimum is over all Λ ∈ Pos(A′) that satisfy the following conditions:

1. $\Lambda \geqslant \rho' - \mathrm{Tr}_A\!\left[J^{AA'}\left(\rho^{T}\otimes I^{A'}\right)\right]$.

2. $\gamma' = \mathrm{Tr}_A\!\left[J^{AA'}\left(\gamma^{A}\otimes I^{A'}\right)\right]$.

3. $J \in \mathrm{Pos}(AA')$ and $J^{A} = I^{A}$.

For the case that F = GPC the conversion distance is evaluated exactly as above with the additional constraint on $J^{AA'}$ that
$$\mathcal{P}_{\xi}\!\left(J^{AA'}\right) = J^{AA'} , \quad\text{where}\quad \xi^{AA'} := H^{A}\otimes I^{A'} - I^{A}\otimes H^{A'} . \qquad (17.126)$$
Note that this additional condition is still in a form suitable for SDP.
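This SDP can be handed directly to a numerical solver. The sketch below is ours (it assumes that the omitted objective is min Tr[Λ], which is the standard SDP form of the trace distance, and it uses the cvxpy package); adding the pinching constraint (17.126) on $J^{AA'}$ would restrict the optimization from Gibbs-preserving operations to GPC.

```python
# Sketch (not from the book): conversion distance under Gibbs-preserving operations as an SDP.
import numpy as np
import cvxpy as cp

def conversion_distance(rho, gamma, rho_p, gamma_p):
    dA, dAp = rho.shape[0], rho_p.shape[0]
    J = cp.Variable((dA * dAp, dA * dAp), hermitian=True)   # Choi matrix of the channel E
    Lam = cp.Variable((dAp, dAp), hermitian=True)           # slack variable for the trace norm

    def apply_choi(X):
        # E(X) = Tr_A[ J (X^T x I_{A'}) ]
        return cp.partial_trace(J @ np.kron(X.T, np.eye(dAp)), dims=[dA, dAp], axis=0)

    constraints = [
        J >> 0,
        cp.partial_trace(J, dims=[dA, dAp], axis=1) == np.eye(dA),  # J^A = I^A (trace preservation)
        apply_choi(gamma) == gamma_p,                                # Gibbs-preserving condition
        Lam >> 0,
        Lam >> rho_p - apply_choi(rho),                              # condition 1 above
    ]
    prob = cp.Problem(cp.Minimize(cp.real(cp.trace(Lam))), constraints)
    prob.solve()
    return prob.value
```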
The preceding discussion demonstrates that the conversion distance of athermality can be
computed numerically. However, the formulation presented above for the conversion distance
lacks insight and does not offer any practical means to calculate the distillable athermality
or the athermality cost of a state (ρ, γ). Therefore, we now turn our attention to the case
where the target state is quasi-classical, and show that for this case there exists an analytical
formula for the conversion distance.
where P denotes the pinching channel associated with the Hamiltonian of system A.
Remark. Note that on the right-hand side, we have a conversion distance between two quasi-
classical states. In the next subsection, we will demonstrate that for such cases, an analytical
formula exists.
Proof. Let P and P ′ be the pinching channels associated with the Hamiltonians of systems A and A′, respectively, and observe that for any E ∈ COV(A → A′) that satisfies γ′ = E(γ) we have
$$\gamma' = \mathcal{P}'(\gamma') = \mathcal{P}'\circ\mathcal{E}(\gamma) = \mathcal{E}\circ\mathcal{P}(\gamma) , \qquad (17.128)$$
where the last equality follows from Part 2 of Exercise 15.6.3. Therefore,
$$\begin{aligned}
T\!\left((\mathcal{P}(\rho),\gamma)\xrightarrow{\text{GPC}}(\rho',\gamma')\right)
&= \min_{\mathcal{E}\in\mathrm{COV}(A\to A')}\left\{\tfrac12\left\|\rho'-\mathcal{E}\circ\mathcal{P}(\rho)\right\|_1 \,:\, \gamma'=\mathcal{E}(\gamma)\right\} \\
&= \min_{\mathcal{E}\in\mathrm{COV}(A\to A')}\left\{\tfrac12\left\|\rho'-\mathcal{E}\circ\mathcal{P}(\rho)\right\|_1 \,:\, \gamma'=\mathcal{E}\circ\mathcal{P}(\gamma)\right\} && \text{by (17.128)} \\
&\geqslant \min_{\mathcal{N}\in\mathrm{COV}(A\to A')}\left\{\tfrac12\left\|\rho'-\mathcal{N}(\rho)\right\|_1 \,:\, \gamma'=\mathcal{N}(\gamma)\right\} && \mathcal{N}:=\mathcal{E}\circ\mathcal{P} \\
&= T\!\left((\rho,\gamma)\xrightarrow{\text{GPC}}(\rho',\gamma')\right) .
\end{aligned} \qquad (17.129)$$
For the converse inequality, observe that by using Part 2 of Exercise 15.6.3 we get that for every E ∈ COV(A → A′)
$$\left\|\rho'-\mathcal{E}\circ\mathcal{P}(\rho)\right\|_1 = \left\|\rho'-\mathcal{P}'\circ\mathcal{E}(\rho)\right\|_1 = \left\|\mathcal{P}'\!\left(\rho'-\mathcal{E}(\rho)\right)\right\|_1 \leqslant \left\|\rho'-\mathcal{E}(\rho)\right\|_1 , \qquad (17.130)$$
where the second equality uses P′(ρ′) = ρ′ and the inequality follows from the DPI. Combining this inequality with the definition of the conversion distance, specifically with the first equality in (17.129), gives
$$T\!\left((\mathcal{P}(\rho),\gamma)\xrightarrow{\text{GPC}}(\rho',\gamma')\right) \leqslant \min_{\mathcal{E}\in\mathrm{COV}(A\to A')}\left\{\tfrac12\left\|\rho'-\mathcal{E}(\rho)\right\|_1 \,:\, \gamma'=\mathcal{E}(\gamma)\right\} = T\!\left((\rho,\gamma)\xrightarrow{\text{GPC}}(\rho',\gamma')\right) . \qquad (17.131)$$
Combining the two inequalities in (17.129) and (17.131) gives the equality in (17.127). This completes the proof.
Exercise 17.6.1. Use Lemma 11.1.1 to provide a shorter proof of the inequality in (17.129).
Exercise 17.6.2. Let P and P ′ be the pinching channels associated with the Hamiltonians of systems A and A′, respectively, and let N := P ′ ◦ E, where E ∈ CPTP(A → A′). Show that N ∈ COV(A → A′) if and only if
$$\mathcal{N} = \mathcal{N}\circ\mathcal{P} . \qquad (17.132)$$
Recall that from Theorem 4.3.2, there exist r, r′ ∈ Prob(k) such that (p, g) ∼ (r, u^(k)) and (p′, g′) ∼ (r′, u^(k)). Specifically,
$$r := \bigoplus_{x\in[m]} p_x\, u^{(a_x)} \quad\text{and}\quad r' := \bigoplus_{y\in[n]} p'_y\, u^{(b_y)} . \qquad (17.135)$$
With these notations and the assumption that the Gibbs vectors have rational components,
we have the following closed formula for the conversion distance.
Theorem 17.6.2. Let (p, g), (p′, g′), and r, r′ ∈ Prob(k) be as above. Then,
$$T\!\left((p,g)\xrightarrow{\text{CTO}}(p',g')\right) = \max_{\ell\in[k]}\left\{\left\|r'\right\|_{(\ell)} - \left\|r\right\|_{(\ell)}\right\} . \qquad (17.137)$$
Proof. Let E be a k × n column stochastic matrix defined on every q ∈ Prob(n) as (cf. (4.136))
$$Eq := \bigoplus_{y\in[n]} q_y\, u^{(b_y)} . \qquad (17.138)$$
Observe that r′ = Ep′ and that ∥p′ − q∥₁ = ∥r′ − Eq∥₁ (see Exercise 17.6.3). Thus,
$$\begin{aligned}
T\!\left((p,g)\xrightarrow{\text{CTO}}(p',g')\right)
&= \min_{q\in\mathrm{Prob}(n)}\left\{\tfrac12\left\|r'-Eq\right\|_1 \,:\, r \succ Eq\right\} \\
&\geqslant \min_{s\in\mathrm{Prob}(k)}\left\{\tfrac12\left\|r'-s\right\|_1 \,:\, r \succ s\right\} && s := Eq \\
&= T\!\left(r\xrightarrow{\text{Noisy}}r'\right) , && \text{cf. (16.15)}
\end{aligned} \qquad (17.139)$$
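As a quick numerical illustration of Theorem 17.6.2 (ours, not part of the text), the formula (17.137) can be evaluated directly once the embedded vectors r and r′ of (17.135) are in hand; below, ∥·∥_(ℓ) denotes the sum of the ℓ largest entries.

```python
# Sketch (not from the book): conversion distance between quasi-classical athermality states.
import numpy as np

def ky_fan(v, ell):
    """Sum of the ell largest entries of v."""
    return np.sort(v)[::-1][:ell].sum()

def conversion_distance_qc(r, r_prime):
    k = len(r)
    return max(ky_fan(r_prime, ell) - ky_fan(r, ell) for ell in range(1, k + 1))

# Example with k = 3:
print(conversion_distance_qc(np.array([0.5, 0.3, 0.2]), np.array([0.7, 0.2, 0.1])))  # 0.2
```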
Exercise 17.6.4. Show that a vector g ∈ Prob(m) satisfies (17.144) for all g̃ ∈ Prob(m) if
and only if g = u(m) .
In the fully quantum case, under GPC and CTO, coherence among energy levels is a resource that cannot be measured by the golden unit (|0⟩⟨0|^A, u^A). The reason is that this golden unit is quasi-classical, and it cannot be converted by GPC (or CTO) to any athermality state that is not quasi-classical (even if we take m := |A| = ∞). This means that in the QRT of quantum athermality, there exists another type of resource, namely, time-translation asymmetry, that cannot be quantified by the golden unit (|0⟩⟨0|^A, u^A). We conclude that quantum athermality can be viewed as a resource comprising two types:
In contrast to GPC and CTO, GPO has the capability to induce coherence between
energy levels. Consequently, as demonstrated in the subsequent exercise, we can retain the
state (|0⟩⟨0|A , uA ) as the golden unit of the resource theory.
Exercise 17.6.5. Let m := |A| and (ρ′, γ′) be an athermality state of system A′. Show that for sufficiently large m
$$\left(|0\rangle\langle 0|^{A}, u^{A}\right) \xrightarrow{\text{GPO}} (\rho', \gamma') . \qquad (17.146)$$
Exercise 17.6.6. Show that under GPO operations, the resource $\left(|0\rangle\langle 0|^{A}, u^{A}\right)$ is equivalent to the resource $\left(|0\rangle\langle 0|^{X}, u^{X}_{m}\right)$, where X is a two-dimensional classical system, m := |A|, and
$$u^{X}_{m} := \frac{1}{m}|0\rangle\langle 0|^{X} + \frac{m-1}{m}|1\rangle\langle 1|^{X} . \qquad (17.147)$$
The exercise above demonstrates that we can always consider the golden unit to be a qubit. Moreover, note that $u^{X}_{m}$ is well defined even if m is not an integer. This can help simplify certain expressions, and we will therefore also consider the states $\left(|0\rangle\langle 0|^{X}, u^{X}_{m}\right)$ with m ∈ R₊. We will use the notation
$$\Upsilon_m := \left(|0\rangle\langle 0|^{X}, u^{X}_{m}\right) . \qquad (17.148)$$
Exercise 17.6.7. Show that {Υm }m∈N satisfies the conditions of a golden unit outlined
in Definition 11.1.1.
We next consider the conversion distance from the golden unit Υm to an arbitrary state
(ρ, γ) of system A. Here we only consider GPO since GPC cannot generate coherence. By
definition,
$$T\!\left(\Upsilon_m\xrightarrow{\text{GPO}}(\rho,\gamma)\right) = \min_{\mathcal{E}\in\mathrm{CPTP}(X\to A)}\left\{\tfrac12\left\|\rho^{A}-\mathcal{E}\!\left(|0\rangle\langle 0|^{X}\right)\right\|_1 \,:\, \gamma^{A}=\mathcal{E}\!\left(u^{X}_{m}\right)\right\} . \qquad (17.155)$$
Denoting ω := E(|0⟩⟨0|) and τ := E(|1⟩⟨1|), the conversion distance can be simplified as
$$\begin{aligned}
T\!\left(\Upsilon_m\xrightarrow{\text{GPO}}(\rho,\gamma)\right)
&= \min_{\omega,\tau\in\mathrm{D}(A)}\left\{\tfrac12\left\|\rho-\omega\right\|_1 \,:\, \gamma=\tfrac1m\omega+\tfrac{m-1}{m}\tau\right\} \\
&= \min_{\omega\in\mathrm{D}(A)}\left\{\tfrac12\left\|\rho-\omega\right\|_1 \,:\, m\gamma\geqslant\omega\right\} \\
&= \min_{\omega\in\mathrm{D}(A)}\left\{\tfrac12\left\|\rho-\omega\right\|_1 \,:\, D_{\max}(\omega\|\gamma)\leqslant\log m\right\} .
\end{aligned} \qquad (17.156)$$
This expression will be instrumental in our calculations regarding the cost of athermality
under GPO.
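For concreteness, the last line of (17.156) is itself a small SDP; the following sketch (ours, using cvxpy, not part of the text) evaluates it for given ρ, γ, and m.

```python
# Sketch (not from the book): conversion distance from the golden unit Upsilon_m to (rho, gamma).
import cvxpy as cp

def golden_unit_conversion_distance(rho, gamma, m):
    d = rho.shape[0]
    omega = cp.Variable((d, d), hermitian=True)
    objective = 0.5 * cp.normNuc(rho - omega)        # (1/2)||rho - omega||_1
    constraints = [omega >> 0, cp.trace(omega) == 1, m * gamma >> omega]
    prob = cp.Problem(cp.Minimize(objective), constraints)
    prob.solve()
    return prob.value
```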
Exercise 17.6.10. Let (ρ, γ) be an athermality state of system A. Show that under GPC, for any m ∈ N,
$$T\!\left(\Upsilon_m\xrightarrow{\text{GPC}}(\rho,\gamma)\right) \geqslant \min_{\sigma\in\mathrm{D}(A)}\tfrac12\left\|\rho-\Delta(\sigma)\right\|_1 , \qquad (17.157)$$
where ∆ ∈ CPTP(A → A) is the completely dephasing channel (with respect to the basis of the Hamiltonian of system A).
Integrating this with the formulas from the preceding subsection that pertain to the conversion distance, we arrive at the following result. We denote by P ∈ CPTP(A → A) the pinching channel associated with the Hamiltonian of system A, and by $D^{\varepsilon}_{\min}$ the quantum hypothesis testing divergence as defined in (8.185).
Theorem 17.7.1. Let ε ∈ [0, 1]. For any athermality state (ρ, γ) of a quantum
system A, the ε-approximate single-shot distillation of athermality is given by:
This completes the proof of the first part. The second part of the proof follows from the first
part in conjunction with (17.127). This concludes the proof.
Observe that when we take ε = 0 we get that the exact single-shot distillation is given by
$$\mathrm{Distill}_{0}\!\left(\rho^{A},\gamma^{A}\right) = D_{\min}\!\left(\rho^{A}\,\middle\|\,\gamma^{A}\right) . \qquad (17.160)$$
This result gives a physical meaning to the min relative entropy as the exact single-shot distillation rate under GPO.
Theorem 17.7.2. Let ε ∈ [0, 1]. For any athermality state (ρ, γ) of system A, the ε-single-shot athermality cost (under GPO) is given by
Proof. Combining the expression (17.156) for the conversion distance together with the definition (17.161) gives
$$\begin{aligned}
\mathrm{Cost}^{\varepsilon}(\rho,\gamma)
&= \inf_{0<m\in\mathbb{R}}\left\{\log m \,:\, \tfrac12\left\|\rho-\omega\right\|_1\leqslant\varepsilon ,\; D_{\max}(\omega\|\gamma)\leqslant\log m ,\; \omega\in\mathrm{D}(A)\right\} \\
&= \inf\left\{D_{\max}(\omega\|\gamma) \,:\, \tfrac12\left\|\rho-\omega\right\|_1\leqslant\varepsilon ,\; \omega\in\mathrm{D}(A)\right\} \\
&= D^{\varepsilon}_{\max}(\rho\|\gamma) .
\end{aligned} \qquad (17.163)$$
This completes the proof.
This result provides a physical meaning to the max relative entropy as the exact single-shot
cost under GPO.
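The optimization in (17.163) can likewise be solved numerically; the sketch below (ours, using cvxpy, with logarithms taken in base 2) computes the smoothed max relative entropy $D^{\varepsilon}_{\max}(\rho\|\gamma)$, and hence $\mathrm{Cost}^{\varepsilon}(\rho,\gamma)$.

```python
# Sketch (not from the book): smoothed D_max, i.e. the epsilon-single-shot athermality cost under GPO.
import numpy as np
import cvxpy as cp

def smoothed_dmax(rho, gamma, eps):
    d = rho.shape[0]
    omega = cp.Variable((d, d), hermitian=True)
    lam = cp.Variable(nonneg=True)
    constraints = [
        omega >> 0,
        cp.trace(omega) == 1,
        0.5 * cp.normNuc(rho - omega) <= eps,   # omega is eps-close to rho in trace distance
        lam * gamma >> omega,                    # D_max(omega||gamma) <= log2(lam)
    ]
    cp.Problem(cp.Minimize(lam), constraints).solve()
    return np.log2(lam.value)
```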
Exercise 17.7.1. Let γ ∈ D>0(A) be the Gibbs state of system A with eigenvalues g₁, . . . , g_m. Let ψ_γ ∈ Pure(A) be the pure state
$$|\psi_\gamma\rangle := \sum_{x\in[m]}\sqrt{g_x}\,|x\rangle . \qquad (17.165)$$
Show that the exact single-shot athermality cost of (ψ_γ, γ) is equal to log(m).
Recall from Theorem 17.7.1 that in the single-shot regime, for any ε ∈ (0, 1), the distillable
athermality under GPO is given by
Note that in this case we did not need to take the limsup over n since the limit exists.
Therefore, under GPO, the asymptotic distillable athermality is given by the relative entropy
D(ρ∥γ). Remarkably, this is also the distillable rate under GPC and CTO.
Theorem 17.8.1. Let (ρ, γ) be an athermality state of a quantum system A, and let
ε ∈ (0, 1). Then, the distillable athermality under either CTO or GPC is given by
$$\mathrm{Distill}(\rho,\gamma) = \limsup_{n\to\infty}\frac1n\,\mathrm{Distill}^{\varepsilon}\!\left(\rho^{\otimes n},\gamma^{\otimes n}\right) = D(\rho\|\gamma) . \qquad (17.169)$$
Proof. Let ε ∈ (0, 1) and recall from Theorem 17.7.1 that the ε-single-shot distillable athermality under GPC or CTO is given by
$$\mathrm{Distill}^{\varepsilon}(\rho,\gamma) = D^{\varepsilon}_{\min}\!\left(\mathcal{P}(\rho)\,\middle\|\,\gamma\right) , \qquad (17.170)$$
where P is the pinching channel corresponding to the Hamiltonian of system A. Since P(γ) = γ we have
$$\mathrm{Distill}^{\varepsilon}(\rho,\gamma) = D^{\varepsilon}_{\min}\!\left(\mathcal{P}(\rho)\,\middle\|\,\mathcal{P}(\gamma)\right) \leqslant D^{\varepsilon}_{\min}(\rho\|\gamma) , \qquad (17.171)$$
where the inequality follows from the DPI. Thus,
$$\limsup_{n\to\infty}\frac1n\,\mathrm{Distill}^{\varepsilon}\!\left(\rho^{\otimes n},\gamma^{\otimes n}\right) \leqslant \limsup_{n\to\infty}\frac1n\,D^{\varepsilon}_{\min}\!\left(\rho^{\otimes n}\,\middle\|\,\gamma^{\otimes n}\right) = D(\rho\|\gamma) , \qquad (17.172)$$
where the equality follows from the quantum Stein's lemma.
To get the opposite inequality, for every n ∈ N let P_n ∈ CTO(A^n → A^n) denote the pinching channel associated with the Hamiltonian of system A^n. Now, fix k ∈ N and observe that for every ε ∈ (0, 1)
$$\begin{aligned}
\limsup_{n\to\infty}\frac1n\,\mathrm{Distill}^{\varepsilon}\!\left(\rho^{\otimes n},\gamma^{\otimes n}\right)
&= \limsup_{n\to\infty}\frac1n\,D^{\varepsilon}_{\min}\!\left(\mathcal{P}_n(\rho^{\otimes n})\,\middle\|\,\gamma^{\otimes n}\right) \\
&\geqslant \limsup_{n\to\infty}\frac1{nk}\,D^{\varepsilon}_{\min}\!\left(\mathcal{P}_{nk}(\rho^{\otimes nk})\,\middle\|\,\gamma^{\otimes nk}\right) \\
&\geqslant \limsup_{n\to\infty}\frac1{nk}\,D^{\varepsilon}_{\min}\!\left(\mathcal{P}_{k}^{\otimes n}\circ\mathcal{P}_{nk}(\rho^{\otimes nk})\,\middle\|\,\mathcal{P}_{k}^{\otimes n}\!\left(\gamma^{\otimes nk}\right)\right) ,
\end{aligned} \qquad (17.173)$$
where in the last line we used the data processing inequality with the channel $\mathcal{P}_{k}^{\otimes n}$. Now, the Gibbs state is invariant under the pinching channel and in particular $\mathcal{P}_{k}^{\otimes n}\!\left(\gamma^{\otimes nk}\right) = \gamma^{\otimes nk}$. Moreover, from Exercise 15.2.3 it follows that $\mathcal{P}_{k}^{\otimes n}\circ\mathcal{P}_{nk} = \mathcal{P}_{k}^{\otimes n}$. We therefore get that
$$\begin{aligned}
\limsup_{n\to\infty}\frac1n\,\mathrm{Distill}^{\varepsilon}\!\left(\rho^{\otimes n},\gamma^{\otimes n}\right)
&\geqslant \limsup_{n\to\infty}\frac1{nk}\,D^{\varepsilon}_{\min}\!\left(\mathcal{P}_{k}^{\otimes n}(\rho^{\otimes nk})\,\middle\|\,\gamma^{\otimes nk}\right) \\
&= \frac1k\limsup_{n\to\infty}\frac1n\,D^{\varepsilon}_{\min}\!\left(\left(\mathcal{P}_{k}(\rho^{\otimes k})\right)^{\otimes n}\,\middle\|\,\left(\gamma^{\otimes k}\right)^{\otimes n}\right) \\
&= \frac1k\,D\!\left(\mathcal{P}_{k}\!\left(\rho^{\otimes k}\right)\,\middle\|\,\gamma^{\otimes k}\right) ,
\end{aligned} \qquad (17.174)$$
where in the last line we used the quantum Stein's lemma. The above inequality can also be understood physically by observing that the state $\sigma_k := \mathcal{P}_k\!\left(\rho^{\otimes k}\right)$ is quasi-classical, and consequently, it has a distillable athermality rate given by $D(\sigma_k\|\gamma^{\otimes k})$. Now, since the above inequality holds for all k ∈ N we conclude that
$$\limsup_{n\to\infty}\frac1n\,\mathrm{Distill}^{\varepsilon}\!\left(\rho^{\otimes n},\gamma^{\otimes n}\right) \geqslant \limsup_{k\to\infty}\frac1k\,D\!\left(\mathcal{P}_{k}\!\left(\rho^{\otimes k}\right)\,\middle\|\,\gamma^{\otimes k}\right) = D(\rho\|\gamma) , \qquad (17.175)$$
where the equality follows from (17.93). This completes the proof.
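Numerically, the asymptotic rate in Theorem 17.8.1 is simply the Umegaki relative entropy; a short helper (ours, not part of the text, assuming ρ and γ have full support) is given below.

```python
# Sketch (not from the book): the asymptotic distillable athermality D(rho||gamma), in bits.
import numpy as np
from scipy.linalg import logm

def relative_entropy(rho, gamma):
    # Assumes rho and gamma are full-rank density matrices (otherwise restrict to supp(rho)).
    return np.real(np.trace(rho @ (logm(rho) - logm(gamma)))) / np.log(2)
```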
where the sum runs over all sequences x^n ∈ [m]^n of the same type t. With the above notations
$$|\psi\rangle^{\otimes n} = \sum_{t\in\mathrm{Type}(n,m)}\sqrt{q_{t,n}}\,|t\rangle^{A^n} , \qquad (17.180)$$
where
$$q_{t,n} := \binom{n}{nt_1,\ldots,nt_m}\,2^{-n\left(H(t)+D(t\|p)\right)} . \qquad (17.181)$$
Note that the vectors $|t\rangle^{A^n}$ are eigenvectors of the Hamiltonian of system A^n. Specifically,
$$H^{A^n}|t\rangle^{A^n} = n\sum_{x\in[m]}t_x a_x\,|t\rangle^{A^n} , \qquad (17.182)$$
so that the energy in the state $|t\rangle^{A^n}$ is n times the average energy with respect to the type t.
Exercise 17.8.1. Consider the generic case, in which the energy eigenvalues {a1 , . . . , am }
are rationally independent; i.e. for any set of m integers ℓ1 , . . . , ℓm ∈ Z we have
ℓ1 a1 + · · · + ℓm am = 0 ⇐⇒ ℓ1 = ℓ2 = · · · = ℓm = 0 . (17.183)
Show that under this mild assumption (which we will not assume in the text), for every n ∈ N, the number of distinct eigenvalues of $H^{A^n}$ equals |Type(n, m)|. That is, each energy eigenvalue of $H^{A^n}$ corresponds to exactly one type.
Given that each $|t\rangle^{A^n}$ is an energy eigenstate, it naturally follows from (17.180) that we can express |ψ^{⊗n}⟩ as a linear combination of at most |Type(n, m)| ⩽ (n + 1)^m energy eigenstates. In simpler terms, the coherence inherent in |ψ^{⊗n}⟩ can be compactly represented within an (n + 1)^m-dimensional vector (a dimension polynomial in n).
This observation leads to a notable implication. As established in Corollary 17.5.2, for any mixed state in D(A) there exists a pure state in Pure(A) that can be converted into it via GPC. When we couple this insight with the aforementioned observation, a significant deduction emerges: the pure-state coherence cost for preparing ρ^{⊗n} ∈ D(A^n) cannot exceed m log(n + 1). To put it differently, the rate of asymmetry cost – the coherence expense per copy of ρ – cannot exceed m log(n + 1)/n, a ratio that approaches zero in the limit n → ∞. In contrast, the non-uniformity cost does not go to zero in the asymptotic limit since the energy of ρ^{⊗n} grows linearly with n.
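The counting bound above is easy to see numerically (the snippet below is ours, not part of the text): the number of types |Type(n, m)| = C(n + m − 1, m − 1) grows only polynomially in n, so the per-copy coherence rate m log(n + 1)/n vanishes.

```python
# Sketch (not from the book): polynomial growth of the number of types and the vanishing rate.
from math import comb, log2

def num_types(n, m):
    return comb(n + m - 1, m - 1)   # |Type(n, m)|

m = 3
for n in (10, 100, 1000):
    print(n, num_types(n, m), (n + 1) ** m, m * log2(n + 1) / n)
```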
In summary, athermality is made up of two main resources: nonuniformity and time-
translation asymmetry, the latter of which is often referred to as coherence. Because of
this, the costs related to athermality states can be categorized into two parts: the cost
of nonuniformity and the cost of coherence. However, the coherence cost decreases and
approaches zero in the asymptotic limit, necessitating a unique form of rescaling. This
complexity lends a subtle character to the resource theory of quantum athermality, leaving
several critical questions within the theory still unresolved.
Therefore, for any ε > 0 and sufficiently large n, the state |ψ⟩^{⊗n} can be made arbitrarily close to the state
$$|\psi^{n}_{\varepsilon}\rangle := \frac{1}{\sqrt{\nu_\varepsilon}}\sum_{t\in S_{n,\varepsilon}}\sqrt{q_{t,n}}\,|t\rangle^{A^n} \quad\text{where}\quad \nu_\varepsilon := \sum_{t\in S_{n,\varepsilon}}q_{t,n} . \qquad (17.187)$$
From (17.182) the energy of any type t ∈ Type(n, m) is given by $\mu_t := n\sum_{x\in[m]}t_x a_x$. In the sum above, the types t belong to S_{n,ε}, so that ½∥t − p∥₁ ⩽ ε. Consequently, each component x ∈ [m] of the vector t − p satisfies |t_x − p_x| ⩽ 2ε. Using this property, we get that
$$|\mu_t - \mu_p| \leqslant n\sum_{x\in[m]}a_x|t_x - p_x| \leqslant 2n\varepsilon\sum_{x\in[m]}a_x . \qquad (17.188)$$
Therefore, for any two types t, t′ ∈ Type(n, m) that are ε-close to p we have
$$|\mu_t - \mu_{t'}| \leqslant 4n\varepsilon\sum_{x\in[m]}a_x . \qquad (17.189)$$
In other words, the energy spread of $|\psi^{n}_{\varepsilon}\rangle$ is no greater than $4n\varepsilon\sum_{x\in[m]}a_x$.
Note that by taking ε > 0 sufficiently small we can make the energy spread $4n\varepsilon\sum_{x\in[m]}a_x$ much smaller than na_m. However, the energy spread of ψ^n_ε is still linear in n. We show now that by taking ε to depend on n, we can find states in Pure(A^n) that are very close to ψ^{⊗n} but with energy spread that is sublinear in n.
Lemma 17.8.1. Let ψ ∈ Pure(A) and α ∈ (1/2, 1). There exists a sequence of pure states {χ_n}_{n∈N} in Pure(A^n) with the following properties:
1. The limit
$$\lim_{n\to\infty}\left\|\psi^{\otimes n}-\chi_n\right\|_1 = 0 . \qquad (17.190)$$
Proof. Let ε_n = n^{α−1}. Since α ∈ (1/2, 1) we have lim_{n→∞} ε_n = 0 and lim_{n→∞} nε_n² = ∞. The latter implies that if we replace ε in (17.186) with ε_n we still get the zero limit in (17.186). Hence, the pure state χ_n := ψ^n_{ε_n} satisfies (17.190). Since for all ε > 0 the state ψ^n_ε can be expressed as a linear combination of no more than (n + 1)^m energy eigenvectors, it follows that χ_n also has this property. Finally, from (17.189) we get that the energy spread of χ_n cannot exceed
$$4n\varepsilon_n\sum_{x\in[m]}a_x = 4n^{\alpha}\sum_{x\in[m]}a_x . \qquad (17.191)$$
$$\langle\chi_n|H^{\otimes n}|\chi_n\rangle \qquad (17.192)$$
significant coherence among energy levels since the coherence grows logarithmically with n. Indeed, as we will see shortly, such resources make the QRT of athermality reversible.
$$\left\|H^{R_n}\right\|_{\infty} \leqslant c\,n^{\alpha} \qquad \forall\,n\in\mathbb{N} . \qquad (17.193)$$
The key assumption in the given definition is that the energy of the systems R_n grows sublinearly with n. Consequently, as n approaches infinity, the resourcefulness of any states $\left\{(\omega^{R_n},\gamma^{R_n})\right\}_{n\in\mathbb{N}}$ becomes insignificant compared to the resourcefulness of n copies of the golden unit $\Upsilon_2 := \left(|0\rangle\langle 0|^{X}, u^{X}_{2}\right)$. We will soon discover that this small amount of athermality resource is sufficient to restore reversibility.
Exercise 17.8.3. Show that the distillation rate of athermality as given in Theorem 17.8.1
does not change if we replace CTO by CTO+SLAR. In other words, show that SLAR cannot
increase the distillation rate of athermality.
From Corollary 17.5.2 and Exercise 17.5.8 it follows that any mixed state in D(A) can be
obtained by thermal operations from a pure state in Pure(A). Thus, we can restrict the
minimum above over all density matrices ϕ ∈ D(A) to a minimum over all pure states
ϕ ∈ Pure(A).
Exercise 17.8.4. Let ε ∈ (0, 1/2), ρ, σ, γ ∈ D(A), and suppose that ρ ≈ε σ. Show that for
any system R
$$\mathrm{Cost}^{2\varepsilon}_{R}(\rho,\gamma) \leqslant \mathrm{Cost}^{\varepsilon}_{R}(\sigma,\gamma) . \qquad (17.195)$$
With the above definition of the R-assisted single-shot athermality cost, we define the
asymptotic SLAR-assisted athermality cost as
$$\mathrm{Cost}(\rho,\gamma) := \inf_{\{R_n\}}\lim_{\varepsilon\to 0^{+}}\liminf_{n\to\infty}\frac1n\,\mathrm{Cost}^{\varepsilon}_{R_n}\!\left(\rho^{\otimes n},\gamma^{\otimes n}\right) , \qquad (17.196)$$
where the infimum is over all SLARs, {Rn }n∈N . We show now that for pure states the above
cost can be expressed in terms of the relative entropy. The proof of the mixed-state case is
far more complicated (see the discussion in the ‘Notes and References’ section at the end of
this chapter).
Theorem 17.8.2. Let (ψ, γ) be an athermality state with ψ ∈ Pure(A). Then, the
SLAR-assisted athermality cost of (ψ, γ) is given by
Proof. Since the cost of athermality under CTO assisted with SLAR cannot be smaller than the distillation rate under the same operations, we get from Exercise 17.8.3 that
$$\mathrm{Cost}(\psi,\gamma) \geqslant D(\psi\|\gamma) . \qquad (17.198)$$
Our goal is therefore to prove the opposite inequality.
Let ε ∈ (0, 1/2) and let {χ_n}_{n∈N} be the sequence of pure states that satisfies all the properties outlined in Lemma 17.8.1. In particular, each χ_n is very close to ψ^{⊗n} (for n sufficiently large), so that for sufficiently large n we have (see Exercise 17.8.4)
$$\mathrm{Cost}^{2\varepsilon}_{R_n}\!\left(\psi^{\otimes n},\gamma^{\otimes n}\right) \leqslant \mathrm{Cost}^{\varepsilon}_{R_n}\!\left(\chi_n,\gamma^{\otimes n}\right) . \qquad (17.199)$$
Therefore, we focus now on finding an upper bound on $\mathrm{Cost}^{\varepsilon}_{R_n}\!\left(\chi_n,\gamma^{\otimes n}\right)$.
By definition, the energy spread of χ_n is given by $4n^{\alpha}\sum_{x\in[m]}a_x$ for some α ∈ (1/2, 1), and each χ_n has the form (cf. (17.187))
$$|\chi_n\rangle = \sum_{t\in S_n}\sqrt{q_t}\,|t\rangle^{A^n} , \qquad (17.200)$$
where S_n is the set of all types t ∈ Type(n, m) that satisfy ½∥t − p∥₁ ⩽ n^{α−1} (i.e., using the same notations discussed above (17.184), we have S_n := S_{n,ε_n} with ε_n := n^{α−1}), and {q_t}_{t∈S_n} forms a probability distribution over the set of types in S_n. Let k_n be the number of terms in the superposition above (hence k_n ⩽ (n + 1)^m). Furthermore, let the set {µ_j}_{j∈[k_n]} denote the energy eigenvalues of the Hamiltonian $H^{A^n}$ that correspond to the energy eigenvectors $|t\rangle^{A^n}$ appearing in the superposition (17.200). That is, each j ∈ [k_n] corresponds to exactly one type t that appears in the superposition (17.200). Although the energy eigenvalues {µ_j} depend also on n, we do not add a subscript n in order to ease the notation. Without loss of generality we also assume that µ₁ ⩽ · · · ⩽ µ_{k_n}, so that the energy spread of χ_n is $\mu_{k_n}-\mu_1 \leqslant 4n^{\alpha}\sum_{x\in[m]}a_x$ (see Lemma 17.8.1). We will also denote by $s^{(n)} \in S_n$ the type that corresponds to the smallest energy µ₁, and by $z^n \in [m]^n$ a sequence of type $s^{(n)}$, so that $H^{A^n}|z^n\rangle^{A^n} = \mu_1|z^n\rangle^{A^n}$.
With these notations, we are ready to define the SLAR system R_n to be a k_n-dimensional quantum system whose Hamiltonian is given by
$$H^{R_n} = \sum_{j\in[k_n]}(\mu_j-\mu_1)\,|j\rangle\langle j|^{R_n} . \qquad (17.201)$$
Note that the Hamiltonian $H^{R_n}$ has the same eigenvalues as the energies that appear in χ_n, shifted by µ₁. Observe that $|1\rangle\langle 1|^{R_n}$ is a zero-energy state of system R_n, and the maximal energy of $H^{R_n}$ is given by $\mu_{k_n}-\mu_1 \leqslant 4n^{\alpha}\sum_{x\in[m]}a_x$, so that {R_n}_{n∈N} is indeed a SLAR. We take the state of the SLAR system R_n to be
$$|\phi^{R_n}\rangle := \sum_{j\in[k_n]}\sqrt{q_j}\,|j\rangle^{R_n} , \qquad (17.202)$$
where q_j := q_t with t being the type that corresponds to the energy µ_j. By construction, the state
$$\phi^{R_n}\otimes|z^n\rangle\langle z^n|^{A^n} \qquad (17.203)$$
has the exact same energy distribution as the state
$$|1\rangle\langle 1|^{R_n}\otimes\chi^{A^n}_{n} \qquad (17.204)$$
(recall that $|1\rangle^{R_n}$ corresponds to the zero energy of system R_n). Hence, the above two states are equivalent resources and can be converted from one to the other by reversible thermal operations (i.e., an energy-preserving unitary). We now use this resource equivalence to compute the cost of χ_n in terms of the cost of the quasi-classical state $|z^n\rangle\langle z^n|$. We do it in three steps:
1. Replacing $\chi^{A^n}_{n}$ with $|1\rangle\langle 1|^{R_n}\otimes\chi^{A^n}_{n}$: By adding the resource $\left(|1\rangle\langle 1|^{R_n},\gamma^{R_n}\right)$ we can only increase the cost. Therefore,
$$\mathrm{Cost}^{\varepsilon}_{R_n}\!\left(\chi^{A^n}_{n},\gamma^{A^n}\right) \leqslant \mathrm{Cost}^{\varepsilon}_{R_n}\!\left(|1\rangle\langle 1|^{R_n}\otimes\chi^{A^n}_{n},\gamma^{R_nA^n}\right) . \qquad (17.205)$$

2. Replacing $|1\rangle\langle 1|^{R_n}\otimes\chi^{A^n}_{n}$ with $\phi^{R_n}\otimes|z^n\rangle\langle z^n|^{A^n}$: As discussed above, these two states are equivalent resources so that
$$\mathrm{Cost}^{\varepsilon}_{R_n}\!\left(|1\rangle\langle 1|^{R_n}\otimes\chi^{A^n}_{n},\gamma^{R_nA^n}\right) = \mathrm{Cost}^{\varepsilon}_{R_n}\!\left(\phi^{R_n}\otimes|z^n\rangle\langle z^n|^{A^n},\gamma^{R_nA^n}\right) . \qquad (17.206)$$

3. Replacing $\phi^{R_n}\otimes|z^n\rangle\langle z^n|^{A^n}$ with $|z^n\rangle\langle z^n|^{A^n}$: The cost of $|z^n\rangle\langle z^n|$ without the assistance of R_n cannot be smaller than the cost of $\phi^{R_n}\otimes|z^n\rangle\langle z^n|$ with the assistance of R_n, since the latter is defined in terms of a minimum over all states in D(R_n) (see the minimization in (17.194)). Therefore,
$$\mathrm{Cost}^{\varepsilon}_{R_n}\!\left(\phi^{R_n}\otimes|z^n\rangle\langle z^n|^{A^n},\gamma^{R_nA^n}\right) \leqslant \mathrm{Cost}^{\varepsilon}\!\left(|z^n\rangle\langle z^n|^{A^n},\gamma^{A^n}\right) . \qquad (17.207)$$
Combining the three steps above with (17.199), and using the fact that in the quasi-classical regime GPO has the same conversion power as CTO (see Theorem 17.3.1), we get that
$$\begin{aligned}
\mathrm{Cost}^{2\varepsilon}_{R_n}\!\left(\psi^{\otimes n},\gamma^{\otimes n}\right)
&\leqslant \mathrm{Cost}^{\varepsilon}\!\left(|z^n\rangle\langle z^n|,\gamma^{\otimes n}\right) \\
&= D^{\varepsilon}_{\max}\!\left(|z^n\rangle\langle z^n|\,\middle\|\,\gamma^{\otimes n}\right) && \text{by Theorem 17.7.1} \\
&\leqslant D_{\max}\!\left(|z^n\rangle\langle z^n|\,\middle\|\,\gamma^{\otimes n}\right) ,
\end{aligned} \qquad (17.208)$$
where in the last inequality we used the fact that D_max is always no smaller than its smoothed version. Now, observe that
$$D_{\max}\!\left(|z^n\rangle\langle z^n|\,\middle\|\,\gamma^{\otimes n}\right) = -\log\left\langle z^n\middle|\gamma^{\otimes n}\middle|z^n\right\rangle = -n\sum_{x\in[m]}s^{(n)}_{x}\log\langle x|\gamma|x\rangle , \qquad (17.209)$$
where in the last equality we used the fact that the sequence z^n has type s^{(n)}. Hence, the cost per copy of ψ cannot exceed
$$\begin{aligned}
\limsup_{n\to\infty}\frac1n\,\mathrm{Cost}^{2\varepsilon}_{R_n}\!\left(\psi^{\otimes n},\gamma^{\otimes n}\right)
&\leqslant \limsup_{n\to\infty}\frac1n\,D_{\max}\!\left(|z^n\rangle\langle z^n|\,\middle\|\,\gamma^{\otimes n}\right) \\
&= -\lim_{n\to\infty}\sum_{x\in[m]}s^{(n)}_{x}\log\langle x|\gamma|x\rangle \\
&= -\sum_{x\in[m]}p_x\log\langle x|\gamma|x\rangle && \tfrac12\left\|p-s^{(n)}\right\|_1\leqslant n^{\alpha-1} \\
&= D(\psi\|\gamma) .
\end{aligned} \qquad (17.210)$$
This completes the proof.
Exercise 17.8.5. Prove explicitly the second line in (17.209).
operations as given in Lemma 17.2.1 is due to [89]. The statement that in the quasi-classical
regime, CTO and GPO have the same conversion power (see Theorem 17.3.1) was first proved
in [137]. However, for the convertibility among general states (i.e., those not commuting
with the Hamiltonian), in [74] an example was given, demonstrating that GPO are strictly
more powerful than CTO. The set of Gibbs-Preserving Covariant (GPC) operations were
introduced in [151].
The characterization of quantum relative majorization in terms of semi-definite program-
ming can be found in [91]. Moreover, in [40] partial characterization of quantum relative
majorization was given in terms of an extension of Lorenz curves to the quantum domain.
The elegant characterization of quantum relative majorization in the (partially) qubit case
(i.e., Theorem 17.5.2) is due to [118]. Another characterization in which all states are qubits
was given in [4].
Corollaries 17.5.1 and 17.5.2, and Theorems 17.5.3 and 17.5.4, can be found in [89]. More
information on coherences in the theory of athermality, along with another set of constraints
similar to the one given in Theorem 17.5.4 can be found in [135]. More details on the SDP
formulation of exact interconversions in the theory of athermality can be found in [91].
In our proof of Theorem 17.8.2, we primarily drew from the work presented in [89].
Although the proof for the mixed state variant of the theorem was initially introduced
in [31], a more comprehensive and rigorous proof was later provided in the broader context
of [202]. It’s important to highlight that the proof outlined in [202] (specifically, Theorem 1)
stipulates that the
√
ancillary system, referred to (in this book) as the SLAR, should possess
a dimension of 2 n log n . Consequently, a lingering question remains regarding the possibility
of reducing this dimension to Poly(n), as is feasible in the pure-state scenario.
Appendices

APPENDIX A

Elements of Convex Analysis
We describe here a few properties of convex sets in a finite-dimensional (real) Hilbert space (e.g. R^n) that are used quite often in quantum information. A set C ⊂ R^n is said to be convex if for any two elements v, u ∈ C and any t ∈ [0, 1] the vector tv + (1 − t)u ∈ C. Consequently, if v₁, . . . , v_m ∈ C and p₁, . . . , p_m are non-negative with $\sum_{x\in[m]}p_x=1$ then
$$\sum_{x\in[m]}p_x\mathbf{v}_x \in C . \qquad (A.1)$$
Remark. The hyperplane separation theorem has numerous applications in convex analysis
Figure A.1: (a) A separating hyperplane between two polytopes. (b) A separating hyperplane does not exist since one of the sets is not convex.
and beyond. Consequently, it has many variants and also has several proofs. Since this theorem has been used many times in this book, we provide its proof below for the purpose of self-containment. This is by no means intended to replace a more thorough study of the subject. A reader interested in more details can follow standard textbooks on convex analysis.
Proof. We will define the vector n and then show that it has all the desired properties. The key idea is to use the fact (see the proof below) that if C ⊆ R^n is closed and convex then there exists a unique vector in C with minimum (Euclidean) norm. The vector n will then be taken to be the vector with minimal norm in the closure of C₁ − C₂. We now discuss the details.
Let C := $\overline{C_1 - C_2}$ be the closure of the set {r₁ − r₂ : r₁ ∈ C₁, r₂ ∈ C₂}. Since the latter is convex, its closure C is also convex (see Exercise A.1.1). Let d := inf{∥n∥² : n ∈ C}. Geometrically, d is the (squared) distance between the two sets. Note that since C₁ and C₂ are disjoint, the set C₁ − C₂ does not contain the zero vector. However, its closure may contain it. We will first consider the case that d > 0, and later treat the case d = 0.
By the definition of d, there exists a sequence n_j ∈ C such that ∥n_j∥² → d. This sequence is a Cauchy sequence since, by the parallelogram law, ∥n_j − n_k∥² = 2∥n_j∥² + 2∥n_k∥² − ∥n_j + n_k∥², and ∥n_j + n_k∥² = 4∥(n_j + n_k)/2∥² ⩾ 4d since the convex combination (n_j + n_k)/2 ∈ C. Hence,
$$\|\mathbf{n}_j-\mathbf{n}_k\|^2 \leqslant 2\|\mathbf{n}_j\|^2 + 2\|\mathbf{n}_k\|^2 - 4d , \qquad (A.4)$$
which goes to zero as j, k → ∞. We define n ∈ C to be the limit of {nj }j∈N . Next, let
r1 ∈ C1 and r2 ∈ C2 , and observe that since both r1 − r2 and n are elements of C, any convex
combination t(r1 − r2 ) + (1 − t)n with t ∈ (0, 1) is also in C. Therefore, its square norm
cannot be smaller than d. Hence,
$$\begin{aligned}
d &\leqslant \|t(\mathbf{r}_1-\mathbf{r}_2)+(1-t)\mathbf{n}\|^2 \\
&= t^2\|\mathbf{r}_1-\mathbf{r}_2\|^2 + 2t(1-t)(\mathbf{r}_1-\mathbf{r}_2)\cdot\mathbf{n} + (1-t)^2 d .
\end{aligned} \qquad (A.5)$$
Finally, since the above inequality holds for all t ∈ (0, 1) it must also hold for t = 0. That is,
Note that if d > 0 this implies (A.2) (see Exercise A.1.2). It is therefore left to check the
case d = 0.
Suppose first that the interior of C₁ − C₂ is not empty. Then, there exists a sequence K₁ ⊂ K₂ ⊂ · · · of non-empty closed subsets of the interior of C₁ − C₂ such that their union is the interior of C₁ − C₂. Since C₁ − C₂ does not contain the zero vector (recall that C₁ and C₂ are disjoint sets), each K_j ⊆ C₁ − C₂ does not contain the zero vector. Moreover, since K_j is closed, it contains a non-zero vector n_j ∈ K_j with minimal norm.
We now apply the same argument leading to (A.7) with C₁ replaced by K_j and C₂ replaced by the zero set {0} (which is disjoint from K_j). For such choices, d in (A.7) equals ∥n_j∥², so that (A.7) becomes 0 ⩽ ∥n_j∥² ⩽ v · n_j for all v ∈ K_j. We can therefore normalize all the {n_j} and argue that they satisfy v · n_j ⩾ 0 for all v ∈ K_j. Finally, the sequence of normalized vectors {n_j} contains a convergent subsequence (since the sphere in R^n is compact), and therefore its limit n also satisfies v · n ⩾ 0 for all v in the interior of C₁ − C₂. Hence, by continuity, the inequality v · n ⩾ 0 must also hold for all v in C₁ − C₂ itself. This completes the proof for the case that the interior of C₁ − C₂ is not empty.
If the interior of C1 − C2 is empty then its span has a dimension strictly smaller than
the dimension of the whole space. Therefore, it is contained in some hyperplane {v ∈ Rn :
v · n = c} so that v · n ⩾ c for all v in C1 − C2 . As we argued before, this implies (A.2).
The remaining part of the proof for the case that C1 , C2 are closed and compact is left as an
exercise.
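The construction used in this proof is also a practical recipe: the minimum-norm point of C₁ − C₂ can be found numerically, and it yields a separating normal whenever the two sets are disjoint. The sketch below (ours, not part of the text, using cvxpy) does this for two polytopes specified by their vertices.

```python
# Sketch (not from the book): separating normal via the minimum-norm point of C1 - C2.
import numpy as np
import cvxpy as cp

def separating_normal(points1, points2):
    """points1, points2: arrays of shape (k_i, n) whose convex hulls are assumed disjoint."""
    k1, k2 = len(points1), len(points2)
    p = cp.Variable(k1, nonneg=True)
    q = cp.Variable(k2, nonneg=True)
    diff = points1.T @ p - points2.T @ q          # a generic point of C1 - C2
    cp.Problem(cp.Minimize(cp.sum_squares(diff)),
               [cp.sum(p) == 1, cp.sum(q) == 1]).solve()
    return diff.value                              # the normal n; nonzero iff the hulls are disjoint

C1 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
C2 = np.array([[2.0, 2.0], [3.0, 2.0], [2.0, 3.0]])
n = separating_normal(C1, C2)   # every v in C1 - C2 satisfies v . n >= ||n||^2 > 0
```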
Exercise A.1.1. Show that if C1 and C2 are two convex subsets of Rn then C1 − C2 is also
convex.
Exercise A.1.3. Complete the proof above. That is, show that (A.2) holds with strict
inequalities if C1 and C2 are closed and at least one of them is compact.
Exercise A.1.4. Show that if C1 and C2 are two disjoint convex subsets of Rn , and if C1 is
open in Rn , then there exist a nonzero vector n ∈ Rn and a real number c ∈ R such that
Hint: Use the theorem above and the fact that separating hyperplanes cannot intersect the
interiors of convex sets.
Note that by definition, the convex hull of a single vector v ∈ R^n consists of just the vector v. As a simple example, consider the set Prob(m) consisting of all probability vectors in R^m; that is, Prob(m) denotes the set of all m-dimensional vectors with non-negative components that sum to one. It is simple to check that Prob(m) = Conv{e₁, . . . , e_m}, where e_x denotes the x-th standard basis vector of R^m. Hence, the set of all m-dimensional probability vectors is a polytope in R^m.
Face
Definition A.2.1. Consider a convex set C ⊆ R^n. A subset F ⊆ C is called a face of C if for any v ∈ F and any v₁, v₂ ∈ C such that v ∈ (v₁, v₂) we have v₁, v₂ ∈ F. Equivalently, for any v₁, v₂ ∈ C,
$$F\cap(\mathbf{v}_1,\mathbf{v}_2)\neq\varnothing \;\Rightarrow\; \mathbf{v}_1,\mathbf{v}_2\in F . \qquad (A.12)$$
To have a better understanding of this definition, let C := Conv{v₁, . . . , v_m} be the convex hull of m vectors in R^n (i.e. C is a polytope), and let $\sum_{x\in[m]}p_x\mathbf{v}_x$ be a vector that belongs to a face F of the polytope C. Then, for any x ∈ [m] with p_x ∈ (0, 1) we must have v_x ∈ F.
Hence, any face of C must be a convex hull of a subset of {v1 , . . . , vm }. Note, however, that
the converse is not necessarily true. That is, a convex hull of a subset of {v1 , . . . , vm } is not
necessarily a face.
For any x ∈ [m] the set {vx } (consisting of a single vector) is a face of the convex polytope
C ⊂ Rn . It is also called a vertex of the polytope. Any face F of C that can be expressed
as F = Conv{vx , vy }, where x, y ∈ [m] and x ̸= y is called an edge of the polytope C. Note
that we do not claim that Conv{vx , vy } is necessarily a face, only that if it is a face, then
Figure A.2: Faces of a 3D cube. The dashed line is not a face since it contains points in open
intervals (the purple line) with end points that are outside of the dashed line.
it is called an edge. Finally, a facet of C is a face that can be expressed as a convex hull of
n − 1 distinct vectors in {v1 , . . . , vm }. Therefore, faces of convex sets generalize the notion
of vertices, edges and facets of polytopes (see Fig. A.2).
Every vector w ∈ R^n can be used to define a face of a compact convex set C ⊂ R^n given by
$$F_{\mathbf{w}} := \left\{\mathbf{v}\in C \,:\, \mathbf{w}\cdot\mathbf{v} = \max_{\mathbf{u}\in C}\mathbf{w}\cdot\mathbf{u}\right\} . \qquad (A.13)$$
To show that this set is indeed a face of C, observe first that F_w is non-empty since C is a compact set. Now, let v = tv₁ + (1 − t)v₂ where v ∈ F_w, t ∈ (0, 1), and v₁, v₂ ∈ C. Then, by definition
$$\max_{\mathbf{u}\in C}\mathbf{w}\cdot\mathbf{u} = \mathbf{w}\cdot\mathbf{v} = t\,\mathbf{w}\cdot\mathbf{v}_1 + (1-t)\,\mathbf{w}\cdot\mathbf{v}_2 \leqslant t\max_{\mathbf{u}\in C}\mathbf{w}\cdot\mathbf{u} + (1-t)\max_{\mathbf{u}\in C}\mathbf{w}\cdot\mathbf{u} = \max_{\mathbf{u}\in C}\mathbf{w}\cdot\mathbf{u} . \qquad (A.14)$$
Hence, the inequality above must be an equality which can only hold if both w · v1 =
maxu∈C w · u and w · v2 = maxu∈C w · u. That is, v1 , v2 ∈ Fw .
Exercise A.2.1. Show that if v ∈ Fw then any vector v′ ∈ C with the property that
(v − v′ ) · w = 0 (A.15)
is also in Fw .
In other words, an extreme point is a point that cannot be expressed as tv + (1 − t)w, for
some t ∈ (0, 1) and two distinct vectors v, w ∈ C (i.e. v ̸= w). Observe that by definition if
a convex set C = {v} consists of a single vector v ∈ Rn then v is an extreme point of C.
Krein–Milman theorem
Theorem A.3.1. Every compact convex set of Rn equals to the closed convex hull
of its extreme points.
Remark. The theorem above indicates the significance and importance of extreme points
in convex analysis. The theorem implies in particular that the set of extreme points of a
compact convex set in Rn is non-empty. In its proof below we make use of the Zorn’s lemma
from set theory.
Proof. Let C ⊆ Rn be a non-empty compact convex set. We first prove that the set of extreme
points of C is non-empty. If C consists of a single vector then we are done. Otherwise, let
v1 , v2 ∈ C be two distinct vectors (i.e. v1 ̸= v2 ). From the hyperplane separation theorem
(see Theorem A.1.1) there exists a vector w1 ∈ Rn such that w1 · v1 > w1 · v2 . This implies
that the face Fw1 of C does not contain the point v2 (see the definition of Fw in (A.13)).
We next apply the same procedure to Fw1 . Specifically, if this set contains a single point
then that point is an extreme point, and from Exercise A.3.1 it is also an extreme point
of C so that we are done. Otherwise, the face Fw1 contains two vectors (that are not the
same) that can be separated by a hyperplane with a normal vector w2 . Hence, the face
Fw2 := {v ∈ Fw1 : w2 · v = maxu∈Fw1 w2 · u} of Fw1 does not contain one of the two vectors.
Continuing in this way, if the process does not stop at some step j for which Fwj contains a
single point (and therefore it must be an extreme point), then we get an infinite sequence of
faces $\{F_{\mathbf{w}_j}\}_{j=1}^{\infty}$ that are ordered by strict inclusion
Such a sequence of compact closed convex sets has a minimal element (Zorn’s lemma) which
we denote by F. From the Exercise A.3.2 below, it follows that F is itself a face of C.
Therefore, if it contains more than one point then we can continue with the same procedure
Carathéodory’s Theorem
Theorem A.3.2. Let K be a subset of Rn . If v ∈ Conv(K) then v can be written as
a convex combination of at most n + 1 elements of K.
If m ⩽ n + 1 then we are done. Otherwise, m > n + 1, so that the vectors w₂ − w₁, . . . , w_m − w₁ must be linearly dependent (since there are m − 1 > n of them). Let λ₂, . . . , λ_m ∈ R be m − 1 numbers, not all zero, such that
$$\sum_{x=2}^{m}\lambda_x(\mathbf{w}_x-\mathbf{w}_1) = \mathbf{0} . \qquad (A.20)$$
Observe that since $\sum_{x\in[m]}\lambda_x=0$ the set {λ_x}_{x∈[m]} contains at least one strictly positive number (as we assume that not all of them are zero). We can therefore define
$$\mu := \min\left\{\frac{p_x}{\lambda_x} \,:\, \lambda_x>0 ,\; x\in[m]\right\} . \qquad (A.22)$$
By definition, µ has the property that q_x := p_x − µλ_x ⩾ 0 for all x ∈ [m]. Observe also that $\sum_{x\in[m]}q_x=1$ so that q = (q₁, . . . , q_m)^T is a probability vector. In addition, from the definition of µ, there exists at least one y ∈ [m] (the minimizer of (A.22)) such that q_y = p_y − µλ_y = 0. Without loss of generality suppose that y = m. We then get that the convex combination
$$\sum_{x\in[m-1]}q_x\mathbf{w}_x = \sum_{x\in[m]}q_x\mathbf{w}_x = \sum_{x\in[m]}(p_x-\mu\lambda_x)\mathbf{w}_x = \sum_{x\in[m]}p_x\mathbf{w}_x = \mathbf{v} , \qquad (A.23)$$
where the last equality follows from (A.21).
Exercise A.3.4. Let C ⊆ R^n be a compact set (i.e., closed and bounded). Show that its convex hull, Conv(C), is also compact. Hint: Use Carathéodory's theorem.
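The proof of Carathéodory's theorem is constructive, and the reduction step can be implemented directly; the following sketch (ours, not part of the text) removes one redundant point from a convex combination without moving the represented vector v.

```python
# Sketch (not from the book): one Caratheodory reduction step.
import numpy as np

def caratheodory_step(W, p):
    """W: (m, n) array of points; p: probability weights over them, with m > n + 1."""
    m, n = W.shape
    # Find a nonzero lambda with sum_x lambda_x w_x = 0 and sum_x lambda_x = 0:
    A = np.vstack([W.T, np.ones(m)])          # (n + 1, m), has a nontrivial null space
    _, _, Vt = np.linalg.svd(A)
    lam = Vt[-1]
    if lam.max() <= 0:
        lam = -lam                             # ensure some strictly positive entries
    mu = min(p[x] / lam[x] for x in range(m) if lam[x] > 1e-12)   # as in (A.22)
    q = p - mu * lam                           # nonnegative, still sums to one, same point v
    keep = q > 1e-12
    return W[keep], q[keep] / q[keep].sum()
```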
A.4 Polyhedrons

Polyhedron

Definition A.4.1. Let r₁, . . . , r_m ∈ R^n be m vectors, and let c₁, . . . , c_m ∈ R. The set
$$C := \left\{\mathbf{v}\in\mathbb{R}^n \,:\, \mathbf{v}\cdot\mathbf{r}_x\leqslant c_x \;\;\forall\,x\in[m]\right\} \qquad (A.25)$$
is called a polyhedron.
The extreme points of polyhedrons are called vertices and our next goal is to characterize
them. Intuitively, one would expect that an extreme point e of the polyhedron C as defined
above should saturate some of inequalities given in (A.25). That is, we would expect that
e · rx = cx at least for some x ∈ [m]. The following theorem makes this intuition rigorous.
Proof. Let e ∈ C and suppose first that span{K} ̸= Rn . Then, there exists a vector v ∈ Rn
such that v · rx = 0 for all rx ∈ K. Since e ∈ C, for all rx ̸∈ K we must have rx · e < cx .
Thus, for sufficiently small ε > 0 we have for all x ∈ [m]
It is natural to ask what is the relationship between polytopes and polyhedrons. Re-
markably, if a polyhedron is bounded then it is a polytope.
Corollary A.4.1. Convex polyhedrons have a finite number of extreme points and if
they are bounded then they are polytopes (i.e. they are convex hulls of finitely many
vertices).
Proof. Let C be a polyhedron as in (A.25). From Theorem A.4.1 we know that e is an extreme point of C if and only if e is a solution to the linear system of equations r_x · e = c_x, where x runs over all x ∈ [m] such that r_x ∈ K. Since span{K} = R^n, the solution to each such linear system of equations is unique, and moreover, since |K| ⩾ n, there can be no more than $\binom{m}{n}$ extreme points ($\binom{m}{n}$ is the number of choices of n distinct vectors from the set {r₁, . . . , r_m}). Hence, polyhedrons have a finite number of extreme points. Now, if C is also bounded it must be compact since convex polyhedrons are closed (see Exercise A.4.1). Hence, in this case, from the Krein–Milman theorem (i.e. Theorem A.3.1) C is the convex hull of its extreme points. Since we proved that C has a finite number of vertices, C must be a polytope.
From its definition, it is clear that if v ∈ A then A = A. The relevance of affine subspaces
to our study here is that shifting a subspace by a fixed vector does not change any of the key
properties of convex sets. Therefore, many of the theorems already covered in this chapter,
can be generalized in a straightforward manner to incorporate affine subspaces. For example,
in Theorem A.4.1 we assume that the polyhedron C is in R^n. Clearly, since all n-dimensional vector spaces over R are isomorphic to R^n, we can replace R^n with any n-dimensional subspace A of some vector space V; moreover, the theorem still holds if we replace A with an affine subspace A, since shifting a polyhedron by a fixed vector does not change any of its properties.
An affine subspace A has the property that for any v₁, . . . , v_m ∈ A and any m real numbers t₁, . . . , t_m ∈ R that satisfy $\sum_{x\in[m]}t_x=1$ we have
$$\sum_{x\in[m]}t_x\mathbf{v}_x \in A . \qquad (A.30)$$
Note that the coefficients {tx } can be negative (hence, in general, they do not form a prob-
ability vector).
As an example of an affine subspace, consider the subspace A ⊂ R^{n×n} consisting of all the n × n real matrices whose rows and columns sum to zero. That is, N = (ν_{xy}) ∈ A if and only if
$$\sum_{x'\in[n]}\nu_{x'y} = \sum_{y'\in[n]}\nu_{xy'} = 0 \qquad \forall\,x,y\in[n] . \qquad (A.31)$$
1. Show that A above is indeed a subspace, and show that |A| = (n − 1)².
The affine subspace A as defined in (A.32) contains the set of all doubly stochastic
matrices. A doubly stochastic matrix is an n × n matrix whose components are non-negative and has the property that the entries of each row and column sum to one. The set of all
n × n doubly stochastic matrices is a polytope in the real vector space Rn×n , and we will
denote it by Bn (after Birkhoff). Doubly stochastic matrices appear quite often in several
resource theories.
1. Show that Bn is indeed a polytope. Hint: Show first that it is a bounded polyhedron
in A as defined in (A.25) (with the dot product replaced by the Hilbert–Schmidt inner
product) and then use Corollary A.4.1.
2. Show that any permutation matrix is an extreme point of Bn . Recall that the entries
in each row or column of a permutation matrix consist of zeros except for one entry
being equal to 1.
The exercise above states that any permutation matrix is a vertex of Bn . It turns out
that there are no other vertices for Bn .
Proof. We will prove the theorem by induction. The case n = 1 is trivial, so we assume now that n > 1 and that the theorem holds for (n − 1) × (n − 1) doubly stochastic matrices. We will denote by A the affine subspace (A.32). Therefore, M := (µ_{xy}) ∈ A is in B_n if and only if its entries satisfy µ_{xy} ⩾ 0. These n² inequalities define the polyhedron B_n. Now, according to Theorem A.4.1, if M is an extreme point then the total number of equalities µ_{xy} = 0 must be at least |A| = (n − 1)² (see the first part of Exercise A.5.1). Now, since $\sum_{y\in[n]}\mu_{xy}=1$ for all x ∈ [n], M cannot contain a row (or column) with all zeros. On the other hand, suppose each row of M has at least two non-zero components. In this case, the number of zero components of M would not exceed n(n − 2), which is strictly smaller than (n − 1)² (so that this case is not possible). We therefore conclude that at least one of the rows, say the x-th row, has exactly one non-zero component, say the y-th component. This (x, y)-component must be equal to 1 since the row sums to 1. This in turn implies that in the y-th column, except for the x-th component, all the other components are zero. Therefore, crossing out the x-th row and the y-th column results in an (n − 1) × (n − 1) doubly stochastic matrix that is also an extreme point. The proof is then concluded by the induction assumption.
Exercise A.5.3. Show that any n × n doubly stochastic matrix can be expressed as a convex
combination of m ⩽ (n − 1)2 + 1 permutation matrices. Hint: Use the above arguments in
conjunction with Carathéodory’s Theorem.
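The theorem and Exercise A.5.3 together yield an algorithm, often called the Birkhoff–von Neumann decomposition: repeatedly find a permutation supported on the positive entries and subtract as much of it as possible. A short sketch (ours, not part of the text; scipy is used for the bipartite matching) is given below.

```python
# Sketch (not from the book): Birkhoff-von Neumann decomposition of a doubly stochastic matrix.
import numpy as np
from scipy.optimize import linear_sum_assignment

def birkhoff_decomposition(M, tol=1e-9):
    """Return a list of (weight, permutation matrix) pairs whose weighted sum is M."""
    M = M.copy()
    terms = []
    while M.max() > tol:
        support = (M > tol).astype(float)
        rows, cols = linear_sum_assignment(-support)     # a perfect matching inside the support
        assert support[rows, cols].all(), "matrix is (numerically) not doubly stochastic"
        P = np.zeros_like(M)
        P[rows, cols] = 1.0
        w = M[rows, cols].min()                           # largest weight that can be removed
        terms.append((w, P))
        M = M - w * P                                     # zeroes at least one more entry
    return terms
```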
Note that irrespective of the set C, the polar C◦ is always a closed convex set that contains
the zero vector. It is also straightforward to check that the polar of Rn is the zero vector
and the polar of the set consisting only of the zero vector is the whole space Rn .
Note that from the third part of the exercise above we see that the polar of a polytope
is a polyhedron.
Proof. The part C ⊆ (C◦)◦ was given in the exercise above. We therefore prove here that (C◦)◦ ⊆ C. Suppose by contradiction that there exists a vector w ∈ (C◦)◦ that is not in C. Then, since C is a closed convex set, from the hyperplane separation theorem (see Theorem A.2) there exists a vector r ∈ R^n and a constant c ∈ R such that w · r > c > v · r for all v ∈ C. Since the zero vector belongs to C, by taking v = 0 we get c > 0. Therefore, defining s := (1/c) r, we conclude that both w · s > 1 and 1 > v · s for all v ∈ C. The latter implies that s ∈ C◦, but then the former implies that w ̸∈ (C◦)◦, in contradiction with our assumption. This completes the proof.
Exercise A.6.2. Let ε > 0 and Bε (0) := {v ∈ Rn : ∥v∥ ⩽ ε} be a ball of radius ε (in the
Euclidean norm). Show that
Bε (0)◦ = B1/ε (0) . (A.36)
Proof. Without loss of generality we assume that the interior of C is not empty, and furthermore we assume that the zero vector is in the interior of C (otherwise, we can shift C so that the origin of the coordinate system is in its interior). Therefore, there exists ε > 0 such that B_ε(0) ⊂ C. From Exercise A.6.1 this implies that
where the last equality follows from Exercise A.6.2. That is, C◦ is a bounded polyhedron. From Corollary A.4.1 it must be a polytope.
Every hyperplane separates R^n into two half-spaces. A closed half-space of R^n is therefore the set of all vectors v ∈ R^n that satisfy n · v ⩽ c for some fixed c ∈ R and a fixed (normal)
vector n ∈ Rn . Therefore, a convex polyhedron as defined in (A.25) can be viewed as the
intersection of finitely many half-spaces. Similarly, in the following theorem we show that
a convex polytope can also be expressed as the intersection of finitely many half-spaces
(the half-spaces that are defined by its facets); see Fig. A.3. This means in particular that
every polytope is a polyhedron (recall that the converse of this assertion is also true if the
polyhedron is bounded).
Remark. The condition that C contains the zero vector is just for convenience and is in fact unnecessary. Specifically, if 0 ̸∈ C then the theorem still holds if we replace the equations s_x · v ⩽ 1 with s_x · v ⩽ r_x, where the r_x are some real numbers (see Exercise A.6.3).
Proof. From Theorem A.6.2, the polar of C is itself a polytope. Therefore, there exist k ∈ N and s₁, . . . , s_k ∈ R^n such that C◦ = Conv{s₁, . . . , s_k}. From the bipolar theorem (Theorem A.6.1) we get
$$C = (C^{\circ})^{\circ} = \left\{\mathbf{v}\in\mathbb{R}^n \,:\, \mathbf{v}\cdot\mathbf{s}_x\leqslant 1 \;\;\forall\,x\in[k]\right\} , \qquad (A.40)$$
where the second equality follows from (A.35).
Exercise A.6.3. Show that the theorem above still holds even if 0 ̸∈ C, as long as the equations s_x · v ⩽ 1 are replaced with s_x · v ⩽ r_x, where the r_x are some real numbers.
$$\mathbf{v}'\cdot\mathbf{s} \geqslant \mathbf{v}\cdot\mathbf{s} \qquad \forall\,\mathbf{v}'\in C . \qquad (A.41)$$
Exercise A.6.4. Prove the supporting hyperplane theorem above. Hint: Use Theorem A.2.
is a sublinear functional. One of the most useful facts about support functions is the following
theorem.
Proof. The direction that C1 ⊇ C2 implies fC1 ⩾ fC2 follows trivially from the definition. On
the other hand, if C1 ̸⊇ C2 then there exists a vector r ∈ C2 such that r ̸∈ C1 . Hence, the
sets {r} and C1 are two disjoint closed compact convex sets of Rn . From the hyperplane
separation theorem (see Theorem A.2) there exists c ∈ R and a vector n ∈ Rn such that
n · r′ < c < n · r ∀ r′ ∈ C1 (A.45)
Taking the maximum over r′ ∈ C1 gives
fC1 (n) < c < n · r ⩽ fC2 (n) . (A.46)
Hence, fC1 ̸⩾ fC2 . This completes the proof.
Lemma A.7.1. Let C1 and C2 be two compact convex sets of Rn . Then, their
support functions satisfy
It is simple to check (see Exercise A.8.1) that K∗ is both closed and convex.
2. Show that if K1 , K2 ⊆ A are two cones such that K1 ⊆ K2 then K∗2 ⊆ K∗1 .
Example. Let A be a Hilbert space and consider the space Herm(A). Recall that Herm(A)
represents the (real) vector space of all Hermitian matrices acting on a Hilbert space A.
Since Herm(A) ∼ = Rn , with n := |A|2 , the definition of a cone and dual cone can be applied
to the vector space Herm(A). An important example of a cone in this space is the cone of
positive semidefinite matrices, K := Pos(A). This is a cone since if Λ ∈ Herm(A) is positive
semidefinite, i.e. Λ ⩾ 0, then also tΛ ⩾ 0 for all t ⩾ 0. Interestingly, this cone is a self-dual cone in the sense that K∗ = K (see the exercise below).
Exercise A.8.2. Let A be a Hilbert space and consider the space Herm(A).
1. Show that the cone of positive semidefinite matrices is a self-dual cone.
2. Show that the dual cone of the whole space K := Herm(A) is K∗ = {0} where 0 is the
zero matrix in Herm(A).
Theorem A.8.1. Let K ⊆ A be a cone in a Hilbert space A. Then, K∗∗ is the closure of the smallest convex cone containing K. In particular, if K is a closed convex cone then K∗∗ = K.
Proof. Let C be the closure of the smallest convex set containing K. By the definition of a dual cone in (A.49), if w ∈ K then for all v ∈ K∗ we must have w · v ⩾ 0. On the other hand,
$$K^{**} := \left\{\mathbf{u}\in A \,:\, \mathbf{u}\cdot\mathbf{v}\geqslant 0 \text{ for all } \mathbf{v}\in K^{*}\right\} . \qquad (A.50)$$
Therefore, if w ∈ K we must have w ∈ K∗∗, so that K ⊆ K∗∗. Since K∗∗ is a closed convex set it must contain C. Now, suppose by contradiction that the inclusion C ⊆ K∗∗ is strict. That is, there exists v ∈ K∗∗ that is not in C. Then, from the hyperplane separation theorem (see Theorem A.2) there exists a vector w ∈ R^n such that
Since the zero vector belongs to C we have in particular that µ ⩽ 0 and w · v < 0. We argue next that µ must be zero. Otherwise, µ < 0, so that there exists r ∈ C with w · r < 0. But since C is a cone, also tr ∈ C for any t > 0, so we get from the definition of µ that µ ⩽ w · (tr), which goes to −∞ as t → ∞. This is not possible since according to (A.51) µ is bounded from below. We therefore conclude that µ = 0. This in turn implies that w ∈ C∗ ⊆ K∗ (where we used the second part of Exercise A.8.1 in conjunction with the fact that K ⊆ C). However, recall that v ∈ K∗∗, which implies in particular that v · w ⩾ 0, in contradiction with w · v < 0. Therefore, our initial assumption that v ̸∈ C was incorrect. This completes the proof.
Remark. The primal problem above has been expressed with respect to two vector spaces
of Hermitian matrices V1 and V2 since these are what we typically encounter in quantum
physics. However, everything that we will discuss in this section is also applicable for any
finite dimensional abstract Hilbert spaces V1 and V2 by replacing the Hilbert-Schmidt inner
product above with the inner product of the vector space V1 .
as the primal problem if we take K2 in the primal problem to be the cone consisting only of
the zero matrix.
A.9.1 Duality
Every primal CLP optimization problem has a dual problem. The dual problem of the primal
CLP problem given in (A.52) is given as follows.
Any ζ ∈ K∗₂ that satisfies H₁ − N∗(ζ) ∈ K∗₁ is called a dual feasible plane. If there are no dual feasible planes then by convention β := −∞.
Exercise A.9.1. Show that with the notations of (A.56), the dual problem can be expressed as
$$\text{Find}\quad \beta := \sup\,\mathrm{Tr}[\zeta H_2] \qquad \text{Subject to}\quad \tilde{H}_1-\tilde{\mathcal{N}}^{*}(\zeta)\in K^{*} \;\text{ and }\; \zeta\in V_2 .$$
The significance of the dual problem is that quite frequently α = β. We start first by
showing that α ⩾ β.
Weak Duality

Lemma A.9.1. For any primal feasible plane η and dual feasible plane ζ, we have
$$\mathrm{Tr}[H_1\eta] \geqslant \mathrm{Tr}[\zeta H_2] . \qquad (A.58)$$
That is, α ⩾ β.
Proof. Let η and ζ be as in the lemma. Since H₁ − N∗(ζ) ∈ K∗₁ and η ∈ K₁, we have from the definition of a dual cone that the inner product
$$\mathrm{Tr}\!\left[\left(H_1-\mathcal{N}^{*}(\zeta)\right)\eta\right] \geqslant 0 . \qquad (A.59)$$
This inequality can be expressed as
$$\mathrm{Tr}[H_1\eta] \geqslant \mathrm{Tr}[\zeta\,\mathcal{N}(\eta)] . \qquad (A.60)$$
On the other hand, since N(η) − H₂ ∈ K₂ and ζ ∈ K∗₂, we have that the inner product
$$\mathrm{Tr}\!\left[\left(\mathcal{N}(\eta)-H_2\right)\zeta\right] \geqslant 0 . \qquad (A.61)$$
The above inequality can be expressed as
$$\mathrm{Tr}[\zeta\,\mathcal{N}(\eta)] \geqslant \mathrm{Tr}[\zeta H_2] . \qquad (A.62)$$
Combining (A.60) and (A.62) produces (A.58). This completes the proof.
Exercise A.9.2. Let η ∈ K₁ be such that N(η) − H₂ ∈ K₂, and let ζ ∈ K∗₂ be such that H₁ − N∗(ζ) ∈ K∗₁. Show that if in addition
$$\mathrm{Tr}\!\left[\left(H_1-\mathcal{N}^{*}(\zeta)\right)\eta\right] = \mathrm{Tr}\!\left[\left(\mathcal{N}(\eta)-H_2\right)\zeta\right] = 0 \qquad (A.63)$$
then α = β.
Exercise A.9.3. Show that if α = −∞ there are no dual feasible planes, and if β = +∞
there are no primal feasible planes.
The following theorem, known also as the strong duality theorem, is the key result of this
section that we will use quite often in the book. It provides a sufficient condition for α = β
to hold. We will use the notation int(K) to denote the interior of a cone K.
Strong Duality
Theorem A.9.1. We have α = β if one of the following two conditions hold:
1. K1 and K2 are closed convex cones and there exists a primal feasible plane.
It turns out that in all the problems that we will consider in this book these mild condi-
tions (also known as Slater’s conditions) will hold so that α = β.
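As a small numerical illustration (ours, not part of the text), consider the familiar special case in which K₁ is the positive semidefinite cone, K₂ = {0}, and N(η) := (Tr[A_i η])_i, so that N∗(ζ) = Σ_i ζ_i A_i. Solving the primal and dual problems with cvxpy shows α = β up to solver accuracy, as the strong duality theorem guarantees.

```python
# Sketch (not from the book): primal/dual SDP pair with matching optimal values.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, k = 4, 3
A = [(B + B.T) / 2 for B in (rng.standard_normal((n, n)) for _ in range(k))]
W = rng.standard_normal((n, n))
H1 = W @ W.T                                              # PSD, so the primal is bounded below
G = rng.standard_normal((n, n))
eta0 = G @ G.T + np.eye(n)                                 # a strictly feasible primal plane
H2 = np.array([np.trace(Ai @ eta0) for Ai in A])

eta = cp.Variable((n, n), PSD=True)
alpha = cp.Problem(cp.Minimize(cp.trace(H1 @ eta)),
                   [cp.trace(A[i] @ eta) == H2[i] for i in range(k)]).solve()

zeta = cp.Variable(k)
beta = cp.Problem(cp.Maximize(H2 @ zeta),
                  [H1 - sum(zeta[i] * A[i] for i in range(k)) >> 0]).solve()
print(alpha, beta)   # equal up to numerical accuracy
```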
Proof. If α = −∞ then from Exercise A.9.3 there are no dual feasible planes, so that by convention β = −∞. Hence, in this case α = β. We therefore consider now the case α > −∞ (i.e. α is bounded from below) and prove the sufficiency of the first condition. From (A.56) and Exercise A.9.1 it is sufficient to prove the theorem for the case K₂ = {0}. We will therefore denote K := K₁ and assume that it is closed. Consider the convex cone
$$C := \left\{\left(\mathcal{N}(\eta),\,\mathrm{Tr}[\eta H_1]\right) \,:\, \eta\in K\right\} \subset V_2\oplus\mathbb{R} . \qquad (A.64)$$
Since the set K is closed, the set C is also closed in V₂ ⊕ R (recall that we are working in finite dimensions). Note that any η ∈ K that satisfies N(η) = H₂ results in a point (H₂, Tr[ηH₁]) ∈ C. We are therefore interested in the intersection of the cone C with the line
$$L := \left\{(H_2, t) \,:\, t\in\mathbb{R}\right\} . \qquad (A.65)$$
The intersection C ∩ L consists of the points {(H₂, Tr[ηH₁])} over all primal feasible planes η. This intersection is closed (since both L and C are closed), and it is not empty since there is a primal feasible plane. Moreover, since the set of numbers {Tr[ηH₁]} over all primal feasible planes η is bounded from below (recall α > −∞), there exists a feasible optimal plane η₀ such that α = Tr[η₀H₁]. In the rest of the proof, η₀ will denote this feasible optimal plane.
From the Weak Duality Lemma we know that α ⩾ β. To show the converse, we will show that for any ε > 0 we have β ⩾ α − ε, so that we must have α = β. Fix ε > 0 and observe that, by its definition, the point (H₂, α − ε) ̸∈ C. Therefore, from the hyperplane separation theorem (Theorem A.2) there exists a hyperplane (ζ, s) ∈ V₂ ⊕ R and a constant c ∈ R such that
Note that on the left-hand side we have the inner product between (ζ, s) and (N(η), Tr[ηH₁]) ∈ C, and on the right-hand side the inner product between (ζ, s) and (H₂, α − ε). Since we can take η = 0 we must have c > 0. On the other hand, if we take η = η₀ the left-hand side becomes Tr[H₂ζ] + sα, and when comparing it with the right-hand side we conclude that s < 0. Moreover, since the rescaling $\zeta\mapsto\frac{1}{|s|}\zeta$ and $s\mapsto\frac{s}{|s|}$ does not change the inequalities above, we can assume without loss of generality that s = −1. Therefore, since c > 0, the right-hand side of the equation above gives
It is therefore left to show that ζ is a dual feasible plane, so that β, which is defined as the supremum of Tr[ζH₂] over all dual feasible planes, is also greater than α − ε. Indeed, since K is a cone, we must have
Otherwise, if for some η ∈ K the left-hand side above were positive, then the inequality on the left-hand side of (A.66) (with s = −1) would be violated for tη with t a sufficiently large positive real number. The equation above can be expressed as
$$\mathrm{Tr}\!\left[\eta\left(H_1-\mathcal{N}^{*}(\zeta)\right)\right] \geqslant 0 \qquad \forall\,\eta\in K , \qquad (A.69)$$
which is equivalent to H₁ − N∗(ζ) ∈ K∗. Hence, ζ is a dual feasible plane. This completes the proof of the sufficiency of the first condition. For the second condition see Exercise A.9.4.
Exercise A.9.4. Prove the sufficiency of the second condition (Slater's condition) in the theorem above. Hint: Define C as in the proof above but with int(K) replacing K, and use the version in (A.8) of the hyperplane separation theorem.
f (v) = v . (A.70)
In quantum information this theorem is typically used for functions from density matrices to density matrices. One example of such linear functions is the class of quantum channels. However, observe that the theorem above holds for all continuous functions (not only linear ones).
Exercise B.0.1. Let M ∈ Cn×n be a square complex matrix, and let I be an interval in R
containing the eigenvalues of M M ∗ . Show that for any function f : I → R we have
M f (M ∗ M ) = f (M M ∗ )M . (B.1)
1. We say that f is operator monotone if for every Hilbert space A and any
η, ζ ∈ Herm(A) that satisfies η ⩾ ζ we have f (η) ⩾ f (ζ).
Exercise B.1.2. Show that the function f(r) = a + br (defined on any interval) is operator monotone for any a ∈ R and b ⩾ 0. Show that it is operator convex for any a, b ∈ R.
Exercise B.1.3. Let f1 , f2 : I → R be two real functions and define for any r ∈ I, f (r) :=
af1 (r) + bf2 (r) for some fixed non-negative real numbers a, b ∈ R+ .
In this book we will only work with continuous functions. In this case, the condition (B.2) for operator convexity can be replaced with the special case in which we take t = 1/2.
Proof. Clearly, if f satisfies (B.2) then it satisfies (B.5). We therefore show that (B.5) implies (B.2). Let η, ζ ∈ Herm(A) and suppose (B.5) holds. Observe that for t = 1/4 we get
$$\begin{aligned}
f\!\left(\tfrac14\eta+\tfrac34\zeta\right) &= f\!\left(\tfrac12\left(\tfrac12\eta+\tfrac12\zeta\right)+\tfrac12\zeta\right) \\
&\leqslant \tfrac12 f\!\left(\tfrac12\eta+\tfrac12\zeta\right)+\tfrac12 f(\zeta) && \text{by (B.5)} \\
&\leqslant \tfrac14 f(\eta)+\tfrac14 f(\zeta)+\tfrac12 f(\zeta) && \text{by (B.5)} \\
&= \tfrac14 f(\eta)+\tfrac34 f(\zeta) .
\end{aligned} \qquad (B.6)$$
Hence, the condition (B.2) holds for t = 1/4 and t = 3/4. Similarly, by repetition (e.g. taking convex combinations $\tfrac12\eta+\tfrac12\left(\tfrac14\eta+\tfrac34\zeta\right)$, etc.) it follows that (B.2) must hold for all dyadic rationals, i.e. numbers of the form $t = \frac{m}{2^n}$ where n ∈ N is arbitrary and m is any integer in [2^n]. Since the set of such dyadic rationals is dense in [0, 1], it follows from the continuity of f that (B.2) holds for all t ∈ [0, 1]. This completes the proof.
Exercise B.1.4. Use the lemma above to prove that the function f(t) = t² is operator convex on any interval. Hint: Show that the difference between $\frac{f(\eta)+f(\zeta)}{2}$ and $f\!\left(\frac{\eta+\zeta}{2}\right)$ can be expressed as the square of a Hermitian matrix.
The above inequality implies in particular that the maximal eigenvalue of the complex matrix N cannot exceed one. Observe that the matrix N is similar to the matrix $\eta^{-\frac14}N\eta^{\frac14} = \eta^{-\frac14}\zeta^{\frac12}\eta^{-\frac14}$, which is Hermitian. Since similar matrices have the same eigenvalues, we conclude that $I \geqslant \eta^{-\frac14}\zeta^{\frac12}\eta^{-\frac14}$. Conjugating both sides by $\eta^{\frac14}$ we conclude that $\eta^{\frac12}\geqslant\zeta^{\frac12}$. The case that η is not strictly positive (but still positive semidefinite) follows from the fact that η ⩾ ζ implies that η + εI ⩾ ζ for any ε > 0. Hence, since η + εI > 0 we conclude from the argument above that $(\eta+\varepsilon I)^{\frac12}\geqslant\zeta^{\frac12}$. Since this inequality holds for all ε > 0 it must also hold for ε = 0. This completes the proof that the function $f(r)=\sqrt{r}$ is operator monotone in the domain [0, 1].
It is possible to show that for any α ∈ [0, 1] the function f(r) = r^α is operator monotone on the domain [0, ∞). In Table B.1 we summarize everything that is known in the literature about the operator monotonicity and convexity of the function f(r) = r^α. In the section 'History and further readings' we give more information about where the proofs can be found.
Other important examples of functions that appear often in applications are the logarithm f(r) = log(r), defined on the interval (0, ∞), and the function f(r) = −r log r. The former is known to be both operator concave and operator monotone, while the latter is known to be operator concave.
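These properties are easy to probe numerically; the sketch below (ours, not part of the text) samples pairs η ⩾ ζ ⩾ 0 and checks that the square root preserves the ordering, while exhibiting an explicit pair for which squaring does not, consistent with Table B.1.

```python
# Sketch (not from the book): sqrt is operator monotone on PSD matrices, but r^2 is not.
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(1)

def random_psd(n):
    X = rng.standard_normal((n, n))
    return X @ X.T

def min_eig(M):
    return np.linalg.eigvalsh(M).min()

for _ in range(1000):
    zeta = random_psd(3)
    eta = zeta + random_psd(3)                              # eta >= zeta >= 0 by construction
    S = np.real(sqrtm(eta)) - np.real(sqrtm(zeta))
    assert min_eig(S) > -1e-8                               # sqrt(eta) >= sqrt(zeta)

# In contrast, eta^2 - zeta^2 can fail to be positive semidefinite:
zeta = np.array([[1.0, 0.0], [0.0, 0.0]])
eta = zeta + np.array([[1.0, 1.0], [1.0, 1.0]])
print(min_eig(eta @ eta - zeta @ zeta))                     # strictly negative
```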
where f : R → R. Such functions appear in many applications, and we will see later on
that certain key quantities in quantum information, such as entropies and relative entropies,
are defined in terms of trace functions. For our purposes, we will always assume that f is
continuous.
Use the divided difference approach discussed in Appendix D.1 to show that the function g(t)
is continuously differentiable and
2. If the function f (t) is convex in R then the function η 7→ Tr[f (η)] is convex in
η ∈ Herm(A).
Proof. Part 1. Suppose first that f is differentiable so that f ′ (t) ⩾ 0 for all t ∈ R. Under
this assumption we have that f ′ (ξ) ⩾ 0 for any ξ ∈ Herm(A) (note that ξ does not have to
be positive semidefinite). Let η, ζ ∈ Herm(A) be such that η ⩾ ζ. We need to show that
Tr[f (η)] ⩾ Tr[f (ζ)]. Set ρ := η − ζ and observe that ρ ∈ Pos(A). For any t ∈ [0, 1] define
the function
g(t) := Tr [f (ζ + tρ)] , (B.12)
so that g(0) = Tr[f (ζ)] and g(1) = Tr[f (η)]. Therefore,
Z 1
g(1) − g(0) = g ′ (t)dt
Z0 1
Exercise B.3.1→ = Tr [ρf ′ (ζ + tρ)] dt (B.13)
0
Z 1
Tr ρ1/2 f ′ (ζ + tρ) ρ1/2 dt
ρ⩾0 −−−−→ =
0
Finally, since f ′ (ζ + tρ) ⩾ 0, also ρ1/2 f ′ (ζ + tρ) ρ1/2 ⩾ 0 so that the integrand on the right-
hand side of the equation above is non-negative. Hence, g(1) ⩾ g(0). This completes the
proof for the case that f is differentiable. The proof of the case that f is only continuous (but
not necessarily differentiable) follows from continuity by taking a sequence of continuously
differentiable functions whose limit is f (such a sequence always exists).
Part 2. Consider the spectral decomposition of η = Σ_{x∈[m]} λ_x Π_x, where Π_x := |x⟩⟨x|,
{|x⟩}_{x∈[m]} form an orthonormal eigenbasis of A, and each λ_x ∈ R. Let {|ψ_y⟩}_{y∈[m]} be another
orthonormal basis of A. Then,
Tr[f(η)] = Σ_{y∈[m]} ⟨ψ_y| Σ_{x∈[m]} f(λ_x)Π_x |ψ_y⟩ = Σ_{y∈[m]} Σ_{x∈[m]} f(λ_x)⟨ψ_y|Π_x|ψ_y⟩
   (B.14)
f is convex→ ⩾ Σ_{y∈[m]} f( Σ_{x∈[m]} λ_x ⟨ψ_y|Π_x|ψ_y⟩ ) .
Now, let t ∈ [0, 1], η, ζ ∈ Herm(A), and {|ψ_y⟩}_{y∈[m]} be an orthonormal basis of A consisting
of the eigenvectors of tη + (1 − t)ζ. For these choices we get
Tr[f(tη + (1 − t)ζ)] = Σ_{y∈[m]} ⟨ψ_y| f(tη + (1 − t)ζ) |ψ_y⟩
|ψ_y⟩ is an eigenvector of tη+(1−t)ζ → = Σ_{y∈[m]} f( ⟨ψ_y| tη + (1 − t)ζ |ψ_y⟩ )
   = Σ_{y∈[m]} f( t⟨ψ_y|η|ψ_y⟩ + (1 − t)⟨ψ_y|ζ|ψ_y⟩ )   (B.16)
f is convex→ ⩽ t Σ_{y∈[m]} f(⟨ψ_y|η|ψ_y⟩) + (1 − t) Σ_{y∈[m]} f(⟨ψ_y|ζ|ψ_y⟩) .
Combining this with (B.14) (applied to η and to ζ with the same basis {|ψ_y⟩}) gives
Tr[f(tη + (1 − t)ζ)] ⩽ t Tr[f(η)] + (1 − t) Tr[f(ζ)], which completes the proof.
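The following is a hedged numerical sketch (added here; not part of the original proof) of the two statements above, using f(t) = exp(t), which is both monotonically increasing and convex on R.

```python
# Numerical check of monotonicity and convexity of eta -> Tr[f(eta)] (assumes NumPy, SciPy).
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
d = 5
def rand_herm(d):
    x = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (x + x.conj().T) / 2

zeta = rand_herm(d)
rho = rand_herm(d); rho = rho @ rho.conj().T        # rho >= 0, so eta := zeta + rho >= zeta
eta = zeta + rho
# Part 1: monotonicity of Tr[f(.)]
assert np.trace(expm(eta)).real >= np.trace(expm(zeta)).real - 1e-10
# Part 2: convexity of Tr[f(.)]
t = 0.3
lhs = np.trace(expm(t * eta + (1 - t) * zeta)).real
rhs = t * np.trace(expm(eta)).real + (1 - t) * np.trace(expm(zeta)).real
assert lhs <= rhs + 1e-10
```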
Exercise B.3.2. Let K ∈ L(A), α ∈ (0, ∞), and define the function f : Pos(A) → R via
where U := V1 U2 and V := V2 U1 are two unitary matrices. For any k ∈ [n] let
Π_k := Σ_{x∈[k]} |x⟩⟨x| ,   (B.20)
with the convention that µn+1 = νn+1 = 0. Denoting by ak := µk − µk+1 and bk := νk − νk+1 ,
and using the triangle inequality we get that
Tr[D_1 U D_2 V] ⩽ Σ_{k,ℓ∈[n]} a_k b_ℓ Tr[Π_k U Π_ℓ V]   and   Tr[D_1 D_2] = Σ_{k,ℓ∈[n]} a_k b_ℓ Tr[Π_k Π_ℓ] .   (B.22)
Therefore, the proof will be concluded by showing that for each k, ℓ ∈ [n]
Tr[Π_k U Π_ℓ V] = Σ_{x∈[k]} ⟨x|U Π_ℓ V|x⟩ ⩽ Σ_{x∈[k]} |⟨x|U Π_ℓ V|x⟩| ⩽ k ,   (B.24)
where the last inequality follows from the fact that |⟨x|U Π_ℓ V|x⟩| ⩽ 1 (see Exercise B.3.3).
Since Tr[Πk Πℓ ] = k the equation above implies (B.23). This completes the proof.
Proof. Since M is Hermitian we can work in its eigenbasis so that without loss of gener-
ality we will assume that M = D1 := Diag(µ1 , . . . , µn ) is a diagonal matrix. We will also
decompose N = U D2 U ∗ , where D2 := Diag(ν1 , . . . , νn ), and U is unitary. Thus,
Tr[MN] = Tr[D_1 U D_2 U^*] = Σ_{k,ℓ∈[n]} a_k b_ℓ Tr[Π_k U Π_ℓ U^*] ,   (B.27)
where ak , bℓ , and Πk , are the same as in the proof of the von-Neumann trace inequality above.
Note that while the eigenvalues {µk } and {νk } can be negative, the differences ak := µk −µk+1
and bk := νk − νk+1 are non-negative for all k ∈ [n]. Combining this with (B.23) and the
equation above we conclude that
Tr[D_1 U D_2 U^*] ⩽ Σ_{k,ℓ∈[n]} a_k b_ℓ Tr[Π_k Π_ℓ] = Tr[D_1 D_2] = Σ_{x∈[n]} μ_x ν_x .   (B.28)
Finally, observe that Tr[D_1 D_2] equals the left-hand side of (B.26). This completes the
proof.
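A small numerical check (an addition for illustration, not from the text) of the trace inequality just proven: for Hermitian M and N, Tr[MN] is at most the sum of products of their eigenvalues when both spectra are sorted in the same (here decreasing) order.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6
def rand_herm(d):
    x = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (x + x.conj().T) / 2

M, N = rand_herm(d), rand_herm(d)
mu = np.sort(np.linalg.eigvalsh(M))[::-1]   # eigenvalues mu_1 >= ... >= mu_n
nu = np.sort(np.linalg.eigvalsh(N))[::-1]   # eigenvalues nu_1 >= ... >= nu_n
assert np.trace(M @ N).real <= np.sum(mu * nu) + 1e-10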
Remark. Note that for |A| = 1 and isometry V = |ψ⟩ ∈ B one obtains the more familiar
Jensen’s inequality
f ⟨ψ|ρ|ψ⟩ ⩽ ⟨ψ|f (ρ)|ψ⟩ . (B.31)
In this case, it is sufficient to require that f is convex.
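Before the proof, here is a minimal numerical sketch (an addition; not part of the original argument) of the condition in (B.30) for the operator convex function f(t) = t²: for an isometry V and a Hermitian ρ one has f(V^*ρV) ⩽ V^*f(ρ)V.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 6, 3
def rand_herm(d):
    x = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (x + x.conj().T) / 2

# Build an n x m isometry V (V* V = I_m) from a random unitary.
q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
V = q[:, :m]
rho = rand_herm(n)
lhs = (V.conj().T @ rho @ V) @ (V.conj().T @ rho @ V)   # f(V* rho V) with f(t) = t^2
rhs = V.conj().T @ (rho @ rho) @ V                      # V* f(rho) V
assert np.min(np.linalg.eigvalsh(rhs - lhs)) > -1e-10   # rhs - lhs >= 0
```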
Proof. Suppose first that f is operator convex, and let V : A → B be an isometry. For
simplicity denote by m = |A| and n = |B|, and let U be a unitary matrix obtained from the
n × m isometry V by adding n − m columns to V . That is, U can be expressed as
U = ( V   N )   (B.32)
where N is an n × (n − m) matrix.
Every matrix M ∈ L(B) can be expressed in block matrix form as M = ( M_{11}  M_{12} ; M_{21}  M_{22} ),
where the block M_{11} is m × m and the remaining blocks are such that M is n × n. With this
in mind, note that for any ρ ∈ Herm(B)
U^* ρ U = ( V^* ; N^* ) ρ ( V   N ) = ( V^* ρ V   V^* ρ N ; N^* ρ V   N^* ρ N ) .   (B.33)
Finally, we define a linear map E : Herm(B) → Herm(B) via
E(σ) = (1/2) σ + (1/2) ZσZ   ∀ σ ∈ Herm(B) ,   (B.34)
where Z = ( I_m   0 ; 0   −I_{n−m} ). A key property of this map is that it acts as a type of a dephasing
map (in fact, it belongs to a type of quantum channels known as the pinching channels).
Particularly, note that
E(U^* ρ U) = ( V^* ρ V   0 ; 0   N^* ρ N )   ∀ ρ ∈ Herm(B) .   (B.35)
This also implies that for any ρ ∈ Herm(B)
f( E(U^* ρ U) ) = ( f(V^* ρ V)   0 ; 0   f(N^* ρ N) ) .   (B.36)
With these notations we get from (B.36)
f(V^* ρ V) = [ f( E(U^* ρ U) ) ]_{11}
f is operator convex→ ⩽ [ (1/2) f(U^* ρ U) + (1/2) f(Z U^* ρ U Z) ]_{11}
U and UZ are unitaries→ = [ (1/2) U^* f(ρ) U + (1/2) Z U^* f(ρ) U Z ]_{11} = [ E( U^* f(ρ) U ) ]_{11}
(B.35)→ = V^* f(ρ) V .
   (B.37)
Therefore, f satisfies the condition given in (B.30).
We next assume that f satisfies (B.30) and use it to show that f is operator convex. Let
A be a Hilbert space, t ∈ [0, 1], and define V : A → A ⊕ A to be the matrix
V = ( t^{1/2} I^A ; (1 − t)^{1/2} I^A ) ,   (B.38)
Remark. The condition that f (0) ⩽ 0 cannot be removed from the theorem above. This
condition is necessary for this version of Jensen’s inequality, since by taking |A| = 1 and
setting M1 = M2 = 0 in (B.41) we get that f (0) ⩽ 0.
Proof. Suppose first that f is operator convex, and let ρ, σ ∈ Herm(A), and M_1, M_2 ∈
L(A) be such that M_1^* M_1 + M_2^* M_2 ⩽ I. Define M_3 := (I − M_1^* M_1 − M_2^* M_2)^{1/2} so that
Σ_{x∈[3]} M_x^* M_x = I. Finally, denote by
V := ( M_1 ; M_2 ; M_3 )   and   ω := ( ρ  0  0 ; 0  σ  0 ; 0  0  0 ) .   (B.42)
Observe that the matrix V : A → A ⊕ A ⊕ A is an isometry since V^* V = Σ_{x∈[3]} M_x^* M_x = I.
We therefore get
f(M_1^* ρ M_1 + M_2^* σ M_2) = f(V^* ω V)
(B.30)→ ⩽ V^* f(ω) V
(B.42)→ = M_1^* f(ρ) M_1 + M_2^* f(σ) M_2 + M_3^* f(0) M_3   (B.43)
f(0) ⩽ 0 → ⩽ M_1^* f(ρ) M_1 + M_2^* f(σ) M_2 .
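A hedged numerical sketch (an addition) of the inequality just derived, for f(t) = t², which is operator convex with f(0) = 0: if M_1^*M_1 + M_2^*M_2 ⩽ I then f(M_1^*ρM_1 + M_2^*σM_2) ⩽ M_1^*f(ρ)M_1 + M_2^*f(σ)M_2.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 4
def rand_herm(d):
    x = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (x + x.conj().T) / 2

M1 = 0.5 * (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
M2 = 0.5 * (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
# rescale so that M1* M1 + M2* M2 <= I
s = np.linalg.norm(M1.conj().T @ M1 + M2.conj().T @ M2, 2)
M1, M2 = M1 / np.sqrt(2 * s), M2 / np.sqrt(2 * s)
rho, sigma = rand_herm(d), rand_herm(d)
X = M1.conj().T @ rho @ M1 + M2.conj().T @ sigma @ M2
lhs = X @ X                                              # f(M1* rho M1 + M2* sigma M2)
rhs = M1.conj().T @ (rho @ rho) @ M1 + M2.conj().T @ (sigma @ sigma) @ M2
assert np.min(np.linalg.eigvalsh(rhs - lhs)) > -1e-10
```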
This matrix is the unitary extension of the isometry V as defined in (B.38). Particularly,
note that U and the isometry V in (B.38) satisfy the relation
U Π = ( t^{1/2} I^A   0 ; (1 − t)^{1/2} I^A   0 ) = ( V   0 )   where   Π := ( I^A   0^A ; 0^A   0^A ) .   (B.46)
Remark. We exchanged the role of σ and ρ from the original definition, as it will be more
convenient in the context of quantum information to work with this definition.
1. f is operator convex.
2. #f is jointly convex.
Proof. We first prove the direction 1 ⇒ 2. Let ρ = tρ1 + (1 − t)ρ2 and σ = tσ1 + (1 − t)σ2
with t ∈ (0, 1), ρ_1, ρ_2 ∈ Pos(A) and σ_1, σ_2 ∈ Pos_{>0}(A). Define the matrices M_1 := (tσ_1)^{1/2} σ^{−1/2}
and M_2 := ((1 − t)σ_2)^{1/2} σ^{−1/2}. Observe that these matrices form a generalized measurement;
i.e. M_1^* M_1 + M_2^* M_2 = I^A. Moreover, in terms of these matrices we can express the term
σ^{−1/2} ρ σ^{−1/2} in (B.50) as
σ^{−1/2} ρ σ^{−1/2} = M_1^* σ_1^{−1/2} ρ_1 σ_1^{−1/2} M_1 + M_2^* σ_2^{−1/2} ρ_2 σ_2^{−1/2} M_2 .   (B.51)
Now, from Jensen’s operator inequality (B.41) it follows that
f( σ^{−1/2} ρ σ^{−1/2} ) ⩽ M_1^* f( σ_1^{−1/2} ρ_1 σ_1^{−1/2} ) M_1 + M_2^* f( σ_2^{−1/2} ρ_2 σ_2^{−1/2} ) M_2 .   (B.52)
Conjugating both sides by σ^{1/2}(·)σ^{1/2}, and recalling that M_1 σ^{1/2} = (tσ_1)^{1/2} and M_2 σ^{1/2} = ((1 − t)σ_2)^{1/2}, gives
ρ#f σ ⩽ tρ1 #f σ1 + (1 − t)ρ2 #f σ2 . (B.53)
That is, #f is jointly convex. For the direction 2 ⇒ 1 observe that for σ = I A we get
ρ#f σ = f (ρ) so that the convexity of f follows from the joint convexity of #f .
The Kubo–Ando operator mean can also be applied to operators on the vector space of
superoperators consisting of all linear transformations from L(A) to itself. In particular, in
the proof of the theorem below, for any ρ ∈ L(A) we will consider the linear operators
Observe that Lρ , Rρ : L(A) → L(A) are linear operators belonging to the Hilbert space
L(A → A).
Exercise B.5.1. Let ρ, σ ∈ L(A), α ∈ [0, ∞), and consider the left and right operators, L_ρ
and R_σ, as defined above. Show that:
1. Commutativity: L_ρ ∘ R_σ = R_σ ∘ L_ρ .
2. If ρ is invertible then
L_ρ^{−1} = L_{ρ^{−1}}   and   R_ρ^{−1} = R_{ρ^{−1}} .   (B.57)
4. If ρ ⩾ 0 then
L_ρ^α = L_{ρ^α}   and   R_ρ^α = R_{ρ^α} .   (B.58)
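The following is a minimal numerical sketch (an addition, not from the text) of the left and right multiplication operators L_ρ(X) = ρX and R_σ(X) = Xσ viewed as linear operators on L(A). Under column-stacking vectorization, vec(ρX) = (I ⊗ ρ)vec(X) and vec(Xσ) = (σ^T ⊗ I)vec(X), which makes the commutativity of item 1 manifest.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 3
rho   = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
sigma = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
X     = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

vec = lambda M: M.reshape(-1, order="F")            # column-stacking vec
L_rho   = np.kron(np.eye(d), rho)                   # matrix of L_rho acting on vec(X)
R_sigma = np.kron(sigma.T, np.eye(d))               # matrix of R_sigma acting on vec(X)

assert np.allclose(L_rho @ vec(X), vec(rho @ X))
assert np.allclose(R_sigma @ vec(X), vec(X @ sigma))
assert np.allclose(L_rho @ R_sigma, R_sigma @ L_rho)   # item 1: commutativity
```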
f(ρ, σ) := Tr[ K^* ρ^α K σ^{1−α} ]   (B.60)
is jointly concave.
Corollary B.6.1. Let η, ρ ∈ Pos(A) and α ∈ (0, 1). Then, the function
ρ ↦ Tr[ (η ρ^α η)^{1/α} ]   (B.62)
is concave.
Young's inequality (2.75)→ ⩽ (1/p) Tr[M^p] + (1/q) Tr[N^q]   (B.63)
   = α Tr[ (η ρ^α η)^{1/α} ] + (1 − α) Tr[σ] .
Therefore, isolating the term Tr[ (η ρ^α η)^{1/α} ] gives
Tr[ (η ρ^α η)^{1/α} ] ⩾ (1/α) Tr[ η ρ^α η σ^{1−α} ] − ((1 − α)/α) Tr[σ] .   (B.64)
Now, recall that Young's inequality achieves equality for N^q = M^p, which is equivalent
to σ = (η ρ^α η)^{1/α}. Combining this with the inequality above we conclude that
Tr[ (η ρ^α η)^{1/α} ] = max_{σ∈Pos(A)} { (1/α) Tr[ η ρ^α η σ^{1−α} ] − ((1 − α)/α) Tr[σ] } .   (B.65)
Now, from Lieb’s theorem, the first term on the right-hand side is jointly concave in ρ and
σ, whereas the second term is linear in σ and in particular concave. Hence, this immediately
implies that the term on the left-hand side is concave in ρ (see Exercise B.6.1 below for more
details on this last assertion).
Exercise B.6.1. Let f : Pos(A) × Pos(A) → R be a jointly concave function. Show that the
function
g(ρ) := max_{σ∈Pos(A)} f(ρ, σ)   (B.66)
is concave.
Theorem B.7.1. For any two positive semidefinite matrices M, N ⩾ 0 (of the same
finite dimension) and any 0 ⩽ s ⩽ 1 the following inequality holds
(1/2) Tr[ M + N − |M − N| ] ⩽ Tr[ M^{1−s} N^s ] .   (B.68)
Exercise B.7.1. Show that if (B.68) holds for all s ∈ [1/2, 1] then it must also hold for all
s ∈ [0, 1/2].
Proof. Since the term |M − N | can be expressed as |M − N | = 2(M − N )+ − (M − N ), the
inequality (B.68) is equivalent to
Tr(M − N)_+ ⩾ Tr[M] − Tr[ M^{1−s} N^s ] .   (B.69)
The identity M − N = (M − N )+ − (M − N )− gives
M ⩽ M + (M − N )− = N + (M − N )+ . (B.70)
Combining the above inequality with the operator monotonicity of the function f(t) = t^s for
s ∈ [0, 1] gives
M^s ⩽ ( N + (M − N)_+ )^s   ∀ s ∈ [0, 1] .   (B.71)
With this inequality at hand, we get
Tr[M] − Tr[ M^{1−s} N^s ] = Tr[ M^{1−s} (M^s − N^s) ]
(B.71)→ ⩽ Tr[ M^{1−s} ( (N + (M − N)_+)^s − N^s ) ]
(B.71) with 1 − s→ ⩽ Tr[ (N + (M − N)_+)^{1−s} ( (N + (M − N)_+)^s − N^s ) ]   (B.72)
   = Tr[N] + Tr(M − N)_+ − Tr[ (N + (M − N)_+)^{1−s} N^s ]
see (B.73) below→ ⩽ Tr(M − N)_+
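A small numerical check (an addition for illustration) of the inequality (B.68) for random positive semidefinite matrices and a few values of s.

```python
import numpy as np

rng = np.random.default_rng(6)
d = 4
def rand_psd(d):
    x = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return x @ x.conj().T

def mpow(M, p):                      # matrix power of a PSD matrix via eigendecomposition
    w, v = np.linalg.eigh(M)
    return (v * np.clip(w, 0, None) ** p) @ v.conj().T

M, N = rand_psd(d), rand_psd(d)
w = np.linalg.eigvalsh(M - N)
lhs = 0.5 * (np.trace(M + N).real - np.sum(np.abs(w)))   # (1/2) Tr[M + N - |M - N|]
for s in [0.1, 0.5, 0.9]:
    rhs = np.trace(mpow(M, 1 - s) @ mpow(N, s)).real
    assert lhs <= rhs + 1e-10
```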
where ρ ∈ Herm(A), σ ∈ Herm(B) and η ∈ C|A|×|B| . Then, the Schur complement of the
block σ of M is defined as the matrix
M/σ := ρ − ησ −1 η ∗ , (B.75)
where σ^{−1} is taken to be the generalized inverse if the inverse of σ does not exist.¹ Similarly,
the Schur complement of the block ρ of M is defined as the matrix
Theorem B.8.1. Let M be the Hermitian block matrix given in (B.74). Then,
M ⩾ 0 if and only if at least one of the following two conditions holds:
1. ρ ⩾ 0 and M/ρ ⩾ 0.
2. σ ⩾ 0 and M/σ ⩾ 0.
Proof. We will show the equivalence of the second condition with M ⩾ 0. The main idea of
the proof is to define the matrix
L := ( I^A   0 ; σ^{−1} η^*   I^B ) ,   (B.77)
¹The generalized inverse of a complex matrix σ is the matrix σ^{−1} that satisfies σσ^{−1}σ = σ.
Since the matrix above is positive semidefinite, its Schur complement is also positive semidef-
inite (i.e. we are using Theorem B.8.1 once again). Hence, in particular, the Schur comple-
ment
t η_0^* ρ_0^{−1} η_0 + (1 − t) η_1^* ρ_1^{−1} η_1 − ( t η_0 + (1 − t) η_1 )^* ( t ρ_0 + (1 − t) ρ_1 )^{−1} ( t η_0 + (1 − t) η_1 ) ⩾ 0 .   (B.83)
Exercise B.8.2.
1. Show that the function f : Pos(A) → Pos(A) given by f (ρ) := ρ−1 for all ρ ∈ Pos(A)
is convex.
2. Show that the function f : L(A) → Pos(A) given by f (η) := η ∗ η for all η ∈ L(A) is
convex.
In this chapter we provide a relatively short review of group theory and representation
theory. We only review concepts from representation theory that are particularly useful for
applications in quantum information theory, and that we are using in this book. Therefore,
this section does not attempt to provide an extensive review of the exceedingly vast subject
of representation theory. Further, much of the material discussed here can be found in
standard textbooks on representation theory. Yet, a reader not familiar with groups and
their representations will find this section self-contained and sufficient for the understanding
of the material discussed in this book. Particularly, most of the material in this section is
used in the study of the resource theory of asymmetry (see Chapter 15).
C.1 Groups
As a very simple example of a group, consider the set of all integers in Z. This set
together with the ‘addition’ operation forms a group. That is, for any a, b ∈ Z we have
a + b ∈ Z and the + operation satisfies all the axioms in the definition above. In particular,
Note that a group homomorphism maps the identity element e1 ∈ G1 to the identity
element e2 ∈ G2 ; i.e. f (e1 ) = e2 . This in turn implies that a homomorphism satisfies
f(g)^{−1} = f(g^{−1}) for all g ∈ G_1 since
is a subgroup of G1 .
Exercise C.1.1. Prove that Im(f ) and Ker(f ) are indeed subgroups of G2 and G1 , respec-
tively.
In this book we will consider two types of groups, finite groups and Lie groups. Finite
groups are groups with a finite number of elements. For example, the set of all bijections from
a given finite set to itself forms a group known as the permutation group (or symmetric group)
denoted by Sn . It is known (Cayley’s theorem) that every finite group G is isomorphic to a
subgroup of the symmetric group acting on the elements of G. Consequently, the symmetric
group plays an important role in various areas of theoretical physics.
Lie groups, on the other hand, are groups that are also smooth differentiable manifolds.
That is, a Lie group can be parametrized with a chart of local coordinates, and the smooth-
ness of the manifold means that for any g, h ∈ G the inversion map g 7→ g −1 and the
multiplication map (g, h) 7→ gh are smooth maps. Here are several examples of Lie groups
that are most popular in physics:
As a manifold, this group is isomorphic to the circle. Note also that the inversion of
a group element corresponds to θ 7→ 2π − θ which is clearly a smooth (differentiable)
map. Similarly, the composition of two group elements corresponds to the mapping
(θ1 , θ2 ) 7→ θ1 + θ2 mod 2π which is a differentiable map.
The case n = 3 corresponds to the group SO(3) which corresponds to rotations in R3 .
Each rotation in R3 can be described as a rotation by an angle θ ∈ [0, 2π) along some
axis of rotation. Let n ∈ R3 be the unit vector pointing in the direction of the axis of
rotation, and denote by w := cos(θ/2), and (x, y, z)^T := sin(θ/2) n. Then, SO(3) is the
collection of all matrices R_θ^{(n)} that can be expressed as:
R_θ^{(n)} = ( 1 − 2y² − 2z²   2xy − 2zw   2xz + 2yw ;
             2xy + 2zw   1 − 2x² − 2z²   2yz − 2xw ;
             2xz − 2yw   2yz + 2xw   1 − 2x² − 2y² )   (C.7)
It can be shown that if v ∈ R³ then R_θ^{(n)} v is the vector obtained from v after rotating
it by an angle θ about the axis in the direction n. The group SO(3) can also be
parametrized with the three Euler angles, as opposed to the axis parametrization
above.
2. The unitary group of degree n, denoted U (n), is the group of all n×n unitary matrices.
Note that the determinant can be viewed as a group homomorphism det : U(n) → U(1),
since any unitary matrix has determinant equal to e^{iθ} for some θ ∈ [0, 2π). Observe
that the kernel of this homomorphism consists of all n × n unitary matrices with determinant one.
This subgroup of U(n) is denoted by SU(n) and is called the special unitary group. In
quantum mechanics, the case n = 2 corresponds to rotations of spin-1/2 particles and
therefore plays an important role in physics. This group can be expressed as
SU(2) := { ( a   b ; −b̄   ā ) : |a|² + |b|² = 1 , a, b ∈ C }   (C.8)
s0 = cos α
s1 = sin α cos β
s2 = sin α sin β cos γ
s3 = sin α sin β sin γ . (C.9)
U = r_0 I + i (r_1 σ_1 + r_2 σ_2 + r_3 σ_3)   (C.10)
where σ_1, σ_2, and σ_3 are the three Pauli matrices defined in Exercise 2.3.19. Given
that r_0² + r_1² + r_2² + r_3² = 1, it is convenient to denote by cos(θ) := r_0 and by
n := (1/√(1 − r_0²)) (r_1, r_2, r_3)^T so that
3. The general linear group, denoted GL(n, F) (in short GL(n) or GL(A), where A is
a Hilbert space of dimension |A| = n), is defined as the set of all n × n invertible
matrices. This set is a group under matrix multiplication. An important subgroup
of GL(n) that appears for example in multipartite entanglement theory, is the special
linear group SL(n). It consists of all n × n matrices with determinant one.
In the examples above, the groups SO(n), U(n), SU(n) are compact, whereas GL(n) or
the real line R, for example, are not compact. Compact Lie groups are the simplest examples
of continuous groups, and as such, play an important role in numerous applications in
physics.
From the exercise above it follows that all the elements of SU(2) can be expressed as
e^{iθ(n·σ)}. It will be convenient (see the next exercise) to parametrize the elements of SU(2)
with the matrices
T_θ^{(n)} := e^{−i(θ/2)(n·σ)}   (C.15)
where θ ∈ [0, 4π) and n ∈ R³ is a unit vector. Note that we divided θ by −2 to obtain the
following relation between SU(2) and SO(3).
1. Show that f is a group homomorphism. That is, given two unit vectors n1 and n2 , and
two rotation angles θ1 and θ2 ,
f[ e^{−i(θ_1/2)(n_1·σ)} e^{−i(θ_2/2)(n_2·σ)} ] = f[ e^{−i(θ_1/2)(n_1·σ)} ] f[ e^{−i(θ_2/2)(n_2·σ)} ] .   (C.17)
2. Show that f is 2 : 1 (two-to-one) and onto. That is, every element in SO(3) corre-
sponds exactly to two elements in SU(2). Hint: Denote w := cos(θ/2) and (x, y, z)T =
sin(θ/2) n, and use the fact that R_θ^{(n)} can be expressed as in (C.7).
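A hedged numerical sketch (an addition, not part of the exercise) of the SU(2) → SO(3) homomorphism: with w := cos(θ/2) and (x, y, z) := sin(θ/2) n, the matrix (C.7) built from (w, x, y, z) acts on v ∈ R³ exactly as conjugation by e^{−i(θ/2)(n·σ)} acts on v·σ.

```python
import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = np.array([sx, sy, sz])

theta = 1.234
n = np.array([1.0, 2.0, -0.5]); n /= np.linalg.norm(n)
w, (x, y, z) = np.cos(theta / 2), np.sin(theta / 2) * n

R = np.array([[1 - 2*y**2 - 2*z**2, 2*x*y - 2*z*w,       2*x*z + 2*y*w],
              [2*x*y + 2*z*w,       1 - 2*x**2 - 2*z**2, 2*y*z - 2*x*w],
              [2*x*z - 2*y*w,       2*y*z + 2*x*w,       1 - 2*x**2 - 2*y**2]])   # (C.7)

U = expm(-1j * (theta / 2) * (n[0]*sx + n[1]*sy + n[2]*sz))
v = np.array([0.3, -1.0, 2.0])
lhs = U @ np.einsum('i,ijk->jk', v, paulis) @ U.conj().T        # U (v.sigma) U*
rhs = np.einsum('i,ijk->jk', R @ v, paulis)                     # (R v).sigma
assert np.allclose(lhs, rhs)
```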
Remark. The image of π in the definition above is a subset of L(A) consisting of |A| × |A|
invertible matrices. Therefore, one can replace L(A) in the definition above with the general
linear group GL(A). Note that since π is a homomorphism it follows that π(e) = I A . More-
over, note that for any group G and any Hilbert space A, there exists a group representation
π(g) := I A for all g ∈ G. This representation is called the trivial representation.
θ ↦ ( cos θ   −sin θ   0   0 ;
      sin θ    cos θ   0   0 ;
      0   0   cos θ   sin θ ;
      0   0   −sin θ   cos θ )   (C.18)
Clearly, the above representation has two proper subrepresentations of R4 . Each of these
two subrepresentations cannot be reduced further, so they are irreps. However, these two
irreps are equivalent.
Equivalent Representations
Definition C.2.2. Two representations or subrepresentations, π1 : G → L(A) and
π2 : G → L(B), are said to be equivalent if there exists an isomorphism η : A → B
(in particular, |A| = |B| and η is invertible) such that
If there is no such an intertwiner map η we say that the two representations are
inequivalent.
Note that in particular, since η is invertible, for each g ∈ G the matrix π1 (g) in (C.19)
is similar to the matrix π2 (g). In Fig. C.1 we drew a commutativity diagram describing the
equivalence of two representations that are related via (C.19). Note that the action of the
representation π1 on the Hilbert space A is mirrored by η to the Hilbert space B in which
it takes the form of π2 . Note also that the directions of all the arrows in the figure are
reversible.
Figure C.1: A commutativity diagram for two equivalent representations. Each arrow is reversible.
Schur’s Lemma
Theorem C.2.1. Let G be a group, and A1 and A2 be two Hilbert spaces. Also let
π1 : G → L(A1 ) and π2 : G → L(A2 ) be two irreducible representations of G, and
suppose there exists a complex matrix (linear transformation) T : A1 → A2 that is
equivalent under the action of G; that is, T π1 (g) = π2 (g)T for all g ∈ G. Then,
Proof. Part 1. The idea of the proof is to look at the kernel and image of T . Let |ψ⟩ ∈
Ker(T ). Then, from the commutativity property of T , for all g ∈ G
T π1 (g)|ψ⟩ = π2 (g)T |ψ⟩ = π2 (g)0 = 0 . (C.20)
That is, if |ψ⟩ ∈ Ker(T ) then also π1 (g)|ψ⟩ ∈ Ker(T ) for all g ∈ G. In other words, Ker(T )
is a G-invariant subspace of A1 . Now, recall that π1 is an irrep, and therefore since Ker(T )
is a G-invariant subspace of A1 we must have Ker(T ) = {0} or Ker(T ) = A1 .
Next, let |ϕ⟩ ∈ Im(T ). Then, there exists |ψ⟩ ∈ A1 such that T |ψ⟩ = |ϕ⟩. Therefore,
using the commutativity property of T we get that for all g ∈ G
That is, if |ϕ⟩ ∈ Im(T ) then also π2 (g)|ϕ⟩ ∈ Im(T ) for all g ∈ G. Hence, Im(T ) is a
G-invariant subspace of A2 , and since π2 is an irrep we must have either Im(T ) = {0} or
Im(T ) = A2 .
Combining everything, we conclude that there are two options: (1) Ker(T ) = {0} and
Im(T ) = A2 , or (2) Ker(T ) = A1 and Im(T ) = {0}. From Exercise 2.3.4 (1) can only
hold if A1 = A2 and T is invertible. This option is not possible since we assume in Part
1 that π1 and π2 are inequivalent. Option (2) on the other hand implies that T = 0 (see
Exercise 2.3.4). This completes the first part of the proof.
Proof of Part 2. The proof is based on the fundamental theorem of algebra that states that
every non-constant single-variable polynomial with complex coefficients has at least one com-
there exists λ ∈ C that is a root for the characteristic polynomial ofAT;
plex root. Therefore,
A
i.e. det T − λI = 0. This means that there exists a non-zero vector |ψ⟩ ∈ Ker T − λI .
We then get for all g ∈ G
Exercise C.2.1. Show that all the irreps (over a complex field) of an abelian group G are
1-dimensional.
The following theorem demonstrates that all representations of a finite group can be
decomposed into a direct sum of irreps.
Proof. If there are no proper (i.e. non-trivial) subrepresentations of A then π is itself an irrep
and the proof is done. Therefore, suppose A1 is a proper G-invariant subspace corresponding
to the proper subrepresentation π1 : G → L(A1 ) (i.e. π1 is subrepresentation of π). Let
P : A → A be the projection to the subspace A1 , and define the operator T : A → A as
T := (1/|G|) Σ_{g∈G} π(g) P π(g^{−1}) .   (C.23)
T π(h) = (1/|G|) Σ_{g∈G} π(g) P π(g^{−1}) π(h)
   = (1/|G|) Σ_{g∈G} π(g) P π(g^{−1} h)
a := h^{−1}g → = (1/|G|) Σ_{a∈G} π(ha) P π(a^{−1})   (C.24)
   = π(h) (1/|G|) Σ_{a∈G} π(a) P π(a^{−1})
   = π(h) T .
Moreover, we argue now that T is a projection. First, observe that if |ψ⟩ ∈ A1 also π(g)|ψ⟩ ∈
A1 since A1 is a G-invariant subspace. This in particular implies that P π(g)|ψ⟩ = π(g)|ψ⟩
so we conclude that for all |ψ⟩ ∈ A1
T|ψ⟩ = (1/|G|) Σ_{g∈G} π(g^{−1}) P π(g)|ψ⟩
   (C.25)
P π(g)|ψ⟩ = π(g)|ψ⟩ → = (1/|G|) Σ_{g∈G} π(g^{−1}) π(g)|ψ⟩ = (1/|G|) Σ_{g∈G} |ψ⟩ = |ψ⟩ .
Second, observe that for any |ψ⟩ ∈ A (not necessarily in A1 ) we have T |ψ⟩ ∈ A1 . Combining
this with the above equation gives T 2 |ψ⟩ = T |ψ⟩ for all |ψ⟩ ∈ A. Hence, T 2 = T ; i.e.
T : A → A is a projection and an intertwiner. Therefore, both Im(T ) = A1 and A0 := Ker(T )
are G-invariant subspaces and we have A = A1 ⊕ A0 (as representations). Repeating the
process we can continue in this way to decompose A0 and A1 into direct sum of G-invariant
subspaces until we decompose A into a direct sum of irreducible G-invariant subspaces.
where ω(g, h) ∈ C with |ω(g, h)| = 1. The phase factor ω(g, h) is also called a cocycle.
Note that both S and T are unitary matrices. We define a projective unitary representation
W : G → L(Cn ) via
(p, q) 7→ Wp,q := S p T q ∀ (p, q) ∈ G . (C.35)
Since p and q are integers we have that Wp,q is a unitary matrix. Observe that
ST|x⟩ = e^{i2πx/n} S|x⟩ = e^{i2πx/n} |x + 1 (mod n)⟩   (C.36)
whereas
TS|x⟩ = T|x + 1 (mod n)⟩ = e^{i2π(x+1)/n} |x + 1 (mod n)⟩ .   (C.37)
Therefore, we conclude that
TS = e^{i2π/n} ST .   (C.38)
In the exercise below you will show that {W_{p,q}} is a projective unitary representation of G. The
operators W_{p,q} are known as the Heisenberg–Weyl operators.
Exercise C.3.2. Use the relation (C.38) to show that the mapping (p, q) 7→ Wp,q forms a
projective unitary representation of Zn × Zn . Find its cocycle.
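A minimal numerical sketch (an addition, not from the text) of the Heisenberg–Weyl operators for n = 5: S shifts the computational basis, T applies the phases e^{2πix/n}, the commutation relation (C.36)–(C.38) holds, and W_{p,q} := S^p T^q composes up to the cocycle phase.

```python
import numpy as np

n = 5
omega = np.exp(2j * np.pi / n)
S = np.roll(np.eye(n), 1, axis=0)                   # S|x> = |x+1 mod n>
T = np.diag(omega ** np.arange(n))                  # T|x> = omega^x |x>

assert np.allclose(T @ S, omega * (S @ T))          # TS = e^{2 pi i / n} ST

W = lambda p, q: np.linalg.matrix_power(S, p) @ np.linalg.matrix_power(T, q)
p1, q1, p2, q2 = 2, 3, 4, 1
# W_{p1,q1} W_{p2,q2} equals W_{p1+p2, q1+q2} up to the cocycle phase omega^{q1*p2}
lhs = W(p1, q1) @ W(p2, q2)
rhs = (omega ** (q1 * p2)) * W((p1 + p2) % n, (q1 + q2) % n)
assert np.allclose(lhs, rhs)
```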
where for each x ∈ [m_λ] the map U^{(λ,x)} : g ↦ U_g^{(λ,x)} is an irrep belonging to the λ-equivalence
class.
For example, consider the unitary representation θ ↦ U_θ of SO(2) in R⁴, where U_θ is the
4 × 4 matrix given in (C.18). Clearly, we can express U_θ = U_θ^{(1)} ⊕ U_θ^{(2)}, where
U_θ^{(1)} := ( cos θ   −sin θ ; sin θ   cos θ )   and   U_θ^{(2)} := ( cos θ   sin θ ; −sin θ   cos θ )   (C.42)
This is the direct sum decomposition into irreps of θ ↦ U_θ. Note that in this case we
have only one equivalence class, without loss of generality we can name it λ = 1, and this
equivalence class contains two irreps given by U_θ^{(1)} and U_θ^{(2)}, so that the multiplicity of this
irrep is m_1 = 2 (i.e. m_{λ=1} = 2).
As another example, consider the group U(1) and its representation θ ↦ U_θ, where
U_θ = Σ_{k∈[n]} e^{iθk} |k⟩⟨k| .   (C.43)
Clearly, this representation is already written as a direct sum of irreps. Observe that for each k,
the map θ ↦ e^{iθk} |k⟩⟨k| defines a 1-dimensional irrep of U(1) (recall that for abelian groups all
irreps are 1-dimensional; see Exercise C.2.1). In this case the equivalence classes of irreps are
labeled by λ = k and the multiplicity m_k = 1 for all k ∈ [n].
The following theorem slightly simplifies the decomposition (C.41).
U_g ≅ ⊕_{λ∈Irr(U)} U_g^{(λ)} ⊗ I^{C_λ} ,   (C.45)
where U_g^{(λ)} acts irreducibly on B_λ, and I^{C_λ} is the identity matrix on C_λ.
Remark. The subspace Bλ is called the representation space, and the subspace Cλ is called
the multiplicity space. They are mathematical objects and we will think about them later
on as virtual subsystems. Moreover, the above decomposition of A means that there exists
an orthonormal basis {|λ, m, x⟩Aλ }λ,m,x whose elements are
where {|x⟩^{C_λ}}_{x∈[m_λ]} is an orthonormal basis of the multiplicity space C_λ, and {|λ, m⟩^{B_λ}}_{m=1}^{d_λ}
Proof. We first argue that without loss of generality the intertwiner map between two irreps
U_g^{(λ,x)} and U_g^{(λ,x′)} in the decomposition (C.41) can be taken to be unitary. Indeed, by definition
of equivalent representations, if T is the intertwiner between U_g^{(λ,x)} and U_g^{(λ,x′)} then U_g^{(λ,x)} T =
T U_g^{(λ,x′)}. Since both U_g^{(λ,x)} and U_g^{(λ,x′)} are unitary matrices we must have
T^* T = T^* U_g^{*(λ,x)} U_g^{(λ,x)} T
   = U_g^{*(λ,x′)} T^* T U_g^{(λ,x′)}   ∀ g ∈ G .   (C.47)
Therefore, since U_g^{(λ,x′)} is an irrep of G, from the second part of Schur's Lemma (see The-
orem C.2.1) it follows that T^* T = λI for some λ ∈ C. Since T^* T > 0 (recall that T is
invertible) we can redefine T ↦ (1/√λ) T so that the new T is unitary.
Now, denote by U_g^{(λ)} := U_g^{(λ,1)} and by T_x^{(λ)} the unitary intertwiner satisfying
Taking the direct sum over x ∈ [mλ ] on both sides of the equation above gives
⊕_{x∈[m_λ]} U_g^{(λ,x)} = T_λ ( U_g^{(λ)} ⊗ I_{m_λ} ) T_λ^*   (C.49)
where I_{m_λ} is the m_λ × m_λ identity matrix, T_λ := ⊕_{x∈[m_λ]} T_x^{(λ)} is a unitary matrix, and U_g^{(λ)} ⊗ I_{m_λ}
is viewed as
U_g^{(λ)} ⊗ I_{m_λ} = U_g^{(λ)} ⊕ ··· ⊕ U_g^{(λ)}   (m_λ times) .   (C.50)
Finally, taking the direct sum over all λ ∈ Irr(U ) on both sides of (C.49), we get that the
unitary matrix T := ⊕λ T λ satisfies
⊕_{λ∈Irr(U)} ⊕_{x∈[m_λ]} U_g^{(λ,x)} = T ( ⊕_{λ∈Irr(U)} U_g^{(λ)} ⊗ I_{m_λ} ) T^* .   (C.51)
The proof is concluded with the identification of the subspaces Bλ and Cλ as the subspaces
on which Ugλ and Imλ act upon (with Imλ = I Cλ ).
Invariant states, often called symmetric states, play an important role in physics, par-
ticularly in the resource theory of asymmetry. The following theorem provides a simple
characterization of such states with respect to the decomposition (C.44) of the underlying
Hilbert space.
where u^{B_λ} = (1/|B_λ|) I^{B_λ} is the maximally mixed (uniform) state on system B_λ, and
ρ_λ^{C_λ} := Tr_{B_λ}[ Π^{A_λ} ρ^A Π^{A_λ} ] ,   (C.53)
Proof. We are working in a basis in which Ug has the form (C.45). Therefore, if ρA has
the form (C.52) then it clearly commutes with Ug for all g ∈ G so that ρA is G-invariant.
Conversely, suppose ρA is G-invariant, and denote
ρ^A = Σ_{λ,λ′∈Irr(U)} ρ_{λλ′}   where   ρ_{λλ′} := Π^{A_{λ′}} ρ^A Π^{A_λ} .   (C.54)
Note that ρλλ′ is a linear map from Aλ → Aλ′ . Since ρ commutes with Ug for all g ∈ G it
follows immediately from the form (C.45) of Ug that
0 = [ρ^A, U_g] = Σ_{λ,λ′} [ ρ_{λλ′} , U_g ]
   = Σ_{λ,λ′} ( ρ_{λλ′} ( U_g^{(λ)} ⊗ I^{C_λ} ) − ( U_g^{(λ′)} ⊗ I^{C_{λ′}} ) ρ_{λλ′} ) .   (C.55)
Multiplying both sides of the equation above by ΠAλ′ from the right, and ΠAλ from the left,
we get for all λ and λ′
ρ_{λλ′} ( U_g^{(λ)} ⊗ I^{C_λ} ) = ( U_g^{(λ′)} ⊗ I^{C_{λ′}} ) ρ_{λλ′} .   (C.56)
Now, by multiplying from the left both sides with I Bλ ⊗ T Cλ′ →Cλ , for some mλ × mλ′ matrix
T ∈ L(Cλ′ , Cλ ) and taking the partial trace over Cλ gives
ω_{λλ′} U_g^{(λ)} = U_g^{(λ′)} ω_{λλ′}   where   ω_{λλ′} := Tr_{C_λ}[ ( I^{B_λ} ⊗ T ) ρ_{λλ′} ] .   (C.57)
Finally, from the first part of Schur's lemma it follows that unless λ = λ′ we get ω_{λλ′} =
0. Since this holds for all T ∈ L(C_{λ′}, C_λ) we conclude from Exercise 2.3.31 that also
ρ_{λλ′} = 0 for λ ≠ λ′. Moreover, from the second part of Schur's lemma we get that
ω_{λλ} = Tr_{C_λ}[ ( I^{B_λ} ⊗ T ) ρ_{λλ} ] is proportional to the identity matrix for all T ∈ L(C_λ). Hence,
from Exercise 2.3.30 we conclude that ρ_{λλ} = u^{B_λ} ⊗ ρ_λ^{C_λ}. This completes the proof.
The theorem above applies to any operator ρ ∈ L(A). In this book we will only consider
G-invariant quantum states; i.e. G-invariant operators in D(A). For the case that ρ is a
pure quantum state we have the following corollary.
Proof. Taking ρ = ψ in (C.52), it follows that the right-hand side of (C.52) is a rank-one
matrix if and only if the direct sum consists of a single non-zero term, denoted by λ,
for which |B_λ| = 1. This completes the proof.
The definition above is consistent with what one would expect from a function that
quantify the volume or size of a region on a manifold. However, since we are interested here
in measures on Lie groups, we would like the measure also to be invariant under the action
of the group.
By definition of Lie groups, for a fixed group element h ∈ G, the map g ↦ hg is an
isomorphism between smooth manifolds (also known as a diffeomorphism). Such a
map transforms any region S ⊆ G to the region hS := {hg : g ∈ S}. We then say that μ
is left-invariant if μ(hS) = μ(S) for all S ⊆ G and all h ∈ G. Similarly, we say that μ is
right-invariant if μ(Sh) = μ(S) for all S ⊆ G and all h ∈ G.
All groups have a left-invariant and a right-invariant measure. This result is known as
Haar's Theorem (the proof of Haar's theorem goes beyond the scope of this book). For
compact groups these Haar measures are finite and unique up to a multiplicative constant.
If the two invariant measures of a Lie group equal each other up to a multiplicative constant
then the group is said to be unimodular. All compact Lie groups are unimodular, and also
many non-compact groups that appear in applications in physics are unimodular. In this
book we will only consider unimodular Lie groups. Moreover, when the group is compact,
so that µ(G) < ∞, we will always implicitly assume that the Haar measure is normalized;
i.e. µ(G) = 1.
Examples:
1. Consider the group U(1) := {e^{iθ} : θ ∈ [0, 2π)}. This group is clearly isomorphic to
the group with elements in [0, 2π) under the group operation of addition modulo 2π. For
any set S ⊆ [0, 2π) the Haar measure of U(1) is given by
μ(S) = (1/2π) ∫_S dθ ,   (C.59)
or equivalently, dμ(g) = dθ/2π. Since U(1) ≅ SO(2) this is also the Haar measure of
SO(2).
2. The Haar measure of SU(2). Recall from (C.9) that the group elements of SU(2) can
be characterized in terms of the hyperspherical coordinates (α, β, γ). It turns out that
the Haar measure of a region R ⊆ SU(2) is given by
μ(R) = ∫_R sin(2α) dα dβ dγ .   (C.60)
The Haar measure can be used to define various averages over a group. For example,
consider a function f : G → C. One can define the average of the function f over the
compact group G as
∫_G dg f(g) ,   (C.61)
where we use the short notation dg for the Haar measure dµ(g). Given a projective unitary
representation g 7→ Ug one can also define averages over elements of L(A) as
G(ρ) := ∫_G dg U_g ρ U_g^*   ∀ ρ ∈ L(A) .   (C.62)
The map G : L(A) → L(A) is linear and is known as the G-twirling map.
Remark. If the group G is finite we can always replace the averages above with summations.
In particular, for a finite group the integral ∫_G dg can simply be replaced with the sum (1/|G|) Σ_{g∈G},
and under this replacement, all the theorems and statements below that apply for compact
Lie groups also apply for finite groups.
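A hedged numerical sketch (an addition) of the G-twirling map (C.62) for the finite group Z_n represented by U_g = Σ_k e^{2πigk/n}|k⟩⟨k|: averaging U_g ρ U_g^* over the group dephases ρ in the {|k⟩} basis.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5
rho = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
rho = rho @ rho.conj().T
rho /= np.trace(rho).real                     # a random density matrix

def U(g):
    return np.diag(np.exp(2j * np.pi * g * np.arange(n) / n))

twirled = sum(U(g) @ rho @ U(g).conj().T for g in range(n)) / n
assert np.allclose(twirled, np.diag(np.diag(rho)))    # only the diagonal of rho survives
```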
1. Use the invariance property of the Haar measure to show that for any ρ ∈ L(A)
In the next theorem we show that the average of {Ug } over the group (w.r.t. the Haar
measure) is an orthogonal projection.
where in the last equality we used the fact that the Haar measure dh is invariant under the
group action. Hence Ug Π = Π for all g ∈ G and we get that
Π^* Π = ∫_G dg U_{g^{−1}} Π = ∫_G dg Π = Π .   (C.70)
Therefore, from the first part of Exercise 2.3.7 it follows that Π is an orthogonal projection.
Moreover, since Ug Π = Π we get that Ug Π|ψ⟩ = Π|ψ⟩ for all g ∈ G. Hence, Π|ψ⟩ ∈ AG for
all |ψ⟩ ∈ A. Finally, to show that Π is not a projection to a proper subspace of AG , observe
that for every |ψ⟩ ∈ AG we have
Π|ψ⟩ = ∫_G dg U_g|ψ⟩ = ∫_G dg |ψ⟩ = |ψ⟩ .   (C.71)
The following theorem states additional orthogonality conditions satisfied by the matrix ele-
ments
u^λ_{mm′}(g) := ⟨λ, m, x|U_g|λ, m′, x⟩ = ⟨λ, m|U_g^{(λ)}|λ, m′⟩ .   (C.73)
In the equation above, the set {|λ, m, x⟩}_{m,x} forms a basis of A_λ, whereas {|λ, m⟩}_m forms
a basis of B_λ. In particular, on the left-hand side of the equation above there is no index x,
since from (C.45) the components u^λ_{mm′}(g) do not depend on x.
Proof. Take
ρ = |λ, m′ , x⟩⟨λ′ , k ′ , x| = |λ, m′ ⟩⟨λ′ , k ′ | ⊗ |x⟩⟨x| (C.75)
Now, denote by σ := G(ρ) and for any irrep µ denote by B_µ the representation space, and
by C_µ the multiplicity space. Then,
σ_µ^{C_µ} := Tr_{B_µ}[ Π^{A_µ} σ ] = ∫_G dg Tr_{B_µ}[ Π^{A_µ} U_g ( |λ, m′, x⟩⟨λ′, k′, x| ) U_g^* ]
(C.45)→ = ∫_G dg Tr_{B_µ}[ Π^{A_µ} ( U_g^{(λ)} |λ, m′⟩⟨λ′, k′| U_g^{*(λ′)} ) ⊗ |x⟩⟨x|^{C_µ} ]
   = δ_{µλ} δ_{µλ′} ∫_G dg Tr[ U_g^{(λ)} |µ, m′⟩⟨µ, k′| U_g^{*(λ′)} ] |x⟩⟨x|^{C_µ}   (C.77)
   = δ_{µλ} δ_{µλ′} δ_{m′k′} |x⟩⟨x|^{C_µ} .
Since σ = G(ρ) is G-invariant (see Theorem C.4.1) we get from (C.52) (when applied to σ)
G(ρ) = ⊕_{µ∈Irr(U)} u^{B_µ} ⊗ σ_µ^{C_µ}
(C.77)→ = δ_{λλ′} δ_{m′k′} u^{B_λ} ⊗ |x⟩⟨x|^{C_λ}   (C.78)
so that
⟨λ, m, x|G(ρ)|λ′, k, x⟩ = δ_{λλ′} δ_{mk} δ_{m′k′} / |B_λ| .   (C.79)
Comparing this with (C.76) concludes the proof.
Note that the orthogonality relations in the theorem above can be used to obtain other
types of relations. For example, the relations (C.74) implies that (see Exercise C.5.1)
∫_G dg ū^λ_{mm′}(g) U_g^{(λ′)} = (δ_{λλ′}/|B_λ|) |λ, m⟩⟨λ, m′|^{B_λ}   (C.80)
Moreover, this relation can be extended to U_g = ⊕_{λ∈Irr(U)} ( U_g^{(λ)} ⊗ I^{C_λ} ) (see (C.45)) via
∫_G dg ū^λ_{mm′}(g) U_g = (δ_{λ,Irr(U)}/|B_λ|) |λ, m⟩⟨λ, m′|^{B_λ} ⊗ I^{C_λ} .   (C.81)
where δ_{λ,Irr(U)} := 1 if λ ∈ Irr(U) and 0 otherwise. Taking m′ = m and summing over m results in the
relation
∫_G dg χ̄_λ(g) U_g = (δ_{λ,Irr(U)}/|B_λ|) I^{B_λ} ⊗ I^{C_λ} ,   (C.82)
Exercise C.5.1. Prove the relation (C.80) and the equality ūλmm′ (g) = uλmm′ (g −1 ). Hint:
(λ) P λ ′
For the former, express Ug = k,k′ ukk′ (g)|λ, k⟩⟨λ, k | and use the orthogonality rela-
tions (C.74).
Use the above orthogonality relation to show that the character χ(g) := Tr[U_g] satisfies
∫_G dg χ̄_λ(g) χ(g) = m_λ if λ ∈ Irr(U), and 0 otherwise .   (C.84)
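A small numerical check (an addition) of the finite-group version of this relation for G = Z_3: the representation below contains the irrep k = 0 twice and the irrep k = 1 once, and the character inner product (1/|G|) Σ_g χ̄_k(g)χ(g) recovers exactly these multiplicities.

```python
import numpy as np

n = 3
omega = np.exp(2j * np.pi / n)
U = lambda g: np.diag([1.0, 1.0, omega ** g])      # irrep k=0 with multiplicity 2, k=1 once
chi = np.array([np.trace(U(g)) for g in range(n)])
for k, expected in [(0, 2), (1, 1), (2, 0)]:
    chi_k = omega ** (k * np.arange(n))            # character of the 1-dim irrep k
    m_k = np.sum(np.conj(chi_k) * chi) / n
    assert np.isclose(m_k, expected)
```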
Definition C.6.1. Let G be a finite group, and let {ω(g, h)}g,h∈G be a cocycle of G
satisfying (C.28) and (C.30). The regular representation g 7→ Ugreg is a unitary
projective representation of G on the Hilbert space C|G| = span{|g⟩ : g ∈ G}
defined by the relation
Note that for any fixed g ∈ G, U_g^{reg} maps the basis {|h⟩}_{h∈G} to itself (up to phases)
and therefore U_g^{reg} must be a unitary matrix. Furthermore, for any g_1, g_2, h ∈ G we have by
definition
U_{g_1}^{reg} U_{g_2}^{reg} |h⟩ = ω(g_2, h) U_{g_1}^{reg} |g_2 h⟩
   = ω(g_2, h) ω(g_1, g_2 h) |g_1 g_2 h⟩
(C.30)→ = ω(g_1, g_2) ω(g_1 g_2, h) |g_1 g_2 h⟩   (C.86)
   = ω(g_1, g_2) U_{g_1 g_2}^{reg} |h⟩ ,
and since the equation above holds for all h we get that
U_{g_1}^{reg} U_{g_2}^{reg} = ω(g_1, g_2) U_{g_1 g_2}^{reg} .   (C.87)
That is, g ↦ U_g^{reg} is indeed a unitary projective representation of G with cocycle {ω(g, h)}_{g,h∈G}.
Moreover, note that U_g^{reg} can be expressed as
U_g^{reg} = Σ_{h∈G} ω(g, h) |gh⟩⟨h| ,   (C.88)
The regular representation U^{reg} depends only on the group G and the cocycle ω. There-
fore, we will denote the set of equivalence classes of irreps of U reg by Irr(G, ω). From (C.84)
it follows that the dimension of the multiplicity space of any irrep λ ∈ Irr(G, ω) is given by
m_λ = (1/|G|) Σ_{g∈G} χ̄_λ(g) χ^{reg}(g)
(C.89)→ = χ̄_λ(e)   (C.90)
   = Tr[I^{B_λ}] = |B_λ| .
That is, the multiplicity space has the same dimension as the representation space. This
equality has the following remarkable application. Recall that according to (C.44), the
Hilbert space C|G| can be decomposed with respect to the irreps of U reg such that
C^{|G|} = ⊕_{λ∈Irr(G,ω)} B_λ ⊗ C_λ .   (C.91)
The above relation implies that the vectors {v_g}_{g∈G} defined by v_g := { (1/√d_λ) u^λ_{kk′}(g) }_{k,k′∈[d_λ]}^{λ∈Irr(G,ω)}
belong to C^{|G|} (since they have exactly |G| components). Moreover, using this in conjunction
with the orthogonality relations (C.74) we conclude that {v_g}_{g∈G} is an orthonormal basis of
C^{|G|}.
As the notation for the inner product above suggests, we will use the Dirac notation to
denotes the elements of L2 (G). This will make the analogy with the case of finite groups much
more apparent. Hence, the vector |f ⟩ for example corresponds to the function f (g) ∈ L2 (G).
We also denote by δ(g) the Dirac-delta on the group, defined by the relation
⟨δ|f⟩ = ∫_G dg δ(g) f(g) = f(e)   ∀ f ∈ L²(G) .   (C.95)
We will therefore denote |e⟩ := |δ⟩, so that f (e) = ⟨e|f ⟩. We point out that while δ(g) ̸∈
L2 (G) there is a way to make the concepts we discuss below mathematically rigorous via
the introduction of a rigged Hilbert space. However, this topic goes beyond the scope of this
book, and since we only use the Dirac delta function in this subsection we will not elaborate
on it here. For more information on this subject, we refer the reader to the section “Notes
and References” at the end of this chapter.
Continuing, for any h ∈ G we denote by |h⟩ the function δ(h−1 g) so that
⟨h|f⟩ = ∫_G dg δ(h^{−1}g) f(g) = f(h)   ∀ f ∈ L²(G) .   (C.96)
With these notations, given a cocycle ω, we define the regular representation g 7→ Ugreg of a
compact Lie group (in analogy with its definition on finite groups) as
To illustrate the above definitions, consider the group U (1) and for simplicity consider
the trivial cocycle ω(θ, θ′ ) = 1 for all θ, θ′ ∈ U (1) ∼
= [0, 2π). The Hilbert space L2 U (1) is
Note that the inner product has the factor of 1/2π since the Haar measure in this case is dθ/2π.
Hence, the functions in (C.101) are normalized with respect to this inner product. In this
example, the regular representation (C.99) takes the form
U_θ^{reg} = (1/2π) ∫_0^{2π} dθ′ |θ + θ′⟩⟨θ′| ,   (C.103)
where the summation θ + θ′ is mod 2π. The matrix components of U_θ^{reg} in the f_n-basis {|n⟩}
are given by
⟨n|U_θ^{reg}|n′⟩ = (1/2π) ∫_0^{2π} dθ′ ⟨n|θ + θ′⟩⟨θ′|n′⟩
   = (1/2π) ∫_0^{2π} dθ′ e^{in(θ+θ′)} e^{−in′θ′}   (C.104)
   = e^{iθn} δ_{nn′} .
Hence, with respect to the basis {|n⟩} we can express the regular representation as
U_θ^{reg} = Σ_{n∈Z} e^{inθ} |n⟩⟨n| .   (C.105)
That is, the regular representation is a direct sum of all the irreps of U (1) (cf. (C.43)).
U_g^{reg} ≅ ⊕_{λ∈Irr(G,ω)} U_g^{(λ)} ⊗ I^{C_λ} ,   (C.106)
where for each λ the dimension dλ := |Bλ | < ∞. From (C.84) it follows that for any
λ ∈ Irr(G, ω) the dimension of the multiplicity space Cλ is given by
m_λ = ∫_G dg χ̄_λ(g) χ^{reg}(g)
(C.100)→ = ∫_G dg χ̄_λ(g) δ(g)   (C.107)
   = χ̄_λ(e) = Tr[I^{B_λ}] = d_λ .
This remarkable result also implies that the Hilbert space L2 (G) can be decomposed as
L²(G) ≅ ⊕_{λ∈Irr(G,ω)} B_λ ⊗ C_λ   with   B_λ ≅ C_λ ≅ C^{d_λ} .   (C.108)
Now, define for any λ ∈ Irr(G, ω) and any k, k ′ ∈ [dλ ] the functions
f^λ_{kk′}(g) := (1/√d_λ) u^λ_{kk′}(g) ,   (C.109)
where u^λ_{kk′}(g) are the matrix elements of U_g^{(λ)} as they appear in (C.106). From the orthogonality
relations (C.74) we have that {f^λ_{kk′}(g)} is an orthonormal set of functions in L²(G), and
from (C.108) we conclude that {f^λ_{kk′}(g)} is an orthonormal basis of L²(G). We therefore
Fourier Expansion
Theorem C.6.1. Let G be a compact Lie group and let ω be a cocycle. Any
function f (g) ∈ L2 (G) can be expanded as
f(g) = Σ_{λ∈Irr(G,ω)} Σ_{k,k′=1}^{d_λ} c^λ_{kk′} ū^λ_{kk′}(g) ,   (C.110)
Remark. The relation above is the generalization of Fourier series. To see this, consider the
group U (1) whose elements are parametrized by θ ∈ [0, 2π). In this case we denote the
irreps by integers λ = n, and we know that they are all one dimensional. Therefore, uλkk′ (g)
becomes uλ (g) (since dλ = 1 so that k = k ′ = 1) and recall that λ = n. In other words,
uλkk′ (g) can be replaced with fn (θ) := eiθn (see (C.105)), and cλkk′ are replaced with cn . Hence,
for G = U (1) the two equations in the theorem above simplify to
f(θ) = Σ_{n∈Z} c_n e^{inθ}   and   c_n = (1/2π) ∫_0^{2π} dθ e^{inθ} f(θ)   (C.112)
where we replaced g by θ and the Haar measure dg by dθ/2π. This is precisely the Fourier
expansion of a periodic function (with period 2π). The theorem above demonstrates that the
Fourier expansion is not a special feature of the group U(1) but exists for any compact
Lie group.
Exercise C.6.1. Prove Theorem C.6.1 in full details. Hint: Use the arguments above it.
Class Functions
In the next theorem we show that the orthogonality between irreps implies that all class
functions are linear combinations of the characters.
That is, [M_λ, U_h^{(λ)}] = 0 for all h ∈ G. Since h ↦ U_h^{(λ)} is an irrep, we get from Schur's lemma
that M_λ = b_λ I^{B_λ} for some b_λ ∈ C. In terms of the components, this relation can be expressed
as
∫_G dg f(g) u^λ_{kk′}(g) = b_λ δ_{kk′}   ∀ k, k′ ∈ [d_λ] .   (C.118)
In other words, the coefficients c^λ_{kk′} given in (C.111) satisfy c^λ_{kk′} = (b_λ/d_λ) δ_{kk′}. Denoting by
a_λ := b_λ/d_λ we get from (C.110) that
f(g) = Σ_{λ∈Irr(G,ω)} Σ_{k,k′=1}^{d_λ} c^λ_{kk′} ū^λ_{kk′}(g) = Σ_{λ∈Irr(G,ω)} a_λ Σ_{k=1}^{d_λ} ū^λ_{kk}(g) = Σ_{λ∈Irr(G,ω)} a_λ χ̄_λ(g) .   (C.119)
Here we assumed that the characteristic function on AB is defined with respect to the
representation g 7→ UgA ⊗ UgB .
In the next lemma we show that characteristic functions can be used to characterize
G-invariant states.
Lemma C.7.1. Let ψ ∈ Pure(A). Then, the following statements are equivalent.
2. ψ is G-invariant.
Proof. If ψ is G-invariant then by definition Ug ψUg∗ = ψ so that Ug |ψ⟩ = eiθg |ψ⟩ for some
θg ∈ [0, 2π), so that |χψ (g)| = 1. Conversely, suppose that |χψ (g)| = 1 for all g ∈ G. This
means that there exist phases θg ∈ [0, 2π) such that
2. Let κ_L^{(n)} denote the n-th order cumulant defined as
κ_L^{(n)} := i^{−n} (∂^n/∂θ^n) log χ_ρ(e^{iθL}) |_{θ=0} .   (C.127)
Show that the first and second order cumulants are the mean and the variance of the
observable (i.e. Hermitian matrix) L.
Let ρ ∈ L(A) and g 7→ Ug be a projective unitary representation of a group G in L(A).
The reduction of ρ onto the λ-irrep is the matrix
ρ_λ^{B_λ} := Tr_{C_λ}[ Π^{A_λ} ρ^A Π^{A_λ} ] .   (C.128)
Note that ρ_λ^{B_λ} above is the marginal of Π^{A_λ} ρ^A Π^{A_λ} in the representation space B_λ, whereas ρ_λ^{C_λ},
as defined in (C.53) for G-invariant matrices, is the marginal of Π^{A_λ} ρ^A Π^{A_λ} in the multiplicity
space C_λ.
χ_ρ(g) = Σ_λ Tr[ ρ_λ^{B_λ} U_g^{(λ)} ]
ρ_λ^{B_λ} = |B_λ| ∫_G dg χ_ρ(g^{−1}) U_g^{(λ)}   (C.129)
Remark. The relationship between the characteristic function of ρ and its reduction onto the
λ-irrep is known as the Fourier transform over the group.
For the second equality we will use the relation (C.81). Multiplying both sides of (C.81) by
ρ ∈ L(A) and taking the trace gives
∫_G dg ū^λ_{mm′}(g) χ_ρ(g) = (1/|B_λ|) ⟨λ, m′| ρ_λ^{B_λ} |λ, m⟩ .   (C.131)
Since the above equation holds for all m, m′ ∈ [|Bλ |] we conclude that
ρ_λ^{B_λ} = |B_λ| ∫_G dg χ_ρ(g) U_{g^{−1}}^{(λ)} ,   (C.132)
where we used the fact that ūλmm′ (g) = uλmm′ (g −1 ) (see Exercise C.5.1).
Exercise C.7.3. Show that if ρ ∈ L(A) is G-invariant then its reduction onto the λ-irrep is given
by
ρ_λ^{B_λ} = Tr[ ρ^A Π^{A_λ} ] u^{B_λ} .   (C.133)
Remark. For the case that G is a compact Lie group and f is continuous, the definition
above is equivalent to the statement that
∫_G dg ∫_G dh c̄(g) f(g^{−1}h) c(h) ⩾ 0 ,   (C.137)
where c ∈ L²(G).
Exercise C.8.1. Show that a complex function f : G → C is positive definite if for all
choices of n ∈ N, g1 , . . . , gn ∈ G, and c1 , . . . , cn ∈ C
Σ_{x∈[n]} Σ_{y∈[n]} c̄_x c_y f(g_y g_x^{−1}) ⩾ 0 .   (C.138)
Observe that the condition above also implies that the left hand side is real.
If a complex function f : G → C is a characteristic function, i.e. f (g) = Tr[ρUg ] for some
ρ ∈ D(A) and some (non-projective) unitary representation g 7→ Ug acting on A, then for
any c1 , . . . , cn ∈ C and g1 , . . . , gn ∈ G
Σ_{x∈[n]} Σ_{y∈[n]} c̄_x c_y f(g_x^{−1} g_y) = Σ_{x∈[n]} Σ_{y∈[n]} c̄_x c_y Tr[ ρ U_{g_x}^* U_{g_y} ]
In other words, all characteristic functions are positive definite functions over the group.
Conversely, we will see below that every normalized positive definite function f over a group
is a characteristic function.
Exercise C.8.3. Let G be a compact Lie group and f : G → C be a positive definite function
on G. For any two functions f1 , f2 ∈ L2 (G) define
⟨f_1|f_2⟩_f := ∫_G dg ∫_G dh f̄_1(g) f_2(h) f(g^{−1}h) .   (C.141)
where we chose the trivial cocycle ω(g, h) = 1 and denoted Irr(G) := Irr(G, ω = 1). In
the following theorem we use it to show that normalized positive definite functions are
characteristic functions.
Theorem C.8.1. Let G be a finite or compact Lie group and f (g) ∈ L2 (G). The
following are equivalent:
Proof of Theorem C.8.1. We already saw that all characteristic functions are positive
definite, so it is left to show that 2 ⇒ 1. Suppose f is a normalized positive definite
function on G. Recall that if f is also a characteristic function of some state ρ then f
and ρ satisfy (C.129) with f replacing χ_ρ. However, since we need to prove that f is a
characteristic function we use this relationship as a definition. That is, for any λ ∈ Irr(G)
we define the operator
ρ_λ^{B_λ} := d_λ ∫_G dg f(g^{−1}) U_g^{(λ)} ,   (C.144)
where g ↦ U_g^{(λ)} is the λ-irrep of the regular representation of G. We first show that the
operator above is positive semidefinite. Let η ∈ L(B_λ), multiply both sides of the equation
above by ηη^*, and take the trace to get
Tr[ ηη^* ρ_λ^{B_λ} ] = d_λ ∫_G dg f(g^{−1}) χ_{ηη^*}(g) .   (C.145)
We next decompose χηη∗ into two characteristic functions. To do that, first observe that
since η, η ∗ ∈ L(Bλ ) we have
χη (g) = Tr [ηUg ] = Tr ηUg(λ) and χη∗ (g) = Tr [η ∗ Ug ] = Tr η ∗ Ug(λ) .
(C.146)
Next, consider the second relation in (C.129) with η replacing ρ and h replacing g; that is,
η = d_λ ∫_G dh χ_η(h^{−1}) U_h^{(λ)} .   (C.147)
Multiplying both of its sides by η^* U_g^{(λ)}, with some g ∈ G, and taking the trace on both sides
gives
χ_{ηη^*}(g) = d_λ ∫_G dh χ_η(h^{−1}) Tr[ U_g^{(λ)} U_h^{(λ)} η^* ]
   = d_λ ∫_G dh χ_η(h^{−1}) χ_{η^*}(gh) .   (C.148)
Substituting this into (C.145) and using the fact that χ_{η^*}(gh) = χ̄_η(h^{−1}g^{−1}) gives
Tr[ ηη^* ρ_λ^{B_λ} ] = d_λ² ∫_G dh ∫_G dg f(g^{−1}) χ_η(h^{−1}) χ̄_η(h^{−1}g^{−1}) .   (C.149)
Changing variables to k_1 := h^{−1} and k_2 := h^{−1}g^{−1}, and using the invariance of the Haar measure, we get
Tr[ ηη^* ρ_λ^{B_λ} ] = d_λ² ∫_G dk_1 ∫_G dk_2 χ_η(k_1) f(k_1^{−1}k_2) χ̄_η(k_2)   (C.150)
Since f is positive definite→ ⩾ 0 .
From the analysis above this operator is positive semidefinite. We show next that its trace is
one (i.e. it is a density matrix) and that f can be expressed as the characteristic function of
ρA . To see this, recall that since f (g) ∈ L2 (G), it can be expressed as a linear combination
of the basis elements {u^µ_{k′k}(g)} as
f(g) = Σ_{µ∈Irr(G)} Σ_{k,k′=1}^{d_µ} a^µ_{kk′} u^µ_{k′k}(g) ,   (C.152)
where each aµkk′ ∈ C (for convenience we used uµk′ k (g) instead of ūµkk′ (g), so the coefficients
aµkk′ are different than the coefficients cµkk′ of (C.142)). Substituting this into (C.144) gives
ρ_λ^{B_λ} = d_λ Σ_{µ∈Irr(G)} Σ_{k,k′=1}^{d_µ} a^µ_{kk′} ∫_G dg ū^µ_{kk′}(g) U_g^{(λ)} .   (C.153)
Finally, combining this with the expression (C.130) for the characteristic function, we get
χ_ρ(g) = Σ_λ Tr[ ρ_λ^{B_λ} U_g^{(λ)} ]
(C.154)→ = Σ_λ Σ_{k,k′=1}^{d_λ} a^λ_{kk′} u^λ_{k′k}(g)   (C.155)
(C.152)→ = f(g) .
Ṽ Ug = Ug Ṽ ∀g∈G. (C.157)
Note that in this definition, the G-invariance property is defined with respect to a single
representation g 7→ UgA , and there is no need to consider another representation on system
′
A′ (i.e. g 7→ UgA ).
Proof. Since Ṽ := V P commutes with U_g it follows that also Ṽ^* commutes with U_g. There-
fore, P = Ṽ^* Ṽ also commutes with U_g. Now, consider the irrep decomposition of the Hilbert
space A = ⊕_λ B_λ ⊗ C_λ. From Theorem C.3.3 it follows that
Ṽ = ⊕_λ I^{B_λ} ⊗ Ṽ_λ^{C_λ}   and   P = ⊕_λ I^{B_λ} ⊗ Π_λ^{C_λ} ,   (C.158)
where Ṽ_λ^{C_λ} := (1/|B_λ|) Tr_{B_λ}[ Π^{B_λ} Ṽ ] and Π_λ^{C_λ} := (1/|B_λ|) Tr_{B_λ}[ Π^{B_λ} P ], with Π^{B_λ} being the projection
λ λ
onto the space Bλ . Now, observe that since P is a projection the condition P P = P gives
ΠCλ Πλ
λ Cλ
= ΠC Cλ
λ so that each Πλ is itself a projection in the space Cλ . Moreover, since
λ
P = Ṽ ∗ Ṽ we conclude that Πλ λ = Ṽλ∗Cλ ṼλCλ . Therefore, from Exercise 2.3.8 it follows that
C
for each λ, ṼλCλ can be completed to a unitary Wλ : Cλ → Cλ . That is, there exists a unitary
matrix Wλ ∈ L(Cλ ) satisfying Wλ ΠλCλ = ṼλCλ ΠC λ . Define the matrix W ∈ L(A) by
λ
M
W := I Bλ ⊗ WλCλ . (C.159)
λ
n
Exercise C.10.1. Show that {PπA }π∈Sn is indeed a unitary representation of Sn .
According to Theorem C.4.2 in the appendix, the orthogonal projection to the symmetric
n
subspace Symn (A), denote by ΠA
Sym , is given by
n 1 X An
ΠA
Sym = P . (C.162)
n! π∈S π
n
In order to calculate the dimension of Symn (An ), observe that the action of the projection
above on any element |xn ⟩ := |x1 · · · xn ⟩ ∈ An of the standard basis of An gives the symmetric
vector
n 1 X
ΠASym |x n
⟩ = |xπ(1) · · · xπ(n) ⟩ . (C.163)
n! π∈S
n
Since the type of each sequence (x_{π(1)} ⋯ x_{π(n)}) equals the type of x^n, the state above is
uniquely determined by the type of x^n. Recall that t(x^n) denotes the type of x^n, and X^n(t)
denotes the set of all sequences x^n ∈ [m]^n whose type is t. Keeping this in mind, we define
for any type t ∈ Type(n, m) the unit vector
|φ_t⟩ := (1/√k_t) Σ_{x^n∈X^n(t)} |x^n⟩ ,   (C.164)
where
k_t := |X^n(t)| = \binom{n}{nt_1, …, nt_m} .   (C.165)
Observe that the state in (C.163) is proportional to |φ_t⟩. Therefore, since the image of Π_{Sym}^{A^n}
is Sym^n(A), the set of vectors {|φ_t⟩}_{t∈Type(n,m)} is an orthonormal basis of Sym^n(A). This
implies that the dimension of the symmetric subspace is given by
Proof. Let B := span{|ψ⟩^{⊗n} : |ψ⟩ ∈ A} be the vector space on the right-hand side
of (C.167), and observe that B ⊆ Sym^n(A). We therefore need to show that Sym^n(A) ⊆ B.
Let |ψ⟩ = Σ_{x∈[m]} v_x|x⟩, and for every x^n ∈ [m]^n let v_{x^n} := v_{x_1} ⋯ v_{x_n}. Then,
|ψ⟩^{⊗n} = Σ_{x^n∈[m]^n} v_{x^n} |x^n⟩ = Σ_{t∈Type(n,m)} v_1^{nt_1} ⋯ v_m^{nt_m} |χ_t⟩ ,   (C.168)
where |χ_t⟩ = Σ_{x^n∈X^n(t)} |x^n⟩ is an unnormalized version of the normalized state defined
in (C.164).
Next, we define the polynomial f : C^m → B as
f(v) := Σ_{t∈Type(n,m)} v_1^{k_1} ⋯ v_m^{k_m} |χ_t⟩   ∀ v ∈ C^m ,   (C.169)
where k_j := nt_j ∈ N for all j ∈ [m]. Observe that the integers {k_j}_{j∈[m]} depend on t (for
simplicity of the exposition we did not add a subscript to indicate that). From (C.168)
we have f(v) ∈ B for all v ∈ C^m. We now argue that this implies that |χ_t⟩ ∈ B for all
t ∈ Type(n, m), so that Sym^n(A) ⊆ B. Indeed, observe that for any t and corresponding
integers k_1, …, k_m we have
|χ_t⟩ ∝ ∂^n f(v_1, …, v_m) / ( ∂v_1^{k_1} ⋯ ∂v_m^{k_m} ) |_{v=0} .   (C.170)
Now, since f(v) ∈ B for all v ∈ C^m, and since all partial derivatives are limits of linear
combinations of f(v) at different points (v_1, …, v_m), we conclude that |χ_t⟩ ∈ B. This
completes the proof.
Proof. Let |ψ_1⟩, |ψ_2⟩ ∈ Sym^n(A) be two non-zero vectors in the symmetric subspace. To
show that Sym^n(A) does not have a proper invariant subspace, it will be enough to show that
there exists U ∈ U(A) such that
⟨ψ_1| U^{⊗n} |ψ_2⟩ ≠ 0 .   (C.171)
From Lemma C.10.1 it follows that both |ψ_1⟩ and |ψ_2⟩ can be expressed as linear combinations
of states of the form |φ⟩^{⊗n}. Hence, there exist |φ_1⟩, |φ_2⟩ ∈ A such that ⟨ψ_1|φ_1^{⊗n}⟩ ≠ 0 and
⟨ψ_2|φ_2^{⊗n}⟩ ≠ 0. For j = 1, 2 denote
G_j := { U ∈ U(A) : U|φ_j⟩ = |φ_j⟩ }   (C.172)
and observe that both G1 and G2 are subgroups of U(A). By definition, |φj ⟩ is an eigenvector
corresponding to the eigenvalue one for any U ∈ Gj . Therefore, denoting m := |A|, from
the spectral decomposition of such U , we conclude that every U ∈ Gj has the form U =
Ũ ⊕ |φj ⟩⟨φj |, where Ũ is a unitary matrix acting on the (m − 1)-dimensional subspace
orthogonal to |φj ⟩. In other words, for every j ∈ {1, 2}
n o
Gj = Ũ ⊕ |φj ⟩⟨φj | : Ũ ∈ U(m − 1) . (C.173)
Now, from Theorem C.4.2 we get that for j = 1, 2 (with dV_j being the Haar measure on G_j)
Π_j := ∫_{G_j} dV_j V_j^{⊗n} ,   (C.174)
We next show that the dimension of (A^n)^{G_j} is one. First, observe that |φ_j⟩^{⊗n} ∈ (A^n)^{G_j} so the
dimension of (A^n)^{G_j} is at least one. Let |ψ⟩ ∈ (A^n)^{G_j} so that V_j^{⊗n}|ψ⟩ = |ψ⟩ for all V_j ∈ G_j.
Since each such V_j has the form V_j = Ṽ_j ⊕ |φ_j⟩⟨φ_j| we can take in particular Ṽ_j = e^{iθ} P_j,
where P_j is the projection to the subspace orthogonal to |φ_j⟩, and θ is any phase in
[0, 2π). For this choice we get (e^{iθ} P_j ⊕ |φ_j⟩⟨φ_j|)^{⊗n} |ψ⟩ = |ψ⟩ for all θ ∈ [0, 2π). But this is
only possible for a state |ψ⟩ that is proportional to |φ_j⟩^{⊗n}. That is, up to a proportionality
coefficient, the only element of (A^n)^{G_j} is |φ_j⟩^{⊗n}. Hence, the projection to (A^n)^{G_j} is
Finally, let W be any unitary matrix in U(m) that satisfies W|φ_2⟩ = |φ_1⟩. Then, we get
that
∫_{G_1} dV_1 ∫_{G_2} dV_2 ⟨ψ_1| (V_1 W V_2)^{⊗n} |ψ_2⟩
   = ⟨ψ_1| ( ∫_{G_1} dV_1 V_1^{⊗n} ) W^{⊗n} ( ∫_{G_2} dV_2 V_2^{⊗n} ) |ψ_2⟩   (C.177)
(C.176)→ = ⟨ψ_1|φ_1^{⊗n}⟩ (⟨φ_1|W|φ_2⟩)^n ⟨φ_2^{⊗n}|ψ_2⟩
W|φ_2⟩ = |φ_1⟩→ = ⟨ψ_1|φ_1^{⊗n}⟩ ⟨φ_2^{⊗n}|ψ_2⟩ ≠ 0 .
Therefore, there must exist at least one unitary matrix V_1 and one unitary matrix V_2
such that U = V_1 W V_2 satisfies (C.171). This completes the proof that Sym^n(A) is an
irreducible subspace.
The space A^n has another useful subspace called the antisymmetric subspace, which we
denote by Asy(A^n). It is defined by
Asy(A^n) := { |ψ⟩ ∈ A^n : (−1)^{sign(π)} P_π^{A^n} |ψ⟩ = |ψ⟩   ∀ π ∈ S_n } .   (C.178)
Exercise C.10.2. Show that {(−1)^{sign(π)} P_π^{A^n}}_{π∈S_n} is a unitary representation of S_n.
From Theorem C.4.2 it follows that the projection to the antisymmetric subspace is given
by
Π_{Asy}^{A^n} = (1/n!) Σ_{π∈S_n} (−1)^{sign(π)} P_π^{A^n} .   (C.179)
This vector is zero unless n ⩽ m. Otherwise, if n > m any sequence xn must contain at least
two components that are equal to each other. Without loss of generality suppose x1 = x2 .
Then, for any permutation π ∈ S_n the permutation π′ defined by
π′(j) := π(2) if j = 1,   π(1) if j = 2,   π(j) if j > 2 ,   (C.181)
We already saw that Sym(A2 ) is an irreducible subspace of A2 , and we will see shortly that the
above decomposition is a decomposition into the two irreps of the “natural” representation
of the group U (m) (or SU (m)) on the space A2 .
In the case that n = 2 the set of all permutations on two elements, S2 , consists of only
two permutations. Therefore, from (C.160) we get that the projection to the symmetric
subspace takes the simple form
Π_{Sym}^{A²} = (1/2)( I^{A²} + F ) ,   (C.186)
where F : A² → A² is known as the swap operator, given by
F := Σ_{x,y∈[m]} |x⟩⟨y| ⊗ |y⟩⟨x| .   (C.187)
Π_{Asy}^{A²} = (1/2)( I^{A²} − F ) .   (C.188)
Exercise C.10.3. Show that for any unitary matrix U ∈ L(A) the swap operator commutes
with U ⊗ U .
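A minimal numerical sketch (an addition) of Exercise C.10.3: the swap operator F of (C.187) commutes with U ⊗ U for any unitary U, and (I ± F)/2 are indeed the projections (C.186) and (C.188).

```python
import numpy as np

rng = np.random.default_rng(8)
m = 3
F = np.zeros((m * m, m * m))
for x in range(m):
    for y in range(m):
        F[y * m + x, x * m + y] = 1.0              # F|x,y> = |y,x>

q, _ = np.linalg.qr(rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m)))
UU = np.kron(q, q)
assert np.allclose(F @ UU, UU @ F)                 # F commutes with U (x) U

P_sym, P_asy = (np.eye(m * m) + F) / 2, (np.eye(m * m) - F) / 2
assert np.allclose(P_sym @ P_sym, P_sym) and np.allclose(P_asy @ P_asy, P_asy)
assert np.isclose(np.trace(P_sym), m * (m + 1) / 2)   # dim Sym^2(A) = m(m+1)/2
```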
The exercise above implies that
F = (U ⊗ U) F (U ⊗ U)^* = Σ_{x,y∈[m]} U|x⟩⟨y|U^* ⊗ U|y⟩⟨x|U^* .   (C.189)
|ψ_{xy}^−⟩ := (1/√2)( |xy⟩ − |yx⟩ )   ∀ 1 ⩽ x < y ⩽ m .   (C.190)
Similarly, the symmetric subspace has an orthonormal basis {|ψ_{xy}^+⟩}_{x⩽y∈[m]} given by
|ψ_{xy}^+⟩ := (1/√2)( |xy⟩ + |yx⟩ ) if 1 ⩽ x < y ⩽ m,   and   |ψ_{xx}^+⟩ := |xx⟩ if x = y ∈ [m] .   (C.191)
Denote
|ψ_1⟩ = Σ_{x<y} a_{xy} |ψ_{xy}^−⟩   and   |ψ_2⟩ = Σ_{x<y} b_{xy} |ψ_{xy}^−⟩ .   (C.193)
Now, take U = DP_π, where D = Σ_{x∈[m]} e^{iθ_x} |x⟩⟨x| is a diagonal unitary with θ_x ∈ [0, 2π]. Then,
(U ⊗ U)|ψ_2⟩ = Σ_{x<y} b_{xy} (U ⊗ U)|ψ_{xy}^−⟩ = Σ_{x<y} b_{xy} e^{i(θ_{π^{−1}(x)} + θ_{π^{−1}(y)})} |ψ_{π^{−1}(x)π^{−1}(y)}^−⟩ .   (C.194)
Therefore, from (C.192) we get that for all permutations π ∈ S_m and all phases {θ_x}_{x∈[m]} we
have
0 = ⟨ψ_1| U ⊗ U |ψ_2⟩ = Σ_{x∈[m]} e^{iθ_x} Σ_{y=x+1}^{m} e^{iθ_y} a*_{xy} b_{π(x)π(y)} .   (C.195)
Since the equation above holds for all θ_y we must have a*_{1y} b_{π(1)π(y)} = 0 for all y = 2, …, m
and all permutations π ∈ S_m. Since |ψ_2⟩ ≠ 0, for each y ∈ {2, …, m} there exists π ∈ S_m
such that b_{π(1)π(y)} ≠ 0. Hence, we must have a_{1y} = 0 for all y = 2, …, m. Next, observe that
the relation (C.192) becomes
Σ_{x=2}^{m} e^{iθ_x} Σ_{y=x+1}^{m} e^{iθ_y} a*_{xy} b_{π(x)π(y)} = 0 .   (C.197)
Therefore, taking the derivative with respect to θ2 and repeating similar lines as above we
conclude that a2y = 0 for all y = 3, . . . , m. Continuing in this way we get that axy = 0 for
all 1 ⩽ x < y ⩽ m in contradiction with the assumption that |ψ1 ⟩ ̸= 0. This concludes the
proof.
Exercise C.10.4. Extend the proof above for the case that n > 2. That is, prove that
Asy(An ) is an irreducible subspace of An under the natural representation of U (m) in An .
Miscellany
f[x, x] = f′(x)
f[x, x, y] = f′(x)/(x − y) − (f(x) − f(y))/(x − y)²   (D.4)
f[x, x, x] = (1/2) f″(x) .   (D.5)
Note that (D.5) can be obtained from (D.4) by setting h := y − x → 0 and expanding
f (y) = f (x + h) = f (x) + hf ′ (x) + 21 h2 f ′′ (x) + O(h3 ).
Theorem D.1.1. Let A = Diag(α_1, …, α_n) ∈ C^{n×n} be a diagonal square matrix, and
B = [b_{ij}] ∈ C^{n×n} be a complex square matrix. Assume that f(x) : C → C satisfies one of the
following conditions:
In particular
Tr( L_A^{(1)}(B) ) = Σ_{j∈[n]} f′(α_j) b_{jj}   (D.10)
Tr( L_A^{(2)}(B) ) = (1/2) Σ_{i,j=1}^{n} f′[α_i, α_j] b_{ij} b_{ji} = Σ_{i,j=1}^{n} ( f′(α_i) − f′(α_j) ) / ( 2(α_i − α_j) ) b_{ij} b_{ji} .   (D.11)
Remark. The expansion above can be naturally generalized to higher than the second order,
but for our purposes we will only need to expand f(A + tB) up to the second
order in t. Moreover, for our purposes we will only need to assume that the α_i are real and
that condition 2 on f holds. We kept condition 1 on f in the theorem just to be a bit more
general.
Note that in all the expressions above, one must identify αi = αj with the limit αj → αi .
For example, the term
( f′(α_i) − f′(α_j) ) / ( 2(α_i − α_j) ) = (1/2) f″(α_i)   for α_i = α_j .   (D.13)
In particular, note that if B is diagonal, Eq. (D.11) gives the known second order term of
the Taylor expansion.
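The following is a hedged numerical sketch (an addition, not from the text) of the second-order expansion behind (D.14) and (D.11) for f(x) = x⁴: the divided-difference formula reproduces the t² coefficient of Tr[f(A + tB)] for a diagonal A and a random Hermitian B.

```python
import numpy as np

rng = np.random.default_rng(9)
n, m = 4, 4
alpha = rng.normal(size=n)
A = np.diag(alpha)
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
B = (B + B.conj().T) / 2

fp = lambda x: m * x ** (m - 1)                    # f'(x) for f(x) = x^m
# second-order trace coefficient from (D.11)
coef = 0.0
for i in range(n):
    for j in range(n):
        dd = 0.5 * m * (m - 1) * alpha[i] ** (m - 2) if i == j else \
             (fp(alpha[i]) - fp(alpha[j])) / (2 * (alpha[i] - alpha[j]))
        coef += dd * B[i, j] * B[j, i]
# compare with a finite-difference estimate of (1/2) d^2/dt^2 Tr[(A + tB)^m] at t = 0
t = 1e-3
trace_f = lambda t: np.trace(np.linalg.matrix_power(A + t * B, m)).real
numeric = (trace_f(t) - 2 * trace_f(0) + trace_f(-t)) / (2 * t ** 2)
assert np.isclose(coef.real, numeric, rtol=1e-3)
```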
Proof. From the conditions on f , it is enough to prove the theorem assuming f is a poly-
nomial. By linearity, it is enough to prove all the claims for f (x) = xm . Clearly, in the
expansion
(A + tB)^m = A^m + t L_A(B) + t² Q_A(B) + O(t³)   (D.14)
we must have
L_A(B) = Σ_{0⩽p,q, p+q=m−1} A^p B A^q ,   (D.15)
Q_A(B) = Σ_{0⩽p,q,r, p+q+r=m−2} A^p B A^q B A^r ,   (D.16)
where we expanded (A + tB)m up to first and second order in t. All that is left to show is
that these matrices coincide with the ones defined in Eqs. (D.7,D.8).
Indeed, since A is diagonal, the matrix elements of L_A(B) in Eq. (D.15) are given by
[L_A(B)]_{ij} = Σ_{0⩽p,q, p+q=m−1} α_i^p α_j^q b_{ij} = ( α_i^m − α_j^m ) / ( α_i − α_j ) b_{ij} ,   (D.17)
Thus, the expressions in Eq. (D.8) and Eq. (D.16) for QA (B) are the same.
We now prove Eq. (D.11). Observe first that Eq. (D.8) yields
\[
\mathrm{Tr}\bigl(Q_A(B)\bigr) = \sum_{i,j=1}^{n} f[\alpha_i,\alpha_i,\alpha_j]\,b_{ij}b_{ji}, \tag{D.20}
\]
where we have used the symmetry f[αi, αj, αi] = f[αi, αi, αj]. Now, since bij bji is symmetric under an exchange between i and j, we can replace f[αi, αi, αj] in Eq. (D.20) with
\[
\frac12\bigl(f[\alpha_i,\alpha_i,\alpha_j]+f[\alpha_j,\alpha_j,\alpha_i]\bigr) = \frac12 f'[\alpha_i,\alpha_j], \tag{D.21}
\]
where for the last equality we used Eq. (D.4).
Since the map $L^{(1)}_\rho$ is self-adjoint we get
\[
\mathrm{Tr}\bigl[g(\rho)L^{(1)}_\rho(\sigma)\bigr] = \mathrm{Tr}\bigl[L^{(1)}_\rho\bigl(g(\rho)\bigr)\sigma\bigr] = \mathrm{Tr}\bigl[g(\rho)f'(\rho)\sigma\bigr] = \mathrm{Tr}\bigl[h(\rho)\sigma\bigr], \tag{D.24}
\]
where we used the fact that $L^{(1)}_\rho\bigl(g(\rho)\bigr)=g(\rho)f'(\rho)$ (we leave it as an exercise). Similarly,
\[
\mathrm{Tr}\bigl[g(\rho)L^{(1)}_\rho(\eta)\bigr] = \mathrm{Tr}\bigl[h(\rho)\eta\bigr]. \tag{D.25}
\]
Finally,
\begin{align*}
\mathrm{Tr}\bigl[g(\rho)L^{(2)}_\rho(\sigma)\bigr] &= \sum_{x\in[m]} g(\alpha_x)\langle x|L^{(2)}_\rho(\sigma)|x\rangle\\
&= \sum_{x,y\in[m]} g(\alpha_x)\,f[\alpha_x,\alpha_y,\alpha_x]\,|\langle x|\sigma|y\rangle|^2\\
\text{(D.4)}\rightarrow\quad &= \sum_{x,y\in[m]} g(\alpha_x)\left(\frac{f'(\alpha_x)}{\alpha_x-\alpha_y}-\frac{f(\alpha_x)-f(\alpha_y)}{(\alpha_x-\alpha_y)^2}\right)|\langle x|\sigma|y\rangle|^2\\
\sigma=\sigma^*\rightarrow\quad &= \frac12\sum_{x,y\in[m]}\left(\frac{h(\alpha_x)-h(\alpha_y)}{\alpha_x-\alpha_y}-\frac{(g(\alpha_x)-g(\alpha_y))(f(\alpha_x)-f(\alpha_y))}{(\alpha_x-\alpha_y)^2}\right)|\langle x|\sigma|y\rangle|^2\\
&= \frac12\sum_{x,y\in[m]}\Bigl(\langle x|\sigma|y\rangle\,[L_h(\sigma)]_{yx}-[L_g(\sigma)]_{xy}[L_f(\sigma)]_{yx}\Bigr)\\
&= \frac12\mathrm{Tr}\bigl[\sigma L_h(\sigma)\bigr]-\frac12\mathrm{Tr}\bigl[L_f(\sigma)L_g(\sigma)\bigr]. \tag{D.26}
\end{align*}
This completes the proof.
where σ11 > 0 and 0 denotes a zero matrix. Note that we can always find a basis in which ρ and σ have the above form. Moreover, unless specified otherwise, all inverses of matrices will be understood as generalized inverses. For example, the inverse of σ is understood as
\[
\sigma^{-1} := \begin{pmatrix}\sigma_{11}^{-1} & 0_{12}\\ 0_{21} & 0_{22}\end{pmatrix}.
\]
Recall from (B.75) that the Schur complement of the block ρ22 of ρ is $\rho/\rho_{22}=\rho_{11}-\zeta\rho_{22}^{-1}\zeta^*$.
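The following short numpy sketch (ours) illustrates these conventions: the generalized inverse is realized by the Moore–Penrose pseudo-inverse, and the Schur complement of a positive semidefinite matrix is again positive semidefinite.

```python
# Sketch (ours) of the conventions above: generalized inverses via pinv and the Schur
# complement rho/rho22 = rho11 - zeta rho22^{-1} zeta^* for a block-structured PSD rho.
import numpy as np

rng = np.random.default_rng(2)
d1, d2 = 2, 2
X = rng.normal(size=(d1 + d2, d1 + d2))
rho = X @ X.T
rho /= np.trace(rho)                           # a density matrix, for illustration

rho11 = rho[:d1, :d1]
zeta  = rho[:d1, d1:]
rho22 = rho[d1:, d1:]

rho22_inv = np.linalg.pinv(rho22)              # generalized inverse
schur = rho11 - zeta @ rho22_inv @ zeta.T      # rho/rho22

assert np.all(np.linalg.eigvalsh(schur) >= -1e-12)   # Schur complement of a PSD matrix is PSD
print("Schur complement eigenvalues:", np.round(np.linalg.eigvalsh(schur), 4))
```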
where the infimum is over all 1 < n ∈ ℕ, all p ∈ Prob(n), and all POVMs {Λx}x∈[n−1] acting on the support of σ that satisfy the following constraints: for all x ∈ [n − 1], qx := Tr[σ̃Λx] > 0 and
\[
\tilde\sigma^{-\frac12}\tilde\rho\,\tilde\sigma^{-\frac12} \geqslant \sum_{x\in[n-1]}\frac{p_x}{q_x}\,\Lambda_x. \tag{D.29}
\]
Proof. From Lemma 5.3.1 and the optimization in (5.102) and (5.103), it follows that Df(ρ∥σ) can be expressed as in (D.28), where the infimum is over all 1 < n ∈ ℕ, p ∈ Prob(n), and 0 < q ∈ Prob(n − 1), such that there exist n − 1 density matrices {ωx}x∈[n−1] ⊂ D(A) satisfying
\[
\rho \geqslant \sum_{x\in[n-1]} p_x\omega_x \qquad\text{and}\qquad \sigma = \sum_{x\in[n-1]} q_x\omega_x, \tag{D.30}
\]
where we used the fact that the first relation above holds if and only if $\rho=\sum_{x\in[n-1]}p_x\omega_x+p_n\omega_n$ for some density matrix ωn. Note that since σ has the form given in (D.27), it follows from the second relation above and the fact that q > 0 that also the density matrices {ωx} are supported on the support of σ.
Since L is invertible, we can conjugate the above relations by L⁻¹(·)(L*)⁻¹ to get back (D.30). Therefore the above relations are equivalent to (D.30). Moreover, since ρ22 ⩾ 0, it follows that the relation above holds if and only if
\[
\tilde\rho \geqslant \sum_{x\in[n-1]} p_x\tilde\omega_x \qquad\text{and}\qquad \tilde\sigma = \sum_{x\in[n-1]} q_x\tilde\omega_x, \tag{D.34}
\]
where ω̃x := (ωx)11 and σ̃ := σ11. Finally, denoting $\Lambda_x := q_x\,\tilde\sigma^{-1/2}\tilde\omega_x\tilde\sigma^{-1/2}$, and applying the conjugation $\tilde\sigma^{-1/2}(\cdot)\tilde\sigma^{-1/2}$ to both sides of (D.34), gives the relations
\[
\tilde\sigma^{-\frac12}\tilde\rho\,\tilde\sigma^{-\frac12} \geqslant \sum_{x\in[n-1]}\frac{p_x}{q_x}\,\Lambda_x \qquad\text{and}\qquad \sum_{x\in[n-1]}\Lambda_x = I^A. \tag{D.35}
\]
where f˜ is defined in (5.13), and the infimum above is subject to the same conditions given
in Lemma D.2.1. Observe that if f˜(0) = ∞ then pn in (D.36) can be taken to be zero
(otherwise, Df (ρ∥σ) = ∞). This means that the first relation in (D.30) must also hold with
equality (recall the original condition (5.103)). But from the second relation in (D.30), and
the fact that qx > 0 for all x ∈ [n − 1], this is possible only if supp(ρ) ⊆ supp(σ). That is, in
the case f˜(0) = ∞ we have Df (ρ∥σ) = ∞ for supp(ρ) ̸⊆ supp(σ) and for supp(ρ) ⊆ supp(σ)
we can take pn = 0 in the optimization above and also replace the inequality sign of (D.29)
with an equality.
Going back to the general case, one natural choice/guess for the optimal n, p and {Λx}x∈[n−1] is to choose them such that we have equality in (D.29). This is possible, for example, by taking n = r + 1, where r is the dimension of the support of σ, and for any x ∈ [r] taking Λx = |ψx⟩⟨ψx| with |ψx⟩ being the x-eigenvector of $\tilde\sigma^{-1/2}\tilde\rho\,\tilde\sigma^{-1/2}$ corresponding to the eigenvalue px/qx (i.e. p is chosen such that px/Tr[σ̃Λx] is the x-eigenvalue of $\tilde\sigma^{-1/2}\tilde\rho\,\tilde\sigma^{-1/2}$). For this choice we have
\[
\tilde\sigma^{-\frac12}\tilde\rho\,\tilde\sigma^{-\frac12} = \sum_{x\in[r]}\frac{p_x}{q_x}\,|\psi_x\rangle\langle\psi_x|, \tag{D.37}
\]
which forces pn to be
\[
p_n = 1-\sum_{x\in[r]} p_x = 1-\sum_{x\in[r]}\frac{p_x}{q_x}\langle\psi_x|\tilde\sigma|\psi_x\rangle = 1-\mathrm{Tr}[\tilde\rho], \tag{D.38}
\]
where the last equality follows by multiplying both sides of (D.37) by σ̃ and taking the trace.
Moreover, for these choices of n, p and {Λx}, we have
\begin{align*}
\sum_{x\in[r]} q_x f\!\Bigl(\frac{p_x}{q_x}\Bigr) &= \sum_{x\in[r]}\mathrm{Tr}\bigl[\tilde\sigma|\psi_x\rangle\langle\psi_x|\bigr]\,f\!\Bigl(\frac{p_x}{q_x}\Bigr)\\
\forall\,t\geqslant0,\ f\bigl(t|\psi_x\rangle\langle\psi_x|\bigr)=f(t)|\psi_x\rangle\langle\psi_x|\rightarrow\quad &= \sum_{x\in[r]}\mathrm{Tr}\Bigl[\tilde\sigma\,f\!\Bigl(\frac{p_x}{q_x}|\psi_x\rangle\langle\psi_x|\Bigr)\Bigr]\\
\{|\psi_x\rangle\}\text{ is orthonormal}\rightarrow\quad &= \mathrm{Tr}\Bigl[\tilde\sigma\,f\!\Bigl(\sum_{x\in[r]}\frac{p_x}{q_x}|\psi_x\rangle\langle\psi_x|\Bigr)\Bigr]\\
&= \mathrm{Tr}\Bigl[\tilde\sigma\,f\bigl(\tilde\sigma^{-\frac12}\tilde\rho\,\tilde\sigma^{-\frac12}\bigr)\Bigr]. \tag{D.39}
\end{align*}
Note that we obtained the formula above for a particular choice of n, p and {Λx}. Therefore, since this is not necessarily the optimal choice (recall Df is defined in terms of an infimum), we must have
\[
D_f(\rho\|\sigma) \leqslant \mathrm{Tr}\Bigl[\tilde\sigma\,f\bigl(\tilde\sigma^{-\frac12}\tilde\rho\,\tilde\sigma^{-\frac12}\bigr)\Bigr]+\bigl(1-\mathrm{Tr}[\tilde\rho]\bigr)\tilde f(0). \tag{D.40}
\]
Interestingly, to get this upper bound we did not even assume that f is convex, but if f is
operator convex we get an equality.
Before we state the theorem below, we point out a remarkable result from matrix analysis
that we will use below. Suppose f : [0, ∞) → R is operator convex. In Sec. B we saw that
Remark. We use the convention 0 · ∞ = 0 in the case that both Tr[ρ̃] = 1 and f̃(0) = ∞. The inverse of ρ22 in the theorem above is a generalized inverse (in case ρ22 is not invertible). This in particular implies that if supp(ρ) ⊆ supp(σ) then ρ22⁻¹ = 0 and ρ̃ = ρ = ρ11, so that Tr[ρ̃] = 1 and the second term on the right-hand side of (D.42) vanishes. Moreover, the requirement that f(1) = 0 is not necessary, but if f(1) ≠ 0 then the resulting divergence Df will not be normalized (i.e. we will get Df(1∥1) ≠ 0). Finally, observe that (D.42) can be expressed in terms of the Kubo-Ando operator mean #f (see Definition B.5.1) as
Proof. We have already shown in (D.40) that Df(ρ∥σ) cannot be greater than the right-hand side of (D.42). Therefore, it is left to show the opposite inequality. Let 1 < n ∈ ℕ, let {Λx}x∈[n−1] be a POVM acting on the support of σ, and let p ∈ Prob(n). Suppose the conditions in Lemma D.2.1 hold with these elements. From Naimark's theorem (see Theorem 3.3.2) there exist a tuple of mutually orthogonal projectors {Px}x∈[n−1] ⊂ Pos(B), where B is the extended Hilbert space, and an isometry V : A → B such that Λx = V*PxV for all x ∈ [n − 1]. We use this to compute
\begin{align*}
\sum_{x\in[n-1]} q_x f\!\Bigl(\frac{p_x}{q_x}\Bigr) &= \sum_{x\in[n-1]}\mathrm{Tr}[\Lambda_x\tilde\sigma]\,f\!\Bigl(\frac{p_x}{q_x}\Bigr)\\
&= \mathrm{Tr}\Bigl[\sum_{x\in[n-1]} f\!\Bigl(\frac{p_x}{q_x}\Bigr)\Lambda_x\,\tilde\sigma\Bigr]\\
&= \mathrm{Tr}\Bigl[\sum_{x\in[n-1]} f\!\Bigl(\frac{p_x}{q_x}\Bigr)P_x\,V\tilde\sigma V^*\Bigr],
\end{align*}
where we put the sum inside the trace. Now, since {Px} are orthogonal projectors we have $\sum_{x\in[n-1]} f\bigl(\frac{p_x}{q_x}\bigr)P_x = f\bigl(\sum_{x\in[n-1]}\frac{p_x}{q_x}P_x\bigr)$. Combining this with the cyclic property of the trace we get
\begin{align*}
\sum_{x\in[n-1]} q_x f\!\Bigl(\frac{p_x}{q_x}\Bigr) &= \mathrm{Tr}\Bigl[V^*\,f\Bigl(\sum_{x\in[n-1]}\frac{p_x}{q_x}P_x\Bigr)V\,\tilde\sigma\Bigr]\\
\text{Jensen's inequality (B.30)}\rightarrow\quad &\geqslant \mathrm{Tr}\Bigl[f\Bigl(\sum_{x\in[n-1]}\frac{p_x}{q_x}V^*P_xV\Bigr)\tilde\sigma\Bigr] \tag{D.44}\\
&= \mathrm{Tr}\Bigl[f\Bigl(\sum_{x\in[n-1]}\frac{p_x}{q_x}\Lambda_x\Bigr)\tilde\sigma\Bigr].
\end{align*}
To continue, we first consider the case that f̃(0) = ∞. From the remark below (D.36) we know that Df(ρ∥σ) = ∞ unless supp(ρ) ⊆ supp(σ). Furthermore, if supp(ρ) ⊆ supp(σ) then we can replace the inequality sign of (D.29) with an equality, so that the above equation gives the desired inequality
\[
\sum_{x\in[n-1]} q_x f\!\Bigl(\frac{p_x}{q_x}\Bigr) \geqslant \mathrm{Tr}\Bigl[\sigma\,f\bigl(\sigma^{-\frac12}\rho\,\sigma^{-\frac12}\bigr)\Bigr] \tag{D.45}
\]
(recall that σ̃ = σ and ρ̃ = ρ if σ > 0; in the present context the condition σ > 0 is effectively the same as supp(ρ) ⊆ supp(σ), since we can restrict all computations to the support of σ). Note that we did not include the term pn f̃(0) on the left-hand side of the equation above, since in this case (i.e. the case f̃(0) = ∞) we must have pn = 0, so the term pn f̃(0) = 0 · ∞ = 0 by convention.
Next, we consider the case f̃(0) < ∞. In this case we know that f has the form (D.41). Hence, continuing from (D.44) we get
\begin{align*}
\sum_{x\in[n-1]} q_x f\!\Bigl(\frac{p_x}{q_x}\Bigr) &\geqslant f(0)+\tilde f(0)\,\mathrm{Tr}\Bigl[\sum_{x\in[n-1]}\frac{p_x}{q_x}\Lambda_x\,\tilde\sigma\Bigr]+\mathrm{Tr}\Bigl[g\Bigl(\sum_{x\in[n-1]}\frac{p_x}{q_x}\Lambda_x\Bigr)\tilde\sigma\Bigr]\\
\text{from (D.29) and } g \text{ operator monotone decreasing}\rightarrow\quad &\geqslant f(0)+\tilde f(0)\sum_{x\in[n-1]} p_x+\mathrm{Tr}\Bigl[g\bigl(\tilde\sigma^{-\frac12}\tilde\rho\,\tilde\sigma^{-\frac12}\bigr)\tilde\sigma\Bigr]\\
\text{using again }g(r)=f(r)-f(0)-\tilde f(0)r\rightarrow\quad &= \tilde f(0)\Bigl(\sum_{x\in[n-1]} p_x-\mathrm{Tr}[\tilde\rho]\Bigr)+\mathrm{Tr}\Bigl[f\bigl(\tilde\sigma^{-\frac12}\tilde\rho\,\tilde\sigma^{-\frac12}\bigr)\tilde\sigma\Bigr]. \tag{D.46}
\end{align*}
Hence,
\[
\sum_{x\in[n-1]} q_x f\!\Bigl(\frac{p_x}{q_x}\Bigr)+\tilde f(0)\,p_n \geqslant \tilde f(0)\bigl(1-\mathrm{Tr}[\tilde\rho]\bigr)+\mathrm{Tr}\Bigl[f\bigl(\tilde\sigma^{-\frac12}\tilde\rho\,\tilde\sigma^{-\frac12}\bigr)\tilde\sigma\Bigr]. \tag{D.47}
\]
Since the above inequality holds for any choice of 1 < n ∈ ℕ, POVM {Λx}x∈[n−1], and p ∈ Prob(n) that satisfy the conditions in Lemma D.2.1, we conclude that Df(ρ∥σ) is no smaller than the right-hand side of the equation above. This concludes the proof.
As an example, consider the function fα(r) = (r^α − r)/(α(α − 1)), which is known to be operator convex for α ∈ (0, 2]. For this function we have
\[
\tilde f_\alpha(0) = \lim_{\varepsilon\to 0^+}\varepsilon\, f_\alpha\!\Bigl(\frac1\varepsilon\Bigr) = \lim_{\varepsilon\to 0^+}\frac{\varepsilon^{1-\alpha}-1}{\alpha(\alpha-1)} = \begin{cases}\dfrac{1}{\alpha(1-\alpha)} & \text{if } 0<\alpha<1\\[2mm] \infty & \text{if } 1\leqslant\alpha\leqslant 2.\end{cases} \tag{D.48}
\]
For α ∈ [1, 2], unless supp(ρ) ⊆ supp(σ) we have Dfα(ρ∥σ) = ∞. For the case supp(ρ) ⊆ supp(σ) we have
\begin{align*}
D_{f_\alpha}(\rho\|\sigma) &= \frac{1}{\alpha(\alpha-1)}\mathrm{Tr}\Bigl[\sigma\Bigl(\bigl(\sigma^{-\frac12}\rho\,\sigma^{-\frac12}\bigr)^{\alpha}-\sigma^{-\frac12}\rho\,\sigma^{-\frac12}\Bigr)\Bigr]\\
&= \frac{1}{\alpha(\alpha-1)}\Bigl(\mathrm{Tr}\Bigl[\sigma\bigl(\sigma^{-\frac12}\rho\,\sigma^{-\frac12}\bigr)^{\alpha}\Bigr]-1\Bigr). \tag{D.49}
\end{align*}
On the other hand, for the case α ∈ (0, 1), Eq. (D.42) gives
\begin{align*}
D_{f_\alpha}(\rho\|\sigma) &= \frac{1}{\alpha(\alpha-1)}\Bigl(\mathrm{Tr}\Bigl[\tilde\sigma\bigl(\tilde\sigma^{-\frac12}\tilde\rho\,\tilde\sigma^{-\frac12}\bigr)^{\alpha}\Bigr]-\mathrm{Tr}[\tilde\rho]\Bigr)+\frac{1-\mathrm{Tr}[\tilde\rho]}{\alpha(1-\alpha)}\\
&= \frac{1}{\alpha(\alpha-1)}\Bigl(\mathrm{Tr}\Bigl[\tilde\sigma\bigl(\tilde\sigma^{-\frac12}\tilde\rho\,\tilde\sigma^{-\frac12}\bigr)^{\alpha}\Bigr]-1\Bigr). \tag{D.50}
\end{align*}
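As a sanity check (ours), in the commuting case Eq. (D.49) reduces to the classical f-divergence Σx qx fα(px/qx); the following sketch verifies this numerically for diagonal full-rank states (the parameters below are illustrative choices of ours):

```python
# Numerical sketch (ours) of (D.49) in the commuting case: for diagonal full-rank rho, sigma
# the quantum expression equals sum_x q_x f_alpha(p_x/q_x) with f_alpha(r) = (r^a - r)/(a(a-1)).
import numpy as np

rng = np.random.default_rng(3)
alpha, d = 1.5, 4
p = rng.random(d); p /= p.sum()
q = rng.random(d); q /= q.sum()
rho, sigma = np.diag(p), np.diag(q)

# Quantum side: (Tr[sigma (sigma^{-1/2} rho sigma^{-1/2})^alpha] - 1)/(alpha(alpha-1))
s_inv_half = np.diag(q ** -0.5)
X = s_inv_half @ rho @ s_inv_half
evals, evecs = np.linalg.eigh(X)
X_alpha = evecs @ np.diag(evals ** alpha) @ evecs.T
quantum = (np.trace(sigma @ X_alpha) - 1) / (alpha * (alpha - 1))

# Classical f-divergence with the same f_alpha
f_alpha = lambda r: (r ** alpha - r) / (alpha * (alpha - 1))
classical = np.sum(q * f_alpha(p / q))

assert np.isclose(quantum, classical)
print("D_{f_alpha} agrees with the classical formula in the commuting case:", quantum)
```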
Exercise D.2.2. Show that for α ∈ (0, 1) we have for any ρ, σ ∈ D(A)
Moreover, define ρ′ := GρG*/Tr[G*Gρ] and note that the above inequality implies that
\begin{align*}
\rho' &\leqslant \frac{G\rho G^*}{1-\delta}\\
&= \frac{\eta^{\frac12}(\eta+\delta\omega)^{-\frac12}\,\rho\,(\eta+\delta\omega)^{-\frac12}\eta^{\frac12}}{1-\delta} \tag{D.57}\\
\rho\leqslant\eta+\delta\omega\rightarrow\quad &\leqslant \frac{\eta}{1-\delta}.
\end{align*}
It is left to show that ρ′ ∈ Bε(ρ). Using similar arguments as in (10.129) and (??) we get that G*G ⩽ I^A and P := ½(G + G*) ⩽ I^A. Moreover, following similar arguments as in (10.134) we get that F(ρ, ρ′) ⩾ Tr[ρP]. Hence,
\begin{align*}
F(\rho,\rho') &\geqslant 1-\mathrm{Tr}[\rho(I-P)]\\
\rho\leqslant\eta+\delta\omega\ \text{and}\ I-P\geqslant0\rightarrow\quad &\geqslant 1-\mathrm{Tr}[(\eta+\delta\omega)(I-P)]\\
&= 1-\delta-\mathrm{Tr}[\eta]+\mathrm{Tr}[(\eta+\delta\omega)P] \tag{D.58}\\
\text{by definition of }P\rightarrow\quad &= 1-\delta-\mathrm{Tr}[\eta]+\mathrm{Tr}\bigl[\eta^{\frac12}(\eta+\delta\omega)^{\frac12}\bigr]\\
(\eta+\delta\omega)^{\frac12}\geqslant\eta^{\frac12}\rightarrow\quad &\geqslant 1-\delta.
\end{align*}
The above lemma can be used to bound the smoothed max relative entropy in terms of the following variant of it, defined via
\[
D_{\max}^{(\varepsilon)}(\rho\|\sigma) := \min_{\sigma'\in\mathfrak B_\varepsilon(\sigma)} D_{\max}(\rho\|\sigma') \qquad\forall\ \rho,\sigma\in\mathfrak D(A). \tag{D.60}
\]
That is, we use the brackets on ε to indicate that the smoothing is done with respect to the second argument of Dmax.
Lemma D.3.2. Let ρ, σ ∈ D(A) be such that $r := 2^{D_{\max}(\rho\|\sigma)} < \infty$. Then, for any 0 < ε < 1/r,
\[
D_{\max}^{\sqrt{2\varepsilon}}(\rho\|\sigma) \leqslant D_{\max}^{(\varepsilon)}(\rho\|\sigma)-\log(1-\varepsilon r). \tag{D.61}
\]
(ε′ )
Proof. Let σ ′ be such that Dmax (ρ∥σ) = Dmax (ρ∥σ ′ ). Since σ ′ ∈ Bε (σ) there exists 0 ⩽ δ ′ ⩽
ε′ and ω ′ , ω ∈ D(A) such that (cf. (5.170))
′ (ε′ )
Denote by t := 2Dmax (ρ∥σ ) = 2Dmax (ρ∥σ) . Then from its definition we have
ρ ⩽ tσ ′
⩽ t(σ ′ + δ ′ ω ′ ) (D.63)
= tσ + tδ ′ ω .
Hence, from Lemma D.3.1 there exists ρ′ ∈ Bε(ρ), with ε := √(2δ′) ⩽ √(2ε′), such that
\[
\rho' \leqslant \frac{t}{1-t\delta'}\,\sigma \leqslant \frac{t}{1-t\varepsilon'}\,\sigma. \tag{D.64}
\]
We therefore conclude that
\begin{align*}
D_{\max}^{\varepsilon}(\rho\|\sigma) &\leqslant D_{\max}(\rho'\|\sigma)\\
&\leqslant \log t-\log(1-t\varepsilon')\\
&= D_{\max}^{(\varepsilon')}(\rho\|\sigma)-\log\Bigl(1-\varepsilon'\,2^{D_{\max}^{(\varepsilon')}(\rho\|\sigma)}\Bigr) \tag{D.65}\\
&\leqslant D_{\max}^{(\varepsilon')}(\rho\|\sigma)-\log\Bigl(1-\varepsilon'\,2^{D_{\max}(\rho\|\sigma)}\Bigr).
\end{align*}
Proof of Theorem 8.6.1. Suppose first that all the components of q are rational numbers. That is, there exist k1, . . . , km ∈ ℕ such that q = (k1/k, . . . , km/k)^T, where k := k1 + · · · + km. From Theorem 4.3.2 the vector
\[
\mathbf r := \bigoplus_{x\in[m]} p_x\mathbf u^{(k_x)} = \Bigl(\underbrace{\tfrac{p_1}{k_1},\ldots,\tfrac{p_1}{k_1}}_{k_1\text{-times}},\ \underbrace{\tfrac{p_2}{k_2},\ldots,\tfrac{p_2}{k_2}}_{k_2\text{-times}},\ \ldots,\ \underbrace{\tfrac{p_m}{k_m},\ldots,\tfrac{p_m}{k_m}}_{k_m\text{-times}}\Bigr) \tag{D.66}
\]
satisfies (p, q) ∼ (r, u^(k)). Without loss of generality we assume that the components of p and q are ordered as in (4.116). Note that this is equivalent to r = r↓. Moreover, observe that the relation (p, q) ∼ (r, u^(k)) also implies that for any n ∈ ℕ we have (p⊗n, q⊗n) ∼ (r⊗n, u^(kⁿ)).
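The following tiny sketch (ours) makes the embedding (D.66) concrete for an illustrative rational q:

```python
# Sketch (ours) of the vector r in (D.66): for rational q = (k_x/k)_x, r lists p_x/k_x
# repeated k_x times, so that (p, q) ~ (r, u^(k)) with u^(k) the uniform distribution on [k].
import numpy as np

p = np.array([0.5, 0.3, 0.2])
ks = np.array([2, 3, 5])                       # q = (2/10, 3/10, 5/10)
k = ks.sum()
q = ks / k

r = np.concatenate([np.full(kx, px / kx) for px, kx in zip(p, ks)])
u = np.full(k, 1 / k)

assert np.isclose(r.sum(), 1.0)                # r is a probability vector
# Splitting outcome x into k_x equally likely sub-outcomes produces (r, u^(k));
# merging them back recovers (p, q), so the two pairs are interconvertible.
print("r =", r, "\nu =", u)
```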
We therefore get that for all n ∈ ℕ
\[
\frac1n D_{\min}^{\varepsilon}\bigl(\mathbf p^{\otimes n}\big\|\mathbf q^{\otimes n}\bigr) = \frac1n D_{\min}^{\varepsilon}\bigl(\mathbf r^{\otimes n}\big\|\mathbf u^{(k^n)}\bigr). \tag{D.67}
\]
Combining this with the upper bound in (8.148) we get that
\[
\frac1n D_{\min}^{\varepsilon}\bigl(\mathbf p^{\otimes n}\big\|\mathbf q^{\otimes n}\bigr) \leqslant -\frac1n\log b_{\ell_n} = -\frac1n\log\frac{\ell_n}{k^n} = \log(k)-\frac1n\log(\ell_n), \tag{D.68}
\]
where $b_{\ell_n} := \bigl\|\mathbf u^{(k^n)}\bigr\|_{(\ell_n)} = \ell_n/k^n$, and ℓn ∈ {0, 1, . . . , kⁿ − 1} is the integer satisfying
\[
\bigl\|\mathbf r^{\otimes n}\bigr\|_{(\ell_n)} < 1-\varepsilon \leqslant \bigl\|\mathbf r^{\otimes n}\bigr\|_{(\ell_n+1)}. \tag{D.69}
\]
where we will see shortly that the limits above exist. It is therefore left to estimate ℓn. For this purpose, we will estimate the sums in (D.69) using the notion of (weak) typicality. Observe first that each index j in (D.69) can be expressed as a sequence xn = (x1, . . . , xn) ∈ [m]^n (so that the components of r⊗n can be expressed as r_{xn} := r_{x1} · · · r_{xn}). Denote by Sℓn the set of the ℓn sequences that correspond to the largest probabilities r_{xn}. With these notations we have
\[
\sum_{j\in[\ell_n]}\bigl(\mathbf r^{\otimes n}\bigr)^{\downarrow}_j = \sum_{x^n\in S_{\ell_n}} r_{x^n}. \tag{D.72}
\]
Let δ > 0 be an arbitrarily small number. From the definition of Sℓn it follows that if there exists xn ∈ Sℓn such that r_{xn} < 2^{−n(H(r)+δ)}, then the set Sℓn contains the set of δ-typical sequences, Tn,δ(X). Therefore, in this case the sum above is greater than Pr(Tn,δ(X)). However, for sufficiently large n this probability exceeds 1 − ε, in contradiction with (D.69). Therefore, without loss of generality we can assume that for sufficiently large n all the sequences xn ∈ Sℓn satisfy r_{xn} ⩾ 2^{−n(H(r)+δ)}. Combining this with the first bound in (D.69) we have
\[
1-\varepsilon > \sum_{x^n\in S_{\ell_n}} r_{x^n} \geqslant \ell_n\,2^{-n(H(\mathbf r)+\delta)}. \tag{D.73}
\]
Hence, ℓn ⩽ (1 − ε)2^{n(H(r)+δ)}, which gives
\[
\limsup_{n\to\infty}\frac1n\log(\ell_n) \leqslant H(\mathbf r)+\delta. \tag{D.74}
\]
Next, from the second bound in (D.69) we have
\begin{align*}
1-\varepsilon &\leqslant \sum_{x^n\in S_{\ell_n+1}} r_{x^n}\\
&\leqslant \sum_{x^n\in S_{\ell_n+1}\cap T_{n,\delta}(X)} r_{x^n}+\sum_{x^n\notin T_{n,\delta}(X)} r_{x^n}. \tag{D.75}
\end{align*}
This completes the proof for the case that q has rational components. For the general case, let {sk}, {rk} ∈ Prob(m) ∩ ℚ^m be two sequences of probability vectors with rational components such that both sk → q and rk → q, and in addition
\[
(\mathbf p,\mathbf s_k) \succ (\mathbf p,\mathbf q) \succ (\mathbf p,\mathbf r_k). \tag{D.79}
\]
The existence of such sequences follows from Exercise 4.3.24. Therefore, since $D^{\varepsilon}_{\min}$ is a divergence we have
\[
\frac1n D^{\varepsilon}_{\min}\bigl(\mathbf p^{\otimes n}\big\|\mathbf s_k^{\otimes n}\bigr) \geqslant \frac1n D^{\varepsilon}_{\min}\bigl(\mathbf p^{\otimes n}\big\|\mathbf q^{\otimes n}\bigr) \geqslant \frac1n D^{\varepsilon}_{\min}\bigl(\mathbf p^{\otimes n}\big\|\mathbf r_k^{\otimes n}\bigr). \tag{D.80}
\]
Taking on all sides the limits n → ∞ followed by k → ∞ completes the proof.
Exercise D.4.1. Give more details for the last argument involving (D.80) and the limits
n → ∞ and k → ∞. For the limit n → ∞, consider two cases involving lim inf and lim sup
and then conclude at the end that the limit exists.
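The following sketch (ours) illustrates the typicality estimate behind (D.69)–(D.74) numerically: for a small alphabet we compute ℓn by brute force and compare (1/n)log ℓn with H(r); the distribution and ε below are illustrative choices.

```python
# Numerical sketch (ours): compute the integer ell_n of (D.69) directly and compare
# (1/n) log2(ell_n) with the Shannon entropy H(r), in line with (D.74).
import numpy as np

r = np.array([0.6, 0.3, 0.1])
H = -np.sum(r * np.log2(r))
eps = 0.1

for n in (4, 8, 12):
    probs = r.copy()
    for _ in range(n - 1):
        probs = np.outer(probs, r).ravel()     # components of r^{tensor n}
    cum = np.cumsum(np.sort(probs)[::-1])      # cum[j] = sum of the j+1 largest components
    ell = int(np.searchsorted(cum, 1 - eps))   # largest ell with top-ell mass < 1 - eps
    print(f"n={n:2d}:  (1/n) log2(ell_n) = {np.log2(max(ell, 1)) / n:.3f}   H(r) = {H:.3f}")
```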
Next, we provide an alternative, more traditional, proof. This proof is based on the AEP property.

Alternative proof of Theorem 8.6.1. Let ε > 0 and let t be a probabilistic hypothesis test on [m]^n satisfying αn(t) ⩽ ε. Observe first that any probabilistic hypothesis test satisfies
\begin{align*}
\beta_n(\mathbf t) &= \sum_{x^n\in[m]^n} t_{x^n}\,q_{x^n}\\
&\geqslant \sum_{x^n\in\mathcal R^{\varepsilon}_n} t_{x^n}\,q_{x^n} \tag{D.81}\\
(8.64)\rightarrow\quad &\geqslant 2^{-n(D(\mathbf p\|\mathbf q)+\varepsilon)}\sum_{x^n\in\mathcal R^{\varepsilon}_n} t_{x^n}\,p_{x^n}.
\end{align*}
Dividing by n and taking the limit n → ∞ we get that for all δ > 0
\[
\limsup_{n\to\infty}\frac1n\,\mathrm{Cost}^{\varepsilon}\bigl(\psi^{\otimes n}\bigr) \leqslant E\bigl(\psi^{AB}\bigr)+\delta. \tag{D.88}
\]
Since the above inequality holds for all δ > 0 we must have for all ε ∈ (0, 1)
\[
\limsup_{n\to\infty}\frac1n\,\mathrm{Cost}^{\varepsilon}\bigl(\psi^{\otimes n}\bigr) \leqslant E\bigl(\psi^{AB}\bigr). \tag{D.89}
\]
Conversely, fix ε ∈ (0, 1) and for each n ∈ ℕ let mn ∈ [dⁿ] be the smallest integer satisfying ∥p⊗n∥(mn) ⩾ 1 − ε. Denote by Sn ⊂ [d]^n the set of mn sequences xn with the highest probabilities p_{xn}. By definition, we have Costε(ψ⊗n) = log(mn), ∥p⊗n∥(mn) = Pr(Sn), and |Sn| = mn. Suppose now, by contradiction, that there exists r < E(ψ^{AB}) such that
\[
\liminf_{n\to\infty}\frac1n\,\mathrm{Cost}^{\varepsilon}\bigl(\psi^{\otimes n}\bigr) \leqslant r. \tag{D.90}
\]
This means that for any k ∈ ℕ there exists n ⩾ k such that |Sn| = mn ⩽ 2^{rn}. However, from the third part of Theorem 8.1.3 (particularly Exercise 8.1.6) it follows that for any
Hence,
\begin{align*}
\limsup_{n\to\infty}\frac1n\,\mathrm{Distill}^{\varepsilon}\bigl(\psi^{\otimes n}\bigr) &\leqslant \limsup_{n\to\infty}\frac1n\log\left\lfloor\frac{k_n}{\|\mathbf p^{\otimes n}\|_{(k_n)}-\varepsilon}\right\rfloor\\
\|\mathbf p^{\otimes n}\|_{(k_n)}\xrightarrow{\ n\to\infty\ }1\rightarrow\quad &= \limsup_{n\to\infty}\frac1n\log\frac{k_n}{1-\varepsilon} \tag{D.96}\\
&= E\bigl(\psi^{AB}\bigr)+\delta.
\end{align*}
Since the above inequality holds for all δ ∈ (0, 1) we must have
\[
\limsup_{n\to\infty}\frac1n\,\mathrm{Distill}^{\varepsilon}\bigl(\psi^{\otimes n}\bigr) \leqslant E\bigl(\psi^{AB}\bigr). \tag{D.97}
\]
\[
\liminf_{n\to\infty}\frac1n\,\mathrm{Distill}^{\varepsilon}\bigl(\psi^{\otimes n}\bigr) \leqslant r. \tag{D.98}
\]
For each n ∈ ℕ let kn be such that
\[
\mathrm{Distill}^{\varepsilon}\bigl(\psi^{\otimes n}\bigr) = \log\left\lfloor\frac{k_n}{\|\mathbf p^{\otimes n}\|_{(k_n)}-\varepsilon}\right\rfloor. \tag{D.99}
\]
Then, from the two equations above we get that for any a ∈ ℕ there exists n ⩾ a such that
\[
\left\lfloor\frac{k_n}{\|\mathbf p^{\otimes n}\|_{(k_n)}-\varepsilon}\right\rfloor \leqslant 2^{rn}, \tag{D.100}
\]
so that
\[
k_n \leqslant 2^{n\bigl(r+\frac1n\log(1-\varepsilon)\bigr)}+1-\varepsilon. \tag{D.102}
\]
Hence, for sufficiently large n we have kn ⩽ 2^{nr′} for some r′ < Distill(ψ^{AB}). For each n ∈ ℕ denote by Sn ⊂ [d]^n the set of kn sequences xn with the highest probabilities p_{xn}. To summarize, we get that for any a ∈ ℕ there exists n ⩾ a such that ∥p⊗n∥(kn) = Pr(Sn) and |Sn| = kn ⩽ 2^{nr′}. However, from the third part of Theorem 8.1.3 (particularly Exercise 8.1.6) it follows that there exists n sufficiently large such that Pr(Sn) ⩽ ε, in contradiction with the assumption that ∥p⊗n∥(kn) > ε. Hence, we must have
\[
\liminf_{n\to\infty}\frac1n\,\mathrm{Distill}^{\varepsilon}\bigl(\psi^{\otimes n}\bigr) \geqslant \mathrm{Distill}\bigl(\psi^{AB}\bigr). \tag{D.103}
\]
The proof is concluded by comparing the above inequality with (D.97).
Note that the identity element belongs to S, and since χψ(g⁻¹) = χψ(g) we get that if g ∈ S then also g⁻¹ ∈ S. Let H be the group generated by S; that is,
\[
H := \langle S\rangle := \bigl\{g_1\cdots g_n \,:\, g_1,\ldots,g_n\in S,\ n\in\mathbb N\bigr\}. \tag{D.105}
\]
With these definitions we get that if ψ and ϕ are G-equivalent, then (15.182) still holds if we replace G by H.

Remark. Note that if it is possible to extend the one-dimensional representation {e^{iθh}}h∈H from H to G, then in such cases (D.106) holds for all h ∈ G (recall that ⟨ψ|Ug|ψ⟩ = ⟨ϕ|Ug|ϕ⟩ = 0 for g ∉ H). Therefore, in such cases we get again the equivalence of Theorem 15.4.2. However, such extensions of {e^{iθh}}h∈H do not always exist (see Exercise D.6.1).
Proof. Following the same steps in the proof of Theorem 15.4.2 that led to (15.188), it follows
that |χφ1 (h)| = |χφ2 (h)| = 1 for all h ∈ S and in particular
for some phases θh ∈ [0, 2π). Note that the equation above implies that for all h ∈ S
and we get
θgh = θg + θh mod 2π . (D.110)
Therefore, the set {eiθh : h ∈ S} can be completed to a 1-dimensional unitary representation
of H := ⟨S⟩. Indeed, for any element h ∈ H that is not in S, there exists n ∈ N and
g1 , . . . , gn ∈ S such that h = g1 · · · gn . For such h we define
Note that θh above is well defined, since if we also have h = k1 · · · km for some k1, . . . , km ∈ S then
\[
U_h|\varphi_1\rangle = U_{g_1}\cdots U_{g_n}|\varphi_1\rangle = e^{i\sum_{x\in[n]}\theta_{g_x}}|\varphi_1\rangle \qquad\text{and} \tag{D.112}
\]
\[
U_h|\varphi_1\rangle = U_{k_1}\cdots U_{k_m}|\varphi_1\rangle = e^{i\sum_{y\in[m]}\theta_{k_y}}|\varphi_1\rangle, \tag{D.113}
\]
so that we must have $\sum_{y\in[m]}\theta_{k_y} = \sum_{x\in[n]}\theta_{g_x}$ mod 2π. We therefore conclude that h ↦ e^{iθh}
is a 1-dimensional representation of the subgroup H of G. The proof is concluded with
the observation that for any g ∈ H that is not in S we have by the definition of S that
χψ (g) = χϕ (g) = 0. Therefore, in this case (D.106) holds trivially.
Exercise D.6.1. Consider the group SU(2) and its 2-element subgroup H = {I2, −I2}. Show that the 1-dimensional unitary representation of H that takes I2 to 1 and −I2 to −1 cannot be extended to a 1-dimensional representation of SU(2). In other words, show that there is no 1-dimensional unitary representation of SU(2) that maps I2 to 1 and −I2 to −1.
Theorem. Let (ρ, γ) and (σ, γ̃) be two quasi-classical states of systems A and A′ ,
respectively. The following statements are equivalent:
Proof. Let p, q, g, g̃ be the probability vectors whose components are the diagonals of ρ, σ, γ, γ̃, respectively. From the second statement of the theorem and (17.67) we have that (p, g) ≻ (q, g̃). To show that $(\mathbf p,\mathbf g)\xrightarrow{\ \text{CTO}\ }(\mathbf q,\tilde{\mathbf g})$ we will construct a sequence of thermal operations such that their limit maps (p, g) to (q, g̃). For convenience, we will think of X := A and Y := A′ as two classical systems, and consider the Gibbs state of system XⁿYⁿ. This Gibbs state can be written as
\[
\gamma^{\otimes n}\otimes\tilde\gamma^{\otimes n} = \sum_{\substack{x^n\in[m]^n\\ y^n\in[k]^n}} g_{x^n}\,\tilde g_{y^n}\,|x^ny^n\rangle\langle x^ny^n|. \tag{D.114}
\]
Our goal is to construct an energy preserving unitary (in fact a permutation) U such that the state
\[
\omega_n := \mathrm{Tr}_{X^nY^{n-1}}\Bigl[U\bigl(\rho\otimes\gamma^{\otimes(n-1)}\otimes\tilde\gamma^{\otimes(n-1)}\otimes\tilde\gamma\bigr)\Bigr] \tag{D.116}
\]
goes to σ as n goes to infinity; see Fig. D.1. We take three steps towards that goal:
We project the state in (D.115) to the strongly typical subspace. Specifically, let Tn(X) and Tn(Y) be the sets of all ε-strongly-typical sequences with ε = 1/n^{1/3}; i.e.,
Then, the projection of the initial state in (D.115) to the corresponding typical subspace is given by the sub-normalized state
\[
\eta^{X^nY^n} := \sum_{\substack{x^n\in T_n(X)\\ y^n\in T_n(Y)}}\frac{p_{x_1}}{g_{x_1}}\,g_{x^n}\,\tilde g_{y^n}\,|x^ny^n\rangle\langle x^ny^n|. \tag{D.118}
\]
From the theorem of strongly typical sequences, the state above has the following property.

Lemma D.7.1.
\[
\frac12\Bigl\|\eta^{X^nY^n}-\rho\otimes\gamma^{\otimes(n-1)}\otimes\tilde\gamma^{\otimes n}\Bigr\|_1 \xrightarrow{\ n\to\infty\ } 0. \tag{D.119}
\]
Proof. First, observe that $\frac{1}{g_{x_1}}g_{x^n} = g_{x^{n-1}}$ with x^{n−1} := (x2, . . . , xn), so that
\begin{align*}
\mathrm{Tr}\bigl[\eta^{X^nY^n}\bigr] &= \sum_{x^n\in T_n(X)}\ \sum_{y^n\in T_n(Y)} p_{x_1}\,g_{x^{n-1}}\,\tilde g_{y^n}\\
\text{Hoeffding's inequality (??)}\rightarrow\quad &\geqslant \bigl(1-e^{-2n^{1/3}}\bigr)\sum_{x^n\in T_n(X)} p_{x_1}\,g_{x^{n-1}}, \tag{D.120}
\end{align*}
where the limit follows from the fact that for very large n we have (n − 1)ε′ₙ² ≈ n^{1/3}.
For any s ∈ Type(n, m) and t ∈ Type(n, k) we denote by ⟨x, s, t, ·⟩ the set of all sequences (x, x^{n−1}, y^n) with the same x, the same type s of x^n, and the same type t of y^n. Similarly, ⟨·, s, t, y⟩ denotes the set of all sequences (x^n, y^{n−1}, y) with the same y, the same type s of x^n, and the same type t of y^n. Fix s ∈ Type(n, m) and t ∈ Type(n, k), and denote the cardinalities of these sets by ax := |⟨x, s, t, ·⟩| and by := |⟨·, s, t, y⟩|. Observe that (see Exercise D.7.1)
\[
a_x = s_x\,|x^n(\mathbf s)|\,|y^n(\mathbf t)| \qquad\text{and}\qquad b_y = t_y\,|x^n(\mathbf s)|\,|y^n(\mathbf t)|. \tag{D.124}
\]
Now, let R = (r_{y|x}) be the column stochastic matrix satisfying Rp = q and Rg = g̃. For any x ∈ [m] and y ∈ [k] we define, by induction on x,
\[
\ell_{yx} := \min\Bigl\{\,b_y-\sum_{x'<x}\ell_{yx'}\,,\ \bigl\lfloor r_{y|x}\,a_x\bigr\rfloor\Bigr\}, \tag{D.125}
\]
where for x = 1 we use the convention that $\sum_{x'<1}\ell_{yx'} := 0$. Observe that each integer ℓ_{yx} ⩾ 0 (see Exercise D.7.1) and the sum
\[
\sum_{y\in[k]}\ell_{yx} \leqslant \sum_{y\in[k]}\bigl\lfloor r_{y|x}\,a_x\bigr\rfloor \leqslant \sum_{y\in[k]} r_{y|x}\,a_x = a_x. \tag{D.126}
\]
Therefore, for each x ∈ [m] there exist k disjoint sets {L_{yx}}_{y∈[k]} with each L_{yx} ⊂ ⟨x, s, t, ·⟩ and with |L_{yx}| = ℓ_{yx}. Now, observe also that from their definition, the integers {ℓ_{yx}} satisfy
\[
\sum_{x\in[m]}\ell_{yx} \leqslant b_y. \tag{D.127}
\]
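The inductive definition (D.125) and the two properties (D.126)–(D.127) can be checked directly; the following sketch (ours, with illustrative values of ax and by consistent with (D.124)) does so:

```python
# Sketch (ours) of the inductive definition (D.125) of ell_{yx}, checking (D.126)-(D.127).
# R is column stochastic (columns indexed by x, rows by y); a_x, b_y are illustrative counts.
import numpy as np

rng = np.random.default_rng(5)
m, k = 3, 4
R = rng.random((k, m)); R /= R.sum(axis=0)     # column stochastic: sum_y r_{y|x} = 1

N = 10_000                                     # plays the role of |x^n(s)||y^n(t)| in (D.124)
s = np.array([0.5, 0.3, 0.2])
t = R @ s
a = np.rint(s * N).astype(int)                 # a_x
b = np.rint(t * N).astype(int)                 # b_y

ell = np.zeros((k, m), dtype=int)
for x in range(m):
    for y in range(k):
        ell[y, x] = min(b[y] - ell[y, :x].sum(), int(np.floor(R[y, x] * a[x])))   # (D.125)

assert np.all(ell >= 0)
assert np.all(ell.sum(axis=0) <= a)            # (D.126): sum_y ell_{yx} <= a_x
assert np.all(ell.sum(axis=1) <= b)            # (D.127): sum_x ell_{yx} <= b_y
print("ell_{yx}/a_x ≈ r_{y|x}:\n", np.round(ell / a, 3), "\nR =\n", np.round(R, 3))
```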
\[
f : \langle\cdot,\mathbf s,\mathbf t,\cdot\rangle \to \langle\cdot,\mathbf s,\mathbf t,\cdot\rangle \tag{D.130}
\]
Observe that the bijection f = f_{s,t} as defined above can be defined for any s ∈ Type(n, m) and t ∈ Type(n, k). Therefore, the set of bijections {f_{s,t}}_{s,t} can be used to define the bijection
\[
\pi_n(x^ny^n) := f_{\mathbf s,\mathbf t}(x^ny^n), \tag{D.131}
\]
where s is the type of x^n and t is the type of y^n. Observe that πn is a thermal operation since it does not change s and t (hence it preserves the energy). We define the unitary (permutation) channel U ∈ CPTP(XⁿYⁿ → XⁿYⁿ) as
\[
\frac{p_x}{g_x}\,g_{x^n}\,\tilde g_{y^n} = \frac{p_x}{g_x}\,2^{-n\bigl(H(\mathbf s)+H(\mathbf t)+D(\mathbf s\|\mathbf g)+D(\mathbf t\|\tilde{\mathbf g})\bigr)}, \tag{D.133}
\]
where x1 := x and we used (8.85) twice, with s and t being the types of x^n and y^n, respectively. Denoting $c_{\mathbf s,\mathbf t} := 2^{-n(H(\mathbf s)+H(\mathbf t)+D(\mathbf s\|\mathbf g)+D(\mathbf t\|\tilde{\mathbf g}))}$, we get from the definition of η^{XⁿYⁿ} in (D.118) that ω′n can be expressed as
\[
\omega'_n = \sum_{(\mathbf s,\mathbf t)\in\mathcal C_n} c_{\mathbf s,\mathbf t}\sum_{x\in[m]}\frac{p_x}{g_x}\sum_{(x^n,y^n)\in\langle x,\mathbf s,\mathbf t,\cdot\rangle}\mathrm{Tr}_{X^nY^{n-1}}\bigl[|\pi_n(x^ny^n)\rangle\langle\pi_n(x^ny^n)|\bigr]. \tag{D.134}
\]
Next, instead of summing over all the elements of ⟨x, s, t, ·⟩ (in the third sum above), we will restrict the summation only to sequences (x^n, y^n) that belong to the subset ⋃_{y∈[k]} L_{yx} ⊂ ⟨x, s, t, ·⟩, and we will show that the remaining terms are negligible (i.e. they go to zero as n goes to infinity). That is, we define
\[
\omega''_n = \sum_{(\mathbf s,\mathbf t)\in\mathcal C_n} c_{\mathbf s,\mathbf t}\sum_{x\in[m]}\sum_{y\in[k]}\sum_{(x^n,y^n)\in L^{\mathbf s,\mathbf t}_{yx}}\frac{p_x}{g_x}\,\mathrm{Tr}_{X^nY^{n-1}}\bigl[|\pi_n(x^ny^n)\rangle\langle\pi_n(x^ny^n)|\bigr], \tag{D.136}
\]
where we added the s, t superscript to L_{yx} since it depends on the types s and t. We show now that ω″n → σ as n → ∞, and since Tr[σ] = 1 we must have ∥ω′n − ω″n∥1 → 0 as n → ∞.
Now, observe that for (x^n, y^n) ∈ L^{s,t}_{yx} we have that πn(x^ny^n) ∈ ⟨·, s, t, y⟩, so that the last component of the sequence πn(x^ny^n) is y. Therefore, we get that
\begin{align*}
\omega''_n &= \sum_{(\mathbf s,\mathbf t)\in\mathcal C_n} c_{\mathbf s,\mathbf t}\sum_{x\in[m]}\sum_{y\in[k]}\sum_{(x^n,y^n)\in L^{\mathbf s,\mathbf t}_{yx}}\frac{p_x}{g_x}\,|y\rangle\langle y|^{Y}\\
&= \sum_{(\mathbf s,\mathbf t)\in\mathcal C_n} c_{\mathbf s,\mathbf t}\sum_{x\in[m]}\sum_{y\in[k]}\frac{p_x}{g_x}\,\ell^{\mathbf s,\mathbf t}_{yx}\,|y\rangle\langle y|^{Y}. \tag{D.137}
\end{align*}
Lemma D.7.2. Let {(sn, tn)}n∈ℕ be a sequence of pairs of types such that (sn, tn) ∈ Cn. Denote by ℓ^{(n)}_{yx} the coefficients (D.125) that correspond to the pair of types (sn, tn). Then,
\[
r_{y|x} = \lim_{n\to\infty}\frac{\ell^{(n)}_{yx}}{a^{(n)}_x}. \tag{D.138}
\]
where the last equality follows from the fact that g̃ = Rg, so that
\[
\tilde g_y = \sum_{x\in[m]} r_{y|x}\,g_x \geqslant r_{y|1}\,g_1. \tag{D.140}
\]
Now, fix x ∈ [m] and suppose that the limit (D.138) (with x being replaced by x′) holds for all x′ < x. We need to show that it also holds for x. Indeed,
\begin{align*}
\lim_{n\to\infty}\frac{\ell^{(n)}_{yx}}{a^{(n)}_x} &= \lim_{n\to\infty}\min\left\{\frac{t^{(n)}_y}{s^{(n)}_x}-\frac{\sum_{x'<x}\ell^{(n)}_{yx'}}{a^{(n)}_x}\,,\ \frac{\bigl\lfloor r_{y|x}\,a^{(n)}_x\bigr\rfloor}{a^{(n)}_x}\right\} \tag{D.141}\\
&= \min\left\{\frac{\tilde g_y}{g_x}-\sum_{x'<x}\lim_{n\to\infty}\frac{\ell^{(n)}_{yx'}}{a^{(n)}_x}\,,\ r_{y|x}\right\}.
\end{align*}
Since we assume by induction that (D.138) holds if we replace x with x′ < x, we get
\begin{align*}
\lim_{n\to\infty}\frac{\ell^{(n)}_{yx'}}{a^{(n)}_x} &= \lim_{n\to\infty}\frac{a^{(n)}_{x'}}{a^{(n)}_x}\,\frac{\ell^{(n)}_{yx'}}{a^{(n)}_{x'}}\\
&= \lim_{n\to\infty}\frac{s^{(n)}_{x'}}{s^{(n)}_x}\,\frac{\ell^{(n)}_{yx'}}{a^{(n)}_{x'}} \tag{D.142}\\
&= \frac{g_{x'}}{g_x}\lim_{n\to\infty}\frac{\ell^{(n)}_{yx'}}{a^{(n)}_{x'}}\\
\text{By induction}\rightarrow\quad &= \frac{g_{x'}}{g_x}\,r_{y|x'}.
\end{align*}
Substituting this into (D.141) we get that
\[
\lim_{n\to\infty}\frac{\ell^{(n)}_{yx}}{a^{(n)}_x} = \min\Bigl\{\frac{\tilde g_y}{g_x}-\frac{1}{g_x}\sum_{x'<x} r_{y|x'}\,g_{x'}\,,\ r_{y|x}\Bigr\} = r_{y|x}. \tag{D.143}
\]
D.8 Continuity
In Section 17.6.1, we calculated the conversion distance between two athermality states within the quasi-classical regime. Notably, in Theorem 17.6.2, we postulated that the Gibbs states g and g′ possess positive rational components. In this section, we demonstrate that the conversion distance is continuous. Consequently, one can utilize Theorem 17.6.2 to approximate the conversion distance with arbitrary precision, even when g and g′ have irrational components.
For this purpose, we fix two probability vectors p^A ∈ Prob(m) and q^B ∈ Prob(n) and define for all g^A ∈ Prob(m) and g^B ∈ Prob(n) the function
\[
f\bigl(\mathbf g^A,\mathbf g^B\bigr) := T\Bigl((\mathbf p^A,\mathbf g^A)\xrightarrow{\ F\ }(\mathbf q^B,\mathbf g^B)\Bigr). \tag{D.146}
\]
Moreover, fix g^A, g′^A ∈ Prob(m) and g^B, g′^B ∈ Prob(n), and denote δ := ½∥g^A − g′^A∥₁ and ε := ½∥g^B − g′^B∥₁. Furthermore, let g^B_min and g′^B_min be the smallest components of g^B and g′^B, respectively. With these notations we prove the following continuity lemma.
\[
\Bigl|f\bigl(\mathbf g'^A,\mathbf g^B\bigr)-f\bigl(\mathbf g^A,\mathbf g^B\bigr)\Bigr| \leqslant \frac{2\delta}{g^B_{\min}}. \tag{D.148}
\]
\begin{align*}
f\bigl(\mathbf g^A,\mathbf g'^B\bigr) &\leqslant \frac12\bigl\|\mathbf q^B-\mathbf q''^B\bigr\|_1\\
\text{Triangle inequality}\rightarrow\quad &\leqslant \frac12\bigl\|\mathbf q^B-\mathbf q'^B\bigr\|_1+\frac12\bigl\|\mathbf q'^B-\mathbf q''^B\bigr\|_1 \tag{D.151}\\
&\leqslant f\bigl(\mathbf g^A,\mathbf g^B\bigr)+\frac{\varepsilon}{g^B_{\min}}.
\end{align*}
For the converse inequality, observe that by repeating the exact same lines as above, exchanging everywhere g^B with g′^B, we get
\[
f\bigl(\mathbf g^A,\mathbf g^B\bigr) \leqslant f\bigl(\mathbf g^A,\mathbf g'^B\bigr)+\frac{\varepsilon}{g'^B_{\min}}. \tag{D.152}
\]
This completes the proof of the inequality (D.147).
For the proof of (D.148), as before, let q′^B be optimal such that (D.149) holds. We would like to find a vector q″^B that is close to q′^B and that satisfies (p^A, g′^A) ≻ (q″^B, g^B). Since (p^A, g^A) ≻ (q′^B, g^B) there exists a column stochastic matrix E such that Ep^A = q′^B and Eg^A = g^B. Denote g′^B := Eg′^A, and observe that since g^A is δ-close to g′^A, also g^B is δ-close to g′^B (DPI under E). Moreover, by definition we have (p^A, g′^A) ≻ (q′^B, g′^B). Now, from Lemma 4.3.4 there exists q″^B such that
\[
\bigl(\mathbf q'^B,\mathbf g'^B\bigr) \succ \bigl(\mathbf q''^B,\mathbf g^B\bigr) \qquad\text{and}\qquad \frac12\bigl\|\mathbf q'^B-\mathbf q''^B\bigr\|_1 \leqslant \frac{\delta}{g'^B_{\min}}. \tag{D.153}
\]
Since (p^A, g′^A) ≻ (q′^B, g′^B) we also have (p^A, g′^A) ≻ (q″^B, g^B). Hence,
\begin{align*}
f\bigl(\mathbf g'^A,\mathbf g^B\bigr) &\leqslant \frac12\bigl\|\mathbf q^B-\mathbf q''^B\bigr\|_1\\
\text{Triangle inequality}\rightarrow\quad &\leqslant \frac12\bigl\|\mathbf q^B-\mathbf q'^B\bigr\|_1+\frac12\bigl\|\mathbf q'^B-\mathbf q''^B\bigr\|_1\\
&\leqslant f\bigl(\mathbf g^A,\mathbf g^B\bigr)+\frac{\delta}{g'^B_{\min}} \tag{D.154}\\
\mathbf g'^B\approx_\delta\mathbf g^B\rightarrow\quad &\leqslant f\bigl(\mathbf g^A,\mathbf g^B\bigr)+\frac{\delta}{g^B_{\min}-\delta}\\
\delta\leqslant\tfrac12 g^B_{\min}\rightarrow\quad &\leqslant f\bigl(\mathbf g^A,\mathbf g^B\bigr)+\frac{2\delta}{g^B_{\min}}.
\end{align*}
The opposite inequality can be obtained using the exact same lines as above by exchanging the roles of g^A and g′^A. This completes the proof of (D.148).
Observe that the set M(p, q, k) is convex and consists of all two-column matrices [r s] with probability vectors r, s ∈ Prob(k) for which (p, q) ≻ (r, s).
1. (p, q) ≻ (p′ , q′ )
2. For all k ∈ N
M(p, q, k) ⊇ M(p′ , q′ , k) . (D.156)
Proof. Suppose first that (p, q) ≻ (p′, q′) and fix k ∈ ℕ. Then, if the two-column matrix [r′ s′] ∈ M(p′, q′, k), we get by definition that (p′, q′) ≻ (r′, s′), so that from the transitivity of relative majorization we get that also (p, q) ≻ (r′, s′). That is, the two-column matrix [r′ s′] ∈ M(p, q, k). Hence, the inclusion in (D.156) must hold.

Conversely, suppose (D.156) holds for all k ∈ ℕ. In particular, it holds for k = m. In this case we have
\[
[\mathbf p'\ \ \mathbf q'] \in \mathfrak M(\mathbf p',\mathbf q',m) \subseteq \mathfrak M(\mathbf p,\mathbf q,m), \tag{D.157}
\]
so that there exists E ∈ STOCH(m, n) such that [p′ q′] = [Ep Eq]; i.e. (p, q) ≻ (p′, q′).
We therefore proved the equivalence between relative majorization and the inclusion
relations between the sets in (D.156). The significance of this observation is that now we can
make use of the fact that inclusion relation between compact sets is related to inequalities
between their support functions. Specifically, from Theorem A.7.1 we know that two compact
sets C1 and C2 satisfy C1 ⊆ C2 if and only if their support functions fC1 and fC2 satisfy
fC1 ⩽ fC2 everywhere in their domain. The support function of M(p, q, k), denoted by
fM(p,q,k) : Rk×2 → R, is given for any two-column matrix S = [s1 s2 ] ∈ Rk×2 by
Therefore, the lemma above in conjunction with Theorem A.7.1 implies that (p, q) ≻ (p′ , q′ )
if and only if for all k ∈ N and all S ∈ Rk×2
Our next goal is therefore to compute the optimization problem given in (D.158) for the
support function.
1. (p, q) ≻ (p′ , q′ )
Remark. The expressions appearing on the right-hand side and left-hand side of (D.160)
are precisely the support functions given in (D.159). They are known as sublinear function-
als. This lemma provides a characterization for relative majorization in terms of sublinear
functionals (see subsection A.7).
Proof. Denoting by {e_{z|x}}_{x∈[n], z∈[k]} the components (conditional probabilities) of E, and by {s_{z1}}_{z∈[k]} and {s_{z2}}_{z∈[k]} the components of s1 and s2, we continue from (D.158):
\begin{align*}
f_{\mathfrak M(\mathbf p,\mathbf q,k)}(S) &= \max_{E\in\mathrm{STOCH}(k,n)}\sum_{z\in[k]}\sum_{x\in[n]}\bigl(s_{z1}e_{z|x}p_x+s_{z2}e_{z|x}q_x\bigr)\\
&= \max_{E\in\mathrm{STOCH}(k,n)}\sum_{x\in[n]}\sum_{z\in[k]} e_{z|x}\bigl(s_{z1}p_x+s_{z2}q_x\bigr) \tag{D.161}\\
\text{Exercise D.9.1}\rightarrow\quad &= \sum_{x\in[n]}\max_{z\in[k]}\bigl\{s_{z1}p_x+s_{z2}q_x\bigr\}.
\end{align*}
Combining this with (D.159) completes the proof of the equivalence between (p, q) ≻ (p′, q′) and (D.160) with arbitrary vectors v1, . . . , vk ∈ ℝ². It is left to show that we can assume that v1, . . . , vk ∈ ℝ²₊. Indeed, for each j ∈ [k] let v′j := vj + (r, r)^T ⩾ 0 for some sufficiently large r > 0. In Exercise D.9.2 below you show that the inequality in (D.160) holds with v1, . . . , vk if and only if it holds with v′1, . . . , v′k. Hence, without loss of generality we can assume that all the vectors v1, . . . , vk have non-negative components.
Exercise D.9.1. Explain in more detail the derivation in the last line of (D.161).
Exercise D.9.2. Show that for sufficiently large r > 0, the inequality in (D.160) holds with
v1 , . . . , vk if and only if it holds with v1′ , . . . , vk′ , where for each j ∈ [k], vj′ := vj +(r, r)T ⩾ 0.
The next Lemma is a crucial simplification of the previous lemma. Specifically, we show
that it is sufficient to take k = 2 in Lemma D.9.2.
Lemma D.9.3. Using the same notations as in Lemma D.9.2, the following
statements are equivalent:
1. (p, q) ≻ (p′ , q′ ).
Proof. We start by proving the equivalence of statements 1 and 2. From Lemma D.9.2 it is sufficient to show that if (D.163) holds then (D.160) holds for all k ⩾ 3 (the case k = 1 is trivial). Let v1, . . . , vk ∈ ℝ² with k ⩾ 3. In order to prove the inequality in (D.160), we first observe that the term max_{z∈[k]}{rx · vz} is the support function of the polytope C = Conv(v1, . . . , vk). We order the set of vertices {v1, . . . , vk} such that for any x ∈ {2, . . . , k} the vector vx − vx−1 is on the boundary of C (see Fig. D.2a). Specifically, recall that the support function of C is given by fC(s) = max_{x∈[k]} s · vx for all s ∈ ℝ². Therefore, the left-hand side of (D.160) can be expressed as
\[
\sum_{x\in[n]}\max_{z\in[k]}\{\mathbf r_x\cdot\mathbf v_z\} = \sum_{x\in[n]} f_{\mathfrak C}(\mathbf r_x). \tag{D.164}
\]
The key idea of the proof is to use the property of support functions under addition of sets (see Theorem A.7.1). For this purpose, it would have been useful if it were possible to write C as a sum of convex sets with each set in the sum being the convex hull of only two vectors (so that (D.163) can be applied). While it is not possible to decompose C in this way, we now define a set with the same support function as C, but for which such a decomposition is possible.

We define the desired set in two steps. First, we define the set
\[
\mathfrak C' := \mathfrak C-\mathbb R^2_+ := \bigl\{\mathbf v-\mathbf p \,:\, \mathbf v\in\mathfrak C,\ \mathbf p\in\mathbb R^2_+\bigr\}. \tag{D.165}
\]
That is, the set C′ is an unbounded polyhedron consisting of all the vectors r ∈ ℝ² for which there exists v ∈ C with the property that r ⩽ v. By definition, the support function of C′ equals that of C (see Exercise D.9.3). We will see now that the set C′ is a bit simpler to work with, as it contains only a few of the vertices of C (the ones that will be relevant for the computation of the support function of C).

In Fig. D.2b we depict the set C′ and its relation to C. Observe that the set C′ is bounded by (1) the vertical line that passes through the vertex with the highest x-coordinate, (2) the horizontal line that passes through the vertex with the highest y-coordinate, and (3) the portion of the boundary of C that connects the vertex with the highest y-coordinate with the one with the highest x-coordinate.

Figure D.2: (a) The polytope C. The red arrow represents the vector vx − vx−1. (b) The polyhedron C′ := C − ℝ²₊ is depicted by the blue area.

In particular, observe that the set of all vertices of C′ is a subset of {v1, . . . , vk}. For simplicity of notation, we take {v1, . . . , vk′}, with k′ ⩽ k, to be the set of vertices of C′. Moreover, observe that we can always arrange the set {v1, . . . , vk′} such that for each x ∈ {2, 3, . . . , k′ − 1} the vector vx is a "neighbour" of vx−1 and vx+1. That is, each vector vx − vx−1 is on the boundary of C′ and its angle with the x-axis is in the interval [−π/2, 0] (see Fig. D.2b). Further, the angle of vx − vx−1 with the x-axis is non-increasing in x ∈ {2, . . . , k′} (i.e., the angle becomes closer to −π/2 as x increases).
With the above ordering of the vertices {v1 , . . . , vk′ } we are now ready to construct the
second convex set. Denote by v0 = 0 the zero vector in R2 and for each y ∈ [k ′ ] let Ky be
the convex hull of the zero vector 0 ∈ R2 and the vector vy − vy−1 . That is, for each y ∈ [k ′ ]
we can express Ky = {t(vy − vy−1 ) : t ∈ [0, 1]}. We then define
\[
\mathfrak K := \mathfrak K_1+\cdots+\mathfrak K_{k'} = \Bigl\{\sum_{x\in[k']} t_x(\mathbf v_x-\mathbf v_{x-1}) \,:\, \mathbf t=(t_1,\ldots,t_{k'})^T\in[0,1]^{k'}\Bigr\}. \tag{D.166}
\]
Clearly, the set K is convex and its support function is given for any s ∈ ℝ² by
\[
f_{\mathfrak K}(\mathbf s) = \max_{\mathbf t\in[0,1]^{k'}}\sum_{x\in[k']} t_x(\mathbf v_x-\mathbf v_{x-1})\cdot\mathbf s. \tag{D.167}
\]
Now, recall that we are only interested in s ⩾ 0, so that the angle of s with the x-axis is
in the interval [0, π/2]. Therefore, the angle between s and the vector vx − vx−1 is non-
decreasing in x, and this angle is in the interval [0, π]. Therefore, there exists ℓ ∈ [k ′ ] such
that the dot product (vx − vx−1 ) · s is non-negative for all x ∈ [ℓ] and it is negative for all
x ∈ {ℓ + 1, . . . , k ′ }. Therefore, the maximum in (D.167) is obtained by taking tx = 1 for
x ∈ [ℓ] and tx = 0 for x ∉ [ℓ]. That is, we get that for s ⩾ 0
\[
f_{\mathfrak K}(\mathbf s) = \sum_{x\in[\ell]}(\mathbf v_x-\mathbf v_{x-1})\cdot\mathbf s = \mathbf v_\ell\cdot\mathbf s. \tag{D.168}
\]
On the other hand, observe that for all ℓ′ ∈ [k′], by taking tx = 1 for x ∈ [ℓ′] and tx = 0 for x ∉ [ℓ′], Eq. (D.167) gives
\[
f_{\mathfrak K}(\mathbf s) \geqslant \sum_{x\in[\ell']}(\mathbf v_x-\mathbf v_{x-1})\cdot\mathbf s = \mathbf v_{\ell'}\cdot\mathbf s. \tag{D.169}
\]
From the two equations above we therefore conclude that for all s ⩾ 0 we have
We therefore found a convex set K that has the same support function as C on vectors in ℝ²₊, and that can be expressed as K = K1 + · · · + Kk′. From the property of support functions under addition of sets (see Theorem A.7.1), we therefore get that for all s ∈ ℝ²₊
\[
f_{\mathfrak C}(\mathbf s) = f_{\mathfrak K}(\mathbf s) = \sum_{y\in[k']} f_{\mathfrak K_y}(\mathbf s). \tag{D.171}
\]
since we assume that (D.160) holds for k = 2 (and for each fixed y we can write the term max{0, rx · (vy − vy−1)} in the form max_{j∈[2]} rx · uj, where u1 := 0 and u2 := vy − vy−1). Combining this with the previous equation we conclude that
\begin{align*}
\sum_{x\in[n]}\max_{z\in[k]}\{\mathbf r_x\cdot\mathbf v_z\} &\geqslant \sum_{x\in[n]}\sum_{y\in[k']} f_{\mathfrak K_y}(\mathbf r'_x) = \sum_{x\in[n]} f_{\mathfrak K}(\mathbf r'_x)\\
&= \sum_{x\in[n]} f_{\mathfrak C}(\mathbf r'_x) = \sum_{x\in[n]}\max_{z\in[k]}\{\mathbf r'_x\cdot\mathbf v_z\}. \tag{D.174}
\end{align*}
Hence, (D.160) holds for all k ∈ N, so that from Lemma D.9.2 we get that (p, q) ≻ (p′ , q′ ).
To prove the equivalence of the second and third statements of the lemma, recall that due to (D.162) the condition (D.163) is equivalent to
\[
f_{\mathfrak M(\mathbf p,\mathbf q,2)}(S) \geqslant f_{\mathfrak M(\mathbf p',\mathbf q',2)}(S) \qquad\forall\ S\in\mathbb R^{2\times 2}_{+}. \tag{D.175}
\]
As argued below (D.162), the condition (D.175) holds if and only if it holds for all S in ℝ^{2×2} (not necessarily ℝ^{2×2}₊). We can therefore conclude from Theorem A.7.1 that the condition above is equivalent to M(p, q, 2) ⊇ M(p′, q′, 2), so that the second and third statements of the lemma are equivalent. This completes the proof.
Exercise D.9.3. Show that the support function of C equals the support function of C′ .
In the following exercise you simplify even further the expression given in (D.163).
Exercise D.9.4. Using the same notations as in Lemma D.9.2, show that
\[
(\mathbf p,\mathbf q)\succ(\mathbf p',\mathbf q') \iff \sum_{x\in[n]}\max\{0,\mathbf r_x\cdot\mathbf v\} \geqslant \sum_{y\in[m]}\max\{0,\mathbf r'_y\cdot\mathbf v\} \qquad\forall\ \mathbf v\in\mathbb R^2. \tag{D.176}
\]
Hint: Use the formula $\max\{a,b\}=\frac{a+b}{2}+\frac{|a-b|}{2}$ in (D.163), and recall that $\sum_x\mathbf r_x=(1,1)^T$.
Alternative Proof of Theorem 4.3.4. To see that 1 and 2 are equivalent, observe that
\[
\mathfrak M(\mathbf p,\mathbf q,2) := \bigl\{[E\mathbf p\ \ E\mathbf q] : E\in\mathrm{STOCH}(2,n)\bigr\} = \left\{\begin{bmatrix}\mathbf t\cdot\mathbf p & \mathbf t\cdot\mathbf q\\ 1-\mathbf t\cdot\mathbf p & 1-\mathbf t\cdot\mathbf q\end{bmatrix} : \mathbf t\in[0,1]^n\right\}, \tag{D.177}
\]
where t ∈ [0, 1]^n is the first row of E, and 1n − t is its second row (recall that E is a 2 × n column stochastic matrix). Note that (t · p, t · q) are precisely the elements of T(p, q). Therefore, the inclusion T(p, q) ⊇ T(p′, q′) is equivalent to the inclusion M(p, q, 2) ⊇ M(p′, q′, 2). We already saw in Lemma D.9.3 that this latter inclusion is equivalent to (p, q) ≻ (p′, q′). Hence, we proved the equivalence of the first and second statements of the theorem.
Finally, it is left to show the equivalence between the first and third statements of the theorem. From Exercise D.9.4 it follows that (p, q) ≻ (p′, q′) is equivalent to the condition that for any a, b ∈ ℝ we have
\[
\sum_{x\in[n]}\max\{0, ap_x+bq_x\} \geqslant \sum_{y\in[m]}\max\{0, ap'_y+bq'_y\}, \tag{D.178}
\]
where we took v = (a, b)^T in (D.176). Using on both sides of the equation above the fact that for any r ∈ ℝ, max{0, r} = ½(r + |r|), we conclude that the equation above is equivalent to the statement that
\[
\sum_{x\in[n]}\bigl|ap_x+bq_x\bigr| \geqslant \sum_{y\in[m]}\bigl|ap'_y+bq'_y\bigr|, \tag{D.179}
\]
Finally, dividing both sides of (D.179) by |a| (we can assume without loss of generality that a ≠ 0), and denoting t := −b/a, gives
\[
\sum_{x\in[n]}\bigl|p_x-tq_x\bigr| \geqslant \sum_{y\in[m]}\bigl|p'_y-tq'_y\bigr|. \tag{D.181}
\]
This completes the proof of the equivalence between the first and third statements of the theorem.
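A quick numerical illustration (ours) of the criterion (D.179): whenever (p′, q′) = (Ep, Eq) for a column-stochastic E, the inequality holds for every direction (a, b); below we test it on randomly sampled directions.

```python
# Numerical sketch (ours) of (D.179): if (p', q') = (E p, E q) with E column stochastic,
# then sum_x |a p_x + b q_x| >= sum_y |a p'_y + b q'_y| for all (a, b).
import numpy as np

rng = np.random.default_rng(7)
n, m_out = 5, 3
p = rng.random(n); p /= p.sum()
q = rng.random(n); q /= q.sum()

E = rng.random((m_out, n)); E /= E.sum(axis=0)   # column stochastic
p2, q2 = E @ p, E @ q                            # (p', q') obtained from (p, q)

for _ in range(1000):
    a, b = rng.normal(size=2)
    lhs = np.abs(a * p + b * q).sum()
    rhs = np.abs(a * p2 + b * q2).sum()
    assert lhs >= rhs - 1e-12
print("(D.179) holds on all sampled directions")
```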
\[
\mathcal G_n\bigl(\omega^{A^n}\bigr) := \frac{1}{n!}\sum_{\pi\in S_n}\bigl(P_\pi^{A^n}\bigr)^{-1}\,\omega^{A^n}\,P_\pi^{A^n} \qquad\forall\ \omega^{A^n}\in\mathfrak L(A^n). \tag{D.182}
\]
\[
F\bigl(\rho^{\otimes n},\sigma_n\bigr) = \bigl|\langle\psi^{\otimes n}|\phi_n\rangle\bigr|, \tag{D.183}
\]
We then define
\[
|\phi_n\rangle := \bigl(I^{A^n}\otimes\sqrt{\sigma_n}\,U\bigr)|\Omega^{A^n\tilde A^n}\rangle. \tag{D.186}
\]
It is straightforward to check that this ϕn satisfies (D.183) (see Exercise D.10.1). Moreover, note that ϕn is indeed a purification of σn. It is therefore left to show that ϕn is symmetric. For this purpose, we first show that U can be taken to be symmetric. Since both matrices $\sqrt{\sigma_n}\sqrt{\rho^{\otimes n}}$ and $\bigl|\sqrt{\sigma_n}\sqrt{\rho^{\otimes n}}\bigr|$ are symmetric, from Theorem C.3.3 it follows that they can be expressed as
\[
\sqrt{\sigma_n}\sqrt{\rho^{\otimes n}} = \bigoplus_\lambda I^{B_\lambda}\otimes\eta_\lambda^{C_\lambda} \qquad\text{and}\qquad \Bigl|\sqrt{\sigma_n}\sqrt{\rho^{\otimes n}}\Bigr| = \bigoplus_\lambda I^{B_\lambda}\otimes\zeta_\lambda^{C_\lambda}, \tag{D.187}
\]
where η_λ^{C_λ} and ζ_λ^{C_λ} are operators on the multiplicity space of the irrep λ. Define
\[
V := \sqrt{\sigma_n}\sqrt{\rho^{\otimes n}}\,\Bigl|\sqrt{\sigma_n}\sqrt{\rho^{\otimes n}}\Bigr|^{-1} = \bigoplus_\lambda I^{B_\lambda}\otimes\eta_\lambda^{C_\lambda}\bigl(\zeta_\lambda^{C_\lambda}\bigr)^{-1}, \tag{D.188}
\]
where all inverses are generalized inverses. Since V is a partial isometry (see Exercise D.10.2), it follows that each $\eta_\lambda^{C_\lambda}\bigl(\zeta_\lambda^{C_\lambda}\bigr)^{-1}$ is a partial isometry. Therefore, we can complete each $\eta_\lambda^{C_\lambda}\bigl(\zeta_\lambda^{C_\lambda}\bigr)^{-1}$ to a unitary matrix U_λ ∈ L(C_λ). Defining
\[
U := \bigoplus_\lambda I^{B_\lambda}\otimes U_\lambda^{C_\lambda}, \tag{D.189}
\]
we get that U is symmetric and satisfies (D.185). Finally, since U is symmetric we get that for all π ∈ Sn
\begin{align*}
P_\pi^{A^n}\otimes P_\pi^{\tilde A^n}\,|\phi^{A^n\tilde A^n}\rangle &= \bigl(P_\pi^{A^n}\otimes P_\pi^{\tilde A^n}\sqrt{\sigma_n}\,U\bigr)|\Omega^{A^n\tilde A^n}\rangle\\
\sqrt{\sigma_n}\text{ is symmetric}\rightarrow\quad &= \bigl(P_\pi^{A^n}\otimes\sqrt{\sigma_n}\,P_\pi^{\tilde A^n}U\bigr)|\Omega^{A^n\tilde A^n}\rangle\\
U\text{ is symmetric}\rightarrow\quad &= \bigl(P_\pi^{A^n}\otimes\sqrt{\sigma_n}\,U P_\pi^{\tilde A^n}\bigr)|\Omega^{A^n\tilde A^n}\rangle \tag{D.190}\\
|\Omega^{A^n\tilde A^n}\rangle\text{ is symmetric}\rightarrow\quad &= \bigl(I^{A^n}\otimes\sqrt{\sigma_n}\,U\bigr)|\Omega^{A^n\tilde A^n}\rangle\\
&= |\phi_n^{A^n\tilde A^n}\rangle.
\end{align*}
Hence, ϕn is symmetric.
Exercise D.10.1. Show explicitly that ϕn, as defined in (D.186), satisfies (D.183).
Exercise D.10.2. Let Λ ∈ L(A) and define V = Λ|Λ|−1 where the inverse is a generalized
inverse. Show that V is a partial isometry.
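The following sketch (ours) illustrates the construction numerically for n = 1 and without the symmetry considerations: with U the unitary factor of the polar decomposition of √σ√ρ, the vector (I ⊗ √σ U)|Ω⟩ purifies σ and its overlap with (I ⊗ √ρ)|Ω⟩ equals the root fidelity Tr|√σ√ρ|, as in (D.183). All names and random states below are our own choices.

```python
# Numerical sketch (ours): purification and fidelity for n = 1, ignoring the symmetry issue.
import numpy as np

rng = np.random.default_rng(11)
d = 3

def rand_state(d):
    X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    M = X @ X.conj().T
    return M / np.trace(M)

def sqrtm_psd(M):
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

rho, sigma = rand_state(d), rand_state(d)
Omega = np.eye(d).reshape(d * d)                 # unnormalized |Omega> = sum_i |ii>

Lam = sqrtm_psd(sigma) @ sqrtm_psd(rho)
Usvd, S, Vh = np.linalg.svd(Lam)
W = Usvd @ Vh                                    # unitary part of the polar decomposition Lam = W |Lam|

psi = np.kron(np.eye(d), sqrtm_psd(rho)) @ Omega
phi = np.kron(np.eye(d), sqrtm_psd(sigma) @ W) @ Omega

phi_mat = phi.reshape(d, d)                      # phi = sum_{ij} phi_mat[i,j] |i>|j>
reduced = phi_mat.T @ phi_mat.conj()             # marginal of |phi><phi| on the second system

assert np.allclose(reduced, sigma)               # phi purifies sigma
assert np.isclose(np.abs(np.vdot(psi, phi)), S.sum())   # overlap = Tr|sqrt(sigma) sqrt(rho)|
print("purification and fidelity check passed:", S.sum())
```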
937
938
Bibliography
[3] J. Aczél, B. Forte, and C. T. Ng. Why the shannon and hartley entropies are ‘natural’.
Advances in Applied Probability, 6(1):131–146, 1974.
[4] P.M. Alberti and A. Uhlmann. A problem relating to positive linear maps on matrix
algebras. Reports on Mathematical Physics, 18(2):163 – 176, 1980.
[5] S. M. Ali and S. D. Silvey. A general class of coefficients of divergence of one distribution
from another. Royal Statistical Society, Wiley, 28, 1966.
[6] Jr. Arthur F. Veinott. Least d-majorized network flows with inventory and statistical
applications. Management Science, 17(9):547–567, 1971.
[11] David Avis, Hiroshi Imai, Tsuyoshi Ito, and Yuuya Sasaki. Deriving tight bell in-
equalities for 2 parties with many 2-valued observables from facets of cut polytopes.
arXiv:0404014v3, 2004.
939
[12] Stephen D. Bartlett, Terry Rudolph, and Robert W. Spekkens. Reference frames,
superselection rules, and quantum information. Rev. Mod. Phys., 79:555–609, Apr
2007.
[14] David Beckman, Daniel Gottesman, M. A. Nielsen, and John Preskill. Causal and
localizable quantum operations. Phys. Rev. A, 64:052309, Oct 2001.
[15] Salman Beigi. Sandwiched rényi divergence satisfies data processing inequality. Journal
of Mathematical Physics, 54(12):122202, 2013.
[16] Ingemar Bengtsson and Karol Zyczkowski. Geometry of Quantum States: An Intro-
duction to Quantum Entanglement. Cambridge University Press, 2006.
[20] Charles H. Bennett, Gilles Brassard, Claude Crépeau, Richard Jozsa, Asher Peres, and
William K. Wootters. Teleporting an unknown quantum state via dual classical and
Einstein-Podolsky-Rosen channels. Phys. Rev. Lett., 70(13):1895–1899, Mar 1993.
[21] Charles H. Bennett, Gilles Brassard, Sandu Popescu, Benjamin Schumacher, John A.
Smolin, and William K. Wootters. Purification of noisy entanglement and faithful
teleportation via noisy channels. Phys. Rev. Lett., 76:722–725, Jan 1996.
[22] Charles H. Bennett, David P. DiVincenzo, Tal Mor, Peter W. Shor, John A. Smolin,
and Barbara M. Terhal. Unextendible product bases and bound entanglement. Phys.
Rev. Lett., 82:5385–5388, Jun 1999.
[23] Charles H. Bennett and Stephen J. Wiesner. Communication via one- and two-particle
operators on einstein-podolsky-rosen states. Phys. Rev. Lett., 69:2881–2884, Nov 1992.
[24] Mario Berta, Fernando G. S. L. Brandão, Gilad Gour, Ludovico Lami, Martin B. Ple-
nio, Bartosz Regula, and Marco Tomamichel. On a gap in the proof of the generalised
quantum stein’s lemma and its consequences for the reversibility of quantum resources.
2022.
940
[26] Felix Binder, Luis A. Correa, Christian Gogolin, Janet Anders, and Gerardo Adesso.
Thermodynamics in the Quantum Regime. 0168-1222. Springer Nature Switzerland
AG 2018, 2018.
[29] Fernando Brandão, Michal Horodecki, Nelly Ng, Jonathan Oppenheim, and Stephanie
Wehner. The second laws of quantum thermodynamics. Proceedings of the National
Academy of Sciences, 112(11):3275–3279, 2015.
[30] Fernando G. S. L. Brandão and Gilad Gour. Reversible framework for quantum re-
source theories. Phys. Rev. Lett., 115:070503, Aug 2015.
[33] Fernando G.S.L. Brandão, Matthias Christandl, and Jon Yard. A quasipolynomial-
time algorithm for the quantum separability problem. In Proceedings of the Forty-third
Annual ACM Symposium on Theory of Computing, STOC ’11, pages 343–352, New
York, NY, USA, 2011. ACM.
[34] Fernando G. S. L. Brandão and Nilanjana Datta. One-shot rates for entanglement
manipulation under non-entangling maps. IEEE Transactions on Information Theory,
57(3):1754–1760, March 2011.
[35] Fernando G. S. L. Brandão and Martin B. Plenio. Entanglement theory and the second
law of thermodynamics. Nature Physics, 4:873, 2008.
[36] Sarah Brandsen, Isabelle Jianing Geng, and Gilad Gour. What is entropy? a new
perspective from games of chance. 2021.
[37] Thomas R Bromley, Marco Cianciaruso, Sofoklis Vourekas, Bartosz Regula, and Ger-
ardo Adesso. Accessible bounds for general quantum resources. Journal of Physics A:
Mathematical and Theoretical, 51(32):325303, 2018.
[38] Nicolas Brunner, Daniel Cavalcanti, Stefano Pironio, Valerio Scarani, and Stephanie
Wehner. Bell nonlocality. Rev. Mod. Phys., 86:419–478, Apr 2014.
941
[39] Francesco Buscemi and Nilanjana Datta. Distilling entanglement from arbitrary re-
sources. Journal of Mathematical Physics, 51(10):102201, 2010.
[40] Francesco Buscemi and Gilad Gour. Quantum relative lorenz curves. Phys. Rev. A,
95:012110, Jan 2017.
[41] P. Busch. Quantum states and generalized observables: A simple proof of gleason’s
theorem. Phys. Rev. Lett., 91:120403, Sep 2003.
[42] Paul Busch. Informationally complete sets of physical quantities. International Journal
of Theoretical Physics, 30:1217–1227, September 1991.
[43] Eric A. Carlen. Trace inequalities and quantum entropy: An introductory course.
Contemporary Mathematics, 529:73–140, 2010.
[44] Kai Chen and Ling-An Wu. A matrix realignment method for recognizing entangle-
ment. Quantum Inf. Comput., 3(3):193–202, 2003.
[45] G. Chiribella, G. M. D’Ariano, and M. F. Sacchi. Optimal estimation of group trans-
formations using entanglement. Phys. Rev. A, 72:042338, Oct 2005.
[46] Giulio Chiribella. Optimal estimation of quantum signals in the presence of symmetry.
PhD thesis, University of Pavia, 2006.
[47] Eric Chitambar, Julio I. de Vicente, Mark W. Girard, and Gilad Gour. Entangle-
ment manipulation beyond local operations and classical communication. Journal of
Mathematical Physics, 61(4):042201, 2020.
[48] Eric Chitambar and Gilad Gour. Quantum resource theories. Rev. Mod. Phys.,
91:025001, Apr 2019.
[49] Eric Chitambar, Debbie Leung, Laura Mancinska, Maris Ozols, and Andreas Winter.
Everything you always wanted to know about locc (but were afraid to ask). Commu-
nications in Mathematical Physics, 328(1):303–326, May 2014.
[50] Matthias Christandl and Andreas Winter. “squashed entanglement”: An additive
entanglement measure. Journal of Mathematical Physics, 45(3):829–840, 2004.
[51] Dariusz Chruściński and Gniewomir Sarbicki. Entanglement witnesses: construc-
tion, analysis and classification. Journal of Physics A: Mathematical and Theoretical,
47(48):483001, 2014.
[52] Bob Coecke, Tobias Fritz, and Robert W. Spekkens. A mathematical theory of re-
sources. Information and Computation, 250:59 – 86, 2016. Quantum Physics and
Logic.
[53] Valerie Coffman, Joydip Kundu, and William K. Wootters. Distributed entanglement.
Phys. Rev. A, 61:052306, Apr 2000.
942
[54] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory (Wiley Series
in Telecommunications and Signal Processing). Wiley-Interscience, 2006.
[55] I. Csiszár. Eine informationstheoretische ungleichung und ihre anwendung auf den
beweis der ergodizitat von markoffschen ketten. Magyar. Tud. Akad. Mat. Kutato Int.
Kozl., 8:85–108, 1963.
[57] Geir Dahl. Matrix majorization. Linear Algebra and its Applications, 288:53 – 73,
1999.
[58] N. Datta. Min- and max-relative entropies and a new entanglement monotone. IEEE
Transactions on Information Theory, 55(6):2816–2826, June 2009.
[61] Sebastian Deffner and Steve Campbell. Quantum Thermodynamics. 2053-2571. Morgan
Claypool Publishers, 2019.
[63] I. Devetak and A. Winter. Distillation of secret key and entanglement from quantum
states. Proc. R. Soc. A., 461:207–235, 2005.
[65] David P. DiVincenzo, Christopher A. Fuchs, Hideo Mabuchi, John A. Smolin, Ashish
Thapliyal, and Armin Uhlmann. ”Entanglement of Assistance” in Quantum Com-
puting and Quantum Communications: First NASA International Conference, QCQC
’98, Palm Springs, California, USA, February 17-20, 1998, Selected Papers. Lecture
Notes in Computer Science. Springer, 1999.
[66] Andrew C. Doherty, Pablo A. Parrilo, and Federico M. Spedalieri. Complete family of
separability criteria. Phys. Rev. A, 69:022308, Feb 2004.
943
[68] Frédéric Dupuis, Mario Berta, Jürg Wullschleger, and Renato Renner. One-shot de-
coupling. Communications in Mathematical Physics, 328:251–284, May 2014.
[69] W. Dür, G. Vidal, and J. I. Cirac. Three qubits can be entangled in two inequivalent
ways. Phys. Rev. A, 62:062314, Nov 2000.
[70] Ali Ebadian, Ismail Nikoufar, and Madjid Eshaghi Gordji. Perspectives of matrix
convex functions. Proceedings of the National Academy of Sciences, 108(18):7313–
7314, 2011.
[71] Bruce Ebanks, Prasanna Sahoo, and Wolfgang Sander. Characterizations of Informa-
tion Measures. World Scientific, apr 1998.
[73] D.K. Faddeev. On the concept of entropy of a finite probability scheme (in Russian).
Uspekhi Matematicheskikh Nauk, 11:227–231, 1956.
[74] Philippe Faist, Jonathan Oppenheim, and Renato Renner. Gibbs-preserving maps
outperform thermal operations in the quantum regime. New Journal of Physics,
17(4):043003, 2015.
[75] Kun Fang, Gilad Gour, and Xin Wang. Towards the ultimate limits of quantum channel
discrimination. 2022.
[76] Kun Fang, Xin Wang, Marco Tomamichel, and Runyao Duan. Non-asymptotic en-
tanglement distillation. IEEE Transactions on Information Theory, 65(10):6454–6465,
2019.
[77] Hamza Fawzi and Omar Fawzi. Defining quantum divergences via convex optimization.
Quantum, 5:387, January 2021.
[78] Arthur Fine. Hidden variables, joint probability, and the bell inequalities. Phys. Rev.
Lett., 48:291–295, Feb 1982.
[79] Shmuel Friedland and Gilad Gour. An explicit expression for the relative entropy of
entanglement in all dimensions. Journal of Mathematical Physics, 52(5):052201, 2011.
[80] Tobias Fritz. Resource convertibility and ordered commutative monoids. Mathematical
Structures in Computer Science, 27:850–938, 2017.
[82] W. Fulton and J. Harris. Representation Theory: A First Course. Graduate Texts in
Mathematics. Springer New York, 1991.
944
[83] Jochen Gemmer, Mathias Michel, and Gunter Mahler. Quantum Thermodynamics.
1616-6361. Springer, Berlin, Heidelberg, 2004.
[84] Mark W Girard, Gilad Gour, and Shmuel Friedland. On convex optimization problems
in quantum information theory. Journal of Physics A: Mathematical and Theoretical,
47(50):505302, 2014.
[85] Andrew Gleason. Measures on the closed subspaces of a hilbert space. Indiana Univ.
Math. J., 6:885–893, 1957.
[86] Gilad Gour. Family of concurrence monotones and its applications. Phys. Rev. A,
71:012318, Jan 2005.
[87] Gilad Gour. Entanglement of collaboration. Phys. Rev. A, 74:052307, Nov 2006.
[88] Gilad Gour. Quantum resource theories in the single-shot regime. Phys. Rev. A,
95:062314, Jun 2017.
[89] Gilad Gour. Role of quantum coherence in thermodynamics. PRX Quantum, 3:040323,
Nov 2022.
[90] Gilad Gour, Andrzej Grudka, Michal Horodecki, Waldemar Klobus, Justyna Lodyga,
and Varun Narasimhachar. Conditional uncertainty principle. Phys. Rev. A, 97:042130,
Apr 2018.
[91] Gilad Gour, David Jennings, Francesco Buscemi, Runyao Duan, and Iman Marvian.
Quantum majorization and a complete set of entropic conditions for quantum thermo-
dynamics. Nature Communications, 9(5352), 2018.
[92] Gilad Gour, Barbara Kraus, and Nolan R. Wallach. Almost all multipartite qubit
quantum states have trivial stabilizer. Journal of Mathematical Physics, 58(9), 09
2017. 092204.
[93] Gilad Gour, Iman Marvian, and Robert W. Spekkens. Measuring the quality of a
quantum reference frame: The relative entropy of frameness. Phys. Rev. A, 80:012307,
Jul 2009.
[94] Gilad Gour, Markus P. Muller, Varun Narasimhachar, Robert W. Spekkens, and
Nicole Yunger Halpern. The resource theory of informational nonequilibrium in ther-
modynamics. Physics Reports, 583:1 – 58, 2015.
[95] Gilad Gour and Carlo Maria Scandolo. Entanglement of a bipartite channel. Phys.
Rev. A, 103:062422, Jun 2021.
[96] Gilad Gour and Robert W. Spekkens. Entanglement of assistance is not a bipartite
measure nor a tripartite monotone. Phys. Rev. A, 73:062331, Jun 2006.
945
[97] Gilad Gour and Robert W Spekkens. The resource theory of quantum reference frames:
manipulations and monotones. New Journal of Physics, 10(3):033023, 2008.
[98] Gilad Gour and Marco Tomamichel. Optimal extensions of resource measures and
their applications. Phys. Rev. A, 102:062401, Dec 2020.
[99] Gilad Gour and Marco Tomamichel. Entropy and relative entropy from information-
theoretic principles. IEEE Transactions on Information Theory, 2021.
[100] Gilad Gour and Nolan R. Wallach. All maximally entangled four-qubit states. Journal
of Mathematical Physics, 51(11), 11 2010. 112201.
[101] Gilad Gour and Nolan R Wallach. Necessary and sufficient conditions for local manip-
ulation of multipartite pure quantum states. New Journal of Physics, 13(7):073013,
jul 2011.
[102] Gilad Gour and Nolan R. Wallach. Classification of multipartite entanglement of all
finite dimensionality. Phys. Rev. Lett., 111:060502, Aug 2013.
[103] Gilad Gour, Mark M. Wilde, S. Brandsen, and Isabelle J. Geng. Inevitability of
knowing less than nothing. 2022.
[104] Gilad Gour and Guo Yu. Monogamy of entanglement without inequalities. Quantum,
2:81, August 2018.
[105] Otfried Gühne and Géza Tóth. Entanglement detection. Physics Reports, 474(1):1 –
75, 2009.
[106] Yu Guo and Gilad Gour. Monogamy of the entanglement of formation. Phys. Rev. A,
99:042305, Apr 2019.
[107] Yelena Guryanova, Sandu Popescu, Anthony J. Short, Ralph Silva1, and Paul
Skrzypczyk. Thermodynamics of quantum systems with multiple conserved quanti-
ties. Nature Communications, 7:12049, 2016.
[108] Uffe Haagerup and Magdalena Musat. Factorization and dilation problems for com-
pletely positive maps on von neumann algebras. Communications in Mathematical
Physics, 303(2):555–594, 2011.
[109] Nicole Yunger Halpern, Philippe Faist, Jonathan Oppenheim, and Andreas Winter.
Microcanonical and resource-theoretic derivations of the thermal state of a quantum
system with noncommuting charges. Nature Communications, 7:12051, 2016.
[110] G.H. Hardy, J.E. Littlewood, and G Pólya. Some simple inequalities satisfied by convex
functions. Messenger of Mathematics, 58:145–152, 1929.
[111] Lucien Hardy. Quantum mechanics, local realistic theories, and lorentz-invariant real-
istic theories. Phys. Rev. Lett., 68:2981–2984, May 1992.
946
[112] Aram Harrow. Coherent communication of classical messages. Phys. Rev. Lett.,
92:097902, Mar 2004.
[113] Aram W. Harrow. The Church of the Symmetric Subspace. arXiv e-prints, page
arXiv:1308.6595, August 2013.
[114] Aram W. Harrow and Michael A. Nielsen. Robustness of quantum gates in the presence
of noise. Phys. Rev. A, 68:012308, Jul 2003.
[116] Patrick M. Hayden, Michal Horodecki, and Barbara M. Terhal. The asymptotic en-
tanglement cost of preparing a quantum state. J. Phys. A: Math. Gen., 34(35):6891,
2001.
[117] Martin Hebenstreit, Matthias Englbrecht, Cornelia Spee, Julio I. de Vicente, and Bar-
bara Kraus. Measurement outcomes that do not occur and their role in entanglement
transformations. New Journal of Physics, 23(3):033046, mar 2021.
[118] Teiko Heinosaari, Maria A. Jivulescu, David Reeb, and Michael M. Wolf. Extending
quantum operations. Journal of Mathematical Physics, 53(10):102208, 10 2012.
[119] Fumio Hiai and Milán Mosonyi. Different quantum f-divergences and the reversibility
of quantum operations. Reviews in Mathematical Physics, 29(07):1750023, 2017.
[120] FUMIO HIAI, MILAN MOSONYI, DÉNES PETZ, and CÉDRIC BÉNY. Quantum
f-divergences and error correction. Reviews in Mathematical Physics, 23(07):691–747,
2011.
[121] Fumio Hiai and Dénes Petz. The proper formula for relative entropy and its asymptotics
in quantum probability. Communications in Mathematical Physics, 143(1):99–114, Dec
1991.
[122] R.A. Horn and C.R. Johnson. Topics in Matrix Analysis. Cambridge University Press,
1999.
[123] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press,
USA, 2nd edition, 2012.
[124] Karol Horodecki, Michal Horodecki, Pawel Horodecki, and Jonathan Oppenheim. Se-
cure key from bound entanglement. Phys. Rev. Lett., 94:160502, Apr 2005.
[125] Michal Horodecki, Karol Horodecki, Pawel Horodecki, Ryszard Horodecki, Jonathan
Oppenheim, Aditi Sen(De), and Ujjwal Sen. Local information as a resource in dis-
tributed quantum systems. Phys. Rev. Lett., 90:100402, Mar 2003.
947
[126] Michal Horodecki and Pawel Horodecki. Reduction criterion of separability and limits
for a class of distillation protocols. Phys. Rev. A, 59:4206–4216, Jun 1999.
[127] Michal Horodecki, Pawel Horodecki, and Ryszard Horodecki. Separability of mixed
states: necessary and sufficient conditions. Phys. Lett. A, 223(1–2):1–8, 1996.
[128] Michal Horodecki, Pawel Horodecki, and Ryszard Horodecki. Mixed-state entangle-
ment and distillation: is there a “bound” entanglement in nature? Phys. Rev. Lett.,
80(24):5239–5242, Jun 1998.
[129] Michal Horodecki, Pawel Horodecki, Ryszard Horodecki, Jonathan Oppenheim, Aditi
Sen(De), Ujjwal Sen, and Barbara Synak-Radtke. Local versus nonlocal information
in quantum-information theory: Formalism and phenomena. Phys. Rev. A, 71:062307,
Jun 2005.
[130] Michal Horodecki, Pawel Horodecki, and Jonathan Oppenheim. Reversible transfor-
mations from pure to mixed states and the unique measure of information. Phys. Rev.
A, 67:062104, Jun 2003.
[131] Michal Horodecki and Jonathan Oppenheim. Fundamental limitations for quantum
and nanoscale thermodynamics. Nature Communications, 4:2059, 2013.
[132] Michal Horodecki, Jonathan Oppenheim, and Carlo Sparaciari. Extremal distributions
under approximate majorization. Journal of Physics A: Mathematical and Theoretical,
51(30):305301, Jun 2018.
[134] Everett Howe. A New Proof of Erdős's Theorem on Monotone Multiplicative Functions. The American Mathematical Monthly, 93(8):593–595, Oct 1986.
[135] Piotr Ćwikliński, Michal Studziński, Michal Horodecki, and Jonathan Oppenheim.
Limitations on the evolution of quantum coherences: Towards fully quantum second
laws of thermodynamics. Phys. Rev. Lett., 115:210403, Nov 2015.
[137] D. Janzing, P. Wocjan, R. Zeier, R. Geiss, and Th. Beth. Thermodynamic cost of
reliability and low temperatures: Tightening Landauer's principle and the second law.
International Journal of Theoretical Physics, 39(12):2717–2753, Dec 2000.
[138] E. T. Jaynes. Information theory and statistical mechanics. Phys. Rev., 106:620–630,
May 1957.
[139] E. T. Jaynes. Information theory and statistical mechanics. II. Phys. Rev., 108:171–190,
Oct 1957.
[140] Harry Joe. Majorization and divergence. Journal of Mathematical Analysis and Ap-
plications, 148(2):287–305, 1990.
[142] Daniel Jonathan and Martin B. Plenio. Minimal conditions for local pure-state entan-
glement manipulation. Phys. Rev. Lett., 83:1455–1458, Aug 1999.
[143] Matthew Klimesh. Inequalities that collectively completely characterize the catalytic
majorization relation. 2007.
[144] Ludovico Lami and Bartosz Regula. No second law of entanglement manipulation after
all. Nature Physics, 19(2):184–189, 2023.
[145] R. Landauer. Irreversibility and heat generation in the computing process. IBM
Journal of Research and Development, 5(3):183–191, July 1961.
[146] A. Lenard. Thermodynamical proof of the Gibbs formula for elementary quantum systems. Journal of Statistical Physics, 19(6):575–586, Dec 1978.
[147] Nicky Kai Hong Li, Cornelia Spee, Martin Hebenstreit, Julio I. de Vicente, and Barbara
Kraus. Identifying families of multipartite states with non-trivial local entanglement
transformations, 2023.
[148] Zi-Wen Liu, Xueyuan Hu, and Seth Lloyd. Resource destroying maps. Phys. Rev.
Lett., 118:060502, Feb 2017.
[150] Matteo Lostaglio, David Jennings, and Terry Rudolph. Thermodynamic resource the-
ories, non-commutativity and maximum entropy principles. New Journal of Physics,
19(4):043008, 2017.
[151] Matteo Lostaglio, Kamil Korzekwa, David Jennings, and Terry Rudolph. Quantum
coherence, time-translation symmetry, and thermodynamics. Phys. Rev. X, 5:021001,
Apr 2015.
[152] Albert W. Marshall, Ingram Olkin, and Barry Arnold. Inequalities: Theory of Ma-
jorization and Its Applications. Springer, 2011.
[153] Koji Maruyama, Franco Nori, and Vlatko Vedral. Colloquium: The physics of Maxwell's demon and information. Rev. Mod. Phys., 81:1–23, Jan 2009.
[154] Iman Marvian. Symmetry, Asymmetry and Quantum Information. PhD thesis, Uni-
versity of Waterloo, 2012.
[156] Iman Marvian and Robert W. Spekkens. The theory of manipulations of pure state asymmetry: I. Basic tools, equivalence classes and single copy transformations. New
Journal of Physics, 15(3):033001, 2013.
[157] Iman Marvian and Robert W. Spekkens. Extending Noether's theorem by quantifying the asymmetry of quantum states. Nature Communications, 5:3821, May 2014.
[158] Iman Marvian and Robert W. Spekkens. Modes of asymmetry: The application of
harmonic analysis to symmetric quantum dynamics and quantum reference frames.
Phys. Rev. A, 90:062110, Dec 2014.
[159] Keiji Matsumoto. Reverse test and characterization of quantum relative entropy. 2010.
[162] Adam Miranowicz and Satoshi Ishizaka. Closed formula for the relative entropy of
entanglement. Phys. Rev. A, 78:032310, Sep 2008.
[164] Tetsuzo Morimoto. Markov processes and the H-theorem. Journal of the Physical Society of Japan, 18(3):328–331, 1963.
[165] Xiaosheng Mu, Luciano Pomatto, Philipp Strack, and Omer Tamuz. From Blackwell dominance in large samples to Rényi divergences and back again. Econometrica, 89(1):475–506, Jan 2021.
[167] Martin Müller-Lennert, Frédéric Dupuis, Oleg Szehr, Serge Fehr, and Marco
Tomamichel. On quantum Rényi entropies: A new generalization and some proper-
ties. Journal of Mathematical Physics, 54(12):122203, 2013.
[168] Varun Narasimhachar and Gilad Gour. Low-temperature thermodynamics with quan-
tum coherence. Nature Communications, 6(1):7689, 2015.
[169] M. A. Nielsen. Conditions for a class of entanglement transformations. Phys. Rev.
Lett., 83:436–439, Jul 1999.
[170] Michael A. Nielsen and Isaac L. Chuang. Quantum Computation and Quantum Infor-
mation. Cambridge University Press, 2000.
[171] Ismail Nikoufar, Ali Ebadian, and Madjid Eshaghi Gordji. The simplest proof of Lieb concavity theorem. Advances in Mathematics, 248:531–533, 2013.
[172] Michael Nussbaum and Arleta Szkoła. The Chernoff lower bound for symmetric quantum hypothesis testing. The Annals of Statistics, 37(2):1040–1057, 2009.
[173] T. Ogawa and H. Nagaoka. Strong converse and Stein's lemma in quantum hypothesis testing. IEEE Transactions on Information Theory, 46(7):2428–2433, 2000.
[174] Jonathan Oppenheim, Michal Horodecki, Pawel Horodecki, and Ryszard Horodecki.
Thermodynamical approach to quantifying quantum correlations. Phys. Rev. Lett.,
89:180402, Oct 2002.
[175] A. Ostrowski. Sur quelques applications des fonctions convexes et concaves au sens de I. Schur. J. Math. Pures Appl., 31:253–292, 1952.
[176] Vern Paulsen. Completely Bounded Maps and Operator Algebras. Cambridge Studies
in Advanced Mathematics. Cambridge University Press, 2003.
[177] Asher Peres. Separability criterion for density matrices. Phys. Rev. Lett., 77:1413–1415,
Aug 1996.
[178] Dénes Petz. Quasi-entropies for states of a von Neumann algebra. Publications of the Research Institute for Mathematical Sciences, 21(4):787–800, 1985.
[182] John Preskill. Lecture Notes for Physics 229: Quantum Information and Computation. CreateSpace Independent Publishing Platform, 2015.
[184] W. Pusz and S. L. Woronowicz. Passive states and KMS states for general quantum systems. Communications in Mathematical Physics, 58(3):273–290, Oct 1978.
[185] E. M. Rains. Bound on distillable entanglement. Phys. Rev. A, 60:179–184, Jul 1999.
[186] Alexey E. Rastegin. Notes on general SIC-POVMs. Physica Scripta, 89(8):085101, Jun 2014.
[188] Bartosz Regula, Kun Fang, Xin Wang, and Mile Gu. One-shot entanglement distillation beyond local operations and classical communication. New Journal of Physics, 21(10):103017, Oct 2019.
[189] Joseph M. Renes. Relative submajorization and its use in quantum resource theories.
Journal of Mathematical Physics, 57(12):122202, 2016.
[190] Alfréd Rényi. On measures of entropy and information. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1960, pages 547–561, 1961.
[191] Arnau Riera, Christian Gogolin, and Jens Eisert. Thermalization in nature and on a
quantum computer. Phys. Rev. Lett., 108:080402, Feb 2012.
[192] Roberto Rubboli and Marco Tomamichel. New additivity properties of the relative
entropy of entanglement and its generalizations. 2022.
[193] Ernst Ruch, Rudolf Schranner, and Thomas H. Seligman. The mixing distance. The
Journal of Chemical Physics, 69(1):386–392, 1978.
[194] Oliver Rudolph. Some properties of the computable cross-norm criterion for separa-
bility. Phys. Rev. A, 67:032312, Mar 2003.
[195] Jun John Sakurai. Modern Quantum Mechanics, revised edition. Addison-Wesley, Reading, MA, 1994.
[196] David Sauerwein, Nolan R. Wallach, Gilad Gour, and Barbara Kraus. Transformations
among pure multipartite entangled states via local operations are almost never possible.
Phys. Rev. X, 8:031020, Jul 2018.
[197] Valerio Scarani, Sofyan Iblisdir, Nicolas Gisin, and Antonio Acín. Quantum cloning.
Rev. Mod. Phys., 77:1225–1256, Nov 2005.
[199] I. Schur. Über eine Klasse von Mittelbildungen mit Anwendungen auf die Determinantentheorie. Sitzungsberichte der Berliner Mathematischen Gesellschaft, 22:9–20, 1923.
[201] John A. Smolin, Frank Verstraete, and Andreas Winter. Entanglement of assistance
and multipartite state distillation. Phys. Rev. A, 72:052317, Nov 2005.
[202] Carlo Sparaciari, Jonathan Oppenheim, and Tobias Fritz. Resource theory for work
and heat. Phys. Rev. A, 96:052112, Nov 2017.
[203] C. Spee, J. I. de Vicente, and B. Kraus. The maximally entangled set of 4-qubit states.
Journal of Mathematical Physics, 57(5):052201, May 2016.
[204] C. Spee, J. I. de Vicente, D. Sauerwein, and B. Kraus. Entangled pure state transforma-
tions via local operations assisted by finitely many rounds of classical communication.
Phys. Rev. Lett., 118:040503, Jan 2017.
[205] Erling Størmer. Positive linear maps of operator algebras. Acta Mathematica, 110:233–278, 1963.
[206] Sumeet Khatri and Mark M. Wilde. Principles of quantum communication theory: A
modern approach. 2021.
[207] Barbara M. Terhal and Pawel Horodecki. Schmidt number for density matrices. Phys.
Rev. A, 61:040301, Mar 2000.
[208] M. Tomamichel. Quantum Information Processing with Finite Resources: Mathemati-
cal Foundations. SpringerBriefs in Mathematical Physics. Springer International Pub-
lishing, 2015.
[209] Marco Tomamichel, Mario Berta, and Masahito Hayashi. Relating different quantum
generalizations of the conditional Rényi entropy. Journal of Mathematical Physics,
55(8):082206, 2014.
[210] Marco Tomamichel, Roger Colbeck, and Renato Renner. A fully quantum asymptotic
equipartition property. IEEE Transactions on Information Theory, 55(12):5840–5847,
2009.
[211] Robert R. Tucci. Relaxation method for calculating quantum entanglement. 2001.
[212] S. Turgut. Catalytic transformations for bipartite pure states. Journal of Physics A:
Mathematical and Theoretical, 40(40):12185, 2007.
[213] J. A. Vaccaro, F. Anselmi, H. M. Wiseman, and K. Jacobs. Tradeoff between ex-
tractable mechanical work, accessible entanglement, and ability to act as a reference
system, under arbitrary superselection rules. Phys. Rev. A, 77:032114, Mar 2008.
[214] Wim van Dam and Patrick Hayden. Universal entanglement transformations without
communication. Phys. Rev. A, 67:060302, Jun 2003.
[215] Tim van Erven and Peter Harremoës. Rényi divergence and Kullback-Leibler divergence. IEEE Transactions on Information Theory, 60(7):3797–3820, 2014.
[216] V. Vedral. The role of relative entropy in quantum information theory. Rev. Mod.
Phys., 74:197–234, Mar 2002.
[217] V. Vedral and M. B. Plenio. Entanglement measures and purification procedures. Phys.
Rev. A, 57(3):1619–1633, Mar 1998.
[218] F. Verstraete, J. Dehaene, B. De Moor, and H. Verschelde. Four qubits can be entangled
in nine different ways. Phys. Rev. A, 65:052112, Apr 2002.
[220] Frank Verstraete, Jeroen Dehaene, and Bart De Moor. Normal forms and entanglement
measures for multipartite quantum states. Phys. Rev. A, 68:012103, Jul 2003.
[221] G. Vidal, W. Dür, and J. I. Cirac. Entanglement cost of bipartite mixed states. Phys.
Rev. Lett., 89:027901, Jun 2002.
[223] Guifré Vidal. Entanglement of pure states for a single copy. Phys. Rev. Lett., 83:1046–
1049, Aug 1999.
[225] Guifré Vidal and Rolf Tarrach. Robustness of entanglement. Phys. Rev. A, 59:141–155,
Jan 1999.
[226] Nolan R. Wallach. Lectures on quantum computing, Venice C.I.M.E., June 2004 (unpublished), 2004.
[227] Nolan R. Wallach. Geometric Invariant Theory. Springer Cham, Sep 2017.
[228] Ligong Wang and Renato Renner. One-shot classical-quantum capacity and hypothesis
testing. Phys. Rev. Lett., 108:200501, May 2012.
[229] Xin Wang and Mark M. Wilde. Cost of quantum entanglement simplified. Phys. Rev.
Lett., 125:040502, Jul 2020.
[230] John Watrous. The Theory of Quantum Information. Cambridge University Press,
2018.
[232] Mark M. Wilde. Quantum Information Theory. Cambridge University Press, second
edition, 2017.
[233] Mark M. Wilde, Andreas Winter, and Dong Yang. Strong converse for the classical
capacity of entanglement-breaking and Hadamard channels via a sandwiched Rényi
relative entropy. Communications in Mathematical Physics, 331(2):593–622, Oct 2014.
[234] Andreas Winter. Tight uniform continuity bounds for quantum entropies: Condi-
tional entropy, relative entropy distance and energy constraints. Communications in
Mathematical Physics, 347(1):291–313, Oct 2016.
[236] S.L. Woronowicz. Positive maps of low dimensional matrix algebras. Reports on
Mathematical Physics, 10(2):165–183, 1976.
[237] Nicole Yunger Halpern. Beyond heat baths II: Framework for generalized thermodynamic resource theories. Journal of Physics A: Mathematical and Theoretical, 51(9):094001, 2018.
[238] Nicole Yunger Halpern and Joseph M. Renes. Beyond heat baths: Generalized resource
theories for small-scale thermodynamics. Phys. Rev. E, 93:022126, Feb 2016.
[239] Elia Zanoni, Thomas Theurer, and Gilad Gour. Complete characterization of entan-
glement embezzlement. 2023.
[240] Li-Jun Zhao and Lin Chen. Additivity of entanglement of formation via an
entanglement-breaking space. Phys. Rev. A, 99:032310, Mar 2019.