Resources of The Quantum World: A Modern Textbook On Quantum Resource Theories
Gilad Gour
February 9, 2024
List of Symbols 13
1 Introductory Material 17
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.2 About This Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.3 The Structure of the Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.4 Resurrection of Quantum Entanglement: The Birth of a Fundamental Resource 24
1.5 Resource Analysis and Reversibility . . . . . . . . . . . . . . . . . . . . . . . 30
1.6 Notes and References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
I Preliminaries 37
VI Appendices 821
A Elements of Convex Analysis 823
A.1 The Hyperplane Separation Theorem . . . . . . . . . . . . . . . . . . . . . . 823
A.2 Convex Hulls, Faces, and Polytopes . . . . . . . . . . . . . . . . . . . . . . . 826
A.3 Extreme Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 828
A.4 Polyhedrons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 831
A.5 Affine Subspaces and the Birkhoff Polytope . . . . . . . . . . . . . . . . . . 832
A.6 Polarity and Half Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 834
A.7 Support Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 837
A.8 Convex Cones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 838
A.9 Conic Linear Programming and Semidefinite Programming . . . . . . . . . . 839
A.10 Fixed-Point Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 843
A.11 Notes and References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 844
D Miscellany 903
D.1 The Divided Difference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 903
D.2 The Maximal f -Divergence: Singular Case . . . . . . . . . . . . . . . . . . . 907
D.3 Smoothing with the Second Variable of Dmax . . . . . . . . . . . . . . . . . . 912
D.4 Two Proofs of the Classical Stein’s Lemma . . . . . . . . . . . . . . . . . . . 914
D.5 Alternative (direct) proofs of Theorem 12.6.1 and Theorem 12.6.2 . . . . . . 918
D.6 Beyond States that are G-Regular . . . . . . . . . . . . . . . . . . . . . . . . 920
D.7 Proof of Theorem 17.3.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 922
D.8 Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 928
D.9 Alternative Proof of Blackwell Theorem . . . . . . . . . . . . . . . . . . . . . 929
D.10 Symmetric Purification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 936
I extend my deepest gratitude to the numerous colleagues who have profoundly influenced my
understanding of quantum information and quantum resource theories. Through countless
discussions, their insights have enriched my perspective and deepened my knowledge. I am
particularly indebted to Fernando G.S.L. Brandão, S. Brandsen, Francesco Buscemi, Giulio
Chiribella, Eric Chitambar, Nilanjana Datta, Julio De Vicente, Runyao Duan, Kun Fang,
Shmuel Friedland, Yu Guo, Aram Harrow, Michal Horodecki, David Jennings, Amir Kalev,
Barbara Kraus, Ludovico Lami, Iman Marvian, David A. Meyer, Markus P. Müller, Varun
Narasimhachar, Jonathan Oppenheim, Carlo Maria Scandolo, Bartosz Regula, Robert W.
Spekkens, Marco Tomamichel, Nolan Wallach, Xin Wang, Mark M. Wilde, Andreas Winter,
and Nicole Yunger Halpern for their invaluable contributions.
My sincere appreciation goes to Julio Inigo De Vicente Majua for his meticulous review of
the chapter on multipartite entanglement, offering numerous improvements and corrections.
Special thanks to Thomas Theurer, whose relentless feedback on various drafts has been
instrumental in refining this work. Additionally, I am grateful to Mark M. Wilde for his
advice on enhancing the notations and clarity of presentation.
I owe a debt of gratitude to the many students I’ve had the privilege of interacting with
over the years. Their keen observations and identification of errors and typos in various
drafts have been crucial in shaping the final content of this book. I thank John Burniston,
Nuiok Decaire, Raz Firanko, Kimberly Golubeva, Michael Grabowecky, Alexander Hickey,
Takla Nateeboon, Gaurav Saxena, Kuntal Sengupta, Guy Shemesh, Samuel Steakley, Goni
Yoeli, and Elia Zanoni for their contributions.
Lastly, but most importantly, I wish to express my heartfelt appreciation to my family.
To my parents, Iris and Gideon Gour, whose unwavering support and belief in my pursuits
have been the bedrock of my resilience and determination. Their love and encouragement
have been a constant source of strength throughout this journey. To my children, Sophia
and Elijah Gour, who have been my greatest source of inspiration. Their curiosity and
enthusiasm for life remind me daily of the joy of discovery and the importance of sharing
knowledge. And to my life partner, Eve Zhang, whose endless support and understanding
have been nothing short of miraculous. Her presence and encouragement have been my
guiding light, helping me navigate the challenges of this endeavor and crossing the finish
line of completing this book. My journey is as much theirs as it is mine, and I am eternally
grateful for their love, patience, and sacrifice.
First letters of the English alphabet, such as A, B, and C, are used to denote both quantum
physical systems and their corresponding Hilbert spaces. The letter R is used to denote a
quantum reference physical system (and its corresponding Hilbert space), and sometimes the
letter E is used to denote the environment system. The last letters of the English alphabet,
such as X, Y , and Z are used to denote classical systems or classical registers. The dimension
of a Hilbert space is denoted with vertical lines; e.g. the dimension of systems A, B, and
X, are denoted respectively as |A|, |B|, and |X|. The tilde symbol above a system always
represents a replica of the system. For example, Ã represents another copy of system A and
in particular |A| = |Ã|.
We use |Ω^{AÃ}⟩ to denote the unnormalized maximally entangled state Σ_{x∈[m]} |xx⟩, and ψ, ϕ, Ω, Φ to denote, respectively, the rank-one pure states |ψ⟩⟨ψ|, |ϕ⟩⟨ϕ|, |Ω⟩⟨Ω|, |Φ⟩⟨Φ|.
List of Symbols
Pure(A) The set of all pure (i.e. rank one) density matrices in D(A)
Herm(A) The set of all Hermitian operators in L(A)
L(A → B) The set of all linear operators from L(A) to L(B)
Herm(A → B) The subset {E ∈ L(A → B) : E(ρ) ∈ Herm(B) ∀ ρ ∈ Herm(A)}
CP(A → B) The set of all completely positive maps in L(A → B)
CPTP(A → B) The set of all quantum channels in L(A → B)
Pos(A → B) The set of all positive maps in L(A → B)
idA The identity element (channel) of L(A → A)
#f The Kubo-Ando Operator Mean (Definition B.5.1)
Irr(π) The set of all irreps (up to equivalency) appearing in the
decomposition of π
In The n × n identity matrix
IA The identity operator in L(A)
uA The maximally mixed state in L(A)
u(n) The uniform probability vector in Prob(n)
1n The column vector (1, . . . 1)T in Rn
|Ω^{AÃ}⟩ The unnormalized maximally entangled state Σ_{x∈[m]} |xx⟩
|Φ^{AÃ}⟩ The normalized maximally entangled state (1/√|A|) Σ_{x∈[m]} |xx⟩
Eff(A) The set of all effects in Pos(A); i.e. Λ ∈ Eff(A) if and only if 0 ⩽ Λ ⩽ I A .
Im(T ) The image of T ∈ L(A, B).
Ker(T ) The kernel of T ∈ L(A, B).
supp(T ) The support subspace of T ∈ L(A, B).
supp(p) The set {x ∈ [n] : px > 0}, where p = (p1 , . . . , pn )T ∈ Prob(n).
ρ≪σ Inclusion of supports; supp(ρ) ⊆ supp(σ) for ρ, σ ∈ D(A).
spec(H) The set of all distinct eigenvalues of a Hermitian operator H ∈ Herm(A).
Introductory Material
1.1 Introduction
A recurring theme in the field of physics is the endeavor to unify a variety of distinct physical
phenomena into a comprehensive framework that can offer both descriptions and explana-
tions for each of them. One of the most astounding achievements in this endeavor is the
unification of fundamental forces. When physicists realized that the forces of electricity and
magnetism could be elegantly described using a single framework, it not only substantially
enhanced our comprehension of these forces but also gave birth to the expansive domain of
electromagnetism.
The remarkable success in unifying forces serves as a testament to the fact that seemingly
unrelated phenomena can often be traced back to a common origin. This approach extends
beyond the realm of forces and finds resonance in the burgeoning field of quantum information
science. Within this field, a novel discipline has emerged, which seeks to identify shared
characteristics among seemingly disparate quantum phenomena. The overarching theme
of this approach lies in the recognition that various attributes of physical systems can be
defined as “resources.” This recognition not only alters our perspective on these phenomena
but also seamlessly integrates them within a comprehensive framework known as “quantum
resource theories.”
For example, take the case of quantum entanglement. In the 1990s, it was transformed
from a topic of philosophical debates and discussions into a valuable resource. This trans-
formative shift revolutionized our perception of entanglement; it evolved from being an
intriguing and non-intuitive phenomenon into the essential driving force behind numerous
quantum information tasks. This new perspective on entanglement opened up a vast array
of possibilities and applications, starting with its utilization in quantum teleportation and
superdense coding. Today, entanglement stands as a fundamental resource in fields such as
quantum communication, quantum cryptography, and quantum computing.
Given the success of entanglement theory, it is only natural to explore other physical
phenomena that can also be recognized as valuable resources. Currently, there are several
quantum phenomena that have been identified as such. These encompass areas such as
quantum and classical communication, athermality (within the realm of quantum thermo-
dynamics), asymmetry, magic (in the context of quantum computation), quantum coherence,
Bell non-locality, quantum contextuality, quantum steering, incompatibility of quantum mea-
surements, and many more. The recognition of all these phenomena as resources enables us
to unify them under the umbrella of quantum resource theories.
Resource theories serve as a crucial framework for addressing complex questions. They
aim to unravel puzzles such as determining which sets of resources can be transformed into
one another and the methods by which such conversions can occur. Additionally, they explore
how to measure and detect different resources. If a direct transformation between particular
resources is not feasible, resource theories examine the possibility of non-deterministic con-
versions and the computation of their associated probabilities. The introduction of catalysts
into the equation further deepens the inquiry.
This investigative approach often yields profound insights into the underlying nature
of the physical or information-theoretic phenomena under scrutiny—such as entanglement,
asymmetry, athermality, and more. Furthermore, this perspective provides a structured
framework for organizing theoretical findings pertaining to these phenomena. As demon-
strated by the evolution of entanglement theory, the resource-theoretic perspective possesses
the potential to revolutionize our understanding of familiar subjects.
In this context, chemistry exemplifies this framework, elucidating how abundant collec-
tions of chemicals can be converted into more valuable products. Similarly, thermodynamics
fits this mold by addressing inquiries about the conversion of various types of nonequilibrium
states—thermal, mechanical, chemical, and more—into one another, including the extraction
of useful work from heat baths at differing temperatures.
Within the realm of quantum resource theories, a fundamental challenge arises in identi-
fying equivalence classes of quantum systems that can be reversibly interconverted (or simulate
each other) when considering an abundance of resource copies, and determining the rates at
which these interconversions occur. The relative entropy of a resource plays a pivotal role
in such reversible transformations, gauging the resourcefulness of a system by quantifying
its deviation from the set of free (non-resourceful) systems. Remarkably, this function uni-
fies essential (pseudo) metrics across seemingly disparate scientific domains. For instance,
the relative entropy of a resource manifests as free energy in thermodynamics, entangle-
ment entropy in pure state entanglement theory, and the entanglement-assisted capacity of
a quantum channel in quantum communication; see Fig. 1.1.
In recent years the field has developed rapidly, resulting in a proliferation of publications and the development of new tools and mathematical methods that firmly underpin this area of study.
In light of the extensive literature in the field of quantum information science, one might
understandably question the need for yet another book on quantum resources. Isn’t this
territory already covered in existing quantum information textbooks? For instance, quantum
Shannon theory can be seen as a theory of interconversions among different types of resources,
and Wilde [232] and Watrous [230] have produced outstanding books delving into these
topics. Additionally, detailed treatments of subjects covered in this book, such as quantum
divergences and Rényi entropies, can be found in Tomamichel's noteworthy work [208].
While it is accurate to say that many of the topics covered in this book are available
elsewhere, what distinguishes this book is its unique approach. It explores well-trodden
subjects like entropy, uncertainty, divergences, non-locality, entanglement, and energy from
a fresh perspective rooted in resource theories. Specifically, the book adopts an axiomatic
approach to rigorously introduce these concepts, providing illustrative examples. Only then
does it transition to operational aspects that involve the examples discussed.
Take, for instance, the topic of conditional entropy, a subject widely covered in numerous
textbooks in both classical and quantum information theory. This book, however, offers a
distinctive approach by presenting this concept from three distinct perspectives: axiomatic,
constructive, and operational. Notably, all three perspectives converge to the same notion of
conditional entropy. This approach not only provides the reader with a deeper understanding
of the concept but also underscores its robust foundation.
The primary goal of this book is pedagogical in nature, with the hope of providing readers
with a contemporary perspective on quantum resource theories. It aspires to equip readers
with the necessary physical principles and advanced mathematical techniques required to
comprehend recent advancements in this field. Upon completing this book, readers should
have the ability to explore open problems and research directions within the field, some of
which will be highlighted in the text.
In anticipation of a diverse readership, this book is designed to be inclusive, targeting
both graduate students and senior undergraduate students who possess a foundational un-
derstanding of linear algebra. It aims to provide them with a comprehensive resource for
delving into this fascinating field. Simultaneously, the book serves as a reference, offering
fresh insights and innovative approaches that researchers in the early stages of their careers
may find valuable. With numerous examples and exercises, it aims to serve as a textbook
for courses on the subject, enhancing the learning experience for students.
While the primary audience for this textbook consists of entry-level graduate students
interested in pursuing research at the master’s or Ph.D. level in quantum resource theories,
encompassing quantum information science, it may also prove valuable to researchers in fields
influenced by quantum information and resource theories, such as quantum thermodynamics
and condensed matter physics. They may find this book to be a useful and accessible
reference source.
Although we have endeavored to make the book self-contained, a basic understanding of
linear algebra is essential. The goal was to create a resource accessible to graduate students
from diverse backgrounds in mathematics, physics, and computer science. As a result, the
book includes preliminary chapters and several appendices that fill potential knowledge gaps,
given the interdisciplinary nature of the subject matter.
Quantum resource theories constitute a vast research area, with new properties of physical
systems continually being recognized as resources. Consequently, the aim of this book is
not to exhaustively cover all resource theories but rather to select those that illustrate the
techniques used in quantum resource theories effectively. On the technical front, we have
chosen to begin with the modern single-shot approach and employ it to derive asymptotic
rates. Historically, asymptotic rates were studied first, but from a pedagogical standpoint,
it is more intuitive to start with the single-shot regime.
To the best of our knowledge, there are currently no dedicated books specifically focused
on quantum resource theories. With this book, we hope to contribute to the field by providing
a comprehensive overview and integrating both new and existing results within a unified
framework. While we do not claim this book to be the ultimate authority, we believe it can
serve as a valuable reference that consolidates ideas scattered across various journal articles,
addressing the need for a centralized resource in the field of quantum resource theories.
Part I: The opening section of this book is thoughtfully designed to cater to readers who may
not possess prior knowledge of quantum mechanics or quantum information. Within
this segment, we embark on a rigorous mathematical journey through quantum the-
ory, emphasizing precise definitions and mathematical proofs of fundamental physical
theorems. Key subjects covered in this section encompass quantum states, general-
ized quantum measurements, quantum channels, POVMs, and more. Moreover, this
section extends its reach beyond the boundaries of quantum theory, delving into top-
ics such as Ky-Fan norms, the Størmer-Woronowicz theorem, the Pinching Inequality,
the Reverse Hölder Inequality, certain hidden variable models, and other subjects that
may not commonly cross the paths of graduate students in physics, mathematics, or
computer science. Therefore, even those well-versed in these topics may find it bene-
ficial to skim through this chapter briefly, as it has the potential to reveal previously
undiscovered insights.
Part II: The second section of this book delves deep into the methodologies and tools employed
within the realm of quantum resource theories and quantum information. While it
explores numerous quantum information concepts, it distinguishes itself from conven-
tional quantum information theory textbooks. The introductory chapter of this section
provides an all-encompassing mathematical review of majorization theory, encapsulat-
ing recent groundbreaking discoveries, such as relative majorization, conditional ma-
jorization, and the intersection of probability theory with this field.
Subsequent chapters in this section adopt a distinctive approach to elucidate concepts
associated with metrics, divergences, and entropies. These notions are introduced
and dissected using techniques and insights drawn from the framework of quantum
resource theories. For instance, entropy, conditional entropy, relative entropies, and
other divergences are introduced as additive functions that adhere to monotonicity
under the set of free operations, a foundational concept in quantum resource theories.
The final chapter in this part of the book is dedicated to the asymptotic regime,
focusing on the consequences of the “law of large numbers” in quantum information
and quantum resource theories. This chapter introduces concepts such as weak and
strong typicality, the method of types, classical and quantum hypothesis testing, and
the symmetric subspace. These tools prove particularly valuable in the asymptotic
domain of quantum resource theories when exploring inter-conversion rates among
infinitely many resources.
In summary, although the contents of this second section share some commonalities
with conventional quantum information theory textbooks, they diverge significantly
by presenting concepts and tools in a unique manner. Rather than employing Venn
diagrams to define key concepts like entropy, this part of the book aims to provide a
comprehensive and rigorous approach to precisely define these concepts by employing
axiomatic, constructive, and operational approaches. Leveraging the framework of
quantum resource theories, this section offers a fresh and innovative perspective on
these familiar topics.
Part III: In the third section of the book, we delve into the fundamental framework of quantum
resource theories. Our journey begins with a meticulous mathematical elucidation of
a quantum resource theory. We proceed to examine its foundational principles, in-
cluding but not limited to the golden rule of free operations, resource non-generating
operations, physically implementable operations, convex and affine resource theories,
state-based resource theories, as well as resource witnesses and their associated prop-
erties.
Next, we delve into the quantification of quantum resources. In this context, we in-
troduce a plethora of resource measures and resource monotones, delving deep into
their properties, which include additivity, sub-additivity, convexity, strong monotonic-
ity, and asymptotic continuity. These concepts form the bedrock of quantum resource
theories, and understanding them is pivotal.
Resource monotones and resource measures offer a valuable means of quantifying re-
sources. Our emphasis is on divergence-based resource measures, such as the relative
entropy of a resource, given their operational interpretations across various resource
theories. We also explore techniques for computing these measures, including semidefi-
nite programming, and delve into a practical approach for “smoothing” these measures,
a technique commonly employed in single-shot quantum information science.
Concluding this section of the book, we introduce a rich array of resource intercon-
version scenarios. These encompass exact interconversions, stochastic (probabilistic)
interconversions, approximate interconversions, and asymptotic interconversions. We
delve into essential tools intricately linked to resource interconversions, such as the
conversion distance within the single-shot regime, the asymptotic equipartition prop-
erty, and the quantum Stein’s lemma within the asymptotic domain. Additionally, we
explore the uniqueness of the Umegaki relative entropy within the context of quantum
resource theories. Our investigation extends to the evaluation of both the cost and
distillation of resources, examining these processes within both the single-shot and
asymptotic regimes. We have encapsulated the essence of this section of the book in
Figure 1.2.
Part IV: The fourth section of this book is dedicated to the quintessential exemplar of quantum
resource theories, often referred to as the “poster child” – entanglement theory. This
section comprises three chapters, each focusing on distinct facets of entanglement.
The first chapter delves into the realm of pure bipartite entanglement, followed by the
second chapter, which explores mixed bipartite entanglement. The third chapter, in
turn, delves into the intricacies of multipartite entanglement.
Within these chapters, we leverage the techniques and concepts developed in parts II
and III to delve into the theory of entanglement. This enables us to furnish a precise
definition of quantum entanglement and undertake a comprehensive examination of its
detection, manipulation, and quantification. Notably, the first of these three chapters
serves as the cornerstone, offering an in-depth exploration of pure bipartite entangle-
ment, which forms the foundational knowledge upon which the subsequent chapters on
mixed and multipartite entanglement build.
Part V: The fifth part comprises three chapters, with the first two chapters focusing on asym-
metry and non-uniformity, laying the groundwork for the third chapter on quantum
thermodynamics. In this part of the book, we reveal that athermality, the resource
essential for thermodynamic tasks, consists of two components: time-translation asym-
metry and non-uniformity.
The first chapter explores the resource theory of asymmetry, introducing an operational
framework that arises from practical constraints when multiple parties lack a common
shared reference frame. This theory has found numerous applications in quantum
information and beyond.
The second chapter delves into the resource theory of non-uniformity. In this theory,
maximally mixed states are considered free, while all other states are regarded as
valuable resources. This theory can be seen as a unique variant of thermodynamics,
involving completely degenerate Hamiltonians. Indeed, we introduce this chapter to
serve as a gentle introduction to the world of quantum thermodynamics.
Finally, in the third chapter of this section, we dive into quantum thermodynamics.
Throughout the book, whenever we introduce a new quantum resource theory, we
adhere to the structured framework outlined in Figure 1.2.
Part VI: The final section of the book serves as a comprehensive resource aimed at ensuring
the self-containment of the entire text. It exclusively includes material that directly
complements the core content of the book.
In the initial three chapters, we delve into key subjects: convex analysis, operator
monotonicity, and representation theory. It’s important to note that each of these
topics is vast in its own right, with numerous dedicated books solely focused on repre-
sentation theory or convex analysis, for example. In this section, we have thoughtfully
curated and presented the aspects of these topics that are pertinent to our book’s
core themes. Our approach emphasizes utilizing quantum notations and placing a
strong emphasis on furnishing all the essential elements needed to ensure the book’s
self-contained nature.
    |Ψ_-^{AB}⟩ = (1/√2)(|01⟩ − |10⟩) .    (1.2)
Furthermore, let’s consider the scenario where Alice possesses an additional electron in
her system, characterized by a quantum state |ψ Ã ⟩ = a|0⟩+b|1⟩. Importantly, both Alice and
Bob lack knowledge regarding the spin state of this electron, which means they are unaware
of the specific values of a and b. According to the principles of quantum mechanics, the
collective quantum state of these three electrons—two under Alice’s control and one under
Bob’s—is described by the tensor product:
    |ψ^Ã⟩ ⊗ |Ψ_-^{AB}⟩ = (1/√2)(a|0⟩ + b|1⟩) ⊗ (|01⟩ − |10⟩)
                       = (1/√2)( a|001⟩ + b|101⟩ − a|010⟩ − b|110⟩ ) ,    (1.3)
where the second equality follows by opening the parentheses.
It’s noteworthy that in our description above, we represented the state |ψ Ã ⟩⊗|ΨAB− ⟩ using
the computational basis of the vector space ÃAB. However, we can achieve a more insightful
representation by substituting the computational basis |00⟩, |01⟩, |10⟩, |11⟩ of system ÃA with
the Bell basis consisting of |ΦÃA √1 (|00⟩ ± |11⟩) and |ΨÃA ⟩ = √1 (|01⟩ ± |10⟩). This
± ⟩ = 2 ± 2
substitution allows us to express the state as follows:
    |ψ⟩^Ã ⊗ |Ψ_-^{AB}⟩ = (1/2)[ a(|Φ_+^{ÃA}⟩ + |Φ_-^{ÃA}⟩)|1⟩ + b(|Ψ_+^{ÃA}⟩ − |Ψ_-^{ÃA}⟩)|1⟩
                                − a(|Ψ_+^{ÃA}⟩ + |Ψ_-^{ÃA}⟩)|0⟩ − b(|Φ_+^{ÃA}⟩ − |Φ_-^{ÃA}⟩)|0⟩ ]
                       = (1/2)[ |Φ_+^{ÃA}⟩(a|1⟩ − b|0⟩) + |Φ_-^{ÃA}⟩(a|1⟩ + b|0⟩)
                                + |Ψ_+^{ÃA}⟩(b|1⟩ − a|0⟩) − |Ψ_-^{ÃA}⟩(a|0⟩ + b|1⟩) ] ,    (1.4)
where the second equality follows by collecting terms.
Therefore, if Alice performs the Bell measurement on her two qubits ÃA, i.e. the basis
(projective) measurement
    { P_0^{ÃA} = |Ψ_-^{ÃA}⟩⟨Ψ_-^{ÃA}| ,  P_1^{ÃA} = |Φ_-^{ÃA}⟩⟨Φ_-^{ÃA}| ,  P_2^{ÃA} = |Φ_+^{ÃA}⟩⟨Φ_+^{ÃA}| ,  P_3^{ÃA} = |Ψ_+^{ÃA}⟩⟨Ψ_+^{ÃA}| } ,    (1.5)
she will get with equal probability four possible outcomes (denoted x = 0, 1, 2, 3, and global
phase is ignored):
Outcome    Post-Measurement State              Simplification (up to a global phase)
x = 0      |Ψ_-^{ÃA}⟩ ⊗ (a|0⟩ + b|1⟩)          |Ψ_-^{ÃA}⟩ ⊗ |ψ⟩
x = 1      |Φ_-^{ÃA}⟩ ⊗ (a|1⟩ + b|0⟩)          |Φ_-^{ÃA}⟩ ⊗ σ_1|ψ⟩
x = 2      |Φ_+^{ÃA}⟩ ⊗ (a|1⟩ − b|0⟩)          |Φ_+^{ÃA}⟩ ⊗ σ_2|ψ⟩
x = 3      |Ψ_+^{ÃA}⟩ ⊗ (b|1⟩ − a|0⟩)          |Ψ_+^{ÃA}⟩ ⊗ σ_3|ψ⟩
where we denoted by {σ_x}_{x=0,1,2,3} the identity matrix σ_0 = I_2 and the three Pauli matrices σ_1, σ_2, σ_3. Hence, up to a global phase, Bob's state after outcome x occurred is σ_x|ψ⟩. After Alice sends (via a classical communication channel) the measurement outcome x to Bob, Bob can then perform the unitary operation U_x = σ_x to obtain the state
    U_x σ_x|ψ⟩ = σ_x σ_x|ψ⟩ = |ψ⟩ .    (1.6)
Therefore, by using shared entanglement, and after transmitting two classical bits (cbits),
Alice was able to transfer her unknown qubit state |ψ⟩ to Bob’s side.
If Bob did not receive the classical message from Alice, then his state is one of the four
states {σx |ψ⟩}x=0,1,2,3 . Since he does not know x, from his perspective his state is (see
Exercise 1.4.1)
    ρ = (1/4) Σ_{x=0}^{3} σ_x |ψ⟩⟨ψ| σ_x = (1/2) I .    (1.7)
That is, without the knowledge of x, Bob’s resulting state is the maximally mixed state, and
contains no information about |ψ⟩.
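The following NumPy snippet (not part of the original derivation) gives a quick numerical check of Eq. (1.7): averaging σ_x|ψ⟩⟨ψ|σ_x over the four Pauli matrices returns the maximally mixed state for an arbitrary qubit state; the amplitudes used below are purely illustrative.

```python
import numpy as np

# Pauli matrices (sigma_0 is the identity)
I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# an arbitrary normalized qubit state a|0> + b|1>
psi = np.array([0.6, 0.8j], dtype=complex)
rho_psi = np.outer(psi, psi.conj())

# Eq. (1.7): (1/4) * sum_x sigma_x |psi><psi| sigma_x
rho = sum(s @ rho_psi @ s.conj().T for s in (I2, sx, sy, sz)) / 4
print(np.allclose(rho, I2 / 2))   # True: Bob's average state is maximally mixed
```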
Figure 1.3: Quantum teleportation. Single-line arrows correspond to quantum systems. Double
line arrows correspond to classical systems.
Exercise 1.4.1. Prove Eq. (1.7). Hint: Prove first that the left-hand side of the equation above is invariant under conjugation by σ_x.
Exercise 1.4.2. Show that if instead of the singlet state |Ψ_-^{AB}⟩ above, Alice and Bob share another maximally entangled state |Φ^{AB}⟩ (i.e. the reduced density matrix of |Φ^{AB}⟩ is the maximally mixed state), then, by slightly modifying the protocol, they can still teleport an unknown quantum state from Alice to Bob.
The protocol above can be generalized in several different ways. First, in Exercise 1.4.3
you will generalize it to d-dimensions. Moreover, in general, if Alice and Bob do not share
the singlet state, but instead their particles are prepared in some other non-separable state (i.e. an entangled state that is not maximally entangled) ρ^{AB} ∈ D(AB), then typically perfect/faithful teleportation will not be possible. Still, in this case one can design a protocol achieving quantum teleportation with probability less than one (see Exercise 1.4.4), and/or at the end of the protocol the state in Bob's lab is not exactly equal to Alice's original state |ψ⟩^Ã but only close to it up to some threshold. Thus, the protocol described above is called faithful teleportation, since it teleports |ψ⟩ perfectly from Alice to Bob with a 100% success rate.
Exercise 1.4.3. Let |Φ^{AB}⟩ := (1/√d) Σ_{z∈[d]} |zz⟩ be a 2-qudit (normalized) maximally entangled state.
1. Show that {|ψ_xy^{AB}⟩}_{x,y∈[d]} is an orthonormal basis of AB.
2. Show that the reduced density matrix of |ψ_xy^{AB}⟩ is the maximally mixed state for all x, y ∈ [d].
3. Find a protocol for faithful teleportation of a qudit from Alice's lab to Bob's lab. Assume that the joint measurement that Alice performs on her two qudits is a basis measurement in the basis {|ψ_xy^{AB}⟩}_{x,y∈[d]}. What are the unitary operators performed by Bob? How many classical bits (cbits) does Alice transmit to Bob?
Exercise 1.4.4. Suppose Alice and Bob share the state |ψ^{AB}⟩ = (1/2)|00⟩ + (√3/2)|11⟩. Show that there exists a 2-outcome (basis) measurement that Alice can perform, such that with some probability greater than zero, the state of Alice and Bob after the measurement becomes the maximally entangled state |Φ_+^{AB}⟩ = (1/√2)(|00⟩ + |11⟩).
So far we assumed that the teleported state is a pure state. However, the exact same
protocol works even if the unknown state |ψ⟩ is replaced with a mixed state ρ. This is
because we can view any mixed state as some ensemble of pure states {px , |ψx ⟩} in which
the parameter x is unknown. Irrespective of the value of x, the protocol above will teleport
|ψ_x⟩ from Alice to Bob, and thereby, given that the value of x is unknown, Alice effectively teleported to Bob the mixed state ρ := Σ_x p_x |ψ_x⟩⟨ψ_x|. Alternatively, note that the quantum
teleportation protocol in Fig. 1.3 can be described as a realization of the identity quantum
channel id ∈ CPTP(A → B) (with |A| = |B| := d) given by
    id^{A→B}(ρ^A) = Σ_{x∈[d²]} Tr_{AÃ}[ (P_x^{ÃA} ⊗ U_x^B) (ρ^A ⊗ Φ^{ÃB}) (P_x^{ÃA} ⊗ U_x^B)* ] ,    (1.10)
where {P_x^{ÃA}}_{x∈[d²]} corresponds to the measurement on systems Ã and A in the maximally entangled basis, U_x is the unitary performed by Bob after he receives the value x from Alice, and Φ^{ÃB} is the maximally entangled state on system ÃB. The quantum teleportation protocol states that there exist {P_x^{ÃA}} and {U_x} such that the quantum channel id^{A→B} above is indeed the identity channel. Although in the protocol above we proved this only for pure input states |ψ⟩⟨ψ|, from the linearity of the quantum channel id^{A→B} it follows that id^{A→B} is the identity quantum channel on all mixed states.
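As a sanity check, the short NumPy sketch below (not from the book) verifies numerically that the Bell-measure-and-correct construction of Eq. (1.10) acts as the identity channel for d = 2. For convenience the input state is placed on the first tensor factor and the shared maximally entangled state on the last two factors (an equivalent relabeling of the systems), and the input density matrix is an arbitrary illustrative choice.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [I2, X, Y, Z]

phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)      # |Phi_+>
Phi = np.outer(phi_plus, phi_plus.conj())

# Bell-basis projectors on the first two qubits: |beta_x> = (sigma_x ⊗ I)|Phi_+>
bells = [np.kron(s, I2) @ phi_plus for s in paulis]
projectors = [np.outer(b, b.conj()) for b in bells]

def trace_out_first_two(M):
    """Partial trace over the first two qubits of an 8x8 operator."""
    return np.einsum('ijik->jk', M.reshape(4, 2, 4, 2))

# an arbitrary input density matrix (Hermitian, positive, unit trace)
rho = np.array([[0.7, 0.2 + 0.1j], [0.2 - 0.1j, 0.3]], dtype=complex)

out = sum(
    trace_out_first_two(np.kron(P, U) @ np.kron(rho, Phi) @ np.kron(P, U).conj().T)
    for P, U in zip(projectors, paulis)
)
print(np.allclose(out, rho))   # True: the output equals the input state
```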
Exercise 1.4.5. [Entanglement Swapping] Consider four qubit systems A, B, C, and
D, in the double-singlet state |Ψ_-^{AB}⟩ ⊗ |Ψ_-^{CD}⟩.
Taking U_x = σ_x to be the four Pauli matrices (with σ_0 = I_2), we get that the four states {|ψ_x^{AB}⟩}_{x=0}^{3} are orthonormal and form a basis of C² ⊗ C². In fact, this is the Bell basis we encountered in the previous subsection. In the next step of the protocol, Alice sends her electron (over a noiseless quantum communication channel) to Bob. Upon receiving Alice's electron, Bob has in his lab two electrons in the state |ψ_x^{AB}⟩. Given that the states {|ψ_x^{AB}⟩}_{x=0}^{3} form an orthonormal basis, in the last step of the protocol Bob performs a joint basis measurement on his two electrons in the basis {|ψ_x^{AB}⟩}_{x=0}^{3}, and thereby learns the outcome x. The outcome x is the message that Alice intended to send Bob.
Exercise 1.4.6. Show that the set of states {|ψ_x^{AB}⟩}_{x=0}^{3} is an orthonormal basis of C² ⊗ C².
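A quick numerical check of Exercise 1.4.6 is sketched below, assuming the encoding |ψ_x^{AB}⟩ = (σ_x ⊗ I)|Φ_+^{AB}⟩ (the standard superdense-coding encoding): the Gram matrix of the four encoded states should be the 4 × 4 identity.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
states = [np.kron(s, I2) @ phi_plus for s in (I2, X, Y, Z)]   # the four encoded states

gram = np.array([[u.conj() @ v for v in states] for u in states])
print(np.allclose(gram, np.eye(4)))   # True: the encoded states form an orthonormal basis
```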
Exercise 1.4.7. Let |Φ^{AB}⟩ := (1/√d) Σ_{z∈[d]} |zz⟩ be a maximally entangled state in C^d ⊗ C^d. Show that Alice can use it to transmit to Bob 2 log₂(d) cbits.
Figure 1.5: Superdense Coding. Double lines correspond to classical systems, and single lines to quantum systems.
Note that if entanglement is not considered as a resource, that is, the parties are supplied
with unlimited singlet states, then we can remove the ebit cost [qq] in (1.12) and (1.13) and
get that for teleportation 2[c → c] ⩾ [q → q] and for superdense coding [q → q] ⩾ 2[c → c].
This makes teleportation and superdense coding the dual protocols of each other, and in this
case we can say that [q → q] = 2[c → c].
However, in almost all practical scenarios, entanglement is an expensive resource that can
be difficult to generate over long distances and that is also highly sensitive to decoherence and
noise. In particular, pure maximal entanglement is scarce and must be treated as a resource. The question then becomes whether it is possible to slightly modify the protocols of teleportation and superdense coding, making them more symmetric, in the sense that the
two resource inequalities in (1.12) and (1.13) merge into a single resource equality. This is
indeed possible if we replace 2[c → c] in the right-hand side of (1.13) with two uses of an
isometry channel known as the coherent bit channel.
We will denote by V_Z the coherent bit (cobit) channel
    V_Z(ρ) := V ρ V* ,    ∀ ρ ∈ L(A) ,    (1.15)
where V is the isometry defined by V|x⟩^A = |x⟩^A|x⟩^B for x ∈ {0, 1}, and the subscript Z indicates that the basis {|0⟩, |1⟩} is an eigenbasis of the third Pauli
operator (i.e., eigenvectors of the spin observable in the z-direction). One can define V with
respect to other bases. For example, we will denote by VX (·) = U (·)U ∗ the coherent bit
channel with respect to the basis {|+⟩, |−⟩}, where U is the isometry defined by U |±⟩A =
|±⟩A |±⟩B .
How is this resource related to other resources? First note that with such a resource Alice
can transmit a classical bit to Bob. Indeed, Alice can encode a cbit x ∈ {0, 1} in the state
|x⟩A and send it over the channel VZ . Then, Bob receives |x⟩B on his system and performs
a basis measurement to learn x. We therefore have
[q → qq] ⩾ [c → c] . (1.16)
The exercise below shows that we also have [q → qq] ⩾ [qq]. Among other things, this also implies that [c → c] ̸⩾ [q → qq]; in other words, [c → c] is strictly less resourceful than [q → qq].
Exercise 1.5.1. Show that V_Z(|+⟩⟨+|^A) = |Φ_+^{AB}⟩⟨Φ_+^{AB}|.
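The snippet below gives a quick numerical check of Exercise 1.5.1, assuming the cobit isometry V|x⟩^A = |x⟩^A|x⟩^B for x ∈ {0, 1} (consistent with the X-basis definition quoted above).

```python
import numpy as np

# V : C^2 -> C^2 ⊗ C^2 with V|0> = |00> and V|1> = |11>
V = np.zeros((4, 2), dtype=complex)
V[0, 0] = 1.0   # |00><0|
V[3, 1] = 1.0   # |11><1|

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)               # |+>
rho_out = V @ np.outer(plus, plus.conj()) @ V.conj().T            # V_Z(|+><+|)

phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)     # |Phi_+>
print(np.allclose(rho_out, np.outer(phi_plus, phi_plus.conj())))  # True
```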
Figure 1.6: Coherent Superdense Coding. One ebit plus one use of a noiseless qubit channel are
implemented to realize two uses of the cobit channel.
The coherent superdense coding protocol (see Fig. 1.6) consists of several steps. Initially, Alice and Bob share the maximally entangled state |Φ_+^{AB}⟩. Alice then prepares an input state |x⟩^{A1}|y⟩^{A2}, so that Alice and Bob's initial state (time t_0 in the figure) is |x⟩^{A1}|y⟩^{A2} ⊗ |Φ_+^{AB}⟩. Alice then performs a sequence of two controlled unitary gates: a controlled-X gate on systems A_2 and A, followed by a controlled-Z gate on systems A_1 and A. The resulting state at time t_1 is |x⟩^{A1}|y⟩^{A2} ⊗ |ϕ_xy^{AB}⟩, with |ϕ_xy^{AB}⟩ := (Z^x X^y ⊗ I^B)|Φ_+^{AB}⟩,
where Z^x equals the identity matrix for x = 0 and the third Pauli matrix for x = 1 (X^y is defined similarly). A key observation is that {|ϕ_xy^{AB}⟩}_{x,y∈{0,1}} is precisely the Bell basis, and therefore forms an orthonormal basis of C² ⊗ C². Note also that this encoding (x, y) → |ϕ_xy^{AB}⟩ is done by Alice alone, and is therefore essentially identical to the superdense coding protocol we encountered earlier.
In the next step Alice uses a noiseless qubit channel to transmit system A to Bob. Therefore, at time t_2 the state of the system is |x⟩^{A1}|y⟩^{A2}|ϕ_xy^{B1B2}⟩, where as before,
    |ϕ_xy^{B1B2}⟩ := (Z^x X^y ⊗ I^{B2}) |Φ_+^{B1B2}⟩ .    (1.20)
In the final step, Bob applies a decoding unitary U^{B1B2} that satisfies
    |xy⟩^{B1B2} = U^{B1B2} |ϕ_xy^{B1B2}⟩    ∀ x, y ∈ {0, 1} .    (1.21)
It turns out that the unitary U^{B1B2} defined above can be expressed as a CNOT gate followed by a Hadamard gate on system B_1 (see Bob's side in Fig. 1.6 between time steps t_2 and t_3). Explicitly,
    U^{B1B2} = H|0⟩⟨0|^{B1} ⊗ I^{B2} + H|1⟩⟨1|^{B1} ⊗ X^{B2}
             = |+⟩⟨0|^{B1} ⊗ I^{B2} + |−⟩⟨1|^{B1} ⊗ X^{B2} .    (1.22)
Hence, after the application of the unitary U^{B1B2} on Bob's system, Alice and Bob's state is |x⟩^{A1}|y⟩^{A2}|x⟩^{B1}|y⟩^{B2}. That is, the quantum circuit in Fig. 1.6 simulates the linear transformation
    |x⟩^{A1}|y⟩^{A2} → |x⟩^{A1}|x⟩^{B1} ⊗ |y⟩^{A2}|y⟩^{B2} ,    (1.23)
which is equivalent to two coherent channels. The resources we used to simulate these two
coherent channels are precisely the same ones used in superdense coding to simulate two
noiseless classical channels.
Exercise 1.5.2. Show that the unitary matrix U B1 B2 above satisfies (1.21).
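A brief numerical sanity check of Eq. (1.21), complementing Exercise 1.5.2, is sketched below: the decoder of Eq. (1.22) maps each encoded Bell state |ϕ_xy⟩ to the computational basis state |xy⟩.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

# Eq. (1.22): U = H|0><0| ⊗ I + H|1><1| ⊗ X  (CNOT followed by Hadamard on B1)
P0, P1 = np.diag([1, 0]).astype(complex), np.diag([0, 1]).astype(complex)
U = np.kron(H @ P0, I2) + np.kron(H @ P1, X)

ok = True
for x in (0, 1):
    for y in (0, 1):
        # |phi_xy> = (Z^x X^y ⊗ I)|Phi_+>, as in Eq. (1.20)
        enc = np.linalg.matrix_power(Z, x) @ np.linalg.matrix_power(X, y)
        phi_xy = np.kron(enc, I2) @ phi_plus
        target = np.zeros(4, dtype=complex)
        target[2 * x + y] = 1.0          # the computational basis state |xy>
        ok = ok and np.allclose(U @ phi_xy, target)
print(ok)   # True: U satisfies Eq. (1.21)
```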
Exercise 1.5.3. Suppose the initial ebit shared between Alice and Bob was given in the singlet state |Ψ_-^{AB}⟩ instead of |Φ_+^{AB}⟩, and consider the exact same protocol as in Fig. 1.6, until time step t_2. Revise the unitary matrix U^{B1B2} after time step t_2 so that the protocol still simulates 2 coherent channels.
Figure 1.7: Coherent Quantum Teleportation. Two cobit channels produce one ebit plus one use
of a noiseless qubit channel.
In the next step system A goes through the second cobit channel, V_X, yielding the state
    (1/√2)[ |+⟩^A|+⟩^{B2}(a|0⟩^{B1} + b|1⟩^{B1}) + |−⟩^A|−⟩^{B2}(a|0⟩^{B1} − b|1⟩^{B1}) ] .    (1.26)
Finally, in the last step Bob sends his systems through a CNOT gate. Note that X|+⟩ = |+⟩, so that the CNOT gate only changes |−⟩^{B2}|1⟩^{B1} to −|−⟩^{B2}|1⟩^{B1} while keeping all the other terms intact. Hence, after Bob's CNOT gate, Alice and Bob share the state
    (1/√2)[ |+⟩^A|+⟩^{B2}(a|0⟩^{B1} + b|1⟩^{B1}) + |−⟩^A|−⟩^{B2}(a|0⟩^{B1} + b|1⟩^{B1}) ] = |Φ_+^{AB2}⟩|ψ^{B1}⟩ .    (1.27)
That is, at the end of the protocol Alice has teleported her quantum state |ψ⟩ to Bob's system B_1, and also shares with Bob's system B_2 the maximally entangled state Φ_+^{AB2}.
Coherent quantum teleportation and coherent superdense coding demonstrate that two
cobit channels have the same resource value as one ebit and one use of a qubit channel.
This means that coherent teleportation is the reversal process of coherent superdense coding
and vice versa.
exotic tasks such as quantum teleportation. Moreover, these protocols are considered as the
unit protocols (see e.g. [232]), since they form the building blocks with which one studies
the capabilities of noisy quantum channels to transmit information in asymptotic settings
involving many uses of the channels.
The resource analysis using notations such as [q → q] was first introduced in [62], where the rules of this ‘resource calculus’ were developed. The coherent bit, coherent teleportation, and coherent superdense coding are due to [112]. More details on coherent communication
can be found in the book of Wilde [232].
Part I
Preliminaries

CHAPTER 2
Elements of Quantum Mechanics I: Closed Systems
Quantum mechanics, which was discovered during the first quarter of the twentieth century,
has profoundly transformed our understanding of the world around us. The theory pre-
dicts a plethora of non-intuitive phenomena, including entanglement, quantum non-locality,
wave-particle duality, coherence, the uncertainty principle, quantum contextuality, quantum
steering, and no-cloning, to name a few. This remarkable departure from classical physics
has led to numerous thought-provoking papers and interpretations of quantum mechanics.
To date, there is no consensus on which view of quantum mechanics is the most natural one
to adopt. Additionally, sub-fields of science, such as quantum logic, have arisen from these phenomena, particularly from the inconsistency between classical logic and the uncertainty principle.
The development of quantum mechanics was a gradual process that involved many trials
and errors. It began in 1900 with Max Planck’s discretization of energy values used to
solve the black-body radiation problem. The process continued with Einstein's 1905 paper on the
correspondence between energy and frequency, which provided a quantum explanation for
the photoelectric effect. The process ultimately ended with the formalism that was developed
in the mid-1920s by Erwin Schrödinger, Werner Heisenberg, Max Born, and others.
In this book, we will not review quantum mechanics from a historical or traditional per-
spective. Instead, we will study it in the context of the modern field of quantum information
science, which emerged and developed in the early 1990s. We will discuss the theory’s basic
postulates, its corresponding mathematical structure, and its many consequences and appli-
cations, particularly to information theory, resource theories, and more broadly to physics
and science.
The interplay between the concept of ‘information’ and the field of quantum science is a
complex and multifaceted one. The fundamental principles of quantum mechanics, such as
the superposition of states and entanglement, have led to the development of quantum infor-
mation theory, which studies the processing and transmission of information using quantum
systems. Moreover, the very act of observing a quantum system can alter its state, and
this observation is itself a form of information. This has profound implications for our understanding of the nature of reality and the limits of our ability to measure it. Thus, the
relationship between ‘information’ and quantum science is a rich and nuanced one, encom-
passing a broad range of topics from the foundations of quantum mechanics to the practical
applications of quantum technologies.
In Shannon’s terms, information is defined as “that which can distinguish one thing from
another.” For instance, a coin has two sides, “head” and “tail,” and the ability to differentiate
between the two implies that the coin can store information. Typically, when referring to
the distinguishability of two elements, we denote the options as 0 and 1, and information is
then measured in bits, where the number of bits represents the number of distinguishable
elements. For example, two bits correspond to four possible elements.
The definition of information is abstract and detached from any specific implementation
or labeling. For instance, one bit may correspond to the head or tail of a coin, or the 5-
volt versus 0-volt of an electrical circuit. All information processing, such as communication,
computation, and manipulation of information, can be performed with either coins or electri-
cal circuits. While storing information in coins is impractical (especially for large numbers
of bits), from an information-theoretic perspective, any object can be used to implement
classical bits, and we say that information is fungible.
However, what happens when we attempt to encode information in the spin of an elec-
tron? The electron, being an elementary particle, is uniquely determined by its mass and
spin. The magnitude of its spin can only take one value, ℏ/2 (where ℏ is a unit of angular momentum), and its spin can point in any direction. The Stern-Gerlach experiment, discussed below, demonstrates that it is possible to differentiate between “up” and “down” along the z-direction of the spin of an electron (see Fig. 2.1). This implies that information
can be encoded in the spin of an electron as well.
Are there any advantages to encoding information in the spins of quantum particles, such
as electrons, as opposed to larger classical systems like coins or electrical circuits? If informa-
tion is fungible, why should encoding information in quantum particles make any difference?
Before addressing these questions, we will introduce the Stern-Gerlach experiment and the
necessary elements from linear algebra for the study of quantum physics.
The SG experiment involves emitting electrons or atoms from an oven and passing them
through a non-homogeneous magnetic field. Their spin orientation determines whether they
hit the screen above or below the horizontal line, with the distance from the line reflecting
the magnitude of their spin. The experiment can be performed on any physical system, and
always yields a discrete spectrum for the angular momentum, given by nℏ/2 where n is an
integer. Regions on the screen in Fig. 2.2 where no particles hit indicate this quantization,
which is true regardless of the type of particle used in the experiment.
The SG experiment has two noteworthy implications. Firstly, electrons always hit the
screen at the same distance from the horizontal line, indicating a consistent magnitude of ℏ/2 for their spin. Secondly, all electrons hit the screen in the same two areas, irrespective of
their initial spin direction. Even if an electron’s initial spin is pointing in the x-direction, it
should have zero spin in the z-direction and therefore should not be deflected by the non-zero
z-gradient of the magnetic field. However, the fact that electrons still hit the screen in the
same two areas shows that measuring the spin in one direction affects its value in another
direction.
To verify that an electron’s spin remains intact after it is deflected by the SG experiment,
we can concatenate two SG experiments in a variety of ways. For example, suppose we want
to confirm that an electron deflected upwards in the first SG box has a spin pointing in the
upward z-direction. We can set up the experiment as shown in Fig. 2.3 (a). After passing
through the first SG box, only electrons deflected upwards will continue on to the second
SG box, while those deflected downward are blocked. After passing through the second SG
box, the electrons hit the screen. We observe that no electrons hit the screen below the
horizontal line, indicating that all electrons deflected upwards in the first SG box also have a
spin pointing in the upward z-direction. This implies that after an electron’s spin has been
measured, it remains intact by further measurements in the same direction.
Another interesting feature of the SG experiment is that it allows us to measure the spin
of particles in different directions. For instance, we can measure the spin in the x-direction
by using a modified version of the experiment. In this case, the electrons are deflected to
the left or to the right, depending on their spin in the x-direction. Figure 2.3 (b) shows a
schematic of this experiment, where the first SG box measures the spin in the x-direction,
and the second SG box measures the spin in the z-direction.
The results of this experiment are surprising from a classical point of view. According
to classical physics, any physical system with its angular momentum pointing in the x-
direction should have zero angular momentum pointing in the z-direction. However, in the
SG experiment, we find that 50% of the electrons are deflected upward and 50% are deflected
downward after passing through the second SG box. This shows that the spin in different
directions of a quantum particle is not fully determined by its spin in one particular direction,
and that measurements of spin along different axes can yield nontrivial and unexpected
results.
What happens to a particle with spin in the x-direction when it passes through an SG-
experiment in the z-direction, followed by an SG-experiment in the x-direction? Will its
resulting spin in the x-direction remain the same? Figure 2.4 illustrates such a scenario
with three SG-boxes. The first box filters only particles with spins in the positive (right) x-
direction. The second box measures their spin in the z-direction, and the third box measures
their spin in the x-direction. Remarkably, there is a 50% chance that the resulting particle
will have spin in the negative x-direction. This implies that the measurement in the z-
direction erases the information about the spin in the x-direction completely.
This phenomenon, known as quantum mechanical complementarity, is a fundamental
feature of quantum mechanics and is one of the most profound concepts in physics. It
implies that it is impossible to simultaneously measure certain pairs of physical quantities
with arbitrary precision, such as position and momentum, or spin in different directions. Any
measurement of one quantity necessarily disturbs the other, and the uncertainty principle
sets a fundamental limit on the precision with which they can be measured simultaneously.
Remark. We adopt the notation ⟨ | ⟩ instead of ⟨ , ⟩, as it conforms with the Dirac notation
that we will define shortly. Additionally, while most mathematics textbooks define the
linearity property with respect to the first argument, we will use the aforementioned notation
to suitably align with the Dirac notation. We use the letters A, B, and C to denote Hilbert
spaces since, in quantum physics, Hilbert spaces correspond to physical systems that are
operated on by parties such as Alice, Bob, Charlie, etc.
The inner product induces a norm defined by
    ∥ψ∥_2 := ⟨ψ|ψ⟩^{1/2}    ∀ ψ ∈ A ,    (2.2)
and a metric
d(ψ, ϕ) := ∥ψ − ϕ∥2 ∀ ψ, ϕ ∈ A . (2.3)
A Norm
Definition 2.2.2. A norm is a real-valued function ∥ · ∥ defined on the vector space, A, that has the following properties:
1. Non-negativity: ∥ψ∥ ⩾ 0 for all ψ ∈ A, with equality if and only if ψ = 0.
2. Homogeneity: ∥cψ∥ = |c| ∥ψ∥ for all ψ ∈ A and all scalars c.
3. The triangle inequality: ∥ψ + ϕ∥ ⩽ ∥ψ∥ + ∥ϕ∥ for all ψ, ϕ ∈ A.
Exercise 2.2.1. Show that the norm defined by the inner product in equation (2.2) indeed
satisfies the three fundamental properties of a norm.
Exercise 2.2.2. Let A be a normed space, and let ψ, ϕ ∈ A with ∥ϕ∥ = 1. Show that
    ∥ (1/∥ψ∥) ψ − ϕ ∥ ⩽ 2 ∥ψ − ϕ∥ .    (2.4)
Hint: Write (1/∥ψ∥) ψ − ϕ = ( (1/∥ψ∥) ψ − ψ ) + ( ψ − ϕ ) and use the norm properties.
It is important to note that not all norms are derived from an inner product, and as a
result, not every metric necessarily originates from either an inner product or a norm.
Exercise 2.2.3 (The p-Norms and The Hölder Inequality). The normed space ℓ_p(Cⁿ) (with p ∈ [1, ∞]) is the vector space Cⁿ equipped with the p-norm, ∥ · ∥_p, defined on all ψ = (a_1, . . . , a_n)^T ∈ Cⁿ as:
    ∥ψ∥_p := (|a_1|^p + · · · + |a_n|^p)^{1/p} .    (2.5)
The Hölder inequality states that for any p, q ∈ [1, ∞] with 1/p + 1/q = 1 and any ψ = (a_1, . . . , a_n)^T ∈ Cⁿ and ϕ = (b_1, . . . , b_n)^T ∈ Cⁿ we have
    Σ_{x∈[n]} |a_x b_x| ⩽ ∥ψ∥_p ∥ϕ∥_q .    (2.6)
Use this to show that ∥ · ∥p is a norm, and that for p ̸= 2 this norm is not induced from an
inner product.
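The following short NumPy illustration (with randomly chosen vectors and a few illustrative Hölder pairs (p, q)) shows the p-norms of Eq. (2.5) and the inequality (2.6) in action.

```python
import numpy as np

def p_norm(v, p):
    """The p-norm of Eq. (2.5), including the limiting case p = infinity."""
    if np.isinf(p):
        return np.max(np.abs(v))
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

rng = np.random.default_rng(0)
psi = rng.normal(size=5) + 1j * rng.normal(size=5)
phi = rng.normal(size=5) + 1j * rng.normal(size=5)

for p, q in [(1, np.inf), (2, 2), (3, 1.5), (4, 4 / 3)]:   # pairs with 1/p + 1/q = 1
    lhs = np.sum(np.abs(psi * phi))
    rhs = p_norm(psi, p) * p_norm(phi, q)
    print(p, q, bool(lhs <= rhs + 1e-12))   # True in every case
```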
The p-norms above have numerous applications in many fields of science. Of particular importance are the extreme cases of p = 1 and p = ∞:
    ∥ψ∥_1 = Σ_{x∈[n]} |a_x|   and   ∥ψ∥_∞ = max_{x∈[n]} |a_x| ,    (2.7)
where throughout the book we use the notation [n] := {1, . . . , n} for every integer n ∈ N. For these cases, the norms above behave monotonically under stochastic matrices, as we show now.
Let S = (s_xy) ∈ STOCH(m, n) be an m × n column stochastic matrix; i.e. a matrix whose components are non-negative real numbers and whose columns each sum to one. Then, the 1-norm behaves monotonically under such matrices; i.e.
    ∥Sv∥_1 ⩽ ∥v∥_1    ∀ v ∈ Cⁿ .    (2.8)
Indeed, by definition,
    ∥Sv∥_1 = Σ_{x∈[m]} | Σ_{y∈[n]} s_xy v_y | ⩽ Σ_{x∈[m]} Σ_{y∈[n]} s_xy |v_y| = Σ_{y∈[n]} |v_y| = ∥v∥_1 .    (2.9)
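The monotonicity of the 1-norm under column stochastic matrices (Eqs. (2.8)–(2.9)) is easy to test numerically; the sketch below draws a random column-stochastic matrix and a random complex vector and confirms that the 1-norm does not increase.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 6
S = rng.random((m, n))
S /= S.sum(axis=0, keepdims=True)     # normalize columns: S is column stochastic

v = rng.normal(size=n) + 1j * rng.normal(size=n)
one_norm = lambda w: np.sum(np.abs(w))
print(bool(one_norm(S @ v) <= one_norm(v) + 1e-12))   # True, as in Eq. (2.8)
```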
Exercise 2.2.5. Show that L2 (R) satisfies all the axioms of a Hilbert space.
To distinguish more clearly between Hilbert spaces and inner product spaces, consider
the space C([a, b]) of continuous complex-valued functions. This infinite-dimensional vector
space is equipped with an inner product given by:
    ⟨g|f⟩ := ∫_a^b ḡ(t) f(t) dt    ∀ f, g ∈ C([a, b]) .    (2.14)
However, this space is not a Hilbert space since it is not complete with respect to the metric
induced by this inner product. For example, consider the following sequence of continuous
functions in C([−1, 1]):
    f_k(t) := { 0 if t ∈ [−1, 0) ;   kt if t ∈ [0, 1/k) ;   1 if t ∈ [1/k, 1] } .    (2.15)
Note that while the sequence is Cauchy, its limit does not exist in C([−1, 1]) as fk cannot
converge to a continuous function. However, this book mostly considers finite-dimensional
Hilbert spaces, in which all inner product spaces are complete and isomorphic to Cn (or Rn ).
Thus, such examples are not relevant for our purposes, and we will use the terms ‘Hilbert
space’ and ‘inner product space’ interchangeably. Similarly, in finite dimensions, all normed
spaces are Banach spaces (i.e., complete normed spaces).
⟨M |N ⟩ := Tr [M ∗ N ] (2.16)
where M ∗ denotes the adjoint of M . This specific inner product is referred to as the ‘Hilbert-
Schmidt’ inner product, and it is occasionally denoted with a subscript as ⟨ | ⟩HS .
Exercise 2.2.6. Consider the space Cm×n .
1. Show that the definition in (2.16) satisfies the three axioms of an inner product.
2. Find an isometric isomorphism vec : C^{m×n} → C^{mn} such that for all M, N ∈ C^{m×n},
    ⟨M|N⟩_HS = ⟨vec(M)|vec(N)⟩_2 ,
where ⟨ | ⟩_HS is the Hilbert-Schmidt inner product of C^{m×n}, and ⟨ | ⟩_2 is the standard inner product of C^{mn}.
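One natural choice for the isomorphism asked for in the exercise is the column-stacking map; the snippet below (a sketch, with randomly generated matrices) confirms that it matches the Hilbert-Schmidt inner product with the standard inner product of C^{mn}.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 4
M = rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))
N = rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))

vec = lambda A: A.reshape(-1, order='F')    # stack the columns of A into a vector
hs = np.trace(M.conj().T @ N)               # Hilbert-Schmidt inner product <M|N>
std = vec(M).conj() @ vec(N)                # standard inner product of the vectorizations
print(np.allclose(hs, std))                 # True
```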
2. Pythagorean theorem: If ψ1 , . . . , ψn are orthogonal vectors, that is, ⟨ψx |ψy ⟩ = 0 for
distinct indices x, y ∈ [n], then
    ∥ Σ_{x∈[n]} ψ_x ∥² = Σ_{x∈[n]} ∥ψ_x∥² .    (2.19)
The dual space, A∗ , of a Hilbert space, A, is defined as the set of all linear functionals
on A. A linear functional is a function f : A → F with the property that for all |ψ⟩, |ϕ⟩ ∈ A
and a, b ∈ F we have f (a|ψ⟩ + b|ϕ⟩) = af (|ψ⟩) + bf (|ϕ⟩). For a fixed vector |χ⟩ ∈ A, the
function fχ : A → F defined by fχ (|ψ⟩) := ⟨χ|ψ⟩ is a linear functional, and every linear
functional has this form. It is therefore convenient to denote the linear functionals with a
‘bra’ notation. That is, instead of fχ , we denote this functional simply by ⟨χ|, so that its
action on an element |ψ⟩ is given by the inner product ⟨χ|ψ⟩. Hence, A∗ consists of bra
vectors.
As an example, the dual space of C² is spanned by the standard bra basis {⟨0|, ⟨1|}. Note that there is a one-to-one correspondence between A = Cⁿ and its dual A*, via the bijective mapping
    |ψ⟩ = Σ_{x∈[n]} c_x |x⟩   ↦   ⟨ψ| = Σ_{x∈[n]} c̄_x ⟨x| ,    (2.22)
where cx ∈ C and c̄x denotes the complex conjugate of cx . Moreover, we will denote
⟨ψ|ρ|ϕ⟩ := ⟨ψ|ρϕ⟩, where ρ is a linear transformation (see below).
Identifying an element |ψ⟩ ∈ A with an element (ψ, 0) ∈ C and an element |ϕ⟩ ∈ B with an
element (0, ϕ) ∈ C, we conclude that C = A ⊕ B. For example, R3 can be decomposed as
R2 ⊕ R or as R ⊕ R ⊕ R.
Let A and B be two finite dimensional Hilbert spaces with dimensions |A| and |B|,
respectively. We define a bilinear function ⊗ that takes two vectors |ψ⟩ ∈ A and |ϕ⟩ ∈ B
and returns an element of the form |ψ⟩ ⊗ |ϕ⟩. The bilinearity of ⊗ means that for all c ∈ F,
and all vectors |ψ1 ⟩, |ψ2 ⟩ ∈ A, and |ϕ1 ⟩, |ϕ2 ⟩ ∈ B,
1. (|ψ_1⟩ + |ψ_2⟩) ⊗ |ϕ⟩ = |ψ_1⟩ ⊗ |ϕ⟩ + |ψ_2⟩ ⊗ |ϕ⟩ ,
2. |ψ⟩ ⊗ (|ϕ_1⟩ + |ϕ_2⟩) = |ψ⟩ ⊗ |ϕ_1⟩ + |ψ⟩ ⊗ |ϕ_2⟩ ,
3. c(|ψ⟩ ⊗ |ϕ⟩) = (c|ψ⟩) ⊗ |ϕ⟩ = |ψ⟩ ⊗ (c|ϕ⟩) .
From its definition above, it follows that A ⊗ B is a vector space with an orthonormal basis
{|x⟩A ⊗|y⟩B }. In particular, note that |A⊗B| = |AB| := |A||B| (we will therefore sometimes
use the notation AB to mean A ⊗ B). The inner product between two elements |ψ1 ⟩ ⊗ |ϕ1 ⟩
and |ψ2 ⟩ ⊗ |ϕ2 ⟩ is simply given by the product of the inner products; i.e. ⟨ψ2 |ψ1 ⟩⟨ϕ2 |ϕ1 ⟩.
More generally, given two states
    |ψ^{AB}⟩ = Σ_{x∈[m]} Σ_{y∈[n]} µ_xy |x⟩^A ⊗ |y⟩^B   and   |ϕ^{AB}⟩ = Σ_{x∈[m]} Σ_{y∈[n]} ν_xy |x⟩^A ⊗ |y⟩^B ,    (2.26)
their inner product is given by
    ⟨ψ^{AB}|ϕ^{AB}⟩ = Tr[M* N] ,    (2.27)
where the matrices M := (µxy ) and N := (νxy ). Using these definitions, the set A ⊗ B forms
a Hilbert space. Notably, the inner product defined in Equation (2.27) is the same as the
one defined in Equation (2.16). This is because each element |ψ⟩ ∈ A ⊗ B can be represented
as a matrix M = (µxy ). It can be easily demonstrated that this mapping between bipartite
vectors and matrices is an isometric isomorphism.
Exercise 2.2.8. Show that Cm×n ≅ Cm ⊗ Cn.
Hence, the vector |ψ^A⟩|ϕ^B⟩ = Σ_{x∈[m]} Σ_{y∈[n]} a_x b_y |xy⟩^{AB} corresponds to the vector a ⊗ b given by
a ⊗ b := (a_1 b, a_2 b, . . . , a_m b)^T ∈ C^{mn} .    (2.28)
The above definition of a tensor product between vectors in Cm and Cn is the Kronecker
product which is also denoted with the symbol ⊗. The Kronecker product is defined on
arbitrary matrices as follows. Let M = (µxy ) ∈ Ck×ℓ and N ∈ Cp×q . The Kronecker product
M ⊗ N is a matrix in Ckp×ℓq defined by
M ⊗ N := [ μ_{11}N   ⋯   μ_{1ℓ}N
             ⋮        ⋱     ⋮
           μ_{k1}N   ⋯   μ_{kℓ}N ] .    (2.29)
It is simple to check that the tensor product above is bilinear and associative, however, it
is not commutative. Also, note that for A = Cn×m and B = Cp×q the tensor product given
in (2.25) is equivalent to the Kronecker product. We will therefore use the terms tensor
product and Kronecker product interchangeably.
Exercise 2.2.9. Let M be an m × n matrix and N be an n × k matrix. Find all the values
of m, n, k for which M ⊗ N = M N .
Exercise 2.2.10. Show that the Kronecker product is not commutative, and that there always exist permutation matrices P and Q of appropriate dimensions such that M ⊗ N = P (N ⊗ M) Q.
Exercise 2.2.11. Prove the following properties. For any matrices K, L, M, N in appropri-
ate dimensions (in some cases square matrices):
1. (K ⊗ L)(M ⊗ N ) = KM ⊗ LN .
2. K ⊗ L is invertible if and only if K and L are invertible and in this case (K ⊗ L)−1 =
K −1 ⊗ L−1 .
6. Rank(K ⊗ L) = Rank(K)Rank(L).
Kronecker also defined a direct sum that is closely related to the definition above. Given
two matrices M ∈ Cm×m and N ∈ Cn×n their Kronecker sum is defined by
M ⊕ N := M ⊗ In + Im ⊗ N . (2.30)
The sum appears naturally in physics, typically when describing the Hamiltonian of a com-
posite system consisting of non-interacting subsystems. A celebrated result connecting the
Kronecker product and Kronecker sum is given in the following exponential relation:
eM ⊕N = eM ⊗ eN . (2.31)
Exercise 2.2.12. Prove the above equality. Hint: Use the formula e^M = Σ_{n=0}^∞ M^n/n! and the commutativity of M ⊗ I_n and I_m ⊗ N.
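As a quick numerical sanity check of (2.31), purely illustrative and with randomly generated matrices, one can compare the two sides directly:

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
m, n = 3, 2
M = rng.normal(size=(m, m))
N = rng.normal(size=(n, n))

kron_sum = np.kron(M, np.eye(n)) + np.kron(np.eye(m), N)   # M ⊕ N
lhs = expm(kron_sum)                                       # e^{M ⊕ N}
rhs = np.kron(expm(M), expm(N))                            # e^M ⊗ e^N
print(np.allclose(lhs, rhs))                               # True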
For improved clarity in our exposition, we have omitted the superscripts A and B from |x⟩A
and |y⟩B . This use of the Dirac notations has the advantage that the action of M on a vector
|ψ⟩ ∈ A becomes
M|ψ⟩ = Σ_{y∈[n]} Σ_{x∈[m]} μ_{yx} |y⟩⟨x|ψ⟩ = Σ_{y∈[n]} Σ_{x∈[m]} μ_{yx} ⟨x|ψ⟩ |y⟩ .    (2.34)
Note that the numbers μ_{yx} form a matrix which is known as the matrix representation of M. Sometimes we will identify the matrix (μ_{yx}) with the operator M and write M = (μ_{yx}). However, note that different choices of orthonormal bases {|x⟩^A}_{x∈[m]} and {|y⟩^B}_{y∈[n]} correspond to different matrix representations (μ_{yx}) of the same linear operator M (see Exercise 2.3.3).
Exercise 2.3.1. Show that any linear operator M : A → B can be expressed as
M = Σ_{z∈[k]} λ_z |v_z^B⟩⟨u_z^A| ,    (2.35)
Exercise 2.3.3. Let M be a linear operator as in (2.33), and denote by M̃ the matrix whose components are μ_{yx} := ⟨y|M|x⟩ (i.e. M is an operator whereas M̃ is a matrix). Let {|a_x⟩} and {|b_y⟩} be two orthonormal bases of A and B, respectively. Show that there exist two unitary matrices U and V (not necessarily of the same size) such that
M = Σ_{x∈[m]} Σ_{y∈[n]} ν_{yx} |b_y⟩⟨a_x| ,    (2.39)
For any linear operator T : A → B its kernel, denoted Ker(T), is the subspace of A consisting of all vectors |ψ⟩ ∈ A such that T|ψ⟩ = 0. The image of T, denoted by Im(T), is the set of vectors {T|ψ⟩} over all vectors |ψ⟩ ∈ A. Finally, the support of T, denoted supp(T), is also a subspace of A, consisting of all the vectors that are orthogonal to all the elements in Ker(T). In particular, for any non-zero vector |ψ⟩ ∈ supp(T) we have T|ψ⟩ ≠ 0.
Exercise 2.3.4. Let A and B be two Hilbert spaces, and let T : A → B be a linear trans-
formation.
where k := |supp(V)| ⩽ min{|A|, |B|}, {|a_z⟩}_{z∈[k]} is an orthonormal set of vectors in A, and {|b_z⟩}_{z∈[k]} is an orthonormal set of vectors in B. In other words, a linear operator V : A → B is a partial isometry if and only if there exist two sets of orthonormal vectors, {|a_z⟩}_{z∈[k]} ⊂ A and {|b_z⟩}_{z∈[k]} ⊂ B, such that (2.43) holds.
Exercise 2.3.5. Show that if Eq. (2.41) holds then V ∗ V = I A .
Exercise 2.3.6. Use (2.35) to show (2.43).
Exercise 2.3.7. A linear operator Π : A → A is called an orthogonal projection if and only
if Π2 = Π = Π∗ .
1. Show that Π : A → A is an orthogonal projection if and only if Π∗ Π = Π.
2. Show that V : A → B is a partial isometry if and only if V ∗ V is an orthogonal
projection in A, and V V ∗ is an orthogonal projection in B.
Exercise 2.3.8. Let A ⊆ B be a subspace of B, and let V : A → B be an isometry satisfying
V ∗ V = Π, where Π : B → A is the projection onto the subspace A. Show that there exists a
unitary matrix U : B → B such that
UΠ = V . (2.44)
where {|v_x⟩}_{x∈[m]} is an orthonormal basis of A. The coefficients {λ_x}_{x∈[m]} are the eigenvalues of H. We denote by Tr[H] the trace of a Hermitian operator H : A → A. That is,
Tr[H] := Σ_{x∈[m]} ⟨x|H|x⟩ .    (2.46)
Note that the definition above is independent of the choice of the orthonormal basis {|x⟩} of A.
We say that a linear operator ρ : A → A is positive semidefinite, and write ρ ⩾ 0, if and
only if
⟨ψ|ρ|ψ⟩ ⩾ 0 ∀ |ψ⟩ ∈ A . (2.47)
If the above inequality is strict for all non-zero |ψ⟩ ∈ A then we say that ρ is positive definite
and write ρ > 0. We will also write ρ ⩾ σ to mean ρ − σ ⩾ 0, and will use the Greek
letters such as ρ and σ to denote linear operators that are positive semidefinite. The set of
all positive semidefinite operators acting on Hilbert space A will be denoted by Pos(A).
Every positive linear operator ρ : A → A is necessarily Hermitian. To see why, observe
that the positivity property above implies
Now, observe that the operator N := ρ − ρ∗ satisfies N ∗ N = N N ∗ . Such operators are called
normal operators and are known to be diagonalizable. Therefore, taking |ψ⟩ above to be
an eigenvector of N we conclude that all the eigenvalues of N are zero. Hence, N = 0 or
equivalently ρ = ρ∗ .
Exercise 2.3.9. Let ρ : A → A be a linear operator. Show that the following are equivalent:
1. ρ ⩾ 0
Exercise 2.3.11. Show that for any two vectors |ψ⟩, |ϕ⟩ ∈ A
Note that the identity operator I^A : A → A is a positive operator (i.e. an operator with all eigenvalues strictly greater than zero) given by
I^A = Σ_{x∈[m]} |x⟩⟨x| .    (2.51)
where {|ϕx ⟩}x∈[n] is an orthonormal basis, and the eigenvalues {λx }x∈[n] are all real. There-
fore, it is possible to decompose H as
H = H+ − H− (2.54)
where
H_+ := Σ_{x: λ_x ⩾ 0} λ_x |ϕ_x⟩⟨ϕ_x| ⩾ 0   and   H_− := Σ_{x: λ_x < 0} |λ_x| |ϕ_x⟩⟨ϕ_x| ⩾ 0 .    (2.55)
By definition H_+, H_− ⩾ 0 and H_+H_− = H_−H_+ = 0. Further, denote by Π_− := Σ_{x: λ_x < 0} |ϕ_x⟩⟨ϕ_x| the projection to the negative eigenspace of H, and by Π_+ = I − Π_− the projection to the
non-negative eigenspace of H. Then, H± = HΠ± = Π± H.
H_± = (|H| ± H)/2 ,    (2.56)
where H± are the positive and negative parts of H as defined in (2.55).
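The decomposition above is easy to reproduce numerically; the following NumPy sketch (our own illustration) computes H_± from the spectral decomposition of a random Hermitian matrix and verifies (2.56).

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (X + X.conj().T) / 2                         # a random Hermitian matrix

evals, evecs = np.linalg.eigh(H)
H_plus  = evecs @ np.diag(np.maximum(evals, 0))  @ evecs.conj().T
H_minus = evecs @ np.diag(np.maximum(-evals, 0)) @ evecs.conj().T
abs_H   = evecs @ np.diag(np.abs(evals))         @ evecs.conj().T   # |H|

print(np.allclose(H, H_plus - H_minus))          # H = H_+ - H_-
print(np.allclose(H_plus,  (abs_H + H) / 2))     # (2.56) with the + sign
print(np.allclose(H_minus, (abs_H - H) / 2))     # (2.56) with the - sign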
Note that the above pure state decomposition of ρ is not necessarily the spectral decom-
position since the pure states |ψx ⟩ are not necessarily orthogonal. In fact, any quantum
state corresponds to infinitely many ensembles of quantum states. For example, consider a
quantum state ρ : C2 → C2 defined by:
ρ = (1/4)|0⟩⟨0| + (3/4)|1⟩⟨1| .    (2.59)
Clearly, this is the spectral decomposition of ρ. Now, it is simple to check that ρ can also
be expressed as
ρ = (1/2)|u⟩⟨u| + (1/2)|v⟩⟨v| ,    (2.60)
where
|u⟩ := √(1/4)|0⟩ + √(3/4)|1⟩ ,   |v⟩ := √(1/4)|0⟩ − √(3/4)|1⟩ .    (2.61)
Note that |u⟩ and |v⟩ are not orthogonal, and both ensembles in (2.59) and (2.60) correspond to the same quantum state ρ.
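A quick numerical check of this example (illustrative only):

import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
u = np.sqrt(1/4) * ket0 + np.sqrt(3/4) * ket1
v = np.sqrt(1/4) * ket0 - np.sqrt(3/4) * ket1

rho1 = 0.25 * np.outer(ket0, ket0) + 0.75 * np.outer(ket1, ket1)   # (2.59)
rho2 = 0.5 * np.outer(u, u) + 0.5 * np.outer(v, v)                 # (2.60)
print(np.allclose(rho1, rho2))                                     # True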
Exercise 2.3.15. Let {|ψx ⟩, px }x∈[m] and {|ϕy ⟩, qy }y∈[n] be two ensembles of quantum states
in A with m ⩾ n. Show that they correspond to the same density matrix
ρ := Σ_{x∈[m]} p_x |ψ_x⟩⟨ψ_x| = Σ_{y∈[n]} q_y |ϕ_y⟩⟨ϕ_y|    (2.62)
if and only if there exists an m × n isometry matrix V = (vxy ) (i.e. V ∗ V = In ) such that
√p_x |ψ_x⟩ = Σ_{y∈[n]} v_{xy} √q_y |ϕ_y⟩   ∀ x ∈ [m] .    (2.63)
The case p = 1 is often called the trace norm and we will discuss it in detail in Sec. 5.4.1. The case p = ∞ is understood in terms of the limit p → ∞. It is often called the operator norm and is given by
∥M ∥∞ = λmax (|M |) , (2.69)
where λmax (|M |) is the largest eigenvalue of |M |, or equivalently, the largest singular value
of M. The Schatten norms appear quite often in quantum Shannon theory due to their relation to the Rényi entropies that we will study later on. We leave it as an exercise for the reader to prove some of their key properties.
Exercise 2.3.20. Let A and B be two Hilbert spaces, M, N ∈ L(A, B), and p, q ∈ [1, ∞] such that 1/p + 1/q = 1. Show that the p-Schatten norm is indeed a norm satisfying the following properties:
1. Invariance. For any two Hilbert spaces A′ , B ′ with |A′ | ⩾ |A| and |B ′ | ⩾ |B|, and
any isometries V ∈ L(B, B ′ ) and U ∈ L(A, A′ )
∥V M U ∗ ∥p = ∥M ∥p . (2.70)
2. Hölder Inequality.
∥M N ∥1 ⩽ ∥M ∥p ∥N ∥q . (2.71)
3. Sub-Multiplicativity.
∥M N ∥p ⩽ ∥M ∥p ∥N ∥p . (2.72)
4. Monotonicity. If p ⩽ q
∥M ∥1 ⩾ ∥M ∥p ⩾ ∥M ∥q ⩾ ∥M ∥∞ . (2.73)
5. Duality.
∥M∥_p = sup { |Tr[M*L]| : ∥L∥_q = 1 , L ∈ L(A, B) } .    (2.74)
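Numerically, the p-Schatten norm can be computed from the singular values; the following sketch (our own illustration) does this and checks the Hölder inequality (2.71) on a random pair of matrices.

import numpy as np

def schatten(M, p):
    s = np.linalg.svd(M, compute_uv=False)       # singular values of M
    if np.isinf(p):
        return s.max()
    return (s ** p).sum() ** (1.0 / p)

rng = np.random.default_rng(3)
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
N = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
p, q = 3.0, 1.5                                  # 1/p + 1/q = 1
print(schatten(M @ N, 1) <= schatten(M, p) * schatten(N, q) + 1e-12)   # True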
Exercise 2.3.21 (Young’s Inequality). Let A and B be two Hilbert spaces, M, N ∈ Pos(A), and p, q ∈ [1, ∞) such that 1/p + 1/q = 1. Use the Hölder inequality of the Schatten norm to show that
Tr[MN] ⩽ (1/p) Tr[M^p] + (1/q) Tr[N^q] ,    (2.75)
with equality if and only if M^p = N^q. Hint: Take the logarithm on both sides of the Hölder inequality and use the concavity property of the logarithm.
Exercise 2.3.22. Show that for any M ∈ Herm(A), the operator norm and the trace norm
can be expressed as
∥M∥_1 = max_{η∈Herm(A), ∥η∥_∞ ⩽ 1} Tr[ηM]   and   ∥M∥_∞ = max_{η∈Herm(A), ∥η∥_1 ⩽ 1} Tr[ηM] .    (2.76)
∥M∥_{(k)} := s_1 + s_2 + · · · + s_k ,    (2.77)
where s_1 ⩾ s_2 ⩾ · · · ⩾ s_k denote the k largest singular values of M.
Remark. When restricting M to be a real diagonal matrix we get the following definition of the Ky Fan norm on Rn: For any r ∈ Rn and k ∈ [n] the kth Ky Fan norm of r is defined as
∥r∥_{(k)} := Σ_{x∈[k]} |r_x^↓| ,    (2.78)
where {r_x^↓}_{x∈[n]} are the components of r arranged such that |r_1^↓| ⩾ |r_2^↓| ⩾ · · · ⩾ |r_n^↓|.
The Ky Fan norms play an important role in the resource theory of entanglement. We leave it as an exercise to prove that the Ky Fan norms are indeed norms.
Exercise 2.3.23. Show that the Ky Fan norms are indeed norms that have the following
invariance property. Using the same notations as in the definition above, show that for any
two Hilbert spaces A′ , B ′ with |A′ | ⩾ |A| and |B ′ | ⩾ |B|, and any isometries V ∈ L(B, B ′ )
and U ∈ L(A, A′ )
∥V M U ∗ ∥(k) = ∥M ∥(k) . (2.79)
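Numerically, the Ky Fan norms can be computed directly from the singular values; the sketch below (illustrative only) also checks the invariance property (2.79) using randomly generated isometries.

import numpy as np

def ky_fan(M, k):
    s = np.linalg.svd(M, compute_uv=False)
    return np.sort(s)[::-1][:k].sum()

rng = np.random.default_rng(4)
M = rng.normal(size=(3, 4)) + 1j * rng.normal(size=(3, 4))

# Random isometries V: C^3 -> C^5 and U: C^4 -> C^6 (orthonormal columns).
V = np.linalg.qr(rng.normal(size=(5, 3)) + 1j * rng.normal(size=(5, 3)))[0]
U = np.linalg.qr(rng.normal(size=(6, 4)) + 1j * rng.normal(size=(6, 4)))[0]

for k in range(1, 4):
    print(np.isclose(ky_fan(V @ M @ U.conj().T, k), ky_fan(M, k)))   # True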
where the supremum is over all orthogonal projections Π with rank no greater than k.
Exercise 2.3.25 (The Ky Fan norms on Rn ). Consider the variant of the Ky Fan norm as
defined in (13.110).
3. Show that for any two probability vectors p, q ∈ Prob(n) and any k ∈ [n] we have
∥p − q∥_{(k)} ⩽ (1/2) ∥p − q∥_1    (2.82)
and conclude that
∥p∥_{(k)} − ∥q∥_{(k)} ⩽ (1/2) ∥p − q∥_1 .    (2.83)
Hint: For the first inequality use the fact that (1/2)∥p − q∥_1 = Σ_{x∈[n]} (p_x − q_x)_+, and for the second inequality use the properties of a norm. For any r ∈ R the symbol (r)_+ := r if r ⩾ 0 and otherwise (r)_+ := 0.
where μ_{xy} ∈ C. Let M_ψ be a linear map from B̃ to A defined below by its action on the basis elements {|y⟩^B̃} of B̃:
M_ψ |y⟩^B̃ := Σ_{x∈[m]} μ_{xy} |x⟩^A .    (2.85)
Denoting by
|Ω^{B̃B}⟩ := Σ_{y∈[n]} |yy⟩^{B̃B}    (2.87)
we conclude that
|ψ^{AB}⟩ = (M_ψ ⊗ I^B) |Ω^{B̃B}⟩ .    (2.88)
Therefore, for any bipartite vector |ψ^{AB}⟩ there is a corresponding linear map M_ψ : B̃ → A and vice versa. In other words, the mapping
|ψ^{AB}⟩ ↦ M_ψ    (2.89)
is an isometric isomorphism between the space AB and the space Cm×n.
|ΩB̃B ⟩ has many interesting properties, and later on we will see that, physically, its normalized
version corresponds to a composite system of two maximally entangled subsystems.
Exercise 2.3.26. Prove the following properties of |ΩAÃ ⟩:
1. For any matrix N ∈ L(Ã)
Mψ = LMφ RT . (2.93)
Exercise 2.3.27. Show that the two reduced density matrices above, ρ_ψ^A and ρ_ψ^B, have the same non-zero eigenvalues.
The Partial Trace
Exercise 2.3.28. Let TrB : L(AB) → L(A) be a linear map defined by its action on
the basis elements of L(AB) as:
It is well known that the trace remains invariant under cyclic permutations of a product of matrices. The following exercise states that this remains true also for the partial trace.
Exercise 2.3.29. Show that for any ρ ∈ L(AB) and any two matrices η, ζ ∈ L(B)
Tr_B[(I^A ⊗ η^B) ρ^{AB} (I^A ⊗ ζ^B)] = Tr_B[(I^A ⊗ ζ^B η^B) ρ^{AB}] .    (2.97)
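Although the general definition of the partial trace is developed in the exercises above, it may help to see one concrete numerical realization; the following NumPy sketch (our own index-bookkeeping convention) implements Tr_B by reshaping and verifies the cyclicity property (2.97).

import numpy as np

def partial_trace_B(rho, dA, dB):
    # rho acts on A ⊗ B; returns Tr_B[rho] acting on A
    return np.trace(rho.reshape(dA, dB, dA, dB), axis1=1, axis2=3)

rng = np.random.default_rng(5)
dA, dB = 2, 3
X = rng.normal(size=(dA * dB, dA * dB)) + 1j * rng.normal(size=(dA * dB, dA * dB))
rho = X @ X.conj().T
rho = rho / np.trace(rho)                       # a random density matrix on AB

eta  = rng.normal(size=(dB, dB)) + 1j * rng.normal(size=(dB, dB))
zeta = rng.normal(size=(dB, dB)) + 1j * rng.normal(size=(dB, dB))
IA = np.eye(dA)

lhs = partial_trace_B(np.kron(IA, eta) @ rho @ np.kron(IA, zeta), dA, dB)
rhs = partial_trace_B(np.kron(IA, zeta @ eta) @ rho, dA, dB)
print(np.allclose(lhs, rhs))                    # (2.97) holds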
For any pure bipartite state as in (2.88) there is a unique reduced density matrix ρ_ψ^A. On the other hand, for any density matrix ρ ∈ D(A) there are many bipartite pure states |ψ^{AB}⟩ with the same reduced density matrix ρ (see the exercise below).
Exercise 2.3.30. Let ρ ∈ L(AB). Show that if for all η ∈ L(B) the matrix
Tr_B[(I^A ⊗ η^B) ρ^{AB}]    (2.98)
is proportional to the identity matrix, then ρ^{AB} = u^A ⊗ ρ^B, where u^A := (1/|A|) I^A is the uniform density matrix, also known as the maximally mixed state. Hint: Use Part 3 of Exercise 2.3.18.
Exercise 2.3.31. Let A, B, A′, B′ be four Hilbert spaces and let Λ ∈ L(AB, A′B′). Show that if
Tr_B[(I^{A′} ⊗ T) Λ] = 0   ∀ T ∈ L(B′, B)    (2.99)
then Λ = 0. Observe that the operator (I^{A′} ⊗ T)Λ belongs to L(AB, A′B), so the partial trace above over B is well defined. Hint: Let {N_x} be an orthonormal basis (w.r.t. the Hilbert-Schmidt inner product) of L(B, B′) and write Λ = Σ_x M_x ⊗ N_x, where {M_x} are some matrices in L(A, A′). Then show that by taking T above to be N_y^* you get M_y = 0.
3. Use Part 2 to provide an alternative (simpler!) proof of the claim in Exercise 2.3.15.
Exercise 2.3.33. Operator Schmidt Decomposition: Let A and B be two Hilbert spaces of dimensions m := |A| and n := |B| and denote by k := min{m², n²}. Show that for every ρ ∈ Herm(AB) there exist k non-negative real numbers {λ_z}_{z∈[k]}, and two orthonormal sets of Hermitian matrices (w.r.t. the Hilbert-Schmidt inner product) {η_z}_{z∈[k]} ⊂ Herm(A) and {ζ_z}_{z∈[k]} ⊂ Herm(B), such that
ρ^{AB} = Σ_{z∈[k]} λ_z η_z^A ⊗ ζ_z^B .    (2.101)
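For concreteness, the following NumPy sketch (our own illustration) obtains an operator Schmidt expansion of a random Hermitian ρ^{AB} from an SVD after regrouping indices; the factors produced this way need not be Hermitian, so it only verifies the expansion itself, not the additional Hermiticity claim of the exercise.

import numpy as np

rng = np.random.default_rng(6)
dA, dB = 2, 3
X = rng.normal(size=(dA * dB, dA * dB)) + 1j * rng.normal(size=(dA * dB, dA * dB))
rho = (X + X.conj().T) / 2                      # a random Hermitian operator on AB

# Regroup rho[(a,b),(a',b')] into R[(a,a'),(b,b')] and take an SVD.
R = rho.reshape(dA, dB, dA, dB).transpose(0, 2, 1, 3).reshape(dA * dA, dB * dB)
U, s, Vh = np.linalg.svd(R)

rebuilt = np.zeros_like(rho)
for z in range(len(s)):
    eta  = U[:, z].reshape(dA, dA)              # factor on A (not necessarily Hermitian)
    zeta = Vh[z, :].reshape(dB, dB)             # factor on B
    rebuilt += s[z] * np.kron(eta, zeta)
print(np.allclose(rebuilt, rho))                # True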
the previous arguments, we conclude that the state T_{θ_2}^{(n_2)} T_{θ_1}^{(n_1)} |ψ⟩ is the quantum state that corresponds to the spin in the direction R_{θ_2}^{(n_2)} R_{θ_1}^{(n_1)} m. Combining everything, we conclude that the mapping
R_θ^{(n)} ↦ T_θ^{(n)}   where   T_θ^{(n)}(ρ) := T_θ^{(n)} ρ (T_θ^{(n)})^*   ∀ ρ ∈ L(A) ,    (2.102)
is a group representation of SO(3) on the Hilbert space L(A → A) (i.e. the space of linear operators from L(A) to L(A)).
It will be more convenient to work with a unitary representation on the space L(A) itself rather than the Hilbert space L(A → A). For this purpose we need to eliminate the freedom in the choice of the phase so that R_θ^{(n)} is mapped to a unique T_θ^{(n)}. We therefore assume without loss of generality that det T_θ^{(n)} = 1, so that T_θ^{(n)} ∈ SU(2). This almost completely eliminates the ambiguity in the phase, although note that if T_θ^{(n)} ∈ SU(2) then also −T_θ^{(n)} ∈ SU(2). This would mean that both ±T_θ^{(n)} correspond to the same R_θ^{(n)}. To summarize, up to a sign factor, the collection of matrices {T_θ^{(n)}}_{n,θ} forms a group representation of SO(3). Such a 2 : 1 and onto homomorphism h : SU(2) → SO(3) with the property that h(T) = h(−T) for any T ∈ SU(2) was found by Cornwell in 1984 (see Exercise C.1.3). We now discuss the explicit form of T_θ^{(n)}.
In Appendix C we show that the most general unitary matrix in SU(2) has the form e^{−i(θ/2)(n·σ)} (see (C.15)), where the factor 1/2 implies that under a 2π addition to θ we get e^{−i((θ+2π)/2)(n·σ)} = −e^{−i(θ/2)(n·σ)}. This property will be consistent with the identification T_θ^{(n)} = e^{−i(θ/2)(n·σ)} which we motivate below (recall that T_θ^{(n)} ∈ SU(2)), since a rotation by θ or by θ + 2π along any axis n should have the same effect on any qubit state; that is,
e^{−i((θ+2π)/2)(n·σ)} |ψ⟩⟨ψ| e^{i((θ+2π)/2)(n·σ)} = e^{−i(θ/2)(n·σ)} |ψ⟩⟨ψ| e^{i(θ/2)(n·σ)} .    (2.103)
On the other hand, if we did not include the factor 1/2, then the identification T_θ^{(n)} = e^{−iθ(n·σ)} would imply the undesired property that a rotation by θ or by θ + π (along any fixed axis n) would have the same effect on a qubit |ψ⟩⟨ψ|.
To justify the identification T_θ^{(n)} = e^{−i(θ/2)(n·σ)}, recall that any rotation around the z-axis should not change the state |0⟩⟨0|, as it represents spin in the z-direction. Taking n = z we get
e^{−i(θ/2)(z·σ)} |0⟩ = (cos(θ/2) I − i sin(θ/2) σ_3) |0⟩ = e^{−iθ/2} |0⟩ ,    (2.104)
where we used (C.14). Recall that the vector e^{−iθ/2}|0⟩ corresponds to the same quantum state |0⟩⟨0|. Therefore, although there are many possible representations for SO(3) in C² (such as e^{i0.7θ(n·σ)} for example), the representation T_θ^{(n)} = e^{−i(θ/2)(n·σ)} is the only one that has the following essential properties:
1. The mapping T_θ^{(n)} ↦ R_θ^{(n)} is an onto homomorphism between SU(2) and SO(3).
2. For any |ψ⟩ ∈ C² we have T_{θ+2π}^{(n)} |ψ⟩⟨ψ| (T_{θ+2π}^{(n)})^* = T_θ^{(n)} |ψ⟩⟨ψ| (T_θ^{(n)})^* .
With this representation at hand, we are ready to identify spins in different directions.
We start with a few examples. The spin in the negative z-direction can be obtained by
rotating |0⟩ by 180◦ along the x (or y) axis. It is therefore given by
T_π^{(x)} |0⟩ = (cos(π/2) I − i sin(π/2) x·σ) |0⟩ = −i|1⟩ .    (2.105)
Therefore, the quantum state |1⟩⟨1| corresponds to the negative z-direction. Recall from
the previous section that using the SG experiment one can determine with certainty if an
electron was prepared in the positive z-direction or negative z-direction. This ability to
distinguish between the two possible spins of the electron is reflected mathematically by the
orthogonality of the vectors |0⟩ and |1⟩. This is a general property of quantum mechanics
that any two distinguishable states of a physical system are described mathematically by
orthogonal vectors. Other examples include:
In general, rotations along the n-axis do not change a spin that points in the positive or
negative n-direction. We can use this physical property to compute the qubit representing a
spin in the n-direction. Specifically, a quantum state |ψ⟩ represents an electron with spin in the positive or negative n-direction if and only if T_θ^{(n)}|ψ⟩ = e^{iα}|ψ⟩ for some phase e^{iα}. Now, since T_θ^{(n)} = cos(θ/2)I − i sin(θ/2) n·σ, we get that |ψ⟩ is an eigenvector of the matrix T_θ^{(n)} if and only if it is an eigenvector of the spin matrix S_n := (1/2) n·σ.
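The following short NumPy sketch (illustrative only) constructs S_n for a direction given by spherical angles and confirms that its eigenvalues are ±1/2, with the +1/2 eigenvector playing the role of |↑_n⟩ in the Bloch picture.

import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

alpha, beta = 0.7, 1.2                                   # arbitrary spherical angles
n = np.array([np.sin(alpha) * np.cos(beta),
              np.sin(alpha) * np.sin(beta),
              np.cos(alpha)])
Sn = 0.5 * (n[0] * sx + n[1] * sy + n[2] * sz)

evals, evecs = np.linalg.eigh(Sn)
print(np.allclose(np.sort(evals), [-0.5, 0.5]))          # eigenvalues are ±1/2
up_n = evecs[:, np.argmax(evals)]                        # eigenvector for +1/2, i.e. |↑_n⟩
print(np.allclose(Sn @ up_n, 0.5 * up_n))                # True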
Exercise 2.4.1. Let Sn be the spin matrix in direction n = (sin(α) cos(β), sin(α) sin(β), cos(α))T ,
with α and β being its spherical coordinates.
From the exercise above it follows that any qubit is characterized as in (2.110) and
corresponds to spin in the positive direction of n = (sin(α) cos(β), sin(α) sin(β), cos(α))T .
This correspondence between the point on the sphere and a qubit is known in the community
as the Bloch representation of a qubit. In Fig. 2.5 we show some of the popular qubit states
and their location on the Bloch sphere.
Note that although we focused here on the spin of an electron, the qubit corresponds to any two-level quantum system. For example, one can implement a qubit with a photon, using, say, |0⟩ to correspond to positive (or left) circular polarization, and |1⟩ to correspond to negative (or right) circular polarization. Any linear combination of |0⟩ and |1⟩ will then correspond to different types of polarizations. Other examples are atoms, molecules, and nuclei with two energy levels (excited state vs ground state). All these examples demonstrate that the qubit can be implemented in many different ways and in this sense, we can claim that quantum information is fungible!
the “spin matrix” Sn whose eigenvectors are the basis elements, and its eigenvalues give the
spins (i.e. ±1/2) associated with the two basis elements.
In a similar way, any orthonormal basis of A ≅ C^d corresponds to d possible outcomes that can be, at least in principle, observed in some experiment. Moreover, the second postulate of quantum mechanics states that any observable (a dynamic variable that can be measured, like position, momentum, spin, energy, etc.) is represented by a Hermitian
operator whose eigenvalues correspond to the values of the observable. Recall that for any
Hermitian operator, H, there exists an orthonormal basis of A consisting only of eigenvectors
of H. This basis corresponds to the possible outcomes in the measurement of the observable
H.
For example, in the qubit case, the spin matrix S_n is an observable corresponding to the measurement of spin with the SG experiment. In physical systems of d energy levels, the Hamiltonian H = Σ_{x∈[d]} E_x |x⟩⟨x| is an observable corresponding to the measurement of energy. This particular observable states that the values for the energy of the system (i.e. E_x) are discrete, and that the eigenvectors {|x⟩}_{x∈[d]} correspond to these energy levels.
corresponds to three spin particles (e.g. electrons) with spin A pointing in the positive z-
direction, spin B pointing in the negative x-direction, and spin C pointing in the positive
y-direction. Of course, not all states have the same tensor product form as the state above.
For example, the Greenberger-Horne-Zeilinger (GHZ) state of three qubits
|GHZ⟩ := (1/√2) (|0⟩^A|0⟩^B|0⟩^C + |1⟩^A|1⟩^B|1⟩^C)    (2.112)
cannot be written as a tensor product of three vectors. States like this will be called entan-
gled.
Exercise 2.4.2. Show that the GHZ state above cannot be written as a tensor product of
three vectors; i.e.
|GHZ⟩ ≠ |ψ A ⟩|ϕB ⟩|χC ⟩ (2.113)
for any three qubit states |ψ A ⟩, |ϕB ⟩, and |χC ⟩.
Exercise 2.4.3. Show that for any unit vector n ∈ R³ the singlet state |Ψ_−^{AB}⟩ := (|01⟩ − |10⟩)/√2 can be expressed as
|Ψ_−^{AB}⟩ = (1/√2) (|↑_n⟩|↓_n⟩ − |↓_n⟩|↑_n⟩) ,    (2.114)
where |↑_n⟩ and |↓_n⟩ are the eigenvectors of the spin matrix S_n. In other words, for any 2 × 2 unitary matrix U we have U ⊗ U |Ψ_−⟩⟨Ψ_−| U^* ⊗ U^* = |Ψ_−⟩⟨Ψ_−|.
3. Show that [J 2 , Jz ] = 0.
4. Show that each of the following four 2-qubit states are eigenvectors of both J 2 and Jz :
|00⟩ ,  |11⟩ ,  and  |Ψ_±⟩ := (1/√2)(|01⟩ ± |10⟩)    (2.115)
1. Show that the commutator [Sn , Sm ] = iSr , where r ∈ R3 is a unit vector. What is the
direction of r?
2. Calculate
⟨Ψ_−| S_n ⊗ S_m |Ψ_−⟩   where   |Ψ_−⟩ = (1/√2)(|0⟩ ⊗ |1⟩ − |1⟩ ⊗ |0⟩) .    (2.116)
Show that
B² = (1/4) I − [S_n, S_{n′}] ⊗ [S_m, S_{m′}] ,    (2.118)
and use it to prove that
|⟨ψ|B|ψ⟩| ⩽ 1/√2 ,    (2.119)
for any state |ψ⟩ ∈ C² ⊗ C².
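A numerical illustration of the bound (2.119): since Exercise 2.4.5 is not reproduced here, we assume the standard CHSH combination B = S_n⊗S_m + S_n⊗S_{m′} + S_{n′}⊗S_m − S_{n′}⊗S_{m′}, which is consistent with (2.118); for the usual choice of directions the largest eigenvalue of B is indeed 1/√2.

import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def S(direction):                  # spin matrix (1/2) n·σ for a direction in the x-z plane
    return 0.5 * (direction[0] * sx + direction[1] * sz)

n, n2 = np.array([0, 1.0]), np.array([1.0, 0])                    # z and x
m, m2 = np.array([1, 1.0]) / np.sqrt(2), np.array([-1.0, 1]) / np.sqrt(2)

B = (np.kron(S(n), S(m)) + np.kron(S(n), S(m2))
     + np.kron(S(n2), S(m)) - np.kron(S(n2), S(m2)))
print(np.abs(np.linalg.eigvalsh(B)).max())     # ≈ 0.7071 = 1/√2 (Tsirelson bound)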
“Once upon a time there was a centipede that was amazingly good at dancing
with all hundred legs. All the creatures of the forest gathered to watch every
time the centipede danced, and they were all duly impressed by the exquisite
dance. But there was one creature that didn’t like watching the centipede dance
- that was a tortoise.
How can I get the centipede to stop dancing? thought the tortoise. He couldn’t
just say he didn’t like the dance. Neither could he say he danced better himself,
that would obviously be untrue. So he devised a fiendish plan.
He sat down and wrote a letter to the centipede. ‘O incomparable centipede,’
he wrote, ‘I am a devoted admirer of your exquisite dancing. I must know how
you go about it when you dance. Is it that you lift your left leg number 28 and
then your right leg number 39? Or do you begin by lifting your right leg number
17 before you lift your left leg number 44? I await your answer in breathless
anticipation. Yours truly, Tortoise.’ ”
1. If the SG experiment yields an outcome in the upward direction along n, the state
evolves to | ↑n ⟩.
2. If the SG experiment yields an outcome in the downward direction along n, the state
evolves to | ↓n ⟩.
It’s crucial to emphasize that this transformation is independent of the specific form of |ψ⟩.
What does vary with |ψ⟩ is the probability associated with each possible outcome. For
instance, consider the case where n = z and |ψ⟩ is initially prepared as | ↑x ⟩. In this case,
both the upward and downward outcomes are equally probable, each with a 50% chance.
The general rule governing the probability of obtaining a particular outcome in the mea-
surement is known as Born’s rule. According to Born’s rule, the probability, denoted as
Pr(ψ, n), of observing the outcome ↑n (i.e., the electron’s spin aligned with the positive
direction of n) in an SG experiment along the n direction, when the electron is initially
prepared in the state |ψ⟩, is given by:
This fundamental principle provides a mathematical framework for determining the likeli-
hood of various outcomes in quantum measurements, and it plays a central role in quan-
tum mechanics. For example, suppose an electron in the state |ψ⟩ = a|0⟩ + b|1⟩ is sent
through an SG experiment in the z-direction. Then, using Born’s rule (2.120) we get that |⟨ψ|↑_z⟩|² = |a|² is the probability to obtain spin up (in the z-direction), and |⟨ψ|↓_z⟩|² = |b|² is the probability to obtain spin down.
Similarly, we can extend Born’s rule to any qudit |ψ⟩ ∈ A ≅ C^m, and any quantum measurement that corresponds to an orthonormal basis {|ϕ_x⟩}_{x∈[m]} of A. The probability to obtain an outcome x is given by
Note that the above assignment of probability to each ϕx is indeed a probability; that is,
Σ_{x∈[m]} Pr(ψ, ϕ_x) = Σ_{x∈[m]} |⟨ψ|ϕ_x⟩|² = Σ_{x∈[m]} ⟨ψ|ϕ_x⟩⟨ϕ_x|ψ⟩ = ⟨ψ| (Σ_{x∈[m]} |ϕ_x⟩⟨ϕ_x|) |ψ⟩ = ⟨ψ|ψ⟩ = 1 .    (2.122)
We call every such measurement that corresponds to an orthonormal basis a basis measurement.
To establish a connection between a basis measurement and a physical observable, let’s
consider the energy of a physical system. Energy is a fundamental observable in quantum
mechanics and therefore can be measured. As previously discussed, any observable in quan-
tum mechanics is represented by a Hermitian operator acting on the Hilbert space A. We denote the energy operator, often referred to as the Hamiltonian, as:
H = Σ_{x∈[m]} a_x |ϕ_x⟩⟨ϕ_x| ,    (2.123)
where {|ϕx ⟩}x∈[m] is an orthonormal basis of A. Therefore, in order to measure the energy,
one has to perform a basis measurement corresponding to the orthonormal basis {|ϕx ⟩}x∈[m] ,
since the energy a_x is determined by the value of x. However, this system of m energy levels can be degenerate, as happens quite often in many physical systems. In this case, not all of the energy values {a_x}_{x∈[m]} are distinct. Suppose for example that a_1 = a_2 < a_3 < · · · < a_m; i.e. the state with minimum energy (the ground state) is degenerate. In this case, both outcomes 1 and 2 correspond to the same ground state, so that the probability that the energy equals a_1 = a_2 := b_1 is given by
Pr(ψ, ϕ_1) + Pr(ψ, ϕ_2) = ⟨ψ| (|ϕ_1⟩⟨ϕ_1| + |ϕ_2⟩⟨ϕ_2|) |ψ⟩ := ⟨ψ|Π|ψ⟩ ,    (2.124)
where Π := |ϕ1 ⟩⟨ϕ1 | + |ϕ2 ⟩⟨ϕ2 |. More generally, if we have degeneracy in other energy levels,
we can always express the observable H as
H = Σ_{y∈[r]} b_y Π_y ,    (2.125)
where b1 < b2 < · · · < br , and each Πy is a sum of rank one projections from {|ϕx ⟩⟨ϕx |} that
correspond to the same energy level by . With this at hand, the probability to measure an
energy of value by is given by
Pr (ψ, Πy ) = ⟨ψ|Πy |ψ⟩ . (2.126)
Therefore, the basis measurement that we considered so far can be extended to a projective von-Neumann measurement, which is defined as follows.
Πx Πy = δxy Πx . (2.127)
Historically, Born’s rule above (see (2.126)) was determined essentially from consistency with experiments. That is, one can perform many experiments, like the SG experiment for example, collect the data, and find a rule that is consistent with the data. Later on, however, Gleason came up with a theorem showing how to calculate probabilities in quantum mechanics, and, loosely speaking, derived Born’s rule above from a few fundamental principles involving measures on a Hilbert space. Gleason’s theorem is applicable to general (separable) Hilbert spaces in any dimension, but for us, only the finite dimensional case, i.e. the qudit, will be relevant. We postpone the discussion of Gleason’s theorem to the next chapter, after we discuss other types of quantum measurements, in order to prove a slightly more general version of Gleason’s theorem that will be applicable to all types of measurements (not only to projective von-Neumann measurements).
Exercise 2.5.1. Let Π be a projection on a Hilbert space A. Show that {Π, I − Π} is a
two-outcome von-Neumann projective measurement.
Exercise 2.5.2. Let {Πx }x∈[r] be a projective von-Neumann measurement on a finite dimen-
sional Hilbert space A. Show that the collection of all the linearly independent normalized
eigenvectors, of all the projections {Πx }x∈[r] , form an orthonormal basis of A.
Exercise 2.5.4. Let A be a d-dimensional Hilbert space, and let |ψ⟩, |ϕ⟩ ∈ A be two quantum
states.
1. Show that if |ψ⟩ and |ϕ⟩ are orthogonal, then there exists a projective measurement
that distinguishes them. That is, there exists a two-outcome projective measurement
{Π0 , Π1 } such that
Pr(ψ, Π0 ) = 1 and Pr(ϕ, Π1 ) = 1. (2.130)
2. Show that if |ψ⟩ and |ϕ⟩ are not orthogonal, then there is no projective measurement
that distinguishes them.
For any such hidden variable model, there is an inherent assumption that the values of the hidden variables are fixed, predetermined, and correspond to elements of reality. It is just the observer’s lack of knowledge about these elements of reality that leads to statistical behaviour. Historically, hidden variable theories were promoted by some physicists who argued that the formulation of quantum mechanics (as we will discuss in the rest of this book) does not provide a complete description of the system. Along with Albert Einstein, they argued that quantum mechanics is ultimately incomplete, and that a complete theory would avoid any indeterminism. Indeed, hidden variable models such as the one described above for the spin of one electron cannot be ruled out, although, as we discuss now, local hidden variable models can!
Consider two friends, Alice and Bob, who are located far from each other, and each of whom possesses an electron in their lab. How can we describe the spins of the two electrons? Following the same line of thought as above, we denote by A_n the random variable associated with the spin of Alice’s electron in the n-direction, and by B_m the random variable associated with the spin of Bob’s electron in the m-direction. We denote by p(ab|nm), with a, b ∈ {0, 1}, the joint probability that the two SG experiments in Alice’s lab and Bob’s lab will yield respectively A_n = a and B_m = b.
Since it is possible that the spins are correlated in some way, we are not assuming that
p(ab|nm) has the form pA (a|n)pB (b|m), where pA (a|n) is the probability that Alice will get
the value a in a SG experiment in the n-direction (and pB (b|m) is defined similarly). Instead,
since in general An and Bm can be correlated, there exists a parameter λ (λ can describe a
collection of variables) and a probability distribution qλ over it, such that
p(ab|nm) = ∫ dλ q_λ p_λ^A(a|n) p_λ^B(b|m) ,    (2.131)
where p_λ^A(a|n) and p_λ^B(b|m) are probability distributions that depend on the correlating
parameter λ. The parameter λ can be either continuous or discrete, and for the latter the
integral above is replaced with a sum. Note that the distribution above is more general than
the form pA (a|n)pB (b|m) as it allows for correlations between Alice’s and Bob’s spins. Yet,
it is a local probability distribution depending only on the local variables An and Bm . We
now discuss a crucial consequence of this local hidden variable model for the spin of two
electrons.
Exercise 2.6.1. Let n, n′ , m, and m′ , be four unit vectors in R3 , and use the tilde symbol
over a random variable X to mean X̃ = 2X − 1 (i.e. X̃ takes values ±1 while X takes values
0, 1). Show that
Ãn B̃m + Ãn′ B̃m + Ãn B̃m′ − Ãn′ B̃m′ ⩽ 2 . (2.132)
The inequality in the exercise above is called the CHSH inequality after Clauser, Horne,
Shimony and Holt, and it generalizes a similar inequality that was proved in a seminal paper
by John Bell from 1964. As we will see in the exercise below, not all probability distributions
p(ab|nm) satisfy this inequality. One obvious property of the local distribution (2.131) is that if we sum over a the dependence on n disappears, and similarly if we sum over b the dependence on m disappears. This property is called “no-signalling” since by choosing different directions of n, Alice cannot signal to Bob, as the marginal distribution on his side remains intact. The no-signalling property can be stated as follows:
Σ_a p(ab|nm) = Σ_a p(ab|n′m) := p^B(b|m)   ∀ b, n, n′, m
Σ_b p(ab|nm) = Σ_b p(ab|nm′) := p^A(a|n)   ∀ a, n, m, m′ .    (2.135)
The following exercise shows that there exists a probability distribution that on one hand,
is non-signalling, and on the other hand, is violating the CHSH inequality (2.134).
Exercise 2.6.3. Denote the two directions on Alice’s side by n_0 := n and n_1 := n′, and the two direction vectors on Bob’s side by m_0 := m and m_1 := m′. Denote also p(ab|xy) := p(ab|n_x m_y) with x, y ∈ {0, 1}. Consider the probability distribution given by
p(ab|xy) = 1/2 if a ⊕ b = xy, and p(ab|xy) = 0 otherwise .    (2.136)
2. Show that p(ab|xy) is non-local by showing that it violates the CHSH inequality (2.134).
3. Show that no other probability distribution (even a signalling distribution) can provide
a higher violation than the one achieved by the distribution (2.136).
To summarize, any local hidden variable model has two main assumptions. The first one is called the realism assumption, corresponding to our assumption that the spins of the electrons in all directions have definite values which exist independently of observation. The second assumption is called the locality assumption, corresponding to our implicit assumption that if, say, Alice performs a measurement on her electron, it does not influence the result of Bob’s measurement (on the spin of the electron in his lab). The following violation of the CHSH inequality demonstrates that local realism does not hold!
(with b = 0, 1) are the eigenvectors of the spin matrix S_m. The corresponding eigenvalues are given by 1/2 − a for |ϕ_a^A⟩ and by 1/2 − b for |φ_b^B⟩. Note the relation with the previous notations; for example, |ϕ_0^A⟩ = |↑_n⟩ and |ϕ_1^A⟩ = |↓_n⟩.
Recall that Ã_n and B̃_m in (2.134) are random variables taking the values ±1, whereas the eigenvalues of S_n and S_m are ±1/2. Keeping this in mind, from the exercise above we conclude that the probability distribution p_ψ(ab|nm) as given in (2.137) violates the CHSH inequality (2.134) if
|⟨ψ|B|ψ⟩| > 1/2 ,    (2.139)
where B is the Bell/CHSH operator from Exercise 2.4.5. From Exercise 2.4.5 it follows that for any state |ψ⟩ ∈ C² ⊗ C², we have |⟨ψ|B|ψ⟩| ⩽ 1/√2. From the next exercise it follows that there exist directions n, m, n′, m′ such that this bound is saturated, thereby violating the CHSH inequality since 1/√2 > 1/2. This bound is called the Tsirelson bound.
|⟨Ψ_−|B|Ψ_−⟩| = 1/√2 .    (2.140)
Note that the violation of the CHSH inequality implies that the quantum probability
distribution pψ (ab|nm) is in general not local; i.e. not of the form (2.131). Such non-local
probability distributions have other non-intuitive consequences as we discuss below.
p_same(n_1, n_2) := |⟨Ψ_−^{AB}| ↑_{n_1}^A ↓_{n_2}^B⟩|² + |⟨Ψ_−^{AB}| ↓_{n_1}^A ↑_{n_2}^B⟩|² = (1/2)(1 + cos(θ)) ,    (2.141)
where θ is the angle between the unit vectors n_1 and n_2.
Consider now three unit vectors n_1, n_2, n_3 ∈ R³ with an angle of 120° between any two; see Fig. 2.7. From the exercise above we get that p_same(n_1, n_2) = p_same(n_1, n_3) = p_same(n_2, n_3) = 1/4 since cos(120°) = −1/2. Therefore,
p_same(n_1, n_2) + p_same(n_1, n_3) + p_same(n_2, n_3) = 3/4 < 1 .    (2.142)
On the other hand, suppose it was possible to describe Alice’s electron n1 -spin, n2 -spin,
and n3 -spin, with three random variables X1 , X2 , and X3 (with some underlying probability
distribution over the three variables). Each of the three random variables can take the values
±1/2, determining whether the spin is pointing in the positive or negative direction. Then, irrespective
of the underlying probability distribution, the probabilities Pr(Xj = Xk ) (with j ̸= k and
j, k ∈ {1, 2, 3}) must satisfy
Pr(X1 = X2 ) + Pr(X1 = X3 ) + Pr(X2 = X3 ) ⩾ 1 . (2.143)
This problem is analogous to the problem of flipping 3 coins and asking what is the probability
that at least two of them are the same (either two heads or two tails). Clearly, flipping three
coins will always yield two that show the same symbol (either head or tail). Eq. (2.142)
shows that this is not the case for quantum coins (i.e. spins of an electron).
Figure 2.7: Three directions with an angle of 120◦ between any two.
So far we have seen a contradiction between quantum mechanics and local realism through
the violation of the CHSH inequality (2.134), and the inequality in (2.142). The next two
paradoxes show that this inconsistency between quantum mechanics and local realism can
be expressed without inequalities. In the literature they are referred to as “Bell non-locality
without inequalities”.
Exercise 2.6.7. Show that if p(ab|xy) is local and satisfies (2.145) then p(00|11) = 0 .
We now show that the logical implication of the exercise above does not hold for quantum
mechanics. Unlike the use of the singlet in the previous examples, here we consider a bipartite
state |ψθAB ⟩ that has the form:
|ψ_θ^{AB}⟩ = [tan(θ)/√(1 + 2 tan²(θ))] (|01⟩ + |10⟩) − [1/√(1 + 2 tan²(θ))] |11⟩ ,    (2.146)
with θ ∈ [0, 2π] being some angle. Note that the state above is normalized for all θ.
Suppose that Alice and Bob perform the same measurements, and in particular n0 =
m0 = z corresponds to a measurement in the computational basis, while n1 = m1 cor-
responds to a measurement in the orthonormal basis |u0 ⟩ := cos(θ)|0⟩ + sin(θ)|1⟩ and
|u1 ⟩ := sin(θ)|0⟩ − cos(θ)|1⟩.
Exercise 2.6.8. Verify that the above choices satisfy:
p_ψ(00|00) = |⟨ψ^{AB}|0⟩|0⟩|² = 0
p_ψ(01|10) = |⟨ψ^{AB}|u_0⟩|1⟩|² = 0    (2.147)
p_ψ(10|01) = |⟨ψ^{AB}|1⟩|u_0⟩|² = 0
while
p_Hardy(θ) := p_ψ(00|11) = |⟨ψ^{AB}|u_0⟩|u_0⟩|² = sin⁴(θ)/(1 + 2 tan²(θ)) .    (2.148)
We therefore see that for this example p_Hardy(θ) > 0 for all 0 < θ < π/2. Interestingly, p_Hardy > 0 for all non-product states in {|ψ_θ^{AB}⟩}_θ except for the maximally entangled state |ψ_{θ=π/2}^{AB}⟩ = (|01⟩ + |10⟩)/√2, for which p_Hardy(π/2) = 0. The maximum value of the function p_Hardy(θ) can easily be computed to give
max_{θ∈[0,2π]} p_Hardy(θ) = (1/2)(5√5 − 11) ≈ 0.09 .    (2.149)
Exercise 2.6.9. Show that the GHZ state as defined in (2.150) can be expressed in the yyx-basis as
|GHZ⟩ = (1/2) [ (|↑_y^A ↑_y^B⟩ + |↓_y^A ↓_y^B⟩) ⊗ |↓_x^C⟩ + (|↑_y^A ↓_y^B⟩ + |↓_y^A ↑_y^B⟩) ⊗ |↑_x^C⟩ ]    (2.151)
and in the xxx-basis as
|GHZ⟩ = (1/2) [ (|↑_x^A ↑_x^B⟩ + |↓_x^A ↓_x^B⟩) ⊗ |↑_x^C⟩ + (|↑_x^A ↓_x^B⟩ + |↓_x^A ↑_x^B⟩) ⊗ |↓_x^C⟩ ] .    (2.152)
Denote by Ax (and similarly Ay ) the random variables that take the value +1 if the spin
of the first electron in the x-direction is positive, and take the value −1 if it is in the negative
x-direction. The random variables Bx , By , Cx , and Cy , are defined similarly.
Now, according to (2.152), if Alice, Bob, and Charlie perform the xxx-measurement, the
results of their measurements, given by Ax , Bx , and Cx , must satisfy
Ax Bx Cx = 1 . (2.153)
A_y B_x C_y = −1   and   A_x B_y C_y = −1 ,    (2.155)
where we used A_x² = B_y² = C_y² = 1, since these variables can only take the two values ±1.
To summarize, according to quantum mechanics, an xxx-measurement can only yield one of
the four possible outcomes:
|↑_x^A ↑_x^B ↑_x^C⟩ ,  |↓_x^A ↓_x^B ↑_x^C⟩ ,  |↑_x^A ↓_x^B ↓_x^C⟩ ,  |↓_x^A ↑_x^B ↓_x^C⟩ .    (2.157)
On the other hand, local realism predicts that an xxx-measurement yields the four possible outcomes:
|↑_x^A ↑_x^B ↓_x^C⟩ ,  |↓_x^A ↓_x^B ↓_x^C⟩ ,  |↑_x^A ↓_x^B ↑_x^C⟩ ,  |↓_x^A ↑_x^B ↑_x^C⟩ ,    (2.158)
in maximal contrast with quantum mechanics. One may argue that we used quantum me-
chanics to express the GHZ state in the form (2.151), but this does not affect the conclusion
that local realism cannot co-exist with the quantum mechanical formalism.
addition modulo 2. The following table summarizes the desired value of a ⊕ b for each of the values of x and y:
x y a⊕b
0 0 0
0 1 0
1 0 0
1 1 1
Clearly, from the table above it is obvious that if Alice and Bob always choose a = b = 0 (no matter what the values of x and y are) then they will win the game 3/4 of the time. Can they do better?
Exercise 2.6.10. Show that Alice and Bob cannot win more than 3/4 of the time even if they use some randomness (i.e. they share some correlated random variable).
Suppose now that Alice and Bob share quantum correlations; in particular, suppose they each possess an electron in their lab, and that the two electrons are prepared in some bipartite state |ψ^{AB}⟩. With this state at hand, they use the following strategy. Based on the bits x and y that they receive from the referee, they choose to perform spin measurements in the direction n_x for Alice, and in the direction m_y for Bob. They then send to the referee the outcomes of their corresponding measurements. The probability that Alice and Bob win this CHSH game is given by
p_win := (1/4) Σ_{x,y,a,b} p_ψ(ab|xy) δ_{xy, a⊕b} ,    (2.159)
where the factor 1/4 represents the (uniform) probability that the referee sends x to Alice and y to Bob. From the following exercise it follows that for appropriate choices of n_x, m_y, and ψ^{AB}, Alice and Bob can win the game with a probability greater than 3/4.
Exercise 2.6.11. Consider the quantum strategy described above, and denote by plose =
1 − pwin the probability that Alice and Bob lose the game. Recall the Bell operator B as
defined in Exercise 2.4.5 with n := n0 , n′ := n1 , m := m0 , and m′ := m1 .
1. Show that
⟨ψ|B|ψ⟩ = pwin − plose . (2.160)
2. Use Part 1 together with the Tsirelson bound to show that there exists a quantum
strategy (i.e. directions nx , my and a quantum state |ψ AB ⟩) such that
p_win = 1/2 + 1/(2√2) > 3/4 ,    (2.161)
where 0 < c ∈ R and p is the vector whose components are the conditional probabilities
p(ab|xy) and s is any real vector with the same dimension as p. Note that for any real vector
s one can take c in (2.162) to be
c = max_{p∈L(n)} s · p ,    (2.163)
where L(n) ⊂ R^n is the set of all vectors p = {p(ab|xy)} whose components have the form (cf. (2.131))
p(ab|xy) = ∫ dλ q_λ p_λ^A(a|x) p_λ^B(b|y) .    (2.164)
We can therefore identify any Bell inequality with a single real vector s (since the constant c
is determined from above). In the general case, x = 1, . . . , |X|, y = 1, . . . , |Y |, a = 1, . . . , |A|,
and b = 1, . . . , |B|, can take more than two values. We also denoted by n := |X| · |Y | · |A| · |B|
the dimension of the vectors p and s. This corresponds to higher dimensional systems, and in
the quantum case a corresponds to the outcome of a projective von-Neumann measurement
that is labeled by x on Alice’s subsystem, and similarly b corresponds to the outcome of a
projective measurement that is labeled by y on Bob’s subsystem. Note that the definition
of a local distribution as in (2.164) remains unchanged in higher dimensions. Therefore, there are many Bell inequalities, and in recent years much effort has been made to characterize and understand the structure of all of them.
The Bell inequalities that we consider here are those that can be used to test if a given
distribution vector p is local (i.e. has the form (2.164)). If a given distribution vector p
violates a Bell inequality s (i.e. a Bell inequality of the form (2.162)) then we learn from it
that p is non-local. However, if a probability distribution does not violate a particular Bell
inequality, s, this alone does not mean that the distribution is local.
Given a probability vector p, how can we decide if it is local (i.e. can be written in
the form (2.164))? To answer this question, we first discuss the convexity property of local
distributions.
Exercise 2.6.12. Denote by P(n) ⊂ Rn the space of all real vectors in dimension n =
|ABXY | whose components are given in terms of conditional probabilities {p(ab|xy)}, and
let L(n) ⊂ P(n) be the set of all local vectors as in (2.164). Show that L(n) is a convex set.
Exercise 2.6.13. Show that P(n) is a polytope in Rn . Hint: Recall the definition of a
polytope in Sec. A.2.
Consider now a vector p ∈ P(n), and define the set {p} consisting of exactly one vector.
As such, it is (trivially) a convex set in Rn . Suppose now that p ̸∈ L(n). This means that
{p} ∩ L(n) = ∅, or in other words, {p} and L(n) are two disjoint convex sets. Therefore,
from the hyperplane separation theorem (see Theorem A.2) it follows that there exists a
vector s ∈ Rn and a real number r such that
The above equation can be interpreted as follows. If p ̸∈ L(n) then there exists a Bell
inequality s that it violates. We summarize it in the following theorem.
Theorem 2.6.1. Let L(n) ⊂ P(n) be the set of all local probability vectors as
in (2.164) with fixed cardinalities |X|, |Y |, |A|, |B|, and n = |ABXY |. Then,
p ̸∈ L(n) if and only if it violates at least one Bell inequality.
Exercise 2.6.14. Show that L(n) is a polytope in Rn (i.e. a convex hull of a finite number
of points). Hint: Show first that the set of vectors pA , whose components are any conditional
probabilities {p(a|x)}, is itself a polytope (i.e. find its extreme points and show that there are
a finite number of them).
From Theorem A.6.3, the polytope L(n) can be represented as an intersection of finitely
many half-spaces. Denoting by s(j) (with j = 1, . . . , m) the normal vectors to these half
spaces, we therefore conclude that p ∈ L(n) if and only if
s^{(j)} · p ⩽ c_j   ∀ j = 1, . . . , m ,    (2.166)
where c_j := max_{q∈L(n)} s^{(j)} · q. In other words, there exist finitely many Bell inequalities that can determine whether a vector p is in L(n).
This analysis may give the impression that deciding if p is local is easy. Therefore, it is
important to note first that the computation of s^{(j)} may be hard, and that the number m may grow exponentially with the cardinalities |X|, |Y|, |A|, and |B|. In particular, already for the case that |A| = |B| = 2 with arbitrarily large |X| = |Y|, it was shown that the decision problem of whether p is local is NP-complete [11].
case in which |A| = |B| = |X| = |Y | = 2 was fully characterized in [81], and independently
by [78], and, in particular, it was shown that the only non-trivial Bell inequality is the CHSH
inequality. That is, for bits x, y, a, b ∈ {0, 1}, the 16-dimensional vector p = (p(ab|xy)) is
local if and only if it does not violate any of the CHSH inequalities.
We emphasize here that U(t) (with t > 0) does not depend on the initial state (i.e. on the preparation of the system at time t = 0). It is important also to note that the formalism of quantum mechanics does not propose which unitary family U(t) one should choose to describe a particular evolution of a quantum system. It just states that the evolution (whatever its specific causes) is described by a unitary matrix. We saw earlier something similar about quantum states of the spin of an electron. The first postulate of quantum mechanics did not tell us which states to assign to a specific system. It only stated that all the information about the system is encoded in a quantum state. We then used, as in the example of the spin of an electron, further symmetry properties to assign the physical interpretation of any qubit state like |0⟩, |1⟩, |+⟩, and |−i⟩.
One can view the unitary evolution postulate as a principle of distinguishability preserv-
ing. Recall from Exercise 2.5.4 that if two quantum states are orthogonal then they can
be perfectly distinguished by a suitable projective measurement. The principle of distin-
guishability preserving asserts that if a closed system is prepared in one out of two or more
distinguishable states, then the ability to distinguish between them remains intact through-
out the evolution, unless some type of external noise is pumped into the system. Therefore,
one can view a unitary evolution as a distinguishability preserving map. Alternatively, since
information quantifies the ability to distinguish between one thing from another, the unitary
evolution postulate of quantum mechanics, loosely speaking, is the statement that closed
systems don’t loose information (i.e. the ability to distinguish) if they don’t interact with
the external world.
We now discuss the form of the parametrized family of the unitaries U (t) given in (2.167).
We will assume here that the function t 7→ U (t) is continuous, and even differentiable.
Moreover, U (0) = I is the identity matrix so that we can express for a small t = ε > 0
where H is some Hermitian matrix. Note that H must be Hermitian since otherwise U (ε)
will not be a unitary matrix; i.e.
where we assumed that H is Hermitian. Therefore, taking the derivative on both sides
of (2.167) and setting t = 0 gives:
(d/dt)|ψ(t)⟩ |_{t=0} = −iH|ψ(0)⟩ .    (2.170)
Now, since the system is isolated, the state |ψ(t)⟩ must evolve according to the same rule as
the state |ψ(0)⟩. Hence, this homogeneity assumption implies that for all t > 0
(d/dt)|ψ(t)⟩ = −iH|ψ(t)⟩ .    (2.171)
Exercise 2.7.1. Show that from the equation above it follows that:
where Planck’s constant ℏ = h/2π has the units of energy×time so that both sides of the equation have the same dimensions. Since the Hamiltonian H is a Hermitian matrix it can be diagonalized as H = Σ_x E_x |φ_x⟩⟨φ_x|, where {E_x} are the energy levels of the system, and {|φ_x⟩} are the corresponding eigenstates. The eigenstate |φ_x⟩ that corresponds to the lowest energy level is called the ground state of the system.
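For a closed system the solution of (2.171) is |ψ(t)⟩ = e^{−iHt}|ψ(0)⟩ (in units where ℏ = 1); the following NumPy/SciPy sketch (illustrative only, with a randomly generated Hamiltonian) checks that e^{−iHt} is unitary and satisfies the differential equation numerically.

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(7)
X = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
H = (X + X.conj().T) / 2                        # a random Hamiltonian (Hermitian)
psi0 = np.array([1.0, 0, 0], dtype=complex)

t, eps = 0.8, 1e-6
U = expm(-1j * H * t)
print(np.allclose(U.conj().T @ U, np.eye(3)))   # U(t) is unitary

psi_t     = expm(-1j * H * t) @ psi0
psi_t_eps = expm(-1j * H * (t + eps)) @ psi0
derivative = (psi_t_eps - psi_t) / eps
print(np.allclose(derivative, -1j * H @ psi_t, atol=1e-4))   # d|ψ(t)⟩/dt = -iH|ψ(t)⟩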
Finally, we assumed above that the system is closed, i.e. does not interact with the envi-
ronment in any way. This led us to assume a continuous uniform evolution. However, many
physical systems are not closed, and even us, the experimenters, can change the Hamiltonian
by changing parameters in the lab at different times. We leave this discussion to the next
chapter that covers evolution of open systems.
the experiment, the measuring device was given in some “ready” state. That is, according
to the first postulate of quantum mechanics there exists a vector |ready⟩ ∈ E containing
all the information about the measuring device prior to the measurement. Now, suppose
first that the state of the system was |0⟩A . Then, the initial state of the system+device is
|0⟩A |ready⟩E . After the measurement, the joint system evolves to
where the equality follows from the fact that the measurement is performed in the z-direction,
so the system state |0⟩A must remain intact, while the vector |ready⟩E of the measuring device
is transformed to another vector |output “0”⟩E in E, containing the information that the
output was 0. Similarly, if the initial state of the system was |1⟩A , then the initial state
of the system+device is |1⟩A |ready⟩E , and after the measurement, the joint system would
evolve to
|1⟩A |ready⟩E → U AE |1⟩A |ready⟩E = |1⟩A |output “1”⟩E . (2.175)
Now, let us consider the case in which the initial state of the system is |ψ⟩^A = a|0⟩ + b|1⟩. In this case, as before, the initial state of the system+device is |ψ⟩^A|ready⟩^E. However, after the measurement, the system evolves unitarily to the state
Therefore,
⟨ψ|ϕ⟩ = ⟨ψ|V*V|ϕ⟩ = ⟨ψ|ϕ⟩² .    (2.178)
But any complex number that satisfies c = c² must be equal to 0 or 1. Hence, ⟨ψ|ϕ⟩ ∈ {0, 1},
which means that either |ψ⟩ = |ϕ⟩ or that |ψ⟩ is orthogonal to |ϕ⟩. Therefore, there is no
quantum machine that is capable of generating two copies of an arbitrary unknown quantum
state. This result is known by the term the no-cloning theorem.
Exercise 2.7.2. Verify that U AB in the equation above is indeed a unitary matrix.
Figure 2.10: (a) Controlled unitary gate. (b) Controlled NOT (CNOT) gate.
In quantum circuits, the controlled unitary is depicted as in Fig. 2.10a. The CNOT gate
is the controlled unitary map
where σ1 is the first Pauli (unitary) matrix. The CNOT gate is depicted in Fig. 2.10b.
Exercise 2.7.3. Show that the CNOT gate can generate the maximally entangled state (|00⟩ + |11⟩)/√2 from a tensor product of two vectors (i.e. from a product state of the form |ψ⟩^A|ϕ⟩^B).
Exercise 2.7.4. Show the equivalence of the two circuits in Fig. 2.11, where
H := (1/√2) [ 1   1
              1  −1 ] .    (2.181)
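The following NumPy sketch is illustrative only; the specific circuit, a Hadamard on the first qubit followed by a CNOT, is one standard way to solve Exercise 2.7.3, mapping the product state |0⟩|0⟩ to the maximally entangled state.

import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)          # Hadamard gate (2.181)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)          # control = first qubit
ket00 = np.array([1.0, 0, 0, 0])

out = CNOT @ np.kron(H, np.eye(2)) @ ket00
target = np.array([1.0, 0, 0, 1.0]) / np.sqrt(2)      # (|00⟩ + |11⟩)/√2
print(np.allclose(out, target))                       # True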
Open physical systems are systems that have interactions with other external systems. These external systems, which we will refer to as ‘the environment’, can either be correlated with the system, and/or exchange information, energy, or matter with it. Consequently, the description and evolution of such systems can be very different from those we discussed for isolated systems. Yet, there is no need to introduce new postulates in order to develop the theory of open quantum systems. Instead, we will see that all the postulates of quantum mechanics on isolated systems are sufficient to determine the evolution, the measurements, and the description of open systems.
|ψ_x⟩ = (1/√p_x) M_x|ψ⟩   and   p_x = ⟨ψ|M_x* M_x|ψ⟩ .    (3.1)
yields an even more general type of evolution known by the name generalized measurement
as it generalizes the von-Neumann projective measurement.
In Fig. 3.1 we describe the following evolution of a quantum state |ψ⟩ ∈ A. In the first
step of the evolution, an ancillary system is introduced which is prepared in some state
|1⟩ ∈ R. Consequently, the state of the joint system is |1⟩R |ψ A ⟩. Next, a joint unitary
evolution, U RA is applied to the joint state |1⟩|ψ⟩ yielding the bipartite state U RA |1⟩|ψ⟩.
Finally, a basis measurement {|y⟩⟨y|R }y∈[n] is applied on the reference system R.
We discuss now how the output state |ψ_y⟩ is related to the input state |ψ⟩, and what is the probability to obtain an outcome y. Denote by m := |A| and n := |R|. As an operator in the vector space L(R ⊗ A), the unitary matrix U^{RA} can be expressed as
U^{RA} = Σ_{y,y′∈[n]} |y⟩⟨y′| ⊗ Λ_{yy′}   where   Λ_{yy′} ∈ L(A) .    (3.2)
Note that any operator in L(R ⊗ A) has the above form, but since U^{RA} is unitary we have the equivalence
U*U = I^{RA}   ⟺   Σ_{y∈[n]} Λ_{yz}* Λ_{yx} = δ_{xz} I^A   ∀ x, z ∈ [n] .    (3.3)
|ψ_y⟩ = (1/√p_y) M_y|ψ⟩   with   p_y := ⟨ψ|M_y* M_y|ψ⟩ .    (3.5)
Moreover, from (3.3) it follows that Σ_{y∈[n]} M_y* M_y = I^A, so that Σ_{y∈[n]} p_y = 1, where p_y is the probability to obtain an outcome y. Note that the post-measurement state |ψ_y⟩, with its associated probability p_y, has a form very similar to that of |ψ_x⟩ and p_x in (3.9). However, unlike the form P_x U of M_x in (3.9), the only condition on {M_y := Λ_{y1}} is that they can be extended to a family of matrices {Λ_{yy′}} that satisfies (3.3). This will ensure that U^{RA} is unitary. We now show that any set of complex matrices {M_y}_{y∈[n]} with the property Σ_{y∈[n]} M_y* M_y = I^A can be completed to a full family of matrices {Λ_{yy′}} that satisfies (3.3).
To see this, observe that the matrix U^{RA} can be expressed in the following block form
U^{RA} = [ Λ_{11}  Λ_{12}  ⋯  Λ_{1n}
           Λ_{21}  Λ_{22}  ⋯  Λ_{2n}
             ⋮       ⋮     ⋱    ⋮
           Λ_{n1}  Λ_{n2}  ⋯  Λ_{nn} ] ,    (3.6)
with the matrices {M_y := Λ_{y1}}_{y∈[n]} appearing in the first block column. Moreover, this first column satisfies
[ M_1*  M_2*  ⋯  M_n* ] [ M_1
                          M_2
                           ⋮
                          M_n ] = Σ_{y∈[n]} M_y* M_y = I^A .    (3.7)
Therefore, the first column block in (3.6) consists of m := |A| orthonormal vectors. Any
such set of m orthonormal vectors in Cmn can be completed to a full orthonormal basis
of Cmn (for example, by the Gram-Schmidt process). Therefore, it is always possible to
construct a unitary matrix U^{RA} as above from a set of matrices {Λ_{y1}}_{y∈[n]} that satisfy Σ_{y∈[n]} Λ_{y1}* Λ_{y1} = I^A.
Generalized Measurement
Definition 3.1.1. A generalized measurement is a collection of m ∈ N complex
matrices {Mx }x∈[m] ⊂ L(A) with the property that
Σ_{x∈[m]} M_x* M_x = I^A .    (3.8)
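As a minimal illustration of Definition 3.1.1 (with a toy two-outcome measurement of our own choosing), the following sketch verifies the completeness relation (3.8) and applies the measurement to a pure state according to (3.1).

import numpy as np

M0 = np.sqrt(0.3) * np.eye(2)
M1 = np.sqrt(0.7) * np.array([[0, 1], [1, 0]], dtype=float)    # √0.7 · σ_1

print(np.allclose(M0.conj().T @ M0 + M1.conj().T @ M1, np.eye(2)))   # (3.8) holds

psi = np.array([1.0, 1.0]) / np.sqrt(2)
for x, M in enumerate([M0, M1]):
    p = np.vdot(psi, M.conj().T @ M @ psi).real       # p_x = <ψ|M_x* M_x|ψ>
    post = M @ psi / np.sqrt(p)                       # |ψ_x⟩ = M_x|ψ⟩ / √p_x
    print(x, p, post)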
quantum mechanics, any evolution can be decomposed into a sequence of these two types of processes. Since both unitary evolution and projective measurements are themselves generalized measurements, we conclude that the most general measurement on a quantum system can be described as a sequence of generalized measurements. In the following exercise
it is argued that any such sequence of generalized measurements can be simulated by a single
generalized measurement. Hence, the generalized measurement described above is indeed
general enough to describe the most general measurement in quantum mechanics.
Exercise 3.1.1. Show that if {Mx }x∈[m] and {Ny }y∈[n] are two generalized measurements
then {Mx Ny } is also a generalized measurement. Use this to show that a sequence of gener-
alized measurements can be simulated by a single generalized measurement.
Exercise 3.1.2. Show that the matrices (operators) Mx do not have to be square. That
is, show that any collection of m operators {Mx }x∈[m] ⊂ L(A, B) that satisfy (3.8) can also
be realized as a generalized measurement as depicted in Fig. 3.1. Hint: Consider a unitary
operator U : RA → R′ B where the reference systems R and R′ are such that |RA| = |R′ B|.
Exercise 3.1.3. Let A = C2 , and let M0 = a|+⟩⟨0| and M1 = b|0⟩⟨+| be two operators
in L(A) with a, b ∈ C. Find the precise conditions on a and b for the existence of a third
operator M2 ∈ L(A) such that {M0 , M1 , M2 } form a generalized measurement.
Exercise 3.1.4. Consider d (rank-one) operators {Mx = |ψx ⟩⟨ϕx |}x∈[d] in L(Cd ), where
{|ψx ⟩}x∈[d] and {|ϕx ⟩}x∈[d] are some normalized states in Cd . Show that {Mx }x∈[d] is a gen-
eralized measurement if and only if {|ϕx ⟩}x∈[d] is an orthonormal basis of Cd .
dice, do not have to be unbiased). Now, suppose that Alice forgot the value of x. Then, Alice
knows that her state is one out of the m states in the ensemble of states {|ψx ⟩⟨ψx |, px }x∈[m] .
How should we characterize the ensemble {|ψ_x⟩⟨ψ_x|, p_x}_{x∈[m]}? We will see that there exist
many other ensembles of states that contain the exact same information as the ensemble
{|ψx ⟩⟨ψx |, px }x∈[m] . Therefore, instead of characterizing the information with a particular
ensemble (such as {|ψx ⟩⟨ψx |, px }x∈[m] ), we will characterize it with a mathematical object
that remains invariant under exchanges of such equivalent ensembles.
To gather information about her system, Alice can execute a generalized measurement,
denoted as {My }y∈[n] , on her system, characterized by the ensemble {|ψx ⟩⟨ψx |, px }x∈[m] .
This measurement results in an outcome y with a corresponding probability denoted as
qy . Furthermore, following the occurrence of outcome y, there emerges a post-measurement
ensemble that describes the state of Alice’s system. We will now delve into these details
to demonstrate that the dependencies of these quantities rely solely on a density matrix
associated with the ensemble {|ψx ⟩⟨ψx |, px }x∈[m] .
If the pre-measurement state was |ψx ⟩ then the post-measurement state after outcome y
occurred is
    |ϕ_{xy}⟩ := \frac{1}{\sqrt{p_{y|x}}} M_y |ψ_x⟩    (3.10)

with probability

    p_{y|x} := ⟨ψ_x|M_y^* M_y|ψ_x⟩ = Tr[M_y^* M_y |ψ_x⟩⟨ψ_x|] .    (3.11)
However, since Alice does not know the value of x, if she performs a measurement {My }y∈[n]
on her system, she will get the outcome y with probability
    q_y := \sum_{x∈[m]} p_{y|x} p_x = \sum_{x∈[m]} p_x Tr[M_y^* M_y |ψ_x⟩⟨ψ_x|] = Tr[M_y^* M_y ρ]    (3.12)
where
    ρ := \sum_{x∈[m]} p_x |ψ_x⟩⟨ψ_x| ,    (3.13)
is the density matrix associated with the ensemble {|ψx ⟩⟨ψx |, px }x∈[m] .
Note that py|x px is the probability that both the pre-measurement state is |ψx ⟩ and that
the outcome of its measurement is y. Therefore, using the Bayesian rule of probabilities, the
probability that the pre-measurement state is |ψx ⟩ given that the measurement outcome is y,
can be expressed as qx|y := py|x px /qy . Consequently, after outcome y occurred the ensemble
{px , |ψx ⟩⟨ψx |} changes to
{qx|y , |ϕxy ⟩⟨ϕxy |}x∈[m] . (3.14)
Note that the density operator, σy , that is associated with the above ensemble is given by
    σ_y := \sum_{x∈[m]} q_{x|y} |ϕ_{xy}⟩⟨ϕ_{xy}|

    q_{x|y} := p_{y|x} p_x / q_y →  = \frac{1}{q_y} \sum_{x∈[m]} p_x p_{y|x} |ϕ_{xy}⟩⟨ϕ_{xy}|
                                                                                          (3.15)
    (3.10) →  = \frac{1}{q_y} \sum_{x∈[m]} p_x M_y |ψ_x⟩⟨ψ_x| M_y^*

              = \frac{1}{q_y} M_y ρ M_y^* .
To summarize, the outcome y, of any generalized measurement {M_y}_{y∈[n]}, occurs with
probability q_y = Tr[M_y^* M_y ρ], when applied to an ensemble {p_x, |ψ_x⟩⟨ψ_x|}_{x∈[m]}. Recall
from Exercise 2.3.15 that aside from the ensemble {p_x, |ψ_x⟩⟨ψ_x|}_{x∈[m]}, there are infinitely
many other ensembles that also correspond to the same density operator ρ. Therefore, the
from Exercise 2.3.15 that aside from the ensemble {px , |ψx ⟩⟨ψx |}x∈[m] , there are infinitely
many other ensembles that also correspond to the same density operator ρ. Therefore, the
dependence of q_y only on ρ demonstrates that the statistics of any measurement outcome
depends only on the density operator, and not on the particular ensemble that realizes it. To
clarify, suppose {px , |ψx ⟩⟨ψx |}x∈[m] and {rz , |φz ⟩⟨φz |}z∈[k] are two ensembles that correspond
to the same density operator ρ. Then, the probability to obtain an outcome y is the same
for both ensembles, and therefore there is no way to distinguish between the two ensembles.
One may argue that maybe there is a way to distinguish between the post-measurement
ensembles, however, as can be seen in the above equation, any post-measurement ensemble
is also associated with a unique density operator q1y My ρMy∗ that depends only on ρ and not
on the particular ensemble {px , |ψx ⟩⟨ψx |}x∈[m] or {rz , |φz ⟩⟨φz |}z∈[k] that realizes ρ.
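This ensemble-independence is easy to verify numerically. The following sketch (ours; the two ensembles and the measurement operators are hypothetical choices) checks that two different ensembles realizing the same ρ produce identical outcome probabilities q_y = Tr[M_y^* M_y ρ].

```python
# A small numerical check (assuming NumPy): two ensembles with the same density operator
# give the same statistics for any generalized measurement.
import numpy as np

ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
ketp, ketm = (ket0 + ket1)/np.sqrt(2), (ket0 - ket1)/np.sqrt(2)

# Ensemble 1: {1/2, |0>}, {1/2, |1>};  Ensemble 2: {1/2, |+>}, {1/2, |->}.
rho1 = 0.5*np.outer(ket0, ket0) + 0.5*np.outer(ket1, ket1)
rho2 = 0.5*np.outer(ketp, ketp) + 0.5*np.outer(ketm, ketm)
assert np.allclose(rho1, rho2)                      # both realize the maximally mixed state

# A (hypothetical) two-outcome generalized measurement {M_0, M_1}.
M0 = np.diag([np.sqrt(0.9), np.sqrt(0.2)])
M1 = np.diag([np.sqrt(0.1), np.sqrt(0.8)])
assert np.allclose(M0.conj().T @ M0 + M1.conj().T @ M1, np.eye(2))

for rho in (rho1, rho2):
    print([np.trace(M.conj().T @ M @ rho).real for M in (M0, M1)])   # identical probabilities
```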
Given that the formalism of quantum mechanics lacks the means to differentiate between
ensembles of states corresponding to the same density operator, all the information about
the physical system accessible to us as observers is encapsulated within the density operator.
Consequently, instead of characterizing physical systems using ensembles of states, we shall
henceforth employ density operators for their descriptions.
As an example, consider a qubit state ρ ∈ D(C2 ). That is, ρ ⩾ 0 and Tr[ρ] = 1. Any
such qubit state can be expressed as a linear combination of the Pauli basis of Herm(C2 )
ρ = r0 σ0 + r1 σ1 + r2 σ2 + r3 σ3 , (3.18)
where σ0 := I2 . Now, since the Pauli matrices σ1 , σ2 , and σ3 , are traceless, the condition
Tr[ρ] = 1 gives r0 = 1/2. What are the conditions on r := (r1 , r2 , r3 )T ∈ R3 that ensure that
ρ ⩾ 0 ? Since ρ has two eigenvalues, say λ and 1 − λ, it follows that ρ ⩾ 0 if and only if
0 ⩽ λ ⩽ 1. This condition is equivalent to Tr[ρ2 ] = λ2 + (1 − λ)2 ⩽ 1. Therefore, ρ ⩾ 0 if
and only if
    1 ⩾ Tr[ρ^2] = \frac{1}{2} + \sum_{j,k} r_j r_k Tr[σ_j σ_k] = \frac{1}{2} + 2∥r∥_2^2 .    (3.19)
That is, ∥r∥_2 ⩽ 1/2. Therefore, after the renaming r → \frac{1}{2}r we conclude that every qubit
quantum state has the form

    ρ = \frac{1}{2}(I_2 + r · σ)    (3.20)
with ∥r∥2 ⩽ 1. Moreover, since Tr[ρ2 ] = 1 if and only if ∥r∥2 = 1 we get that ρ above is
pure if and only if ∥r∥2 = 1. Hence, a qubit can be represented by the Bloch Sphere (see
Fig. 2.5) with the pure states represented on the boundary of the sphere and mixed states
in the interior of the sphere. Note that the center of the sphere, i.e. r = 0, corresponds to
the state ρ = \frac{1}{2}I, which is called the maximally mixed state.
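The Bloch parametrization (3.20) is convenient for quick calculations; the following sketch (ours, assuming NumPy) converts between a qubit density matrix and its Bloch vector and illustrates the purity condition ∥r∥_2 = 1.

```python
import numpy as np

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),      # σ_1
         np.array([[0, -1j], [1j, 0]]),                   # σ_2
         np.array([[1, 0], [0, -1]], dtype=complex)]      # σ_3

def to_bloch(rho):
    return np.real([np.trace(rho @ s) for s in sigma])    # r_j = Tr[ρ σ_j]

def from_bloch(r):
    return 0.5*(np.eye(2) + sum(rj*s for rj, s in zip(r, sigma)))

rho = from_bloch([0.3, 0.4, 0.5])                 # ||r|| < 1: a mixed state
print(to_bloch(rho), np.trace(rho @ rho).real)    # recovers r; Tr[ρ²] = (1 + ||r||²)/2 < 1
```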
Exercise 3.2.1. Show that for r = (sin(α) cos(β), sin(α) sin(β), cos(α))T , ρ in (3.20) is given
by the state ρ = |ψ⟩⟨ψ| with |ψ⟩ as in (2.110).
Exercise 3.2.2. Consider a density operator for a qutrit; that is, ρ ∈ D(C3 ), ρ ⩾ 0,
and Tr[ρ] = 1. Let λ = (λ1 , λ2 , . . . , λ8 ) be a vector of matrices with {λj }j∈[8] being some
Hermitian traceless 3 × 3 matrices satisfying the condition Tr(λi λj ) = 2δij (note that also
the Pauli matrices satisfy this orthogonality condition).
4. Is it true that for every t with ∥t∥_2 ⩽ \frac{1}{\sqrt{3}}, ρ above corresponds to a density matrix? If
yes prove it, otherwise give a counter example.
    p_x := ⟨ψ^{AB}|N_x^* N_x ⊗ I^B|ψ^{AB}⟩

    (3.22) →  = ⟨Ω^{AÃ}| \sqrt{ρ}\, N_x^* N_x \sqrt{ρ} ⊗ V^*V |Ω^{AÃ}⟩
                                                                            (3.23)
    V^*V = I^{Ã} →  = ⟨Ω^{AÃ}| \sqrt{ρ}\, N_x^* N_x \sqrt{ρ} ⊗ I^{Ã} |Ω^{AÃ}⟩

    Part 1 of Exercise 2.3.26 →  = Tr[N_x^* N_x ρ^A] .
Thus, the outcome probability px depends only on the reduced density matrix ρA and not
(directly) on the bipartite state |ψ AB ⟩. Moreover, the post measurement state after outcome
x occurred is given by
    |ψ_x^{AB}⟩ = \frac{1}{\sqrt{p_x}} N_x ⊗ I^B |ψ^{AB}⟩
                                                              (3.24)
    (3.22) →  = \frac{1}{\sqrt{p_x}} (N_x\sqrt{ρ} ⊗ V) |Ω^{AÃ}⟩ .
Therefore, the reduced density matrix σxA of ψxAB = |ψxAB ⟩⟨ψxAB | is given by
    σ_x^A := Tr_B[ψ_x^{AB}] = \frac{1}{p_x} N_x \sqrt{ρ}\, Tr_B\big[(I^A ⊗ V) Ω^{AÃ} (I^A ⊗ V^*)\big] \sqrt{ρ}\, N_x^* ,    (3.25)
where we substitute the expression in (3.24) for ψxAB . Now, from the cyclic property of the
partial trace (see Exercise 2.3.29) we have that
    Tr_B\big[(I^A ⊗ V) Ω^{AÃ} (I^A ⊗ V^*)\big] = Tr_{Ã}\big[(I^A ⊗ V^*V) Ω^{AÃ}\big] = Tr_{Ã}\big[Ω^{AÃ}\big] = I^A .    (3.26)
This demonstrates that the reduced density matrix ρA along with the measurement operators
{Nx }x∈[m] determine the post-measurement reduced density matrices in the exact same way
as we saw in the previous section. For the same reasons as before, we conclude that all
the information that can be extracted from Alice’s subsystem (via quantum generalized
measurements) is encoded in the marginal state ρ^A. Therefore, if Alice has no access to
Bob’s subsystem, then from her perspective, the state of her subsystem can be characterized
by the marginal density operator ρA , and the fact that her subsystem is entangled with Bob’s
can be ignored.
To establish the equivalence between ρ^{XA} and the ensemble {p_x, |ψ_x⟩⟨ψ_x|}_{x∈[m]},
we demonstrate that it is possible to transform ρXA into {px , |ψx ⟩⟨ψx |}x∈[m] and vice versa.
Firstly, consider performing a measurement in the |x⟩ basis on system X of a composite
system XA in the cq-state ρXA . This measurement yields the state |ψx ⟩ with a probability
of px . Consequently, this process reconstructs the ensemble {px , |ψx ⟩⟨ψx |}x∈[m] from ρXA .
Conversely, imagine that we have a state |ψ_x⟩ randomly selected from the ensemble
{p_x, |ψ_x⟩⟨ψ_x|}_{x∈[m]}. If Alice possesses knowledge of which state was selected (i.e., she knows
the value of x), she can encode this information by introducing |x⟩⟨x|X , resulting in her
state transitioning to |x⟩⟨x|X ⊗ |ψx ⟩⟨ψx |A . When Alice opts to forget the specific value of x,
her quantum state becomes identical to ρ^{XA}. Furthermore, it is worth noting that when we
only have access to the marginal state ρ^A = Tr_X[ρ^{XA}] = \sum_{x∈[m]} p_x|ψ_x⟩⟨ψ_x|, it is generally
impossible to perfectly recover the value of x.
Cq-states play a pivotal role in quantum information science, particularly when describing
the outcomes of quantum measurements. Let’s consider a physical system characterized by
the density operator ρ ∈ D(A) and a generalized quantum measurement {Mx }x∈[m] . As
previously discussed, the application of this generalized measurement to the state ρA results
in the state σxA , as outlined in (3.17), with the associated probability px as defined in (3.16).
Because we know the outcome x, we have the option to record it within a classical system
denoted as X. In this context, we can perceive the measurement's effect as a transformation.
We will see later on that in this case, the measurement acts as a quantum channel, converting
one density operator, ρA , to another, σ A (see Fig. 3.3c below).
Note that the role of the referee above is to provide Alice and Bob with a shared random-
ness. Therefore, any separable state as in (3.31) can be prepared by local operations assisted
with shared randomness. Bipartite density matrices that do not have this form are called
entangled and we will discuss them in details in the following chapters on entanglement
theory.
Exercise 3.2.3. Show that the maximally entangled state Φ^{AB} := |Φ^{AB}⟩⟨Φ^{AB}| ∈ D(A ⊗ B)
is not separable.
Exercise 3.2.4. Show that if σ ∈ D(A ⊗ B) is separable then there exists an integer k ∈ N,
a probability distribution {qz }z∈[k] , a set of k pure states {ψz }z∈[k] in Alice’s Hilbert space
(i.e. each ψz ∈ Pure(A)), and a set of k pure states {ϕz }z∈[k] on Bob’s Hilbert space B, such
that

    σ^{AB} = \sum_{z∈[k]} q_z ψ_z^A ⊗ ϕ_z^B .    (3.32)
Recall that Born's rule (adapted to density matrices and generalized measurements)
states that the probability to obtain an outcome x, when a measurement {Mx }x∈[m] is per-
formed on a system described by a density operator ρ, is given by px = Tr [Mx∗ Mx ρ]. There-
fore, to describe a POVM we only need to consider the operators, {Λx := Mx∗ Mx }x∈[m] , since
we are only interested in the statistics of the measurement and not the post-measurement
state. The POVM operators Λx , are called effects, and have the following two properties:
    Λ_x ⩾ 0    and    \sum_{x∈[m]} Λ_x = I^A .    (3.33)
To every generalized measurement there exists a unique POVM that corresponds to it via
the relation Λx = Mx∗ Mx . However, for every POVM there are many quantum measurements
corresponding to it.
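The freedom in realizing a POVM by a generalized measurement (made precise in Exercise 3.3.2 below) can be illustrated numerically. In the following sketch (ours; the three effects are a hypothetical example, and SciPy's sqrtm supplies the matrix square root), the canonical choice M_x = √Λ_x is verified to be a generalized measurement reproducing the Born-rule statistics Tr[Λ_x ρ].

```python
import numpy as np
from scipy.linalg import sqrtm

# A (hypothetical) three-outcome qubit POVM.
L0 = 0.5*np.array([[1, 0], [0, 0]], dtype=complex)
L1 = 0.25*np.array([[1, 1], [1, 1]], dtype=complex)
L2 = np.eye(2) - L0 - L1
assert min(np.linalg.eigvalsh(L).min() for L in (L0, L1, L2)) >= -1e-12   # all effects ⩾ 0

Ms = [sqrtm(L) for L in (L0, L1, L2)]                    # one measurement realizing the POVM
assert np.allclose(sum(M.conj().T @ M for M in Ms), np.eye(2))

rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
print([np.trace(L @ rho).real for L in (L0, L1, L2)])    # p_x = Tr[Λ_x ρ]
```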
1. Show that for any n × n complex matrix A there exists an n × n unitary matrix U such
that
    A = U|A|    where    |A| := \sqrt{A^*A} .    (3.34)
Exercise 3.3.2. Let {Λx }x∈[m] be a POVM in Pos(A). Show that a generalized measurement
{Mx }x∈[m] ⊂ L(A) corresponds to the POVM {Λx }x∈[m] if and only if there exists m unitary
matrices, {Ux }x∈[m] , in L(A) such that
    M_x = U_x \sqrt{Λ_x} .    (3.36)
1. Find all the possible values of a and b for which the set {Λ1 , Λ2 , Λ3 } is a POVM.
2. Which values of a and b that you found in part 1 correspond to a rank 1 POVM (i.e.
all the POVM elements have rank 1)?
Exercise 3.3.4. Suppose Alice and Bob share a composite quantum system in the state ρAB .
Alice performs a measurement on her system described by a POVM {Λx }x∈[m] , and record the
outcome x in a classical system X. Show that the post-measurement state can be expressed
as a cq-state of the form
    σ^{XB} = \sum_{x∈[m]} p_x |x⟩⟨x|^X ⊗ σ_x^B .    (3.38)
Express the probabilities px , and density matrices σxB , in terms of Λx and ρAB .
The span above is with respect to the real numbers since Herm(A) is a real vector space.
By definition, if a POVM {Λx }x∈[m] is informationally complete then m ⩾ d2 , where d := |A|.
Moreover, if m = d^2 then the elements of the informationally complete POVM form a basis of Herm(A).
Clearly, the basis is not orthonormal since Λx ⩾ 0 for all x ∈ [d2 ]. A theorem from linear
algebra states that for any basis of a vector space there exists a dual basis. That is, if
{Λ1 , Λ2 , . . . , Λd2 } is a basis of Herm(A) then there exists another basis {Γ1 , Γ2 , . . . , Γd2 } of
Herm(A), such that the Hilbert Schmidt inner products
3. Set Λ := \sum_{x=0}^{3} Λ_x and show that it is invertible, and that the operators {Λ̃_0, Λ̃_1, Λ̃_2, Λ̃_3},
with Λ̃_x := Λ^{−1/2} Λ_x Λ^{−1/2}, form a rank 1 informationally complete POVM.
Exercise 3.3.6.
However, unlike a basis, in general a frame does not satisfy the relation (3.40) with its dual.
Consider now the case that m = d^2 so that {Λ_x}_{x∈[m]} is a basis of Herm(A). Since its
dual {Γ_y}_{y∈[m]} also spans Herm(A), it follows that the density matrix ρ can be written in
terms of the linear combination

    ρ = \sum_{y∈[m]} p_y Γ_y .    (3.44)
That is, the coefficients py in (3.44) are given by py = Tr[Λy ρ] ⩾ 0, and can be interpreted
as the probability to obtain an outcome y. The significance of (3.44) with py = Tr[Λy ρ] ⩾ 0,
is that by repeating the POVM {Λx }x∈[m] on many copies of ρ, one can estimate from the
measurement outcomes the values of the py s, and thereby learn ρ due to the relation (3.44).
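The reconstruction formula (3.44) is the basis of quantum state tomography. The sketch below (ours, assuming NumPy; the tetrahedral POVM used here is a hypothetical example of an informationally complete POVM, cf. Exercise 3.3.8 below) builds the dual frame {Γ_y} by inverting the coordinate matrix of {Λ_x} in an orthonormal operator basis and then recovers ρ from the probabilities p_y = Tr[Λ_y ρ].

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
onb = [P/np.sqrt(2) for P in (I2, sx, sy, sz)]            # orthonormal basis of Herm(C^2)

# A tetrahedral informationally complete POVM: Λ_x = (I + r_x·σ)/4.
rs = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)
povm = [(I2 + r[0]*sx + r[1]*sy + r[2]*sz)/4 for r in rs]
assert np.allclose(sum(povm), I2)

coords = lambda M: np.real([np.trace(E @ M) for E in onb])
B = np.array([coords(L) for L in povm])                   # {Λ_x} as a basis of Herm(C^2)
dual = [sum(c*E for c, E in zip(row, onb)) for row in np.linalg.inv(B).T]   # dual frame {Γ_y}

rho = np.array([[0.8, 0.1 - 0.2j], [0.1 + 0.2j, 0.2]])
p = [np.trace(L @ rho).real for L in povm]                # measurement statistics
assert np.allclose(sum(py*G for py, G in zip(p, dual)), rho)   # ρ = Σ_y p_y Γ_y
```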
Exercise 3.3.7. Let {Λx }x∈[m] be an informationally complete POVM in Herm(A), and let
its dual frame be {Γy }y∈[m] .
1. Show that at least one of the matrices in the dual frame is not positive semidefinite.
That is, there exists y ∈ [m] such that Γy ̸⩾ 0.
2. Show that Tr[Γy ] = 1 for all y ∈ [m].
Exercise 3.3.8. [Symmetric Informationally Complete (SIC) POVM]
Let d := |A|, m := d2 , and {Λx }x∈[m] be an informationally complete POVM in Herm(A),
with the following properties:
4. Show that
    Λ_1 = \frac{1}{12\sqrt{3}} \begin{pmatrix} 3\sqrt{3}+1 & -5+i \\ -5-i & 3\sqrt{3}-1 \end{pmatrix} ,    Λ_2 = \frac{1}{12\sqrt{3}} \begin{pmatrix} 3\sqrt{3}+1 & 1-5i \\ 1+5i & 3\sqrt{3}-1 \end{pmatrix} ,

    Λ_3 = \frac{1}{12\sqrt{3}} \begin{pmatrix} 3\sqrt{3}-5 & 1+i \\ 1-i & 3\sqrt{3}+5 \end{pmatrix} ,    Λ_4 = \frac{1}{4\sqrt{3}} \begin{pmatrix} \sqrt{3}+1 & 1+i \\ 1-i & \sqrt{3}-1 \end{pmatrix} .    (3.48)

form a SIC POVM in Herm(C^2). Moreover, show that the four pure states {2Λ_x}_{x∈[4]}
are the vertices of a tetrahedron in the Bloch Sphere.
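The following numerical check (ours, assuming NumPy) verifies the claims of the exercise: the four effects above sum to the identity, each 2Λ_x is a pure state, Tr[Λ_x Λ_y] = 1/12 for x ≠ y, and the Bloch vectors of {2Λ_x} form a regular tetrahedron.

```python
import numpy as np

s3 = np.sqrt(3)
L = [np.array([[3*s3 + 1, -5 + 1j], [-5 - 1j, 3*s3 - 1]]) / (12*s3),
     np.array([[3*s3 + 1, 1 - 5j], [1 + 5j, 3*s3 - 1]]) / (12*s3),
     np.array([[3*s3 - 5, 1 + 1j], [1 - 1j, 3*s3 + 5]]) / (12*s3),
     np.array([[s3 + 1, 1 + 1j], [1 - 1j, s3 - 1]]) / (4*s3)]

assert np.allclose(sum(L), np.eye(2))                            # completeness
for x in range(4):
    assert np.isclose(sorted(np.linalg.eigvalsh(L[x]))[0], 0)    # 2Λ_x is rank one (pure)
    for y in range(x + 1, 4):
        assert np.isclose(np.trace(L[x] @ L[y]).real, 1/12)      # symmetric overlaps

paulis = [np.array([[0, 1], [1, 0]]), np.array([[0, -1j], [1j, 0]]), np.array([[1, 0], [0, -1]])]
bloch = np.array([[np.trace(2*Lx @ s).real for s in paulis] for Lx in L])
print(np.round(bloch @ bloch.T, 6))     # Gram matrix: 1 on the diagonal, -1/3 off-diagonal
```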
We now show that any function µ with these properties must have the form µ(Λ) = Tr[ρΛ]
for some fixed density operator ρ ∈ D(A). This means that there is no way to assign
probabilities to effects other than via Born's rule. Originally, this remarkable result was
proved by Gleason for the case of projective measurements, i.e. the set Eff(A) was replaced
with the set of all projections on A, and consequently the requirement on µ was much weaker,
assuming only that (3.50) holds for orthogonal projections. Nonetheless, Gleason was able
to derive the probability formula for systems of dimension d ⩾ 3, and in the qubit case he
showed that there are counter examples. Gleason also considered in his proof the infinite
dimensional case, and derived similar results. Gleason’s proof for projective measurements
goes beyond the scope of this book, and we will follow here a much simpler proof of the
above generalization of Gleason theorem (see the section on Notes and References for more
details).
The idea of the proof is as follows. First we will show that for any r ∈ [0, 1] µ(rΛ) =
rµ(Λ), and use it to show that µ can be extended to a linear functional on the space of
Hermitian operators Herm(A). Then, as a linear functional, it can be expressed as µ(Λ) =
Tr[Λρ], and we end by showing that ρ must be a density operator.
Proof. Note that for any effect Λ ∈ Eff(A) and any integer n, we have Λ = \frac{1}{n}Λ + ⋯ + \frac{1}{n}Λ ⩽ I^A,
where the sum contains n terms. Therefore, from (3.50), µ(Λ) = nµ(\frac{1}{n}Λ). Multiplying this
equation by an integer m ⩽ n and dividing by n gives \frac{m}{n}µ(Λ) = mµ(\frac{1}{n}Λ) = µ(\frac{m}{n}Λ), where
we used the third property of a measure µ as defined above. So far we showed that for any
rational number p ∈ [0, 1], we must have µ(pΛ) = pµ(Λ). Let r ∈ [0, 1] be a real number and
let {pj } and {qk } be two sequences of rational numbers in [0, 1] that converge to r and have
the property that pj ⩽ r ⩽ qk for all j and k. We therefore have
pj µ(Λ) = µ(pj Λ) ⩽ µ(rΛ) ⩽ µ(qk Λ) = qk µ(Λ) , (3.52)
where the inequalities above follows from the fact that if two effects satisfy Λ ⩾ Γ (i.e.
Λ − Γ ⩾ 0) then
µ(Λ) = µ Γ + (Λ − Γ) = µ(Γ) + µ(Λ − Γ) ⩾ µ(Γ) . (3.53)
Taking the limit j, k → ∞ gives µ(rΛ) = rµ(Λ) for all r ∈ [0, 1].
We now extend the definition of µ to any element in Herm(A). First, for any positive
semidefinite matrix P ⩾ 0 that is not in Eff(A) there always exists r > 1 such that \frac{1}{r}P ∈
Eff(A). Define µ(P) := rµ(\frac{1}{r}P). To show that µ(P) is well defined, let r' > 1 be another
number such that \frac{1}{r'}P ∈ Eff(A) and assume without loss of generality that r' > r, so that
\frac{r}{r'} < 1. Then,

    rµ\Big(\frac{1}{r}P\Big) = r'µ\Big(\frac{r}{r'}\cdot\frac{1}{r}P\Big) = r'µ\Big(\frac{1}{r'}P\Big) .    (3.54)
Note that this extension of the domain of µ to any element of Pos(A) preserves the
linearity of µ; i.e. for any two matrices M, N ∈ Pos(A) and large enough r such that \frac{1}{r}(M + N) ∈ Eff(A),

    µ(M + N) = rµ\Big(\frac{1}{r}(M + N)\Big) = rµ\Big(\frac{1}{r}M\Big) + rµ\Big(\frac{1}{r}N\Big) = µ(M) + µ(N) .    (3.55)
Finally, we extend the definition of µ to include in its domain any matrix L ∈ Herm(A). Any
such matrix can be expressed as L = M − N, with M, N ∈ Pos(A). We therefore define µ(L) := µ(M) − µ(N).
To show that µ(L) is well defined, we need to show that for any other decomposition of
L = M ′ − N ′ with M ′ , N ′ ∈ Pos(A), we have
map followed by a projective measurement. However, Naimark’s theorem deals directly with
POVMs and makes the connection between POVMs and projective measurements more
transparent. Moreover, Naimark’s dilation theorem is applicable to infinite dimensional
systems, although we will prove it here only in the finite dimensional case.
Naimark’s Theorem
Theorem 3.3.2. Let A be a Hilbert space and {Λx }x∈[m] ⊂ Eff(A) be a POVM.
Then, there exists an extended Hilbert space B (i.e. |B| ⩾ |A|), an isometry
V : A → B, and a von-Neumann projective measurement {Px }x∈[m] ⊂ Eff(B), such
that Λx = V ∗ Px V for all x ∈ [m].
Proof. Every POVM element Λx can be expressed as Λx = Mx∗ Mx where {Mx } is a general-
ized measurement. In Sec. 3.1 we saw that for every generalized measurement there exists
an ancillary system R, and a unitary matrix U^{RA}, such that (cf. (3.2))

    M_x^A = ⟨x|^R U^{RA} |1⟩^R    (3.59)
where |1⟩R is some fixed state in R. Define B := RA and define the operator V : A → B by
V := U^{RA}|1⟩^R. That is, for any |ψ⟩ ∈ A

    V|ψ⟩^A := U^{RA} |1⟩^R |ψ⟩^A ∈ B .    (3.60)
Moreover, the set of rank one matrices {ϕ_{xy}} also forms a POVM (i.e. \sum_{x,y} ϕ_{xy} = I^A).
Therefore, one can implement the POVM {Λx }x∈[m] by first implementing the rank one
POVM {ϕxy }, with corresponding (x, y) outcomes, and then forgetting/ignoring the outcome
y.
Exercise 3.3.10. Show that if {Λx }x∈[m] is a rank 1 POVM in Eff(A), where m := |A|, then
{Λx }x∈[m] is a basis measurement (i.e. rank one von-Neumann projective measurement).
1. Show that the function above is well defined in the sense that it is independent on the
choice of the orthonormal basis {ηj }j∈[m2 ] of L(A).
2. Show that the function above is an inner product in the vector space L(A → B). Hence,
L(A → B) is a Hilbert space.
We are now ready to introduce the axiomatic approach describing a physical evolution
from system A into B. Below each axiom we provide the physical justification.
Consider the scenario where Alice rolls a dice to obtain a classical variable x ∈ [m], with
associated probabilities {px }x∈[m] . If she obtains the value x, she prepares her system in the
state ρx . After a quantum evolution takes place, if the initial state was ρx , it will evolve to
E(ρ_x). Now, if Alice forgets which state she prepared (i.e., she forgets the outcome of the
dice roll), from her new perspective, the input state is given by \sum_{x∈[m]} p_x ρ_x. In the same
vein, the post-evolution state of the system becomes \sum_{x∈[m]} p_x E(ρ_x). Consequently, we must
have:

    E\Big(\sum_{x∈[m]} p_x ρ_x\Big) = \sum_{x∈[m]} p_x E(ρ_x) .    (3.65)
The above equation signifies that E is convex-linear (i.e., linear under convex combinations)
on the set of density matrices and can be extended linearly to act on the entire space L(A)
(not limited to D(A) and not limited to convex combinations). Therefore, we infer that
every map describing a physical evolution is an element of L(A → B).
so that E preserves the trace of Hermitian matrices. Moreover, if M ∈ L(A) is not Hermitian
it can still be expressed as M = η_0 + iη_1, where both η_0 := (M + M^*)/2 and η_1 := (M − M^*)/(2i)
are Hermitian matrices, so that
We therefore conclude that any physical evolution E is a trace preserving (TP) linear map.
Since a physical evolution E takes density matrices to density matrices it also preserves
positivity. That is, if ρ ∈ Pos(A) is positive semidefinite matrix then also E(ρ) is a positive
semidefinite matrix in Pos(B). We call such linear maps positive maps. There is yet one
more property that E has to satisfy if it describes an evolution of a physical system.
Consider a composite system consisting of two subsystems A and B. Such a system is
described by a bipartite density operator ρAB ∈ D(A ⊗ B). If the subsystem B undergoes
a physical evolution described by a linear map E ∈ L(B → B ′ ), while system A does not
evolve and remains intact, then the state ρ^{AB} will evolve to the state

    σ^{AB'} := id^A ⊗ E^{B→B'}(ρ^{AB}) .    (3.68)

Therefore, if E represents a physical evolution, then both E and id^A ⊗ E must take density
matrices to density matrices. In particular, the linear map id ⊗ E ∈ L(AB → AB') must
also be a positive map for any system A. It turns out that there are linear maps E that are
positive while idA ⊗ E is not positive. One such example is the transposition map.
Consider the linear map T ∈ L(A → A) defined by
The transpose map preserves the eigenvalues and therefore is a trace-preserving positive
map. Now, take A = C2 and consider the matrix ΩAÃ := |ΩAÃ ⟩⟨ΩAÃ | ∈ L(C2 ⊗ C2 ). The
matrix ΩAÃ is a rank one, positive semidefinite matrix that can be expressed as
    Ω^{AÃ} = |00⟩⟨00| + |00⟩⟨11| + |11⟩⟨00| + |11⟩⟨11| = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix} .    (3.70)
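The failure of complete positivity for the transpose map can be seen numerically: applying the transpose to one half of Ω^{AÃ} (the partial transpose) produces a matrix with a negative eigenvalue. The following is a quick sketch (ours, assuming NumPy).

```python
import numpy as np

Omega = np.zeros((4, 4))
Omega[0, 0] = Omega[0, 3] = Omega[3, 0] = Omega[3, 3] = 1.0   # the matrix in (3.70)

# Partial transpose: transpose the second-qubit indices only.
PT = Omega.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)
print(np.linalg.eigvalsh(PT))        # eigenvalues -1, 1, 1, 1: (id ⊗ T)(Ω) is not ⩾ 0
```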
Complete Positivity
Definition 3.4.1. A linear map E ∈ L(A → B) is called k-positive if idk ⊗ E is
positive, where idk ∈ L(Ck → Ck ) is the identity map. Furthermore, E is called
completely positive (CP) if it is k-positive for all k ∈ N.
From Exercise 3.4.15 in the next few subsections it follows that E (k) is k-positive, and yet, if
k < d then E (k) is not (k + 1)-positive. If k ⩾ d then this map is completely positive. More
generally, we will see later that a map is d-positive if and only if it is completely positive.
In conclusion, any evolution of a physical system has to be (1) linear, (2) trace-preserving
(TP), and (3) completely positive (CP). Such a linear CPTP map is called a quantum
channel. The set of all quantum channels in L(A → B) will be denoted by CPTP(A →
B). In the next subsections we discuss several representations of quantum channels, and
along the way show that any quantum channel has a physical realization; that is, it can
be implemented by physical processes. Therefore, the axiomatic approach above led us to
the precise conditions on the evolution of a physical system that are both necessary and
sufficient for the existence of its physical realization.
Exercise 3.4.3. Let Λ ∈ L(A) and define a map FΛ : L(A) → L(A) via
Show that F_Λ is a completely positive linear map. Hint: Prove first that F_Λ^{A→A}(ψ^{RA}) ⩾ 0
for a pure state |ψ^{RA}⟩ = M ⊗ I^A |Ω^{ÃA}⟩.
Definition 3.4.2. Let E ∈ L(A → B) be a linear map. Its dual or adjoint map is
the map E ∗ ∈ L(B → A) that satisfies
The definition above of the dual map is analogous to the definition of a dual map in
L(A, B). Recall that the dual of a matrix M ∈ L(A, B) is defined via the relation
⟨ϕ|M ψ⟩ = ⟨M ∗ ϕ|ψ⟩ ∀ |ψ⟩ ∈ A, |ϕ⟩ ∈ B . (3.75)
Similarly, the definition of E ∗ in (3.74) can be expressed as
⟨σ, E(ρ)⟩HS = ⟨E ∗ (σ), ρ⟩HS ∀ ρ ∈ L(A), σ ∈ L(B) , (3.76)
where ⟨·, ·⟩HS is the Hilbert-Schmidt inner product.
Exercise 3.4.4. Let E ∈ L(A → B) be a linear map.
1. Show that E is trace preserving if and only if its dual E ∗ is unital; i.e. E ∗ (I B ) = I A .
2. Show that E is trace non-increasing (i.e., Tr[E(η)] ⩽ Tr[η] for all η ∈ Pos(A)) if and
only if its dual E ∗ is sub-unital; i.e. E ∗ (I B ) ⩽ I A .
3. Show that E is positive if and only if E ∗ is positive.
4. Show that E is completely positive if and only if E ∗ is completely positive.
Exercise 3.4.5. Show that a unitary evolution is a CPTP map. Specifically, show that a
unitary map U ∈ L(A → A) defined by
U(ρ) := U ρU ∗ ∀ ρ ∈ L(A) (3.77)
where U ∈ U(A) is a unitary operator, is a quantum channel.
Exercise 3.4.6. The replacement map is a map E ∈ L(A → B) defined by
E(ρ) := Tr[ρ] σ ∀ ρ ∈ L(A) (3.78)
where σ ∈ D(B) is some fixed density matrix.
1. Show that E is a quantum channel.
2. Show that for any two quantum states ρ ∈ D(A) and σ ∈ D(B) there exists a quantum
channel E such that E(ρ) = σ.
Exercise 3.4.7. Let A and B be two finite dimensional Hilbert spaces, and denote the set
of positive maps in L(A → B) by
n o
Pos(A → B) := E ∈ L(A → B) : E(ρ) ∈ Pos(B) ∀ ρ ∈ Pos(A) (3.79)
Denote also by R ∈ CPTP(A → B) the replacement channel R(ρ^A) := Tr[ρ^A] \frac{1}{|B|} I^B for all
ρ ∈ L(A).
1. Show that Pos(A → B) is a convex cone in the Hilbert space L(A → B).
2. Prove the equivalence of the following properties of a map E ∈ Pos(A → B):
(a) E belongs to the interior of the cone Pos(A → B).
(b) E = (1 − t)F + tR for some t ∈ (0, 1] and some F ∈ Pos(A → B).
(c) E ∗ belongs to the interior of the cone Pos(A → B).
(d) For any non-zero ρ ∈ Pos(A) we have E(ρ) > 0.
with r_ρ := (r_1, . . . , r_{m^2})^T ∈ C^{m^2}. Observe that the mapping ρ ↦ r_ρ defines an isomorphism
between L(A) and C^{m^2}. Similarly, for any σ = \sum_{y∈[n^2]} s_y Γ_y ∈ L(B) we define
s_σ := (s_1, . . . , s_{n^2})^T. Finally, for any linear map E ∈ L(A → B) we define the n^2 × m^2 matrix
M_E through the relation

    σ = E(ρ)  ⟺  s_σ = M_E r_ρ .    (3.82)
element). With these choices, an operator ρ ∈ L(A) has trace one if and only if r_ρ has
the form (\frac{1}{\sqrt{m}}, r_2, . . . , r_{m^2})^T. We therefore conclude that the linear map E is both trace
preserving and Hermitian preserving if and only if its matrix representation has the form

    M_E = \begin{pmatrix} \sqrt{m/n} & 0 \\ t & N_E \end{pmatrix}    where t ∈ R^{n^2−1} and N_E ∈ R^{(n^2−1)×(m^2−1)} .    (3.84)
What are the conditions on the matrix NE and the vector t that correspond to the condition
that E is completely positive? These conditions can be very complicated since, in general,
even the set of all vectors rρ for which ρ ⩾ 0 doesn’t have a simple characterization.
Exercise 3.4.8. Let E ∈ L(A → B) be a linear map.
1. Show that
    M_{E^*} = M_E^* .    (3.85)
From the relation above, E is a positive map if the matrices t and NE are such that whenever
|r| ⩽ 1, |s| ⩽ 1 also holds. It’s important to note, however, that this criterion pertains
solely to the positivity of E and does not address its complete positivity. In the specific
scenario of qubits, it is feasible to articulate the conditions governing NE and t for E to be
completely positive. Nevertheless, these conditions tend to be rather intricate, and we direct
the interested reader to the pertinent literature found in the Notes and References section at
the end of this chapter. In the exercises below, you will demonstrate that these conditions
become more straightforward in the case of doubly-stochastic maps.
Doubly stochastic maps encompass mappings that possess two key properties: trace-
preservation and unitality, meaning they preserve the identity operator. One of the simplest
examples of such maps is the unitary map U(ρ) := UρU^*, where U is a unitary matrix.
Another example includes convex combinations of unitary maps of the form \sum_{x∈[m]} p_x U_x,
where {px }x∈[m] forms a probability distribution, and each Ux represents a unitary quantum
channel. These instances exemplify completely positive maps that also qualify as doubly
stochastic.
Conversely, the transpose map T (ρ) = ρT or its combination with a unitary map, denoted
as U ◦ T , serve as examples of doubly stochastic maps that are positive but not completely
positive. In the subsequent set of problems, we will see that for the qubit case, all positive
doubly stochastic maps can be expressed as convex combinations of such maps.
Exercise 3.4.9. Let E ∈ Pos (C2 → C2 ) be a positive linear map.
1. Show that E is both trace-preserving and unital (i.e. doubly-stochastic) if and only if
    M_E = \begin{pmatrix} 1 & 0 \\ 0 & N_E \end{pmatrix}    (3.87)
where NE ∈ R3×3 has the property that ∥NE r∥2 ⩽ 1 for all r ∈ R3 with ∥r∥2 = 1 (in
particular, the absolute value of the eigenvalues of NE cannot exceed one).
2. Suppose E = U is the doubly stochastic unitary map given by U(ρ) = U ρU ∗ , where
U ∈ SU (2) can be expressed as U = wI2 + i(xσ1 + yσ2 + zσ3 ) with w, x, y, z ∈ R and
w2 + x2 + y 2 + z 2 = 1 (cf. (C.10)). Show that the matrix NU of (3.87) is an orthogonal
matrix in SO(3) given by
    N_U = \begin{pmatrix} 1−2y^2−2z^2 & 2xy+2zw & 2xz−2yw \\ 2xy−2zw & 1−2x^2−2z^2 & 2yz+2xw \\ 2xz+2yw & 2yz−2xw & 1−2x^2−2y^2 \end{pmatrix}    (3.88)
(cf. (C.7)). Hint: Calculate directly the components \frac{1}{2} Tr[σ_i U σ_j U^*] for i, j ∈ {1, 2, 3}.
3. Use the previous parts to show that for any map of the form U(ρ) = U ρU ∗ , we have
that U ∈ U (2) if and only if the matrix NU ∈ SO(3). Hint: Every unitary U ∈ U (2)
can be written as U = exp(iθ)Ũ , where Ũ ∈ SU (2) and θ ∈ [0, 2π).
Exercise 3.4.10. Let T ∈ L (C2 → C2 ) be the transpose map defined by T (ρ) = ρT for all
ρ ∈ L(C2 ).
1. Show that the matrix representation of the transpose map with respect to the Pauli basis
of L(C2 ) is given by
    M_T = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & −1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}    (3.89)
2. Show that if E ∈ L(C2 → C2 ) is a positive doubly stochastic linear map with NE ∈ O(3)
and with det(NE ) = −1 then E = T ◦ U for some unitary map U. Hint: Use the fact
that any 3 × 3 orthogonal matrix can be expressed as a matrix product of an element
in SO(3) with NT .
Exercise 3.4.11. Use the exercises above in conjunction with Exercise A.3.3 to conclude
that any doubly stochastic positive map E ∈ Pos(C2 → C2 ) can be expressed as
where t ∈ [0, 1] and both N_1 and N_2 are mixtures of unitary maps; i.e. maps of the form
\sum_{j∈[m]} p_j U_j with each U_j being a unitary map and {p_j}_{j∈[m]} is a probability distribution.
Hint: Use Exercise A.3.3 to show that NE can be expressed as a finite convex combination
of orthogonal matrices.
Størmer-Woronowicz Theorem
Theorem 3.4.1. Let E ∈ Pos(A → B) be a positive linear map. If |A| = 2 and
|B| ⩽ 3 then there exists two CP maps N1 , N2 ∈ CP(A → B) such that
E = N1 + T ◦ N2 (3.91)
Remark. The case |B| = 2 was proven by Størmer, and the case |B| = 3 was proven by
Woronowicz. Here, we will only prove Størmer theorem (i.e. |A| = |B| = 2) and refer the
reader to the section ‘Notes and References’ (at the end of this chapter) for more details.
Proof. We prove the theorem for the case that E is in the interior of Pos(A → A) (the
more general case will then follow from a continuity argument; see Exercise 3.4.13). From
Exercise 3.4.7 it follows that also E ∗ is in the interior of Pos(A → A), and furthermore,
E(ρ) > 0 for any non-zero ρ ∈ Pos(A).
The key idea of the proof is to find two positive definite operators Λ, Γ > 0 with the
property that the channel
D := FΛ ◦ E ◦ FΓ (3.92)
is doubly stochastic, where
From Exercise 3.4.3 (see also the section on operator sum representation below) it follows
that the above maps are completely positive. A priori it is not clear whether such positive definite
matrices Λ and Γ exist, but if they do then from (3.92) we have E = F_{Λ^{−1}} ◦ D ◦ F_{Γ^{−1}}, and
since all doubly stochastic positive maps have the form (3.91) (see Exercise 3.4.11) it follows
that also E has the form (3.91). It is therefore left to show that such Λ and Γ do exist.
By definition, the channel D is doubly stochastic if and only if both D and its dual D∗
are unital channels. Since the dual of D is given by D∗ = FΓ ◦ E ∗ ◦ FΛ (see Exercise 3.4.12)
we conclude that D is doubly stochastic if and only if the matrices Λ and Γ satisfy
By conjugating with the inverses of Λ and Γ, the two equations above can be expressed as
It is therefore left to show that there exists ρ := Λ−2 > 0 and σ := Γ2 > 0 such that
Observe that if ρ and σ satisfy the two equations above then for any s > 0 also sρ and sσ
satisfy the two equations. Hence, without loss of generality we can assume that if there exist
ρ and σ that satisfy the equations above then σ is normalized, and since E is trace preserving
this implies that also ρ := E(σ) is normalized.
The equation ρ = E(σ) can be taken to be the definition of ρ. Substituting this ρ into
the second equality of (3.96) implies that
    σ^{−1} = E^*\big(E(σ)^{−1}\big) .    (3.97)
To show that such a σ exists, define the function f : D(A) → Pos(A) via
    f(ω) := \Big(E^*\big(E(ω)^{−1}\big)\Big)^{−1}    ∀ ω ∈ D(A) ,    (3.98)
and observe that (3.97) is equivalent to f (σ) = σ. We also define the normalized version of
f , the function g : D(A) → D(A), as
    g(ω) := \frac{f(ω)}{Tr[f(ω)]}    ∀ ω ∈ D(A) .    (3.99)
Then, from Brouwer’s fixed-point theorem (see Theorem A.10.1) there exists a density matrix
σ ∈ D(A) such that g(σ) = σ. Denoting by t := Tr[f (σ)] > 0 this is equivalent to
f (σ) = tσ . (3.100)
It is therefore left to show that t = 1. For this purpose, observe first that with the definition
ρ := E(σ) we can express the above equation as
    (tσ)^{−1} = E^*(ρ^{−1}) ,    (3.101)

so that

    t^{−1} I = σ^{1/2} E^*(ρ^{−1}) σ^{1/2} = D^*(I) ,    (3.102)
where D is defined in (3.92) with Λ = ρ−1/2 and Γ = σ 1/2 . On the other hand, the relation
ρ = E(σ) can be written as
    I = ρ^{−1/2} E(σ) ρ^{−1/2} = D(I)    (3.103)
which implies that the linear map D is unital. Since the dual of a unital map is trace
preserving (see the first part of Exercise 3.4.4) we conclude that D∗ is trace preserving.
Hence, by taking the trace on both sides of (3.102) and using the fact that D∗ is trace
preserving we conclude that t = 1. This completes the proof.
    D^* = F_Γ ◦ E^* ◦ F_Λ .    (3.104)
Exercise 3.4.13. Use a continuity argument to prove that if all maps in the interior of
Pos(A → A) have the form (3.91) then all the maps in Pos(A → A) have this form.
One of the key properties of the Choi matrix is that it satisfies the relation

    ⋯ = \sum_{x,y∈[m]} r_{xy} E(|x⟩⟨y|)
                                                  (3.108)
      = E\Big(\sum_{x,y∈[m]} r_{xy} |x⟩⟨y|\Big) = E(ρ) .
The two relations (3.106,3.107) demonstrate that the mapping E 7→ JE is a linear bijection
(i.e. an isomorphism). This isomorphism is between the vector space L(A → B) of linear
operators from L(A) to L(B), and the space of bipartite matrices/operators L(AB). In the
following exercise you show that the mapping E ↦ J_E is in fact an isometric isomorphism
between these two spaces.
Exercise 3.4.14. Let A and B be two finite dimensional Hilbert spaces and consider the
Hilbert space L(A → B) equipped with the inner product defined in (3.64). Show that this
inner product can be expressed as follows. For all E, F ∈ L(A → B) we have
where on the right-hand side we have the Hilbert-Schmidt inner product between the two Choi
matrices of E and F.
Proof. If E is completely positive then by definition JEAB := E Ã→B (ΩAÃ ) ⩾ 0. Suppose now
that JEAB ⩾ 0. Let k ∈ N, and |ψ RA ⟩ ∈ Ck ⊗ Cd , where R is a k-dimensional (reference)
system. Recall that any bipartite vector |ψ⟩RA can be expressed as
Exercise 2.3.10→ ⩾ 0 .
since each term in the sum is positive semidefinite. This completes the proof.
Theorem 3.4.3. A linear map E ∈ L(A → B) is trace preserving if and only if the
marginal state JEA := TrB JEAB = I A .
Proof. Suppose E is trace preserving and set m := |A|. Then, from (3.106)
    Tr_B[J_E^{AB}] = \sum_{x,y∈[m]} |x⟩⟨y|\, Tr[E(|x⟩⟨y|)]

    E is trace-preserving →  = \sum_{x,y∈[m]} |x⟩⟨y|\, Tr[|x⟩⟨y|]    (3.113)

                             = \sum_{x,y∈[m]} |x⟩⟨y|\, δ_{xy} = I^A .
We therefore conclude that a linear map E is a quantum channel if and only if its Choi
matrix JEAB ⩾ 0, and its marginal JEA = I A . In particular, the Choi matrix has trace |A| so
1
that |A| JEAB ∈ D(A ⊗ B). Hence, the Choi representation reveals that quantum channels can
be represented with bipartite quantum states. This equivalence between quantum channels
and bipartite quantum states is used very often in quantum information science.
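The two conditions just derived (J_E^{AB} ⩾ 0 and Tr_B[J_E^{AB}] = I^A) give a simple numerical test for whether a given map is a quantum channel. Below is a sketch (ours, assuming NumPy; the helper names and the example channels are hypothetical) that builds the Choi matrix of a map given as a function on matrices and checks both conditions.

```python
import numpy as np

def choi(channel, dA):
    """Choi matrix J = sum_{x,y} |x><y| ⊗ E(|x><y|) of a map given as a matrix function."""
    blocks = [[channel(np.outer(np.eye(dA)[x], np.eye(dA)[y])) for y in range(dA)]
              for x in range(dA)]
    return np.block(blocks)

def is_channel(channel, dA, dB, tol=1e-10):
    J = choi(channel, dA)
    positive = np.linalg.eigvalsh(J).min() >= -tol                     # J_E ⩾ 0
    marginal = np.einsum('ikjk->ij', J.reshape(dA, dB, dA, dB))        # Tr_B[J_E]
    return positive and np.allclose(marginal, np.eye(dA))

# Example: a qubit channel mixing the identity with a bit flip (CPTP), and the transpose (not CP).
X = np.array([[0, 1], [1, 0]], dtype=complex)
E = lambda rho: 0.7*rho + 0.3*(X @ rho @ X)
print(is_channel(E, 2, 2))                    # True
print(is_channel(lambda rho: rho.T, 2, 2))    # False
```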
Exercise 3.4.15. Show that the linear map, E (k) , defined in (3.72), with k < d, is k-positive
but not (k + 1)-positive.
Exercise 3.4.16. Let E ∈ L(A → A) be a quantum channel with the property that E(U ρU ∗ ) =
U E(ρ)U ∗ for any unitary matrix U ∈ U(A). Show that its Choi matrix JEAÃ must satisfy
    (Ū ⊗ U)\, J_E^{AÃ}\, (Ū ⊗ U)^* = J_E^{AÃ}    (3.115)
for all unitary matrices U .
Exercise 3.4.17. Show that any density matrix ρ ∈ D(AB) can be expressed as
    ρ^{AB} = E^{Ã→B}(ψ^{AÃ})    (3.116)
We now show that the mapping ρ ↦ \sum_{x∈[m]} M_x ρ M_x^* is a quantum channel and that every
quantum channel can be realized in this way. This representation of a quantum channel is
called the operator sum representation, and the elements {M_x}_{x∈[m]} (with \sum_{x∈[m]} M_x^* M_x = I^A)
are called the Kraus operators of the channel.
Proof. Suppose E is a quantum channel. Since the Choi matrix of a quantum channel is
positive semidefinite we can always express it as
    J_E^{AB} = \sum_{x∈[m]} |ψ_x^{AB}⟩⟨ψ_x^{AB}|    (3.121)
for some integer m and some (possibly unnormalized) vectors |ψxAB ⟩ ∈ A ⊗ B. Recall that
any bipartite state |ψxAB ⟩ can be expressed as
where Mx ∈ L(A, B) is a linear operator. Moreover, since the marginal Choi matrix JEA = I A
we get
    I^A = Tr_B[J_E^{AB}] = \sum_{x∈[m]} Tr_B\big[(M_x^T ⊗ I^B)\, Ω^{B̃B}\, ((M_x^*)^T ⊗ I^B)\big]
                                                                                                (3.123)
        = \sum_{x∈[m]} M_x^T\, Tr_B[Ω^{B̃B}]\, (M_x^*)^T = \sum_{x∈[m]} M_x^T (M_x^*)^T
To prove the converse, suppose that E has the form (3.120). Then, clearly the Choi matrix
J_E^{AB} := E^{Ã→B}(Ω^{AÃ}) has the form (3.121) with |ψ_x^{AB}⟩ := (I^A ⊗ M_x)|Ω^{AÃ}⟩. Hence, J_E^{AB} ⩾ 0 and
J_E^A = I^A since \sum_{x∈[m]} M_x^* M_x = I^A. This completes the proof.
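The construction in the proof can be run numerically: decomposing J_E into (unnormalized) pure states and reshaping each vector yields a Kraus representation. The following sketch (ours, assuming NumPy; function names hypothetical) uses the eigendecomposition of the Choi matrix, which additionally produces Hilbert–Schmidt orthogonal Kraus operators.

```python
import numpy as np

def kraus_from_choi(J, dA, dB, tol=1e-10):
    vals, vecs = np.linalg.eigh(J)
    kraus = []
    for lam, v in zip(vals, vecs.T):
        if lam > tol:
            # |ψ> = (I^A ⊗ M)|Ω^{AÃ}> has components ψ_{x j} = M_{j x}, so reshape and transpose.
            kraus.append(np.sqrt(lam) * v.reshape(dA, dB).T)
    return kraus

# Round trip on a simple qubit channel mixing the identity with a bit flip.
X = np.array([[0, 1], [1, 0]], dtype=complex)
E = lambda rho: 0.7*rho + 0.3*(X @ rho @ X)
J = np.block([[E(np.outer(np.eye(2)[x], np.eye(2)[y])) for y in range(2)] for x in range(2)])
Ms = kraus_from_choi(J, 2, 2)
rho = np.array([[0.6, 0.2], [0.2, 0.4]], dtype=complex)
assert np.allclose(sum(M @ rho @ M.conj().T for M in Ms), E(rho))
```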
1. Show that there exist two sets of matrices {M_x}_{x∈[m]} and {N_x}_{x∈[m]} such that

    E(ρ) = \sum_{x∈[m]} M_x ρ N_x^* .    (3.127)

Hint: Start by showing that it is possible to express the complex Choi matrix as J_E^{AB} = \sum_{x∈[m]} |ψ_x⟩⟨ϕ_x|, and then follow similar lines as in the proof above.
3. Show that E ∈ L(A → B) is completely positive if and only if its dual map E ∗ ∈ L(B →
A) is completely positive.
1. The sets {Mx }x∈[m] and {Ny }y∈[n] constitute two operator sum representations
of the same quantum channel.
so that {Ny }y∈[n] is a generalized measurement. Similarly, for any ρ ∈ L(A) we have
    \sum_{y∈[n]} N_y ρ N_y^* = \sum_{x,x'∈[m]} \sum_{y∈[n]} v_{yx} v̄_{yx'} M_x ρ M_{x'}^*
                                                                                        (3.131)
    V^*V = I_m →  = \sum_{x,x'∈[m]} δ_{xx'} M_x ρ M_{x'}^* = \sum_{x∈[m]} M_x ρ M_x^* .
Hence, the sets {Mx }x∈[m] and {Ny }y∈[n] are two operator sum representations of the same
quantum channel.
Next, we prove the implication 1 ⇒ 2. From the assumption we have in particular that
for every ψ ∈ Pure(A) we have
    ρ := \sum_{x∈[m]} M_x|ψ⟩⟨ψ|M_x^* = \sum_{y∈[n]} N_y|ψ⟩⟨ψ|N_y^* .    (3.132)
Since both {Mx |ψ⟩}x∈[m] and {Ny |ψ⟩}y∈[n] form an unnormalized pure-state decomposition
of the same density matrix ρ, we get from Exercise 2.3.15 that there exists an n×m isometry
matrix V = (vyx ) such that for all y ∈ [n]
    N_y|ψ⟩ = \sum_{x∈[m]} v_{yx} M_x|ψ⟩ .    (3.133)
Since the above equality holds for all ψ ∈ Pure(A) the relation (3.129) must hold. This
completes the proof.
Exercise 3.4.20. Show that for every quantum channel E ∈ CPTP(A → B) there exists an
operator-sum representation with no more than |AB| elements.
    0 = ⟨ψ_x^{AB}|ψ_{x'}^{AB}⟩ = ⟨Ω^{AÃ}|M_x^* M_{x'} ⊗ I^{Ã}|Ω^{AÃ}⟩ = Tr[M_x^* M_{x'}] .    (3.134)
That is, the Kraus operators are also orthogonal in the Hilbert-Schmidt inner product. We
therefore arrived at the following corollary.
In particular, there are always operator sum representations with no more than |AB| Kraus
operators.
Exercise 3.4.21. Let {Mx }x∈[m] be a canonical Kraus decomposition of E ∈ CPTP(A → B).
Show that for any m × m unitary matrix U = (uyx ), also {Ny }y∈[n] with
    N_y := \sum_{x∈[m]} u_{yx} M_x ,    (3.136)
    U(ρ^A ⊗ |0⟩⟨0|^E)U^* ;    U^*U = I^{AE} .    (3.137)
Finally, the environment system is traced out yielding the final state
We show now that every quantum channel can be realized in this way, giving a new
interpretation for quantum channels as joint unitary evolutions on the system plus environ-
ment. Recall that typically, the degrees of freedom of the environment are not accessible
and therefore they are traced out at the end of the process.
    E(ρ^A) := Tr_E[V ρ^A V^*]    ∀ ρ ∈ L(A) .    (3.139)
Remark. The theorem above is an adaptation of Stinespring Dilation Theorem to the finite
dimensional case.
Proof. Suppose E has the form (3.139). To show that E is a quantum channel we denote by
{|ϕ_z^E⟩}_{z∈[k]} an orthonormal basis of E, where k := |E|, and by M_z := ⟨ϕ_z^E|V. By definition,
for every z ∈ [k], M_z : A → B, and from (3.139) we get

    E(ρ^A) = \sum_{z∈[k]} ⟨ϕ_z^E|V ρ^A V^*|ϕ_z^E⟩ = \sum_{z∈[k]} M_z ρ M_z^* .    (3.140)
Hence, there exists an isometry V : A → BE such that (3.139) holds. This completes the
proof.
Exercise 3.4.22. Consider the isometry V : A → BE as expressed in (3.142), where
each Mz : A → B. Let {|x⟩A }x∈[m] be an orthonormal basis of A, and {|ψxBE ⟩}x∈[m] be an
orthonormal set of vectors in BE, such that
    V = \sum_{x∈[m]} |ψ_x^{BE}⟩⟨x|^A .    (3.145)

    M_z = \sum_{x∈[m]} |ϕ_{z|x}^B⟩⟨x|^A .    (3.147)
Exercise 3.4.23. Show that a linear map E ∈ L(A → A) is a quantum channel if and only
if there exists an environment system E and a unitary matrix U : AE → AE such that E
has the form (3.138). Hint: Complete the isometry V in the above theorem into a unitary
operator.
Note that in the proof above we defined the isometry V in (3.142) using the Kraus
operators. Therefore, Eq. (3.142) provides a direct relationship between the Stinespring
representation and the operator sum representation. Moreover, the operator sum represen-
tation is directly related to the Choi representation via the relationship in Eqs.(3.121,3.122).
Therefore, together with (3.142) we can establish a direct relationship among all three rep-
resentations. We will use these relationships quite often in the next sections.
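These relationships are easy to exercise numerically. The sketch below (ours, assuming NumPy; helper names hypothetical) assembles the Stinespring isometry V = Σ_z M_z ⊗ |z⟩^E of (3.142) from a Kraus representation and checks that Tr_E[VρV^*] reproduces the operator-sum action.

```python
import numpy as np

def stinespring(kraus):
    dB, dA = kraus[0].shape
    k = len(kraus)                                   # environment dimension |E|
    V = np.zeros((dB*k, dA), dtype=complex)
    for z, M in enumerate(kraus):
        V[z::k, :] = M                               # row (j, z) ↦ j*k + z, i.e. B ⊗ E ordering
    return V

def apply_channel(kraus, rho):
    return sum(M @ rho @ M.conj().T for M in kraus)

# Example: the completely dephasing qubit channel, M_1 = |0><0|, M_2 = |1><1|.
kraus = [np.diag([1.0, 0.0]).astype(complex), np.diag([0.0, 1.0]).astype(complex)]
V = stinespring(kraus)
assert np.allclose(V.conj().T @ V, np.eye(2))        # V is an isometry
rho = np.array([[0.6, 0.1], [0.1, 0.4]], dtype=complex)
out = V @ rho @ V.conj().T                           # state on B ⊗ E
dB, k = 2, len(kraus)
traced = np.einsum('ikjk->ij', out.reshape(dB, k, dB, k))      # Tr_E
assert np.allclose(traced, apply_channel(kraus, rho))
```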
    W = (I^B ⊗ U^E) V .    (3.148)
Therefore, {Nz }z∈[k] also forms an operator sum representation of E. Observe in particular
that the property that E is trace preserving implies that
that the property that E is trace preserving implies that

    I^A = \sum_{z∈[k]} N_z^* N_z

    N_z := ⟨ϕ_z^E|W →  = \sum_{z∈[k]} W^* |ϕ_z^E⟩⟨ϕ_z^E| W    (3.151)

    \sum_{z∈[k]} ϕ_z^E = I^E →  = W^* W .
Thus, W is an isometry. Given that both {M_z}_{z∈[k]} and {N_z}_{z∈[k]} are operator-sum repre-
sentations of E, Theorem 3.4.5 implies the existence of a k × k unitary matrix U = (u_{zw}),
such that for every z ∈ [k]

    N_z = \sum_{w∈[k]} u_{zw} M_w .    (3.152)
Thus, we conclude
    W = (I^B ⊗ U^E) \sum_{w∈[k]} M_w ⊗ |ϕ_w^E⟩
                                                  (3.155)
    (3.142) →  = (I^B ⊗ U^E) V .
This concludes the proof.
Note that this channel has the unique property that for any unitary matrix U ∈ L(C2 )
Exercise 3.5.1. Show that the depolarizing channel is indeed a quantum channel, by showing
that it has an operator sum representation that is given in terms of the following four Kraus
operators:

    M_0 = \sqrt{1 − \frac{3p}{4}}\, I    and for j ∈ [3] ,    M_j = \frac{\sqrt{p}}{2} σ_j ,    (3.160)
where σ1 , σ2 , and σ3 , are the three Pauli matrices.
From the exercise above it also follows that the normalized version of the Choi matrix of
the depolarizing channel has the form
    \frac{1}{2} J_E^{AB} = E^{Ã→B}(Φ_+^{AÃ}) = \Big(1 − \frac{3p}{4}\Big) Φ_+^{AB} + \frac{p}{4}\Big(Φ_−^{AB} + Ψ_+^{AB} + Ψ_−^{AB}\Big) .    (3.161)
The state above is known (up to local unitary) as the 2-qubit isotropic state and is used
quite often in quantum information as it has several interesting properties. We will discuss
it in more details later on.
The Stinespring isometry of the depolarizing channel can also be computed from (3.142)
and the exercise above; it is given by
    V = \sum_{j=0}^{3} M_j ⊗ |j⟩^E = \sqrt{1 − \frac{3p}{4}}\, I ⊗ |0⟩ + \frac{\sqrt{p}}{2} \sum_{j∈[3]} σ_j ⊗ |j⟩    (3.162)
where σ1 , σ2 , and σ3 , are the three Pauli matrices. Interestingly, the following exercise shows
that the Bloch representation is in some sense the simplest representation of the depolarizing
channel.
Exercise 3.5.2. Let ρ = 12 (I + r · σ) and ρ′ = 12 (I + r′ · σ) be two Bloch representations of
two quantum states, and let E be the depolarizing channel (3.158). Show that if ρ′ = E(ρ)
then r′ = (1 − p)r.
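This contraction of the Bloch vector is simple to confirm numerically. The sketch below (ours, assuming NumPy, and assuming the standard form E(ρ) = (1 − p)ρ + p Tr[ρ] I/2 of the depolarizing channel in (3.158)) checks that r′ = (1 − p)r.

```python
import numpy as np

p = 0.37
paulis = [np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]]),
          np.array([[1, 0], [0, -1]], dtype=complex)]

def depolarize(rho, p):
    return (1 - p)*rho + p*np.trace(rho)*np.eye(2)/2

r = np.array([0.2, -0.5, 0.7])
rho = 0.5*(np.eye(2) + sum(rj*s for rj, s in zip(r, paulis)))
r_out = np.real([np.trace(depolarize(rho, p) @ s) for s in paulis])
assert np.allclose(r_out, (1 - p)*r)
```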
From the above representation of ∆, it is clear that the completely dephasing map is idem-
potent, that is, it satisfies
∆2 := ∆ ◦ ∆ = ∆ . (3.164)
Its Choi matrix is

    J_∆ = ∆^{Ã→Ã}(Ω^{AÃ}) = \sum_{x∈[m]} |x⟩⟨x| ⊗ |x⟩⟨x| .    (3.165)
where {|ϕ_x^E⟩}_{x∈[m]} are some normalized vectors in E (note that if {|ϕ_x^E⟩} is an orthonormal set then N = ∆).
3. Show that
∆◦N =N ◦∆=∆. (3.169)
∆B ◦ E ◦ ∆A = E , (3.171)
In other words, we get that q = T p as in (3.170), with p and q being the vectors whose
components are the diagonal elements of ρ and σ, respectively, and T = (ty|x ) being the
column stochastic matrix whose components are ⟨y|E(|x⟩⟨x|)|y⟩.
Exercise 3.5.4. Let {ρz }z∈[k] and {σz }z∈[k] be two sets of k diagonal density operators with
respect to two fixed bases {|x⟩}x∈[m] and {|y⟩}y∈[n] of A and B, respectively. Show that there
exists a quantum channel E ∈ CPTP(A → B) such that σ_z = E(ρ_z) for all z ∈ [k] if and only
if there exists a classical channel with the same property.
Note that
⟨y|E(ρ)|y⟩ = Tr [|y⟩⟨y|E(ρ)] = Tr [E ∗ (|y⟩⟨y|)ρ] := Tr [Λy ρ] (3.176)
where {Λy := E ∗ (|y⟩⟨y|)}y∈[n] are positive semidefinite matrices in Pos(A) satisfying
    \sum_{y∈[n]} Λ_y = \sum_{y∈[n]} E^*(|y⟩⟨y|) = E^*(I^B) = I^A ,    (3.177)
since the dual map of any CPTP map is unital (see Exercise 3.4.4). We therefore get that a
POVM channel has the form
    E^{A→B}(ρ^A) = \sum_{y∈[n]} Tr[Λ_y ρ^A]\, |y⟩⟨y|^B .    (3.178)
where m := |A|, px := ⟨x|ρ|x⟩ for all x ∈ [m], and each σx := E(|x⟩⟨x|) is a fixed density
matrix in D(B). We can therefore view E as the mapping x 7→ σx . The Choi matrix of a
cq-channel has the form
    J_E^{AB} = E^{Ã→B}(Ω^{AÃ}) = E^{Ã→B} ◦ ∆^{Ã→Ã}(Ω^{AÃ})

             = E^{Ã→B}\Big(\sum_{x∈[m]} |xx⟩⟨xx|^{AÃ}\Big)    (3.185)

             = \sum_{x∈[m]} |x⟩⟨x|^A ⊗ σ_x^B .
Therefore, E ∈ CPTP(A → B) is a cq-channel if and only if its Choi matrix JEAB is a cq-state.
Exercise 3.5.6. Let E ∈ CPTP(A → B) be a cq-channel as above, and set m := |A| and
n := |B|.
where {Λ_z}_{z∈[k]} ⊂ Pos(A) is a POVM with k outcomes, and {σ_z^B}_{z∈[k]} are k quantum states
in D(B). Clearly, the map E above is linear and trace preserving. Moreover, setting m := |A|
we get that its Choi matrix is given by

    J_E^{AB} = E^{Ã→B}(Ω^{AÃ}) = \sum_{x,x'∈[m]} |x⟩⟨x'| ⊗ E(|x⟩⟨x'|)

             = \sum_{z∈[k]} \sum_{x,x'∈[m]} Tr[Λ_z |x⟩⟨x'|]\, |x⟩⟨x'| ⊗ σ_z^B    (3.190)

             = \sum_{z∈[k]} Λ_z^T ⊗ σ_z^B ⩾ 0 .
Therefore, E, in (3.189) is a quantum channel. Observe that the Choi matrix above is
separable.
for some POVM {Λz }z∈[k] in Pos(A), and a classical system Z of dimension |Z| = k. Let
P ∈ CPTP(Z → B) be a preparation channel given as in (3.184) via
where σz ∈ D(B) are some density matrices. Then, the measurement-prepare channel E
given in (3.189) can be expressed as
Exercise 3.5.7. Let A, B and R, be three Hilbert spaces and let ∆ ∈ CPTP(R → R) be
the completely dephasing map with respect to some fixed basis of R. Show that for any two
quantum channels E ∈ CPTP(A → R) and F ∈ CPTP(R → B) the channel
is separable. Hint: Use (3.193) and consider the cq-state τ ZB := MA→Z (ρAB ).
The above exercise demonstrates that a measurement-prepare channel breaks the entan-
glement when applied to a subsystem of a composite bipartite system. Channels with this
property are called entanglement breaking channels. Therefore, measurement-prepare chan-
nels are entanglement breaking. It turns out that the converse is also true! That is, any
entanglement breaking channel can be represented as a measurement-prepare channel. For
this reason, we will use, depending on the context, both terms interchangeably.
Exercise 3.5.9. Show that any entanglement breaking channel is a measurement-prepare
channel. Hint: start by observing that the Choi matrix of any entanglement breaking channel
must be separable.
Exercise 3.5.10. Let E A→B be the quantum channel defined in (3.189). Show that its dual
is given by

    E^{*B→A}(η^B) = \sum_{z∈[k]} Tr[η^B σ_z^B]\, Λ_z^A .    (3.196)
Exercise 3.5.11. Let A and B be two Hilbert spaces, and let {|y⟩B }y∈[n] be an orthonormal
basis of B (with n := |B|). Show that the set of operators {My }y∈[n] ⊂ L(AB, A) given by
    M_y = I^A ⊗ ⟨y|^B ,    (3.199)

    V(ρ^A) = V ρ^A V^*    ∀ ρ ∈ L(A) .    (3.200)
Such an isometry channel can be viewed as an embedding of system A into B. Note that
like unitary channels, isometry channels have an operator sum representation with a single
Kraus operator.
Interestingly, isometry channels have inverses. Specifically, for any τ ∈ D(A) define

    V_τ^{−1}(σ^B) := V^* σ^B V + Tr\big[(I^B − VV^*) σ^B\big]\, τ^A    ∀ σ ∈ L(B) .    (3.201)
The linear map above is a quantum channel in CPTP(B → A) and it is an inverse of the
isometry channel V above (see the following Exercise).
Exercise 3.5.12. Show that for all τ ∈ D(A) the linear map Vτ−1 as defined above is a
channel in CPTP(B → A) that satisfies
Let D = (d_{yx}) be the m × m matrix whose components are d_{yx} := ⟨ϕ_y|E(|ψ_x⟩⟨ψ_x|)|ϕ_y⟩, and
note that d_{yx} ⩾ 0 for all x and y. Moreover,

    \sum_{x∈[m]} d_{yx} = ⟨ϕ_y|E(I)|ϕ_y⟩ = ⟨ϕ_y|I|ϕ_y⟩ = 1    and
                                                                    (3.205)
    \sum_{y∈[m]} d_{yx} = Tr[E(|ψ_x⟩⟨ψ_x|)] = Tr[|ψ_x⟩⟨ψ_x|] = 1 .
Hence, D is a doubly-stochastic matrix and (3.204) becomes q = Dp, where p and q are
the probability vectors consisting of the eigenvalues of ρ and σ, respectively.
where {Uw }w∈[k] is a set of k unitary matrices, and {tw }w∈[k] is a probability distribution.
This is the quantum version of a convex combination of permutation matrices. One can
implement such a quantum channel, for example, by rolling a dice and based on the outcome
w of the dice apply the evolution ρ 7→ Uw ρUw∗ . After forgetting the value of w, such a process
can be described by the equation above.
Exercise 3.5.14. Find the operator sum representation of the mixed-unitary channel (3.207).
One may wonder whether all unital channels can be expressed as mixed-unitary channels.
To answer this question, consider the following example given by Peter Shor (2010). Let
A = C3 and E ∈ CPTP(A → A) be the quantum channel
E(ω) = M1 ωM1∗ + M2 ωM2∗ + M3 ωM3∗ ∀ ω ∈ D(A) , (3.208)
where the Kraus operators
    M_z := \frac{|z⟩⟨z+1| + |z+1⟩⟨z|}{\sqrt{2}}    ∀ z ∈ [3] ,    (3.209)
with |4⟩ := |1⟩ (i.e. the summation in |z + 1⟩ is modulo 3).
Exercise 3.5.15. Show that the quantum channel E ∈ CPTP(A → A) as defined in (3.208)
and (3.209) is unital.
The unital channel E as defined in (3.208) and (3.209) is not a mixed-unitary channel.
To see√this, suppose by contradiction that E can be expressed as in (3.207). Then, {Mz }z∈[3]
and {\sqrt{t_w}\, U_w}_{w∈[k]} are operator sum representations of the same channel E. Therefore, there
exists a k × 3 isometry matrix V = (v_{wz}) such that

    \sqrt{t_w}\, U_w = \sum_{z∈[3]} v_{wz} M_z .    (3.210)
Now, taking all summations to be modulo 3, we have by definition that Mz Mz+1 = |z⟩⟨z + 2|
and Mz Mz+2 = |z + 1⟩⟨z + 2| (by definition Mz = Mz∗ ). Hence, the equation above implies
that
    v̄_{wz} v_{w(z+1)} = v̄_{wz} v_{w(z+2)} = 0    ∀ z ∈ [3] .    (3.212)
In other words, we must have v_{wz'} v_{wz} = 0 for all z ≠ z' ∈ [3]. This, in turn, implies that
v_{wz} = 0 for at least two values of z ∈ [3]. Hence, the relation (3.210) implies that U_w is
proportional to one of the three matrices {M_z}_{z∈[3]}, in contradiction with the fact that all
{M_z}_{z∈[3]} have rank two, whereas U_w has full rank. Therefore, the channel E is not a
mixed-unitary channel.
From the example above it follows that there is no quantum analogue of Birkhoff's theorem,
as there are unital channels that are not mixed-unitary channels. What is the distinction
between unital channels and mixed-unitary channels in the Choi representation? Consider a
unital quantum channel E ∈ CPTP(A → A), and let
where m := |A|, and U is some unitary matrix. Now, observe that the Choi matrix of the
mixed-unitary channel (3.207) is given by

    J_E^{AÃ} = E^{Ã→Ã}(Ω^{AÃ}) = \sum_{w∈[k]} t_w (I^A ⊗ U_w) Ω^{AÃ} (I^A ⊗ U_w^*) = m \sum_{w∈[k]} t_w |ϕ_w^{AÃ}⟩⟨ϕ_w^{AÃ}| ,    (3.216)

where each

    |ϕ_w^{AÃ}⟩ := \frac{1}{\sqrt{m}} (I^A ⊗ U_w^{Ã}) |Ω^{AÃ}⟩ ,    (3.217)

is a maximally entangled state.
2. Show that every mixed-unitary channel, such as in (3.207), with rational probabilities
{tw }w∈[k] , can be expressed as in (3.218). That is, there exists a system B and a joint
unitary matrix U AB such that the expression for E(ρ) in (3.218) becomes (3.207).
3. Determine if the Shor example above can be expressed in the form (3.218).
where ∆ ∈ CPTP(X → X) is the completely dephasing channel with respect to the classical
basis of X (see Fig. 3.220).
    N^{A→XB}(ω^A) = \sum_{x∈[m]} |x⟩⟨x|^X ⊗ N_x^{A→B}(ω^A) ,    (3.220)
where for each x ∈ [m] we define the linear map Nx ∈ L(A → B) via
Observe further that N^{A→B} is a quantum channel since it can be expressed as a composition
of the two quantum channels Tr_X and N^{A→XB}. Moreover, each N_x^{A→B} is a CP map. Indeed,
let J_N^{AXB} = N^{Ã→XB}(Ω^{AÃ}) be the Choi matrix of the quantum instrument N. Then the Choi
matrix of N_x is given by

    J_{N_x}^{AB} = Tr_X\big[(I^A ⊗ |x⟩⟨x|^X ⊗ I^B)\, J_N^{AXB}\big] ,    (3.223)

which is positive semidefinite since J_N^{AXB} ⩾ 0. We therefore conclude that {N_x^{A→B}}_{x∈[m]} are
trace non-increasing CP maps that sum up to a CPTP map.
1. Show that any operator sum representation {M_x}_{x∈[m]} of E satisfies \sum_{x∈[m]} M_x^* M_x ⩽ I.
2. Show that the marginal of the Choi matrix JEAB satisfies JEA := TrB JEAB ⩽ I A .
Exercise 3.5.18. Find an operator sum representation of the quantum instrument N A→XB
discussed above.
As an example, let N^{A→B} be the qubit depolarizing channel with |A| = |B| = 2. From
its Stinespring isometry in (3.162) we can get its complementary channel. For this purpose,
we denote by t_0 := \sqrt{1 − \frac{3p}{4}} and t_1 = t_2 = t_3 = \frac{\sqrt{p}}{2}, and by {σ_j}_{j=0}^{3} the four Pauli matrices
(including σ_0 := I). We therefore get

    N_c^{A→E}(ρ^A) = Tr_B[V ρ^A V^*] = \sum_{j,k=0}^{3} t_j t_k Tr[σ_j σ_k ρ]\, |j⟩⟨k|^E    ∀ ρ ∈ L(A) .    (3.226)
where for each x ∈ [m], P_x is the projector onto the eigenspace of λ_x. Note that P_H is indeed
a quantum channel since {Px }x∈[m] form an orthogonal set of projectors that sum to the
identity. Moreover, if m = |A| (i.e. H has |A| distinct eigenvalues) then PH = ∆, where ∆
is the completely dephasing quantum channel in the basis comprising of the eigenvectors of
H.
Exercise 3.5.20. Let H ∈ Herm(A) and ρ ∈ D(A). Show that ρ = PH (ρ) if and only if
[ρ, H] = 0.
Exercise 3.5.21. Let A be a quantum system and ρ, σ ∈ D(A) be two density matrices, with
σ = \sum_{y∈[n]} λ_y Π_y, where {Π_y}_{y∈[n]} form an orthogonal projective von-Neumann measurement
on system A, and {λy }y∈[n] is the set of distinct eigenvalues of σ. For each y ∈ [n], let
my := Tr[Πy ] be the multiplicity of the eigenvalue λy .
2. Show that there exists an orthonormal basis of A, denoted by {|ϕxy ⟩}x,y , with the prop-
erty that

    Π_y = \sum_{x∈[m_y]} ϕ_{xy}    and    P_σ(ρ) = \sum_{y∈[n]} \sum_{x∈[m_y]} r_{xy} ϕ_{xy}    (3.229)

for some r_{xy} ⩾ 0 with \sum_{y∈[n]} \sum_{x∈[m_y]} r_{xy} = 1.
4. Let ∆ ∈ CPTP(A → A) be the completely dephasing channel in the basis {|ϕxy ⟩}x,y .
Show that
∆(ρ) = Pσ (ρ) and ∆(σ) = σ . (3.231)
Exercise 3.5.22. Let A be a quantum system and ρ, σ ∈ D(A) be two density matrices
satisfying ρ ̸≪ σ (i.e. supp(ρ) ̸⊆ supp(σ)). Show that also
Pσ (ρ) ̸≪ σ . (3.232)
The following exercise demonstrates that the pinching channel is a special type of mixture of
unitaries.
Exercise 3.5.23. Let H ∈ Herm(A) be an observable with spec(H) = {λ_1, . . . , λ_m} as above,
and let P_H be its associated pinching channel as given in (3.227). Show that for any ρ ∈ L(A)

    P_H(ρ) = \frac{1}{m} \sum_{y∈[m]} U_y ρ U_y^*    where    U_y := \sum_{x∈[m]} e^{i\frac{2πxy}{m}} P_x .    (3.233)
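A quick numerical check (ours, assuming NumPy) of the identity in the exercise above: the pinching channel P_H(ρ) = Σ_x P_x ρ P_x coincides with the uniform mixture of the unitaries U_y.

```python
import numpy as np

H = np.diag([1.0, 1.0, 2.0, 5.0])       # an observable with m = 3 distinct eigenvalues
projs = [np.diag((np.diag(H) == lam).astype(float)) for lam in np.unique(np.diag(H))]
m = len(projs)

rho = np.random.rand(4, 4) + 1j*np.random.rand(4, 4)
rho = rho @ rho.conj().T
rho /= np.trace(rho)

pinch = sum(P @ rho @ P for P in projs)                 # P_H(ρ)
Us = [sum(np.exp(2j*np.pi*x*y/m)*P for x, P in enumerate(projs, start=1)) for y in range(1, m + 1)]
mix = sum(U @ rho @ U.conj().T for U in Us) / m
assert np.allclose(pinch, mix)                          # the identity (3.233)
```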
The above exercise demonstrates that the pinching channel is a mixed-unitary channel. Observe
also that for y = m we have U_m = I so that

    P_H(ρ) = \frac{1}{m} ρ + \frac{1}{m} \sum_{y∈[m−1]} U_y ρ U_y^* ⩾ \frac{1}{m} ρ .    (3.234)
Since m := |spec(H)| we get the following inequality known as the pinching inequality.
Since we got the inequality above by removing m−1 terms, it may give the impression that
this inequality is not very useful since it is never saturated and is not tight. However, we will
see that in applications, this inequality can be used to provide a good enough approximation
to PH (ρ) when we consider the asymptotic case in which H = σ ⊗n where σ is some quantum
state and n is a very large integer. In particular, we will see in Chapter 8 (particularly
Sec. 8.4.1) that in this case m = |spec(σ ⊗n )| grows polynomially with n. The fact that it is
not an exponential growth with n is one of the key reasons why the pinching inequality is
quite useful.
The pinching map can also be used to prove the reverse Hölder inequality with p ∈ (0, 1).
In this case, we still use the notation ∥M∥_p := (Tr[|M|^p])^{1/p} for any M ∈ L(A). However,
one has to be careful since for p ∈ (0, 1), ∥·∥_p is not a norm.

    Tr[ωτ] ⩾ \frac{∥τ∥_p}{∥ω^{−1}∥_{\frac{p}{1−p}}}    (3.236)
Proof. We first prove the theorem for the case that τ and ω commute. In this case,

    ∥τ∥_p^p = Tr[τ^p] = Tr[τ^p ω^p ω^{−p}] ⩽ ∥τ^p ω^p∥_{\frac{1}{p}}\, ∥ω^{−p}∥_{\frac{1}{1−p}} = (Tr[τω])^p\, ∥ω^{−p}∥_{\frac{1}{1−p}} .

Taking both sides to the power 1/p, and using ∥ω^{−p}∥_{\frac{1}{1−p}}^{1/p} = ∥ω^{−1}∥_{\frac{p}{1−p}}, gives (3.236).
This proves the theorem for the case that ω and τ commute. On the other hand, from
Exercise 3.5.21 we get that P_ω(τ) and ω commute, so that

    Tr[τω] = Tr[P_ω(τ)\, ω] ⩾ \frac{∥P_ω(τ)∥_p}{∥ω^{−1}∥_{\frac{p}{1−p}}}    (3.238)
Finally, since t ↦ t^p is operator concave for p ∈ (0, 1) (see Table B.1) we conclude that
∥Pω (τ )∥pp = Tr [(Pω (τ ))p ] ⩾ Tr [Pω (τ p )] = Tr[τ p ] . (3.239)
That is, ∥Pω (τ )∥p ⩾ ∥τ ∥p . Substituting this into (3.238) completes the proof.
Exercise 3.5.24 (Reverse Young's Inequality). Let A and B be two Hilbert spaces, M, N ∈ Pos(A), p ∈ (0, 1), and q defined via 1/p + 1/q = 1 (hence q < 0). Use the reverse Hölder inequality of the Schatten norm to show that
Tr[MN] ⩾ (1/p) Tr[M^p] + (1/q) Tr[N^q] ,   (3.240)
with equality if and only if M^p = N^q. Hint: Take the logarithm on both sides of the reverse Hölder inequality and use the concavity property of the logarithm.
where dU is the Haar measure of the unitary group U(A). In other words, ρ is “twirled”
over all unitary matrices U ∈ U(A).
Exercise 3.5.25. Consider ρ ∈ L(A) and σ := G(ρ), where G is the channel defined
in (3.241). Demonstrate that the commutator [U, σ] = 0. Hint: Utilize the properties of
the Haar measure described in Sec. C.4 of the appendix.
The exercise above illustrates that the channel defined in (3.241) is not unfamiliar to us.
In fact, it can be represented as the replacement channel
G(ρ) := ∫_{U(A)} dU UρU* = Tr[ρ] u^A ,   (3.242)
where uA = I A /|A| is the maximally mixed state. To understand why, remember from the
previous exercise that σ := G(ρ) commutes with all unitary matrices in U(A) and, thus,
must be proportional to the identity matrix. We now consider a more complex example with
numerous applications in quantum information science.
Let B be a replica of A, and set m := |A| = |B|. Consider the twirling map G ∈
CPTP(AB → AB), defined for all ρ ∈ L(AB) as:
G(ρ^{AB}) := ∫_{U(m)} dU (U ⊗ U)ρ^{AB}(U ⊗ U)* .   (3.243)
As in the previous example, for every ρ ∈ L(AB), the matrix σ AB := G(ρAB ) commutes with
U ⊗ U for all U ∈ U(A). The ensuing question is: which matrices σ AB commute with all
matrices of the form U ⊗ U , where U is a unitary matrix? Clearly, any matrix proportional
to the identity matrix satisfies this criterion. However, there exists another type of operator
that fulfills this property, known as the swap (or flip) operator:
F^{AB} := Σ_{x,y∈[m]} |x⟩⟨y|^A ⊗ |y⟩⟨x|^B .   (3.244)
Indeed, one can verify by direct computation that
[U ⊗ U, F^{AB}] = 0   ∀ U ∈ U(m) .   (3.245)
Our analysis in Sec. C.10.2 indicates that any operator commuting with all matrices in
the set {U ⊗ U }U ∈U(m) can be expressed as a linear combination of the identity and swap
operators. Consequently, G(ρAB ) = aI AB + bF AB for some a, b ∈ R, determinable from
the requirement that G is CPTP (see the following exercise). Furthermore, in Sec. C.10.2 of
Appendix C, we demonstrate that the representation U 7→ U ⊗U decomposes into two irreps,
corresponding to the symmetric and antisymmetric subspaces, each with a multiplicity of
one. Hence, from Theorem C.3.3 in Appendix C, it follows that for all ρ ∈ L(AB),
G(ρ) = Tr[ρΠ_Sym] Π_Sym/Tr[Π_Sym] + Tr[ρΠ_Asy] Π_Asy/Tr[Π_Asy] ,   (3.246)
where Π_Sym := ½(I^{AB} + F^{AB}) and Π_Asy := ½(I^{AB} − F^{AB}) are the projections onto the symmetric and antisymmetric subspaces of AB (see (C.186) and (C.188)). When ρ ∈ D(AB)
is a density matrix, the right-hand side above represents a density matrix known as the
Werner quantum state.
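As a numerical illustration (a sketch assuming numpy; the Haar sampling via a QR decomposition and the sample size are implementation choices, not from the text), one can compare a Monte-Carlo estimate of the twirl (3.243) with the closed form (3.246):

```python
import numpy as np

def haar_unitary(m, rng):
    """Sample a Haar-random m x m unitary via QR of a Ginibre matrix (phases fixed)."""
    Z = (rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

m = 2
rng = np.random.default_rng(1)
X = rng.normal(size=(m*m, m*m)) + 1j * rng.normal(size=(m*m, m*m))
rho = X @ X.conj().T
rho /= np.trace(rho)

# swap operator F (3.244) and the symmetric/antisymmetric projectors
F = np.zeros((m*m, m*m))
for x in range(m):
    for y in range(m):
        F += np.kron(np.outer(np.eye(m)[x], np.eye(m)[y]),
                     np.outer(np.eye(m)[y], np.eye(m)[x]))
P_sym, P_asy = (np.eye(m*m) + F) / 2, (np.eye(m*m) - F) / 2

# closed form (3.246): the Werner state
werner = (np.trace(rho @ P_sym) * P_sym / np.trace(P_sym)
          + np.trace(rho @ P_asy) * P_asy / np.trace(P_asy))

# Monte-Carlo estimate of the twirl (3.243); error shrinks like 1/sqrt(#samples)
samples = [np.kron(U, U) @ rho @ np.kron(U, U).conj().T
           for U in (haar_unitary(m, rng) for _ in range(20000))]
print(np.max(np.abs(np.mean(samples, axis=0) - werner)))   # small statistical error
```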
Exercise 3.5.27. Let F AB be the swap operator, with m := |A| = |B|.
1. Starting with G(ρAB ) = aI AB + bF AB for some a, b ∈ R, prove that G is CPTP if and
only if for all ρ ∈ L(AB),
G(ρ^{AB}) = (mTr[ρ^{AB}] − Tr[ρ^{AB}F^{AB}])/(m(m² − 1)) I^{AB} + (mTr[ρ^{AB}F^{AB}] − Tr[ρ^{AB}])/(m(m² − 1)) F^{AB} .   (3.247)
2. Derive (3.247) from (3.246) by expressing the projections onto the symmetric and an-
tisymmetric subspaces in terms of the swap operator.
Exercise 3.5.28. Show that for all M ∈ L(A) and B ≅ A we have
Tr[M²] = Tr[(M ⊗ M)F^{AB}] .   (3.248)
where Ū := (U*)^T is the complex conjugate of U.
1. Show that
G(ΦAÃ ) = ΦAÃ , (3.252)
where Φ ∈ D(AÃ) is the maximally entangled state.
τ^{AB} := (I^{AB} − Φ^{AB})/(m² − 1) .   (3.254)
1. Show that for all ρ ∈ D(AB)
2. Show that E = E ◦ G if and only if there exists ω1 , ω2 ∈ Eff(AB) such that for all
ρ ∈ D(AB)
3. Show that E = G ◦E if and only if there exists Λ ∈ Eff(AB) such that for all ρ ∈ D(AB)
Exercise 3.5.32. Let A be a Hilbert space of dimension m := |A|, and consider the twirling channel E ∈ CPTP(A → A) given in (3.241). Show that for all ρ ∈ L(A)
E(ρ) = (1/m²) Σ_{p,q∈[m]} W_{p,q} ρ W*_{p,q} ,   (3.258)
where W_{p,q} are the Heisenberg-Weyl operators defined in (C.35).
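A small sketch of this exercise (assuming numpy, and assuming the common shift-and-clock convention W_{p,q} = X^p Z^q for the Heisenberg-Weyl operators; (C.35) may use a different phase convention, which does not affect the resulting channel): averaging a state over all m² Weyl conjugations produces the maximally mixed state, exactly as the Haar twirl does.

```python
import numpy as np

m = 3
omega = np.exp(2j * np.pi / m)
X = np.roll(np.eye(m), 1, axis=0)          # shift operator: X|x> = |x+1 mod m>
Z = np.diag(omega ** np.arange(m))         # clock operator: Z|x> = omega^x |x>

rng = np.random.default_rng(2)
G_ = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))
rho = G_ @ G_.conj().T
rho /= np.trace(rho)

W = [np.linalg.matrix_power(X, p) @ np.linalg.matrix_power(Z, q)
     for p in range(m) for q in range(m)]
twirl = sum(w @ rho @ w.conj().T for w in W) / m**2
assert np.allclose(twirl, np.eye(m) / m)   # equals Tr[rho] times the maximally mixed state
print("Heisenberg-Weyl twirl gives the maximally mixed state")
```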
intensively and more details and references about them can be found in the review article
by [186].
Gleason's theorem was originally proved for projective von-Neumann measurements in [85]. The proof in that case holds only in dimension greater than two, and there are counterexamples in the qubit case. The version of Gleason's theorem that we considered here is due to [41]. In this case the proof is much simpler (and holds in all dimensions) since effects replace orthogonal projections (and effects are simpler to work with). Gleason's theorem is of particular importance in the foundations of quantum physics as well as in the field of quantum logic, in its effort to minimize the number of axioms needed to formulate quantum mechanics. It also bridges the gap between some of the axioms of quantum mechanics and Born's rule. More details, references, and the history of Gleason's theorem can be found in the Wikipedia article entitled "Gleason's theorem".
Naimark's and Stinespring's dilation theorems are results from operator theory and are valid also in infinite-dimensional Hilbert spaces. Here we only studied their adaptation to the finite-dimensional case. More details on their infinite-dimensional versions can be found in many books on operator theory; e.g. the book by [176].
The Størmer-Woronowicz theorem was first proved for the qubit-to-qubit case by [205] and later on for the qubit-to-qutrit case by [236]. Counterexamples exist in higher dimensions, so these dimensions are optimal. Both proofs involve somewhat complicated calculations, and the simplified proof of the Størmer case presented here is due to [7]. It is an open problem to find a simpler proof of the Woronowicz theorem.
More information on the pinching channel and its properties can be found in the book
by [208].
CHAPTER 4
Majorization
A pre-order is a binary relation between objects that is reflexive and transitive. For example,
consider the inclusion relation ⊇ between subsets of [n] := {1, . . . , n} for some n ∈ N. Then,
⊇ is reflexive since for any subset A of [n] we have A ⊇ A. The relation ⊇ is transitive
since for any three subsets A, B, C ⊆ [n] with A ⊇ B and B ⊇ C it follows that A ⊇ C.
Furthermore, the relation ⊇ has yet another property known as antisymmetry. That is, if A, B ⊆ [n] satisfy both A ⊇ B and B ⊇ A then necessarily A = B. A pre-order that satisfies this additional antisymmetry property is called a partial order.
Partial orders play a fundamental role in quantum resource theories. They typically
stem from a set of restrictions imposed on quantum operations. For example, we saw in
quantum teleportation that Alice and Bob are restricted to act locally and cannot commu-
nicate quantum particles. We will see in Chapter 12 that this restriction imposes a partial
order between two entangled states, determining if one entangled state can be converted
to another under operations that are restricted to be local. It turns out that there is one
partial order with variants that appear in many resource theories. This partial order, known
as majorization, has been studied extensively in quantum information and other fields, par-
ticularly in the field of matrix analysis, and there are several books on the topic (see section
‘Notes and References’ at the end of this chapter).
with probability p↓₁ + p↓₂, where we denote by p↓ = (p↓₁, . . . , p↓ₙ)^T the vector obtained from p by rearranging its components in non-increasing order. We call a game in which the player is allowed to provide a set of k numbers as guesses a k-gambling game. Note that the highest probability to win a k-gambling game is given by Σ_{x∈[k]} p↓ₓ.
Suppose now that at the beginning of each game, the player is allowed to choose between
two dice with corresponding probabilities p and q. Clearly, the player will choose the dice
that has better odds to win the game. For a k-game the player will choose the p-dice if
Σ_{x∈[k]} p↓ₓ ⩾ Σ_{x∈[k]} q↓ₓ .   (4.1)
If the relation above holds for all k ∈ [n], then the player will choose the p-dice for any
k-gambling game. In this case we say that p majorizes q and write p ≻ q.
Majorization
Definition 4.1.1. Let p, q ∈ Rn . We say that p majorizes q and write p ≻ q
if (4.1) holds for all k ∈ [n] with equality for k = n.
Remark. Note that in the definition above we did not assume that p and q are probability
vectors, however, in the applications we consider in this book, p and q will always be
probability vectors.
Majorization is a pre-order. That is, given three real vectors p, q, r ∈ Rn we have p ≻ p
(reflexivity), and if p ≻ q and q ≻ r then p ≻ r (transitivity). Moreover, if both p ≻ q
and q ≻ p then p and q are related by a permutation matrix. Therefore, using the notation
Prob↓ (n) to denote the subset of Prob(n) consisting of all vectors p ∈ Prob(n) with the
property that p = p↓ , we get that the majorization relation ≻ is a partial order on the set
Prob↓ (n).
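Definition 4.1.1 translates directly into a short numerical test (a sketch assuming numpy; the function name `majorizes` is ours): sort both vectors in non-increasing order, compare all partial sums as in (4.1), and require that the totals agree.

```python
import numpy as np

def majorizes(p, q, atol=1e-12):
    """Return True if p majorizes q in the sense of Definition 4.1.1."""
    p, q = np.sort(p)[::-1], np.sort(q)[::-1]
    if not np.isclose(p.sum(), q.sum(), atol=atol):
        return False                                  # equality for k = n fails
    return bool(np.all(np.cumsum(p) >= np.cumsum(q) - atol))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.35, 0.25])
print(majorizes(p, q), majorizes(q, p))   # True False: p majorizes q but not vice versa
```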
Exercise 4.1.1. Show that for any n-dimensional probability vector p we have
(1, 0, . . . , 0)T ≻ p ≻ (1/n, . . . , 1/n)T . (4.2)
Exercise 4.1.2. Let p ∈ Prob(n), u(n) := (1/n, . . . , 1/n)T ∈ Prob(n) be the uniform proba-
bility vector, and t ∈ [0, 1]. Show that
p ≻ tp + (1 − t)u(n) . (4.3)
Exercise 4.1.3. Find an example of two vectors p, q ∈ Prob(3) such that p does not majorize
q, and q does not majorize p. Vectors with such a property that p ̸≻ q and q ̸≻ p are said
to be incomparable.
Exercise 4.1.4. Let n ∈ N and p ∈ Prob(n). Show that for sufficiently large m ∈ N we have
p ≻ (u^(2))^{⊗m} ,   (4.4)
where u^(2) := ½(1, 1)^T is the 2-dimensional uniform distribution.
1. Show that p ≻ q if and only if Lp↓ ⩾ Lq↓ , where the inequality is entrywise.
Exercise 4.1.7. Let p, q ∈ Prob↓ (n) and suppose p ̸= q and p ≻ q. Show that the largest
integer z ∈ [n] for which pz ̸= qz must satisfy qz > pz .
Exercise 4.1.8. Let p, q ∈ Prob↓(n) and k, m ∈ [n − 1] be such that k ⩽ m. Suppose that q has the form
q = (a, . . . , a, q_{k+1}, . . . , q_m, b, . . . , b)^T ,   (4.7)
where the component a appears k times and the component b appears n − m times.
(i.e. there is no need to consider the cases ℓ < k or ℓ > m). Hint: Use the fact that p = p↓ and q = q↓.
outcomes obtained by rolling the q-dice are more uncertain than those obtained by rolling
the p-dice.
To make this notion of uncertainty precise, consider a game of chance in which the player
is allowed to permute the symbols on the dice (for example, permute the stickers on the dice).
Clearly, such a permutation cannot change the odds in any k-gambling game. This relabeling
of the outcomes, is equivalently described by a permutation matrix P that is acting on p; i.e.
after the relabeling (permutation) of the symbols on the p-dice, the new probability vector
is given by P p.
Consider now a (somewhat unrealistic) scenario in which the player chooses to perform
the relabeling at random. For example, the player can flip an unbiased coin and if the
outcome is “head” the player does nothing to the p-dice whereas if the outcome is a “tail”
the player performs the relabeling described by the permutation matrix P. Moreover, suppose also that the player forgets the outcome of the coin flip. Hence, the player has changed his odds of winning the game, since with probability 1/2 he did nothing to the p-dice and with probability 1/2 he changed the order of the stickers on the dice. This means that now, effectively, the player holds a q-dice with
q := ½p + ½Pp .   (4.9)
Since by “forgetting” the outcome of the coin flip the player cannot decrease the uncertainty
associated with the p-dice, we must conclude that the new q-dice is more uncertain than
the initial p-dice.
The relation above between p and q can be expressed as
q = Dp ,   (4.10)
where D := ½Iₙ + ½P. More generally, if instead of an unbiased coin the player uses a random
device that produces the outcome j ∈ [m] with probability tj and a relabeling corresponding
to a permutation matrix Pj , then the matrix D can be expressed as
D = Σ_{j∈[m]} t_j P_j .   (4.11)
We therefore conclude that for any such matrix and any probability vector p, the vector
q := Dp corresponds to more uncertainty than p. The matrix D above has the property
that all its components are non-negative and each row and column sums to one. Such
matrices are called doubly stochastic (see Appendix A.5).
Exercise 4.1.9. Show that the matrix D in (4.11) is doubly stochastic.
The converse of the statement of the exercise above is equally valid. Owing to Birkhoff's theorem (see Theorem A.5.1), we know that any n × n doubly stochastic matrix can be expressed as a convex combination of no more than (n − 1)² + 1 permutation matrices (see Exercise A.5.3). By integrating this theorem with our prior analysis, we deduce that q is more uncertain than p if and only if q = Dp for some doubly stochastic matrix D. Shortly, we will demonstrate that the relationship q = Dp corresponds to majorization. However, before we present this, it's essential to introduce the concept of a T-transform.
T -Transform
A T-transform is a special kind of linear transformation from Rⁿ to itself. The matrix representation of a T-transform is an n × n matrix of the form
T := tIₙ + (1 − t)P ,   (4.12)
where t ∈ [0, 1] and P is a permutation matrix that just exchanges two components of the vector it acts upon. Therefore, for T as above there exist x, y ∈ [n] with x < y such that for every p = (p₁, . . . , pₙ)^T ∈ Prob(n) and every z ∈ [n] the z-component of the vector r := Tp is given by
r_z = p_z for z ∉ {x, y} ,   r_x = tp_x + (1 − t)p_y ,   r_y = tp_y + (1 − t)p_x .   (4.13)
q = T1 · · · Tm p . (4.14)
Proof. If p and q are related by a permutation matrix then the lemma follows from the
fact that any permutation matrix on n elements is a product of transposition matrices (i.e.
matrices that only exchange two elements and keep the rest unchanged). We therefore
assume now that p is not a permutation of q, and without loss of generality assume that
p = p↓ and q = q↓ .
The main idea of the proof is to construct a T -transform of the form given in (4.12) such
that the vector r := T p as given in (4.13) has the following three properties:
1. p ≻ r ≻ q.
2. rx ̸= px and ry ̸= py .
3. rx = qx or ry = qy .
From the third property at least one of the components of r is equal to one of the components
of q. Therefore, if such a T -transform exists, by a repetition of the above process, q can be
obtained from p by a finite number of such T -transforms. It is therefore left to show that a
T -transform with the above three properties exists.
Given that both p and q are probability vectors satisfying p ̸= q, it is impossible for
their components to satisfy px ⩾ qx for all x ∈ [n]. If this were the case, it would lead to
a contradiction, as the sum of the components of both p and q must equal one, implying
px = qx for all x ∈ [n]. Consequently, we define x ∈ [n] as the largest integer for which
px > qx .
Similar arguments to those discussed above imply that the reverse scenario, where px ⩽ qx
for all x ∈ [n], is also not feasible. Moreover, since p ̸= q and p ≻ q, it follows from
Exercise 4.1.7 that the largest integer z ∈ [n] for which pz ̸= qz must satisfy qz > pz .
Therefore, there exists an integer y ∈ [n] with the property that y > x and qy > py . We take
y to be the smallest integer that satisfies these two criteria.
Since x ∈ [n] is the largest integer for which px > qx we get that pw ⩽ qw for all w > x.
Similarly, since y is the smallest integer that satisfy y > x and qy > py we get that for all
x < w < y we have pw ⩽ qw . Combining these two observations we conclude that
pw = q w ∀ w ∈ [n] such that x < w < y . (4.15)
Moreover, from the definitions of x and y, along with the fact that x < y and p = p↓ and
q = q↓ , we deduce the following inequality:
px > qx ⩾ qy > py . (4.16)
Utilizing Equation (4.13), we find that r_x = tp_x + (1 − t)p_y and r_y = tp_y + (1 − t)p_x. By choosing t ∈ (0, 1), which means t is strictly between zero and one, we ensure that the second condition, r_x ≠ p_x and r_y ≠ p_y, is satisfied, given that p_x ≠ p_y. It is important to note that p ≻ r for any T-transform (as per Exercise 4.1.10). Consequently, our remaining task is to demonstrate the existence of a t ∈ (0, 1) such that r ≻ q, and either r_x = q_x or r_y = q_y is true.
Set ε := min{p_x − q_x, q_y − p_y} > 0, and define t := 1 − ε/(p_x − p_y). By definition, 0 < t < 1 (due to (4.16)). If ε = p_x − q_x then t = (q_x − p_y)/(p_x − p_y) and
r_x = tp_x + (1 − t)p_y = p_x − ε = q_x ,
whereas if ε = q_y − p_y then ε ⩽ p_x − q_x and consequently
r_x = p_x − ε ⩾ p_x − (p_x − q_x) = q_x ,
and in addition r_y = p_y + ε = q_y. We therefore conclude that for both options r_x ⩾ q_x.
Finally, to show that r ≻ q we show that ∥r∥(k) ⩾ ∥q∥(k) for all k ∈ [n]. We show it in
three cases:
1. For 1 ⩽ k < x we have ∥r∥(k) = ∥p∥(k) ⩾ ∥q∥(k) since p ≻ q and rw = pw for w ∈ [k].
2. For the case x ⩽ k < y we have
∥r∥(k) ⩾ Σ_{w∈[k]} r_w = ∥p∥(x−1) + r_x + Σ_{w=x+1}^{k} p_w .   (4.20)
The first term on the right-hand side, ∥p∥(x−1), satisfies ∥p∥(x−1) ⩾ ∥q∥(x−1) since p ≻ q. For the second term, we have already established that r_x ⩾ q_x, and for the third term we get from (4.15) that Σ_{w=x+1}^{k} p_w = Σ_{w=x+1}^{k} q_w. Incorporating these three relations into (4.20) yields ∥r∥(k) ⩾ ∥q∥(k).
Exercise 4.1.11. Prove the converse of the statement presented in Lemma 4.1.1. Specif-
ically, demonstrate that for any vector p ∈ Rn and a sequence of m n × n T -transforms,
denoted as T1 , . . . , Tm , the resulting vector q := T1 · · · Tm p fulfills the condition p ≻ q. Hint:
Refer to Exercise 4.1.10 for guidance.
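The constructive proof of Lemma 4.1.1 is easy to turn into an algorithm. The sketch below (assuming numpy; indices are 0-based, unlike the 1-based convention of the text, and the function name is ours) follows the proof verbatim: it repeatedly picks x, y, ε and t as above and applies the corresponding T-transform until p is mapped to q. At most n − 1 such transforms are needed, since each step fixes at least one more component.

```python
import numpy as np

def t_transform_chain(p, q, tol=1e-12):
    """Given sorted p, q with p majorizing q, return (x, y, t) triples describing
    T-transforms whose composition maps p to q (Lemma 4.1.1, constructive proof)."""
    p, q = np.array(p, float), np.array(q, float)
    steps = []
    while not np.allclose(p, q, atol=tol):
        x = max(i for i in range(len(p)) if p[i] > q[i] + tol)            # largest x: p_x > q_x
        y = min(i for i in range(len(p)) if i > x and q[i] > p[i] + tol)  # smallest y > x: q_y > p_y
        eps = min(p[x] - q[x], q[y] - p[y])
        t = 1 - eps / (p[x] - p[y])
        p[x], p[y] = t * p[x] + (1 - t) * p[y], t * p[y] + (1 - t) * p[x]
        steps.append((x, y, t))
    return steps

p = np.array([0.6, 0.25, 0.15])
q = np.array([0.45, 0.35, 0.20])
print(t_transform_chain(p, q))   # two T-transforms suffice for this example
```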
Characterization
Theorem 4.1.1. Let p, q ∈ Rⁿ. The following are equivalent:
1. p ≻ q.
∥p − tu^(n)∥₁ ⩾ ∥q − tu^(n)∥₁ .   (4.22)
where for all r ∈ R the notation (r)₊ = r if r ⩾ 0 and otherwise (r)₊ = 0. To see the equivalence note that (r)₊ = ½(|r| + r) and "absorb" the factor 1/n into t.
∥q − tu^(n)∥₁ = ∥Dp − tu^(n)∥₁ = ∥Dp − tDu^(n)∥₁ ⩽ ∥p − tu^(n)∥₁ ,   (4.24)
where the second equality follows from Du^(n) = u^(n), and the inequality follows from the property (2.8) of the norm ∥·∥₁, in conjunction with the fact that the doubly stochastic matrix D is in particular column stochastic.
The implication 3 ⇒ 1: Suppose (4.23) holds for all t ∈ R. Without loss of generality
suppose that p = p↓ and q = q↓ . Fix k ∈ [n − 1] and observe that for t = pk+1 the left-hand
side of (4.23) can be expressed as:
Σ_{x∈[n]} (p_x − t)₊ = Σ_{x∈[n]} (p_x − p_{k+1})₊ = ∥p∥(k) − kp_{k+1} ,   (4.25)
where the last equality follows from p = p↓.
Hence, the combination of the two equations above with our assumption that (4.23) holds
for all t ∈ R, and in particular for t = pk+1 , gives ∥p∥(k) ⩾ ∥q∥(k) . Since k ∈ [n − 1] was
arbitrary we conclude that p ≻ q. This completes the proof.
Exercise 4.1.12. Show that the product of two n × n doubly stochastic matrices is itself a
doubly stochastic matrix.
q = Mp . (4.27)
The pivotal question then becomes how to define these mixing operations. Conceptually,
mixing operations are processes that increase the uncertainty of system X. In this context,
we propose that the mixing operation M can be conceptualized in three distinct manners:
1. The Axiomatic Approach: Since mixing operations can only increase the uncertainty
of system X, the uniform distribution remains invariant under mixing operations, as
its uncertainty cannot be increased further. Hence, M ∈ STOCH(n, n) can be defined
as a mixing operation if
M uX = uX . (4.28)
That is, M = D is doubly stochastic.
2. The Constructive Approach: In this approach the mixing operations are defined in-
tuitively as a convex combination of permutation matrices. Indeed, mixing a pack of
cards literally corresponds to the action of a random permutation. Therefore, in this
approach M ∈ STOCH(n, n) corresponds to a mixing operation if there exist k ∈ N and n × n permutation matrices {P_j}_{j∈[k]} such that
M = Σ_{j∈[k]} s_j P_j ,   (4.29)
As we proved in the preceding subsections, all the three approaches above are equivalent,
leading to the same pre-order given in (4.1). Furthermore, the established equivalence of
these approaches solidifies the conceptual foundation of uncertainty. This, in turn, vali-
dates functions that exhibit monotonic behavior under majorization as reliable quantifiers
of uncertainty. Such measures of uncertainty are known as Schur concave functions.
As an example, consider the Shannon entropy, defined for any probability vector p ∈ Prob(n)
as
H(p) := −Σ_{x∈[n]} p_x log₂(p_x) .   (4.34)
This function is clearly symmetric under any permutation of the components of p, and it is
also concave. Therefore, the Shannon entropy is an example of a Schur concave function.
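A quick numerical illustration of Schur concavity (a sketch assuming numpy; the sampling is ours): mixing p by a random convex combination of permutations produces q = Dp with p ≻ q, and the Shannon entropy can only increase under such mixing.

```python
import numpy as np

def shannon(p):
    """Shannon entropy H(p) in bits, ignoring zero components."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(3)
p = rng.dirichlet(np.ones(5))
# q = Dp for a random convex combination D of permutation matrices, so p majorizes q
perms = [rng.permutation(5) for _ in range(4)]
weights = rng.dirichlet(np.ones(4))
q = sum(w * p[perm] for w, perm in zip(weights, perms))
print(shannon(p) <= shannon(q) + 1e-12)   # True: the entropy is Schur concave
```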
is Schur concave. Hint: Show first that log G(p) is Schur concave by showing that it is both
symmetric and concave (what is its Hessian matrix?).
From the following exercise it follows that not all Schur convex functions are symmetric
and convex. Therefore, in this sense, the notion of Schur convexity is weaker than (standard)
convexity.
Schur’s Test
Theorem 4.1.2. Let f : Prob(n) → R be a continuous function that is also
continuously differentiable on the interior of Prob(n). Then, f is Schur convex if and
only if the following two conditions hold:
Remark. Since the function f is symmetric, the condition in (4.38) is equivalent to the following condition: for all 0 < p ∈ Prob(n) and all x ≠ y ∈ [n]
(p_x − p_y)(∂f(p)/∂p_x − ∂f(p)/∂p_y) ⩾ 0 .   (4.39)
Proof. Suppose f is Schur convex. We need to show that (4.38) holds. Let p > 0 and observe
that if p1 = p2 the condition clearly holds. Therefore, without loss of generality we assume
that p1 > p2 (and recall that p2 > 0 since p > 0). Let 0 < ε < p1 − p2 and define
Since D is doubly stochastic, we get from Theorem 4.1.1 that p ≻ p̃ε . Therefore, from the
assumption that f is Schur convex we conclude that for all 0 < ε < p1 − p2
0 ⩽ [f(p) − f(p̃_ε)]/ε
  = [f(p) − f(p₁ − ε, p₂, . . . , pₙ)]/ε + [f(p₁ − ε, p₂, . . . , pₙ) − f(p̃_ε)]/ε   (4.42)
  → ∂f(p)/∂p₁ − ∂f(p)/∂p₂   as ε → 0⁺ .
Note that if p₁ = p₂ then the transformation does not affect p. Therefore, without loss of
generality suppose that p1 > p2 (recall that f is symmetric, so we can exchange between p1
and p2 if necessary). Now, from (4.38) we get that
d/dε f(p₁ − ε, p₂ + ε, p₃, . . . , pₙ) ⩽ 0 ,   (4.44)
for any 0 ⩽ ε ⩽ ½(p₁ − p₂) (note that in this domain p₁ − ε ⩾ p₂ + ε). We therefore conclude that the function
g(ε) := f(p₁ − ε, p₂ + ε, p₃, . . . , pₙ)   (4.45)
is non-increasing in the domain 0 ⩽ ε ⩽ ½(p₁ − p₂). Taking ε := (1 − t)(p₁ − p₂) we conclude that
f(p) = g(0) ⩾ g(ε) = f(p₁ − ε, p₂ + ε, p₃, . . . , pₙ) = f(tp₁ + (1 − t)p₂, tp₂ + (1 − t)p₁, p₃, . . . , pₙ) = f(Tp) ,   (4.46)
where the second equality on the last line follows from the definition of ε. This completes the proof.
As an example, consider the family of Rényi entropies defined for any α ∈ [0, ∞] and all
p ∈ Prob(n) as
H_α(p) := 1/(1 − α) log Σ_{x∈[n]} p_x^α ,   (4.47)
where the cases α = 0, 1, ∞ are defined in terms of their limits. In the next chapter we will study these functions in more detail. Here we show that for all α ∈ (0, ∞) the Rényi entropies are Schur concave. Due to the monotonicity of the log function, it is enough to show that f_α(p) := Σ_{x∈[n]} p_x^α is Schur concave for α ∈ (0, 1) and Schur convex for α ∈ (1, ∞). Indeed,
(p₁ − p₂)(∂f_α(p)/∂p₁ − ∂f_α(p)/∂p₂) = α(p₁ − p₂)(p₁^{α−1} − p₂^{α−1}) ,   (4.48)
which is always non-negative for α > 1 and non-positive for α ∈ (0, 1). Hence, from Schur's test it follows that H_α(p) is Schur concave for all α ∈ [0, ∞] (the cases α = 0, 1, ∞ follow from the continuity of H_α in α).
As another example, consider the elementary symmetric functions defined for each k ∈ [n]
by
f_k(p) := Σ_{x₁<···<x_k} p_{x₁} · · · p_{x_k}   ∀ p ∈ Prob(n) ,   (4.49)
where the sum is over all x₁, . . . , x_k ∈ [n],
and for k = n we have fn (p) = p1 · · · pn . From Schur’s test it follows that the elementary
symmetric functions are Schur concave.
Exercise 4.1.15. Use Schur’s test to verify that the elementary symmetric functions are
Schur concave.
In general, partial orders don’t always have maximal and minimal elements. For example,
the set
C := {(1/2, 1/4, 1/4)^T , (2/5, 2/5, 1/5)^T}   (4.51)
has no maximal nor minimal elements since neither of the two vectors majorizes the other.
On the other hand, one can define upper and lower bounds on a set of probability vectors.
Specifically, given a subset C ⊆ Prob(n)
• A vector p ∈ Prob(n) is said to be an upper bound of C if for all q ∈ C we have p ≻ q.
• A vector p ∈ Prob(n) is said to be a lower bound of C if for all q ∈ C we have q ≻ p.
Note that lower and upper bounds always exist since the vector e1 = (1, 0, . . . , 0)T is always
an upper bound and the vector u(n) is always a lower bound. Less trivial bounds are those
that are optimal: Given a subset C ⊆ Prob(n),
• An upper bound p ∈ Prob(n) of C is said to be optimal if for any other upper bound
p′ ∈ Prob(n) of C we have p′ ≻ p.
• A lower bound p ∈ Prob(n) of C is said to be optimal if for any other lower bound
p′ ∈ Prob(n) of C we have p ≻ p′ .
Exercise 4.2.1. Let
C := {p₁, . . . , pₘ} ⊂ Prob(n)   (4.52)
be a set consisting of m probability vectors, and for each z ∈ [n] denote by
s_z := max_{y∈[m]} ∥p_y∥(z) .   (4.53)
With this index k we define the steepest ε-approximation of p, denoted by p(ε) , whose
components are
p^(ε)_x := p₁ + ε if x = 1 ;   p_x if x ∈ {2, . . . , k} ;   1 − ε − ∥p∥(k) if x = k + 1 ;   0 otherwise .   (4.57)
Note that from its definition above, p(ε) is indeed a probability vector whose components
are arranged in non-increasing order.
Exercise 4.2.3. Utilize the definition of k as provided in (4.56) and the definition of p(ε)
as outlined in (4.57) to demonstrate the following two properties:
1. The components of p^(ε) are arranged in non-increasing order. In particular, p^(ε)_k > p^(ε)_{k+1}.
2. For all x ∈ {2, . . . , n},
p^(ε)_x ⩽ p_x .   (4.58)
In particular, p^(ε)_{k+1} < p_{k+1}.
The intuition behind the definition above is that we want to alter p in a way that it
becomes more similar to e1 . However, since p(ε) must be close to p we cannot increase p1
by too much. Indeed, the vector p(ε) as defined above is ε-close to p. To see this, observe
that from its definition
½∥p^(ε) − p∥₁ = Σ_{x∈[n]} (p^(ε)_x − p_x)₊ = p₁ + ε − p₁ = ε ,   (4.59)
where the second equality follows from (4.58).
Therefore, the vector p(ε) is indeed in Bε (p).
Theorem 4.2.1. Let p ∈ Prob↓(n) be such that ½∥p − e₁∥₁ > ε. Then, the vector p^(ε) as defined in (4.57) is the maximal element (under majorization) of Bε(p).
Proof. Since we already showed that p(ε) ∈ Bε (p) it is left to show that for any q ∈ Bε (p)
we have p^(ε) ≻ q. Indeed, since q ∈ Bε(p) it follows from (2.83) that for every ℓ ∈ [n]
∥q∥(ℓ) − ∥p∥(ℓ) ⩽ ½∥q − p∥₁ ⩽ ε .   (4.60)
Therefore, for ℓ ∈ [k] we get
∥q∥(ℓ) ⩽ ∥p∥(ℓ) + ε = ∥p^(ε)∥(ℓ) ,   (4.61)
where the equality follows from (4.57).
Combining this with the fact that for k + 1 ⩽ ℓ ⩽ n, ∥p(ε) ∥(ℓ) = 1, we conclude that p(ε) ≻ q.
This completes the proof.
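The construction (4.57) is simple to implement. In the sketch below (assuming numpy; since (4.56) is not restated here, k is taken to be the integer with ∥p∥(k) ⩽ 1 − ε < ∥p∥(k+1), which is what makes p^(ε) a sorted probability vector), we build p^(ε), check that it is exactly ε-away from p, and verify Theorem 4.2.1 on random members of Bε(p).

```python
import numpy as np

def steepest_approx(p, eps):
    """Steepest eps-approximation (4.57) of a sorted probability vector p,
    valid for eps < 0.5*||p - e1||_1; k is the integer with ||p||_(k) <= 1-eps < ||p||_(k+1)."""
    p = np.sort(p)[::-1]
    cums = np.cumsum(p)
    k = int(np.searchsorted(cums, 1 - eps, side='right'))  # number of partial sums <= 1-eps
    out = np.zeros_like(p)
    out[0] = p[0] + eps
    out[1:k] = p[1:k]
    out[k] = 1 - eps - cums[k - 1]
    return out

def majorizes(p, q, atol=1e-12):
    p, q = np.sort(p)[::-1], np.sort(q)[::-1]
    return bool(np.all(np.cumsum(p) >= np.cumsum(q) - atol))

p = np.array([0.4, 0.3, 0.2, 0.1])
eps = 0.05
pe = steepest_approx(p, eps)
print(pe, 0.5 * np.abs(pe - p).sum())      # the approximation and its distance (= eps) from p

# p^(eps) majorizes random states in the eps-ball around p (Theorem 4.2.1)
rng = np.random.default_rng(4)
for _ in range(1000):
    d = rng.normal(size=4); d -= d.mean()
    q = p + eps * d / np.abs(d).sum()       # 0.5*||q - p||_1 <= eps and q sums to one
    if (q >= 0).all():
        assert majorizes(pe, q)
```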
Exercise 4.2.4. Let m, n ∈ N be such that m < n, and let p ∈ Prob↓ (n) be such that
u(m) ̸≻ p. Show that a minimal element of the set
C_{p,m} := {q′ ∈ Prob(m) : q′ ≻ p}   (4.62)
One can use the steepest ε-approximation to compute the distance of a vector p ∈ Prob(n) to the set of all vectors r ∈ Prob(n) that majorize q. Specifically, let Majo(q) := {r ∈ Prob(n) : r ≻ q} denote the set of all vectors in Prob(n) that majorize q, and define the distance between p ∈ Prob(n) and the set Majo(q) as:
T(p, Majo(q)) := min_{r∈Majo(q)} ½∥p − r∥₁ .   (4.66)
Theorem 4.2.2. Using the same notations as above, for all p, q ∈ Prob(n)
T(p, Majo(q)) = max_{ℓ∈[n]} {∥q∥(ℓ) − ∥p∥(ℓ)} .   (4.67)
Proof. Without loss of generality we will assume that p, q ∈ Prob↓(n). For any ε ∈ (0, 1), let p^(ε) be the steepest ε-approximation of p; see (4.57). Observe that by definition, for any m ∈ [n] we have ∥p^(ε)∥(m) ⩽ ∥p∥(m) + ε, with equality if m ∈ [k]. In Theorem 4.2.1 we showed that p^(ε) is the maximal element of Bε(p) as long as ε < ½∥p − e₁∥₁ (otherwise, e₁ is the maximal element). Hence,
T(p, Majo(q)) := min{½∥p − r∥₁ : r ≻ q , r ∈ Prob(n)}
              = min{ε ∈ [0, 1] : r ≻ q for some r ∈ Bε(p)}   (4.68)
              = min{ε ∈ [0, 1] : p^(ε) ≻ q} ,
where the last equality follows from the fact that p^(ε) ≻ r for all r ∈ Bε(p).
That is, it is left to compute the smallest ε that satisfies p^(ε) ≻ q. We will show that this smallest ε equals
δ := max_{ℓ∈[n]} {∥q∥(ℓ) − ∥p∥(ℓ)} .   (4.69)
We first show that p^(δ) ≻ q. Let k be the integer satisfying (4.56) but with δ replacing ε. Then, from the definition of p^(δ) it follows that for m > k we have
∥p^(δ)∥(m) = 1 ⩾ ∥q∥(m) .   (4.70)
Moreover, for m ∈ [k] the definition in (4.69) gives δ ⩾ ∥q∥(m) − ∥p∥(m) so that
∥p^(δ)∥(m) = ∥p∥(m) + δ ⩾ ∥q∥(m) .   (4.71)
Hence, p^(δ) ≻ q. To prove the optimality of δ, we show that for any 0 < δ′ < δ we must have p^(δ′) ̸≻ q. Indeed, since δ′ < δ there exists m ∈ [n] such that
will focus on the scenario where u(n) ̸∈ Bε (p). In this context, the parameter ε satisfies the
following condition:
0 < ε < ½∥p − u^(n)∥₁ .   (4.74)
Additionally, we will assume that the components of the vector p are sorted in a non-
increasing order; i.e., p = p↓ .
Exercise 4.2.5. Let ε ∈ (0, 1), p ∈ Prob↓(n), and ℓ ∈ [n] be the integer satisfying p_ℓ ⩾ 1/n > p_{ℓ+1}. Show that the inequality in (4.74) holds if and only if
ε < ∥p∥(ℓ) − ℓ/n .   (4.75)
Hint: Start by expressing ½∥p − u^(n)∥₁ as Σ_{x∈[n]} (p_x − 1/n)₊.
The minimal element of Bε (p) can be found by “flattening” the tip of p (i.e. first few
components of p) and tail of p (i.e. the last few components of p). The intuition behind
this idea is to alter the vector p so that it becomes more similar to the uniform distribution
u(n) . This process involves replacing the first k components of p with a constant a, and
substituting the last n−m components with another constant b. We denote by p(ε) ∈ Prob(n)
the resulting vector. Its components are given by
p^(ε)_x := a if x ∈ [k] ;   p_x if k < x ⩽ m ;   b if x ∈ {m + 1, . . . , n} .   (4.76)
The objective is to select suitable values for a, b, k, and m, ensuring that p(ε) forms the
flattest ε-approximation of p.
To find the coefficients a, b, k, m we outline the properties that p(ε) has to satisfy:
1. The vector p^(ε) is a probability vector in Prob(n). Since all of its components are non-negative, we just need to require that they sum to one. Using the relation Σ_{x=k+1}^{m} p_x = ∥p∥(m) − ∥p∥(k) we get that the coefficients a, b, k, m must satisfy
1 = Σ_{x∈[n]} p^(ε)_x = ka + ∥p∥(m) − ∥p∥(k) + (n − m)b .   (4.77)
2. The vector p^(ε) ∈ Prob↓(n); i.e. its components are arranged in non-increasing order. Since p = p↓ it is sufficient to require that a > p_{k+1} and b < p_m (these inequalities are strict since we want k and m to mark the indices in (4.76) at which the "flattening" process ends and begins, respectively). We therefore conclude that
3. The vector p^(ε) ∈ Bε(p). Moreover, since p^(ε) is an optimal vector, we would expect it to be ε-close (and not δ-close with δ < ε) to p. Therefore, we require that
ε = ½∥p − p^(ε)∥₁ = Σ_{x∈[n]} (p_x − p^(ε)_x)₊ = Σ_{x∈[k]} (p_x − a) = ∥p∥(k) − ka ,   (4.79)
where the third equality follows from a ∈ (p_{k+1}, p_k] and b ∈ [p_{m+1}, p_m).
In addition to the three conditions above, we need to require that p(ε) is the minimal element
of Bε (p). However, we first show that the three conditions above already determine uniquely
the coefficients a, b, k, m. Indeed, from (4.77) it follows that
Comparing this equality with (4.79) implies that ε = ∥p∥(k) −ka and ε = (n−m)b+∥p∥(m) −1.
We therefore conclude that
a = (∥p∥(k) − ε)/k   and   b = (1 + ε − ∥p∥(m))/(n − m) .   (4.81)
That is, the equation above can be viewed as the definitions of a and b, and it is left to
determine k and m.
Substituting the above definitions of a and b into (4.78) and isolating ε gives that
Moreover, in the exercise below you show that the components of the vectors r := (r₁, . . . , rₙ)^T and s := (s₁, . . . , sₙ)^T are non-negative and satisfy r = r↑ and s = s↓. Thus, the relations
in (4.82) uniquely specify k and m. However, it is left to show that k ⩽ m since otherwise
p(ε) would not be well defined.
For this purpose, let ℓ ∈ [n] be the integer defined in Exercise 4.2.5. We will show that
k ⩽ ℓ ⩽ m. To prove k ⩽ ℓ, suppose by contradiction that k ⩾ ℓ + 1. Since ε ∈ [rk , rk+1 ) we
have ε ⩾ rk ⩾ rℓ+1 , where the second inequality follows from the fact that r = r↑ and our
assumption that k ⩾ ℓ + 1. Combining this with the definition of rℓ+1 in (4.83), we get
ε ⩾ ∥p∥(ℓ+1) − (ℓ + 1)p_{ℓ+1} = ∥p∥(ℓ) − ℓp_{ℓ+1} > ∥p∥(ℓ) − ℓ/n ,   (4.84)
where the last inequality follows from p_{ℓ+1} < 1/n. We thus obtain ε > ∥p∥(ℓ) − ℓ/n,
which is in contradiction with (4.75). Therefore, the assumption that k ⩾ ℓ + 1 cannot hold
and we conclude that k ⩽ ℓ.
Similarly, to prove that m ⩾ ℓ, suppose by contradiction that m ⩽ ℓ − 1. Since ε ∈
[sm+1 , sm ) we have ε ⩾ sm+1 ⩾ sℓ , where the second inequality follows from the fact that
s = s↓ and our assumption that m + 1 ⩾ ℓ. Combining this with the definition of sℓ in (4.83),
we get
ε ⩾ (n − ℓ)p_ℓ + ∥p∥(ℓ) − 1 ⩾ ∥p∥(ℓ) − ℓ/n ,   (4.85)
where the last inequality follows from p_ℓ ⩾ 1/n. We thus obtain ε ⩾ ∥p∥(ℓ) − ℓ/n,
which is again in contradiction with (4.75). Therefore, the assumption that m ⩽ ℓ−1 cannot
hold and we conclude that m ⩾ ℓ. Combining this with our earlier result that k ⩽ ℓ we
conclude that k ⩽ m.
Exercise 4.2.6. Show that the vectors r and s, whose components are given in (4.83) satisfy:
0 = r1 ⩽ r2 ⩽ · · · ⩽ rn = 1 − npn and np1 − 1 = s1 ⩾ s2 ⩾ · · · ⩾ sn = 0 . (4.86)
It's important to note that the index k is characterized by its role as the maximizer of the function ℓ ↦ t_ℓ := (∥p∥(ℓ) − ε)/ℓ. To put it another way, t_k = max_{ℓ∈[n]} {t_ℓ}. This implies that the coefficient a can be straightforwardly defined as:
a := max_{ℓ∈[n]} (∥p∥(ℓ) − ε)/ℓ .   (4.87)
To understand this, let ℓ be the largest integer that satisfies t_ℓ = max_{ℓ′∈[n]} {t_{ℓ′}}. The inequality t_ℓ > t_{ℓ+1} leads to (see Exercise 4.2.7):
0 < t_ℓ − t_{ℓ+1} = (r_{ℓ+1} − ε)/(ℓ(ℓ + 1)) ,   (4.88)
where rℓ := ∥p∥(ℓ) − ℓpℓ as previously defined. This implies that rℓ+1 > ε. Conversely, by
following a similar reasoning, the condition tℓ ⩾ tℓ−1 yields rℓ ⩽ ε. Therefore, we conclude
that ℓ is the integer for which ε falls in the interval [rℓ , rℓ+1 ), which leads us to deduce that
ℓ = k.
Exercise 4.2.7. Verify the equality t_ℓ − t_{ℓ+1} = (r_{ℓ+1} − ε)/(ℓ(ℓ + 1)).
Exercise 4.2.8. Using the same notations as above, show that for every ε ∈ (0, 1) and p ∈ Prob(n) the coefficient b can be expressed as:
b = min_{ℓ∈[n−1]} (1 + ε − ∥p∥(ℓ))/(n − ℓ) .   (4.89)
Theorem 4.2.3. Let ε ∈ (0, 1) and p ∈ Prob↓ (n) be a probability vector such
that (4.74) holds. Let k, m ∈ [n − 1] be the integers satisfying (4.82), and a and b be
the numbers defined in (4.81). Then, for these choices of k, m, a, and b, the vector
p(ε) as defined in (4.76) is the minimal element (under majorization) of Bε (p).
Proof. We already showed that p^(ε) ∈ Bε(p). It is therefore left to show that if q ∈ Bε(p) then q ≻ p^(ε). To establish that ∥q∥(ℓ) ⩾ ∥p^(ε)∥(ℓ) for every ℓ ∈ [n], we partition the proof into three distinct cases:
2. The case k ⩽ ℓ ⩽ m. From (2.83) we get
∥q∥(ℓ) ⩾ ∥p∥(ℓ) − ε = ∥p∥(k) − ε + Σ_{x=k+1}^{ℓ} p_x = ka + Σ_{x=k+1}^{ℓ} p_x = ∥p^(ε)∥(ℓ) ,   (4.92)
where the third equality follows from (4.81) and the last from (4.76).
3. The case m < ℓ ⩽ n. We use once more (2.83) (with m replacing k) to get ∥q∥(m) ⩾ ∥p∥(m) − ε. Moreover, observe that in this case we have for all m < ℓ ⩽ n
∥q∥(ℓ) = 1 − Σ_{x=ℓ+1}^{n} q↓ₓ   and   ∥p^(ε)∥(ℓ) = 1 − (n − ℓ)b .   (4.93)
Hence, it is sufficient to show that
(1/(n − ℓ)) Σ_{x=ℓ+1}^{n} q↓ₓ ⩽ b .   (4.94)
The inequality in (4.94) is equivalent to the statement that the average of the last
n − ℓ components q↓ is no greater than b. Since the components of q↓ are arranged
in non-increasing order, this average is no greater than the average of the last n − m
components of q↓ (recall that n − m > n − ℓ). Hence,
(1/(n − ℓ)) Σ_{x=ℓ+1}^{n} q↓ₓ ⩽ (1/(n − m)) Σ_{x=m+1}^{n} q↓ₓ = (1 − ∥q∥(m))/(n − m) ⩽ (1 + ε − ∥p∥(m))/(n − m) = b ,   (4.95)
where the second inequality follows from ∥q∥(m) ⩾ ∥p∥(m) − ε, and the last equality from (4.81).
One can use the flattest ε-approximation to compute the distance of a vector p ∈ Prob(n) to the set of all vectors r ∈ Prob(n) that are majorized by q. Specifically, let majo(q) := {r ∈ Prob(n) : q ≻ r} denote the set of all vectors in Prob(n) that are majorized by q, and define the distance between p ∈ Prob(n) and the set majo(q) as:
T(p, majo(q)) := min_{r∈majo(q)} ½∥p − r∥₁ .   (4.97)
Theorem 4.2.4. Using the same notations as above, for all p, q ∈ Prob(n)
T(p, majo(q)) = max_{ℓ∈[n]} {∥p∥(ℓ) − ∥q∥(ℓ)} .   (4.98)
Proof. Without loss of generality we will assume that p, q ∈ Prob↓(n) and q ̸≻ p. For any ε ∈ (0, 1), let p^(ε) be the flattest ε-approximation of p; see (4.76). By definition,
T(p, majo(q)) := min{½∥p − r∥₁ : q ≻ r , r ∈ Prob(n)}
              = min{ε ∈ [0, 1] : q ≻ r for some r ∈ Bε(p)}   (4.99)
              = min{ε ∈ [0, 1] : q ≻ p^(ε)} ,
where the last equality follows from the fact that r ≻ p^(ε) for all r ∈ Bε(p). That is, it is left to compute the smallest ε that satisfies q ≻ p^(ε). We will show that this smallest ε equals
δ := max_{ℓ∈[n]} {∥p∥(ℓ) − ∥q∥(ℓ)} .   (4.100)
We first show that q ≻ p(δ) . Let k, m ∈ [n − 1] be the integers satisfying (4.82), and a and
b be the numbers defined in (4.81), but with δ replacing ε. From Exercise 4.1.8 we have
q ≻ p(δ) if and only if
Now, for k ⩽ ℓ ⩽ m,
∥q∥(ℓ) − ∥p^(δ)∥(ℓ) = ∥q∥(k) − ka + Σ_{x=k+1}^{ℓ} (q_x − p_x) = δ + ∥q∥(k) − ∥p∥(k) + Σ_{x=k+1}^{ℓ} (q_x − p_x) = δ + ∥q∥(ℓ) − ∥p∥(ℓ) ,   (4.102)
where the second equality follows from ka = ∥p∥(k) − δ.
Hence, q ≻ p(δ) if and only if for all ℓ ∈ {k, . . . , m} we have δ ⩾ ∥p∥(ℓ) − ∥q∥(ℓ) . From its
definition, δ ⩾ ∥p∥(ℓ) − ∥q∥(ℓ) for all ℓ ∈ [n]. Hence, q ≻ p(δ) .
To prove the optimality of δ, we use the fact that p^(δ) is δ-close to p so that from (2.83) we get for any ℓ ∈ [n]
δ ⩾ ∥p∥(ℓ) − ∥p^(δ)∥(ℓ) ⩾ ∥p∥(ℓ) − ∥q∥(ℓ) ,   (4.103)
where the second inequality follows from q ≻ p^(δ).
Since the above inequality holds for all ℓ ∈ [n] we conclude that δ is optimal. This concludes
the proof.
Recall that (s − t)₊ = ½(|s − t| + s − t), and since the absolute value is a continuous function, these functions are continuous (although not differentiable). Observe that f_p(t) = 0 for t ⩾ p₁, whereas g_p(t) = 0 for t ∈ [0, p_n] and g_p(t) = nt − 1 for t ⩾ p₁. The function f_p(t) is non-increasing in t while g_p(t) is non-decreasing in t. See Fig. 4.1 for examples of f_p(t) and g_p(t).
Exercise 4.2.9. In this exercise we use the same notations used in this subsection with a fixed p ∈ Prob↓(n).
Figure 4.1: The functions f_p(t) and g_p(t). The red dots indicate the points at which the slope of the functions changes.
where k ∈ [n] is the integer satisfying r ∈ (rk , rk+1 ]. The inverse function gp−1 : [0, n − 1] →
[pn , 1] : s 7→ gp−1 (s) is given by (see Exercise 4.2.10)
g_p^{-1}(s) = (1 + s − ∥p∥(m))/(n − m) for s > 0 , and g_p^{-1}(0) = p_n .   (4.108)
Exercise 4.2.10. Consider the functions fp : [0, p1 ] → [0, 1] and gp : [pn , 1] → [0, n − 1] as
defined above and let fp−1 and gp−1 be as defined in (4.107) and (4.108).
1. Show that for any t ∈ [0, p₁] and r ∈ [0, 1]
f_p^{-1}(f_p(t)) = t   and   f_p(f_p^{-1}(r)) = r .   (4.109)
Relative Majorization
Definition 4.3.1. Let p, q ∈ Prob(n) and p′, q′ ∈ Prob(m) be two pairs of probability distributions. We say that (p, q) relatively majorizes (p′, q′), and write (p, q) ≻ (p′, q′), if there exists a column stochastic matrix E ∈ STOCH(m, n) such that Ep = p′ and Eq = q′.
Relative majorization is a pre-order. The property (p, q) ≻ (p, q) (i.e. reflexivity) follows
by taking E in the definition above to be the identity matrix. The transitivity of relative
majorization follows from the fact that the product of two column stochastic matrices is also
a column stochastic matrix.
2. Show that for any m, n ∈ N and any probability vectors, p ∈ Prob(n) and q ∈ Prob(m),
From the second relation in (4.113) it follows that without loss of generality we can
always assume that there is no x ∈ [n] such that px = qx = 0 (since any x-component with
px = qx = 0 can be removed from the vectors p and q without changing the equivalency).
Standard Form
Definition 4.3.2. A pair (p, q) of probability vectors in Prob(n) is said to be given
in a standard form if there is no x ∈ [n] such that px = qx = 0, and the components
of the vectors p and q are arranged such that
p₁/q₁ ⩾ p₂/q₂ ⩾ · · · ⩾ pₙ/qₙ ,   (4.116)
where we used the convention px /qx = ∞ for x ∈ [n] with px > 0 and qx = 0.
Observe that since (P p, P q) ∼ (p, q) for every permutation matrix P , any pair of vectors
is equivalent (under relative majorization) to its standard form. The choice of the order given
in (4.116) will be clear later on when we characterize relative majorization with testing
regions.
Exercise 4.3.2. Let {e1 , e2 } be the standard basis of R2 . Express the pair (e1 , e2 ) in the
standard form.
Exercise 4.3.3. Show that if p, q ∈ Prob(n) and q has the form q = (q1 , . . . , qr , 0, . . . , 0)
for some r < n then
(p, q) ∼ (p′ , q) (4.117)
for any p′ ∈ Prob(n) whose first r components equal the first r components of p.
In the following theorem we bound any pair of probability vectors by pairs of two-dimensional vectors. For any n ∈ N and p, q ∈ Prob(n), we denote by
λ_min := Σ_{x∈supp(p)} q_x   and   λ_max := min_{x∈supp(p)} q_x/p_x ,   (4.119)
where supp(p) := {x ∈ [n] : px ̸= 0}. Later in the book we will see that λmax and λmin are
related to the min and max relative entropies. In the following theorem we use the notations
e1 := (1, 0)T and e2 := (0, 1)T , and in addition denote by
vmax := λmax e1 + (1 − λmax )e2 and vmin := λmin e1 + (1 − λmin )e2 . (4.120)
Theorem 4.3.1. Using the same notations as above we have for any p, q ∈ Prob(n)
(e₁, v_max) ≻ (p, q) ≻ (e₁, v_min) .   (4.121)
Remark. Note that the bounds are not symmetric, meaning that if we swap p with q, the vector e₁ := (1, 0)^T will appear second in the bounding pairs, and λ_min and λ_max will also change. Moreover, if p > 0 (i.e. all the components of p are strictly positive) then λ_min = 1 and consequently the lower bound becomes trivial.
Proof. We first prove the upper bound. For this purpose, we need to find an n × 2 column stochastic channel E ∈ STOCH(n, 2) with the property that
Ee₁ = p   and   Ev_max = q ,   (4.122)
where e₁ := (1, 0)^T and e₂ := (0, 1)^T. Observe that the first condition above implies that the first column of E must be equal to p. Combining this with the definition of v_max and with the second condition above we get that
Ev_max = λ_max p + (1 − λ_max)Ee₂ = q   ⇒   Ee₂ = (q − λ_max p)/(1 − λ_max) .   (4.123)
By definition, λ_max ∈ [0, 1] and it has the property that q ⩾ λ_max p (i.e. q_x ⩾ λ_max p_x for each x ∈ [n]). Therefore, the right-hand side of the equation above is a probability vector. To summarize, the n × 2 column stochastic matrix E, whose first column is p and whose second column is (q − λ_max p)/(1 − λ_max), satisfies (4.122), so that by definition the upper bound in (4.121) holds.
We now prove the lower bound. By definition, it is sufficient to show that there exists a
channel E ∈ STOCH(2, n) such that
Ep = e1 and Eq = vmin = λmin e1 + (1 − λmin )e2 . (4.124)
Since E must be a column stochastic matrix with two rows, it follows that if its first row
is tT then its second row is (1n − t)T , where 1Tn = (1, . . . , 1). Hence, E satisfies the above
conditions if and only if
t · p = 1 and t · q = λmin . (4.125)
Note also that we must have 0 ⩽ t ⩽ 1n (element-wise) since E is column stochastic. We
therefore choose t = (t1 , . . . , tn )T with
(
1 if px > 0
tx = ∀ x ∈ [n] . (4.126)
0 if px = 0
It is simple to check that this t satisfies (4.125). This completes the proof.
Note that when supp(p) = supp(q), the value of λmax , as defined in (4.119), is constrained
to the range 0 < λmax < 1, ensuring that vmax > 0. In this case we can improve the upper
bound (e1 , vmax ). Indeed, let 0 < s < t ⩽ 1 be such that
((1 − t)/(1 − s)) p ⩽ q ⩽ (t/s) p .   (4.127)
Note that such s and t exists since we assume that p and q have the same support, and we
can take s close enough to zero and t close enough to one. Now, define a stochastic evolution
matrix E = [v₁ v₂] ∈ STOCH(n, 2), with the two columns v₁, v₂ ∈ Prob(n) given by
v₁ := ((1 − s)q − (1 − t)p)/(t − s)   and   v₂ := (tp − sq)/(t − s) .   (4.128)
Note that the conditions in (4.127) implies that E is indeed a column stochastic matrix since
tp − sq ⩾ 0 (entrywise) and (1 − s)q − (1 − t)p ⩾ 0. Moreover, denoting by s := (s, 1 − s)T
and t := (t, 1 − t)T we have by direct calculation (Exercise 4.3.5)
Es = p and Et = q . (4.129)
We therefore conclude that for every p, q ∈ Prob(n) with equal support, i.e., supp(p) =
supp(q), there exist s, t ∈ Prob>0 (2) satisfying the relation:
(s, t) ≻ (p, q) . (4.130)
Observe that the relation above hold as long as s, t ∈ [0, 1] satisfies s < t and
(1 − s)q ⩾ (1 − t)p and tp ⩾ sq . (4.131)
Exercise 4.3.5. Verify by direct calculation the relations in (4.129).
Exercise 4.3.6. Show that by taking t = 1 − λmax and s = 0 the relation (s, t) ≻ (p, q) is
equivalent to the upper bound in (4.121).
where u := (1/n, . . . , 1/n)^T is the uniform distribution. In the exercise below you will prove this
assertion using Theorem 4.1.1.
Exercise 4.3.7. Use the equivalence between the first two conditions in Theorem 4.1.1 to
prove (4.132).
where u(kx ) is the uniform probability vector in Prob(kx ). We then have the following
theorem.
Remark. Observe that any vector 0 < q ∈ Prob(n) ∩ Qn can be expressed as in (4.133) for
sufficiently large k. This k is a common denominator for all the components of q.
Proof. We first show that (p, q) ≻ (r, u^(k)). For any x ∈ [n], let E^(x) be the k_x × n matrix whose x-th column is u^(k_x) and all the remaining n − 1 columns are zero. Moreover, let E be the k × n matrix obtained by stacking the blocks E^(1), E^(2), . . . , E^(n) on top of one another:
E := [E^(1); E^(2); . . . ; E^(n)] .   (4.136)
By definition, E^(x) p = p_x u^(k_x), so that Ep = r. Similarly, E^(x) q = (k_x/k) u^(k_x) = (1/k) 1_{k_x}, so that Eq = u^(k). Therefore, since E is column stochastic we get that (p, q) ≻ (r, u^(k)).
For the converse, let F^(x) be the n × k_x matrix whose x-th row is [1, . . . , 1] and all the remaining n − 1 rows are zero. Moreover, denote by F the n × k column stochastic matrix given by
F := [F^(1) F^(2) · · · F^(n)] .   (4.137)
where t (also known as a probabilistic hypothesis test) is an n-dimensional vector with entries
between 0 and 1. Note that for any pair of probability vectors the points (0, 0) and (1, 1)
belong to its testing region. Explicitly, (0, 0) is obtained by taking t to be the zero vector,
and (1, 1) is obtained by taking t = (1, . . . , 1)T . An example of a testing region is plotted in
Fig. 4.2.
Exercise 4.3.11. Show that the testing region is convex, and that it has the symmetry that if (x, y) ∈ T(p, q) then also (1 − x, 1 − y) ∈ T(p, q). Hint: For the latter property, consider the vector t′ = (1, . . . , 1)^T − t.
The testing region is bounded by two curves known as lower and upper Lorenz curves.
Due to the symmetry that (1 − x, 1 − y) ∈ T(p, q) for any (x, y) ∈ T(p, q), the upper Lorenz
curve can be obtained from the lower Lorenz curve through a 180-degree rotation centered
at the midpoint (1/2, 1/2). Consequently, either the lower or the upper Lorenz curve is
sufficient to uniquely define the entire testing region.
Since the testing region is convex it can be characterized by its extreme points. It is tempting to draw a parallel with the convex set [0, 1]ⁿ, which possesses 2ⁿ extreme points, namely the elements of the set {0, 1}ⁿ. However, this analogy can be misleading in the context of our testing region. In reality, only 2n points are necessary to fully characterize the testing region. We will focus on the extreme points characterizing the lower Lorenz curve. Note that the lower Lorenz curve is a convex curve (while the upper Lorenz curve is concave).
Exercise 4.3.12. Let p, q ∈ Prob(n) and let P be an n × n permutation matrix.
1. Show that
T(P p, P q) = T(p, q) . (4.140)
2. Show that
T(p ⊕ 0, q ⊕ 0) = T(p, q) . (4.141)
Since for any permutation matrix P , (p, q) and (P p, P q) have the same testing region,
we can assume without loss of generality that (p, q) are always given in the standard form.
This is also justified by the fact that under relative majorization we have (p, q) ∼ (P p, P q)
for any permutation matrix P .
Theorem 4.3.3. Given p, q ∈ Prob(n) in standard form, the extreme points on the
lower boundary of the testing region T(p, q) (specifically, on its lower Lorenz curve)
are the n + 1 vertices:
(a_k, b_k) := ( Σ_{x∈[k]} p_x , Σ_{x∈[k]} q_x ) ,   k = 0, 1, . . . , n ,   (4.142)
where a₀ := 0 and b₀ := 0.
Remark. In general, the sum Σ_{x∈[k]} p_x does not equal ∥p∥(k) since the components of p are not necessarily arranged in non-increasing order. Instead, the components of p and q are arranged such that the order in (4.116) holds.
Proof. Let f : [0, 1] → [0, 1] be the function whose graph is the lower Lorenz curve of
(p, q). Then, by definition, for every a ∈ [0, 1], (a, f (a)) is the lowest point in T(p, q) whose
x-coordinate is a. Therefore, for any r ∈ [0, 1] we can express f(r) as
f(r) = min{q · t : t ∈ [0, 1]ⁿ , p · t = r} .   (4.143)
Our objective is to demonstrate that the function f(r) defines the segment connecting two adjacent vertices. Specifically, consider a fixed k ∈ {0, 1, . . . , n}. Our aim is to establish that
for any r in the interval [ak , ak+1 ), the function f (r) corresponds to the line segment joining
the points (ak , bk ) and (ak+1 , bk+1 ). Mathematically, this means that for all r ∈ [ak , ak+1 ),
we have:
f(r) = s_{k+1}(r − a_k) + b_k ,   (4.144)
where s_x := q_x/p_x for all x ∈ [n], adhering to the convention that s_x := ∞ if p_x = 0 and q_x > 0 (recall that we assume that there is no x ∈ [n] such that both p_x and q_x are equal to zero). Successfully proving this relationship implies that the set of points {(a_k, b_k)}ⁿ_{k=0} are indeed the extreme points on the lower Lorenz curve of the testing region T(p, q).
To prove (4.144), observe that the optimization problem in (4.143) is a linear program.
In Exercise 4.3.13 you will apply methods discussed in Sec. A.9, specifically the dual problem
framework, to express f (r) as:
f(r) = max{rs − v · 1ₙ : v ∈ Rⁿ₊ , sp − q ⩽ v , s ∈ R₊} ,   (4.145)
where 1n := (1, . . . , 1)T and the inequality is entry-wise. The maximization in (4.145) can
be simplified since the vector v with the smallest non-negative components that satisfies
the constraint v ⩾ sp − q is given by v = (sp − q)+ (the components of (sp − q)+ are
{(sp_x − q_x)₊}_{x∈[n]}). Hence,
f(r) = max_{s⩾0} {sr − (sp − q)₊ · 1ₙ} .   (4.146)
Now, from (4.116) it follows that s1 ⩽ s2 ⩽ · · · ⩽ sn . Therefore, for any s ⩾ 0 there exists
ℓ ∈ {0, 1, . . . , n} with the property that sℓ ⩽ s < sℓ+1 , where we added the definitions s0 := 0
and sn+1 := ∞. With this definition of ℓ we get
(sp − q)₊ · 1ₙ = Σ_{x∈[ℓ]} p_x(s − s_x) = sa_ℓ − b_ℓ .   (4.148)
Therefore, by splitting the maximization in (4.146) into maximization over all ℓ ∈ {0, 1, . . . , n}
and all s ∈ [sℓ , sℓ+1 ) we get
f (r) = max sup s(r − aℓ ) + bℓ . (4.149)
ℓ∈{0,1...,n} s∈[sℓ ,sℓ+1 )
Moreover, among all ℓ ∈ {0, 1, . . . , k} the choice ℓ = k yields the greatest value since the
right-hand side above is increasing in ℓ as long as ℓ ⩽ k (see Exercise 4.3.14); hence,
max_{ℓ∈{0,1,...,k}} sup_{s∈[s_ℓ, s_{ℓ+1})} {s(r − a_ℓ) + b_ℓ} = s_{k+1}(r − a_k) + b_k .   (4.151)
Furthermore, among all ℓ ∈ {k + 1, . . . , n} the choice ℓ = k + 1 yields the greatest value since
the right-hand side above is decreasing in ℓ as long as ℓ ⩾ k + 1 (see Exercise 4.3.14); that
is,
max_{ℓ∈{k+1,...,n}} sup_{s∈[s_ℓ, s_{ℓ+1})} {s(r − a_ℓ) + b_ℓ} = s_{k+1}(r − a_{k+1}) + b_{k+1} .   (4.153)
The right-hand side of (4.151) is in fact equal to the right-hand side of (4.153) (see Exer-
cise 4.3.14). We therefore conclude that for any k ∈ {0, 1, . . . , n} and any r ∈ [ak , ak+1 ) we
have f (r) = sk+1 (r − ak ) + bk . This completes the proof.
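The vertices (4.142) are straightforward to compute once the pair is sorted into the standard form (4.116). The sketch below (assuming numpy; the function names are ours) computes them and, anticipating the criterion of Theorem 4.3.4 below, tests relative majorization by comparing the two piecewise-linear lower Lorenz curves at the union of their breakpoints (which suffices, since both curves are linear between consecutive breakpoints).

```python
import numpy as np

def lorenz_vertices(p, q):
    """Vertices (a_k, b_k) of the lower Lorenz curve of (p, q), eq. (4.142):
    sort the pairs into standard form (4.116), i.e. by p_x/q_x non-increasing."""
    ratio = np.where(q > 0, p / np.where(q > 0, q, 1), np.inf)
    order = np.argsort(-ratio, kind='stable')
    a = np.concatenate([[0.0], np.cumsum(p[order])])
    b = np.concatenate([[0.0], np.cumsum(q[order])])
    return a, b

def relatively_majorizes(p, q, p2, q2, atol=1e-12):
    """Check (p, q) relatively majorizes (p2, q2): LC(p, q) nowhere above LC(p2, q2)."""
    a1, b1 = lorenz_vertices(p, q)
    a2, b2 = lorenz_vertices(p2, q2)
    xs = np.union1d(a1, a2)
    return bool(np.all(np.interp(xs, a1, b1) <= np.interp(xs, a2, b2) + atol))

p  = np.array([0.7, 0.2, 0.1]); q  = np.array([0.2, 0.3, 0.5])
p2 = np.array([0.5, 0.5]);      q2 = np.array([0.4, 0.6])
print(relatively_majorizes(p, q, p2, q2), relatively_majorizes(p2, q2, p, q))  # True False
```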
1. Show that the condition p·t = r in (4.143) can be replaced with p·t ⩾ r. Hint: Observe
that any t satisfying p · t > r can be rescaled to give p · t = r (and this rescaling can
only decrease q · t).
2. Prove the equality in (4.145). Hint: First express the minimization in (4.143) (after
replacing p · t = r with p · t ⩾ r) as a conic linear programming of the form (A.52)
(with vectors in Rn replacing Hermitian matrices, and the dot product replacing the
Hilbert-Schmidt inner product). Then use (A.57) and the strong duality to get (4.145).
Exercise 4.3.14. Show that the right-hand side of (4.150) is increasing in ℓ ∈ [k], and
the right-hand side of (4.152) is decreasing in ℓ ∈ {k + 1, . . . , n}. Moreover, show that the
two expressions are the same for ℓ = k and ℓ = k + 1 (i.e., show that the right-hand side
of (4.151) is equal to the right-hand side of (4.153)).
1. (p − tq)₊ · 1ₙ = ½(∥p − tq∥₁ + 1 − t) .
2. (tp − q)₊ · 1ₙ = ½(∥tp − q∥₁ + t − 1) .   (4.154)
Hint: Use the relation (a − b)₊ = ½|a − b| + ½(a − b).
Exercise 4.3.16. Compute the vertices of the lower Lorenz curve of the example given in
Fig. 4.2. If necessary, rearrange the components of p and q so that (4.116) holds.
Exercise 4.3.17. For given p, q ∈ Prob(n), find the vertices of T(p, q) that are located on the upper Lorenz curve of (p, q).
Exercise 4.3.18. Let p ∈ Prob(n). Show that the vertices of the lower Lorenz curve of the pair (p, u^(n)) are given by
(∥p∥(k) , k/n) ,   k = 0, 1, . . . , n ,   (4.155)
Remark. We will see shortly that the converse to the statement in the corollary above is also
true.
Proof. The proof follows immediately from the expression for the lower Lorenz curve in (4.146). Explicitly, by using the variable t := 1/s in (4.146), and using the notation f_{p,q}(r) for f(r), we get that for all r ∈ [0, 1]
f_{p,q}(r) = max_{t⩾1} [r − (p − tq)₊ · 1ₙ]/t = max_{t⩾1} [2r − ∥p − tq∥₁ + t − 1]/(2t) ,   (4.156)
where the second equality follows from (4.154).
Therefore, if ∥p − tq∥1 ⩾ ∥p′ − tq′ ∥1 for all t ⩾ 1 then fp,q (r) ⩽ fp′ ,q′ (r) for all r ∈ [0, 1];
i.e., the lower Lorenz curve of the pair (p, q) is nowhere above the lower Lorenz curve of
(p′ , q′ ) so that T(p, q) ⊇ T(p′ , q′ ).
Exercise 4.3.19. Let p, q ∈ Prob(n) and p′ , q′ ∈ Prob(n′ ). Show that if (p, q) ≻ (p′ , q′ )
then for all t ∈ R we have ∥p − tq∥1 ⩾ ∥p′ − tq′ ∥1 . Hint: Use the property 2.8 of the 1-norm
∥ · ∥1 .
Exercise 4.3.20. Let p, q ∈ Prob(n) and p′ , q′ ∈ Prob(n′ ).
1. Show that if (p, q) ≻ (p′ , q′ ) then T(p′ , q′ ) ⊆ T(p, q).
2. Show that if (p, q) ∼ (p′ , q′ ) then T(p′ , q′ ) = T(p, q).
Hint: For the first part, let E ∈ STOCH(n′, n) be such that p′ = Ep and q′ = Eq, and show first that for any t′ ∈ [0, 1]^{n′} the vector t := E^T t′ belongs to [0, 1]ⁿ and satisfies (t′ · p′, t′ · q′) = (t · p, t · q).
Characterization
Theorem 4.3.4. Let n, n′ ∈ N, p, q ∈ Prob(n), and p′ , q′ ∈ Prob(n′ ). Then, the
following are equivalent:
1. (p, q) ≻ (p′ , q′ ).
3. T(p, q) ⊇ T(p′ , q′ ).
Remark. The equivalence between 1 and 3 in Theorem 4.3.4 provides a very simple geomet-
rical characterization of relative majorization. Denoting by LC(p, q) and LC(p′ , q′ ) the two
lower Lorenz curves associated with the two testing regions, we have that (p, q) ≻ (p′ , q′ ) if
and only if LC(p, q) is nowhere above LC(p′ , q′ ). An example illustrating this property is
depicted in Fig. 4.3.
Figure 4.3: Lower Lorenz Curves. The red lower Lorenz curve LC(p, q) is nowhere above the blue
lower Lorenz curve LC(p′ , q′ ). This means that the pair (p, q) relatively majorizes the pair (p′ , q′ ).
Note that aside from the vertices (0, 0) and (1, 1), the vertices of LC(p, q) are (5/12, 1/12), (7/12, 1/6), (5/6, 1/2), and the vertices of LC(p′, q′) are (1/3, 1/12), (7/12, 1/4), (5/6, 7/12).
The implication 1 ⇒ 2 can be easily deduced from the monotonicity property of the norm
∥ · ∥1 , as discussed in (2.8). This part of the proof is straightforward and hence, is suggested
as an exercise for the reader (refer to Exercise 4.3.19). Having previously established the
implication 2 ⇒ 3 in Corollary 4.3.1, our remaining task is to demonstrate that 3 ⇒ 1. We
begin this proof by focusing on the case where both q and q′ consist of positive rational
components.
Lemma 4.3.1. Let q, p ∈ Prob(n) and p′ , q′ ∈ Prob(n′ ), and suppose that q and q′
have positive rational components. If T(p, q) ⊇ T(p′ , q′ ) then (p, q) ≻ (p′ , q′ ).
Proof. From Theorem 4.3.2 we get that there exist vectors r, r′ ∈ Prob(k) such that (p, q) ∼
(r, u(k) ) and (p′ , q′ ) ∼ (r′ , u(k) ), where k is a common denominator of all the components of
q and q′ . Therefore, from the second part of Exercise 4.3.20 we get that T(p, q) = T(r, u(k) )
and T(p′ , q′ ) = T(r′ , u(k) ). Moreover, since we assume that T(p, q) ⊇ T(p′ , q′ ) we get that
T(r, u(k) ) ⊇ T(r′ , u(k) ). Hence, the Lorenz curve LC(r, u(k) ) is nowhere above the Lorenz
curve LC(r′ , u(k) ). In addition, the non-zero vertices of LC(r, u(k) ) and LC(r′ , u(k) ) are given
Therefore, since the vertex (∥r∥(ℓ), ℓ/k) has the same y-coordinate as the vertex (∥r′∥(ℓ), ℓ/k),
and since the convex curve LC(r, u(k) ) is nowhere above the convex curve LC(r′ , u(k) ), we
get that ∥r∥(ℓ) ⩾ ∥r′ ∥(ℓ) for all ℓ ∈ [k]. That is, r ≻ r′ and from (4.138) this is equivalent to
(p, q) ≻ (p′ , q′ ). This completes the proof.
In order to complete the proof of Theorem 4.3.4 we will need a continuity argument that
extends the lemma above to the general case of arbitrary q and q′ .
Proof. We provide a geometrical proof using Fig. 4.4. By keeping p′ unchanged, we can raise
slightly and vertically the vertices of LC(p′ , q′ ) to get the lower Lorenz curve of LC(p′ , q′(ε) ),
where q′(ε) has positive rational components and is ε-close to q′ ; see Fig. 4.4(a) for an
illustration. Explicitly, let ε₁, . . . , ε_{n−1} be small enough positive numbers such that for all x ∈ [n − 1], q′^(ε)_x := q′_x + ε_x is a rational number. Furthermore, we can always choose ε₁, . . . , ε_{n−1} to be small enough such that their sum δ := Σ_{x∈[n−1]} ε_x < ε and such that q′^(ε)_n := q′_n − δ > 0 (due to the standard form of (p′, q′) we have q′_n > 0). For these choices, the vector q′^(ε) := (q′^(ε)_1, . . . , q′^(ε)_n)^T has positive rational components, and q′^(ε) is also ε-close to q′.
Furthermore, by construction, LC(p′ , q′(ε) ) is everywhere above LC(p′ , q′ ).
Once we have established LC(p′, q′(ε)), we construct q(ε) in a similar way; see Fig. 4.4(b).
Explicitly, let ν1, . . . , νn−1 be small enough positive numbers such that for every x ∈ [n − 1]
the number q(ε)_x := q_x + ν_x is rational. Furthermore, we can always choose ν1, . . . , νn−1 to be
small enough such that their sum ν := Σ_{x∈[n−1]} ν_x is smaller than ε and satisfies q(ε)_n := q_n − ν > 0. For
these choices, the vector q(ε) := (q(ε)_1, . . . , q(ε)_n)^T has positive rational components, is ε-close to q,
and, as long as ν1, . . . , νn−1 are sufficiently small, LC(p, q(ε)) is everywhere below LC(p′, q′(ε)).
This completes the proof.
With these lemmas at hand, we are now ready to prove the theorem.
In the exercise above we approximated (p, q) with a pair of vectors (p, q′ ), where q′ has
some desired properties (particularly, rational components), and p is fixed. In the next two
lemmas we remove some of the assumptions on q by allowing p to vary. We will only assume
that q′ is close to q.
Specifically, consider two probability vectors p, q ∈ Prob(n) and let ε ∈ (0, 1) be a
sufficiently small number to be determined later. Our objective is to show that for every
p′ ∈ Bε (p) there exists δ ∈ (0, 1) and q′ ∈ Bδ (q) such that (p′ , q′ ) ≻ (p, q). We will be
able to show that such a q′ exists if we assume that q > 0 and that supp(p′ ) ⊆ supp(p). In
the following lemma we will use the notation ε₀ := (1/2) p_min q_min, where p_min and q_min are the
smallest non-zero components of p and q, respectively.
Proof. We need to define q′ ∈ Bδ (q) and a channel E ∈ STOCH(n, n) such that Ep′ = p
and Eq′ = q. The key idea is to look for a matrix E of the form
Et := p + s (t − p′ ) ∀ t ∈ Prob(n) , (4.163)
where s ∈ R+ is some coefficient. Observe that we define the matrix E by its action on
probability vectors. Clearly, by construction, Ep′ = p. However, E above is not necessarily
a stochastic matrix since for arbitrary s ∈ R the vector p + s (t − p′) could have negative
components. We therefore choose s := min_{x∈supp(p′)} p_x/p′_x ; with this choice p − s p′ ⩾ 0, so that
Et ⩾ 0 for every t ∈ Prob(n) and E is indeed a stochastic matrix.
Observe that s ⩽ 1 (since we cannot have p_x > p′_x for all x ∈ supp(p′)), and since supp(p′) ⊆
supp(p) we have s > 0. In fact, since p′ is ε-close to p we must have p′x ⩽ px + ε for all
x ∈ [n] so that
s ⩾ min_{x∈supp(p′)} p_x/(p_x + ε) ⩾ p_min/(p_min + ε) . (4.165)
Our next goal is to define q′ such that Eq′ = q. Observe that according to (4.163) we
have Eq′ = p + s (q′ − p′ ) so that Eq′ = q if p + s (q′ − p′ ) = q. Isolating q′ we get that
q′ should have the form
q′ := p′ + (1/s)(q − p) (4.166)
(indeed, check that with this q′ we have Eq′ = q). However, it is not obvious that q′ has
non-negative components. Therefore, we next show that ε is small enough so that q′ ⩾ 0
(i.e. q′ is a probability vector).
Observe that q′ ⩾ 0 if and only if for all x ∈ [n] we have qx ⩾ px − sp′x . Now, since p is
ε-close to p′ we get that
Lemma 4.3.4. Let p, q, q′ ∈ Prob(n) and denote δ := 1 − min_{x∈[n]} q′_x/q_x . Then,
there exists p′ ∈ Bδ(p) such that (p, q) ≻ (p′, q′).
Observe that
δ = max_{x∈[n]} (q_x − q′_x)/q_x ⩽ ε/q_ℓ , (4.170)
where ℓ is the integer satisfying δ = 1 − q′_ℓ/q_ℓ . Therefore, if q and q′ are very close to each
other then so are p and p′.
Proof. Let s := 1 − δ = min_{x∈[n]} q′_x/q_x . From the definition of s we have q′ ⩾ sq (entry-wise),
so that the mapping
Et := q′ + s (t − q) ∀ t ∈ Prob(n) (4.171)
is a channel. By definition, Eq = q′ . Define
p′ := Ep = q′ + s (p − q) , (4.172)
where the second equality follows from (4.171). Therefore,
(1/2)∥p′ − p∥₁ = (1/2)∥q′ − sq − (1 − s)p∥₁ ⩽ (1/2)∥q′ − sq∥₁ + (1/2)(1 − s) = 1 − s , (4.173)
where the inequality is the triangle inequality, and the last equality follows from q′ ⩾ sq, which gives
∥q′ − sq∥₁ = 1 − s. Hence p′ ∈ B_{1−s}(p) = Bδ(p), and since Ep = p′ and Eq = q′ we conclude
that (p, q) ≻ (p′, q′). This completes the proof.
Exercise 4.3.26. Prove the following statement: Let p, q, p′ ∈ Prob(n) and denote δ :=
1 − min_{x∈[n]} p′_x/p_x . Then, there exists q′ ∈ Bδ(q) such that (p, q) ≻ (p′, q′).
We now turn to the trumping relation. We say that p trumps q, and write
p ≻∗ q , (4.174)
if there exist a finite m ∈ N and a catalyst vector r ∈ Prob(m) such that p ⊗ r ≻ q ⊗ r.
Remark. Observe that while we require that the dimension m < ∞, it is still unbounded. We
will see later that this implies that the trumping relation is very sensitive to perturbations,
and also leads to a phenomenon known as ‘embezzlement’ of entanglement.
By definition, if p ≻ q then necessarily p ≻∗ q. The example below demonstrates that
the opposite direction does not hold in general. In this sense, the trumping relation imposes
a weaker constraint than majorization.
Example. Consider the probability vectors
p := (1/2, 1/4, 1/4, 0)^T and q := (2/5, 2/5, 1/10, 1/10)^T .
It is simple to check (see exercise below) that p ̸≻ q and q ̸≻ p. Yet, p ≻∗ q since the vector
r = (3/5, 2/5)^T satisfies
p ⊗ r ≻ q ⊗ r . (4.175)
Exercise 4.4.1. Let p, q, r be as in the example above.
1. Verify that p ̸≺ q and q ̸≺ p.
2. Verify that (4.175) holds.
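As a quick numerical illustration of the example and of Exercise 4.4.1 (a sketch, not part of the text), one can check directly that p and q above are incomparable under majorization while p ⊗ r majorizes q ⊗ r:

```python
import numpy as np

def majorizes(p, q, tol=1e-12):
    """True if p majorizes q (both probability vectors of the same dimension)."""
    p, q = np.sort(p)[::-1], np.sort(q)[::-1]
    return bool(np.all(np.cumsum(p) >= np.cumsum(q) - tol))

p = np.array([1/2, 1/4, 1/4, 0.0])
q = np.array([2/5, 2/5, 1/10, 1/10])
r = np.array([3/5, 2/5])

print(majorizes(p, q), majorizes(q, p))          # False, False: p and q are incomparable
print(majorizes(np.kron(p, r), np.kron(q, r)))   # True: r acts as a catalyst
```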
Exercise 4.4.2. Show that if p, q ∈ Prob(3) and p ≻∗ q then p ≻ q.
Exercise 4.4.3. Show that the uniform probability vector u cannot act as a catalyst vector;
that is, show that if p ∈ Prob(n) and q ∈ Prob(m) are such that p ̸≻ q then for any k ∈ N,
p ⊗ u(k) ̸≻ q ⊗ u(k).
Exercise 4.4.4. Let f : Prob(n) → R be a Schur convex function that is additive under
tensor product; i.e. for all p ∈ Prob(n) and q ∈ Prob(m)
f (p ⊗ q) = f (p) + f (q) . (4.176)
Show that
p ≻∗ q ⇒ f (p) ⩾ f (q) . (4.177)
Relative Trumping
Definition 4.4.2. Let p, q ∈ Prob(n) and p′ , q′ ∈ Prob(m). We say that the pair
(p, q) relatively trumps the pair (p′ , q′ ), and write
(p ⊗ r, q ⊗ s) ≻ (p′ ⊗ r, q′ ⊗ s) . (4.179)
Exercise 4.4.5. Show that if we did not impose s > 0 (or alternatively that r > 0) in
the definition above then there would always exists a catalyst. Hint: Take r and s to be
orthogonal.
Remark. In the above definition, an alternative approach could have been to require that
the vectors r and s are not orthogonal instead of enforcing s > 0. Nevertheless, opting for
the stricter criterion of s > 0 brings two distinct benefits. Firstly, within the rational field,
where vectors like q, q′ , r and others comprise rational components, Theorem 4.3.2 implies
that (r, s) ∼ (t, u), for some vector t of higher dimensionality. This equivalence effectively
simplifies relative trumping to standard trumping in this scenario. Secondly, when applying
the concept of relative trumping to thermodynamic contexts, the vector s typically represents
a Gibbs state, which is inherently positive by nature. This correspondence ensures that the
mathematical model is in harmony with the underlying physical principles of Gibbs states
in thermodynamics.
In the next chapter we will study functions that behave monotonically under relative
trumping. A well-known family of such functions is the family of Rényi divergences. The Rényi
divergences are defined for any α ∈ [0, ∞] and any p, q ∈ Prob(n) as
Dα(p∥q) := { (1/(α−1)) log Σ_{x∈[n]} p_x^α q_x^{1−α}   if supp(p) ⊆ supp(q)
           { ∞                                          otherwise .        (4.180)
The cases α = 0, 1, ∞ are defined by taking the appropriate limits (more details will be given
in the next chapter). Both the trumping and relative trumping relations can be characterized
with the above family of Rényi divergences.
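For readers who want to experiment numerically, here is a minimal sketch (a hypothetical helper, not from the text; the base-2 logarithm is a convention choice) of the Rényi divergence (4.180), including its α → 1 limit, the Kullback–Leibler divergence:

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """D_alpha(p||q) as in (4.180), with the alpha -> 1 case given by the KL limit."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    if np.any((p > 0) & (q == 0)):
        return np.inf                       # supp(p) not contained in supp(q)
    mask = p > 0                            # terms with p_x = 0 contribute nothing
    if np.isclose(alpha, 1.0):
        return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))
    s = np.sum(p[mask] ** alpha * q[mask] ** (1 - alpha))
    return float(np.log2(s) / (alpha - 1))

p = np.array([1/2, 1/4, 1/4, 0.0])
q = np.array([2/5, 2/5, 1/10, 1/10])
for alpha in (0.5, 1.0, 2.0):
    print(alpha, renyi_divergence(p, q, alpha), renyi_divergence(q, p, alpha))
```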
p ≻∗ q (4.181)
if and only if for all α ⩾ 1/2
The proof of the theorem above is rather complicated and goes beyond the scope of this
book. In the section ‘Notes and References’ at the end of this chapter, we discuss its history
and provide relevant references for further reading. In the corollary below we show that this
theorem can be extended to relative trumping for the case that one of the vectors in each
pair has positive rational components. We use the notation
(p, q) ⊗ (p′, q′) := (p ⊗ p′, q ⊗ q′) ∀ p, q ∈ Prob(n) , ∀ p′, q′ ∈ Prob(m) . (4.183)
2. For all α ⩾ 1/2 we have
Dα(p∥q) > Dα(p′∥q′) and Dα(q∥p) > Dα(q′∥p′) . (4.185)
Note that the condition 1 above implies in particular that (p, q) ≻∗ (p′ , q′ ). We leave
the proof of the corollary as an exercise.
Exercise 4.4.6. Prove Corollary 4.4.1 using the combination of Theorem 4.4.1 and Theo-
rem 4.3.2.
Consider two sequences {pk}k∈N and {qk}k∈N in Prob(n) with limits p and q, and let {p′k}k∈N and {q′k}k∈N be sequences in Prob(m) with limits p′ and q′. Suppose now
that (pk , qk ) ≻ (p′k , q′k ) for all k ∈ N and recall from Theorem 4.3.4 that this means that
T(pk , qk ) ⊇ T(p′k , q′k ). Since this inclusion of testing regions is robust under taking the limit,
we conclude that T(p, q) ⊇ T(p′ , q′ ) so that necessarily (p, q) ≻ (p′ , q′ ).
The above argument cannot be applied to the trumping and relative trumping relations.
To see why, consider two sequences {pk }k∈N ⊆ Prob(n) and {qk }k∈N ⊆ Prob(m) with limits
p and q, and suppose that pk ≻∗ qk for all k ∈ N. This means that for each k ∈ N there
exists a catalyst vector rk ∈ Prob(ℓk ) such that pk ⊗ rk ≻ qk ⊗ rk , where ℓk ∈ N is the
dimension of rk that can depend on k. Without invoking additional arguments, one cannot
rule out the possibility that the dimension ℓk goes to infinity as k goes to infinity. This
means that we cannot conclude that p ≻∗ q. However, at the time of writing this book, it
is left open to find an example with convergent sequences satisfying pk ≻∗ qk for all k ∈ N,
whereas their limits satisfy p ̸≻∗ q.
Exercise 4.5.1. Show that if there exists an example as above, then there is also a similar
example for relative trumping. That is, there exist sequences {pk}k∈N, {qk}k∈N, {p′k}k∈N,
and {q′k }k∈N , in Prob(n), with limits p, q, p′ , and q′ , respectively, such that (pk , qk ) ≻∗
(p′k , q′k ) for all k ∈ N and (p, q) ̸≻∗ (p′ , q′ ).
Catalytic Majorization
Definition 4.5.1. Let m, n ∈ N and p, q ∈ Prob(n) and p′ , q′ ∈ Prob(m). We say
that (p, q) catalytically majorizes (p′, q′), and write (p, q) ≻c (p′, q′),
if for any ε > 0 there exist four vectors pε ∈ Bε(p), qε ∈ Bε(q), p′ε ∈ Bε(p′), and
q′ε ∈ Bε(q′) such that (pε, qε) ≻∗ (p′ε, q′ε).
Exercise 4.5.2. Show that ≻c is indeed a pre-order, and if (p, q) ≻∗ (p′ , q′ ) then necessarily
(p, q) ≻c (p′ , q′ ).
Exercise 4.5.3 (Robustness of Catalytic Majorization). Show that if for any ε > 0 there
exist four vectors pε ∈ Bε(p), qε ∈ Bε(q), p′ε ∈ Bε(p′), and q′ε ∈ Bε(q′) such that
(pε , qε ) ≻c (p′ε , q′ε ) then (p, q) ≻c (p′ , q′ ).
Lemma. Let p, q ∈ Prob(n) and p′, q′ ∈ Prob(m), and suppose that q > 0 and p′ > 0. Then, the following statements are equivalent:
1. (p, q) ≻c (p′, q′).
2. For any ε > 0 there exist qε ∈ Bε(q) and q′ε ∈ Bε(q′) such that (p, qε) ≻∗ (p′, q′ε).
Proof. Clearly, if for all ε > 0 the two vectors in the second statement exist, then (p, q) ≻c
(p′ , q′ ) since we can define pε := p and p′ε := p′ , so that the four vectors pε , qε , p′ε , and q′ε
satisfy the conditions in Definition 4.5.1. It thus remains to show the converse implication.
Suppose (p, q) ≻c (p′ , q′ ) and let pε , qε , p′ε , and q′ε be as in Definition 4.5.1. In particular,
(pε , qε ) ≻∗ (p′ε , q′ε ) and we choose ε > 0 to be sufficiently small so that qε > 0 (recall
that q > 0 and qε is ε-close to q) and p′ε > 0. From Lemma 4.3.3 it follows that there
exists r ∈ Bδ(qε) with δ := ε/q_{ε,min} (q_{ε,min} being the smallest component of qε) such that
(p, r) ≻ (pε, qε). Similarly, from the version of Lemma 4.3.4 that is given in Exercise 4.3.26
it follows that there exists a vector r′ ∈ B_{δ′}(q′ε) with δ′ := 1 − min_{x∈[n]} p′_x/(p′ε)_x, such that
(p′ε, q′ε) ≻ (p′, r′). Observe that r and r′ satisfy
(p, r) ≻ (pε, qε) ≻∗ (p′ε, q′ε) ≻ (p′, r′) .
Hence, in particular, (p, r) ≻∗ (p′, r′). Now, recall that r is δ-close to qε and therefore
(δ + ε)-close to q. Similarly, r′ is (δ ′ + ε)-close to q′ . The proof is therefore concluded by the
observation that both δ and δ ′ go to zero in the limit ε → 0 so that r and r′ can be made
arbitrarily close to q and q′ , respectively.
Exercise 4.5.4. Show that under the assumption that q > 0 and p′ > 0 both δ and δ′ in the
lemma above go to zero as ε goes to zero.
Theorem. Let p, q ∈ Prob(n) and p′, q′ ∈ Prob(m) be such that one of p, q has full support. Then, the following are equivalent:
1. (p, q) ≻c (p′, q′).
2. For all α ⩾ 1/2 we have Dα(p∥q) ⩾ Dα(p′∥q′) and Dα(q∥p) ⩾ Dα(q′∥p′).
Proof. The proof of the theorem for the special case that either p = q or p′ = q′ is very
simple and is left as an exercise. Therefore, we assume now that both p ̸= q and p′ ̸= q′ .
Due to the symmetry in the roles of p and q, and since one of them has full support, we
assume without loss of generality that it is q; that is, we assume
q > 0. The proof of the monotonicity property of Dα under catalytic majorization will be
detailed in Chapter 6, where we extensively study the properties of the Rényi divergences.
Consequently, this section will only cover the proof of the implication 2 ⇒ 1.
From Exercise 4.3.24 it follows that for any ε > 0 there exist qε ∈ Bε (q) and q′ε ∈ Bε (q′ )
with 0 < qε ∈ Prob(n) ∩ Qn and 0 < q′ε ∈ Prob(m) ∩ Qm such that
In Chapter 6 we will see that Dα behaves monotonically under both relative majorization
and catalytic majorization. Therefore, the relations above combined with our assumption
that (p, q) ≻c (p′ , q′ ), lead to the conclusion that
and similarly
Now, since both qε and q′ε have positive rational components, there exist two finite-dimensional
probability vectors rε, r′ε ∈ Prob(mε), with mε ∈ N, such that (see Theorem 4.3.2)
Our strategy is to use Theorem 4.4.1 in conjunction with the inequalities above in order to
obtain a majorization relation between r′ε and rε . However, since the inequalities above are
not strict, we cannot use Theorem 4.4.1 and will need to tweak a bit the vector r′ε .
We first rule out the possibility that r′ε = u(mε). Consulting the construction in Theorem 4.3.2,
we see that r′ε = u(mε) implies p′ = q′ε. However, this cannot occur for sufficiently
small ε > 0 since q′ε → q′ ̸= p′ as ε → 0⁺ by our assumption. Hence, we can assume r′ε ̸= u(mε) for
sufficiently small ε > 0. Moreover, observe that for any ε ∈ (0, 1), we have (see Exercise 4.1.2)
r′ε ≻ sε := (1 − ε) r′ε + εu(mε ) . (4.193)
A combination of the relation r′ε ≻ sε (note that sε > 0) with Theorem 4.4.1 and Eq. (4.192)
gives the following strict inequalities for all α ⩾ 1/2:
where {e^X_x}_{x∈[m]} is the standard basis of R^m, and {e^Y_y}_{y∈[n]} is the standard basis of R^n. While
mathematically p^{XY} is a vector in Prob(mn), conceptually we treat it as a joint probability
distribution.
To introduce the concept of conditional majorization, we build upon the foundation of
conditional mixing operations. Our objective is to characterize a set of evolution matrices
that possess a specific property: they increase the conditional uncertainty associated with
system X when provided access to system Y . To embark on this journey, consider three clas-
sical systems: X, Y , and Y ′ , each with dimensions m, n, and n′ , respectively. Additionally,
′
consider two probability vectors pXY ∈ Prob(mn) and qXY ∈ Prob(mn′ ). We say that pXY
′ ′
conditionally majorizes qXY and denote it as pXY ≻X qXY when there exists a conditional
mixing operation (to be define shortly), denoted as M ∈ STOCH(mn′ , mn), such that:
′
qXY = M pXY . (4.198)
The challenge lies in crafting a meaningful definition for M that aligns with the concept
of “conditional mixing.” In this context, we demonstrate that there exist three distinct
approaches to defining M , mirroring the three methodologies introduced in Section 4.1.2.
Remarkably, all of these approaches converge to the same definition of conditional mixing and
conditional majorization, thereby establishing a solid foundation for the notion of conditional
majorization.
This section is structured as follows: Initially, we introduce both the axiomatic and con-
structive approaches, showing that they both lead to the identical definition of a conditional
mixing operation. We then use this definition to establish conditional majorization and to
examine some of its key properties. Following that, we explore a useful characterization of
conditional majorization, paying special attention to cases in smaller dimensions. Lastly, in
the final subsection on this topic, we investigate the operational approach to conditional ma-
jorization and demonstrate its consistency with the axiomatic and constructive approaches.
uncertainty about system X when one has access to system Y . To address this, we introduce
a minimalistic causality assumption that accounts for the property that system X has no
causal effect on system Y ′ . In mathematical terms, this assumption implies that the compo-
nents of the stochastic matrix M = (µx′ y′ |xy ) satisfy the following equation for all x ∈ [m],
y ∈ [n], and y′ ∈ [n′]:
Σ_{x′∈[m]} µ_{x′y′|xy} = r_{y′|y} , (4.199)
where {ry′ |y } (with y ∈ [n], and y ′ ∈ [n′ ]) is some conditional probability distribution in-
dependent on x. We refer to this condition as non-signalling from X to Y ′ or in short
X ̸→ Y ′ -signalling (see (2.135) for a similar definition).
Matrices that satisfy the above non-signalling condition have a relatively simple form.
Specifically, for every y ∈ [n] and y′ ∈ [n′] let T^{(y,y′)} ∈ R_+^{m×m} be the matrix whose components are
t^{(y,y′)}_{x′|x} := µ_{x′y′|xy} / r_{y′|y}   ∀ x, x′ ∈ [m] . (4.200)
From (4.199) it then follows that T^{(y,y′)} is column stochastic; i.e., for every y ∈ [n] and
y′ ∈ [n′] we have T^{(y,y′)} ∈ STOCH(m, m). With these notations we can express M as (the
summations run over all x, x′ ∈ [m], y ∈ [n], and y′ ∈ [n′])
M = Σ_{x,x′,y,y′} µ_{x′y′|xy} |x′⟩⟨x| ⊗ |y′⟩⟨y| = Σ_{y,y′} r_{y′|y} Σ_{x,x′} t^{(y,y′)}_{x′|x} |x′⟩⟨x| ⊗ |y′⟩⟨y| , (4.201)
where we employed quantum notation by denoting by |x′⟩⟨x| (and similarly |y′⟩⟨y|) the
m × m rank-one matrix e_{x′} e_x^T, which has a one at the (x′, x)-position and zeros elsewhere.
We therefore conclude that M ∈ STOCH(mn′, mn) is X ̸→ Y′-signalling
if and only if there exist nn′ stochastic matrices T^{(y,y′)} ∈ STOCH(m, m) (with y ∈ [n] and
y′ ∈ [n′]), and another stochastic matrix R = (r_{y′|y}) ∈ STOCH(n′, n), such that
M = Σ_{y,y′} r_{y′|y} T^{(y,y′)} ⊗ |y′⟩⟨y| . (4.202)
In the following exercise you show that the above form of M represents a bipartite channel
that can be realized with one-way communication from Bob to Alice.
Exercise 4.6.1. Let M ∈ STOCH(mn′ , mn). Show that the following two statements are
equivalent:
1. M is X ̸→ Y ′ -signalling.
2. M can be realized with one-way communication from Bob to Alice. That is, M can be
expressed as (refer to Fig. 4.5):
M = Σ_{j∈[k]} T^{(j)} ⊗ R_j (4.203)
where k ∈ N, and for each j ∈ [k], T^{(j)} ∈ STOCH(m, m), R_j ∈ R_+^{n′×n}, and
R := Σ_{j∈[k]} R_j ∈ STOCH(n′, n).
1. Show that if M satisfies (4.199) then for every evolution matrix E ∈ STOCH(m, m)
the marginal channel N satisfies
N (E ⊗ In ) = N . (4.204)
This condition ensures that any operation E that Alice (system X) may choose to apply
to her system cannot be detected by Bob (system Y). Such a condition is also called
X ̸→ Y′ semi-causal. See Fig. 4.6 for an illustration of a semi-causal channel.
2. Show that if N satisfies (4.204) for all E ∈ STOCH(m, m) then M satisfies (4.199).
Figure 4.6: An illustration of a semi-causal classical bipartite channel M . The marginal channel
N equals N (E ⊗ In ) for any choice of E ∈ STOCH(m, m).
Note that the multiplication by row vector 1Tm in the expression above effectively functions
as the “tracing out” of system X.
Before we delve into characterizing the maps within CMO(mn′ , mn), let’s explore an
alternative approach, which we refer to as the constructive approach. We will demonstrate
that this approach ultimately yields the same set of conditional mixing operations.
obtained by Alice applying a mixing operation to her system (i.e., doubly stochastic map)
conditioned on information received from Bob (see Fig. 4.7). Mathematically,
M = Σ_{j∈[k]} D^{(j)} ⊗ R_j (4.207)
where j ∈ [k] is the information Bob sends to Alice after he processes his input y via
Rj = (ry′ j|y ). Upon receiving j Alice applies a mixing operation to her input x described by
the m × m doubly-stochastic matrix D(j) .
It is important to note that the expression in (4.207) is very similar to the one given in (4.203),
except that for any fixed j ∈ [k] the stochastic matrix T^{(j)} is replaced with the doubly
stochastic matrix D^{(j)}. Therefore, CDS channels are necessarily X ̸→ Y′ semi-causal. Moreover,
since each D^{(j)} is doubly stochastic we get that for any p^Y ∈ Prob(n)
M (u^X ⊗ p^Y) = Σ_{j∈[k]} D^{(j)} u^X ⊗ R_j p^Y = u^X ⊗ R p^Y , (4.208)
where the second equality follows from D^{(j)} u^X = u^X, and R = Σ_{j∈[k]} R_j. That is, M satisfies
the condition given in (4.205). We therefore
conclude that every CDS channel M is necessarily a CMO. In the next theorem we prove
that the converse also holds. Therefore, both the axiomatic and the constructive
approaches lead to the same set of conditionally mixing operations.
Exercise 4.6.3. Let M be as in (4.207). Show that without loss of generality we can assume
that the matrices D(j) are permutation matrices. Hint: Recall that every doubly-stochastic
matrix is a convex combination of permutation matrices.
Exercise 4.6.4. Show that for any doubly stochastic matrix D ∈ STOCH(m, m) and any
stochastic matrix R ∈ STOCH(n′, n) we have that D ⊗ R is CDS.
Proof. We already proved the inclusion CDS(mn′ , mn) ⊆ CMO(mn′ , mn) (see the discussion
below Definition 4.6.2). To prove the opposite inclusion, suppose M ∈ CMO(mn′ , mn). We
want to show that M ∈ CDS(mn′, mn). For this purpose we will use the form given in (4.202)
for X ̸→ Y′ semi-causal matrices, and show that each T^{(y,y′)} is an m × m doubly stochastic matrix.
Observe that if r_{y′|y} = 0 for some y′ ∈ [n′] and y ∈ [n], then replacing T^{(y,y′)} with the identity
matrix will not affect M, since r_{y′|y} = 0. Consequently, it suffices to demonstrate that T^{(y,y′)}
is doubly stochastic for those indices y and y′ for which r_{y′|y} ̸= 0.
Let {e^Y_y}_{y∈[n]} be the standard basis of R^n and consider the condition given in (4.205).
Fix y ∈ [n] and observe that from (4.202) we get
M (u^X ⊗ e^Y_y) = Σ_{y′∈[n′]} r_{y′|y} T^{(y,y′)} u^X ⊗ e^{Y′}_{y′} . (4.210)
On the other hand, since M is a conditionally mixing operation, the condition (4.205) implies that
M (u^X ⊗ e^Y_y) = u^X ⊗ q^{Y′}_y , (4.211)
for some vector q^{Y′}_y := Σ_{y′∈[n′]} q_{y′|y} e^{Y′}_{y′} in Prob(n′). Since {e^{Y′}_{y′}}_{y′∈[n′]} is an orthonormal basis
of R^{n′}, a comparison of (4.210) and (4.211) reveals that for all y ∈ [n] and y′ ∈ [n′]
r_{y′|y} T^{(y,y′)} u^X = q_{y′|y} u^X . (4.212)
Observe that since T^{(y,y′)} is column stochastic, T^{(y,y′)} u^X is a probability vector, so its dot
product with the vector 1_m equals one. Thus, by taking the dot product of both sides of
the equation above with 1_m we get r_{y′|y} = q_{y′|y} for all y ∈ [n] and y′ ∈ [n′]. We
therefore conclude that T^{(y,y′)} u^X = u^X (recall that we assume r_{y′|y} ̸= 0). Hence, T^{(y,y′)} is doubly
stochastic. This completes the proof.
In the proof above we showed that the matrices T^{(y,y′)} appearing in (4.202) are doubly
stochastic. We therefore conclude that M ∈ STOCH(mn′, mn) is a conditionally mixing
operation if and only if there exist a column stochastic matrix R ∈ STOCH(n′, n) and nn′
doubly stochastic matrices D^{(y,y′)} ∈ STOCH(m, m), with y ∈ [n] and y′ ∈ [n′], such that
M = Σ_{y∈[n]} Σ_{y′∈[n′]} r_{y′|y} D^{(y,y′)} ⊗ |y′⟩⟨y| . (4.213)
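The characterization (4.213) is easy to test numerically. The following sketch (illustrative only; building the doubly stochastic matrices as convex combinations of random permutation matrices is our choice, via Birkhoff's theorem) assembles a random conditionally mixing operation and verifies that it is column stochastic and that it maps u^X ⊗ p^Y to u^X ⊗ Rp^Y, as derived in (4.208):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, n_prime = 3, 2, 2

def random_doubly_stochastic(m, k=5):
    # convex combination of k random permutation matrices
    w = rng.dirichlet(np.ones(k))
    return sum(wi * np.eye(m)[rng.permutation(m)] for wi in w)

R = rng.dirichlet(np.ones(n_prime), size=n).T        # column-stochastic, shape (n', n)

M = np.zeros((m * n_prime, m * n))
for y in range(n):
    for yp in range(n_prime):
        D = random_doubly_stochastic(m)               # D^{(y,y')}
        E = np.zeros((n_prime, n)); E[yp, y] = 1.0    # |y'><y|
        M += R[yp, y] * np.kron(D, E)                 # X tensor Y ordering

print(np.allclose(M.sum(axis=0), 1.0))                # M is column stochastic

uX = np.ones(m) / m
pY = rng.dirichlet(np.ones(n))
print(np.allclose(M @ np.kron(uX, pY), np.kron(uX, R @ pY)))  # uniform on X is preserved
```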
Conditional Majorization
Definition 4.6.3. Let X, Y, and Y′ be three classical systems of dimensions m, n,
and n′, respectively. Further, let p^{XY} ∈ Prob(mn) and q^{XY′} ∈ Prob(mn′). We say
that p^{XY} conditionally majorizes q^{XY′} with respect to X, and write
p^{XY} ≻X q^{XY′} , (4.214)
if there exists M ∈ CMO(mn′, mn) such that q^{XY′} = M p^{XY}. We further write
p^{XY} ∼X q^{XY′} if both p^{XY} ≻X q^{XY′} and q^{XY′} ≻X p^{XY}.
Likewise, if we denote the components of q^{XY′} by {q_{xy′}}, we can represent q^{XY′} as
q^{XY′} = Σ_{y′∈[n′]} q^X_{y′} ⊗ e^{Y′}_{y′} , where q^X_{y′} := Σ_{x∈[m]} q_{xy′} e^X_x . (4.216)
Using these notations, the pre-order p^{XY} ≻X q^{XY′} can be expressed as a relationship between
the two sets of vectors {p^X_y}_{y∈[n]} and {q^X_{y′}}_{y′∈[n′]}. Observe further that all these vectors have
non-negative components and their sums are the marginal probability vectors:
p^X := Σ_{y∈[n]} p^X_y ∈ Prob(m) and q^X := Σ_{y′∈[n′]} q^X_{y′} ∈ Prob(m) . (4.217)
By definition, if p^{XY} ≻X q^{XY′} then there exists a matrix M of the form (4.213) such
that q^{XY′} = M p^{XY}. Using the notations above, this relation can be expressed as (see
Exercise 4.6.5)
q^X_{y′} = Σ_{y∈[n]} r_{y′|y} D^{(y,y′)} p^X_y   ∀ y′ ∈ [n′] , (4.218)
where R = (r_{y′|y}) ∈ STOCH(n′, n) and each D^{(y,y′)} is an m × m doubly stochastic matrix.
Exercise 4.6.5. Prove the relation (4.218) using the above forms of p^{XY} and q^{XY′}, and the
form (4.213) of M.
Exercise 4.6.6. Prove the relation (4.219). Hint: take in (4.218) Y′ = Y, and for each
y′, y ∈ [n] take r_{y′|y} = δ_{y′y} and D^{(y′,y)} = Π^{(y)}.
The relation (4.219) implies that without loss of generality we can assume that the
components of the vectors {pX X
y }y∈[n] and {qy ′ }y ′ ∈[n′ ] are arranged in non-increasing order.
We will therefore assume this order in the rest of this section.
Theorem 4.6.2. Let p^{XY} ∈ Prob(mn) and q^{XY′} ∈ Prob(mn′) with m := |X|,
n := |Y|, and n′ := |Y′|. Then, p^{XY} ≻X q^{XY′} if and only if there exists
R = (r_{y′|y}) ∈ STOCH(n′, n) such that
Σ_{y∈[n]} r_{y′|y} p^X_y ≻ q^X_{y′}   ∀ y′ ∈ [n′] , (4.220)
where {p^X_y}_{y∈[n]} and {q^X_{y′}}_{y′∈[n′]} are defined in (4.215) and (4.216), respectively.
Remarks:
1. We assume in the theorem above that for each y ∈ [n] we have p^X_y = p^{X↓}_y.
2. The vectors p^X_y and q^X_{y′} are not necessarily probability vectors since the sums of their
components, p_y := 1_m · p^X_y and q_{y′} := 1_m · q^X_{y′}, are in general smaller than one. Therefore,
the majorization relation in (4.220) implies in particular that
Σ_{y∈[n]} r_{y′|y} p_y = q_{y′}   ∀ y′ ∈ [n′] . (4.221)
where the inequality is entry-wise, and L is the m × m matrix defined in Exercise 4.1.5.
We will later demonstrate that the inequality (4.222) is instrumental in characterizing
conditional majorization as a semidefinite program.
Proof. Suppose p^{XY} ≻X q^{XY′} so that the relation (4.218) holds. Since each D^{(y,y′)} is doubly
stochastic we have p^X_y ≻ D^{(y,y′)} p^X_y. Multiplying both sides of this relation by r_{y′|y} and
summing over y ∈ [n] gives (see Exercise 4.1.6)
Σ_{y∈[n]} r_{y′|y} p^X_y ≻ Σ_{y∈[n]} r_{y′|y} D^{(y,y′)} p^X_y = q^X_{y′} , (4.223)
where the equality follows from (4.218).
Conversely, suppose (4.220) holds. Then, from Theorem 4.1.1, for every y′ ∈ [n′] there
exists a doubly stochastic matrix D^{(y′)} ∈ STOCH(m, m) such that
q^X_{y′} = D^{(y′)} Σ_{y∈[n]} r_{y′|y} p^X_y = Σ_{y∈[n]} r_{y′|y} D^{(y′)} p^X_y . (4.224)
Defining D^{(y,y′)} := D^{(y′)} we conclude that the above relation is a special case of the
relation (4.218). Hence, p^{XY} ≻X q^{XY′}. This completes the proof.
To get a better intuition about conditional majorization, we first consider the cases in
which one of the systems X, Y , and Y ′ , is trivial:
1. The Case |X| = 1. This is a trivial case in which there is no uncertainty about
system X. We therefore expect the pre-order to be trivial as well. Indeed, in this case
p^{XY} = p^Y ∈ Prob(n) and q^{XY′} = q^{Y′} ∈ Prob(n′), so the relation (4.220) becomes
q^{Y′} = R p^Y. Since for any p^Y ∈ Prob(n) and any q^{Y′} ∈ Prob(n′) there exists a
stochastic matrix R that satisfies q^{Y′} = R p^Y, we conclude that p^{XY} ∼X q^{XY′} for any
probability vectors p^{XY} and q^{XY′} with |X| = 1.
2. The Case |Y| = 1. In this case p^{XY} = p^X ∈ Prob(m), and the matrix R that
appears in Theorem 4.6.2 is a vector R = r := (r_1, . . . , r_{n′})^T ∈ Prob(n′). Therefore,
the relation (4.220) becomes r_{y′} p^X ≻ q^X_{y′} for all y′ ∈ [n′]. Moreover, denoting by
q_{y′} := 1_m · q^X_{y′} the sum of the components of q^X_{y′}, we get from (4.221) that r_{y′} = q_{y′} for
all y′ ∈ [n′]. We therefore conclude that for |Y| = 1,
p^X ≻X q^{XY′} ⟺ p^X ≻ q^X_{|y′}   ∀ y′ ∈ [n′] , (4.225)
where q^X_{|y′} := (1/q_{y′}) q^X_{y′} is the vector whose components are {q_{x|y′}}_{x∈[m]}.
3. The Case |Y′| = 1. In this case, q^{XY′} = q^X ∈ Prob(m), and since R in Theorem 4.6.2
has to be a 1 × n column stochastic matrix it must equal the row vector (1, . . . , 1).
Therefore, denoting by p^X := Σ_{y∈[n]} p^X_y we get from Theorem 4.6.2 that
p^{XY} ≻X q^X ⟺ p^X ≻ q^X . (4.226)
Exercise 4.6.7. Let p^{XY} ∈ Prob(mn) and q^{XY′} ∈ Prob(mn′). Show that if p^{XY} ≻X q^{XY′}
then p^X ≻ q^X.
2. Similarly, for any permutation/bijection π : [n] → [n] we get that (see Exercise 4.6.8)
p^{XY} ∼X Σ_{y∈[n]} p^X_{π(y)} ⊗ e^Y_y . (4.227)
p^{XY} ∼X (p^X_1 + p^X_2) ⊗ e^Y_1 + Σ_{y=3}^{n} p^X_y ⊗ e^Y_{y−1} (4.228)
Moreover, if there exists y ∈ [n−1] such that p1|y = p1|y+1 then, if necessary, we will exchange
the vectors pX X
y and py+1 so that p2|y ⩾ p2|y+1 . If the latter inequality is also an equality
we continue by induction until we get a k ∈ [m − 1] such that px|y = px|y+1 for all x ∈ [k]
but pk+1|y > pk+1|y+1 . Combining these observations with the exercise above we are ready to
define the standard form.
Standard Form
Let pXY ∈ Prob(mn) be as defined in (4.215). We say that pXY is given in the
standard form if the vectors {pX
y }y∈[n] satisfy the following three conditions:
1. For all y ∈ [n], p^X_y = p^{X↓}_y.
Exercise 4.6.10. Let p^{XY} ∈ Prob(mn) and q^{XY′} ∈ Prob(mn′) be two probability vectors given
in their standard form, and let L be the m × m matrix defined in Exercise 4.1.5. Use Theorem 4.6.2
to show that p^{XY} ≻X q^{XY′} if and only if there exists R ∈ STOCH(n′, n) such that
(L ⊗ R) p^{XY} ⩾ (L ⊗ I_{n′}) q^{XY′} , (4.230)
where the inequality is entrywise.
Based on the preceding discussion, particularly the three examples provided, we can
deduce that any pXY is, under conditional majorization, equivalent to its standard form.
Consequently, without loss of generality, we may always assume that pXY ∈ Prob(mn) is
presented in its standard form. We will now demonstrate that conditional majorization
between vectors in standard form indeed constitutes a partial order.
Theorem 4.6.3. Let p^{XY} ∈ Prob(mn) and q^{XY′} ∈ Prob(mn′) be two probability
vectors given in their standard form. Suppose further that p^{XY} ∼X q^{XY′}. Then,
p^{XY} = q^{XY′} (in particular, Y = Y′ and n = n′).
Proof. From Exercise 4.6.10, the relation p^{XY} ∼X q^{XY′} implies that there exist R ∈
STOCH(n′, n) and R′ ∈ STOCH(n, n′) such that
(L ⊗ R) p^{XY} ⩾ (L ⊗ I_{n′}) q^{XY′} and (L ⊗ R′) q^{XY′} ⩾ (L ⊗ I_n) p^{XY} . (4.231)
Denote S := R′R and S′ := RR′, and observe that the two relations above imply (see Exercise 4.6.11)
(L ⊗ S) p^{XY} ⩾ (L ⊗ I_n) p^{XY} and (L ⊗ S′) q^{XY′} ⩾ (L ⊗ I_{n′}) q^{XY′} . (4.232)
Denoting by s_{y|w} the (y, w)-component of the matrix S, we get that the first relation above is
equivalent to
Σ_{w∈[n]} s_{y|w} L p_w ⩾ L p_y   ∀ y ∈ [n] . (4.233)
On the other hand, observe that by taking the sum over y ∈ [n] on both sides of the equation
above we get an equality between the two sides. Therefore, all the n inequalities above must
be equalities! Multiplying both sides by the inverse of L gives
p_y = Σ_{w∈[n]} s_{y|w} p_w   ∀ y ∈ [n] . (4.234)
We now argue that sy|y = 1 for all y ∈ [n] (which is equivalent to sy|w = δyw and S = In ).
Otherwise, suppose by contradiction that there exists y ∈ [n] such that sy|y < 1. Without
loss of generality suppose that this y is n. We then get that
p_n = Σ_{w∈[n−1]} ( s_{n|w}/(1 − s_{n|n}) ) p_w . (4.236)
Denoting
t_{y|w} := s_{y|w} + s_{y|n} s_{n|w} / (1 − s_{n|n}) , (4.238)
we conclude that
p_y = Σ_{w∈[n−1]} t_{y|w} p_w   ∀ y ∈ [n − 1] . (4.239)
Observe that Σ_{y∈[n−1]} t_{y|w} = 1. Next, we rule out the case t_{y|w} = δ_{yw} for all y, w ∈ [n − 1].
Otherwise, this relation implies in particular that for all y, w ∈ [n − 1] with y ̸= w we
have sy|w = 0 and sy|n sn|w = 0. Now, recall that we assumed that sn|n < 1 so there exists
y ∈ [n − 1] such that sy|n > 0. For this choice of y ∈ [n − 1] the relation sy|n sn|w = 0 gives
s_{n|w} = 0 for all w ̸= y. Substituting this into (4.236) gives
p_n = ( s_{n|y}/(1 − s_{n|n}) ) p_y (4.240)
in contradiction with the third property of the standard form of the vector pXY . We therefore
conclude that there must exists y ∈ [n − 1] such that ty|y < 1. Observe that we started with
the relation (4.234) with the condition that there exists sy|y < 1 for some y ∈ [n], and we
reduced it to the relation (4.239) with the condition that there exists ty|y < 1 for some
y ∈ [n − 1]. Continuing by induction until we have only one term in the sum on the right-
hand side of (4.234) (or of (4.239)), we conclude that one of the vectors in {p^X_y}_{y∈[n]} is
proportional to another vector in the same set, in contradiction with the standard form of
p^{XY}. Therefore, the assumption that there exists y ∈ [n] such that s_{y|y} < 1 is incorrect,
and we conclude that S = I_n, or equivalently, R′R = I_n.
Moreover, following the same arguments as above we conclude that also S ′ := RR′ = In′ .
Combining this with R′ R = In we must have n′ = n and R′ = R−1 . However, the only
stochastic matrix whose inverse is also stochastic is a permutation matrix (that is, doubly
stochastic and orthogonal). We therefore conclude that the sets {p^X_y}_{y∈[n]} and {q^X_y}_{y∈[n]}
can only differ by a permutation; i.e., for all y ∈ [n], p^X_y = q^X_{π(y)} for some permutation
π : [n] → [n]. However, since {p^X_y}_{y∈[n]} and {q^X_y}_{y∈[n]} are ordered in the specific way given
in the second property of the standard form, we conclude that π(y) = y for all y ∈ [n]. This
completes the proof.
Exercise 4.6.11. Prove the relations in (4.232). Hint: Multiply both sides of the first
inequality in (4.231) by R′ and the second inequality by R.
Definition 4.6.4. A function f : ∪_{n,m∈N} Prob(mn) → R is said to be conditionally
Schur convex if for every p^{XY} ∈ Prob(mn) and q^{XY′} ∈ Prob(mn′)
p^{XY} ≻X q^{XY′} ⇒ f(p^{XY}) ⩾ f(q^{XY′}) . (4.241)
Observe that conditionally Schur convex functions reduce to Schur convex functions
when restricted to Prob(m) (i.e. n = 1). Conversely, in the theorem below we show that
every convex symmetric function on the set of probability vectors can be extended to a
conditionally Schur convex function. Recall that in Subsection 4.1.3 we established that such
symmetric convex functions are in particular Schur convex.
In the theorem below, for every convex symmetric function f : Prob(m) → R we define a
function H_f as in (4.242), where p_{|y} := (1/p_y) p^X_y is the probability vector whose components
are {p_{x|y}}_{x∈[m]}.
Theorem 4.6.4. Let f : ∪_{m∈N} Prob(m) → R be a symmetric convex function.
Then, the function H_f, as defined in (4.242), is conditionally Schur concave.
Proof. Recall that p^{XY} ≻X q^{XY′} if and only if there exists a stochastic matrix R ∈ STOCH(n′, n)
such that (4.220) holds. Rewriting (4.220) with p^X_y = p_y p_{|y} and q^X_{y′} = q_{y′} q_{|y′} we get that
Σ_{y∈[n]} ( r_{y′|y} p_y / q_{y′} ) p_{|y} ≻ q_{|y′}   ∀ y′ ∈ [n′] . (4.243)
where y ∈ [n], y′ ∈ [n′], a_y := p_{1y}, b_{y′} := q_{1y′}, and p_y := p_{1y} + p_{2y} and q_{y′} := q_{1y′} + q_{2y′} are
the sums of the components of p^X_y and q^X_{y′}, respectively. With these notations we get for all
y ∈ [n] and y′ ∈ [n′]
L p^X_y = (a_y, p_y)^T and L q^X_{y′} = (b_{y′}, q_{y′})^T , (4.248)
where L := [1 0; 1 1]. Moreover, since we assume that p^{XY} and q^{XY′} are given in their
standard form, we have (see (4.229))
a_1/p_1 ⩾ · · · ⩾ a_n/p_n and b_1/q_1 ⩾ · · · ⩾ b_{n′}/q_{n′} . (4.249)
From Theorem 4.6.2 we have that p^{XY} ≻X q^{XY′} if and only if there exists a stochastic
matrix R ∈ STOCH(n′, n) such that for all y′ ∈ [n′] we have
b_{y′} ⩽ Σ_{y∈[n]} r_{y′|y} a_y and q_{y′} ⩽ Σ_{y∈[n]} r_{y′|y} p_y . (4.250)
Observe that since Σ_{y′∈[n′]} q_{y′} = Σ_{y∈[n]} p_y = 1, the second inequality above must hold with
equality (in fact, we already know this from (4.221)).
Let a, b, p, q be the vectors whose components are respectively {a_y}_{y∈[n]}, {b_{y′}}_{y′∈[n′]},
{p_y}_{y∈[n]}, and {q_{y′}}_{y′∈[n′]}. To streamline our analysis, we omit the superscripts Y and
Y′ when referring to the vectors a, b, p, q. It is important for the reader to bear in mind
that the vectors a := a^Y and p := p^Y correspond to a system of dimension n (referred to
as system Y), while b := b^{Y′} and q := q^{Y′} pertain to a system of dimension n′ (referred to
as system Y′). With these notations we get that p^{XY} ≻X q^{XY′} if and only if there exists
R ∈ STOCH(n′, n) such that
Ra ⩾ b and Rp = q . (4.251)
This relation is closely related to relative majorization; however, note that the vectors a
and b are not probability vectors since their components do not, in general, sum to one. We
therefore say in this case that the pair (a, p) relatively submajorizes the pair (b, q). Observe
also that if a := ∥a∥₁ equals b := ∥b∥₁ then the inequality Ra ⩾ b can be replaced with
Ra = b, so that (4.251) becomes equivalent to relative majorization; i.e.
( (1/a) a , p ) ≻ ( (1/b) b , q ) . (4.252)
We therefore assume from now on that a > b (the case a < b is not possible since Ra ⩾ b and R is
column stochastic).
Even though the components of a do not sum to one (in general), we can still define its
testing region as (see (4.139))
T(a, p) := { (a · t, p · t) : t ∈ [0, 1]^n } . (4.253)
By taking t = 1_n we get the point (a, 1) ∈ T(a, p), as opposed to the point (1, 1) that one
would get if a were a probability vector. In fact, the testing region of the pair of probability
vectors ((1/a) a, p) is almost identical to that of (a, p) except for a rescaling of the x-axis by a
factor of a; that is, (r, s) ∈ T((1/a) a, p) if and only if (ar, s) ∈ T(a, p). Therefore, if (r, s) is
an extreme point of T((1/a) a, p) then (ar, s) is an extreme point of T(a, p). That is, there are
n + 1 extreme points on the lower Lorenz curve of T(a, p), given by (0, 0) and the n points
{(µ_ℓ, ν_ℓ)}_{ℓ∈[n]}, where
µ_ℓ := Σ_{x∈[ℓ]} a_x and ν_ℓ := Σ_{x∈[ℓ]} p_x . (4.254)
Recall that since we assume that pXY is given in its standard form, the components of a
and p satisfy (4.249). See the red line in Fig. 4.8 for an example of the lower Lorenz curve
of the pair (a, p).
In Fig. 4.8 we also depict another (purple) Lorenz curve, taken to be identical
to the Lorenz curve of (a, p) when the x-coordinate is no greater than b, and a vertical line
where the x-coordinate equals b. This purple curve is the Lorenz curve of some pair of vectors
(ã, p̃) for which ∥ã∥₁ = b and p̃ is a probability vector. Moreover, the Lorenz curve of
(ã, p̃) has the property that any other Lorenz curve LC(b, q) (see the blue curve in Fig. 4.8)
that is nowhere below the Lorenz curve of (a, p) is also nowhere below the Lorenz curve of
(ã, p̃). We will see shortly that this implies that the relation p^{XY} ≻X q^{XY′} is equivalent to
(ã, p̃) ≻ (b, q).
The vectors ã and p̃ that correspond to the purple Lorenz curve of Fig. 4.8 can be
expressed as follows. Let k ∈ [n − 1] be the integer satisfying µ_k ⩽ b < µ_{k+1}.
Such an index k exists since we assume that a > b. The line connecting the vertices v_k :=
(µ_k, ν_k) and v_{k+1} := (µ_{k+1}, ν_{k+1}) contains the point (b, λ) (see Fig. 4.8), where
λ := (p_{k+1}/a_{k+1}) (b − µ_k) + ν_k . (4.256)
Figure 4.8: Submajorization. Given the Lorenz curve LC(a, p) (red), we can always construct
another Lorenz curve, LC(ã, p̃) (purple), with ∥ã∥₁ = b, such that any Lorenz curve LC(b, q)
(blue) that is nowhere below LC(a, p) (red) is also nowhere below LC(ã, p̃) (purple). In this
example n = 5 and k = 3.
The point (b, λ) is therefore a vertex of the purple curve in Fig. 4.8. With these notations, ã
and p̃ are given by
Theorem 4.6.5. Using the same notations as above, for |X| = 2 the following
statements are equivalent:
1. p^{XY} ≻X q^{XY′}.
2. The Lorenz curve LC(b, q) is nowhere below the Lorenz curve LC(a, p).
3. (ã, p̃) ≻ (b, q).
Proof. The case a = b is relatively simple and is left as an exercise. We therefore prove the
theorem for the case a > b. Suppose p^{XY} ≻X q^{XY′}. Then, there exists R ∈ STOCH(n′, n)
such that (4.251) holds. Let (b · t′, q · t′) ∈ LC(b, q) be a point on the lower Lorenz
curve of the testing region of (b, q), where t′ is some vector in [0, 1]^{n′}. Then, the vector
t := R^T t′ ∈ [0, 1]^n has the property that
a · t = a^T R^T t′ ⩾ b · t′ and p · t = p^T R^T t′ = q · t′ , (4.258)
where we used the relations in (4.251). The above relation implies that for any point (b · t′, q ·
t′) in LC(b, q), there exists a point (a · t, p · t) in the testing region of (a, p) that is located
to its right (i.e. a point with the same y-coordinate and no smaller x-coordinate). Since the
lower Lorenz curve is convex, this means that LC(b, q) is nowhere below LC(a, p). That
is, the second statement of the theorem holds.
To prove that the second statement implies the third statement of the theorem, observe
that by construction of LC(ã, p̃) (see Fig. 4.8), the curve LC(b, q) is nowhere below the
curve LC(ã, p̃). Since ∥ã∥1 = b we get from Theorem 4.3.4 that (ã, p̃) ≻ (b, q).
It is therefore left to prove that the third statement in the theorem implies the first one.
Since we assume that (ã, p̃) ≻ (b, q) there exists a matrix S ∈ STOCH(n′ , k + 2) such that
Sã = b and S p̃ = q. On the other hand, the matrix
T := [ I_k , 0_{k,n−k} ; 0_{2,k} , M ] ∈ STOCH(k + 2, n) , where M := [ (λ − ν_k)/p_{k+1} , 0 , · · · , 0 ; (ν_{k+1} − λ)/p_{k+1} , 1 , · · · , 1 ] (4.259)
The Case |Y | = 2
In this case, pXY has the form
pXY = p1 ⊗ e1 + p2 ⊗ e2 , (4.261)
aw p1 + bw p2 ≻ qw ∀ w ∈ [n′ ] . (4.262)
q w = aw p 1 + b w p 2 ∀ w ∈ [n′ ] , (4.263)
qw tw = aw p1 . (4.265)
Using the fact that a ∈ Prob(n′), by summing both sides of the equation above over w ∈ [n′]
we get that t must satisfy
Σ_{w∈[n′]} q_w t_w = p_1 . (4.266)
We therefore conclude that p^{XY} ≻X q^{XY′} if and only if there exists t ∈ [0, 1]^{n′} that satisfies (4.266)
and, for all w ∈ [n′], t_w p_{|1} + (1 − t_w) p_{|2} ≻ q_{|w}.
In order to determine when such a t ∈ [0, 1]^{n′} exists, we assume that p^{XY} is given in its
standard form so that both p_{|1} = p↓_{|1} and p_{|2} = p↓_{|2}. With this property, the majorization
relation given in (4.264) is equivalent to
Our next goal is to characterize the constraints that the equation above imposes on t_w. For this
purpose, we denote by I+ , I0 , and I− the set of all indices k ∈ [m] for which ∥p|1 ∥(k) −∥p|2 ∥(k)
is positive, zero, and negative, respectively. With these notations, if k ∈ I0 then (4.267)
takes the form ∥p|2 ∥(k) ⩾ ∥q|w ∥(k) . On the other hand, if k ∈ I+ we can isolate tw to get
Therefore, since this inequality holds for all k ∈ I+ and since tw ⩾ 0 we conclude that
tw ⩾ µw for all w ∈ [n′ ], where
Similarly, by isolating t_w in (4.267) for the cases k ∈ I₋ we get t_w ⩽ ν_w for all w ∈ [n′],
where
ν_w := min{ 1 , min_{k∈I₋} ( ∥p_{|2}∥_(k) − ∥q_{|w}∥_(k) ) / ( ∥p_{|2}∥_(k) − ∥p_{|1}∥_(k) ) } . (4.270)
Theorem 4.6.6. Using the same notations as above, for the case |Y| = 2 we have
p^{XY} ≻X q^{XY′} if and only if the following conditions hold:
Exercise 4.6.14. Simplify the conditions in Theorem 4.6.6 for the case that I+ = [m].
Exercise 4.6.15. Consider the case |X| = |Y | = |Y ′ | = 2 and let pXY , qXY ∈ Prob(4)
be such that pY = qY = u(2) . Simplify the necessary and sufficient conditions given in the
theorem above for this case.
where L is the m × m matrix defined in Exercise 4.1.5. Denote the rows of R by r_1, . . . , r_{n′} ∈ R^n_+,
i.e., r_{y′} := (r_{y′|1}, . . . , r_{y′|n})^T, and denote by r ∈ R^{nn′}_+ the vector obtained by stacking these
rows on top of each other,
r := (r_1 ; r_2 ; . . . ; r_{n′}) . (4.272)
Note that in this vector form of R, the entry-wise non-negativity of R is equivalent to r ⩾ 0,
while the remaining constraints become linear inequalities in r. Define (here P := [p^X_1 · · · p^X_n]
denotes the m × n matrix whose columns are the vectors p^X_y)
M := [ −LP  0    · · ·  0
        0   −LP  · · ·  0
        ⋮    ⋮    ⋱     ⋮
        0    0   · · ·  −LP
       I_n  I_n  · · ·  I_n ]  ∈ R^{(mn′+n)×nn′}   and   b := ( −Lq^X_1 ; −Lq^X_2 ; . . . ; −Lq^X_{n′} ; 1_n ) ∈ R^{mn′+n} . (4.273)
It is then straightforward to check that the inequalities given in (4.271) can be expressed
compactly as M r ⩽ b. The only other constraint is that r ∈ R^{nn′}_+. The problem of determining
whether such a vector r exists is known as a linear-programming feasibility problem, and
there are several algorithms that can be used to solve it.
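This feasibility problem can be handed directly to an off-the-shelf LP solver. The sketch below (assumptions: both p^{XY} and q^{XY′} are given in standard form, P is the m × n matrix whose columns are the vectors p^X_y, and SciPy's HiGHS backend is used; the function name is ours) checks whether M r ⩽ b, r ⩾ 0 is feasible, with M and b as in (4.273):

```python
import numpy as np
from scipy.optimize import linprog

def conditionally_majorizes(PX, QX):
    """PX: m x n array with columns p^X_y; QX: m x n' array with columns q^X_{y'}.
    Each column is assumed to be sorted non-increasingly (standard form)."""
    m, n = PX.shape
    _, n_prime = QX.shape
    L = np.tril(np.ones((m, m)))                         # partial-sum matrix of Exercise 4.1.5
    top = np.kron(np.eye(n_prime), -L @ PX)              # block-diagonal part: one -LP block per y'
    bottom = np.kron(np.ones((1, n_prime)), np.eye(n))   # [I_n ... I_n]: column sums of R at most 1
    M = np.vstack([top, bottom])
    b = np.concatenate([(-L @ QX).T.ravel(), np.ones(n)])
    res = linprog(np.zeros(n * n_prime), A_ub=M, b_ub=b,
                  bounds=(0, None), method="highs")
    return res.status == 0                                # 0 = a feasible r was found

# tiny sanity check: with |Y| = |Y'| = 1 the test reduces to ordinary majorization
PX = np.array([[0.5], [0.3], [0.2]])
QX = np.array([[0.4], [0.4], [0.2]])
print(conditionally_majorizes(PX, QX))                    # True
```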
Exercise 4.6.16 (Farkas Lemma). Show that there exists r ∈ R^{nn′}_+ satisfying M r ⩽ b if and
only if for every v ∈ R^{mn′+n}_+ that satisfies v^T M ⩾ 0 (entrywise) we have v · b ⩾ 0. Hint:
For the harder direction, use the hyperplane separation theorem (Theorem A.2).
Dual Characterization
In the discussion above we saw that the condition p^{XY} ≻X q^{XY′} is equivalent to the existence
of a vector r ∈ R^{nn′}_+ such that M r ⩽ b. Moreover, from the exercise above it follows that
such an r exists if and only if for every v ∈ R^{mn′+n}_+ that satisfies v^T M ⩾ 0 we have v · b ⩾ 0.
We now express this latter condition in terms of sub-linear functionals.
For this purpose, we express v as
v := (v_1 ; . . . ; v_{n′} ; t) , (4.274)
where v_1, . . . , v_{n′} ∈ R^m_+ and t ∈ R^n_+. From the definition of M in (4.273) we get that the
condition v^T M ⩾ 0 can be expressed as
Similarly, from the definition of b in (4.273) we get that the condition v · b ⩾ 0 is equivalent
to
Σ_{y∈[n]} t_y ⩾ Σ_{y′∈[n′]} v_{y′}^T L q^X_{y′} . (4.277)
where we took ty in (4.277) to be equal to its smallest possible value as given in (4.276).
Finally, for each w ∈ [n′] let s_w := L^T v_w and observe that the inequality above can be
written as
Σ_{y∈[n]} max_{w∈[n′]} s_w · p^X_y ⩾ Σ_{y′∈[n′]} s_{y′} · q^X_{y′} . (4.279)
Note that since each v_w ∈ R^m_+ we get that s_w ∈ R^m_+ and s_w = s_w↓ (see Exercise 4.1.5). Finally,
by dividing both sides of the inequality above by a sufficiently large number and absorbing
it into each s_w, we can assume without loss of generality that the matrix S := [s_1 · · · s_{n′}] ∈
STOCH⩽(m, n′) is sub-stochastic (i.e., the components of each column sum to a number
smaller than or equal to one). We therefore arrive at the following theorem.
Theorem 4.6.7. Let p^{XY} ∈ Prob(mn) and q^{XY′} ∈ Prob(mn′) be given in their
standard form. Then, p^{XY} ≻X q^{XY′} if and only if for every sub-stochastic matrix
S := [s_1 · · · s_{n′}] ∈ STOCH⩽(m, n′), whose columns satisfy s_w = s_w↓ for all w ∈ [n′], we
have
Σ_{y∈[n]} max_{w∈[n′]} s_w · p^X_y ⩾ Σ_{y′∈[n′]} s_{y′} · q^X_{y′} . (4.280)
Exercise 4.6.17. Consider the theorem above without the assumption that s_w = s_w↓ for all
w ∈ [n′] and without the assumption that p^{XY} and q^{XY′} are given in their standard form.
Show that p^{XY} ≻X q^{XY′} if and only if for every sub-stochastic matrix S := [s_1 · · · s_{n′}] ∈
STOCH⩽(m, n′)
Σ_{y∈[n]} max_{w∈[n′]} s_w↓ · p↓_y ⩾ Σ_{y′∈[n′]} s_{y′}↓ · q↓_{y′} , (4.281)
rationale for the definition of conditional majorization as introduced in the previous subsec-
tions.
In the beginning of this chapter we introduced the concept of majorization using games
of chance. We saw that two probability vectors, p, q ∈ Prob(n) satisfy p ≻ q if and only
if in all games of chance, a player has better odds to win the game with the p-dice rather
than with the q-dice. Similarly, we will see that our definition of conditional majorization as
given in Definition 4.6.3 can be characterized with games of chance that involve a correlated
source.
We can think of a correlated source XY as two dice that are glued to each other.
Rolling the two dice results in an outcome x for system X and a correlated outcome y
for system Y. As before, we denote by p^{XY} ∈ Prob(mn) the probability matrix whose
(x, y)-entry, p_{xy}, represents the probability that X = x and Y = y. It will be convenient
to denote by p_{x|y} := p_{xy}/p_y the conditional probability that X = x given that Y = y, where
p_y := Σ_{x∈[m]} p_{xy} for any y ∈ [n].
Consider now a gambling game with two such correlated dice, in which a player, say
Alice, has to provide k ⩽ m numbers as her guesses for the value of X. If Alice has access to
the value y of Y, then she will choose the k numbers that have the largest probability of occurring
relative to the conditional probability {p_{x|y}}_{x∈[m]}. Therefore, the maximal probability of
winning such a k-gambling game is given by
Σ_{y∈[n]} p_y Σ_{x∈[k]} p↓_{x|y} . (4.282)
That is, Alice chooses the k numbers that have the largest probability of occurring after she
learns the value Y = y, which occurs with probability p_y.
The example provided earlier is not the only kind of gambling game that Alice can engage
in with a correlated source, like the two-dice system. More expansively, we can envisage a
game where the host randomly determines the value of k according to a certain distribution.
In line with our aim to explore the widest range of scenarios in a gambling game with a
correlated source, we allow the player a degree of control in choosing which k-gambling
game will be played. This control is exercised through the player selecting a number w ∈ [ℓ]
and communicating it to the game host. Subsequently, the host decides the value of k
based on a distribution T := (t_{k|w}) ∈ R^{m×ℓ}_+, a detail known to the player. This distribution
adheres to the conditions t_{k|w} ⩾ 0 and Σ_{k∈[m]} t_{k|w} ⩽ 1 for all w ∈ [ℓ]. Notably, we
also accommodate the scenario where the set {t_{k|w}}_{k∈[m]} does not sum to one, reflecting the
possibility of no value of k occurring, in which case the player loses the game from the onset.
The procedural steps of such a T -gambling game are illustrated in Fig. 4.9.
Note that the set encompassing all T-gambling games includes all k-gambling games as
well. This is evident when we consider T = (t_{k|w}) with t_{k|w} = δ_{k k₀}, where k₀ is a specific
integer within [m]. In this scenario, the game essentially becomes a k₀-gambling game,
meaning the host selects k = k₀ regardless of w. In another example, where t_{k|w} = 1/m, the
host picks k from a uniform distribution, also independently of w.
Generally, for a given Y = y and a chosen w ∈ [ℓ], the optimal chance Alice has of winning is Σ_{k∈[m]} t_{k|w} Σ_{x∈[k]} p↓_{x|y}.
Figure 4.9: A T -gambling game with correlated source pXY . Upon learning the value of Y , the
player provides the host a number w. Then, the host chooses k (at random) according to the
distribution {tk|w }k∈[m] . After that, the player provides her k guesses with the highest probability
to occur.
Consequently, for each Y = y, Alice will select the number w that maximizes this probability.
Therefore, the maximum likelihood of winning a T -gambling game, as outlined above, is given
by:
Pr_T(p^{XY}) = Σ_{y∈[n]} p_y max_{w∈[ℓ]} Σ_{k∈[m]} t_{k|w} Σ_{x∈[k]} p↓_{x|y} . (4.284)
The expression above for the winning probability, sometimes referred to as the reward
function in game theory, can be simplified. Consider the following change in the order of
summation:
Σ_{k∈[m]} Σ_{x∈[k]} t_{k|w} p↓_{x|y} = Σ_{x∈[m]} Σ_{k=x}^{m} t_{k|w} p↓_{x|y} . (4.287)
Let us introduce the matrix S = (s_{xw}) ∈ R^{m×ℓ}_+, whose coefficients are defined by
s_{xw} := Σ_{k=x}^{m} t_{k|w} . (4.288)
It is important to note that the columns of S are in non-increasing order; that is, for every
w ∈ [ℓ],
1 ⩾ s_{1w} ⩾ s_{2w} ⩾ · · · ⩾ s_{mw} . (4.289)
With this notation, the probability of winning can be expressed as
Pr_T(p^{XY}) = Σ_{y∈[n]} p_y max_{w∈[ℓ]} Σ_{x∈[m]} s_{xw} p↓_{x|y} . (4.290)
Equivalently, since p_y p↓_{x|y} is the x-th largest entry of the y-th column of the probability matrix, we can write
Pr_T(p^{XY}) = Σ_{y∈[n]} max_{w∈[ℓ]} s_w · p↓_y , (4.291)
where p_y ∈ R^m_+ is the vector with components {p_{xy}}_{x∈[m]}. Observe that
the formula above for calculating the winning probability coincides with the left-hand side
of (4.280). This observation provides our initial clue about the connection between games
of chance and conditional majorization. Another key insight is that conditional mixing
operations cannot increase the maximal probability of winning the game, as the following
lemma demonstrates.
Lemma 4.6.1. Let T ∈ STOCH⩽(m, ℓ), p^{XY} ∈ Prob(mn), and q^{XY′} ∈ Prob(mn′).
If p^{XY} ≻X q^{XY′} then the maximal probability of winning a T-gambling game satisfies
Pr_T(p^{XY}) ⩾ Pr_T(q^{XY′}) . (4.292)
Proof. Let S be the m × ℓ matrix whose components are defined in (4.288). Due to (4.289), the
columns of S = [s_1 · · · s_ℓ] satisfy s_w↓ = s_w for all w ∈ [ℓ]. According to (4.291), the function
Pr_T(p^{XY}) has the form (4.242) with f : Prob(m) → R being the sublinear functional
f(p) := max_{w∈[ℓ]} s_w · p↓   ∀ p ∈ Prob(m) . (4.293)
Since the function above is convex and symmetric (under permutations), we get from Theorem 4.6.4
that Pr_T(p^{XY}) is conditionally Schur concave and in particular Pr_T(p^{XY}) ⩾
Pr_T(q^{XY′}).
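The reward function (4.284)/(4.290) is straightforward to evaluate numerically. The sketch below (a hypothetical helper, not from the text; it assumes the columns of T are sub-normalized distributions over k) computes Pr_T(p^{XY}) for a given joint probability matrix, and can be combined with the CMO construction sketched earlier to test Lemma 4.6.1 on random instances:

```python
import numpy as np

def gambling_value(PXY, T):
    """PXY: m x n array of joint probabilities p_{xy}; T: m x l array whose w-th column
    is the (sub-normalized) distribution {t_{k|w}}_{k in [m]}.  Returns Pr_T(p^{XY})."""
    m, n = PXY.shape
    # s_{xw} = sum_{k >= x} t_{k|w}, cf. (4.288)
    S = np.flip(np.cumsum(np.flip(T, axis=0), axis=0), axis=0)
    total = 0.0
    for y in range(n):
        p_y_sorted = np.sort(PXY[:, y])[::-1]   # sorted joint column, cf. (4.291)
        total += np.max(S.T @ p_y_sorted)       # best announcement w for this value of y
    return total

# example: the 1-gambling game (guess the single most likely x) corresponds to t_{1|1} = 1,
# and the value is then sum_y max_x p_{xy}
PXY = np.array([[0.3, 0.1],
                [0.1, 0.3],
                [0.1, 0.1]])
T = np.zeros((3, 1)); T[0, 0] = 1.0
print(gambling_value(PXY, T))                   # 0.6
```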
The subsequent exercise establishes a one-to-one correspondence (bijection) between the
set of all m × ℓ T -matrices and all m × ℓ S-matrices.
Exercise 4.6.18. Use (4.288) to find an m × m matrix U such that S = U T. Show that U
is invertible by computing its inverse, and use that to show that for any matrix S ∈ R^{m×ℓ}_+
whose components satisfy (4.289), the matrix U⁻¹S has non-negative entries.
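For Exercise 4.6.18, a natural candidate (our guess at the intended solution, stated here only as an illustration) is the upper-triangular all-ones matrix, whose inverse is bidiagonal; the following sketch verifies the relevant claims numerically:

```python
import numpy as np

m, l = 4, 3
U = np.triu(np.ones((m, m)))                     # u_{xk} = 1 for k >= x, so (U T)_{xw} = s_{xw}
U_inv = np.eye(m) - np.diag(np.ones(m - 1), 1)   # +1 on the diagonal, -1 on the superdiagonal

rng = np.random.default_rng(1)
T = rng.dirichlet(np.ones(m), size=l).T          # random column-stochastic T (m x l)
S = U @ T

print(np.allclose(U @ U_inv, np.eye(m)))         # U is invertible
print(np.all(np.diff(S, axis=0) <= 1e-12))       # columns of S are non-increasing, cf. (4.289)
print(np.all(U_inv @ S >= -1e-12))               # recovering T = U^{-1} S gives non-negative entries
```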
Remark. We emphasize that the theorem above states that p^{XY} ≻X q^{XY′} if and only if
with the p^{XY}-dice pair Alice has better odds of winning all T-gambling games than with the
q^{XY′}-dice pair. Moreover, observe that instead of considering T-gambling games with T ∈
STOCH⩽(m, ℓ) over all ℓ ∈ N, it is sufficient to consider ℓ = n′. That is, the dimensions of
T are completely determined by X and Y′.
Proof. Due to Lemma 4.6.1, it is sufficient to prove that (4.294) implies p^{XY} ≻X q^{XY′}. Let
S := [s_1 · · · s_{n′}] ∈ STOCH⩽(m, n′) be a sub-stochastic matrix whose columns satisfy s_w = s_w↓
for all w ∈ [n′]. From Exercise 4.6.18 it follows that there exists a sub-stochastic matrix
T ∈ STOCH⩽(m, n′) that satisfies the relation (4.288). Therefore,
Σ_{y∈[n]} max_{w∈[n′]} s_w · p_y = Pr_T(p^{XY}) ⩾ Pr_T(q^{XY′}) = Σ_{y′∈[n′]} max_{w∈[n′]} s_w · q_{y′} ⩾ Σ_{y′∈[n′]} s_{y′} · q_{y′} , (4.295)
where the first inequality follows from (4.294). Since the above inequality holds for all
S := [s_1 · · · s_{n′}] ∈ STOCH⩽(m, n′) whose columns satisfy s_w = s_w↓ for all w ∈ [n′], we conclude
from Theorem 4.6.7 that p^{XY} ≻X q^{XY′}. This completes the proof.
This chapter explores methods to quantify the distinguishability between entities such as
probability distributions and quantum states. Unlike generic vectors, mathematical objects
like probability vectors and quantum states embody information about physical systems.
Consequently, their distinguishability is typically measured using functions attuned to this
inherent information. Consider this example: Alice possesses a system in her laboratory
that is either in state ρ (for instance, an electron with its spin oriented in the z-direction)
or in state σ (such as the same electron with spin in the x-direction). Alice can attempt to
discern the state of her system (whether it is ρ or σ) by performing a quantum measurement
on it. The underlying principle is that the greater the distinguishability between ρ and σ,
the easier (or more likely) it is for Alice to accurately identify which of the two states her
system is in.
In any task involving distinguishability, such as the one mentioned earlier, a key observa-
tion is that sending a system (like the electron in Alice’s lab) through a quantum communi-
cation channel does not enhance Alice’s ability to differentiate between two states, ρ and σ.
This implies that if E ∈ CPTP(A → B) represents a quantum channel, the states E(ρ) and
E(σ) that result from this channel are less distinguishable than the original states ρ and σ
(this concept is visually illustrated in Fig. 5.1). In essence, any measure that quantifies the
distinguishability between two quantum states ρ and σ must decrease (or at most stay the
same) under any quantum process that transforms the pair (ρ, σ) into (E(ρ), E(σ)). Func-
tions that adhere to this principle are known as quantum divergences. Their characteristic
of reducing in value under such transformations is often referred to as the data processing
inequality (DPI).
Quantum divergence extends the concept of divergences from classical to quantum realms.
In a classical context, divergences are functions that behave monotonically under transfor-
mations that map a pair of probability vectors (p, q) to (Ep, Eq), with E being a column
stochastic matrix. As a result, many metrics in Rn , like the Euclidean distance, do not serve
well for quantifying distinguishability between two probability vectors. It’s also notewor-
thy that divergences are functions that behave monotonically under relative majorization.
Therefore, the tools developed in Chapter 4 will be very useful in this context as well.
Classical Divergence
Definition 5.1.1. The function D, as defined in (5.1), is termed a divergence
provided it fulfills these two conditions:
1. Data processing inequality (DPI): D(p∥q) ⩾ D(Ep∥Eq) for all p, q ∈ Prob(n), all m, n ∈ N, and all E ∈ STOCH(m, n).
2. Normalization: D(1∥1) = 0.
Note that for the trivial dimension n = 1, Prob(n) contains only the number one. In
this dimension, we require the divergence to be zero. Functions as above that satisfy the
DPI but with D(1∥1) ̸= 0 will be called unnormalized divergences. Moreover, observe that
the DPI property of a divergence D can also be viewed as monotonicity under relative
majorization. That is, we can state the first property above as follows: For all p, q ∈ Prob(n)
and p′, q′ ∈ Prob(m) such that (p, q) ≻ (p′, q′) we have D(p∥q) ⩾ D(p′∥q′).
Since we assume t ⩾ 1 we have Dt (p∥p) = 0 for all p ∈ Prob(n). To show that Dt satisfies
the DPI we can use the relation (r)₊ = (|r| + r)/2 for all r ∈ R to express Dt as
Dt(p∥q) = (1/2) ( ∥p − tq∥₁ + 1 − t ) . (5.9)
Consequently, the property outlined in (2.8) implies that {Dt }t⩾1 is a family of divergences.
Moreover, from Theorem 4.3.4 we learn that this family of classical divergences can be used
to characterize relative majorization; i.e., for all p, q ∈ Prob(n) and p′ , q′ ∈ Prob(n′ ) we
have (see Exercise 5.1.1)
(p, q) ≻ (p′ , q′ ) ⇐⇒ Dt (p∥q) ⩾ Dt (p′ ∥q′ ) ∀t⩾1. (5.10)
Exercise 5.1.1. Prove (5.10). Hint: Use Theorem 4.3.4 in conjunction with Corollary 4.3.1.
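The divergences Dt are easy to compute from (5.9), so the characterization (5.10) can be probed numerically. The sketch below samples only finitely many values of t, so it is an illustration rather than a proof, and the example vectors are hypothetical:

```python
import numpy as np

def D_t(p, q, t):
    # D_t(p||q) = (||p - t q||_1 + 1 - t) / 2, cf. (5.9)
    return 0.5 * (np.abs(np.asarray(p, float) - t * np.asarray(q, float)).sum() + 1.0 - t)

# with q = q' uniform, (p, q) relatively majorizes (pp, qq) iff p majorizes pp
p  = np.array([0.5, 0.3, 0.2]);  q  = np.ones(3) / 3
pp = np.array([0.4, 0.4, 0.2]);  qq = np.ones(3) / 3
ts = np.linspace(1.0, 10.0, 200)
print(all(D_t(p, q, t) >= D_t(pp, qq, t) - 1e-12 for t in ts))   # expected: True
```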
is finite. Therefore, for some convex functions as above, we can have Df (p∥q) = ∞ for some
choices of p, q ∈ Prob(n). From the theorem below it will follow that Df is a divergence and
therefore is always non-negative (even if f (x) is negative for some x ∈ (0, ∞)). Furthermore,
observe that the f -divergence can be expressed for any p, q ∈ Prob(n) as
Df(p∥q) = f̃(0) Σ_{x∉supp(q)} p_x + Σ_{x∈supp(q)} q_x f(p_x/q_x) , (5.14)
where we split the sum in (5.12) into a sum over all x ∈ [n] with qx = 0 and over all x ∈ [n]
with qx ̸= 0.
Exercise 5.1.3. Show that for every t ⩾ 1 the function Dt as defined in (5.8) is an f -
divergence.
Exercise 5.1.4. Show that the definition above for the f -divergence is equivalent to the fol-
lowing definition. Let f : (0, ∞) → R be a convex function and define f (0) := limε→0+ f (ε).
Then, the f -Divergence is defined as in (5.12) for p, q ∈ Prob(n) with q > 0, and for q ̸> 0,
D_f(p∥q) := lim_{ε→0⁺} D_f( p ∥ (1 − ε)q + εu ) ,    (5.15)
where u ∈ Prob(n) is the uniform probability vector.
Exercise 5.1.5. Let f : (0, ∞) → R be a convex function that satisfies f(1) = 0, and let f̃(r) := r f(1/r). Show that f̃ is also convex with f̃(1) = 0, and prove that D_{f̃}(q∥p) = D_f(p∥q) for all p, q ∈ Prob(n).
Proof. The normalization condition D_f(1∥1) = 0 is directly derived from the requirement that f(1) = 0. To illustrate the data processing inequality, consider m, n ∈ N, a stochastic matrix E ∈ STOCH(m, n), and probability vectors p, q ∈ Prob(n). Define r := Ep and s := Eq. For each x ∈ [m] and y ∈ [n], let e_{x|y} represent the (x, y)-component of E. With these definitions, the x-components of r and s are respectively r_x = ∑_{y∈[n]} e_{x|y} p_y and s_x = ∑_{y∈[n]} e_{x|y} q_y. Assuming initially that q > 0, we find that if s_x = 0, then e_{x|y} = 0 for all y ∈ [n]. Consequently, if s_x = 0, it follows that r_x must also be 0. This leads to the conclusion that
D_f(Ep∥Eq) = D_f(r∥s) = ∑_{x∈supp(s)} s_x f(r_x/s_x) ,    (5.17)
where the summation is limited to all x ∈ [m] for which s_x ≠ 0. If s_x = 0, then r_x is also 0, which contributes 0·f(0/0) := 0 to the sum in (5.12), with (r, s) replacing (p, q).
The strategy of the proof involves representing r_x/s_x, as seen on the right-hand side of (5.17), as a convex combination of the ratios p_y/q_y, y ∈ [n]. This is achieved by defining, for each x ∈ supp(s),
t_{y|x} := e_{x|y} q_y / ∑_{y′∈[n]} e_{x|y′} q_{y′} = e_{x|y} q_y / s_x .    (5.18)
It is important to note that for every x ∈ supp(s), the set {t_{y|x}}_{y∈[n]} forms a probability vector, and for all x ∈ supp(s), it holds that
r_x/s_x = ∑_{y∈[n]} t_{y|x} (p_y/q_y) .    (5.19)
Combining this with the convexity of f gives
D_f(r∥s) = ∑_{x∈supp(s)} s_x f( ∑_{y∈[n]} t_{y|x} p_y/q_y )
         ⩽ ∑_{x∈supp(s)} ∑_{y∈[n]} s_x t_{y|x} f(p_y/q_y)
         = ∑_{y∈[n]} q_y f(p_y/q_y) ∑_{x∈supp(s)} e_{x|y} = D_f(p∥q) ,
where the equality ∑_{x∈supp(s)} e_{x|y} = 1 is valid because e_{x|y} = 0 if s_x = 0.
For the general case, in which q may have zero components, define q_ε := (1 − ε)q + εu. We then get that q_ε > 0, so that for any ε > 0
D_f(Ep∥Eq_ε) ⩽ D_f(p∥q_ε) .    (5.21)
Taking the limit ε → 0⁺ on both sides of the inequality above and using the continuity of D_f(p∥q) in q (see Exercise 5.1.6) completes the proof.
Exercise 5.1.6 (Continuity of D_f(p∥q) in q). In the last part of the proof above we used the fact that the f-divergence is continuous in q. Show that in general, if {q_k}_{k∈N} is a sequence of probability vectors in Prob(n) that satisfies q_k → q as k → ∞, then lim_{k→∞} D_f(p∥q_k) = D_f(p∥q).
Exercise 5.1.7. Prove the corollary above. Hint: Use (5.10), Exercise 5.1.3, and Theo-
rem 5.1.1 to prove the above corollary.
5.1.3 Examples
In this subsection we give several examples of f -divergences that play important role in
applications.
Kullback–Leibler divergence
The Kullback–Leibler divergence (also known as the KL-divergence or the relative entropy) is
perhaps the most well known divergence which appears in numerous applications in statistics,
information theory, and as we will see in resource theories. For this reason, it is the only
divergence that we will denote simply by D without any subscript. It is the f -divergence
that corresponds to the function f (r) = r log r. For this choice, we get,
D(p∥q) = { ∑_{x∈[n]} p_x (log p_x − log q_x)   if p ≪ q ;   ∞   otherwise }    (5.24)
where p ≪ q denotes supp(p) ⊆ supp(q), and we use the convention 0 log 0 = 0. In the next
chapters we will study the many properties of this divergence.
The trace distance, also known as the total variation distance (sometimes also called sta-
tistical distance), is an f -divergence with f (r) = 21 |r − 1|. For this convex function we
get
D_f(p∥q) = ∑_{x∈[n]} q_x · (1/2)| p_x/q_x − 1 | = (1/2) ∑_{x∈[n]} |p_x − q_x| = (1/2) ∥p − q∥₁ .    (5.25)
This f -divergence, which also functions as a metric, will be examined in detail in the subse-
quent sections.
The Hellinger distance is defined as the square root of the above expression:
H(p, q) := ( (1/2) ∑_{x∈[n]} (√p_x − √q_x)² )^{1/2} .    (5.27)
We will see later on that the above divergence is also a metric that is closely related to a
quantity known as the fidelity.
The α-Divergence
The α-divergence is the f-divergence corresponding to f_α(r) = (r^α − r)/(α(α − 1)), where α ∈ [0, ∞), the case α = 1 is defined by the limit lim_{α→1} (r^α − r)/(α(α − 1)) = r ln(r) (which yields the KL-divergence), and similarly the case α = 0 is given by −ln(r). For this choice of f we get
D_{f_α}(p∥q) = 1/(α(α − 1)) ∑_{x∈[n]} q_x ( (p_x/q_x)^α − p_x/q_x ) = 1/(α(α − 1)) ( ∑_{x∈[n]} p_x^α q_x^{1−α} − 1 ) .    (5.28)
The α-divergence can be expressed as a function of the Rényi divergences that we will study in Chapter 6.
Exercise 5.1.8. Show that all the functions f above are convex and satisfy f (1) = 0.
Exercise 5.1.9 (The Jensen–Shannon Divergence). Let f : (0, ∞) → R be given by
f(r) = (r + 1) log( 2/(r + 1) ) + r log r    ∀ r ∈ (0, ∞) .    (5.29)
Show that f is convex with f(1) = 0 and compute its f-divergence.
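The examples above can all be computed from the single expression (5.14). The following is a small numerical sketch (not part of the text), assuming NumPy; the helper name f_divergence and the base-2 logarithm are illustrative assumptions.

```python
import numpy as np

def f_divergence(p, q, f, f_tilde_0=np.inf):
    # Eq. (5.14): terms with q_x = 0 contribute f~(0) * p_x, terms with q_x > 0 contribute q_x f(p_x/q_x)
    total = 0.0
    for px, qx in zip(p, q):
        if qx > 0:
            total += qx * f(px / qx)
        elif px > 0:
            total += f_tilde_0 * px
    return total

p = np.array([0.6, 0.3, 0.1])
q = np.array([0.5, 0.25, 0.25])

kl  = f_divergence(p, q, lambda r: r * np.log2(r) if r > 0 else 0.0)            # KL divergence (5.24)
tv  = f_divergence(p, q, lambda r: 0.5 * abs(r - 1), f_tilde_0=0.5)             # trace distance (5.25)
hel = np.sqrt(f_divergence(p, q, lambda r: 0.5 * (np.sqrt(r) - 1) ** 2, f_tilde_0=0.5))  # Hellinger (5.27)
alpha = 2.0
a_div = f_divergence(p, q, lambda r: (r ** alpha - r) / (alpha * (alpha - 1)))  # alpha-divergence (5.28)
```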
Exercise 5.1.10. Let D′ be the α-divergence for α = 2, and let pk and qk as in (5.31).
Prove Eq. (5.33).
We show now that by utilizing the data processing inequality, if a divergence is continuous
in one of its arguments it is necessarily continuous in the second argument as well.
Ev := (1 − ε)(v − q) + q′    where    ε := 1 − min_{x∈[n]} (q′_x / q_x) .    (5.34)
Ev = (1 − ε)v + q′ − (1 − ε)q ⩾ 0 .    (5.35)
The inequality above implies that E is indeed column stochastic. Moreover, by definition
∥p − Ep∥₁ = ∥ε(p − q) + q′ − q∥₁ ⩽ ε∥p − q∥₁ + ∥q − q′∥₁ .    (5.36)
Since D(p∥q) is continuous in p, the expression on the right-hand side of the equation above
vanish when q → q′ .
Subsequently, we establish a comparable upper bound for D(p∥q) − D(p∥q′) by introducing Ẽ ∈ STOCH(n, n) and ε̃ ∈ (0, 1). These are defined identically to how E and ε were defined, but with the roles of q and q′ reversed. Specifically, Ẽ is defined by its action on every v ∈ Prob(n) as
Ẽv := (1 − ε̃)(v − q′) + q    where    ε̃ := 1 − min_{x∈[n]} (q_x / q′_x) .    (5.38)
By definition, Ẽq′ = q and ε̃ ∈ (0, 1). Further, following similar steps as above, it can
be verified that Ẽ is indeed column stochastic, and Ẽp approaches p as q approaches q′ .
Utilizing the DPI we also have
D(p∥q) − D(p∥q′) ⩽ D(p∥q) − D(Ẽp∥Ẽq′) = D(p∥q) − D(Ẽp∥q) .    (5.39)
Therefore, as before, due to the continuity of D(p∥q) in p, the expression on the right-hand side of the equation above vanishes when q′ → q. Combining this with the lower bound in (5.37), we conclude that D(p∥q) is continuous in q.
A Measure of Nonuniformity
Definition 5.1.3. A function
g : ⋃_{n∈N} Prob(n) → R ∪ {∞}    (5.40)
In Chapter 16 we will study the resource theory of nonuniformity in which the functions
above quantify the resource of this theory. Specifically, these functions quantify how different
a probability vector p is from the uniform distribution u. As an indication of this, note that
if D is a classical divergence that is continuous in its first argument, then the function
g_D(p) := D( p ∥ u^(n) )    ∀ p ∈ Prob(n) ,    (5.41)
is a measure of non-uniformity.
Exercise 5.1.11. Verify that gD indeed satisfies all the three properties above. Hint: For
the third property show that
D(p∥u(n) ) = D p ⊗ u(k) u(nk)
(5.42)
as a consequence of the DPI applied twice for channels introducing and removing an inde-
pendent distribution u(k) .
Exercise 5.1.12. Let f : (0, ∞) → R be convex with f (1) = 0. Show that for the f -
divergence, Df , we have
g_{D_f}(p) = (1/n) ∑_{x∈[n]} f(n p_x)    ∀ p ∈ Prob(n) .    (5.43)
Verify by direct calculation that this expression satisfies the three properties of g.
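As a quick sanity check of (5.43), the following sketch (not part of the text, NumPy assumed, helper names illustrative) compares it with the direct evaluation of D_f(p∥u^(n)).

```python
import numpy as np

def g_Df(p, f):
    # Eq. (5.43): g_{D_f}(p) = (1/n) * sum_x f(n * p_x)
    n = len(p)
    return sum(f(n * px) for px in p) / n

f_kl = lambda r: r * np.log2(r) if r > 0 else 0.0   # f for the KL divergence
p = np.array([0.5, 0.25, 0.125, 0.125])
u = np.full(4, 0.25)
# direct evaluation of D_f(p || u) = sum_x u_x f(p_x / u_x)
direct = sum(ux * f_kl(px / ux) for px, ux in zip(p, u))
assert abs(g_Df(p, f_kl) - direct) < 1e-12
```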
In Theorem 4.3.2, we established that for every n ∈ N, p ∈ Prob(n), and q ∈ Prob>0 (n)∩
Qn , there is a vector r ∈ Prob(k) with the property that (p, q) ∼ (r, u(k) ). To elaborate, let
q = (k₁/k, …, k_n/k)^T, where each k_x ∈ N and k := k₁ + ⋯ + k_n. The vector r is then expressed as:
r := ⊕_{x∈[n]} p_x u^(k_x) .    (5.44)
Building upon this equivalency, the following theorem demonstrates a bijective relationship
between divergences and measures of non-uniformity. However, prior to this, it’s essential
to explore the uniqueness of the vector r above.
The vector r is not unique. To see why, let {m_x}_{x∈[n]} be a set of n integers satisfying
q_x = k_x/k = m_x/m    where    m := ∑_{x∈[n]} m_x .    (5.45)
Given any such set, we can define the probability vector s := ⊕_{x∈[n]} p_x u^(m_x) so that (p, q) ∼ (s, u^(m)). This demonstrates that r is not unique. However, since k m_x = m k_x, we observe that:
u^(m) ⊗ r = ⊕_{x∈[n]} p_x u^(m k_x) = ⊕_{x∈[n]} p_x u^(k m_x) = u^(k) ⊗ s .    (5.46)
The above relation highlights that for any measure of non-uniformity g, following the third
property of Definition 5.1.3, we obtain:
g(r) = g( u^(m) ⊗ r ) = g( u^(k) ⊗ s ) = g(s) ,    (5.47)
where the second equality follows from (5.46), and
where the last equality again utilizes the third property of Definition 5.1.3. Thus, in this
context, r and s have the same non-uniformity.
Remark. Observe that Dg in (5.48) is well defined since g(r) = g(s) for any other vector s
as defined above. We will also show in the proof below that the continuous extension for
general q ∈ Prob(n) is well defined.
Proof. We first show that Dg is a divergence on the restricted space in which q ∈ Prob>0 (n)∩
Qn . The normalization of Dg holds since Dg (1∥1) = g(1) = 0. To show the DPI, let
p ∈ Prob(n), q ∈ Prob>0 (n) ∩ Qn , and E ∈ STOCH(m, n) ∩ Qm×n be a stochastic matrix
(channel) with rational components. Let further k ∈ N be large enough such that we can
express
q = (k₁/k, …, k_n/k)^T ,    q′ := Eq = (k′₁/k, …, k′_m/k)^T ,    (5.49)
Exercise 5.1.13. Describe explicitly the bijection between divergences that are continuous
in their second argument and measures of non-uniformity. That is, for any g express the
corresponding D and vice versa.
1. Data processing inequality (DPI): D(ρ∥σ) ⩾ D(E(ρ)∥E(σ)) for all ρ, σ ∈ D(A) and every channel E ∈ CPTP(A → B).
2. Normalization: D(1∥1) = 0.
Remark. Note that a classical divergence can be viewed as a quantum divergence whose
domain is restricted to classical systems. The union in (5.51) is over all systems A and
particularly over all finite dimensions |A|. Therefore, the domain of D consists of pairs of
density matrices (ρ, σ) in any dimension |A| ∈ N. For the case of a trivial system A with
|A| = 1 the only density matrix in D(A) is the number one. In this case, divergences satisfies
D(1∥1) = 0.
Like classical divergences, quantum divergences are non-negative since for any pair of states ρ, σ ∈ D(A) we have the trace Tr ∈ CPTP(A → 1), so that
D(ρ∥σ) ⩾ D( Tr[ρ] ∥ Tr[σ] ) = D(1∥1) = 0 ,    (5.52)
where the inequality follows from the DPI. Moreover, recall that any state ρ ∈ D(A) with dimension |A| > 1 can be viewed as a preparation channel ρ^{1→A} ∈ CPTP(1 → A). Hence,
D(ρ∥ρ) = D( ρ^{1→A}(1) ∥ ρ^{1→A}(1) ) ⩽ D(1∥1) = 0 ,    (5.53)
where again we used the DPI for divergences. Combining this with the non-negativity of
divergences we conclude that for any state ρ ∈ D(A) in any dimension |A| ∈ N
D(ρ∥ρ) = 0 . (5.54)
This is consistent with the intuition that divergences quantify the distinguishability between
two states.
An interesting question remaining is whether the converse of the above property also
holds. A quantum divergence, D, is said to be faithful if for any ρ, σ ∈ D(A), the condition
D(ρ∥σ) = 0 implies ρ = σ. We will see later on that not all quantum divergences are faithful.
However, in the following lemma we show that a quantum divergence is faithful if and only
if its reduction to classical systems is faithful.
Lemma 5.2.1. Let D be a quantum divergence. Then, D is faithful if and only if its
reduction to classical (diagonal) states is faithful.
But since D is faithful on diagonal states we get the contradiction that ∆(ρ) = ∆(σ). Hence,
D is faithful also on quantum states.
Exercise 5.2.1. Let ρ, σ ∈ D(A). Show that if ρ ̸= σ then there exists a basis of A such
that the diagonal of ρ in this basis does not equal to the diagonal of σ in the same basis.
The data processing inequality implies that quantum divergences are invariant under isometries. That is, for any isometry channel V ∈ CPTP(A → B) we have
D( V(ρ) ∥ V(σ) ) = D(ρ∥σ)    ∀ ρ, σ ∈ D(A) .    (5.56)
To see why, recall that every isometry channel has a left inverse channel R ∈ CPTP(B → A) that satisfies R^{B→A} ∘ V^{A→B} = id^A (see Section 3.5.8). Hence, by definition of R,
D(ρ∥σ) = D( R ∘ V(ρ) ∥ R ∘ V(σ) ) ⩽ D( V(ρ) ∥ V(σ) ) ⩽ D(ρ∥σ) ,    (5.57)
where both inequalities follow from the DPI. That is, all the inequalities above must be equalities, so that (5.56) holds.
Exercise 5.2.2. Use the invariance property of quantum divergences under isometries to show that any classical divergence D satisfies D(p∥q) = D(p ⊕ 0_k ∥ q ⊕ 0_k) for all p, q ∈ Prob(n) and all k ∈ N; that is, classical divergences are invariant under embedding into higher dimensions.
1. Show that if there exists a channel F ∈ CPTP(B → A) such that F ◦ E(ρ) = ρ and
F ◦ E(σ) = σ then
D(ρ∥σ) = D E(ρ) E(σ) . (5.59)
sup(ρ/σ) := sup_{0⩽Λ⩽I^A} Tr[ρΛ]/Tr[σΛ] .    (5.61)
2. Show that
sup(ρ/σ) = inf{ λ ∈ R : λσ − ρ ⩾ 0 } .    (5.63)
Joint Convexity
We say that a quantum divergence D is jointly convex if for any quantum system A, m ∈ N,
p ∈ Prob(m), and two sets, {ρx }x∈[m] and {σx }x∈[m] of m density matrices in D(A) we have
D( ∑_{x∈[m]} p_x ρ_x ∥ ∑_{x∈[m]} p_x σ_x ) ⩽ ∑_{x∈[m]} p_x D(ρ_x∥σ_x) .    (5.64)
Although not every quantum divergence exhibits joint convexity, the combination of joint
convexity with both the property described in (5.60), and the invariance under isometries,
results in a condition that is more stringent than DPI.
Lemma 5.2.2. Let D be a function with the same domain and range as a quantum
divergence that is invariant under isometries. Suppose further that D is jointly
convex and satisfies (5.60) for any quantum systems A and B, and quantum states
ρ, σ ∈ D(A) and ω ∈ D(B). Then D satisfies the DPI.
Proof. Due to Stinespring dilation theorem, the invariance under isometries implies that it
is sufficient to prove that for any two bipartite states ρ, σ ∈ D(AB)
D ρAB σ AB ⩾ D ρA σ A .
(5.65)
D(ρ^A ∥ σ^A) = D( ρ^A ⊗ u^B ∥ σ^A ⊗ u^B )
             = D( R^{B→B}(ρ^{AB}) ∥ R^{B→B}(σ^{AB}) )
             ⩽ (1/n) ∑_{x∈[n]} D( U_x^{B→B}(ρ^{AB}) ∥ U_x^{B→B}(σ^{AB}) ) ,    (5.66)
where the inequality follows from joint convexity.
where {|ax ⟩}x∈[m] and {|by ⟩}y∈[m] are orthonormal bases consisting of the eigenvectors of ρ
and σ, respectively. Define the probability vectors p̃, q̃ ∈ Prob(m2 ) whose components are
given by
p̃xy := px |⟨ax |by ⟩|2 and q̃xy = qy |⟨ax |by ⟩|2 ∀ x, y ∈ [m] . (5.68)
Now, if D is a classical divergence then we can extend it to ρ, σ ∈ D(A) by
Dq (ρ∥σ) := D p̃ q̃ .
(5.69)
Clearly, Dq (ρ∥σ) is zero if ρ = σ, and in the exercise below you show that if ρ and σ are
diagonal then Dq (ρ∥σ) := D p q , where p and q are the diagonals of ρ and σ. In the
following lemma we show that Dq is invariant under isometries.
Lemma 5.2.3. Let D be a classical divergence and define Dq as in (5.69). Then, for any isometry channel V ∈ CPTP(A → B) and any ρ, σ ∈ D(A),
Dq( V(ρ) ∥ V(σ) ) = Dq(ρ∥σ) .
Proof. The non-zero components of p̃ and q̃ remain unchanged if ρ and σ are replaced with V(ρ) and V(σ), for any isometry V ∈ CPTP(A → B). Moreover, note that m = |A| since both {|a_x⟩}_{x∈[m]} and {|b_y⟩}_{y∈[m]} are bases of A. Hence, denoting n := |B| and by K the image of A under the isometry V, the additional n − m zero eigenvalues of V(ρ) (and similarly of V(σ)) correspond to eigenvectors that lie in the orthogonal complement of K. Hence, if p̃, q̃ correspond to ρ and σ as in (5.68), then p̃ ⊕ 0_k and q̃ ⊕ 0_k correspond to V(ρ) and V(σ), respectively, where 0_k is the zero vector in dimension k := n² − m². Hence,
Dq( V(ρ) ∥ V(σ) ) = D( p̃ ⊕ 0_k ∥ q̃ ⊕ 0_k ) = D(p̃∥q̃) = Dq(ρ∥σ) ,
where the second equality follows from the fact that classical divergences are invariant under embedding (see Exercise 5.2.2).
Exercise 5.2.5. Show that if ρ and σ are diagonal then Dq (ρ∥σ) := D p q , where p and
q are the diagonals of ρ and σ.
Due to Lemma 5.2.3, the Stinespring dilation implies that Dq(ρ∥σ) as defined above is a quantum divergence if and only if it is non-increasing under the partial trace. This latter property does not hold in general; however, it does hold when D belongs to a large class of f-divergences.
Quantum f -Divergence
Definition 5.2.2. Let f : (0, ∞) → R be an operator convex function satisfying
f (1) = 0. Let Df be its corresponding classical f -divergence as defined in
Definition 5.1.2. The quantum f-divergence, Dfq, is defined on any ρ, σ ∈ D(A) as Dfq(ρ∥σ) := Df(p̃∥q̃), with p̃ and q̃ as in (5.68).
Remark. We will see below that the requirement that f is operator convex (vs just convex)
ensures that Dfq is indeed a quantum divergence. Moreover, from (5.15), for any ρ, σ ∈ D(A)
with spectral decomposition as in (5.67) we have (see exercise below)
D_f(ρ∥σ) = lim_{ε→0⁺} ∑_{x,y} (q_y + ε) f( p_x/(q_y + ε) ) |⟨a_x|b_y⟩|² .    (5.74)
Exercise 5.2.6. Prove (5.74) and use it to show that for any ρ, σ ∈ D(A)
D_f(ρ∥σ) = ∑_{x∈supp(p), y∈supp(q)} q_y f(p_x/q_y) |⟨a_x|b_y⟩|² + f(0) Tr[(I − ρ⁰)σ] + f̃(0) Tr[(I − σ⁰)ρ] ,    (5.75)
where ρ⁰ and σ⁰ denote the projections onto the supports of ρ and σ, respectively.
Quantum Formula
Theorem 5.2.1. Let f : (0, ∞) → R be an operator convex function satisfying f(1) = 0. For any ρ, σ ∈ D(A) with σ > 0 the quantum f-divergence can be expressed as
D_f(ρ∥σ) = Tr[ ϕ_σ^{AÃ} f( (σ^A)^{−1} ⊗ (ρ^Ã)^T ) ] ,    (5.76)
where |ϕ_σ^{AÃ}⟩ := (σ^{1/2} ⊗ I^Ã)|Ω^{AÃ}⟩ is a purification of σ. For σ ⩾ 0 the f-divergence satisfies (with u ∈ D(A) the maximally mixed state)
D_f(ρ∥σ) = lim_{ε→0⁺} D_f( ρ ∥ (1 − ε)σ + εu ) .    (5.77)
Proof. Suppose first that σ > 0. Then,
D_f(ρ∥σ) := D_f(p̃∥q̃) = ∑_{x,y∈[m]} q̃_{xy} f( p̃_{xy}/q̃_{xy} ) = ∑_{x,y∈[m]} q_y |⟨a_x|b_y⟩|² f( p_x/q_y ) .    (5.78)
Now, for every x, y ∈ [m] we can express q_y|⟨a_x|b_y⟩|² as follows:
q_y |⟨a_x|b_y⟩|² = ⟨Ω^{AÃ}| q_y|b_y⟩⟨b_y| ⊗ (|a_x⟩⟨a_x|)^T |Ω^{AÃ}⟩
                = ⟨ϕ_σ^{AÃ}| |b_y⟩⟨b_y| ⊗ (|a_x⟩⟨a_x|)^T |ϕ_σ^{AÃ}⟩ ,    (5.79)
where we used q_y|b_y⟩⟨b_y| = σ^{1/2}|b_y⟩⟨b_y|σ^{1/2}.
The case σ ⩾ 0 follows directly from Exercise 5.1.4 and is left as an exercise.
We now demonstrate that the expression for the f -divergence, as outlined in the preceding
theorem, satisfies the data processing inequality when f is operator convex.
From the quantum formula in (5.76) we see that the left hand side of (5.81) depends on
(σ A )−1 ⊗ (ρà )T whereas the right hand side depends on (σ AB )−1 ⊗ (ρÃB̃ )T . In the exercise
below you will show that there exists an isometry that relates between these two expressions.
Explicitly, you will show that there exists an isometry V : AÃ → AÃB B̃ (with V ∗ V = I AÃ )
such that
V^* ( (σ^{AB})^{−1} ⊗ (ρ^{ÃB̃})^T ) V = (σ^A)^{−1} ⊗ (ρ^Ã)^T .    (5.82)
Combining this with the operator Jensen's inequality (B.30) for operator convex functions we get
f( (σ^A)^{−1} ⊗ (ρ^Ã)^T ) = f( V^* ( (σ^{AB})^{−1} ⊗ (ρ^{ÃB̃})^T ) V )
                          ⩽ V^* f( (σ^{AB})^{−1} ⊗ (ρ^{ÃB̃})^T ) V ,    (5.83)
where the inequality is (B.30).
Finally, multiplying both sides by ϕ_σ^{AÃ} and taking the trace gives
D_f( ρ^A ∥ σ^A ) ⩽ Tr[ ϕ_σ^{AÃ} V^* f( (σ^{AB})^{−1} ⊗ (ρ^{ÃB̃})^T ) V ]
                = Tr[ ϕ_σ^{ABÃB̃} f( (σ^{AB})^{−1} ⊗ (ρ^{ÃB̃})^T ) ]    (by (5.86))
                = D_f( ρ^{AB} ∥ σ^{AB} ) .    (5.84)
3. Show that
V ( (σ^A)^{1/2} ⊗ I^Ã ) |Ω^{AÃ}⟩ = ( (σ^{AB})^{1/2} ⊗ I^{ÃB̃} ) |Ω^{ABÃB̃}⟩ ,    (5.86)
Examples:
1. The Umegaki Divergence. For the function f (r) = r log r
D_f(ρ∥σ) = lim_{ε→0⁺} ∑_{x,y} (q_y + ε) f( p_x/(q_y + ε) ) |⟨a_x|b_y⟩|²
         = lim_{ε→0⁺} ∑_{x,y} p_x log( p_x/(q_y + ε) ) |⟨a_x|b_y⟩|²
         = ∑_x p_x log p_x − ∑_{x,y} p_x |⟨a_x|b_y⟩|² log q_y
         = Tr[ρ log ρ] − Tr[ρ log σ] ,    (5.87)
where in the last line we used the relation ⟨b_y|ρ|b_y⟩ = ∑_x p_x |⟨a_x|b_y⟩|². Since f(r) =
r log r is operator convex, the above expression is a quantum divergence. It is known
as the Umegaki divergence or sometimes referred to as the relative entropy. We will
discuss many of its properties in the following chapters.
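The closed form Tr[ρ log ρ] − Tr[ρ log σ] in (5.87) is straightforward to evaluate numerically. The sketch below is not from the book; it assumes NumPy/SciPy, uses base-2 logarithms (an arbitrary choice of units), and the function names are illustrative.

```python
import numpy as np
from scipy.linalg import logm

def umegaki(rho, sigma):
    # D(rho||sigma) = Tr[rho log rho] - Tr[rho log sigma], in bits; supports assumed compatible
    return np.real(np.trace(rho @ logm(rho) - rho @ logm(sigma))) / np.log(2)

rng = np.random.default_rng(0)
def rand_state(d):
    # a random full-rank density matrix (so that logm is well defined)
    m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = m @ m.conj().T
    return rho / np.trace(rho)

rho, sigma = rand_state(2), rand_state(2)
assert umegaki(rho, sigma) >= -1e-10     # non-negativity of a divergence
assert abs(umegaki(rho, rho)) < 1e-10    # vanishes when the two states coincide
```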
2. The Trace Distance? The function f (r) = 21 |r − 1| is convex but it is not operator
convex (on any domain that includes 1). Therefore, we cannot conclude that for this
choice Dfq is a quantum divergence. Moreover, note that for this case
D_f(ρ∥σ) = lim_{ε→0⁺} ∑_{x,y} (q_y + ε) · (1/2)| p_x/(q_y + ε) − 1 | |⟨a_x|b_y⟩|²
         = (1/2) ∑_{x,y} | p_x − q_y | |⟨a_x|b_y⟩|² ,    (5.88)
Exercise 5.2.8. Use the quantum formula given in Theorem 5.2.1 to compute the Umegaki
divergence and the quantum α-divergence.
One may wonder whether such optimal quantum extensions exist. In the next theorem we prove that they do using the following construction.
Let D be a classical divergence, and for any ρ, σ ∈ D(A) define
D̲(ρ^A ∥ σ^A) := sup_X D( E^{A→X}(ρ^A) ∥ E^{A→X}(σ^A) ) ,    (5.91)
D̄(ρ∥σ) := inf_X { D(p^X ∥ q^X) : ρ^A = F^{X→A}(p^X), σ^A = F^{X→A}(q^X) } ,    (5.92)
where the optimizations are over the classical system X, the channels E ∈ CPTP(A → X) and F ∈ CPTP(X → A), as well as the probability distributions (diagonal density matrices) p, q ∈ D(X). Note that E is a POVM channel and therefore D(E(ρ)∥E(σ)) is well defined since E(ρ) and E(σ) are classical states; i.e., they can be viewed as probability vectors or diagonal density matrices. Similarly, p^X and q^X can be viewed either as diagonal density matrices or as probability vectors. Moreover, the supremum and infimum are taken over all dimensions |X| ∈ N.
Optimal Extensions
Theorem 5.3.1. Let D be a classical divergence, and let D̲ and D̄ be as in (5.91) and (5.92), respectively. Then, both D̲ and D̄ are quantum divergences that reduce to D on classical states. In addition, any other quantum divergence D′ that reduces to D on classical states satisfies (5.90).
Proof. We first prove the reduction property. Let ρ, σ ∈ D(A) be classical states. Then, for D̲ we can take X in (5.91) to be a classical system with |X| = |A| and E to be the identity channel. Since this identity channel is not necessarily the optimal channel, we get that
D̲(ρ∥σ) ⩾ D(ρ∥σ) .    (5.93)
Conversely, since ρ and σ are classical, any E in (5.91) can be assumed to be classical since
E(ρ) = E ∘ ∆(ρ)    and    E(σ) = E ∘ ∆(σ) ,    (5.94)
where ∆ is the completely dephasing channel. Therefore, if E is not classical we can replace it with E ∘ ∆, which is classical (recall that the output of E is classical). Now, by the DPI property of the classical divergence D we have, for all such classical E, D(E(ρ)∥E(σ)) ⩽ D(ρ∥σ). Hence, we must have
D̲(ρ∥σ) ⩽ D(ρ∥σ) .    (5.95)
Combining (5.93) with (5.95) we conclude that D̲(ρ∥σ) = D(ρ∥σ). Similarly, for D̄ we can assume that F in (5.92) is a classical channel since ρ and σ are classical. Hence, by the DPI of D we get the lower bound D̄(ρ∥σ) ⩾ D(ρ∥σ), and this bound can be saturated since we can take F in (5.92) to be the identity channel.
We next prove that D̲ and D̄ both satisfy the DPI. Let N ∈ CPTP(A → B). Then,
D̲( N(ρ) ∥ N(σ) ) = sup_X { D( E ∘ N(ρ) ∥ E ∘ N(σ) ) : E ∈ CPTP(B → X) }
                 ⩽ sup_X { D( E′(ρ) ∥ E′(σ) ) : E′ ∈ CPTP(A → X) }
                 = D̲(ρ∥σ) ,    (5.96)
where in the inequality we replaced E ∘ N with E′. For D̄ we have
D̄(ρ∥σ) := inf_X { D(p∥q) : ρ = F(p), σ = F(q), F ∈ CPTP(X → A) }
        ⩾ inf_X { D(p∥q) : N(ρ) = N ∘ F(p), N(σ) = N ∘ F(q), F ∈ CPTP(X → A) }
        ⩾ inf_X { D(p∥q) : N(ρ) = F′(p), N(σ) = F′(q), F′ ∈ CPTP(X → B) }
        = D̄( N(ρ) ∥ N(σ) ) ,    (5.97)
where the first inequality follows from the fact that if ρ = F(p) then necessarily N(ρ) = N ∘ F(p) (but the converse is not necessarily true), and in the second inequality we replaced N ∘ F with F′.
Finally, we prove the optimality of D̲ and D̄. First observe that from the DPI of D′ we have for any ρ, σ ∈ D(A) and any E ∈ CPTP(A → X)
D′(ρ∥σ) ⩾ D′( E(ρ) ∥ E(σ) ) = D( E(ρ) ∥ E(σ) ) ,
where the last equality follows from the fact that D′ reduces to D on classical states. Since the above inequality holds for all E ∈ CPTP(A → X) it also holds for the supremum over such E. We therefore conclude that D′(ρ∥σ) ⩾ D̲(ρ∥σ). For the second inequality, let ρ, σ ∈ D(A) and p, q ∈ D(X), and suppose there exists F ∈ CPTP(X → A) such that ρ = F(p) and σ = F(q). Then, from the DPI of D′ we get
D′(ρ∥σ) = D′( F(p) ∥ F(q) ) ⩽ D′(p∥q) = D(p∥q) ,
where the last equality follows from the fact that D′ reduces to D on classical states. Since the above inequality holds for all such p, q for which there exists an F that takes them to ρ and σ, it must also hold for the infimum over all such p, q. Hence, D′(ρ∥σ) ⩽ D̄(ρ∥σ).
Since the maximal and minimal extension provides upper and lower bounds on all exten-
sions, it can be useful to have a closed formula for them. Remarkably, a closed formula for
the maximal extension exists if one of the input states is pure, or for the f -Divergences if f
is operator convex. On the other hand, at the time of writing this book, a closed formula
for the minimal extension of the f -divergence is not known. However, for specific examples
such as the trace distance and fidelity, the minimal extension can be computed (see the
next section), and as we will see in Chapter 6 the regularized minimal extension can also be
computed for all known relative entropies.
Exercise 5.3.1. Let f : [0, ∞) → [0, ∞) be an operator convex function, and for all ρ, σ ∈ D(A) with σ > 0 define
D′_f(ρ∥σ) := Tr[ρ #_f σ] = Tr[ σ f( σ^{−1/2} ρ σ^{−1/2} ) ] ,    (5.100)
where #_f is the Kubo–Ando operator mean (see Definition B.5.1). Finally, let Df be the maximal f-divergence.
1. Show that Df′ reduces to the classical f -divergence when ρ and σ are classical (i.e.
diagonal in the same basis).
2. Show that Df′ satisfies the DPI in the domain D(A) × D>0 (A). Hint: Show that Df′
satisfies all the conditions of Lemma 5.2.2.
where n ∈ N, p, q ∈ Prob(n), and for each x ∈ [n], ωx ∈ D(A). Note that we replaced
F(|x⟩⟨x|) with ωx . The infimum above can include vectors p and q with zero components.
We now show that the number of zeros in each of these vectors can be restricted to be at
most one.
Proof. We first show that q can have this property. Since divergences are invariant under
(joint) permutation of the components of p and q, without loss of generality we can assume
that
q1 ⩾ · · · ⩾ qr > qr+1 = · · · = qn = 0 . (5.104)
where r is the number of non-zero components of q. With this order of q we have (see
Exercise 4.3.3) (p, q) ∼ (p′ , q), where
p′ = (p₁, …, p_r, p′_{r+1}, 0, …, 0)^T    where    p′_{r+1} := ∑_{x=r+1}^{n} p_x .    (5.105)
With this replacement,
∑_{x∈[n]} p_x ω_x = ∑_{x∈[r]} p_x ω_x + p′_{r+1} τ ,    (5.106)
where τ := (1/p′_{r+1}) ∑_{x=r+1}^{n} p_x ω_x. Therefore, the vectors
p̃ = (p₁, …, p_r, p′_{r+1})^T    and    q̃ = (q₁, …, q_r, 0)^T    (5.107)
satisfy both (5.103) with n replaced by r + 1, and D(p∥q) = D(p̃∥q̃). Repeating the same
arguments for p̃ completes the proof.
It’s important to note that the lemma mentioned above aids in simplifying the opti-
mization problem described in (5.102). This simplification is achieved by assuming, without
any loss of generality, that p and q have forms similar to p̃ and q̃ as specified in (5.107).
Consequently, we can redefine the infimum in (5.102) as an infimum over all 1 < n ∈ N,
p ∈ Prob(n), and 0 < q ∈ Prob(n − 1), provided there are n − 1 density matrices
{ωx }x∈[n−1] ⊂ D(A) meeting the following criteria:
ρ ⩾ ∑_{x∈[n−1]} p_x ω_x    and    σ = ∑_{x∈[n−1]} q_x ω_x ,    (5.108)
where it is understood that the inequality in the first relation is satisfied if and only if there
exists a density matrix ωn ∈ D(A) such that
ρ = ∑_{x∈[n−1]} p_x ω_x + p_n ω_n .    (5.109)
As we will see, in certain cases, working with the expression in (5.108) becomes more man-
ageable because q > 0. In other situations, it might be preferable to work with p > 0. It
is worth noting that by applying the same reasoning as above but substituting p for q and
vice versa, we can also express the infimum in (5.102) as an infimum over all 1 < n ∈ N,
0 < p ∈ Prob(n − 1) and q ∈ Prob(n), with the requirement of having n − 1 density matrices
{ωx }x∈[n−1] ⊂ D(A) that satisfy:
ρ = ∑_{x∈[n−1]} p_x ω_x    and    σ ⩾ ∑_{x∈[n−1]} q_x ω_x .    (5.110)
In the next theorem we employ this property to calculate the maximal divergence when one
of the input states is pure.
where
λ_max := max{ λ ∈ R : λψ ⩽ σ } .    (5.112)
Proof. Consider the relation (5.110) with the pure state ψ replacing ρ. Since ρ := ψ is a
pure state, the first relation in (5.110) can hold if and only if for any x ∈ [n − 1] we have
ωx = ψ. Substituting this into the second relation in (5.110) we obtain
σ ⩾ ∑_{x∈[n−1]} q_x ψ = (1 − q_n)ψ .    (5.113)
Finally, we simplify the expression above by demonstrating that we can confine the value of
n in the optimization above to be equal to two. To achieve this, let E be the 2 × n column
stochastic matrix
E := ( 1 ⋯ 1 0 ; 0 ⋯ 0 1 ) ,    (5.115)
i.e., the first row of E is (1, …, 1, 0) and the second row is (0, …, 0, 1).
Observe that Ep = (1, 0)T and Eq = (1 − qn , qn )T so that
Therefore, the minimum is obtained with n = 2 and with the pair (p, q) being equal to the
pair on the right hand side of the equation above.
where f˜(0) is defined in (5.13), and the infimum above is over all 1 < n ∈ N, p ∈ Prob(n)
and 0 < q ∈ Prob(n − 1), such that there exists n − 1 density matrices {ωx }x∈[n−1] ⊂ D(A)
satisfying (5.108). Denoting Λ_x := q_x σ^{−1/2} ω_x σ^{−1/2}, and applying the conjugation σ^{−1/2}(·)σ^{−1/2} to both sides of (5.108), gives the relations:
σ^{−1/2} ρ σ^{−1/2} ⩾ ∑_{x∈[n−1]} (p_x/q_x) Λ_x    and    ∑_{x∈[n−1]} Λ_x = I^A .    (5.118)
With these new notations, the infimum in (5.117) is taken over all 1 < n ∈ N, all p ∈ Prob(n), and all POVMs {Λ_x}_{x∈[n−1]} for which the inequality (5.118) holds with q_x := Tr[Λ_x σ] > 0.
One natural choice/guess for the optimal n, p and {Λ_x}_{x∈[n−1]} is to choose them such that the inequality in (5.118) becomes an equality. This is possible, for example, by taking n = |A| + 1, and for any x ∈ [n − 1] taking Λ_x = ψ_x ∈ Pure(A) with |ψ_x⟩ being the x-eigenvector of σ^{−1/2} ρ σ^{−1/2} corresponding to the eigenvalue p_x/q_x (i.e., p is chosen such that p_x/Tr[σΛ_x] is the x-eigenvalue of σ^{−1/2} ρ σ^{−1/2}). For this choice we have
σ^{−1/2} ρ σ^{−1/2} = ∑_{x∈[n−1]} (p_x/q_x) |ψ_x⟩⟨ψ_x| ,    (5.119)
which forces p_n to be
p_n = 1 − ∑_{x∈[n−1]} p_x = 1 − ∑_{x∈[n−1]} (p_x/q_x) ⟨ψ_x|σ|ψ_x⟩ = 1 − Tr[ρ] = 0 ,    (5.120)
where the last equality follows by multiplying both sides of (5.119) by σ and taking the
trace. Moreover, for these choices of n, p and {Λx }, we have
∑_{x∈[n−1]} q_x f(p_x/q_x) = ∑_{x∈[n−1]} Tr[ σ|ψ_x⟩⟨ψ_x| ] f(p_x/q_x)
                           = ∑_{x∈[n−1]} Tr[ σ f( (p_x/q_x)|ψ_x⟩⟨ψ_x| ) ]    (since f(t|ψ_x⟩⟨ψ_x|) = f(t)|ψ_x⟩⟨ψ_x| for all t ⩾ 0)
                           = Tr[ σ f( ∑_{x∈[n−1]} (p_x/q_x)|ψ_x⟩⟨ψ_x| ) ]    (since {|ψ_x⟩}_{x∈[n−1]} is orthonormal)
                           = Tr[ σ f( σ^{−1/2} ρ σ^{−1/2} ) ] ,    (5.121)
where the last equality follows from (5.119).
Note that we obtained the formula above for a particular choice of n, p and {Λ_x}_{x∈[n−1]}. Therefore, since this is not necessarily the optimal choice (recall D_f is defined in terms of an infimum), we must have
D_f(ρ∥σ) ⩽ Tr[ σ f( σ^{−1/2} ρ σ^{−1/2} ) ] = Tr[ρ #_f σ] ,    (5.122)
where #_f is the Kubo–Ando operator mean (see Definition B.5.1). Interestingly, to get this upper bound we did not even assume that f is convex, but if f is operator convex we get an equality above.
Remark.
ρ11 ζ
(see (B.75)) of the block ρ22 of ρ = .
∗
ζ ρ22
2. In Theorem B.5.1 we proved that for a continuous function f : [0, ∞) → [0, ∞) the Kubo–Ando operator mean #_f is operator convex if and only if it is jointly convex. Therefore, at least in the domain D(A) × D_{>0}(A), the maximal f-divergence is jointly convex for any operator convex f.
Proof. The proof of the theorem follows immediately from the inequality (5.122) combined
with the opposite inequality (5.101).
Examples:
1. The Belavkin–Staszewski divergence. Consider the function f (r) = r log r. In this
case we have f˜(0) = limε→0+ εf (1/ε) = limε→0+ log(1/ε) = ∞. According to the closed
form in (D.42), this means that unless supp(ρ) ⊆ supp(σ) we have Df (ρ∥σ) = ∞. For
the case supp(ρ) ⊆ supp(σ) we have
D_f(ρ∥σ) = Tr[ σ ( σ^{−1/2} ρ σ^{−1/2} ) log( σ^{−1/2} ρ σ^{−1/2} ) ] .    (5.125)
2. The α-Divergence. For the function f_α(r) = (r^α − r)/(α(α − 1)) we get (for σ > 0)
D_{f_α}(ρ∥σ) = 1/(α(α − 1)) Tr[ σ ( (σ^{−1/2} ρ σ^{−1/2})^α − σ^{−1/2} ρ σ^{−1/2} ) ]
             = 1/(α(α − 1)) ( Tr[ σ (σ^{−1/2} ρ σ^{−1/2})^α ] − 1 ) .    (5.127)
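The closed form (5.125) can be checked numerically against the Umegaki divergence, which it upper-bounds since the maximal extension dominates every other extension. The sketch below is not from the book; it assumes NumPy/SciPy with full-rank states, uses base-2 logarithms, and the function names are illustrative.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power, logm

rng = np.random.default_rng(1)
def rand_state(d):
    m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = m @ m.conj().T
    return rho / np.trace(rho)

def bs_divergence(rho, sigma):
    # Belavkin-Staszewski divergence, Eq. (5.125): Tr[ sigma (X log X) ] with X = sigma^{-1/2} rho sigma^{-1/2}
    s = fractional_matrix_power(sigma, -0.5)
    X = s @ rho @ s
    return np.real(np.trace(sigma @ X @ logm(X))) / np.log(2)

def umegaki(rho, sigma):
    return np.real(np.trace(rho @ (logm(rho) - logm(sigma)))) / np.log(2)

rho, sigma = rand_state(3), rand_state(3)
# the maximal f-divergence upper-bounds the Umegaki divergence
assert bs_divergence(rho, sigma) >= umegaki(rho, sigma) - 1e-9
```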
Note that if all the components of v are non-negative real numbers then all the inequalities
above must be equalities and we get in particular that ∥v∥ = ∥v∥1 .
Exercise 5.4.1.
1. Show that the trace norm is indeed a norm.
2. Show that the trace norm is always bigger than the norm induced by the inner prod-
uct (2.16).
Exercise 5.4.2. Show that for any 3 Hermitian operators M, N, σ ∈ Herm(A), with σ > 0,
the following holds:
1. Tr[MN] ⩽ ∥M∥₂ ∥N∥₂ .
2. ∥√σ M √σ∥₁ ⩽ ∥M∥₂ ∥σ∥₂ .
3. ∥M∥₁ ⩽ √(Tr[σ]) ∥σ^{−1/4} M σ^{−1/4}∥₂ .
Hint: Use part 2 with M replaced by σ^{−1/4} M σ^{−1/4} and σ replaced by √σ.
where ∥ · ∥2 is the norm induced by the Hilbert-Schmidt inner product.
The subsequent two lemmas establish that the trace norm can be formulated as opti-
mization problems. These formulations are instrumental in proving various properties of the
trace norm. We begin with an expression that is particularly useful for Hermitian matrices.
∥M∥₁ = max{ Tr[MΠ] : −I^A ⩽ Π ⩽ I^A, Π ∈ Herm(A) } .    (5.132)
Proof. Let M₊ and M₋ be the positive and negative parts of M (see (2.54)), and let Π₋ and Π₊ = I − Π₋ be the projections onto the negative and non-negative eigenspaces of M. With these notations we have |M| = M₊ + M₋, so that the trace norm of M can be expressed as
∥M∥₁ = Tr[M₊] + Tr[M₋]
     = Tr[ M(Π₊ − Π₋) ]
     = max_{−I⩽Π⩽I} Tr[MΠ] ,    (5.133)
where the last equality follows from Exercise 5.4.3, and the maximum is over all matrices Π ∈ Herm(A) with eigenvalues between −1 and 1.
Exercise 5.4.3. Prove the last equality in Eq. (5.133).
Exercise 5.4.4. Show that for any two (normalized) pure states |ψ⟩, |ϕ⟩ ∈ A we have
T(ψ, ϕ) := (1/2) ∥ |ψ⟩⟨ψ| − |ϕ⟩⟨ϕ| ∥₁ = √(1 − |⟨ψ|ϕ⟩|²) .    (5.134)
2
Hint: Denote |0⟩ := |ψ⟩ and express |ϕ⟩ := a|0⟩ + b|1⟩ where |1⟩ is some (normalized)
orthogonal vector to |0⟩.
Exercise 5.4.5. Let A be a Hilbert space and let |ψ⟩, |ϕ⟩ ∈ A be two (normalized) states in
A. Denote ψ := |ψ⟩⟨ψ| and ϕ := |ϕ⟩⟨ϕ|. Show that
(1/2) ∥ψ − ϕ∥₁ ⩽ ∥ |ψ⟩ − |ϕ⟩ ∥ ,    (5.135)
where the norm on the right-hand side is the induced inner-product norm ∥|χ⟩∥ := ⟨χ|χ⟩^{1/2}.
Hint: Use the previous exercise.
The trace norm can also be expressed as an optimization over partial isometries.
Lemma 5.4.1. Let A and B be two finite dimensional Hilbert spaces, and let
M : A → B be a linear operator. Then, the trace norm of M can be expressed as
∥M∥₁ = Tr[UM] ⩽ max_{V : B→A} Tr[VM] ,    (5.137)
Hence, all the inequalities in (5.138) must be equalities. This completes the proof.
Exercise 5.4.6. Show that if |A| ⩾ |B| in Lemma 5.4.1 then the maximization over partial isometries in (5.136) can be replaced with a maximization over isometries V : B → A. Similarly, show that if |A| ⩽ |B| then
∥M∥₁ = max_{U : A→B} Tr[U^* M] ,    (5.140)
where the maximization is over isometries U : A → B.
Hence, it will be sufficient to prove that ∥E(|ϕ_x⟩⟨ψ_x|)∥₁ ⩽ 1 for all x, since this would imply that ∥E(M)∥₁ ⩽ ∑_{x∈[n]} λ_x = ∥M∥₁. For simplicity of the exposition we remove the sub-index x from the rest of the proof, since nothing will depend on it.
Now, the square matrix E(|ϕ⟩⟨ψ|) has a polar decomposition
where {|φy ⟩} is an orthonormal basis of B, and {θy } are some phases. Hence,
From Exercise 3.4.4, it follows that E^* is positive and sub-unital (i.e., E^*(I) ⩽ I). Therefore, the matrices Λ_y := E^*(|φ_y⟩⟨φ_y|) ⩾ 0 form an incomplete POVM since
∑_{y∈[n]} Λ_y = E^*(I^B) ⩽ I^A .    (5.146)
Exercise 5.4.7. Provide an alternative (simpler) proof of the theorem above for the case
that M is Hermitian. Hint: Use the previous lemma and prove first that if −I B ⩽ Π ⩽ I B
then −I A ⩽ E ∗ (Π) ⩽ I A .
T(ρ, σ) := (1/2) ∥ρ − σ∥₁ .    (5.148)
The inclusion of the one-half factor is for normalization purposes, specifically to ensure
that the distance reaches its maximum value of 1 when the two states, ρ and σ, are orthogonal
(refer to Exercise 5.4.8 for more details).
Consider the case in which ρ and σ are classical, or equivalently commute, and are therefore diagonal in the same basis. In this case, denoting ρ = ∑_{x∈[n]} p_x|x⟩⟨x| and σ = ∑_{x∈[n]} q_x|x⟩⟨x|, we get
T(ρ, σ) := (1/2) ∥ ∑_{x∈[n]} (p_x − q_x)|x⟩⟨x| ∥₁ = (1/2) ∑_{x∈[n]} |p_x − q_x| =: T(p, q) ,    (5.149)
where p := (p1 , . . . , pn )T , q = (q1 , . . . , qn )T , and T (p, q) as defined above denotes the trace
distance between the classical probability vectors p and q.
In the general case, since both ρ and σ have the same trace,
Tr[(ρ − σ)₊] = Tr[(ρ − σ)₋] ,    (5.150)
where (ρ − σ)_± are the positive and negative parts of ρ − σ. Therefore, denoting by Π₊ the projection to the positive eigenspace of ρ − σ, we conclude that
T(ρ, σ) := (1/2)( Tr[(ρ − σ)₊] + Tr[(ρ − σ)₋] ) = Tr[(ρ − σ)₊] = Tr[(ρ − σ)Π₊] .    (5.151)
That is, the trace distance can be written as
T(ρ, σ) = max_{0⩽Π⩽I^A} Tr[(ρ − σ)Π] ,    (5.152)
where the maximization is over any matrix Π ∈ Pos(A) (not necessarily a projection) with eigenvalues between 0 and 1. The expression above for the trace distance will be useful in some of the applications we discuss later on.
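The equivalence between (5.148) and (5.152) is easy to verify numerically: the projection onto the positive eigenspace of ρ − σ attains the maximum. The following sketch is not from the book; it assumes NumPy and the function name is illustrative.

```python
import numpy as np

def trace_distance(rho, sigma):
    # T(rho, sigma) = (1/2)||rho - sigma||_1 = Tr[(rho - sigma) Pi_+], Eqs. (5.148) and (5.151)
    eigvals, eigvecs = np.linalg.eigh(rho - sigma)
    half_norm = 0.5 * np.abs(eigvals).sum()
    pos = eigvecs[:, eigvals > 0]
    Pi_plus = pos @ pos.conj().T                       # projection onto the positive eigenspace
    optimal_effect = np.real(np.trace((rho - sigma) @ Pi_plus))
    assert abs(half_norm - optimal_effect) < 1e-10      # the two expressions agree
    return half_norm

rho = np.array([[0.75, 0.10], [0.10, 0.25]])
sigma = np.diag([0.5, 0.5])
print(trace_distance(rho, sigma))
```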
The monotonicity of the trace norm under quantum channels (in fact positive maps)
implies the monotonicity of the trace distance as well. We summarize it in the following
theorem.
T (ρ, σ) = 1 ⇐⇒ ρσ = σρ = 0. (5.154)
Exercise 5.4.9. Let u ∈ D(A) be the maximally mixed state and ψ ∈ Pure(A) be a pure
state. Show that the trace distance between these two states is given by
T(u^A, ψ^A) = (m − 1)/m ,    (5.155)
where m := |A|.
where Πm ∈ Pos(A) is the projection to the subspace spanned by the m eigenvectors of the
m largest eigenvalues of ρ. By definition, Tr [ρΠm ] = ∥ρ∥(m) and ρ commutes with ρ(m) . In
the following exercise you use these properties to show that the trace distance between ρ
and ρ(m) is related to the Ky Fan norm.
Exercise 5.4.10. Show that
T( ρ, ρ^(m) ) = 1 − ∥ρ∥_(m) ,    (5.157)
where ∥·∥_(m) is the Ky Fan norm. Hint: Use the relations T(ρ, ρ^(m)) = Tr[(ρ^(m) − ρ)₊] and Tr[ρΠ_m] = ∥ρ∥_(m), and the fact that ρ commutes with ρ^(m).
In the following theorem we use the notation Dm (A) to denote the set of all density
matrices in D(A), whose rank is not greater than m.
Theorem 5.4.3. Using the same notations as above, the trace-distance of ρ to the set D_m(A) is given by
T( ρ, D_m(A) ) = T( ρ, ρ^(m) ) = 1 − ∥ρ∥_(m) .    (5.158)

Proof. Let σ ∈ D_m(A) and denote by Π_σ the projection onto its support, so that Tr[Π_σ] ⩽ m. Then,
∥ρ∥_(m) ⩾ Tr[Π_σ ρ]
        = Tr[Π_σ σ] + Tr[Π_σ(ρ − σ)]
        = 1 + Tr[ Π_σ( (ρ − σ)₊ − (ρ − σ)₋ ) ]
        ⩾ 1 − Tr[Π_σ(ρ − σ)₋]    (since Tr[Π_σ(ρ − σ)₊] ⩾ 0)
        ⩾ 1 − Tr[(ρ − σ)₋]    (since Π_σ ⩽ I^A)
        = 1 − T(ρ, σ) .    (5.159)
Since the inequality above holds for every σ ∈ D_m(A), we get
T(ρ, σ) ⩾ 1 − ∥ρ∥_(m) = T( ρ, ρ^(m) ) ,    (5.160)
where the equality follows from Exercise 5.4.10. On the other hand, since ρ^(m) ∈ D_m(A), by taking σ = ρ^(m) above we can achieve an equality. Hence,
T( ρ, D_m(A) ) = T( ρ, ρ^(m) ) = 1 − ∥ρ∥_(m) .    (5.161)
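The closed form (5.161) is immediate to evaluate: the distance to the nearest rank-m state is one minus the sum of the m largest eigenvalues of ρ. A minimal sketch (not from the book; NumPy assumed, function name illustrative):

```python
import numpy as np

def distance_to_rank_m(rho, m):
    # T(rho, D_m(A)) = 1 - ||rho||_(m), the Ky Fan norm (sum of the m largest eigenvalues), Eq. (5.161)
    eigvals = np.sort(np.linalg.eigvalsh(rho))[::-1]
    return 1.0 - eigvals[:m].sum()

rho = np.diag([0.5, 0.3, 0.15, 0.05])
print(distance_to_rank_m(rho, 2))   # 1 - (0.5 + 0.3) = 0.2
```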
For any ρ, σ ∈ D(A) define T^c(ρ, σ) := sup_X T( E(ρ), E(σ) ), where the supremum is over all classical systems X, POVM channels E ∈ CPTP(A → X), and the diagonal matrices E(ρ) and E(σ) are viewed as probability vectors.
Theorem 5.4.4. Using the same notations as above, for all ρ, σ ∈ D(A)
T^c(ρ, σ) = T(ρ, σ) := (1/2) ∥ρ − σ∥₁ .    (5.163)
Remark. The theorem above demonstrates that the quantum trace distance is the smallest
divergence that reduces to the classical trace distance on classical states.
Proof. From Section 5.3, particularly Theorem 5.3.1, it follows that for any ρ, σ ∈ D(A) we have T(ρ, σ) ⩾ T^c(ρ, σ), since T^c is the minimal quantum divergence that reduces to the classical trace distance when the inputs are restricted to classical states. To prove the converse inequality T(ρ, σ) ⩽ T^c(ρ, σ), let Π_± be the two projections to the positive and negative eigenspaces of ρ − σ, and let E ∈ CPTP(A → X) with |X| = 2 be its corresponding POVM channel; i.e., E(ω) = Tr[ωΠ₊]|0⟩⟨0| + Tr[ωΠ₋]|1⟩⟨1| for all ω ∈ D(A). Then, by definition,
E(ω) = Tr[ωΠ+ ]|0⟩⟨0| + Tr[ωΠ− ]|1⟩⟨1| for all ω ∈ D(A). Then, by definition,
1
T c ρ, σ ⩾ Tc E(ρ), E(σ) = (Tr[(ρ − σ)+ ] + Tr[(ρ − σ)− ]) = T (ρ, σ) . (5.164)
2
This completes the proof.
Then,
T( ρ^{XA}, σ^{XA} ) = (1/2) ∥ ∑_{x∈[m]} |x⟩⟨x| ⊗ (p_x ρ_x − q_x σ_x) ∥₁
                    = (1/2) ∑_{x∈[m]} ∥ p_x ρ_x − q_x σ_x ∥₁
                    = (1/2) ∑_{x∈[m]} ∥ p_x ρ_x − p_x σ_x + p_x σ_x − q_x σ_x ∥₁
                    ⩽ (1/2) ∑_{x∈[m]} ( ∥ p_x ρ_x − p_x σ_x ∥₁ + ∥ p_x σ_x − q_x σ_x ∥₁ )    (triangle inequality)
                    = ∑_{x∈[m]} p_x T(ρ_x, σ_x) + (1/2) ∑_{x∈[m]} |p_x − q_x| .    (5.167)
Hence,
T( ∑_{x∈[m]} p_x ρ_x^A , ∑_{x∈[m]} q_x σ_x^A ) = T( Tr_X[ρ^{XA}], Tr_X[σ^{XA}] ) ⩽ T( ρ^{XA}, σ^{XA} ) ,
where the inequality follows from the monotonicity of the trace distance under the partial trace.
Therefore, with Π being the optimal effect in (5.152) for the pair ( ∑_x p_x ρ_x , ∑_x q_x σ_x ),
T( ∑_{x∈[m]} p_x ρ_x , ∑_{x∈[m]} q_x σ_x ) = ∑_{x∈[m]} p_x Tr[ Π(ρ_x − σ_x) ] + ∑_{x∈[m]} (p_x − q_x) Tr[Πσ_x]
                                           ⩽ ∑_{x∈[m]} p_x T(ρ_x, σ_x) + ∑_{x∈[m]} (p_x − q_x)₊    (5.169)
                                           = ∑_{x∈[m]} p_x T(ρ_x, σ_x) + T(p, q) ,
where we used the fact that Tr[Πσx ] ⩽ 1 and the fact that px − qx ⩽ (px − qx )+ . This
completes the proof.
We conclude this subsection by discussing a nuanced yet crucial property of the trace
distance. This property is highly relevant to certain applications in quantum information,
though it is often overlooked. Consider ρ, σ ∈ D(A) and let us define ε := T (ρ, σ). If
ε is very small, it implies that ρ and σ are nearly identical states. This concept can be
articulated as follows: Decompose ρ − σ into positive and negative parts, written as ρ − σ =
(ρ − σ)+ − (ρ − σ)− . Then, define two states ω± := 1ε (ρ − σ)± . Given that ε = T (ρ, σ) =
Tr(ρ − σ)+ = Tr(ρ − σ)− , it follows that ω± are valid density matrices in D(A). Furthermore,
we can express:
ρ − σ = ε(ω+ − ω− ); . (5.170)
The importance of this equation lies in the fact that the matrix H := ω+ − ω− is bounded,
satisfying −I ⩽ H ⩽ I. Additionally, the equation ρ = σ + εH does not depend explicitly
on the underlying dimension |A|.
To further elucidate this point, consider the following straightforward example involving
the Schatten 2-norm (the norm induced by the Hilbert-Schmidt inner product). Let ρ_n := (1/n) I_n denote the n × n maximally mixed state. Observe that its 2-norm is calculated as follows:
∥ρ_n∥₂ := √(Tr[ρ_n²]) = 1/√n .    (5.171)
Consequently, as n approaches infinity, ∥ρn ∥2 tends towards zero, while the trace norm
∥ρn ∥1 = 1 for all n ∈ N.
Exercise 5.4.11. Using the same notations as above, show that if a set of Hermitian matrices
{Hn }, with each Hn ∈ Herm(Cn ), satisfies limn→∞ ∥Hn ∥1 = 0, then there exists a sequence
of positive numbers {εn } with a limit limn→∞ εn = 0 and a set of bounded matrices {Mn }
with −In ⩽ Mn ⩽ In such that Hn = εn Mn .
F(ρ, σ) := ∥√ρ √σ∥₁ = Tr|√ρ √σ| = Tr√( √σ ρ √σ ) .    (5.172)
where p := (p1 , . . . , pn )T , q = (q1 , . . . , qn )T , and F (p, q) as defined above denotes the fidelity
√ √
between the classical probability vectors p and q. If ρ = σ we get that F (ρ, ρ) = ∥ ρ ρ∥1 =
∥ρ∥1 = Tr[ρ] = 1. Moreover, for any ρ, σ ∈ D(A), the fidelity F (ρ, σ), cannot be greater
than one. This will follow trivially from Uhlmann’s theorem below, and can also be seen
from the following argument:
|√ρ √σ|² = √σ ρ √σ
         = σ − √σ (I − ρ) √σ    (using ρ = I − (I − ρ))
         ⩽ σ ⩽ I^A ,    (5.174)
where the inequality follows since √σ (I − ρ) √σ ⩾ 0. Therefore, |√ρ √σ| ⩽ I^A.
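The definition (5.172) is simple to evaluate numerically via an eigendecomposition of σ. The sketch below (not from the book, NumPy assumed, names illustrative) also checks the pure-state reduction F = |⟨ψ|ϕ⟩| discussed after Uhlmann's theorem below.

```python
import numpy as np

def fidelity(rho, sigma):
    # F(rho, sigma) = Tr sqrt( sqrt(sigma) rho sqrt(sigma) ), Eq. (5.172)
    vals_s, vecs_s = np.linalg.eigh(sigma)
    s = (vecs_s * np.sqrt(np.clip(vals_s, 0, None))) @ vecs_s.conj().T   # sqrt(sigma)
    vals = np.linalg.eigvalsh(s @ rho @ s)
    return np.sqrt(np.clip(vals, 0, None)).sum()

psi = np.array([1.0, 0.0])
phi = np.array([np.sqrt(0.5), np.sqrt(0.5)])
rho, sigma = np.outer(psi, psi.conj()), np.outer(phi, phi.conj())
# for two pure states the fidelity reduces to |<psi|phi>|
assert abs(fidelity(rho, sigma) - abs(psi.conj() @ phi)) < 1e-8
```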
Exercise 5.4.12. Let ρ, σ ∈ D(A).
1. Show that the fidelity is symmetric: F (ρ, σ) = F (σ, ρ). Hint: Use the fact that for any
complex matrix M , the matrix M ∗ M has the same non-zero eigenvalues as M M ∗ .
2. Show that if σ = |ψ⟩⟨ψ| is pure then F(ρ, σ) = √⟨ψ|ρ|ψ⟩ .
√ √
Exercise 5.4.13. Let ρ, σ ∈ D(A). Show that if λ is an eigenvalue of the matrix √ | ρ σ|
√
then λ2 is an eigenvalue of the non-Hermitian matrix ρσ. Hint: Let M = σρ σ and
N = ρσ and find a matrix η ⩾ 0 such that M = η −1 N η, where η −1 is the generalized inverse
of η.
Uhlmann’s Theorem
The last part in the exercise above also implies that if both ρ = |ψ⟩⟨ψ| and σ = |ϕ⟩⟨ϕ|
are pure, then the fidelity becomes the absolute value of the inner product between the two
states; i.e. F (ρ, σ) = |⟨ψ|ϕ⟩|. The following theorem by Uhlmann’s shows that this can be
extended to mixed states by considering all the possible purifications of ρ and σ.
Uhlmann’s Theorem
Theorem 5.4.6. Let ρ, σ ∈ D(A) be two density matrices, and let |ψ AB ⟩ and |ϕAC ⟩
be two purifications of ρA and σ A , respectively. Then,
Remark. We emphasize that the purifying systems B and C are not necessarily isomorphic. That is, we can have |B| ≠ |C|.
Proof. From Exercise 2.3.32 it follows that the purifications |ψ^{AB}⟩ and |ϕ^{AC}⟩ must have the form:
|ψ^{AB}⟩ = (√ρ ⊗ U^{Ã→B}) |Ω^{AÃ}⟩    and    |ϕ^{AC}⟩ = (√σ ⊗ W^{Ã→C}) |Ω^{AÃ}⟩ ,    (5.176)
Uhlmann’s theorem has numerous applications in quantum information, and we will use
it quite often later on in the book. The following corollary is an immediate consequence of
Uhlmann’s theorem. We leave its proof as an exercise.
Corollary 5.4.1. Let ρ, σ ∈ D(A). Then, F (ρ, σ) ⩽ 1 with equality if and only if ρ = σ.
The next consequence of Uhlmann’s theorem is the monotonicity of the fidelity under
quantum channels.
Proof. Let |ψ AC ⟩ and |ϕAC ⟩ be optimal purifications of ρ and σ such that the fidelity
F (ρ, σ) = |⟨ψ AC |ϕAC ⟩| (i.e. we are using Uhlmann’s Theorem). Now, from Stinespring
dilation theorem there exists an isometry V : A → BE such that
Denote by |ψ̃ BEC ⟩ := V A→BE ⊗ I C |ψ AC ⟩ and by |ϕ̃BEC ⟩ := V A→BE ⊗ I C |ϕAC ⟩. There-
fore the above equation implies that |ψ̃ BEC ⟩ and |ϕ̃BEC ⟩ are purifications of E(ρ) and E(σ).
We therefore get from Uhlmann’s Theorem that
Note that since the partial trace is a quantum channel it follows that for any two bipartite
states ρ, σ ∈ D(AB) we have
Remark. Note that from the corollary above it follows in particular that
F( ∑_{x∈[m]} p_x ρ_x , ∑_{x∈[m]} p_x σ_x ) ⩾ ∑_{x∈[m]} p_x F(ρ_x, σ_x) .    (5.183)
where C is some m-dimensional system. Note that the two states above are purifications of ∑_{x∈[m]} p_x ρ_x^A and ∑_{x∈[m]} q_x σ_x^A, respectively. Therefore, we must have
F( ∑_{x∈[m]} p_x ρ_x^A , ∑_{x∈[m]} q_x σ_x^A ) ⩾ |⟨ψ̃^{ABC}|ϕ̃^{ABC}⟩|
                                              = ∑_{x∈[m]} √(p_x q_x) ⟨ψ_x^{AB}|ϕ_x^{AB}⟩    (by (5.185))
                                              = ∑_{x∈[m]} √(p_x q_x) F(ρ_x, σ_x) .    (5.186)
Exercise 5.4.15. The square fidelity on Prob(n) × Prob(n) is defined for all p, q ∈ Prob(n) as
F(p, q)² = ( ∑_{x∈[n]} √(p_x q_x) )² = p · q + ∑_{x,y∈[n], x≠y} √(p_x q_x p_y q_y) .    (5.187)
Show that the square of the fidelity is concave in each of its arguments; that is, show that for any k ∈ N, {q_z}_{z∈[k]} ⊂ Prob(n), and t ∈ Prob(k) we have
∑_{z∈[k]} t_z F(p, q_z)² ⩽ F( p , ∑_{z∈[k]} t_z q_z )² .    (5.188)
Similarly, show that the square fidelity is concave with respect to the first argument.
Exercise 5.4.16. Let ρ, σ ∈ D(A) and let τ, ω ∈ D(B). Show that
F( ρ^A ⊗ τ^B , σ^A ⊗ ω^B ) = F(ρ^A, σ^A) F(τ^B, ω^B) .    (5.189)
where the infimum is over all classical systems X and all POVM channels E ∈ CPTP(A → X). By applying Theorem 5.3.1 to the classical divergence 1 − F(p, q) we get that any function f that satisfies the same monotonicity property (5.178) as the fidelity and that reduces to the fidelity on classical states must satisfy f(ρ, σ) ⩽ F(ρ, σ) for all ρ, σ ∈ D(A). Remarkably, Uhlmann's theorem implies that the fidelity in fact equals this maximal quantum extension.
Optimality
Corollary 5.4.4. For any ρ, σ ∈ D(A)
Proof. As discussed above, the inequality F (ρ, σ) ⩾ F (ρ, σ) follows by applying Theorem 6.4
to the classical divergence 1 − F (p, q). To prove the converse, let {|x⟩⟨x|}x∈[m] be the
orthonormal eigenbasis of the Hermitian matrix
Λ = σ^{−1/2} ( σ^{1/2} ρ σ^{1/2} )^{1/2} σ^{−1/2}    (5.192)
such that Λ|x⟩ = λx |x⟩, with {λx }x∈[m] being the eigenvalues of Λ. The key reason for this
choice of basis is that the matrix Λ satisfies
ΛσΛ = ρ . (5.193)
Since the function 1 − F (p, q) is a classical divergence, we can also define its maximal
quantum extension. This maximal extension corresponds to the minimal quantum extension
of the fidelity. The minimal quantum extension of the classical fidelity is given by
where the supremum is over all classical systems X, and over all p, q ∈ D(X) for which
there exists a channel E ∈ CPTP(X → A) such that ρ = E(p) and σ = E(q) (depending on
the context, we are using the notation p, q to indicate either diagonal density matrices in
D(A) or probability vectors in Prob(n) ).
Proof. Define
D_⋆(p∥q) := 1 − F(p, q) = ∑_{x∈[n]} ( q_x − √(p_x q_x) ) = ∑_{x∈supp(q)} q_x ( 1 − √(p_x/q_x) ) .    (5.197)
Hence,
F(ρ, σ) = 1 − D_f(ρ∥σ) = Tr[ σ̃ ( σ̃^{−1/2} ρ̃ σ̃^{−1/2} )^{1/2} ] .    (5.199)
5.4.4 The Relation Between the Trace Distance and the Fidelity
The trace distance and the fidelity satisfy the following inequalities.
Theorem 5.4.8. Let ρ, σ ∈ D(A), F be the fidelity, and T the trace distance. Then,
1 − F(ρ, σ) ⩽ T(ρ, σ) ⩽ √(1 − F(ρ, σ)²) .    (5.202)
This relation reveals that if the fidelity is close to one then the trace distance is close to
zero, and if the fidelity is close to zero then the trace distance is close to one.
Proof. We first prove the upper bound. Let ψ^{AB} and ϕ^{AB} be purifications of ρ^A and σ^A such that F(ρ^A, σ^A) = |⟨ψ^{AB}|ϕ^{AB}⟩|. Such purifications exist due to Uhlmann's theorem. We then have, from the monotonicity of the trace distance under the partial trace,
T(ρ^A, σ^A) ⩽ T(ψ^{AB}, ϕ^{AB})
            = √( 1 − |⟨ψ^{AB}|ϕ^{AB}⟩|² )    (Exercise 5.4.4)
            = √( 1 − F(ρ^A, σ^A)² ) ,    (5.203)
where the last equality follows from the definition of ψ^{AB} and ϕ^{AB}.
To get the lower bound of (5.202), we start by observing from Corollary 5.4.4 that there exists a POVM {Λ_x}_{x∈[n]} such that
F(ρ, σ) = ∑_{x∈[n]} √(p_x q_x)    where    p_x := Tr[Λ_x ρ] ,  q_x := Tr[Λ_x σ] .    (5.204)
Hence, from the equality 2√(p_x q_x) = p_x + q_x − (√p_x − √q_x)² it follows that
F(ρ, σ) = (1/2) ∑_{x∈[n]} ( p_x + q_x − (√p_x − √q_x)² ) = 1 − (1/2) ∑_{x∈[n]} (√p_x − √q_x)² .    (5.205)
Therefore,
1 − F(ρ, σ) = (1/2) ∑_{x∈[n]} (√p_x − √q_x)²
            ⩽ (1/2) ∑_{x∈[n]} |√p_x − √q_x| (√p_x + √q_x)    (since |√p_x − √q_x| ⩽ √p_x + √q_x)
            = (1/2) ∑_{x∈[n]} |p_x − q_x| = T(p, q) .    (5.206)
Combining this with T(p, q) ⩽ T(ρ, σ), which follows from the DPI of the trace distance, gives the lower bound in (5.202).
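The two Fuchs–van de Graaf-type bounds in (5.202) are easy to test on random states. The sketch below is not from the book; it assumes NumPy, and the helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
def rand_state(d):
    m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = m @ m.conj().T
    return rho / np.trace(rho)

def trace_distance(rho, sigma):
    return 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()

def fidelity(rho, sigma):
    vals_s, vecs_s = np.linalg.eigh(sigma)
    s = (vecs_s * np.sqrt(np.clip(vals_s, 0, None))) @ vecs_s.conj().T
    return np.sqrt(np.clip(np.linalg.eigvalsh(s @ rho @ s), 0, None)).sum()

for _ in range(100):
    rho, sigma = rand_state(3), rand_state(3)
    T, F = trace_distance(rho, sigma), fidelity(rho, sigma)
    assert 1 - F - 1e-9 <= T <= np.sqrt(1 - F**2) + 1e-9   # Eq. (5.202)
```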
Exercise 5.4.19.
1. Show that if two states ρ, σ ∈ D(A) are ε-close in trace distance then they are ε-close in fidelity.
2. Show that if two states ρ, σ ∈ D(A) are ε-close in fidelity, then they are √(2ε)-close in trace distance.
The relation between the trace distance and the fidelity can also be used to derive some
additional bounds on the trace distance. For example, consider a pure state ρAB whose
marginal (mixed) state ρA is ε-close to a pure state ψ A . Since ψ A is pure, this means that
the marginal state ρA is itself close to being pure, and this in turn means that the pure state
ρ^{AB} should be close to a product state ψ^A ⊗ ρ^B. We make this intuition rigorous in the
following lemma.
Lemma 5.4.2. Let ρ ∈ Pure(AB) be a pure state and let ψ ∈ Pure(A) be another pure state. If the marginal of ρ^{AB} satisfies T(ρ^A, ψ^A) ⩽ ε then
T( ρ^{AB} , ψ^A ⊗ ρ^B ) ⩽ 2√(2ε) .    (5.208)

= T( ρ^{AB} , ψ^A ⊗ ϕ^B ) + T( ϕ^B , ρ^B )
⩽ √(2ε) + √(2ε) = 2√(2ε) .    (5.213)
This completes the proof.
Exercise 5.4.20. Using the same notations as in the theorem above, suppose F (ρA , ψ A ) ⩾
1 − ε. What is the best lower bound that you can find for F (ρAB , ψ A ⊗ ρB )?
Exercise 5.4.21. Prove that the state defined in (5.215) is indeed a purification of ρ̃A .
where {Ex }x∈[m] are trace non-increasing CP maps, and {Ex (ρ)}x∈[m] are sub-normalized
states. These states provide both the information about the probability px := Tr[Ex (ρ)] that
an outcome x ∈ [m] occurs during the quantum measurement, and the post-measurement
state p1x Ex (ρ).
We previously saw that distance measures for normalized states are monotonic under
quantum channels and satisfy the DPI, a crucial aspect in applications. Quantum channels
map normalized states to normalized states, while trace non-increasing (TNI) CP maps,
including CPTP maps, take sub-normalized states to subnormalized states. Therefore, it’s
beneficial to define a distance measure for subnormalized states that is monotonic under
TNI-CP maps. We denote by CP⩽ (A → B) the set of all TNI maps in CP(A → B).
In section 5.3, we explored extending divergences from the classical to the quantum
domain. We now apply a similar approach to extend divergences from normalized to sub-
normalized states. However, unlike classical-to-quantum extensions, we will see that there
is no analogous ‘minimal’ extension from normalized to sub-normalized states. Thus, we
begin by introducing the maximal extension of a quantum divergence to the sub-normalized
domain.
where the infimum is over all systems R, and all density matrices ρ̃, σ̃ ∈ D(R) for which there exists E ∈ CP⩽(R → A) such that ρ = E(ρ̃) and σ = E(σ̃).
Remark. Note that earlier we used the same notation D to denote the maximal extension of
a classical divergence to a quantum one. The bar symbol over D in our notations will always
indicate maximal extensions from one domain to a larger one, whereas the domain of a given
extension should be clear from the context.
The maximal extension D̄ has the following three properties:
The last property justifies the name for D̄ as the maximal extension of D to subnormalized states.
Exercise 5.5.1. Prove the three properties above using the same techniques that were used to prove Theorem 5.3.1. Hint: As you follow the same lines used in the proof of Theorem 5.3.1, replace 'classical states' with 'normalized quantum states' and 'quantum states' with 'sub-normalized quantum states'.
Remarkably, the maximal extension has the following closed formula.
Closed Formula
Theorem 5.5.1. Let D be a quantum divergence and D̄ be its maximal extension to sub-normalized states as defined in (5.220). For any pair of sub-normalized states ρ, σ ∈ D⩽(A),
D̄(ρ∥σ) = D( ρ ⊕ (1 − Tr[ρ]) ∥ σ ⊕ (1 − Tr[σ]) ) .    (5.225)
Proof. Let ρ̃, σ̃ ∈ D(R) and E ∈ CP⩽(R → A) be a TNI-CP map such that ρ = E(ρ̃) and σ = E(σ̃). Moreover, define N ∈ CPTP(R → A ⊕ C) as
N(ω) := E(ω) ⊕ ( Tr[ω] − Tr[E(ω)] )    ∀ ω ∈ L(R) .    (5.226)
Then, since N is a CPTP map,
D(ρ̃∥σ̃) ⩾ D( N(ρ̃) ∥ N(σ̃) )
        = D( E(ρ̃) ⊕ (1 − Tr[E(ρ̃)]) ∥ E(σ̃) ⊕ (1 − Tr[E(σ̃)]) )    (5.227)
        = D( ρ ⊕ (1 − Tr[ρ]) ∥ σ ⊕ (1 − Tr[σ]) ) .
Since the above inequality holds for all such ρ̃, σ̃, E, we must have that D̄(ρ∥σ) is no smaller than the right-hand side of (5.225). To prove the converse inequality, take R = A ⊕ C, ρ̃ = ρ ⊕ (1 − Tr[ρ]), σ̃ = σ ⊕ (1 − Tr[σ]), and E(·) := P(·)P^†, where P is the projection to the subspace A in R. Then, ρ = E(ρ̃) and σ = E(σ̃), so that by definition (see (5.220)) we must have D̄(ρ∥σ) ⩽ D(ρ̃∥σ̃). Together with the previous inequality, this completes the proof of the equality in (5.225).
If D is a quantum divergence, its minimal extension D̲ can be defined in analogy with (5.91) as
D̲(ρ∥σ) := sup D( E(ρ) ∥ E(σ) )    ∀ ρ, σ ∈ D⩽(A) ,    (5.228)
where the supremum is over all systems R and all E ∈ CP⩽(A → R) such that E(ρ) and E(σ) are normalized states. However, such an E does not exist if either ρ or σ has trace strictly smaller than one. Hence, the minimal extension of D must satisfy
D̲(ρ∥σ) = 0    (5.229)
for all subnormalized states ρ, σ ∈ D⩽(A) with either Tr[ρ] < 1 or Tr[σ] < 1. Therefore,
this extension is rather pathological and not useful in applications. The following corollary
applies specifically to the case where D functions as both a divergence and a metric.
Corollary 5.5.1. Let D be a quantum divergence that is also a metric. Then, its
maximal extension to sub-normalized states, D, is also a metric.
Proof. We need to show that D̄ is symmetric and satisfies the triangle inequality. To see it, let ρ, σ, ω ∈ D⩽(A). The symmetry of D̄ follows from the symmetry of D:
D̄(ρ∥σ) = D( ρ ⊕ (1 − Tr[ρ]) ∥ σ ⊕ (1 − Tr[σ]) )
        = D( σ ⊕ (1 − Tr[σ]) ∥ ρ ⊕ (1 − Tr[ρ]) )    (D is symmetric)    (5.230)
        = D̄(σ∥ρ) .
5.5.1 Examples
The Generalized Trace Distance
The maximal extension of the trace distance to subnormalized states is known as the genar-
alized trace distance. From Theorem 5.5.1 we get that the generalized trace distance has the
following simple form.
Corollary 5.5.2. The generalized trace distance can be expressed for any ρ, σ ∈ D⩽(A) as
T̄(ρ, σ) = (1/2) ∥ρ − σ∥₁ + (1/2) |Tr[ρ − σ]| .    (5.232)
Exercise 5.5.2. Prove the corollary above using the formula given in Theorem 5.5.1 when
D is replaced by the trace distance.
To see why, set a := Tr[(ρ − σ)₊] and b := Tr[(ρ − σ)₋], and use the relation max{a, b} = (1/2)( a + b + |a − b| ). The formula above is consistent with the fact that T̄ is the largest extension of the trace distance to sub-normalized states that satisfies the monotonicity property under TNI-CP maps.
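The closed form (5.232), together with the max{a, b} identity just mentioned, is simple to verify numerically. The sketch below is not from the book; it assumes NumPy, and the helper name is illustrative.

```python
import numpy as np

def generalized_trace_distance(rho, sigma):
    # Eq. (5.232): (1/2)||rho - sigma||_1 + (1/2)|Tr[rho - sigma]|, for sub-normalized states
    vals = np.linalg.eigvalsh(rho - sigma)
    return 0.5 * np.abs(vals).sum() + 0.5 * abs(vals.sum())

rho   = 0.8 * np.diag([0.6, 0.4])   # sub-normalized: trace 0.8
sigma = 0.5 * np.diag([0.5, 0.5])   # sub-normalized: trace 0.5
vals = np.linalg.eigvalsh(rho - sigma)
# agrees with max{ Tr(rho - sigma)_+ , Tr(rho - sigma)_- }
assert abs(generalized_trace_distance(rho, sigma)
           - max(vals[vals > 0].sum(), -vals[vals < 0].sum())) < 1e-12
```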
Exercise 5.5.3. Show that for any ρ, σ ∈ D⩽(A) the function f(ρ, σ) = (1/2)∥ρ − σ∥₁ is also an extension of the trace distance to sub-normalized states that satisfies the exact same properties satisfied by T̄ except for the optimality. Give an example showing that f(ρ, σ) can be strictly smaller than T̄(ρ, σ).
Exercise 5.5.4. Show that for two sub-normalized pure states ψ, ϕ ∈ D⩽(A) the generalized trace distance can be expressed as
T̄(ψ, ϕ) = √( (1/4)(Tr[ψ + ϕ])² − |⟨ψ|ϕ⟩|² ) + (1/2) |Tr[ψ − ϕ]| .    (5.234)
Hint: Use similar techniques as in Exercise 5.4.4.
Lemma 5.5.1. Using the notations above, if Tr[ρΛ] ⩾ 1 − ε then (1/2)∥ρ − ρ̃∥₁ ⩽ √ε.
Remark. The gentle operator lemma’s extension to cases where ρ̃ = GρG∗ , with G ∈ L(A, B)
being an arbitrary element of a generalized measurement and Λ ≡ G∗ G ∈ Eff(A), may seem
promising. However, without imposing further constraints on G, such an extension could
result in non-informative bounds. Consider, for instance, the scenario where G is a unitary
matrix, making Λ = G∗ G = I A . Here, Tr[Λρ] = 1 ⩾ 1 − ε for any ε ⩾ 0. But, if we
choose ρ = |0⟩⟨0| and a unitary G such that G|0⟩ = |1⟩, it follows that 12 ∥ρ − ρ̃∥1 = 1.
This example illustrates that extending the gentle operator lemma to encompass arbitrary
elements of generalized measurements is impractical without specific additional constraints
on G.
Proof. Let |ψ^{AÃ}⟩ = (√ρ ⊗ I^Ã)|Ω^{AÃ}⟩ and |ψ̃^{AÃ}⟩ := (√Λ ⊗ I^Ã)|ψ^{AÃ}⟩ be purifications of ρ^A and ρ̃^A, respectively. Denote t := Tr[ρΛ] ⩾ 1 − ε and observe that
⟨ψ^{AÃ}|ψ̃^{AÃ}⟩ = ⟨ψ^{AÃ}| √Λ ⊗ I^Ã |ψ^{AÃ}⟩
                ⩾ ⟨ψ^{AÃ}| Λ ⊗ I^Ã |ψ^{AÃ}⟩    (since √Λ ⩾ Λ)
                = Tr[ρΛ] = t ,    (5.235)
where the last equality follows from |ψ^{AÃ}⟩ = (√ρ ⊗ I^Ã)|Ω^{AÃ}⟩.
From (5.232) we get that both sides of the inequality above contain the same term (1/2)|Tr[ρ − ρ̃]| = (1/2)|Tr[ψ − ψ̃]|, so we can cancel it. Combining this with Exercise 5.5.4 we conclude that
(1/2)∥ρ − ρ̃∥₁ ⩽ √( (1/4)(Tr[ψ + ψ̃])² − |⟨ψ|ψ̃⟩|² )
              ⩽ √( (1/4)(1 + t)² − t² )    (by (5.235))
              ⩽ √(1 − t)    (Exercise 5.5.5)
              ⩽ √ε ,    (5.237)
where the last inequality follows from t ⩾ 1 − ε.
Exercise 5.5.6. Show that one can use the gentle measurement lemma (Lemma 5.4.3) to
prove a slightly weaker version of the gentle operator lemma (Lemma 5.5.1). Use only
Lemma 5.4.3 and the triangle inequality of the trace norm to show that
Lemma 5.4.3 and the triangle inequality of the trace norm to show that
(1/2)∥ρ − ρ̃∥₁ ⩽ √ε + (1/2)ε .    (5.238)
Hint: Set ρ′ := √Λ ρ √Λ / Tr[Λρ] and write ρ − ρ̃ = ρ − ρ′ + ρ′ − ρ̃.
We can use the techniques developed above to extend the fidelity to sub-normalized states.
However, since the fidelity achieves its maximum for identical states, the infimum of (5.220)
will be replaced with a supremum.
where the supremum is over all systems R, and all density matrices ρ̃, σ̃ ∈ D(R) for
which there exists E ∈ CP⩽ (R → A) with the property that ρ = E(ρ̃) and σ = E(σ̃).
Since 1 − F (ρ, σ) is a quantum divergence we can use Theorem 5.5.1 to get a closed
formula for the generalized fidelity.
Remark. The formula (5.240) for the generalized fidelity reveals that we have F̄(ρ, σ) = ∥√ρ √σ∥₁ even if only one of the states is normalized. Note that for the trace distance, T̄(ρ, σ) = T(ρ, σ) only if both states are normalized.
The generalized fidelity has the following properties:
The last property above indicates that the generalized fidelity is the minimal extension of
the fidelity to sub-normalized states.
Exercise 5.5.8. Show that for two sub-normalized pure states ψ ∈ D⩽(A) and ϕ ∈ D⩽(A) the generalized fidelity is given by
F̄(ψ, ϕ) = |⟨ψ|ϕ⟩| + √( (1 − ⟨ψ|ψ⟩)(1 − ⟨ϕ|ϕ⟩) ) .    (5.241)
where the maximum is over all CPTP maps V(·) := V (·)V ∗ , where V : B → C is an
isometry.
Tr[V^{B→C}(ψ^{AB})] = Tr[ψ^{AB}] = Tr[ρ^A] ,    (5.243)

and similarly Tr[V^{B→C}(ϕ^{AB})] = Tr[σ^A]. Therefore, it is sufficient to show that

∥√ρ √σ∥₁ = max_{V^{B→C}} ∥ √(V^{B→C}(ψ^{AB})) √(ϕ^{AC}) ∥₁ .    (5.244)

Since V^{B→C}(ψ^{AB}) and ϕ^{AC} are rank-one sub-normalized states we have (Exercise 5.5.9)

∥ √(V^{B→C}(ψ^{AB})) √(ϕ^{AC}) ∥₁ = |⟨ψ^{AB}| I^A ⊗ V^∗ |ϕ^{AC}⟩| .    (5.245)
Hence, the rest of the proof follows the exact same lines as the proof of Uhlmann's theorem (Theorem 5.4.6). In particular, note that all the steps in (5.177) hold even if ρ and σ are sub-normalized.
Exercise 5.5.9. Prove the equality in Eq. (5.245).
Considering this close relationship between trace distance and fidelity when applied to pure
states, we will explore all possible extensions of the trace distance from pure states to mixed
states. In this context, we define the purified distance as the maximal extension among all
such extensions.
Definition 5.6.1. Let T be the trace distance. The purified distance is defined for
all ρ, σ ∈ D(A) as
P(ρ, σ) := inf { T(ψ, ϕ) : ρ = E(ψ), σ = E(ϕ), ψ, ϕ ∈ Pure(R), E ∈ CPTP(R → A) } ,    (5.247)
where the infimum is also over all systems R.
Remarks:
1. Observe that the extension of the trace distance in the definition above is reminiscent of the maximal quantum extension of the trace distance discussed in the previous sections. Later on we will develop a framework to extend certain functions (specifically
resource monotones) from one domain to a larger one. This framework is very general
and all extensions discussed in this chapter (including the above extension of the trace
distance, i.e. the purified distance) are just specific applications of the framework.
2. We will see below that the purified distance has a closed formula. Historically, this closed formula has been used as its definition. However, the definition above emphasizes its operational meaning as the largest mixed-state extension of the trace distance (see Theorem 5.6.1 below).
3. The justification for the name “purified distance” will become clear from the properties
discussed below.
We start by showing that the purified distance is an optimal divergence.
Theorem 5.6.1. The purified distance is a quantum divergence that reduces to the trace distance on pure states. Moreover, if D is another quantum divergence that reduces to the trace distance on pure states, then for any ρ, σ ∈ D(A) we have D(ρ, σ) ⩽ P(ρ, σ).
The proof follows very similar lines as in the proof of Theorem 6.4 and is left as an
exercise.
Exercise 5.6.1. Prove Theorem 5.6.1. Hint: Adopt the methodology used in Theorem 6.4
related to D. In this process, substitute each occurrence of a classical state on system X with
a pure state on system R.
The upcoming lemma demonstrates that the purified distance is derived from a purifica-
tion process, which justifies its name. We will utilize this lemma to derive a closed formula
for the purified distance.
Lemma 5.6.1. Let P be the purified distance and T the trace distance. Then, for
all ρ, σ ∈ D(A)
P(ρ^A, σ^A) = inf_{ψ,ϕ} T(ψ^{AB}, ϕ^{AB}) ,    (5.249)

where the infimum is over all purifications ψ^{AB} of ρ^A and ϕ^{AB} of σ^A.
Proof. Let ψ^{AB} and ϕ^{AB} be purifications of ρ^A and σ^A, respectively, and denote E^{AB→A} := Tr_B. By definition, E^{AB→A}(ψ^{AB}) = ρ^A and E^{AB→A}(ϕ^{AB}) = σ^A, so that ψ^{AB} and ϕ^{AB} satisfy the conditions in (5.247) with R := AB. Therefore, P(ρ^A, σ^A) cannot be greater than the right-hand side of (5.249). To get the other direction, recall the definition (5.247) and let ρ^A = E(ψ^R) and σ^A = E(ϕ^R) for some ψ, ϕ ∈ Pure(R) and E ∈ CPTP(R → A). Let V^{R→AB} ∈ CPTP(R → AB) be the isometry purifying E^{R→A}. Therefore, ρ^A = Tr_B[V^{R→AB}(ψ^R)] and σ^A = Tr_B[V^{R→AB}(ϕ^R)]. Finally, since the trace distance is invariant under isometries, denoting χ^{AB} := V^{R→AB}(ψ^R) and φ^{AB} := V^{R→AB}(ϕ^R) we get

T(ψ^R, ϕ^R) = T(χ^{AB}, φ^{AB}) ⩾ inf_{ψ′,ϕ′} T(ψ′^{AB}, ϕ′^{AB}) ,

where the infimum is over all purifications ψ′^{AB} and ϕ′^{AB} of ρ^A and σ^A. Hence, since ψ^R and ϕ^R were arbitrary pure states that satisfy the conditions in (5.247), we conclude that P(ρ^A, σ^A) is no smaller than the right-hand side of (5.249). This completes the proof.
Closed Formula
Theorem 5.6.2. Let P be the purified distance. Then, for all ρ, σ ∈ D(A)
P(ρ, σ) = √(1 − F(ρ, σ)²) .    (5.251)
The maximal extension of the purified distance from density matrices to subnormalized states follows trivially from Theorem 5.5.1. We therefore extend the definition of the purified distance to subnormalized states in the following way.
Remark. The purified distance on normalized states has been defined earlier as the maximal
extension of the trace distance from pure states to mixed states. Therefore, the purified
distance on subnormalized states can be viewed as the maximal extension of the trace dis-
tance from pure states to mixed subnormalized states. Moreover, observe that the purified
distance can also be expressed as:
P(ρ, σ) := √(1 − F(ρ̃, σ̃)²) ,    (5.254)
where ρ̃ := ρ ⊕ (1 − Tr[ρ]) and σ̃ := σ ⊕ (1 − Tr[σ]).
Finally, we show that the purified distance is a metric.
Theorem 5.6.3. The purified distance is a metric on the set of subnormalized states.
Proof. Since F (ρ, σ) ⩽ 1 the purified distance is non-negative. Since F (ρ, σ) = 1 if and only
if ρ = σ the purified distance P (ρ, σ) = 0 if and only if ρ = σ. Since F is symmetric also
P is symmetric. It is therefore left to show that the purified distance satisfies the triangle
inequality.
Let ρ, σ, ω ∈ D⩽ (A) and set ρ̃ := ρ ⊕ (1 − Tr[ρ]), σ̃ := σ ⊕ (1 − Tr[σ]), and ω̃ :=
ω ⊕ (1 − Tr[ω]). Moreover, let ψ, ϕ, φ ∈ D(BB̃) be purifications of ρ̃, σ̃, and ω̃ such that F(ρ̃, ω̃) = F(ψ, φ) and F(ω̃, σ̃) = F(φ, ϕ). Such purifications exist due to Uhlmann's theorem. Moreover, note that from Uhlmann's theorem we also have F(ρ̃, σ̃) ⩾ F(ψ, ϕ).
Hence,

P(ρ, ω) + P(ω, σ) = √(1 − F(ρ̃, ω̃)²) + √(1 − F(ω̃, σ̃)²)
  = √(1 − F(ψ, φ)²) + √(1 − F(φ, ϕ)²)
(5.134) →   = T(ψ, φ) + T(φ, ϕ)
Triangle inequality of T →   ⩾ T(ψ, ϕ)    (5.255)
(5.134) →   = √(1 − F(ψ, ϕ)²)
F(ρ̃, σ̃) ⩾ F(ψ, ϕ) →   ⩾ √(1 − F(ρ̃, σ̃)²)
  = P(ρ, σ) .
This completes the proof.
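The closed formula (5.251) and the metric property are easy to verify numerically. The sketch below assumes NumPy; all helper names are ours. It computes the fidelity F(ρ, σ) = ∥√ρ √σ∥₁, the trace distance, and the purified distance for random states, and checks both the bound T ⩽ P and the triangle inequality of Theorem 5.6.3.

```python
import numpy as np

def rand_dm(d, seed):
    rng = np.random.default_rng(seed)
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def psd_sqrt(m):
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.conj().T

def trace_dist(rho, sigma):
    return 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()

def fidelity(rho, sigma):
    s = psd_sqrt(rho)
    return np.trace(psd_sqrt(s @ sigma @ s)).real     # F(rho, sigma) = ||sqrt(rho) sqrt(sigma)||_1

def purified_dist(rho, sigma):
    return np.sqrt(max(0.0, 1.0 - fidelity(rho, sigma) ** 2))   # Eq. (5.251)

d = 3
rho, sigma, omega = rand_dm(d, 1), rand_dm(d, 2), rand_dm(d, 3)

# The trace distance never exceeds the purified distance ...
print(trace_dist(rho, sigma) <= purified_dist(rho, sigma) + 1e-10)
# ... and the purified distance satisfies the triangle inequality (Theorem 5.6.3).
print(purified_dist(rho, sigma)
      <= purified_dist(rho, omega) + purified_dist(omega, sigma) + 1e-10)
```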
Note that the purified distance is monotonic under TNI-CP maps. That is, for every
map E ∈ CP⩽ (A → B), and any two subnormalized states ρ, σ ∈ D⩽ (A) we have
P(E(ρ), E(σ)) ⩽ P(ρ, σ) .    (5.256)
This follows trivially from Theorem 5.5.1 and the monotonicity property in (5.223) (or equiv-
alently from the monotonicity of the generalized fidelity). Moreover, note that from Theo-
rem 5.5.2 it follows that for any ρ, σ ∈ D⩽ (A) and any purification ψ AB of ρA , there exists
a purification ϕAB of σ A such that
P (ρA , σ A ) = P (ψ AB , ϕAB ) . (5.257)
We end this subsection by showing that the purified distance is bounded by the generalized
trace distance, T .
6.1 Entropy
Entropy is pivotal in numerous fields, including statistical mechanics, thermodynamics, in-
formation theory, black hole physics, cosmology, chemistry, and even economics. This wide
range of applications has led to diverse interpretations of entropy. In thermodynamics, it’s
seen as a measure of energy dispersal at a specific temperature. In contrast, information the-
ory views it as a rate of compression. Other perspectives, explored extensively in literature,
link entropy to disorder, chaos, system randomness, and the concept of time’s arrow. These
varying attributes and contexts give rise to different measures of entropy, such as Gibbs
and Boltzmann entropy, Tsallis entropies, Rényi entropies, and von Neumann and Shannon
entropies, along with other entropy functions like molar entropy, entropy of mixing, and loop
entropy.
The multifaceted nature of entropy calls for a systematic and unifying approach, where
entropy is defined rigorously and context-independently. This requires identifying common
characteristics across all forms of entropy. One such universal trait is uncertainty, whether
it’s about the state of a physical system or the output of a compression scheme. In various
contexts, this uncertainty also encompasses concepts like disorder and randomness. For
instance, uncertainty about a system’s state correlates with its disorder level.
In Chapter 4, especially in Sec. 4.1, we delved into the role of majorization in defining
uncertainty. We employed three different methodologies – axiomatic, constructive, and op-
erational – to determine that every measure of uncertainty should inherently be a Schur
concave function. Consequently, it is reasonable to anticipate that entropy functions will
exhibit monotonic behavior under majorization.
Besides uncertainty, entropy embodies other attributes. A second key feature, related to
the second law of thermodynamics – especially the Clausius and Kelvin-Planck statements
– involves cyclic processes where a system undergoes a thermodynamic transition while all
other systems, including the environment and heat baths, return to their original state.
Recent developments in quantum information’s approach to small-scale thermodynamics,
as referenced in [29], categorize these as catalytic processes. Consider a thermodynamical
evolution where a physical system A in state ρA transitions into system B in state σ B . The
encompassing thermal machine, including heat baths, environment, etc., can be represented
as an additional system C in state τ C . Thus, for cyclic processes, the thermodynamic
transition can be described as:
ρA ⊗ τ C → σ B ⊗ τ C . (6.1)
In this framework, the second law asserts not just that system A's entropy is no greater than that of system B, but that this holds because the entropy of the combined state ρ^A ⊗ τ^C does not decrease in such a thermodynamic cyclic process in which τ^C is preserved. If entropy is measured with an additive function (under tensor products), then
the entropy of ρA being no greater than that of σ B implies the same relationship between
ρA ⊗ τ C and σ B ⊗ τ C . Thus, we define entropies as additive measures of uncertainty.
Consider a function

H : ⋃_{n∈ℕ} Prob(n) → ℝ    (6.2)

that maps probability vectors in all finite dimensions to the real numbers.
Entropy
Definition 6.1.1. The function H as given in (6.2) is called an entropy if it is not equal to the constant zero function and it satisfies the following two axioms:

1. Monotonicity: for any p ∈ Prob(n) and q ∈ Prob(m), if p ≻ q then H(p) ⩽ H(q).

2. Additivity: for any p ∈ Prob(n) and q ∈ Prob(m), H(p ⊗ q) = H(p) + H(q).
The first axiom ensures that an entropy quantifies uncertainty. In Sec. 4.1 we arrived
at the definition of majorization from a game of chance, indicating that p ≻ q if q is more
uncertain than p. Note, however, that we extend here the definition of majorization to vectors that are not necessarily of the same dimension. This is done by padding the lower-dimensional vector with zeros so that the two vectors have the same dimension. This means in particular that for any p ∈ Prob(n), any entropy H satisfies H(p ⊕ 0) = H(p). Note also that this axiom implies that H is Schur concave.
The additivity axiom distinguishes entropy functions from arbitrary measures of uncer-
tainty. For example, in Sec. 4.1.3 we encountered several Schur concave functions, such as the symmetric elementary functions (see (4.49)), that are in general not additive. Therefore, such functions cannot be entropies. The additivity property is consistent with the extensivity property of entropy in thermodynamics, and particularly with the monotonicity of entropy under cyclic thermodynamical processes. As mentioned above, in such cycles, all degrees of freedom other than those of the system remain intact at the end of the cycle. Suppose, then, that the system at the beginning and end of the cycle is characterized by probability vectors p and q, respectively. If the initial state of the system was described by p ⊗ r, where r corresponds to the remaining degrees of freedom, then at the end of the cycle the system and environment are described by q ⊗ r (i.e. with the same r). Since entropy should be monotonic under such a cycle, in which p ⊗ r ≻ q ⊗ r, this motivates the additivity property of an entropy function, which guarantees monotonicity under the trumping relation. That is, the monotonicity under mixing can be strengthened using the additivity property such that

p ≻∗ q   ⇒   H(p) ⩽ H(q) .    (6.4)

There are other arguments motivating the additivity axiom that come from information theory, and we will discuss them as we go along.
In the definition above we allow for the case that n = 1. In this trivial case, Prob(n) =
Prob(1) contains only the 1-dimensional vector (i.e. number) one. Observe that for any
p ∈ Prob(n) we get from the additivity axiom that H(p) = H(p ⊗ 1) = H(p) + H(1), so
that H(1) = 0. From the fact that 1 ≻ e_x ≻ 1 for all x ∈ [n] (i.e. 1 ∼ e_x), where {e_x}_{x∈[n]} is the standard (elementary) basis of ℝ^n, we conclude that also H(e_x) = 0 for all x ∈ [n].
Moreover, since for every n ∈ N and every p ∈ Prob(n) we have ex ≻ p we get from the
monotonicity axiom that H(p) ⩾ H(ex ) = 0. That is, entropy functions cannot be negative.
In the definition of entropy above we assumed that the entropy H is not the zero function.
This means that there exists n ∈ N and p ∈ Prob(n) such that H(p) ̸= 0. Since entropy
cannot be negative, this means that H(p) > 0. On the other hand, for sufficiently large m ∈ ℕ we have p ≻ (u^{(2)})^{⊗m} (see Exercise 4.1.4), so that

H(u^{(2)}) = (1/m) H((u^{(2)})^{⊗m}) ⩾ (1/m) H(p) > 0 ,    (6.5)

where the inequality follows from p ≻ (u^{(2)})^{⊗m} and the monotonicity of H.
Therefore, all entropy functions take strictly positive values on u^{(2)} := ½(1, 1)^T ∈ Prob(2). It will be convenient to normalize all entropy functions such that

H(u^{(2)}) = 1 .    (6.6)
Throughout the remainder of the book, we will focus exclusively on entropy functions that
are normalized as above.
Proof. The inequality follows from the Schur concavity of H and the fact that p ≻ u^{(n)}. To prove the equality, define f : ℕ → ℝ via f(n) := H(u^{(n)}). From the normalization (6.6) we have f(2) = 1, and from the additivity f(2^k) = k for all k ∈ ℕ. More generally, for any m, n ∈ ℕ the additivity gives

f(n^m) = H(u^{(n^m)}) = H((u^{(n)})^{⊗m}) = m H(u^{(n)}) = m f(n) .    (6.8)
Moreover, from the monotonicity property of H and the fact that u(n) ≻ u(n+1) we get that
f is monotonically non-decreasing. Using these properties of f we get for all n, m ∈ ℕ

f(n) = (1/m) f(n^m) = (1/m) f(2^{m log(n)})
f is non-decreasing →   ⩽ (1/m) f(2^{⌈m log(n)⌉})    (6.9)
  = (1/m) ⌈m log(n)⌉ .

Similarly, taking the floor instead of the ceiling above gives f(n) ⩾ (1/m)⌊m log(n)⌋. In the limit m → ∞ both of these bounds converge to log n. This concludes the proof.
Exercise 6.1.1. Show that any convex combination of entropies is itself an entropy. That is, if {H_x}_{x∈[k]} is a set of entropies and s ∈ Prob(k), then ∑_{x∈[k]} s_x H_x is itself an entropy.
where the cases α = 0, 1, ∞ are defined by the appropriate limits. That is, for α = 0 the Rényi entropy is also known as the max-entropy and is given by

H_max(p) := lim_{α→0⁺} H_α(p) = log |supp(p)| ,    (6.11)

where |supp(p)| is the number of non-zero components of p. For α = 1 the Rényi entropy reduces to the Shannon entropy

H(p) = lim_{α→1} H_α(p) = −∑_{x∈[n]} p_x log p_x .    (6.12)

Finally, for the case α = ∞ the Rényi entropy is also known as the min-entropy and is given by

H_min(p) := lim_{α→∞} H_α(p) = −log max_{x∈[n]} {p_x} .    (6.13)
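The limiting cases above are convenient to check numerically. The following sketch (assuming NumPy; base-2 logarithms are used so that H(u^{(2)}) = 1; the function name is ours) evaluates H_α for several values of α and verifies the additivity axiom under tensor products.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Classical Renyi entropy H_alpha(p) in bits, with the alpha -> 0, 1, infinity limits."""
    p = np.asarray(p, dtype=float)
    pos = p[p > 0]
    if alpha == 0:
        return float(np.log2(len(pos)))                 # max-entropy: log |supp(p)|
    if alpha == 1:
        return float(-(pos * np.log2(pos)).sum())       # Shannon entropy
    if alpha == np.inf:
        return float(-np.log2(pos.max()))               # min-entropy
    return float(np.log2((pos ** alpha).sum()) / (1.0 - alpha))

p = np.array([0.5, 0.25, 0.125, 0.125])
q = np.array([0.7, 0.3])
print(renyi_entropy(p, 0), renyi_entropy(p, 1), renyi_entropy(p, np.inf))

# Additivity under tensor products: H_alpha(p (x) q) = H_alpha(p) + H_alpha(q).
for a in (0, 0.5, 1, 2, np.inf):
    assert abs(renyi_entropy(np.kron(p, q), a)
               - renyi_entropy(p, a) - renyi_entropy(q, a)) < 1e-12
print("additivity verified")
```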
Exercise 6.1.4.
1. Show that the Rényi entropy satisfies the additivity axiom of an entropy.
The following is a very interesting result proved in [165]. It essentially states that every entropy function is a convex combination of Rényi entropies. We refer the reader to [165] for the proof as it goes beyond the scope of this book.
that maps density matrices in all finite dimensions to the real numbers.
Quantum Entropy
Definition 6.1.2. Let H be as in (6.16) and suppose it is not equal to the constant
zero function. Then, H is called an entropy if it satisfies the following two axioms:
vector consisting of the eigenvalues of ρ. Therefore, any classical entropy H_classical can be extended to the quantum domain via (m := |A|)

H_quantum(ρ) := H_classical(λ₁, …, λ_m) ,

where {λ_x}_{x∈[m]} are the eigenvalues of ρ. It is left as a simple exercise to show that H_quantum is indeed a quantum entropy that satisfies the two axioms of the definition above.
As an example, consider the classical Rényi entropies as defined in (6.10). By replacing
the components {px }x∈[n] with the eigenvalues {λx }x∈[n] of ρ, we get the quantum version of
the Rényi entropies. For any α ∈ [0, ∞] they are given by
H_α(ρ) := (1/(1−α)) log ∑_{x∈[n]} λ_x^α = (1/(1−α)) log Tr[ρ^α] .    (6.19)
Similarly, from the classical case we get that the limits α = 0, 1, ∞ are given for all ρ ∈ D(A)
by:
Relative Entropy
Definition 6.2.1. The function D in (6.26) is called a relative entropy if it satisfies
the following three conditions:
In the definition above we did not include the normalization condition D(1∥1) = 0 (as satisfied by all normalized divergences) since it follows from the additivity property. Indeed, let p, q ∈ Prob(n) and observe that

D(p∥q) = D(p ⊗ 1 ∥ q ⊗ 1) = D(p∥q) + D(1∥1) ,

so that D(1∥1) = 0.
where e1 = (1, 0)T and e2 = (0, 1)T . Hint: Show first that D(e1 ∥e2 ) ⩾ 1 and then use the
additivity property together with the DPI to show that for any n ∈ N, D(e1 ∥e2 ) ⩾ n.
Remark. We use the convention that if q_x = p_x = 0 then p_x^α q_x^{1−α} = 0 even for α > 1. With this convention, the conditions that supp(p) ⊆ supp(q), or that α ∈ [0, 1) and p · q ≠ 0, are precisely the conditions under which the expression (1/(α−1)) log ∑_{x∈[n]} p_x^α q_x^{1−α} is well defined. Otherwise, if it is not well defined, the Rényi relative entropy is set to infinity.
For α = 0 the Rényi relative entropy is called the min-relative entropy. It is given by

D_min(p∥q) := lim_{α→0⁺} D_α(p∥q) = −log ∑_{x∈supp(p)} q_x .    (6.30)

Observe that if D_min(p∥q) ≠ 0 then p must have zero components. For α = ∞ the Rényi relative entropy is called the max-relative entropy. It is given by

D_max(p∥q) := lim_{α→∞} D_α(p∥q) = log max_{x∈[n]} (p_x / q_x) .    (6.31)

Finally, for α = 1 the Rényi relative entropy is called the Kullback–Leibler divergence, or in short the KL-divergence. It is given by

D(p∥q) := lim_{α→1} D_α(p∥q) = ∑_{x∈[n]} p_x (log p_x − log q_x) ,    (6.32)
3. Explain why, in the first inequality above, r must be greater than one, whereas in the second it must be smaller than one.
Exercise 6.2.5. Show that all the Rényi relative entropies satisfy the additivity and normalization properties of a relative entropy as given in Definition 6.2.1.
Exercise 6.2.6. A relative entropy D is said to be pathological if D(u^{(2)} ∥ e₁^{(2)}) = 0, where u^{(2)} := (½, ½)^T is the uniform distribution in Prob(2) and e₁^{(2)} := (1, 0)^T. Show that D_min is pathological, and use this to show that D_path, which is defined for any n ∈ ℕ and p, q ∈ Prob(n) as

D_path(p∥q) := D_min(p∥q) + D_min(q∥p) ,    (6.34)

is a relative entropy.
We now show that, in addition to the additivity and normalization, the Rényi relative entropies also satisfy the DPI.
Theorem 6.2.1. The Rényi relative entropy of any order α ∈ [0, ∞] is a relative
entropy; i.e. it satisfies the axioms of DPI, additivity, and normalization, as given in
Definition 6.2.1.
Proof. The additivity and normalization were proved in Exercise 6.2.5. To show the DPI, recall the α-divergences given in (5.28) by

D_{f_α}(p∥q) = (1/(α(α−1))) ( ∑_{x∈[n]} p_x^α q_x^{1−α} − 1 ) .    (6.35)

Since the above expression is derived from the convex function f_α(r) = (r^α − r)/(α(α−1)), it is an f-divergence and in particular satisfies the DPI. For α = 1 the above expression coincides with the Rényi relative entropy of that order (i.e. the KL-divergence), so in this case the DPI property follows. For α ≠ 1 we denote

Q_α(p∥q) := ∑_{x∈[n]} p_x^α q_x^{1−α} .    (6.36)

Observe that from the DPI of D_{f_α} we get that for α > 1 the function Q_α(p∥q) is monotonically non-increasing under maps (p, q) ↦ (Ep, Eq) with E ∈ STOCH(m, n), and for α < 1 it is monotonically non-decreasing under such maps. Since the Rényi relative entropy can be expressed as D_α(p∥q) = (1/(α−1)) log Q_α(p∥q), and the log is a monotonically increasing function, we conclude that D_α(p∥q) satisfies the DPI.
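The DPI proved above is straightforward to probe numerically. The sketch below (assuming NumPy; all names are ours) samples a pair of distributions and a column-stochastic matrix E, and checks that D_α(Ep∥Eq) ⩽ D_α(p∥q) for several orders α.

```python
import numpy as np

def renyi_rel_entropy(p, q, alpha):
    """Classical Renyi relative entropy D_alpha(p||q) in bits (assumes supp(p) within supp(q))."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    if alpha == 1:                                   # KL divergence, Eq. (6.32)
        return float((p[mask] * (np.log2(p[mask]) - np.log2(q[mask]))).sum())
    Q = (p[mask] ** alpha * q[mask] ** (1.0 - alpha)).sum()
    return float(np.log2(Q) / (alpha - 1.0))

rng = np.random.default_rng(7)
n, m = 4, 3
p, q = rng.dirichlet(np.ones(n)), rng.dirichlet(np.ones(n))
E = rng.dirichlet(np.ones(m), size=n).T              # column-stochastic m x n matrix

for a in (0.3, 0.5, 1, 1.5, 2):
    assert renyi_rel_entropy(E @ p, E @ q, a) <= renyi_rel_entropy(p, q, a) + 1e-10
print("DPI verified for the sampled stochastic map")
```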
Exercise 6.2.7. Show that for any α ∈ (0, 1) and p, q ∈ Prob(n)

D_α(p∥q) = (α/(1−α)) D_{1−α}(q∥p) .    (6.37)
Exercise 6.2.8. Show that if p, q ∈ Prob(n) and ρ, σ ∈ D(X) are two diagonal density
matrices with diagonals p and q, respectively, then
D_α(p∥q) = (1/(α−1)) log Tr[ρ^α σ^{1−α}] .    (6.38)
Theorem 6.2.2. Let D be a relative entropy, and let {e_x}_{x∈[n]} be the standard (elementary) basis of ℝ^n. Then, for any p ∈ Prob(n) and x ∈ [n] we have D(e_x∥p) = −log p_x.
The proof of the theorem above is based on the following lemma by Erdös.
Erdös Theorem
Lemma 6.2.1. Let g : N → R be a function from the set of natural numbers to the
real line. Suppose g is non-decreasing and is additive; i.e. g(mn) = g(n) + g(m) for all
n, m ∈ N. Then, there exists a constant c ∈ R such that g(n) = c log(n) for all n ∈ N.
Proof. Suppose by contradiction that g(n)/log n is not constant. Then there exist m, n ∈ ℕ such that

g(m)/log m > g(n)/log n .    (6.40)

Denote a := g(m)/log m and b := g(n)/log n, and observe that a > b, or equivalently b/a < 1. Multiplying both sides of the inequality b/a < 1 by the positive number (log n / log m) k, where k is any positive integer, gives

(b/a)(log n / log m) k < (log n / log m) k .    (6.41)

Therefore, for sufficiently large k ∈ ℕ there must exist an integer between the above two numbers; i.e. there exists ℓ ∈ ℕ such that

(b/a)(log n / log m) k < ℓ < (log n / log m) k .    (6.42)
The above two inequalities can be expressed as ℓ log m < k log n and k g(n) < ℓ g(m). The first implies that the integers n^k and m^ℓ satisfy n^k > m^ℓ, and from the additivity of g the second gives g(n^k) = k g(n) < ℓ g(m) = g(m^ℓ). These two conclusions contradict the assumption that g is non-decreasing. This completes the proof.
Proof of Theorem 6.2.2. Since divergences (and therefore relative entropies) are invariant
under permutations (see (5.11)), it is sufficient to show that D(e1 ∥p) = − log p1 . We first
show that for any vector r = (r1 , . . . , rn )T ∈ Prob(n) with r1 = 0 we have
(e₁, p) ∼ (e₁, p₁e₁ + (1 − p₁)r) ,    (6.45)

where the symbol ∼ corresponds to the equivalence relation under relative majorization. Define E := [e₁, r, …, r] ∈ STOCH(n, n) to be the column stochastic matrix whose first column is e₁ and whose remaining n − 1 columns equal r. We then have

(e₁, p) ≻ (Ee₁, Ep) = (e₁, p₁e₁ + (1 − p₁)r) .    (6.46)

Conversely, define p̃ := (1/(1 − p₁)) (0, p₂, …, p_n)^T ∈ Prob(n) and Ẽ := [e₁, p̃, …, p̃] ∈ STOCH(n, n). Then,

(e₁, p₁e₁ + (1 − p₁)r) ≻ (Ẽe₁, p₁Ẽe₁ + (1 − p₁)Ẽr) = (e₁, p) .    (6.47)

Combining (6.46) and (6.47) gives (6.45).
The relation in (6.45) implies that

D(e₁∥p) = D(e₁ ∥ p₁e₁ + (1 − p₁)r) ,    (6.48)

so that the function f(p₁) := D(e₁∥p) is independent of p₂, …, p_n. Moreover, the function f : [0, 1] → ℝ₊ ∪ {∞} has the following two properties:
1. f is monotonically non-increasing.
is non-decreasing and additive. Therefore, from Erdös theorem there exists a constant c ∈ R
such that g(m) = c log m for all m ∈ N. The condition g(2) = f (1/2) = D(e1 ∥u(2) ) = 1 gives
c = 1. Therefore, for any m ∈ N we have f (1/m) = log m. Furthermore, observe that for
any k ⩽ m the additivity of f gives
log k + f(k/m) = f(1/k) + f(k/m) = f(1/m) = log m .    (6.50)

Hence, f(k/m) = log m − log k = −log(k/m), so that f(r) = −log r for all rationals r in [0, 1].
To prove that this relation holds for any r ∈ [0, 1] (possibly irrational), let {sk } and {tk } be
two sequences of rational numbers in [0, 1] both with limit r and with sk ⩽ r ⩽ tk for all
k ∈ N. Then, the monotonicity property of f gives for any k ∈ N
− log sk = f (sk ) ⩾ f (r) ⩾ f (tk ) = − log tk . (6.51)
Taking the limit k → ∞ on both sides and using the continuity of the log function gives
f (r) = − log r. This completes the proof.
The theorem above has the following interesting corollary, which justifies the terminology of the max and min relative entropies.
Corollary 6.2.1. Let D be a relative entropy. Then for any n ∈ N and any
p, q ∈ Prob(n),
Dmin (p∥q) ⩽ D(p∥q) ⩽ Dmax (p∥q) . (6.52)
Triangle Inequality
Theorem 6.2.3. Let D be a relative entropy. Then, for any p, q, r ∈ Prob(n),

D(p∥q) ⩽ D(p∥r) + D_max(r∥q) .    (6.56)
Proof. The key idea of the proof is to denote by ε := 2−Dmax (r∥q) and observe that the right
hand side of (6.56) can be expressed as
D(p∥r) + D_max(r∥q) = D(p∥r) − log ε
Theorem 6.2.2 →   = D(p∥r) + D( e₁ ∥ (ε, 1 − ε)^T )    (6.57)
Additivity →   = D( p ⊗ e₁ ∥ r ⊗ (ε, 1 − ε)^T ) .
Note that D is both lower and upper semi-continuous at (p, q) if and only if it is continuous
at (p, q).
Exercise 6.2.12.
1. Show that the max relative entropy, D_max(p∥q), is not upper semi-continuous when q does not have full support. Hint: Consider the sequences {p_k}_{k∈ℕ} and {q_k}_{k∈ℕ} with p_k := (1/k, 1 − 1/k)^T and q_k := (1/k², 1 − 1/k²)^T.
2. Show that Dpath (p∥q) := Dmin (p∥q) + Dmin (q∥p) is not lower semi-continuous at the
boundary of Prob(n) × Prob(n).
From the exercise above it is clear that we cannot expect relative entropies to be con-
tinuous everywhere in Prob(n) × Prob(n). However, if we remove some of the points in the
boundary, we get the following continuity property.
Proof. Let (pk , qk )k∈N be a sequence in Prob(n) × Prob(n) that converges to (p, q). For
any k ∈ N, define a column stochastic matrix Ek ∈ STOCH(n, n) by its action on every
s ∈ Prob(n) as
E_k s := p_k + 2^{−D_max(p∥p_k)} (s − p) .    (6.67)

Since lim_{k→∞} p_k = p, for sufficiently large k we have 2^{−D_max(p∥p_k)} > 0 (see the exercise below). Moreover, from the definition of D_max we get that p_k − 2^{−D_max(p∥p_k)} p ⩾ 0, so that E_k is indeed a column stochastic matrix. Using these notations, we derive the following from the DPI:

D(p∥q) ⩾ D(E_k p ∥ E_k q)
(6.67) →   = D(p_k ∥ E_k q)    (6.68)
Theorem 6.2.3 →   ⩾ D(p_k∥q_k) − D_max(E_k q ∥ q_k) .
Moving the term involving D_max to the other side and taking the limit superior on both sides gives

lim sup_{k→∞} D(p_k∥q_k) ⩽ D(p∥q) + lim sup_{k→∞} D_max(E_k q ∥ q_k) .    (6.69)

The second term on the right-hand side above vanishes since the vector

q̃_k := E_k q = 2^{−D_max(p∥p_k)} q + p_k − 2^{−D_max(p∥p_k)} p    (6.70)

has the limit lim_{k→∞} q̃_k = q, so that

lim sup_{k→∞} D_max(q̃_k ∥ q_k) = 0 .    (6.71)
Note that we used indirectly the fact that q > 0, since for sufficiently large k we must
have qk > 0 so the limit above is indeed zero. This completes the proof that D is upper
semi-continuous on Prob(n) × Prob>0 (n).
We now prove the lower semi-continuity on Prob>0 (n) × Prob>0 (n). Note that since we
already proved upper semi continuity in this domain, this will imply that D is continuous on
Prob>0 (n) × Prob>0 (n). For any k ∈ N, we define Ek as before but with the role of pk and
p interchanged; i.e. Ek ∈ STOCH(n, n) is defined by its action on any s ∈ Prob(n) as
E_k s := p + 2^{−D_max(p_k∥p)} (s − p_k) .    (6.72)

Note that for all k, 2^{−D_max(p_k∥p)} > 0 since we assume p > 0. Moreover, from the definition of D_max we have p − 2^{−D_max(p_k∥p)} p_k ⩾ 0, so that E_k is indeed a column stochastic matrix. With the above notations we get from the DPI

D(p_k∥q_k) ⩾ D(E_k p_k ∥ E_k q_k)
(6.72) →   = D(p ∥ E_k q_k)    (6.73)
Theorem 6.2.3 →   ⩾ D(p∥q) − D_max(E_k q_k ∥ q) .

Taking the limit inferior on both sides gives

lim inf_{k→∞} D(p_k∥q_k) ⩾ D(p∥q) ,    (6.74)
Faithfulness
Theorem 6.2.5. Let D be a relative entropy. The following statements are
equivalent:
1. D is not faithful.
Proof. The direction 2 ⇒ 1 is trivial. We therefore prove that 1 ⇒ 2. Since D is not faithful
there exist p, q ∈ Prob(m) such that p ≠ q and D(p∥q) = 0. For any n ∈ ℕ it follows from the additivity property of D that also D(p^{⊗n}∥q^{⊗n}) = 0. Now, in Corollary 8.3.1 of the next chapter we will see that for any s, t ∈ Prob_{>0}(2) and large enough n we have (p^{⊗n}, q^{⊗n}) ≻ (s, t). Therefore,

0 = D(p^{⊗n}∥q^{⊗n}) ⩾ D(s∥t) .    (6.78)
We therefore conclude that D(s∥t) = 0 for all s, t ∈ Prob>0 (2). It is left to show that this
also holds in dimensions higher than two.
Indeed, let p, q ∈ Prob(m) with supp(p) = supp(q), and recall from (4.130) that there exist s, t ∈ Prob_{>0}(2) such that (s, t) ≻ (p, q). The DPI therefore gives D(p∥q) ⩽ D(s∥t). Since we already proved that D(s∥t) = 0 for all s, t ∈ Prob_{>0}(2), we conclude that D(p∥q) = 0 for all p and q with the same support. This completes the proof.
Remark. Note that the corollary above in particular implies that relative entropies that are
continuous in the first argument must be faithful.
Proof. Let {p_k}_{k∈ℕ} be a sequence in Prob(m) such that supp(q) = supp(p_k) and p_k → e₁ as k → ∞. Such a sequence exists since q₁ ∈ (0, 1). From the theorem above it follows that D(p_k∥q) = 0, so that
More generally, every relative entropy D can be used to define an entropy H via

H(p) := log n − D(p ∥ u^{(n)})    ∀ p ∈ Prob(n) ,    (6.82)

where u^{(n)} is the uniform distribution in Prob(n).
Exercise 6.2.14. Show that if D is a relative entropy then H as defined in (6.82) satisfies
the normalization and additivity axioms of an entropy.
To show that H as defined in (6.82) is indeed an entropy, we need to prove the monotonic-
ity property (in addition to the properties you proved in the exercise above). Recall that if
p, q ∈ Prob(n) and p ≻ q then there exists a doubly stochastic matrix D ∈ STOCH(n, n)
such that q = Dp. Therefore, in this case we get that

H(q) = log n − D(Dp ∥ Du^{(n)}) ⩾ log n − D(p ∥ u^{(n)}) = H(p) ,

where we used Du^{(n)} = u^{(n)} together with the DPI. That is, H satisfies the monotonicity property of an entropy if the two vectors have the same dimension. If p ∈ Prob(n) and q ∈ Prob(m) have different dimensions (i.e. n ≠ m) then
the relation p ≻ q is equivalent to a majorization relation between two vectors with the
same dimension max{n, m} in which one of the vectors is padded with zeros to make the
dimensions equal. Therefore, to show that H above satisfies the monotonicity property of
an entropy, it is left to show that it is invariant under embedding; i.e. H(p ⊕ 0) = H(p) for
all p ∈ Prob(n). For this purpose, note that
H(e₁^{(n)}) = log n − D(e₁^{(n)} ∥ u^{(n)}) = log n − log n = 0 ,    (6.84)

where we used Theorem 6.2.2. Therefore, from the additivity property that you proved in the exercise above it follows that for any p ∈ Prob(n)

H(p) = H(p ⊗ e₁^{(n+1)}) = H((p ⊕ 0) ⊗ e₁^{(n)}) = H(p ⊕ 0) .    (6.85)
Hence, H satisfies the monotonicity property of an entropy, and when combined with the
exercise above we conclude that Eq. (6.82) demonstrates that for any relative entropy there
is a corresponding entropy. Remarkably, the next theorem shows that the converse is also
true.
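The correspondence (6.82) is easy to illustrate. In the sketch below (assuming NumPy; base-2 logarithms; names ours) we take D to be the KL divergence, so that H(p) = log n − D(p∥u^{(n)}) recovers the Shannon entropy, and we also check the invariance H(p ⊕ 0) = H(p) derived above.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence in bits."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return float((p[m] * (np.log2(p[m]) - np.log2(q[m]))).sum())

def entropy_from_kl(p):
    """H(p) = log n - D(p || u^(n)) with D the KL divergence, cf. Eq. (6.82)."""
    n = len(p)
    return np.log2(n) - kl(p, np.full(n, 1.0 / n))

p = np.array([0.5, 0.25, 0.125, 0.125])
shannon = float(-(p * np.log2(p)).sum())
print(entropy_from_kl(p), shannon)                 # both equal 1.75 bits
print(entropy_from_kl(np.append(p, 0.0)))          # invariance under embedding: H(p ⊕ 0) = H(p)
```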
One-To-One Correspondence
Theorem 6.2.6. There exists a bijection f, with inverse f^{−1}, between relative entropies that are continuous in the second argument and entropies.
via

D_H(p∥q) := log k − H( ⊕_{x=1}^{n} p_x u^{(k_x)} ) ,    (6.88)

for all n ∈ ℕ, p ∈ Prob(n), and q ∈ Prob_{>0}(n) ∩ ℚ^n with q = (k₁/k, …, k_n/k)^T for k_x ∈ ℕ and k = k₁ + · · · + k_n. Note that this construction is equivalent to the one given in Theorem 5.1.3
with g(p) := log n − H(p), although we do not assume here that H is continuous, and
therefore also g is not assumed to be continuous. Still, the same arguments given in the
proof of Theorem 5.1.3 imply that DH is a divergence in the restricted domain in which the
second argument has positive rational components. Moreover, since H is additive also DH as
defined above is additive under tensor products; i.e., DH is a relative entropy with a restricted
domain (see Exercise 6.2.15 below). This restricted domain will not change the arguments
leading to (6.123) and we therefore conclude that for any fixed n ∈ N and p ∈ Prob(n),
DH (p∥q) is continuous in q ∈ Prob>0 (n) ∩ Qn . Therefore, the continuous extension of DH
to Prob(n) × Prob(n) is well defined. We therefore define f^{−1}(H) := D_H, where D_H is the continuous extension of the expression in (B.3.3) to the full domain Prob(n) × Prob(n). Note that the data-processing inequality and additivity are preserved under continuous extensions, and thus the resulting quantity D_H is indeed a relative entropy, concluding the proof.
Exercise 6.2.15. Show that D_H as defined in (B.3.3) is a relative entropy on the restricted domain

⋃_{n∈ℕ} Prob(n) × (Prob_{>0}(n) ∩ ℚ^n) .    (6.89)

Explicitly, show that:

1. Normalization: D_H(e₁^{(2)} ∥ u^{(2)}) = 1.
In Exercise 6.2.12 you showed that Dpath (p∥q) := Dmin (p∥q) + Dmin (q∥p), provides a
counterexample to lower semi-continuity. Note that f (Dpath ) = Hmax , and Hmax is in turn
mapped to Dmin (p∥q) by its inverse f−1 ; i.e. f−1 (Hmax ) = Dmin so that the contribution
Dmin (q∥p) that is discontinuous in q is lost in the process. This underscores why the conti-
nuity of relative entropies in the second argument is essential for the existence of the bijection
f. Finally, observe that the correspondence between relative entropies and entropies allows us to import certain results from relative entropies to entropies.
We end this section by recalling Theorem 6.1.1 proved by [165]. This theorem states
that any entropy function can be expressed as a convex combination of Rényi entropies.
Combining this with the one-to-one correspondence between entropies and relative entropies
we get the following uniqueness result.
Observe the crucial need for continuity in the second argument. This is highlighted by the fact that D_path(p∥q) := D_min(p∥q) + D_min(q∥p) lacks continuity in its second argument and is not a convex combination of Rényi divergences.
that is acting on pairs of quantum states in all finite dimensions |A| < ∞.
2. Additivity: D(ρ ⊗ ρ′ ∥ σ ⊗ σ′) = D(ρ∥σ) + D(ρ′∥σ′) .    (6.94)

3. Normalization: D(|0⟩⟨0| ∥ u^{(2)}) = 1, where |0⟩⟨0|, u^{(2)} ∈ D(ℂ²).
Proof. Fix x ∈ [n] and let E ∈ CPTP(AX → AX) be a quantum channel that acts as the identity channel if the input on the classical system X is |x⟩⟨x|^X, and otherwise acts as a replacement channel on system A with output σ_x^A. Explicitly, for all τ ∈ D(A) and w ∈ [n],

E^{AX→AX}(τ^A ⊗ |w⟩⟨w|^X) := τ^A ⊗ |x⟩⟨x|^X if w = x, and σ_x^A ⊗ |w⟩⟨w|^X otherwise.    (6.97)

Then, denoting p^X := ∑_{w∈[n]} p_w |w⟩⟨w|^X, we get from the DPI of D

D(ρ^A ⊗ |x⟩⟨x|^X ∥ σ^{AX}) ⩾ D( E^{AX→AX}(ρ^A ⊗ |x⟩⟨x|^X) ∥ E^{AX→AX}(σ^{AX}) )
(6.97) →   = D(ρ^A ⊗ |x⟩⟨x|^X ∥ σ_x^A ⊗ p^X)    (6.98)
Additivity →   = D(ρ^A ∥ σ_x^A) + D(|x⟩⟨x|^X ∥ p^X) .

The combination of the above equation with (6.98) concludes the proof.
Exercise 6.3.2. Let D be a quantum relative entropy, ρ, σ ∈ D(A), ω ∈ D(B), and t ∈ [0, 1]. In addition, let Z be a |A| × |B| complex matrix such that the block matrix

tσ          Z
Z^∗   (1 − t)ω

is a density matrix in D(A ⊕ B). Show that

D( diag(ρ, 0) ∥ the block matrix above ) ⩾ D(ρ∥σ) − log t ,    (6.101)

with equality if Z = 0, where diag(ρ, 0) denotes the block matrix with ρ in the upper-left block and zeros elsewhere.
Exercise 6.3.3. Let D be a quantum relative entropy and u ∈ D(A) be the maximally mixed
state.
1. Show that
H(ρA ) := log |A| − D(ρA ∥uA ) ∀ ρ ∈ D(A) , (6.102)
is a quantum entropy.
2. Show that if D is jointly convex then H, as defined above, is concave.
Before we discuss additional properties of quantum relative entropies, we first consider
an example of a family of relative entropies that generalizes the Rényi relative entropies.
Remark. If supp(ρ) ⊆ supp(σ) the trace in the definition above is strictly positive for all
α ∈ [0, ∞]. Also, if α < 1 and ρσ ̸= 0 (i.e. ρ and σ are not orthogonal) then also ρα σ 1−α ̸= 0
and we have in this case Tr [ρα σ 1−α ] > 0. In all other cases, the trace in the definition above
is either zero or not well defined. One can also extend the definition to α > 2 however we
will see below that the DPI only holds for α ∈ [0, 2].
Exercise 6.3.4. Prove all the statements in the remark above (except for the very last one
about α > 2).
Exercise 6.3.5. Let ρ, σ ∈ D(A) and consider their spectral decomposition as given in (5.67).
Set m := |A|, and let pXY , qXY ∈ Prob(m2 ) be the probability vectors whose components are
{p_x |⟨a_x|b_y⟩|²}_{x,y∈[m]} and {q_y |⟨a_x|b_y⟩|²}_{x,y∈[m]}, respectively (cf. (5.68)). Show that

D_α(ρ∥σ) = D_α(p^{XY} ∥ q^{XY}) ,    (6.103)

where the right-hand side is the classical Rényi divergence between p^{XY} and q^{XY}.
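The reduction above can be checked directly for commuting states. The following sketch (assuming NumPy; `mat_pow` and `petz_renyi` are our own helper names) evaluates the Petz quantum Rényi divergence D_α(ρ∥σ) = (1/(α−1)) log Tr[ρ^α σ^{1−α}] for diagonal states and compares it with the classical formula.

```python
import numpy as np

def mat_pow(m, t):
    """Real power of a positive definite Hermitian matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * vals ** t) @ vecs.conj().T

def petz_renyi(rho, sigma, alpha):
    """Petz quantum Renyi divergence in bits (assumes sigma > 0 and alpha != 1)."""
    q = np.trace(mat_pow(rho, alpha) @ mat_pow(sigma, 1.0 - alpha)).real
    return float(np.log2(q) / (alpha - 1.0))

# For commuting (here: diagonal) states the quantum formula reduces to the classical one.
p = np.array([0.6, 0.3, 0.1]); q = np.array([0.2, 0.5, 0.3])
alpha = 1.5
classical = np.log2((p ** alpha * q ** (1 - alpha)).sum()) / (alpha - 1)
print(np.isclose(petz_renyi(np.diag(p), np.diag(q), alpha), classical))
```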
The Petz-Rényi divergence satisfies all the properties of a relative entropy. The normal-
ization and additivity properties you will prove in the exercise below, and we now prove the
data processing inequality.
Theorem 6.3.2. The Petz quantum α-Rényi divergence is a relative entropy for any
α ∈ [0, 2].
for α ∈ (1, 2] the expression Tr[ρ^α σ^{1−α}] is monotonically decreasing under such mappings. Therefore, for any α ∈ [0, 2] with α ≠ 1 the Petz quantum Rényi α-divergence satisfies the DPI. The DPI for the case α = 1 was proven in (5.87).
Exercise 6.3.6. Show that for any α ⩾ 0, the Petz quantum Rényi divergence D_α satisfies the normalization and additivity properties of a relative entropy.
The calculation of the limits α → 0, 1 of the Petz quantum Rényi divergence is a bit
more subtle than the classical case. For this purpose, we will use the expression (6.103) in
Exercise 6.3.5. For the limit α → 0 observe that
where Πρ is the projection to the support of ρ. The quantity above is also known as the min
quantum relative entropy and is denoted by
Note that the quantum min relative entropy reduces to the classical min relative entropy
when the states are classical (i.e. diagonal).
For the limit α → 1 we use again (6.103) to get
Exercise 6.3.7. Prove the last two lines in the equation above; in particular, show that ∑_{x,y} p_x |⟨u_x|v_y⟩|² log(p_x/q_y) = Tr[ρ log ρ] − Tr[ρ log σ].
Exercise 6.3.8 (Quasi-Convexity). Show that for any α ∈ [0, 2], ρ, ω₀, ω₁ ∈ D(A), and t ∈ [0, 1] we have

D_α(ρ ∥ tω₀ + (1 − t)ω₁) ⩽ max{ D_α(ρ∥ω₀), D_α(ρ∥ω₁) } .    (6.111)
Similar to the definition of the min quantum relative entropy, we can extend the max
relative entropy to the quantum domain.
2. D_max(ρ∥σ) reduces to the classical max relative entropy when ρ and σ commute.

3. For the case that supp(ρ) ⊆ supp(σ), D_max(ρ∥σ) = log ∥σ^{−1/2} ρ σ^{−1/2}∥_∞. Hint: conjugate both sides of tσ ⩾ ρ by σ^{−1/2}(·)σ^{−1/2}.

4. D_max(ρ∥σ) = lim_{α→∞} D_α(ρ∥σ) if ρ and σ commute, and give an example for which D_max(ρ∥σ) ≠ lim_{α→∞} D_α(ρ∥σ). Here D_α refers to the same formula as the Petz quantum Rényi divergence but with α > 2.
Theorem 6.3.3. Let D be a relative entropy. Then for any quantum system A and any ρ, σ, ω ∈ D(A):

1. Bounds:
D_min(ρ∥σ) ⩽ D(ρ∥σ) ⩽ D_max(ρ∥σ) .    (6.113)

2. Triangle Inequality:
D(ρ∥σ) ⩽ D(ρ∥ω) + D_max(ω∥σ) .    (6.114)
Proof. Let Π_ρ denote the projector onto the support of ρ. Define the POVM channel E ∈ CPTP(A → X) with |X| = 2 as

E(σ) := Tr[σΠ_ρ] |0⟩⟨0|^X + Tr[σ(I − Π_ρ)] |1⟩⟨1|^X .    (6.115)

Then,

D(ρ∥σ) ⩾ D(E(ρ) ∥ E(σ))
(6.115) →   = D( |0⟩⟨0| ∥ Tr[σΠ_ρ] |0⟩⟨0| + Tr[σ(I − Π_ρ)] |1⟩⟨1| )    (6.116)
Theorem 6.2.2 →   = −log Tr[σΠ_ρ]
  = D_min(ρ∥σ) .
For the second inequality, denote t := 2^{D_max(ρ∥σ)}, and note that in particular tσ ⩾ ρ (i.e. tσ − ρ ⩾ 0). Define a channel E ∈ CPTP(X → A) with |X| = 2 by

E(|0⟩⟨0|) := ρ    and    E(|1⟩⟨1|) := (tσ − ρ)/(t − 1) .    (6.117)

Furthermore, denote

q^X := (1/t) |0⟩⟨0|^X + ((t − 1)/t) |1⟩⟨1|^X ,    (6.118)

and observe that E(q^X) = σ. Hence,

D(ρ∥σ) = D( E(|0⟩⟨0|^X) ∥ E(q^X) )
DPI →   ⩽ D( |0⟩⟨0|^X ∥ q^X )    (6.119)
Theorem 6.2.2 →   = −log(1/t) = D_max(ρ∥σ) .
This completes the proof of (6.113).
To prove the triangle inequality (6.114), note first that for |A| = 1 the statement is trivial
so we can assume |A| ⩾ 2. Let ε := 2−Dmax (ω∥σ) ∈ (0, 1), and observe that σ ⩾ εω so that
the matrix τ := (σ − εω)/(1 − ε) is a density matrix satisfying
σ = εω + (1 − ε)τ . (6.120)
From the definition of ε we have

D(ρ∥ω) + D_max(ω∥σ) = D(ρ∥ω) − log ε
Theorem 6.2.2 →   = D(ρ∥ω) + D( |0⟩⟨0| ∥ ε|0⟩⟨0| + (1 − ε)|1⟩⟨1| )    (6.121)
Additivity →   = D( ρ ⊗ |0⟩⟨0| ∥ ω ⊗ (ε|0⟩⟨0| + (1 − ε)|1⟩⟨1|) )
DPI →   ⩾ D(ρ∥σ) ,
where in the last inequality we used the DPI property of D with a quantum channel that
acts as an identity upon measuring |0⟩⟨0| in the second register, and produces a constant
output τ upon measuring |1⟩⟨1| in the second register.
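The bounds (6.113) can be checked numerically for a concrete choice of D. The sketch below (assuming NumPy; helper names ours) takes D to be the familiar Umegaki relative entropy D(ρ∥σ) = Tr[ρ(log ρ − log σ)] (in bits), a pure state ρ = ψ (so that D_min(ψ∥σ) = −log Tr[σψ]), and D_max computed via item 3 of the exercise above.

```python
import numpy as np

def eig_fun(m, f):
    """Apply a scalar function to a Hermitian matrix through its eigendecomposition."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * f(vals)) @ vecs.conj().T

rng = np.random.default_rng(3)
d = 3
v = rng.normal(size=d) + 1j * rng.normal(size=d)
psi = np.outer(v, v.conj()) / np.vdot(v, v).real              # pure state |psi><psi|
g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
sigma = g @ g.conj().T; sigma /= np.trace(sigma).real          # full-rank state sigma > 0

# D_min(psi||sigma) = -log Tr[sigma Pi_psi]; for a pure state Pi_psi = psi itself.
d_min = -np.log2(np.trace(sigma @ psi).real)
# Umegaki: Tr[psi log psi] = 0 for a pure state, so D(psi||sigma) = -Tr[psi log sigma].
d_umegaki = -np.trace(psi @ eig_fun(sigma, np.log2)).real
# D_max(psi||sigma) = log || sigma^{-1/2} psi sigma^{-1/2} ||_inf.
s_inv_half = eig_fun(sigma, lambda x: x ** -0.5)
d_max = np.log2(np.linalg.eigvalsh(s_inv_half @ psi @ s_inv_half).max())

print(d_min <= d_umegaki <= d_max)                             # the bounds (6.113)
```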
Exercise 6.3.10. The quantum Thompson metric is defined for any ρ, σ ∈ D(A) by

D_T(ρ∥σ) := max{ D_max(ρ∥σ), D_max(σ∥ρ) } .    (6.122)
1. Prove that the quantum Thompson’s metric is both a quantum divergence and a metric
in D(A) × D(A).
2. Prove that any quantum relative entropy D satisfies, for all ρ, σ, σ′ ∈ D(A),

|D(ρ∥σ) − D(ρ∥σ′)| ⩽ D_T(σ∥σ′) .    (6.123)
The exercise above demonstrates that quantum relative entropies are continuous in their
second argument. One can also get a continuity property in the first argument.
where the second inequality holds if σ > 0 and λmin (ρ′ ) > ∥ρ − ρ′ ∥∞ .
Proof. In somewhat of a variation of the previous theorem, fix 0 ⩽ s ⩽ 2^{−D_max(ρ′∥ρ)} and denote ε := 2^{−D_max(ρ+s(σ−ρ′)∥σ)}. Then,

D(ρ′∥σ) + D_max(ρ + s(σ − ρ′) ∥ σ) = D(ρ′∥σ) − log ε
Theorem 6.2.2 →   = D(ρ′∥σ) + D( |0⟩⟨0| ∥ ε|0⟩⟨0| + (1 − ε)|1⟩⟨1| )    (6.126)
Additivity →   = D( ρ′ ⊗ |0⟩⟨0| ∥ σ ⊗ (ε|0⟩⟨0| + (1 − ε)|1⟩⟨1|) )
DPI →   ⩾ D( N(ρ′) ∥ εN(σ) + (1 − ε)ω ) ,
where in the last inequality we used the DPI with a channel that acts as some channel
N ∈ CPTP(A → A) when measuring |0⟩⟨0| in the second register and outputs some state
ω ∈ D(A) when measuring |1⟩⟨1|. In other words, the inequality above holds for all N ∈
CPTP(A → A) and all ω ∈ D(A). It is therefore left to show that there exists such N and
ω that satisfy N (ρ′ ) = ρ and εN (σ) + (1 − ε)ω = σ. The latter implies that we can define
ω to be
ω := (σ − εN(σ)) / (1 − ε) .    (6.127)
Note that we need to choose N such that σ − εN (σ) ⩾ 0 so that ω ∈ D(A). We take
N ∈ CPTP(A → A) to be a measurement-prepare channel of the form
where we want to choose τ such that both N (ρ′ ) = ρ and σ − εN (σ) ⩾ 0. The condition
N (ρ′ ) = ρ can be expressed as ρ = sρ′ + (1 − s)τ . Isolating τ we get that
τ = (ρ − sρ′) / (1 − s) .    (6.129)
The above matrix is positive semidefinite if and only if ρ ⩾ sρ′, which holds since s ⩽ 2^{−D_max(ρ′∥ρ)}. We therefore choose τ as above so that N(ρ′) = ρ. It is left to check that σ − εN(σ) ⩾ 0. Indeed, since N(σ) = sσ + (1 − s)τ we have

σ − εN(σ) = (1 − εs)σ − ε(1 − s)τ
(6.129) →   = (1 − εs)σ − ε(ρ − sρ′)    (6.130)
  = σ − ε(ρ + s(σ − ρ′))
By definition of ε →   ⩾ 0 .

To summarize, we showed that for any 0 ⩽ s ⩽ 2^{−D_max(ρ′∥ρ)} we have
Since we now assume that µ := λ_min(σ) > 0, we can take r = 1 + (1 − s)/µ. Note that for this choice of r we have

(r − s)σ = (1 − s)(1 + µ) σ/µ ⩾ (1 − s)(1 + µ) I^A ⩾ ρ − sρ′ ,    (6.133)
since ρ − sρ′ is a subnormalized state with trace 1 − s. Moreover, if λ_min(ρ′) ⩾ ∥ρ − ρ′∥_∞ then we can take s = 1 − ∥ρ − ρ′∥_∞ / λ_min(ρ′), since in this case s ⩽ 2^{−D_max(ρ′∥ρ)} (or equivalently ρ ⩾ sρ′, see Exercise 6.3.11). We therefore get for these choices of r and s

D(ρ∥σ) − D(ρ′∥σ) ⩽ log r = log( 1 + ∥ρ − ρ′∥_∞ / (λ_min(ρ′) λ_min(σ)) ) .    (6.134)

This completes the proof.
Exercise 6.3.11. Show that if λ_min(ρ′) ⩾ ∥ρ − ρ′∥_∞ > 0 then ρ ⩾ sρ′, where s = 1 − ∥ρ − ρ′∥_∞ / λ_min(ρ′).
Exercise 6.3.12. Show that if ρ, σ ∈ D(A) and λ_min(ρ) > ∥σ − ρ∥_∞ then

D_max(ρ∥σ) ⩽ −log( 1 − ∥σ − ρ∥_∞ / λ_min(ρ) ) .    (6.135)

Use this to get a bound on D_T(σ∥σ′) in (6.123).
Proof. Let (ρk , σk )k∈N be a sequence in D(A)×D(A) that converges to (ρ, σ). For any k ∈ N,
define a quantum channel Ek ∈ CPTP(A → A) by its action on any ω ∈ D(A) as
E_k(ω) := ρ_k + 2^{−D_max(ρ∥ρ_k)} (ω − ρ) .    (6.136)

Note that for sufficiently large k, 2^{−D_max(ρ∥ρ_k)} > 0 (see the exercise below). Moreover, observe that ρ_k − 2^{−D_max(ρ∥ρ_k)} ρ ⩾ 0, so that E_k is indeed a quantum channel. With the above notations we get from the DPI

D(ρ∥σ) ⩾ D(E_k(ρ) ∥ E_k(σ))
(6.136) →   = D(ρ_k ∥ E_k(σ))    (6.137)
(6.114) →   ⩾ D(ρ_k∥σ_k) − D_max(E_k(σ) ∥ σ_k) .
Moving the term involving D_max to the other side and taking the limit superior on both sides gives

lim sup_{k→∞} D(ρ_k∥σ_k) ⩽ D(ρ∥σ) + lim sup_{k→∞} D_max(E_k(σ) ∥ σ_k) .    (6.138)

The second term on the right-hand side above vanishes since the density matrix

σ̃_k := E_k(σ) = 2^{−D_max(ρ∥ρ_k)} σ + ρ_k − 2^{−D_max(ρ∥ρ_k)} ρ    (6.139)

has the limit lim_{k→∞} σ̃_k = σ, so that

lim sup_{k→∞} D_max(σ̃_k ∥ σ_k) = 0 .    (6.140)
Note that we used indirectly the fact that σ > 0, since for sufficiently large k we must
have σk > 0 so the limit above is indeed zero. This completes the proof that D is upper
semi-continuous on D(A) × D>0 (A).
We now prove the lower semi-continuity on D>0 (A)×D>0 (A). Note that since we already
proved upper semi continuity in this domain, this will imply that D is continuous on D>0 (A)×
D>0 (A). For any k ∈ N, we define Ek as before but with the role of ρk and ρ interchanged;
i.e. E_k ∈ CPTP(A → A) is defined by its action on any ω ∈ D(A) as

E_k(ω) := ρ + 2^{−D_max(ρ_k∥ρ)} (ω − ρ_k) .    (6.141)

Since we assume that ρ > 0 we get that 2^{−D_max(ρ_k∥ρ)} > 0 for all k. Moreover, observe that ρ − 2^{−D_max(ρ_k∥ρ)} ρ_k ⩾ 0, so that E_k is indeed a quantum channel. With the above notations we get from the DPI

D(ρ_k∥σ_k) ⩾ D(E_k(ρ_k) ∥ E_k(σ_k))
(6.141) →   = D(ρ ∥ E_k(σ_k))    (6.142)
(6.114) →   ⩾ D(ρ∥σ) − D_max(E_k(σ_k) ∥ σ) .
Exercise 6.3.13.
1. Show that if {ρ_k}_{k∈ℕ} is a sequence in D(A) that converges to ρ ∈ D(A), then for sufficiently large k we have D_max(ρ∥ρ_k) < ∞. Hint: Show that for sufficiently large k, supp(ρ) ⊆ supp(ρ_k).
where the optimizations are over the classical system X, the channels E ∈ CPTP(A → X)
and F ∈ CPTP(X → A) as well as the diagonal density matrices p, q ∈ D(X). The
functions D and D are in general not additive even if the D is a classical relative entropy
(and therefore additive). However, in the following lemma we show that in this case D is
super-additive while D is sub-additive.
Lemma 6.4.1. Let D be a classical relative entropy, and let D and D be its maximal
and minimal quantum extensions as defined in (5.91). Then, for all ρ1 , σ1 ∈ D(A1 )
and ρ2 , σ2 ∈ D(A2 ) we have:
1. Super-Additivity: D(ρ₁ ⊗ ρ₂ ∥ σ₁ ⊗ σ₂) ⩾ D(ρ₁∥σ₁) + D(ρ₂∥σ₂).

2. Sub-Additivity: D(ρ₁ ⊗ ρ₂ ∥ σ₁ ⊗ σ₂) ⩽ D(ρ₁∥σ₁) + D(ρ₂∥σ₂).
Proof. We will prove the super-additivity property and leave it as an exercise to prove the sub-additivity along similar lines. By definition we have

D(ρ₁ ⊗ ρ₂ ∥ σ₁ ⊗ σ₂) = sup_{E∈CPTP(A₁A₂→X)} D( E(ρ₁ ⊗ ρ₂) ∥ E(σ₁ ⊗ σ₂) )
Restricting E = E₁ ⊗ E₂ →   ⩾ sup_{E₁∈CPTP(A₁→X₁), E₂∈CPTP(A₂→X₂)} D( E₁(ρ₁) ⊗ E₂(ρ₂) ∥ E₁(σ₁) ⊗ E₂(σ₂) )    (6.148)
Additivity of D →   = sup_{E₁} D( E₁(ρ₁) ∥ E₁(σ₁) ) + sup_{E₂} D( E₂(ρ₂) ∥ E₂(σ₂) )
  = D(ρ₁∥σ₁) + D(ρ₂∥σ₂) .
In Exercise 6.4.2 below you will show that the limits above exist and that in general D^reg(ρ∥σ) ⩾ D(ρ∥σ) and D^reg(ρ∥σ) ⩽ D(ρ∥σ). Moreover, note that by definition, D^reg and D^reg are at least partially additive in the sense that for any n ∈ ℕ and any ρ, σ ∈ D(A)

D^reg(ρ^{⊗n} ∥ σ^{⊗n}) = n D^reg(ρ∥σ)    and    D^reg(ρ^{⊗n} ∥ σ^{⊗n}) = n D^reg(ρ∥σ) .    (6.150)

It is an open problem to determine whether D^reg and D^reg are fully additive. We will see below that in many examples D^reg and D^reg turn out to be fully additive, so that they are in fact relative entropies. The following theorem shows that these functions remain optimal.
Theorem 6.4.1. Let D be a classical relative entropy, and let D^reg and D^reg be as above. Then, both D^reg and D^reg are partially additive quantum divergences that reduce to D on classical states. In addition, any other quantum relative entropy D′ that reduces to D on classical states satisfies, for all ρ, σ ∈ D(A),

D^reg(ρ∥σ) ⩽ D′(ρ∥σ) ⩽ D^reg(ρ∥σ) .    (6.151)
Remark. Observe that since in general D^reg(ρ∥σ) ⩾ D(ρ∥σ) and D^reg(ρ∥σ) ⩽ D(ρ∥σ), the bounds on D′ above are tighter than the bounds given in (5.90). We are able to get tighter bounds since D′ is additive.
Proof. We already saw that D^reg and D^reg are partially additive quantum divergences that reduce to D on classical states. It is therefore left to prove the inequality (6.151). From (5.90) we have for all n ∈ ℕ

D(ρ^{⊗n} ∥ σ^{⊗n}) ⩽ D′(ρ^{⊗n} ∥ σ^{⊗n}) ⩽ D(ρ^{⊗n} ∥ σ^{⊗n}) .    (6.152)

Since D′ is additive under tensor products, dividing the above by n gives

(1/n) D(ρ^{⊗n} ∥ σ^{⊗n}) ⩽ D′(ρ∥σ) ⩽ (1/n) D(ρ^{⊗n} ∥ σ^{⊗n}) .    (6.153)

The proof is concluded by taking the limit n → ∞.
Exercise 6.4.2. Let ρ, σ ∈ D(A) and let D be a classical relative entropy with maximal and minimal quantum extensions D and D. Denote

a_n := D(ρ^{⊗n} ∥ σ^{⊗n})    and    b_n := D(ρ^{⊗n} ∥ σ^{⊗n}) .    (6.154)

1. Show that the sequences {a_n} and {b_n} satisfy, for all n, m ∈ ℕ,

a_{n+m} ⩽ a_n + a_m    and    b_{n+m} ⩾ b_n + b_m .    (6.155)
2. Then, use this selected En to compute the limit as n → ∞, which will lead to the
desired closed formula.
This approach enables the development of a precise and concise formula representing the
minimal quantum extension for the Rényi divergence.
A natural guess for the optimal POVM channels E_n is given by the pinching channels discussed in Sec. 3.5.12. Recall that for any ρ, σ ∈ D(A) and a pinching channel P_σ ∈ CPTP(A → A), the states P_σ(ρ) and σ commute. Therefore, P_σ(ρ) and σ have a common eigenbasis {|x⟩}_{x∈[m]} (with m := |X| = |A|) that spans A. Let ∆ ∈ CPTP(A → X) be the completely dephasing channel in this basis. Then, the channel ∆ ∈ CPTP(A → X) is a POVM channel that we can take to be E₁. From Exercise 3.5.21 it follows that ∆(σ) = σ and ∆(ρ) = P_σ(ρ) (see (3.231)).
In general, for any n ∈ N, we can choose En = ∆n , where ∆n ∈ CPTP(An → X n ) is the
completely dephasing channel in the common eigenbasis of Pσ⊗n (ρ⊗n ) and σ ⊗n . We will see
shortly that this choice is indeed optimal in the limit n → ∞.
Before we continue with the derivation of the closed formula, we first give a snapshot of
what one can expect the formula to be. With {|x⟩}x∈[m] being the common eigenbasis of
Pσ (ρ) and σ we get (cf. (3.231)) that
D_α( P_σ(ρ) ∥ σ ) = D_α( ∆(ρ) ∥ σ ) = (1/(α−1)) log ∑_{x∈[m]} ⟨x|ρ|x⟩^α ⟨x|σ|x⟩^{1−α} ,    (6.158)

where we used the fact that each |x⟩ is a common eigenvector of both σ and P_σ(ρ). In particular, for any λ ∈ ℝ we have ⟨x|σ^λ|x⟩ = ⟨x|σ|x⟩^λ. Therefore, the term inside the sum above can be expressed as

⟨x|ρ|x⟩^α ⟨x|σ|x⟩^{1−α} = ( ⟨x|σ^{(1−α)/(2α)}|x⟩ ⟨x|ρ|x⟩ ⟨x|σ^{(1−α)/(2α)}|x⟩ )^α
|x⟩⟨x|σ^{(1−α)/(2α)}|x⟩ = σ^{(1−α)/(2α)}|x⟩ →   = ( ⟨x| σ^{(1−α)/(2α)} ρ σ^{(1−α)/(2α)} |x⟩ )^α .    (6.159)

Since the function x ↦ x^α is concave for α ∈ (0, 1) and convex for α ⩾ 1, it follows from Jensen's inequality (B.31) that

D_α( P_σ(ρ) ∥ σ ) ⩽ (1/(α−1)) log ∑_{x∈[m]} ⟨x| ( σ^{(1−α)/(2α)} ρ σ^{(1−α)/(2α)} )^α |x⟩
  = (1/(α−1)) log Tr[ ( σ^{(1−α)/(2α)} ρ σ^{(1−α)/(2α)} )^α ] .    (6.161)
The expression on the right-hand side is known as the sandwiched Rényi relative entropy.
Remarkably, we will see below that the regularization of the left-hand side equals the right-
hand side in the equation above. For this purpose, it will be convenient to denote the trace
in the equation above as
Q̃_α(ρ∥σ) := Tr[ ( σ^{(1−α)/(2α)} ρ σ^{(1−α)/(2α)} )^α ] .    (6.162)
Exercise 6.4.3. Show that for any isometry channel V ∈ CPTP(A → B), any ρ, σ ∈ D(A),
and any ω ∈ D(C),
Q̃_α( V(ρ) ∥ V(σ) ) = Q̃_α(ρ∥σ)    and    Q̃_α(ρ ⊗ ω ∥ σ ⊗ ω) = Q̃_α(ρ∥σ) .    (6.163)
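The pinching bound (6.161) can be probed numerically. The sketch below (assuming NumPy; names ours; random seed arbitrary) dephases ρ in an eigenbasis of σ, evaluates the resulting commuting Rényi divergence, and checks that it does not exceed the sandwiched quantity D̃_α(ρ∥σ) = (1/(α−1)) log Q̃_α(ρ∥σ).

```python
import numpy as np

def mat_pow(m, t, tol=1e-12):
    """Power of a positive semidefinite matrix (zero eigenvalues are left at zero)."""
    vals, vecs = np.linalg.eigh(m)
    out = np.zeros_like(vals)
    mask = vals > tol
    out[mask] = vals[mask] ** t
    return (vecs * out) @ vecs.conj().T

def sandwiched_renyi(rho, sigma, alpha):
    """Sandwiched Renyi relative entropy, Eqs. (6.161)-(6.162), in bits (sigma > 0 assumed)."""
    s = mat_pow(sigma, (1.0 - alpha) / (2.0 * alpha))
    return float(np.log2(np.trace(mat_pow(s @ rho @ s, alpha)).real) / (alpha - 1.0))

def petz_renyi(rho, sigma, alpha):
    return float(np.log2(np.trace(mat_pow(rho, alpha) @ mat_pow(sigma, 1.0 - alpha)).real)
                 / (alpha - 1.0))

def pinch(rho, sigma):
    """Dephase rho in an eigenbasis of sigma (equals the pinching channel P_sigma
    whenever sigma has a non-degenerate spectrum)."""
    _, vecs = np.linalg.eigh(sigma)
    diag = np.diag(vecs.conj().T @ rho @ vecs).real
    return vecs @ np.diag(diag) @ vecs.conj().T

rng = np.random.default_rng(11)
def rand_dm(d):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    r = g @ g.conj().T
    return r / np.trace(r).real

rho, sigma, alpha = rand_dm(3), rand_dm(3), 2.0
# Eq. (6.161): the pinched (commuting) divergence never exceeds the sandwiched one.
print(petz_renyi(pinch(rho, sigma), sigma, alpha)
      <= sandwiched_renyi(rho, sigma, alpha) + 1e-10)
```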
We first show that D̃_α is indeed a relative entropy. Its additivity and normalization properties are relatively easy to show and are left as an exercise.
Exercise 6.4.4. Show that the sandwiched Rényi relative entropy of order α ∈ [0, ∞] satisfies
the additivity and normalization properties of a quantum relative entropy.
Hint: Recall that for any complex matrix M , the matrices M M ∗ and M ∗ M have the same
non-zero eigenvalues.
Theorem 6.4.2. The sandwiched Rényi relative entropy of any order α ∈ [0, ∞] is a
quantum relative entropy; i.e., it satisfies the three relative entropy axioms of DPI,
additivity, and normalization.
Proof. Since D̃α (ρ∥σ) fulfills both additivity and normalization properties (as shown in Ex-
ercise 6.4.4), our task is to demonstrate its compliance with the DPI. For α > 1, the DPI of
D̃α is derived from that of Q̃α . For α ∈ [ 12 , 1), it follows from the DPI of −Q̃α . Based on
Exercise 6.4.3 and Lemma 5.2.2, we know that if Q̃α is jointly convex for α > 1, then it sat-
isfies the DPI. Similarly, for α ∈ [ 12 , 1), if Q̃α is jointly concave, then −Q̃α satisfies the DPI.
Our objective is therefore to show that for α > 1, Q̃α is jointly convex, and for α ∈ [ 21 , 1),
Q̃α is jointly concave. The case α ∈ (0, 12 ] is effectively covered by the case α ∈ [ 12 , 1) when
we swap ρ with σ, thus it need not be considered separately.
Firstly, consider α > 1 and define β := (α−1)/(2α). The proof's central strategy is to decompose the trace Tr[(σ^{−β} ρ σ^{−β})^α] into two terms, one dependent only on ρ and the other solely on σ. This decomposition allows us to separately assess the convexity in ρ and σ. To obtain this decomposition, we utilize Young's inequality (2.75), choosing M = σ^{−β} ρ σ^{−β}, N = σ^β η σ^β, p = α, and q = α/(α−1) = 1/(2β), where η is an arbitrary positive semidefinite matrix in Pos(A). With these choices, Tr[MN] = Tr[ρη], leading to the inequality (cf. (2.75))

Tr[ρη] ⩽ (1/α) Tr[(σ^{−β} ρ σ^{−β})^α] + ((α−1)/α) Tr[(σ^β η σ^β)^{1/(2β)}] .    (6.165)

Rearranging terms and recalling that Q̃_α(ρ∥σ) = Tr[(σ^{−β} ρ σ^{−β})^α], we obtain

Q̃_α(ρ∥σ) ⩾ α Tr[ρη] − (α − 1) Tr[(σ^β η σ^β)^{1/(2β)}] .    (6.166)

This inequality holds for all η ∈ Pos(A), with equality if M^p = N^q, which translates to (Exercise 6.4.6)

η = σ^{−β} (σ^{−β} ρ σ^{−β})^{α−1} σ^{−β} .    (6.167)

Therefore, Q̃_α(ρ∥σ) can be expressed as

Q̃_α(ρ∥σ) = sup_{η⩾0} { α Tr[ρη] − (α − 1) Tr[(σ^β η σ^β)^{1/(2β)}] } .    (6.168)
With this expression, we can now analyze the convexity of each term independently.
A consequence of Lieb's concavity theorem, given in Corollary B.6, establishes the concavity of the function

σ ↦ Tr[(σ^β η σ^β)^{1/(2β)}] = Tr[(η^{1/2} σ^{2β} η^{1/2})^{1/(2β)}] ,    (6.169)

where we used the fact that LL^∗ and L^∗L have the same non-zero eigenvalues, with L := σ^β η^{1/2}. Therefore, the term −(α − 1)Tr[(σ^β η σ^β)^{1/(2β)}] is convex in σ. Furthermore, the linearity
of αTr[ρη] in ρ ensures its convexity in ρ. As a result, for any p ∈ Prob(n) and two sets of
n density matrices in D(A), {ρ_x}_{x∈[n]} and {σ_x}_{x∈[n]}, it follows that

Q̃_α( ∑_{x∈[n]} p_x ρ_x ∥ ∑_{x∈[n]} p_x σ_x )
  ⩽ sup_{η⩾0} { α ∑_{x∈[n]} p_x Tr[ρ_x η] − (α − 1) ∑_{x∈[n]} p_x Tr[(η^{1/2} σ_x^{2β} η^{1/2})^{1/(2β)}] }
  ⩽ ∑_{x∈[n]} p_x sup_{η⩾0} { α Tr[ρ_x η] − (α − 1) Tr[(η^{1/2} σ_x^{2β} η^{1/2})^{1/(2β)}] }
(6.168) →   = ∑_{x∈[n]} p_x Q̃_α(ρ_x ∥ σ_x) .    (6.170)
This proves the case α > 1. For α ∈ [½, 1), we apply similar reasoning using the reverse Young inequality (3.240). Using the same substitutions for M and N, we obtain (6.166) but with the inequality reversed. Consequently, we have

Q̃_α(ρ∥σ) = inf_{η⩾0} { α Tr[ρη] + (1 − α) Tr[(σ^β η σ^β)^{1/(2β)}] } .    (6.171)

Observe that β < 0 since α < 1. As with the previous case, the joint concavity of Q̃_α follows from the concavity of the function in (6.169), completing the proof.
Exercise 6.4.6. Using the same notations as in the proof above, show that M^p = N^q if and only if η has the form given in (6.167).
Exercise 6.4.7. Show that for any ρ, σ ∈ D(A), the function α 7→ D̃α (ρ∥σ) is continuous
for all α ∈ [0, ∞].
We are now ready to prove the closed formula for the minimal quantum Rényi relative
entropy.
D^reg_α(ρ∥σ) = D̃_α(ρ∥σ) .    (6.172)
Remark. Recall that a priori, D^reg_α is only known to be partially additive; however, the theorem above implies that it is fully additive.
Proof. It is sufficient to prove the theorem for all α ⩾ ½, since if the theorem holds for this case then the case α ∈ (0, ½) simply follows from Exercise 6.2.7 via the relation

D^reg_α(ρ∥σ) = (α/(1−α)) D^reg_{1−α}(σ∥ρ)
(6.172) →   = (α/(1−α)) D̃_{1−α}(σ∥ρ)    (6.173)
  = D̃_α(ρ∥σ) .
We will therefore assume in the rest of the proof that α ⩾ ½. Since D̃_α is a relative entropy that reduces to the Rényi relative entropy in the classical domain, it follows from Theorem 6.4.1 that

D^reg_α(ρ∥σ) ⩽ D̃_α(ρ∥σ) .    (6.174)

For the reversed inequality, we first show that

D^reg_α(ρ∥σ) ⩾ lim_{n→∞} (1/n) D_α( P_{σ^{⊗n}}(ρ^{⊗n}) ∥ σ^{⊗n} ) .    (6.175)
Indeed, since Pσ⊗n (ρ⊗n ) commutes with σ ⊗n they have a common eigenbasis that spans
An . Let ∆n ∈ CPTP(An → An ) be the completely dephasing channel in this basis. From
Exercise 3.5.21 we have P_{σ^{⊗n}}(ρ^{⊗n}) = ∆_n(ρ^{⊗n}). Therefore,

sup_{E∈CPTP(Aⁿ→X)} D_α( E(ρ^{⊗n}) ∥ E(σ^{⊗n}) ) ⩾ D_α( ∆_n(ρ^{⊗n}) ∥ ∆_n(σ^{⊗n}) )
  = D_α( P_{σ^{⊗n}}(ρ^{⊗n}) ∥ σ^{⊗n} ) .    (6.176)

Dividing both sides of the equation above by n and taking the limit n → ∞ proves (6.175).
It is left to show that the right-hand side of (6.175) is no smaller than D̃α (ρ∥σ). We will
divide this part of the proof into several cases:
1. The case α > 1 and ρ ≪̸ σ. Recall that for every n ∈ ℕ the states P_{σ^{⊗n}}(ρ^{⊗n}) and σ^{⊗n} have a common eigenbasis. Therefore, both of these states are diagonal in this eigenbasis, and the condition ρ ≪̸ σ implies that these diagonal states also satisfy P_{σ^{⊗n}}(ρ^{⊗n}) ≪̸ σ^{⊗n} (see Exercise 3.5.22). Hence, we must have D_α( P_{σ^{⊗n}}(ρ^{⊗n}) ∥ σ^{⊗n} ) = ∞ for all n ∈ ℕ.
2. The case α > 1 and ρ ≪ σ. Observe first that since P_σ(ρ) and σ commute, we have (cf. Exercise 6.2.8)

D_α( P_σ(ρ) ∥ σ ) = (1/(α−1)) log Tr[ ( σ^{(1−α)/(2α)} P_σ(ρ) σ^{(1−α)/(2α)} )^α ] .    (6.177)

Now, from the pinching inequality (3.235) we have ρ ⩽ |spec(σ)| P_σ(ρ), so that (cf. Exercise B.3.2)

D_α( P_σ(ρ) ∥ σ ) ⩾ (1/(α−1)) log Tr[ ( σ^{(1−α)/(2α)} (ρ/|spec(σ)|) σ^{(1−α)/(2α)} )^α ]
  = (1/(α−1)) log Tr[ ( σ^{(1−α)/(2α)} ρ σ^{(1−α)/(2α)} )^α ] − (α/(α−1)) log |spec(σ)| .    (6.178)

Hence, replacing ρ and σ above with ρ^{⊗n} and σ^{⊗n}, and recalling from (8.102) that |spec(σ^{⊗n})| ⩽ (n + 1)^{|A|}, we get in the limit n → ∞

lim_{n→∞} (1/n) D_α( P_{σ^{⊗n}}(ρ^{⊗n}) ∥ σ^{⊗n} ) ⩾ (1/(α−1)) log Tr[ ( σ^{(1−α)/(2α)} ρ σ^{(1−α)/(2α)} )^α ] .    (6.179)
4. The case α ∈ [½, 1) and ρ ⊥̸ σ. In this case, the first inequality in (6.178) holds in the opposite direction since the factor 1/(α−1) is negative. We therefore need another argument or trick. First, observe that

( σ^{(1−α)/(2α)} ρ σ^{(1−α)/(2α)} )^α = ( σ^{(1−α)/(2α)} ρ σ^{(1−α)/(2α)} )^{α−1} ( σ^{(1−α)/(2α)} ρ σ^{(1−α)/(2α)} ) .    (6.180)
Hint: Recall that P_σ(ρ) commutes with σ and that all the pinching projectors commute with all the operators above except for the single ρ.
Proof. Follows trivially from a combination of the theorem above with Eqs. (6.175,6.179).
Note that the corollary above demonstrates that an optimizer for (6.157) is En = ∆n .
Exercise 6.4.9. Show that for all α ∈ [0, ∞]

D^reg_α(ρ∥σ) ⩾ lim sup_{n→∞} (1/n) D_α( ρ^{⊗n} ∥ P_{ρ^{⊗n}}(σ^{⊗n}) ) .    (6.187)

Further, show that the equality above holds for all α ∈ (0, ½).
Proof. Let {e₁, e₂} be the standard basis of ℝ², and observe that λ_max in (5.112) is precisely 2^{−D_max(ψ∥σ)}. Therefore, Theorem 5.3.2 gives

D(ψ∥σ) = D( e₁ ∥ 2^{−D_max(ψ∥σ)} e₁ + (1 − 2^{−D_max(ψ∥σ)}) e₂ )    (6.189)
Theorem 6.2.2 →   = D_max(ψ∥σ) ,

where the last equality holds since D is a relative entropy.
The corollary above demonstrates that the maximal quantum extension is closely related
to Dmax . The corollary is universal in the sense that it holds for any classical relative entropy
D, however, it is quite limited as it holds only for pure ρ. In the corollary below we will see
that for some of the Rényi relative entropies there exists a closed formula for the maximal
quantum extension without any restriction on ρ and σ. This closed formula is given in terms
of the family of geometric relative entropies.
b α (ρ∥σ) := lim D
D b α (ρ∥σ + εI) . (6.191)
+ε→0
Remarks:
1. Alternatively, one can define the geometric relative entropy for any ρ, σ ∈ D(A) using
the decomposition (D.27) with ρ̃ := ρ11 − ζρ−1 ∗
22 ζ and σ̃ := σ11 . Then, the geometric
relative entropy of order α ∈ [0, 2] is given by
( h 1 α i
1 −2 − 12
b α (ρ∥σ) = α−1 log Tr σ̃ σ̃ ρ̃σ̃ if α ∈ [0, 1) or ρ ≪ σ
D (6.192)
∞ otherwise
2. The geometric relative entropy can be written differently using the relation M f (M ∗ M ) =
1 1
f (M M ∗ )M given in Exercise B.0.1. Denoting M := ρ 2 σ − 2 we get
1
log Tr σM ∗ M (M ∗ M )α−1
D
b α (ρ∥σ) =
α−1
1
log Tr σM ∗ (M ∗ M )α−1 M
= (6.193)
α−1
1 1
1 α−1
−1
= log Tr ρ ρ 2 σ ρ 2 .
α−1
4. Observe that for α = 2 the definition of the geometric relative entropy coincides with
the Petz quantum Rényi divergence of the same order.
Exercise 6.4.10. Show that the two definitions above for the geometric relative entropy are
equivalent (for any two density matrices ρ, σ ∈ D(A)); i.e., prove (6.192).
Exercise 6.4.11. Show that the geometric relative entropy satisfies the properties (axioms)
of additivity and normalization of a quantum relative entropy.
Exercise 6.4.12. Show that the geometric relative entropy reduces to the Rényi relative
entropy in the classical domain.
Instead of proving directly that the geometric relative entropy satisfies the DPI, we will
show that it is equal to the maximal quantum extension of the Rényi relative entropy. Since
the latter satisfies the DPI, this will imply that geometric relative entropy also satisfies the
DPI.
Proof. The proof follows directly from Theorem 5.3.3 for the case σ > 0 and from Theo-
rem D.2.1 for the general case. We leave the details as an exercise.
Exercise 6.4.13. Provide the full details of the proof of the corollary above.
Conditional Entropy
In this chapter, we delve into a variant of the entropy function, widely prevalent in informa-
tion theory and quantum resource theories, especially in the realm of dynamical resources,
which is the focus of the second volume of this book. This variant is known as conditional
entropy, which pertains to the entropy associated with a physical system A that shares a
correlation with another system B. When an observer, say Bob, has access to system B, he
can reduce his uncertainty about system A by performing a quantum measurement on his
subsystem. In essence, conditional entropy quantifies the residual uncertainty of system A
when such access to system B is available.
Traditionally, the conditional entropy of a bipartite state ρAB is defined in terms of the von
Neumann entropy associated with system AB minus the von Neumann entropy associated
with system B. This is given by:
H(A|B)ρ := H ρAB − H ρB .
(7.1)
See Fig. 7.1 for a heuristic description of this definition in terms of a Venn diagram. However,
in this chapter, we take a different approach. Here, conditional entropy is defined axiomati-
cally, similar to how we defined entropy and relative entropy. This approach provides a more
rigorous definition of conditional entropy, placing the intuitive Venn diagram interpretation
on a more solid theoretical foundation.
329
330 CHAPTER 7. CONDITIONAL ENTROPY
′
D(A) and σ ∈ D(A′ ) we say that ρ majorizes σ and write ρA ≻ σ A if the probability
vector consisting of the eigenvalues of ρ, majorizes the probability vector consisting of the
eigenvalues of σ. For conditional majorization, such a straightforward extension from the
classical to the quantum domain is more complex as it involves two systems that can be
correlated quantumly (i.e. entangled). For this reason we employ the axiomatic approach
to introduce quantum conditional majorization, and then discuss some of its key properties.
As discussed above, intuitively, conditional majorization is a pre-order on the set D(AB)
that characterizes the uncertainty of system A given access to system B. To make this intu-
ition more precise, we employ the axiomatic approach determining which set of operations
in CPTP(AB → AB ′ ) can only increase the uncertainty of system A (even if one has access
to system B). We will now examine two highly intuitive axioms that these channels must
adhere to. These two axioms extend the principles explored in Sec. 4.6.1 to the quantum
domain.
′ ′
where N AB→B := TrA ◦ N AB→AB . Such a channel N ∈ CPTP(AB → AB ′ ) is referred to
as conditionally unital. It is important to note that both the input and output systems on
Alice’s side remain the same, whereas on Bob’s side, the systems B and B ′ can be different.
Lemma 7.1.1. Let N ∈ CPTP(AB → ÃB ′ ) be a bipartite quantum channel and let
′
JNAB ÃB be its Choi matrix. Then, N AB→ÃB is conditionally unital if and only if its
Choi matrix satisfies
′ ′
JNB ÃB = JNBB ⊗ uà . (7.4)
′
Proof. We begin by proving that the channel N AB→ÃB is conditionally unital if its Choi
matrix has the form in (7.4). To this end, let ρ ∈ D(B) and consider that
h ′
T ′
i
N uA ⊗ ρB = TrAB JNAB ÃB uA ⊗ ρB ⊗ I ÃB
1 h
B ÃB ′
B T
ÃB ′
i
= TrB JN ρ ⊗I
|A| (7.5)
1 h
à BB ′
B T
ÃB ′
i
(7.4)→ = TrB u ⊗ JN ρ ⊗I
|A|
= uà ⊗ σ B ,
h T i
1 ′ ′
where σ B := |A| TrB JNBB ρB ⊗ I B .
′ ′
We next prove that the Choi matrix of N AB→ÃB has the form in (7.4) if N AB→ÃB is
conditionally unital. Recall that the defining property of a conditionally unital channel is
that for every state ρ ∈ D(B), there exists a state σ ∈ D(B ′ ) such that
′ ′
N AB→ÃB (uA ⊗ ρB ) = uà ⊗ σ B . (7.6)
By taking the trace over à on both sides of the equation above we get that
h i
B′ AB→B ′ A B ABB ′ A B T B′
σ =N (u ⊗ ρ ) = TrAB JN u ⊗ ρ ⊗I
1 h i (7.7)
BB ′ B T B′
= TrB JN ρ ⊗I .
|A|
On the other hand, observe that
h i
AB→ÃB ′ A B ′ B T ÃB ′
TrAB JNAB ÃB A
N (u ⊗ ρ ) = u ⊗ ρ ⊗I
1 h i (7.8)
B ÃB ′ B T ÃB ′
= TrB JN ρ ⊗I .
|A|
Therefore, from the two expressions above for σ B and N (uA ⊗ ρB ) we conclude that (7.6)
can be expressed as
h i h i
B ÃB ′ B T ÃB ′ Ã BB ′ B T ÃB ′
TrB JN ρ ⊗I = TrB (u ⊗ JN ) ρ ⊗I . (7.9)
′ ′ ′
Denote by η B ÃB := JNB ÃB − uà ⊗ JNBB and observe that the equation above can be written
as h i
B ÃB ′ B T ÃB ′
Tr η ρ ⊗I =0 ∀ ρ ∈ D(B). (7.10)
Due to the existence of bases of density operators that span the space of linear operators
acting on B, we conclude from (7.10) that for every operator ζ ∈ L(B) we have
h ′
′
i
TrB η B ÃB ζ B ⊗ I ÃB =0. (7.11)
Note that by multiplying both sides of the equation above by any element ξ ∈ L(ÃB ′ ) and
taking the trace we get that
B ÃB ′ B ÃB ′
Tr η ζ ⊗ξ =0. (7.12)
Since the equation above holds for all ζ ∈ L(B) and all ξ ∈ L(ÃB ′ ) it also holds for any
′ ′
linear combinations of matrices of the form ζ B ⊗ ξ ÃB . Since matrices of the form ζ B ⊗ ξ ÃB
′
span the whole space L(B ÃB ′ ) we conclude that η B ÃB is orthogonal (in the Hilbert-Schmidt
′
inner product) to all the elements of L(B ÃB ′ ). Therefore, we must have η B ÃB = 0 which is
equivalent to (7.4). This completes the proof.
′
Figure 7.2: An illustration of an A ̸→ B ′ semi-causal bipartite channel N AB→AB . The marginal
channel N AB→B ′ equals N AB→B ′ ◦ MA→A for any choice of M ∈ CPTP(A → A).
In essence, channels that are A ̸→ B ′ signalling are those that can be implemented via
one-way communication from Bob to Alice. An illustration of this concept can be found in
Fig. 7.3.
Remark. The theorem above demonstrate the intuitive assertion that semi-causal bipartite
channels are channels that can be realized with one-way communication from Bob to Alice.
With such channels, Alice cannot influence Bob’s system. We also point out that the relation
in (7.15) has been written in a compact form; that is, we removed identity channels so that
′
′ ′
′
E RA→A ◦ F B→RB := E RA→A ⊗ idB →B ◦ idA→A ⊗ F B→RB . (7.16)
Proof. We begin by proving the implication 1 ⇒ 2. Consider the following marginal of the
′
channel N AB→AB :
′ ′
N AB→B := TrA ◦ N AB→AB
′
(7.15)→ = TrA ◦ E RA→A ◦ F B→RB
′ (7.17)
= TrRA ◦ F B→RB
′
= TrA ◦ F B→B ,
where we have utilized the trace-preserving property of quantum channels and denoted
′ ′
F B→B := TrR ◦ F B→RB . It is evident that this channel satisfies (7.13) with any trace
′
preserving map MA→A , establishing that N AB→AB is A ̸→ B ′ semi-causal.
Moving on to the implication 2 ⇒ 1, we examine two distinct purifications of the marginal
′
Choi matrix JNABB :
′
1. Consider a physical system C and a pure (unnormalized) state ψ ABAB C , which acts as
′ ′
a purification of both JNABAB and its marginal state JNABB .
′ 1 ′
2. Denoting by φBB R , an (unnormalized) purification of the operator J BB ,
|A| N
we get
BB ′ R ABB ′
from (7.14), that ΩAÃ ⊗ φ is another purification JN .
1
Since the marginal of JNAB equals I AB , it implies that φB = |A| JNB = I B . This property
implies the existence of an isometry F ∈ CPTP(B → B ′ R) that satisfies
′ ′
φBB R = F B̃→B R (ΩB B̃ ) . (7.18)
Moreover, given that two purifications of the same positive semi-definite matrix are connected
by an isometry (as per Exercise 2.3.32), there must be an isometry V ∈ CPTP(RÃ → AC)
satisfying
ABAB ′ C RÃ→AC AÃ BB ′ R
ψ =V Ω ⊗ϕ . (7.19)
Finally, tracing out system C and denoting by E RÃ→A := TrC ◦ V RÃ→AC gives
′
′
JNABAB = E RÃ→A ΩAÃ ⊗ ϕBB R
RÃ→A AÃ B̃→B ′ R B B̃
(7.18)→ = E Ω ⊗F Ω (7.20)
′
= E RÃ→A ◦ F B̃→B R Ω(AB)(ÃB̃) .
The equation above implies that (7.15) holds. This completes the proof.
Exercise 7.1.2. Show that if |B ′ | = 1 then any channel in CPTP(AB → AB ′ ) is A ̸→ B ′
semi-causal.
′
The Choi matrix, JNAB ÃB , of a channel N ∈ CMO(AB → ÃB ′ ) that is both conditional
unital and A ̸→ B ′ semi-causal must satisfies:
′ ′
1. JNB ÃB = JNBB ⊗ uà (i.e. N is conditionally unital; see (7.4)).
′ ′
2. JNABB = uA ⊗ JNBB (i.e. N is semi-causal; see (7.14)).
3. JNAB = I AB (i.e. N is trace preserving).
′
4. JNAB ÃB ⩾ 0 (i.e. N is completely positive).
Observe the symmetry of the first two conditions above under exchange of the local input
system A and the local output system Ã.
As a straightforward example of a CMO, consider that the reference system R in Theo-
′
rem 7.1.1 is classical. In such a scenario, we define X := R and the channel N AB→AB can
be expressed as:
′ ′ ′
X
N AB→AB = E XA→A ◦ F B→XB = A→A
E(x) ⊗ FxB→B (7.21)
x∈[m]
′
Here, {FxB→B }x∈[m] constitutes a quantum instrument, and for each x ∈ [m], the map E(x)
A→A
is a quantum channel in CPTP(A → A). We will explore later that this channel typifies
one-way LOCC (Local Operations and Classical Communication). Notably, if each E(x) is
a unital channel, this one-way LOCC is also conditionally unital. This channel essentially
represents Bob performing a quantum measurement on his system and conveying the result
to Alice, who then applies a unital channel to her system.
Exercise 7.1.3. Let ω ∈ D(B) and N ∈ CMO(AB → ÃB ′ ). Show that the channel
E ∈ CPTP(A → ÃB ′ ) defined for any ρ ∈ L(A) by
′ ′
E A→ÃB (ρA ) := N AB→ÃB (ρA ⊗ ω B ) (7.22)
is also CMO; i.e. show that E ∈ CMO(A → ÃB ′ ).
Exercise 7.1.4. Let Υ ∈ L(AB ÃB ′ → AB ÃB ′ ) be a linear map defined for all ω ∈
L(AB ÃB ′ ) as
AB ÃB ′ A B ÃB ′ BB ′ Ã Ã ABB ′ A BB ′
Υ ω := u ⊗ ω −ω ⊗u +u ⊗ ω −u ⊗ω . (7.23)
′
1. Show that a channel N ∈ CPTP(AB → ÃB ′ ) is CMO if and only if Υ(JNAB ÃB ) = 0.
2. Show that Υ is self-adjoint; i.e. show that Υ = Υ∗ .
3. Show that Υ is idempotent; i.e. Υ ◦ Υ = Υ.
Exercise 7.1.5. Show that quantum conditional majorization as defined above is a pre-order.
This definition effectively extends the concept of majorization. Specifically, when |B| =
|B ′ | = 1, the set CMO(A → A) coincides with the set of unital channels. Consequently, the
relation ρA ≻A σ A as defined above (under the condition |B| = |B ′ | = 1) transforms into the
well-known majorization relation ρA ≻ σ A . Expanding on this concept, quantum conditional
majorization exhibits the following notable property.
Lemma 7.1.2. Let ρ ∈ D(AB) and σ ∈ D(AB ′ ) be two product states; i.e.
′ ′
ρAB = ρA ⊗ ρB and σ AB = σ A ⊗ σ B . Then,
′
ρAB ≻A σ AB ⇐⇒ ρA ≻ σ A . (7.25)
Proof. If ρA ≻ σ A then there exists a unital channel such that σ A = U A→A (ρA ). Let
′
E ∈ CPTP(B → B ′ ) be a replacement channel that always outputs σ B . Then, the channel
′ ′
N AB→AB := U A→A ⊗ E B→B (7.26)
′ ′ ′
is CMO and satisfies σ A ⊗ σ B = N AB→AB ρA ⊗ ρB . Hence, ρAB ≻A σ AB . Conversely, if
′ ′ ′
ρAB ≻A σ AB then there exists a semi-causal quantum channel N AB→AB = E RA→A ◦ F B→RB
(that is also conditionally unital) that satisfies
′ ′
σ A ⊗ σ B = E RA→A ◦ F B→RB ρA ⊗ ρB .
(7.27)
′
Tracing system B ′ on both sides, and denoting τ R := TrB ′ ◦ F B→RB ρB gives
σ A = E RA→A ρA ⊗ τ R .
(7.28)
U A→A (ω A ) := E RA→A ω A ⊗ τ R
∀ ω ∈ D(A) , (7.29)
′
is a unital channel since N AB→AB is conditionally unital. Since by definition, σ A = U A→A ρA ,
we conclude that ρA ≻ σ A . This completes the proof.
′ ′
Determining whether ρAB ≻A σ AB for two given bipartite quantum states ρAB and σ AB
can be challenging. This task is essentially about verifying the existence of a Choi matrix
′
J AB ÃB N that satisfies both the four initial conditions outlined at the start of this subsection
(characteristic of a CMO) and the additional criterion:
h ′
T ′
i ′
TrAB JNAB ÃB ρAB ⊗ I ÃB = σ ÃB . (7.30)
′
These five conditions imposed on J AB ÃB N constitute an SDP (Semidefinite Programming)
feasibility problem that can be solved efficiently and algorithmically on a computer.
ρAB ≻A ρA ⊗ σ B . (7.32)
A more nuanced example is the fact that the maximally entangled state conditionally ma-
jorizes all states of the same dimensions.
Proof. In Sec.1.4.1, we discussed how the maximally entangled state ΦAB can be utilized for
teleporting an unknown quantum state. Specifically, a teleportation protocol from Bob to
Alice (note that in Sec.1.4.1, we examined teleportation from Alice to Bob) involves Bob
performing a joint quantum measurement on his part of ΦAB and the state to be teleported.
This is followed by classical communication to Alice, who then applies a unitary operation.
Such a protocol conforms to the structure of CMO channels described in (7.21), where each
A→A
E(x) is a unitary channel. This implies that for any bipartite state ρAB , there exists a
channel N ∈ CMO(AB → AB) such that
We emphasize that in the realization of the channel N AB→AB , Bob locally prepares the
state ρB B with B ′ ∼
′
= A and then employs the maximally entangled state ΦAB to teleport
′
the B subsystem to Alice, resulting in the state ρAB . The equation above implies that
ΦAB ≻A ρAB .
The next theorem characterizes the states in D(AB) that are conditionally majorized by
a state in Pure(A). Note that all the states in Pure(A) are equivalent under majorization.
Theorem 7.1.3. Let ρ ∈ D(AB). Then, the following statements are equivalent:
ψ A ≻A ρAB . (7.36)
Proof. To demonstrate that the first statement implies the second, we search for a channel
N ∈ CMO(A → AB) satisfying ρAB = N A→AB (ψ A ). Our strategy involves considering a
binary measurement-and-prepare channel defined as:
equals
I A ⊗ N A→B (uA ) = I A ⊗ ρB , (7.40)
using (7.38) with τ B = ρB . Equating these two operators dictates that τ AB must be
I A ⊗ ρB − ρAB
τ AB := . (7.41)
|A| − 1
τ AB is positive semi-definite, as we assume I A ⊗ρB ⩾ ρAB , and has unit trace, qualifying as a
density matrix. Also, it satisfies τ B = ρB . This concludes the proof that the first statement
implies the second.
Conversely, if N ∈ CMO(A → AB) is such that ρAB = N A→AB (ψ A ), then
ρB = N A→B (ψ A ) , (7.42)
where N A→B := TrA ◦ N A→AB is the marginal channel. Since N A→AB is A ̸→ B semi-causal
the marginal channel N A→B must satisfy
N A→B uA = N A→B ψ A
(7.43)
(7.42)→ = ρB .
where the last inequality follows from the fact that I A − ψ A ⩾ 0 and N A→AB is a completely
positive map. This concludes the proof.
In Theorem 7.1.2 we saw that under conditional majorization ΦAB is the maximal element
of D(AB). On the other hand, the maximally mixed state uA satisfies the opposite inequality
that ρAB ≻A uA for all ρ ∈ D(AB). Combining this maximal and minimal elements gives
the state
ΦAB ⊗ uà . (7.46)
Remarkably, the following theorem shows that under conditional majorization, this state is
equivalent to any pure state in Pure(AÃ).
Proof. We first prove that ΦAB ⊗ uà ≻Aà ψ Aà . We will denote by τ Aà the density matrix
Therefore, it is left to show that the channel N is CMO. Since the channel N AÃB→AÃ does
not have an output on Bob’s side it is trivially A ̸→ B ′ semi-causal (|B ′ | = 1). To show that
it is conditionally unital observe that for any σ ∈ D(B) we have
AÃB→AÃ AÃ B
= E AB→AÃ mI A ⊗ σ B
N I ⊗σ
(7.48)→ = I AÃ .
Hence, N ∈ CMO(AÃB → AÃ). This completes the proof that ΦAB ⊗ uà ≻Aà ψ Aà .
To prove that ΦAB ⊗ uà ≻Aà ψ Aà , we set τ AB := (I AB − ψ AB )/(m2 − 1) and denote by
N AÃ→AÃB a quantum channel defined for all ω ∈ L(AÃ) as
AÃ→AÃB AÃ AÃ→AB
N (ω ) := N (ω ) ⊗ uà .
AÃ
(7.53)
where
N AÃ→AB (ω AÃ ) := Tr[ψ AÃ ω AÃ ]ΦAB + Tr[(I AÃ − ψ AÃ )ω AÃ ]τ AB . (7.54)
Therefore, it is left to show that the channel N is CMO. To show that it is conditionally
unital observe that
AÃ→AÃB AÃ
= ΦAB + (I AB − ΦAB ) ⊗ uÃ
N I
(7.56)
AB Ã
=I ⊗u .
To show that it is A ̸→ B semi-causal observe that for all ω ∈ D(AÃ) the marginal channel
N AÃ→B satisfies
N AÃ→B ω AÃ = Tr[ψ AÃ ω AÃ ]uB + Tr[(I AÃ − ψ AÃ )ω AÃ ]τ B
(7.57)
Exercise (7.1.6)→ = uB ,
Exercise 7.1.6. Use the definition of τ AB to verify the first equality in (7.56) and the second
equality in (7.57).
With this extension, we can now define conditional entropy. Specifically, we consider
[
H: D(AB) → R; , (7.58)
A,B
as a function mapping the set of all bipartite states across finite dimensions to the real line.
The function H assigns to each bipartite density matrix ρAB a real number, denoted by
H(A|B)ρ . This notation distinguishes conditional entropy from the entropy of the marginal
state ρA .
Our objective is to identify when H constitutes a conditional entropy. For systems where
|B| = 1, we denote H(A|B)ρ as H(A)ρ := H(ρA ), aligning with the notation of conditional
entropy. This notation proves useful when examining composite systems with multiple sub-
systems. Since we define entropy functions as non-constant zero functions, we assume (im-
plicitly throughout this book, and in the definition below) the existence of a quantum system
A and a state ρ ∈ D(A) such that H(A)ρ ̸= 0.
There are several properties of conditional entropy that follows from the definition above.
First, observe that the case that system B is trivial, i.e. |B| = 1, a conditional entropy
function reduces to an entropy function. Moreover, if ρAB = ω A ⊗ τ B is a product state,
then it can be converted reversibly to the product state ω A ⊗ uB by a product channel of
the form idA→A ⊗ E B→B which is in CMO(AB → AB). Therefore, from the monotonicity
property above it follows that
so that H(A|B)ρ depends only on ω A . Moreover, the function ω A 7→ H(A|B)ωA ⊗uB satisfies
the two axioms of entropy and therefore can be considered itself as an entropy of ω A . In other
words, conditional entropy reduces to entropy on product states as intuitively expected.
Next, conditional entropy is invariant under the action of local isometric channels. That
′ ′
is, for a bipartite state ρAB and isometric channels U A→A and V B→B ,
This strict inequality allows us to set a normalization factor for conditional entropy. To be
consistent with the normalization convention for unconditional entropy, we set for the case
|A| = 2 that H(A|B)u⊗ρ = 1, which in turn implies that for |A| > 2
In the rest of this book we will always assume that H is normalized in this way.
Exercise 7.2.2. Let H be conditional entropy. Show that for every Hilbert spaces A and B
with equality if ρAB = uA ⊗ τ B for some τ ∈ D(B). Hint: Find a channel in CMO(AB →
AB) that takes ρAB to uA ⊗ ρB .
That is, the maximally entangled state has the least amount of conditional entropy. We will
see below that H(A|B)Φ is negative.
Unlike entropy, quantum conditional entropy can be negative. This unintuitive phenom-
ena puzzled the community for quite some time until an operational interpretation for the
quantum conditional entropy was found. This operational interpretation is given in terms of
a protocol known as quantum state merging (which we study in volume 2 of this book). In
the following theorem we show that certain entangled states must have negative conditional
entropy, while classical conditional entropy is always non-negative.
The lower bound in the theorem below is given in terms of the conditional min-entropy
defined on every bipartite state ρ ∈ D(AB) as
Hmin (A|B)ρ := − inf log2 {λ : ρAB ⩽ λIA ⊗ ρB } . (7.66)
λ⩾0
In the next subsection we will see that the conditional min-entropy is indeed a conditional
entropy. Originally, this quantity was given the name conditional min-entropy because it was
known to be the least among all Rényi conditional entropies. The theorem below strengthen
this observation by proving that all plausible quantum conditional entropies are not smaller
than the conditional min-entropy.
Exercise 7.3.1. Consider the conditional min-entropy as defined in (7.66).
1. Show that for the maximally entangled state ΦAB (with |A| = |B|) we have
Hmin (A|B)Φ = − log |A| . (7.67)
2. Show that if a density matrix ρ ∈ D(AB) satisfies Hmin (A|B)ρ = log |A| then ρAB =
uA ⊗ ρB .
3. Show that a state ρ ∈ D(AB) has non-negative conditional min-entropy if and only if
I A ⊗ ρB ⩾ ρAB .
4. Show that if ρAB is separable then its conditional min-entropy is non-negative.
Remark. The theorem above states that for the maximally entangled state ΦAB we have
H(A|B)Φ = Hmin (A|B)Φ . Combining this with Exercise 7.3.1 we conclude that
H(A|B)Φ = − log |A| . (7.69)
That is, all conditional entropies are negative on the maximally entangled state and equal to
− log |A|. Moreover, in conjunction with the third part of Exercise 7.3.1, the theorem above
implies that all conditional entropies are non-negative on separable states (and therefore also
on classical states).
Proof. Let’s start by examining the scenario where Hmin (A|B)ρ ⩾ 0, and denote m := |A|.
The proof strategy in this case revolves around identifying the largest integer k ∈ [m] such
that a classical system X, with dimension |X| = k, fulfills the condition uX ≻A ρAB . Once
we establish this optimal value of k, we can infer that every conditional entropy H must
satisfy
H(A|B)ρ ⩾ H(X)u = log k . (7.70)
H (A|B)
We begin by establishing that it is feasible to set k := 2 min ρ
.
H (A|B)
Let X be a classical system with dimension k := 2 min ρ
. By the assumption
that Hmin (A|B)ρ ⩾ 0, and given the dimension bound Hmin (A|B)ρ ⩽ log2 m, it follows
that k ∈ [m]. Observe that the case k = m implies that Hmin (A|B)ρ = log |A|. In this
case, according to the second part of Exercise 7.3.1 we must have ρAB = uA ⊗ ρB so that
H(A|B)ρ = log |A| = Hmin (A|B)ρ .
We therefore assume now that k < m. We look for a channel N ∈ CMO(A → AB) that
satisfies
A→AB 1 A
N Π = ρAB , (7.71)
k
where ΠA is a projection onto a k-dimensional subspace of A (i.e., set ΠA := x∈[k] |x⟩⟨x|A ,
P
where {|x⟩}x∈[m] is some orthonormal basis of A). The existence of such a channel will
prove that uX ≻A ρAB since uX (with |X| = k) is equivalent to k1 ΠA under conditional
majorization. We choose N A→AB to be a measure-and-prepare channel of the form
where τ AB is some density matrix that is chosen (see below) such that N A→AB is CMO.
Indeed,
X X the action of this channel is to perform a measurement according to the POVM
X
Π ,I − Π and prepare the state ρAB if the first outcome is obtained and the state τ AB
if the second outcome is obtained. By definition, this channel satisfies (7.71).
The channel N A→AB is A ̸→ B signalling if and only if the marginal channel N A→B :=
TrA ◦N A→AB satisfies N A→B ◦MA→A = N A→B for all M ∈ CPTP(A → A). In other words,
N A→AB is A ̸→ B signalling if and only if the marginal channel N A→B is a replacement
channel. Now, for every ω ∈ D(A) we have that
Therefore, by taking τ AB to have the property that its marginal τ B = ρB , we get that the
right-hand side does not depend on ω A , so that N A→AB is A ̸→ B semi-causal.
The channel N A→AB is conditional unital if and only if the state
where the we used (7.73) with τ B = ρB . The equality between the two states above forces
τ AB to be
I A ⊗ ρB − kρAB
τ AB := . (7.76)
m−k
The operator τ AB is positive semi-definite because m − k > 0 and
1 A
I ⊗ ρB − ρAB ⩾ 2−Hmin (A|B)ρ I A ⊗ ρB − ρAB
k (7.77)
(7.66)→ ⩾ 0.
Also, τ AB has trace equal to one (so it is a density matrix) with marginal τ B = ρB . We
therefore proved that
H(A|B)ρ ⩾ log k = log 2Hmin (A|B)ρ .
(7.78)
Finally, since all conditional entropies are additive for tensor-product states, we conclude
that
1
H(A|B)ρ = lim H(An |B n )ρ⊗n
n→∞ n
1 j n n
k
(7.78)→ ⩾ lim log 2Hmin (A |B )ρ⊗n
n→∞ n (7.79)
1 nH (A|B)ρ
= lim log 2 min
n→∞ n
= Hmin (A|B)ρ .
Next, consider the case Hmin (A|B) < 0. The idea of the proof in this case is to find the
largest possible k ∈ N such that the system X (can be taken to be a classical system) with
dimension |X| = k satisfies
ψ XA ≻AX ρA ⊗ uX , (7.80)
where ψ ∈ Pure(XA) is some pure state. Due to the monotonicity property of every condi-
tional entropy H the above relation implies that
0 = H(XA)ψ
(7.80)→ ⩽ H(XA|B)ρ⊗u (7.81)
Additivity→ = H(X)u + H(A|B)ρ .
Finally, since H(X) log k we get that H(A|B)ρ ⩾ − log(k). We first show that (7.80)
−Hu =(A|B)
holds with k := 2 min
.
To prove the relation (7.80), consider the measure-and-prepare channel N ∈ CPTP(AX →
AXB) defined on all ω ∈ L(XA) as
where ΠAX := I AX − ψ XA and τ XAB is chosen (see below) such that N is CMO. Observe
that by definition we have
N XA→XAB ψ XA = uX ⊗ ρAB .
(7.83)
We next show that there exists τ ∈ D(XAB) such that N as defined above is indeed CMO.
If N is XA ̸→ B semi-causal then we must have that the marginal channel N XA→B :=
TrXA ◦N XA→XAB is a replacement channel (i.e., a constant channel). Now, for all ω ∈ D(XA)
N XA→B ω XA = Tr ψ XA ω XA ρB + Tr ΠXA ω XA τ B .
(7.84)
Thus, by choosing τ XAB to have the property τ B = ρB , we get that the right-hand side of
the equation above does not depend on ω, so that N XA→XAB is XA ̸→ B semi-causal.
Next, the channel N XA→XAB is conditionally unital if the operator
N XA→XAB I XA = uX ⊗ ρAB + kmτ XAB
(7.85)
equals the operator
I XA ⊗ N XA→B uXA = I XA ⊗ τ B ,
(7.86)
B B
where the last equality follows from (7.84) with τ = ρ . We therefore conclude that
N ∈ CPTP(XA → XAB) as defined above is CMO if and only if τ XAB equals
kI A ⊗ ρB − ρAB
τ XAB := uX ⊗ . (7.87)
km − 1
Observe that this τ XAB is indeed a density matrix since by definition of k we have
kI A ⊗ ρB ⩾ 2−Hmin (A|B) I A ⊗ ρB
(7.88)
(7.66)→ ⩾ ρAB .
Moreover, from its definition in (7.87) we have τ B = ρB . Hence, with this τ XAB the channel
N XA→XAB is CMO that maps the pure states ψ XA to the state uX ⊗ ρAB . We therefore
conclude that
H(A|B)ρ ⩾ − log k = − log 2−Hmin (A|B) .
(7.89)
Finally, from the additivity property of conditional entropies we get
1
H(A|B)ρ = lim H(An |B n )ρ⊗n
n→∞ n
1 l
−Hmin (An |B n )ρ⊗n
m
(7.89)→ ⩾ − lim log 2
n→∞ n (7.90)
1
= − lim log 2−nHmin (A|B)ρ
n→∞ n
= Hmin (A|B)ρ .
It is left to prove the equality on maximally entangled states. Since conditional entropy is
invariant under local isometries we can assume without loss of generality that m := |A| = |B|.
From Theorem 7.1.4 we know that the state ΦAB ⊗ uà is equivalent under conditional ma-
jorization to any pure state in Pure(AÃ). Since the entropy of every pure state in Pure(AÃ)
is zero, we conclude that
0 = H(AÃ|B)Φ⊗u
Additivity→ = H(A|B)Φ + H(Ã)u (7.91)
= H(A|B)Φ + log m ,
where we used the fact that the entropy of the uniform state uà is log2 m. Hence,
H(A|B)Φ = − log m = Hmin (A|B)Φ (7.92)
This completes the proof.
We saw in the theorem above that the conditional entropy is positive for all separable
states. This does no mean that the conditional entropy is positive just for separable states.
In fact, some entangled states (i.e. states that are not separable) have positive conditional
entropy. The following corollary provides a simple criterion to determine if a bipartite state
has a positive conditional entropy.
Corollary 7.3.1. Let ρ ∈ D(AB). Then, the following statements are equivalent:
ψ A ≻A ρAB . (7.94)
4. The state ρAB can be obtained by CMO channel from a classical distribution;
i.e. there exists ω ∈ D(XY ) such that
ω XY ≻A ρAB . (7.95)
Proof. The proof follows directly from Theorem 7.3.1 in conjunction with Theorem 7.1.3.
Exercise 7.3.2. Use Exercise 4.6.7 to show that in the classical domain, every entropy
function H is also a conditional entropy; that is, show that function
is a conditional entropy of classical states. Moreover, give a counter example to the same
statement in the quantum domain.
At first glance, these functions appear quite reasonable. For example, for any two states
ρ, σ ∈ D(AB) with ρAB ≻A σ AB , if ω XY ≻X ρAB , then it necessarily follows that ω XY ≻X
σ AB . Consequently,
This means H(A|B)ρ exhibits monotonic behavior under conditional majorization, aligning
with expectations for a measure of conditional uncertainty.
However, in general, H(A|B)ρ is not well defined! This is because CMO channels form
a subset of one-way LOCC and thus cannot generate entanglement. Since ω ∈ D(XY )
is classical and hence separable, any state N (ω) resulting from a one-way LOCC channel
N ∈ CMO(XY → AB) lies within SEP(AB). Therefore, H(A|B)ρ is undefined if ρAB is
entangled.
Exercise 7.3.3. Show that H(A|B)ρ is well defined if and only if I A ⊗ ρB ⩾ ρAB . Hint:
Recall Corollary 7.3.1.
Furthermore, this measure exhibits monotonic behavior under conditional majorization and
is inherently non-negative by definition. Does this not contradict Theorem ??? The answer
is no. The key lies in understanding that Hreg is only weakly additive, rather than strongly
additive, in general.
Exercise 7.3.4. Calculate Hreg (AÃ|B)ρ , where ρ is defined as ΦAB ⊗ uà . Question: Does
this result in a value of zero?
Interestingly, the function Hreg serves as an example of a function meeting all the criteria
expected of a quantum conditional entropy, except it is only weakly additive. Its non-
negativity teaches us an important lesson: the tendency of quantum conditional entropies to
assume negative values on certain entangled states is intrinsically connected to their property
of full additivity.
The up arrow in the notation above indicates the optimization over σ B . By definition,
H↑ (A|B)ρ ⩾ H(A|B)ρ for all ρ ∈ D(AB).
Proof. First, we demonstrate that H satisfies the monotonicity property of conditional en-
′ ′
tropy. Let N AB→A B be a CMO, and consider the bipartite density matrix ρAB . We begin
by considering the case where A = A′ so that
H(A B)N (ρ) = log |A| − D N (ρAB ) uA ⊗ TrA N ρAB
. (7.105)
′ ′
Since N is A ̸→ B ′ semi-causal the marginal channel N AB→B := TrA ◦ N AB→AB satisfies
′ ′
N AB→B ρAB = N AB→B uA ⊗ ρB .
(7.106)
To see this, take MA→A in (7.13) to be the completely randomizing channel. With this at
hand, we get
uA ⊗ TrA N ρAB = uA ⊗ TrA N uA ⊗ ρB
(7.107)
A B
N is conditionally unital→ = N u ⊗ ρ . (7.108)
Substituting this into (7.105) we obtain
H(A B)N (ρ) = log |A| − D N ρAB N uA ⊗ ρB
Clearly, if V is a unitary channel (i.e. A ∼ = A′ ) we have H(A′ |B)V(ρ) = H(A|B)ρ since in this
′ A A′
case |A| = |A | and V(u ) = u . We can therefore assume without loss of generality that
AB
′ ρ 0
V A→A ρAB = ρAB ⊕ 0CB :=
(7.112)
CB
0 0
′
since the conditional entropy of V A→A ρAB does not change by a unitary channel on A′ .
|A| A′
Moreover, denote by t := |A ′ | and observe that u can be expressed as
′
uA = tuA ⊕ (1 − t)uC . (7.113)
Hence, substituting (7.112) and (7.113) into (7.111) gives
H(A|B)V(ρ) = log |A′ | − D ρAB ⊕ 0CB tuA ⊗ ρB ⊕ (1 − t)uC ⊗ ρB
To prove the additivity property, let ρ ∈ D(A1 B1 ) and σ ∈ D(A2 B2 ) and observe that
since uA1 A2 = uA1 ⊗ uA2 we have
H(A1 A2 |B1 B2 )ρ⊗σ = log |A1 A2 | − D ρA1 B1 ⊗ σ A2 B2 uA1 ⊗ ρA1 B1 ⊗ uA2 ⊗ σ A2 B2
(7.115)
D is additive→ = H(A1 |B1 )ρ + H(A2 |B2 )σ .
Finally, the normalization property follows from the fact that when |B| = 1 and |A| = 2 we
have by definition
H(A|B)u = log 2 − D uA uA = 1 .
(7.116)
This completes the proof.
Exercise 7.4.1. Show that if D = Dmax then its corresponding conditional entropy H is the
conditional min entropy.
In the exercise below you will show that H ↑ behaves monotonically under conditional
unital channels and consequently behaves monotonically under conditional majorization.
However, in general, H ↑ does not necessarily satisfy the additivity property (at least a
general proof of additivity of H↑ is unknown to the author). Still, this expression has been
used extensively by the community, particularly since it can be shown that for D = Dα or
D = D̃α it is additive (here D̃α is the sandwiched Rényi divergence; see Definition 6.4.1).
This includes the Umegaki relative entropy, however, as we will see shortly, in this case
H(A|B)ρ = H ↑ (A|B)ρ = H(A|B)ρ for all ρ ∈ D(AB).
Exercise 7.4.2. Consider the function H↑ as defined above.
1. Show that H↑ does not increase under conditional unital channels, and use it to conclude
that it satisfies the monotonicity property of conditional entropy.
2. Prove that H↑ satisfies the invariance and normalization property of conditional en-
tropy.
H(A|B)ρ = H ρAB − H ρB .
(7.119)
This formula is consistent with the intuition of conditional entropy as depicted in Fig. 7.1.
The finding from the previous section, which establishes that conditional entropy is non-
negative for separable states, leads to an intriguing implication regarding the von Neumann
entropy.
Corollary 7.5.1. Let {px , ρx }x∈[m] be an ensemble of quantum states in D(A). The
von-Neumann entropy satisfies
X X X
px H(ρx ) ⩽ H px ρx ⩽ px H(ρx ) + H(p) (7.120)
x∈[m] x∈[m] x∈[m]
Proof. The lower bound follows from the concavity of H (see Exercise 6.3.3). To get the
upper bound, let ρXA := x∈[m] px |x⟩⟨x|X ⊗ ρA XA
P
x . Since ρ is a cq-state, and in particular
separable, it follows that
where ρA = px ρ A
P
x∈[m] x is the marginal state. Therefore,
X
H px ρx ⩽ H(ρAX )
x∈[m]
X (7.122)
Exercise (7.5.1)→ = px H(ρx ) + H(p) .
x∈[m]
Exercise 7.5.2. The quantum mutual information is a quantity defined for any ρ ∈ D(AB)
as
I(A : B)ρ := D ρAB ρA ⊗ ρB .
(7.124)
1. Express the quantum mutual information in terms of H(A|B)ρ and H(A)ρ .
2. Show that the von-Neumann entropy is subadditive; i.e. prove that for all ρ ∈ D(AB)
Triangle Equality
Lemma 7.5.1. Let D be the Umegaki relative entropy. Then for any ρ ∈ D(AB),
σ, τ ∈ D(B) and ω ∈ D(A), we have
D ρAB ω A ⊗ σ B = D ρAB ω A ⊗ ρB + D ρB σ B .
(7.126)
D ρAB ω A ⊗ σ B = D ρAB ω A ⊗ ρB + D ω A ⊗ ρB ω A ⊗ σ B
(7.127)
Proof. By definition
= D ρAB ω A ⊗ ρB + D ρB σ B .
D ρAB uA ⊗ σ B = D ρAB uA ⊗ ρB + D ρB σ B .
(7.131)
Therefore,
H ↑ (A|B)ρ := log |A| − min D ρAB uA ⊗ σ B
σ∈D(B)
The equality H(A|B)ρ = H ↑ (A|B)ρ reveals that the von-Neumann conditional entropy is
monotonic under conditionally unital channels that are not necessarily A ̸→ B semi-causal
(see part 1 of Exercise 7.4.2).
Another useful property satisfied by the von-Neumann entropy is known as the strong
subadditivity property. Recall from Exercise 7.5.2 that the von-Neumann entropy is subad-
ditive, that is, for any ρ ∈ D(AB),
where H(AB)ρ denotes H(ρAB ). A stronger version of this inequality, known as the ‘strong
subadditivity of the von-Neumann entropy’ states that for any ρ ∈ D(ABC) we have
Note that this is a stronger version of the previous inequality since for |B| = 1 it reduces to
subadditivity. The above inequality is unique to the von-Neumann entropy and in general
is not satisfied by other entropy functions (at least not in this form).
We can express the strong subadditivity in terms of conditional entropies. Observe that
since H(A|BC)ρ = H(ABC)ρ − H(BC)ρ and H(A|B)ρ = H(AB)ρ − H(B)ρ , the strong
subadditivity can be expressed as
This version of the strong subadditivity is perhaps more intuitive than (7.134) since it can be
interpreted as the statement that by removing the access to system C, one can only increase
the uncertainty about system A. Note also that the above form of the strong subadditivity
is satisfied by any conditional entropy function. That is, for any conditional entropy H and
ρ ∈ D(ABC) we have
H(A|BC)ρ ⩽ H(A|B)ρ . (7.136)
The above inequality is a simply consequence of the monotonicity property of conditional
entropy, since tracing out system C is a map belonging to CMO(ABC → AB). In terms of
conditional majorization, we can express it as
with equality if ρABC is a pure state. Hint: If ρABC is a mixed state, let ψ ABCD be its
purification, and express H(A|C)ρ in terms of systems A, B, D (for example, H(AC)ψ =
H(BD)ψ ). Finally, use (7.134) with D replacing C.
Closed Formula
Theorem 7.5.1. Let ρ ∈ D(AB) and α ∈ [0, 2]. Then,
↑ α h i
B 1/α
α
where ηαB := TrA ρAB
Hα (A|B)ρ = log Tr ηα . (7.140)
1−α
The conditional min-entropy is defined in terms of an SDP. To see this, first observe that
from the above formula and from the definition of Dmax we get
↑
n o
2−Hmin (A|B)ρ = min t : tI A ⊗ σ B ⩾ ρAB , σ ∈ D(B), t ∈ R
n o (7.144)
B
ΛB := tσ B −−−−→ = min Tr Λ : I A ⊗ ΛB ⩾ ρAB , Λ ∈ Pos(B) .
N (ω B ) := I A ⊗ ω B ∀ ω ∈ L(B) , (7.145)
we conclude that
↑
2−Hmin (A|B)ρ = min Tr[ΛH1 ] : Λ ∈ K1 , N (Λ) − H2 ∈ K2 .
(7.146)
The above optimization problem has precisely the same form as the conic linear programming
given in (A.52). Since the cones K1 and K2 are the sets of positive semidefinite matrices, this
conic program is an SDP program.
The above expression has a dual given by (A.57). Therefore, the conditional min-entropy
can be expressed in terms of the following optimization problem
↑
2−Hmin (A|B)ρ = max Tr[ηH2 ] : η ∈ K∗2 , H1 − N ∗ (η) ∈ K∗1
n
AB AB
B B
o (7.147)
Exercise 7.5.5→ = max Tr η ρ : η ∈ Pos(AB), η = I .
Any η AB as above is a Choi matrix; hence, it can be expressed as η AB = E ∗Ã→B ΩAÃ for
some channel E ∈ CPTP(B → A). We therefore get that
↑
h i
2−Hmin (A|B)ρ = max Tr ρAB E ∗Ã→B ΩAÃ
E∈CPTP(B→Ã)
D E
ΦAÃ E B→Ã ρAB ΦAÃ
= |A| max (7.148)
E∈CPTP(B→Ã)
F 2 E B→Ã ρAB , ΦAÃ ,
= |A| max
E∈CPTP(B→Ã)
where F is the fidelity. That is, the conditional min-entropy can be expressed in terms
of the maximal overlap of E B→Ã ρAB with the maximally entangled state. We now use
the above expression to prove that the optimized conditional min-entropy is additive under
tensor products, and thereby prove that the optimized conditional min-entropy is indeed a
quantum conditional entropy as defined in Definition 7.2.1.
Proof. Since the optimized conditional min-entropy equals H̃α↑ with α = ∞, it is left to prove
↑
that it is additive. Let ρ ∈ D(AB), τ ∈ D(A′ B ′ ), and denote by Qmin (A|B)ρ := 2−Hmin (A|B)ρ .
Therefore, the additivity of Hmin would follow from the multiplicativity of Qmin . On the one
hand, from the primal problem (7.144) we have
n ′ ′ ′ ′ ′
o
Qmin (AA′ |BB ′ )ρ⊗τ = min Tr ΛBB : I AA ⊗ ΛBB ⩾ ρAB ⊗ τ A B , Λ ∈ Pos(BB ′ )
n o
B B AA′ B B′ AB A′ B ′ ′
⩽ min Tr Λ1 Tr Λ2 : I ⊗ Λ1 ⊗ Λ2 ⩾ ρ ⊗ τ , Λ1 ∈ Pos(B) , Λ2 ∈ Pos(B )
⩽ Qmin (A|B)ρ Qmin (A′ |B ′ )τ , (7.149)
′
where in the first inequality we restricted ΛBB to have the form ΛB B
1 ⊗ Λ2 , and in the last
AA′ B B′ AB A′ B ′
inequality we replaced the condition I ⊗ Λ1 ⊗ Λ2 ⩾ ρ ⊗ τ with the two conditions
A B AB A′ B′ A′ B ′
I ⊗ Λ1 ⩾ ρ and I ⊗ Λ2 ⩾ τ .
To get the opposite inequality we use the dual expression of the conditional min-entropy
as given in (7.148). Specifically,
′ ′ 2
′ ′ ′
Qmin (AA′ |BB ′ )ρ⊗τ = |AA′ | max F E BB →ÃÃ ρAB ⊗ τ BB , ΦAÃ ⊗ ΦA Ã
E∈CPTP(BB ′ →ÃÃ′ )
′ ′
2
′ BB ′ A′ Ã′
E1B→Ã AB
E2B →Ã
AÃ
E = E1 ⊗ E2 −−−−→ ⩾ |AA | max F ρ ⊗ τ ,Φ ⊗ Φ
E1 ∈CPTP(B→Ã)
E2 ∈CPTP(B ′ →Ã′ )
where {px }x∈[m] is a probability distribution, and each ρx ∈ D(B). Furthermore, it’s im-
portant to note that CPTP(B → Ã), which is the same as CPTP(B → X̃), comprises of
POVM channels that were initially introduced in Sec. 3.5.4. Under these circumstances, the
aforementioned equation simplifies to the following (Exercise7.5.6):
↑ X
2−Hmin (X|B)ρ = max px Tr ΛB B
x ρx (7.153)
{Λx }
x∈[m]
where the maximum is over all POVMs {ΛB x }x∈[m] on system B. The expression above can
be interpreted as the maximum probability for Bob to guess correctly the value of X. Specif-
ically, given the cq-state ρXB , Bob can try to learn the classical value of X by performing a
quantum measurement/POVM, {ΛB x }x∈[m] , on his system with m := |X| possible outcomes.
The probability that X = x is px, and the probability that Bob gets the outcome y given
that X = x is given by Tr ΛB B
y ρx . If Bob’s takes y to be his guess for the value of X then
B B
Tr Λx ρx is the probability that Bob guesses correctly the value of X. Given that X = x
with probability px , we get that
X
px Tr ΛB B
Prg (X|B)ρ := max x ρx (7.154)
{Λx }
x∈[m]
is the maximal overall probability that Bob’s guess of X is correct. With this notation, the
conditional entropy of ρXB can be expressed as
↑
Hmin (X|B)ρ = − log Prg (X|B)ρ . (7.155)
According to Theorem 7.5.1 and Exercise 7.5.4, the function Hα↑ is additive for all
↑
α ∈ [0, 2]. Hence, the additivity of the conditional max-entropy (i.e., Hα=0 ) is established.
Therefore, we can affirm that the conditional max-entropy qualifies as a legitimate condi-
tional entropy measure. Moreover, given that the min-relative entropy Dmin is the smallest
relative entropy, the following inequality holds for all conditional entropies H↑ derived from
a relative entropy D (as specified in (7.104)), for any quantum state ρ ∈ D(AB):
This inequality signifies that the conditional max-entropy establishes an upper limit for all
conditional entropies defined in relation to a relative entropy. Additionally, as will be ex-
plored in subsequent discussions, the conditional max-entropy is essentially the counterpart,
or the dual, of the conditional min-entropy.
Remark. Since the conditional entropy is invariant under local isometries (specifically, H(A|C)φ
remains invariant under isometries on system C) the dual to a conditional entropy is well
defined as it does not depend on the choice of the purifying system C.
By definition, the dual to a conditional entropy satisfies the invariance and additivity
properties of conditional entropy (see Exercise 7.6.1). To see that it satisfies also the nor-
malization property of a conditional entropy, let ρAB = uA with |A| = 2 and |B| = 1. A
purification of ρAB can be expressed as the maximally entangled state ΦAC with C = Ã.
Therefore,
Hdual (A)u = Hdual (A|B)ρ
by definition→ = −H(A|C)Φ (7.160)
(7.69)→ = log 2 = 1 .
Therefore, the dual to a conditional entropy would be itself a conditional entropy if it satisfies
the monotonicity property. We will see shortly that this is indeed the case for all the
conditional entropies studied in literature, although a general proof for all conditional entropy
functions is unknown to the author.
Exercise 7.6.1. Show that the dual to a conditional entropy satisfies the invariance and
additivity properties of a conditional entropy.
The relation (7.158) implies that the conditional von-Neumann entropy is self dual; i.e.
dual
H (A|B)ρ = H(A|B)ρ for all ρ ∈ D(AB). Consider the Petz conditional Rényi entropy of
order α ∈ [0, 2] given for all ρ ∈ D(AB) by
1 h
AB α
A
i
B 1−α
Hα (A|B)ρ = log Tr ρ I ⊗ρ . (7.161)
1−α
In the lemma below we compute it’s dual.
Proof. Let X
ρAB = px |φx ⟩⟨φx |AB (7.163)
x∈[n]
be the spectral decomposition of ρAB , and let ρABC = |φ⟩⟨φ|ABC , with C ∼ = AB, be the
purification of ρAB given by
X√ 1
|φABC ⟩ = px |φx ⟩AB |φx ⟩C = ρAB 2 ⊗ I C |Ω(AB)C ⟩ . (7.164)
x∈[n]
where |Ω(AB)C ⟩ = x∈[n] |φj ⟩AB |φj ⟩C is the maximally entangled operator between system
P
be the spectral decompositions of ρB and ρAC , respectively, and consider the following
Schmidt decomposition between system B and system AC
X√ 1/2
|φABC ⟩ = qy |y⟩B |χy ⟩AC = ρB ⊗ I AC ΩB(AC) (7.168)
y∈[m]
where |ΩB(AC) = y∈[m] |y⟩B |χy ⟩AC . Substituting the above expression for |φABC ⟩ into (7.166)
P
gives
h α A 1−α i 2−α α−1 B(AC)
Tr ρAB I ⊗ ρB = ΩB(AC) I A ⊗ ρB ⊗ ρC Ω . (7.169)
we conclude that
h α A 1−α i 2−α A α−1 B(AC)
Tr ρAB I ⊗ ρB = ΩB(AC) I B ⊗ ρAC I ⊗ ρC Ω
h 2−α A α−1 i (7.171)
= Tr ρAC I ⊗ ρC .
Therefore,
1 h 2−α A α−1 i
Hα (A|B)ρ = log Tr ρAC I ⊗ ρC
1−α (7.172)
= −H2−α (A|C)ρ .
Note that the above equality is equivalent to
↑ ↑
Remark. It is noteworthy that the lemma above establishes H̃1/2 as the dual of Hmin . Conse-
↑
quently, H̃1/2 (A|B)ρ is sometimes referred to as the conditional max-entropy. However, we
↑
choose not to use this terminology here because, generally speaking, H̃1/2 (A|B)ρ ̸= Hmax (A)ρ ,
AB A B
particularly when ρ = ρ ⊗ ρ . In fact, as we will show later, the true dual of Hmin (as op-
↑
posed to Hmin ) aligns with the conditional max-entropy as defined in (7.156). Additionally,
when integrating the above lemma with (7.148), we derive the following relationship:
F E B→Ã ρAB , ΩAÃ = max F ρAE , I A ⊗ τ E .
max (7.175)
E∈CPTP(B→Ã) τ ∈D(E)
↑
Proof. We start with the expression for Qmin (A|B)ρ = 2−Hmin (A|B)ρ , given in (7.148) as
F 2 E B→Ã ρAB , ΩAÃ .
Qmin (A|B)ρ = max (7.176)
E∈CPTP(B→Ã)
For any E ∈ CPTP(B → Ã) let VE ∈ CPTP(B → ÃR) be its Stinespring’s isometry. Observe
that VFB→ÃR ρABE is a purification of F B→Ã ρAB . Moreover, since ΩAÃ is already pure,
any purification of ΩAÃ in AÃRE must be of the form ΩAÃ ⊗ χRE , where χ ∈ Pure(RE).
Hence, from the Uhlmann’s theorem we get that
2 AÃ B→Ã AB 2 AÃ RE B→ÃR ABE
F Ω ,E ρ = max F Ω ⊗ χ , VE ψ . (7.177)
χ∈Pure(RE)
Now, observe that any purification of the state ρAE := TrB ρABE in Pure(AÃER) has the
form VEB→ÃR ρABE for some E ∈ CPTP(B → Ã). Therefore, when we add the maximiza-
tion over all E ∈ CPTP(B → Ã) to both sides of the equation above we get
Qmin (A|B)ρ = max F 2 ΩAÃ ⊗ χRE , ψ AÃRE , (7.178)
ψ∈Pure(AÃRE)
ψ AE =ρAE , χ∈Pure(RE)
where on the right-hand side we replaced that maximum over all E ∈ CPTP(B → Ã) with
a maximum over all pure states ψ ∈ Pure(AÃRE) with marginal ψ AE = ρAE . Finally,
applying the Uhlmann’s theorem to the expression above we conclude that
Exercise 7.6.3. Show that Qmin is a convex function. That is, show that for every set of n
bipartite quantum states {ρAB
x }x∈[n] and every p ∈ Prob(n) we have
X X
Qmin (A|B)ρ ⩽ px Qmin (A|B)ρx where ρAB = px ρAB
x . (7.180)
x∈[n] x∈[n]
More generally, the duals of Hα↑ and H̃α↑ can also be computed and they are given by (see
the section ‘Notes and References’ below for more details)
1 1
H̃α↑dual (A|B)ρ = H̃β↑ (A|B)ρ for + = 2 , α, β ∈ [1/2, ∞]
α β (7.181)
H̃αdual (A|B)ρ = Hβ↑ (A|B)ρ for αβ = 1 , α, β ∈ [0, ∞] .
Observe that from the first equality above, by taking α = ∞ (and hence β = 1/2) we get
the statement given in the lemma above that the dual to the optimized conditional min
↑
entropy is H̃1/2 . On the other hand, from the second equality we see that the dual of Hmin
is H0↑ = Hmax (see Definition (7.5.1)). That is, for all ρ ∈ D(AB)
dual
Hmin (A|B)ρ = Hmax (A|B)ρ . (7.182)
We therefore get the following corollary.
Corollary 7.6.1. Let H be a quantum conditional entropy and suppose its dual
Hdual is also a quantum conditional entropy. Then, for all ρ ∈ D(AB)
Remark. Previously, we established that conditional entropies defined as in (7.104) are upper
bounded by the conditional max-entropy. However, the corollary above does not require the
conditional entropy H to be defined with respect to a relative entropy. Instead, it assumes
that the dual entropy Hdual is also a valid conditional entropy. It is worth noting that it
remains an open problem whether this additional assumption can be removed, i.e., whether
the upper bound provided by the conditional max-entropy applies to all conditional entropies
or just to those whose dual is also a conditional entropy.
Proof. Let φ ∈ Pure(ABC) be a purification of ρAB . From the definition of Hdual we get
H(A|B)ρ = −Hdual (A|C)φ
Theorem 7.3.1 applied to Hdual → ⩽ −Hmin (A|C)φ (7.184)
(7.182)→ = Hmax (A|B)ρ .
This completes the proof.
Exercise 7.6.4. Use the lemma above and the relations (7.181) to show that for any φ ∈
Pure(ABC) we have
Hα (A|B)φ + Hβ (A|C)φ = 0 for α + β = 2, α, β ∈ [0, 2]
1 1
H̃α↑ (A|B)φ + H̃β↑ (A|C)φ = 0 for + = 2, α, β ∈ [1/2, ∞] (7.185)
α β
Hα↑ (A|B)φ + H̃β (A|C)φ = 0 for αβ = 1, α, β ∈ [0, ∞] .
Exercise 7.6.5. In the following, use the duality relations above.
1. Show that the dual of Hα is itself a quantum conditional entropy for all α ∈ [0, 2] (i.e.
you need to show the monotonicity property).
2. Show that H̃α↑ is a quantum conditional entropy for all α ∈ [0, ∞] (i.e. you need to
show the additivity property).
3. Show that the dual of H̃α↑ is itself a quantum conditional entropy for all α ∈ [0, ∞].
4. Use part 2 to provide an alternative proof for the additivity of the optimized conditional
min-entropy.
In this context, the G-twirling map acts as a completely randomizing channel, also known
as the completely depolarizing channel.
When a quantum channel N ∈ CPTP(A → B) is applied to both sides of the equation
above, we obtain: Z
dU A N A→B U A ρAE U ∗A = τ B ⊗ ρE
(7.187)
U(A)
where τ B := N A→B uA . The decoupling theorem estimates how closely N A→B U A ρAE U ∗A
(i.e., removing the integral and considering one specific unitary matrix) can approximate the
decoupled state τ A ⊗ ρE . Our discussion begins with a lemma using the square of the
Frobenius norm for this estimation. In this lemma, we utilize the function:
m 2 2
f ω AB := √ Tr ω AB − Tr uA ⊗ ω B
∀ ω ∈ L(AB) , (7.188)
m2 − 1
where m := |A|. To simplify the notation in this section, we will omit the square brackets
AB 2
in hcertain expressions.
i For instance, in the above formula, we used Tr ω instead of
2
Tr ω AB . It’s important to note that with this revised notation, all powers are included
within the trace operation.
Exercise 7.7.1. Let ρ ∈ Pos(AB) (we also assume ρAB is not the zero matrix) and set
m := |A|.
1. Show that
1 Tr(ρAB )2
⩽ ⩽m (7.189)
m Tr(ρA )2
h i
Hint: Start by showing Tr(ρB )2 = Tr ρAB ⊗ I à I A ⊗ ρÃB and then use the
Cauchy-Schwarz inequality. For the other side, show first that ρAB ⩽ mI A ⊗ ρB .
Lemma 7.7.1. Let ρ ∈ L(AE), m := |A|, N ∈ L(A → B), and τ AB := m1 JNAB , where
JNAB is the Choi matrix of N A→B .
Z 2
∗
dU A Tr N A→B U A ρAE U A − τ B ⊗ ρE = f ρAE f τ AB ,
(7.191)
U(A)
Remark. Observe that we do not assume that ρAE is a density matrix (not even Hermitian)
nor that N A→B is a quantum channel (just a linear map). However, if ρAE ⩾ 0 we can use
the bound (7.190) in conjunction with the lemma above to get the relatively simple upper
bound Z
2 2 2
dU A NUA→B ρAE − τ B ⊗ ρE 2 ⩽ Tr ρAE Tr τ AB ,
(7.192)
U(A)
where NUA→B := N A→B ◦ U A→A , with U A→A (·) := U A (·)U ∗A , and we used the fact that any
Hermitian matrix η ∈ Herm(BE) satisfies ∥η∥22 = Tr[η 2 ]. Moreover, taking the square root
on both sides of the equation above and using Jensen’s inequality (see Sec. B.4) we obtain
that sZ
q
2
Tr (ρAE )2 Tr (τ AB )2 ⩾ dU A ∥NUA→B (ρAE ) − τ B ⊗ ρE ∥2
U(A)
Z (7.193)
Jensen′ s Inequality→ ⩾ dU A NUA→B ρ AE
− τ B ⊗ ρE
2
U(A)
Finally, it’s important to recognize that since the average of the integrand in the above
equation is less than the expression on the
left-hand side, it implies the existence
AE of at least
AB
A A→B AE B E
one unitary U for which NU ρ − τ ⊗ ρ 2 is smaller than Tr ρ Tr τ .
Proof. For simplicity of the exposition we will omit the superscript from NUA→B and simply
write it as NU . With these notations, the integrand of (7.191) can be decomposed into three
terms:
2
Tr NU ρAE − τ B ⊗ ρE
2 2 (7.194)
= Tr NU ρAE − 2Tr τ B ⊗ ρE NU ρAE + Tr τ B ⊗ ρE .
From (7.187), the integral of the second term above can be simplified as
Z
2
dU Tr τ B ⊗ ρE NU ρAE = Tr τ B ⊗ ρE .
(7.195)
U(A)
Therefore, taking the integral over U(A) on both sides of (7.194) gives
Z
2
dU Tr NU ρAE − τ B ⊗ ρE
U(A)
Z 2 2 2 (7.196)
AE
= dU Tr NU ρ − Tr τ B Tr ρE .
U(A)
To compute the remaining integral we use a linearization technique that is based on Exer-
cise 3.5.28. That is, we linearize the square in the integrand by using Exercise 3.5.28 with
the flip operator F B B̃E Ẽ = F B B̃ ⊗ F E Ẽ . Explicitly,
AE
2 h
AE
AE
B B̃E Ẽ i
Tr NU ρ = Tr NU ρ ⊗ NU ρ F
h ⊗2 i
= Tr NU⊗2 ρAE F B B̃E Ẽ (7.197)
h ⊗2 i
= Tr ρAE NU∗⊗2 F B B̃ ⊗ F E Ẽ
.
Taking the integral over U(A) on both sides and using the fact that NU∗ = U ∗ ◦ N ∗ we obtain
Z
AE
2 h
AE ⊗2 ∗⊗2 B B̃
E Ẽ
i
dU Tr NU ρ = Tr ρ G N F ⊗F , (7.198)
U(A)
Next, we make use of the fact that the twirling channel turns states to symmetric ones
(see (3.247)). Specifically, observe that from (3.247) we get
∗⊗2 B B̃
G N F = aI AÃ + bF AÃ . (7.200)
where the coefficients a, b ∈ R will be computed shortly using (3.247). Substituting (7.200)
into (7.198) gives
Z 2 h ⊗2 AÃ i
dU Tr NU ρAE = Tr ρAE aI + bF AÃ ⊗ F E Ẽ
U(A)
(7.201)
h ⊗2 E Ẽ i h ⊗2 AÃE Ẽ i
= aTr ρE F + bTr ρAE F
2 2
(3.248)→ = aTr ρE + bTr ρAE .
It is therefore left to compute the coefficients a and b. From (3.247) they can be expressed
as: h i h AÃ i
∗⊗2 B B̃ ∗⊗2 B B̃
mTr N F − Tr N F F
a := (7.203)
m(m2 − 1)
and h i h i
mTr N ∗⊗2 F B B̃ F AÃ − Tr N ∗⊗2 F B B̃
b := . (7.204)
m(m2 − 1)
To simplify the expressions above we use the definition of the adjoint map to get
h i h ⊗2 B B̃ i
Tr N ∗⊗2 F B B̃ = Tr N (I A ) F
h i
2 B ⊗2 B B̃
N I A
= JN = mτ −−−−→ = m Tr
B B
τ F (7.205)
h 2 i
2
τB
h i
F B := TrB̃ F B B̃ = I B −−−−→ = m Tr ,
and h i h i
Tr F AÃ N ∗⊗2 F B B̃ = Tr F B B̃ N ⊗2 F AÃ . (7.206)
⊗2
Moreover, since the Choi matrix of N ⊗2 is given by m2 τ AB we get
h i h h ⊗2 AÃ ii
Tr F B B̃ N ⊗2 (F AÃ ) = m2 Tr F B B̃ TrAÃ τ AB F ⊗ I B B̃
h ⊗2 AÃ i
= m2 Tr τ AB F ⊗ F B B̃ (7.207)
(3.248)→ = m2 Tr (τ AB )2 .
Exercise 7.7.2. Demonstrate clearly that substituting the expressions in the proof above for
B 2
a − Tr τ and b into (7.202) results in the equality (7.191).
Exercise 7.7.3. Using the same notations as in the lemma above, with ρ ∈ D(AE) and
N ∈ CP(A → B), show that for all σ ∈ D(E),
Z 2
A A→B A AE
A ∗ B E
⩾ f ρAE f τ AB ,
dU Tr N U ρ U −τ ⊗σ (7.210)
U(A)
Therefore, working with this expression for the twirling map, we obtain the following corol-
lary.
Decoupling Theorem
1
Theorem 7.7.1. Let ρ ∈ D⩽ (AE), N ∈ CP(A → B), and τ AB := |A| JNAB , where
JNAB is the Choi matrix of N A→B . Then,
Z
↑ ↑
∗ 1
dU A N A→B U A ρAE U A − τ B ⊗ ρE ⩽ 2− 2 H̃2 (A|E)ρ +H̃2 (A|B)τ . (7.214)
U(A) 1
Proof. In the first step of the proof we upper bound the trace norm with the Hilbert Schmidt
norm. Working with the Frobenius norm we will be able to use (7.193). From the third part
of Exercise 5.4.2 it follows that for any matrix M ∈ Herm(A) and σ ∈ Pos(A)
p
∥M ∥1 ⩽ Tr[σ] σ −1/4 M σ −1/4 2 . (7.215)
Taking σ = η B ⊗ ζ E ∈ D(BE) and
M = NU ρAE − τ B ⊗ ρE
(7.216)
gives
NU ρAE − τ B ⊗ ρE 1
− 1 − 1 (7.217)
⩽ η B ⊗ ζ E 4 NU ρAE − τ B ⊗ ρE η B ⊗ ζ E 4
2
B − 41 1
B E
The choice of η and ζ will be made later. Denoting by := (η ) ÑUA→B (·) NUA→B (·)(η B )− 4 ,
AE := E − 14 AE E − 41 AB := 1 AB
ρ̃ (ζ ) ρ (ζ ) , and τ̃ J , we get
m Ñ
Exercise 7.7.5. Use Theorem (7.7.1) and Corollary (7.7.1) to prove the corollary above.
Exercise 7.7.6. Show that if ω AB := 1t JNAB , where t := Tr JNAB (i.e. ω AB = |A|
t
τ AB is a
AB := 1 AB AB
density matrix), and similarly, σ r
ρ with r := Tr ρ then the decoupling theorem
above can be expressed as
Z
rt − 12 H̃2↑ (A|E)σ +H̃2↑ (A|B)ω
∗
dU A N A→B U A ρAE U A − τ B ⊗ ρE ⩽ 2 . (7.222)
U(A) 1 |A|
As the dimension of a physical system grows, one can employ several tools from probability
theory and statistics (e.g. the law of large numbers), to study its behaviour and properties.
Specifically, one of the main goals of quantum resource theories is to determine the rate
at which many copies of one resource can be converted into many copies of another. The
methods and tools developed here provide the foundations for several topics in this asymp-
totic domain. We start by reviewing some of these concepts and their generalizations to the
quantum world.
373
374 CHAPTER 8. THE ASYMPTOTIC REGIME
Pr (X n = xn ) ≈ 2−nH(X) , (8.3)
Despite the variety of typical sequences xn , they all share approximately the same probability
of occurrence. This effect is known as the asymptotic equipartition property, a direct result
of the (weak) law of large numbers.
Since we only consider in this book sets with finite cardinality, these conditions will trivially
hold.
1
PThe law above is very intuitive as it shows that for very large n, the probability that
n j∈[n] Xj is close to E(X) is almost one. In particular, (8.6) is equivalent to the statement
that
1 X
lim Xj = E(X) in probability. (8.7)
n→∞ n
j∈[n]
E(S_n²) = (1/n²) Σ_{j,k=1}^n E(X_j X_k) .   (8.8)

A key observation is that for j ≠ k, the two random variables X_j and X_k are independent, and consequently

E(X_j X_k) = Σ_{x_j,x_k∈X} x_j x_k p_{x_j} p_{x_k} = E(X_j) E(X_k) = 0 .   (8.9)

Therefore, the only contributing terms in (8.8) are those with j = k. Hence,

E(S_n²) = (1/n²) Σ_{j∈[n]} E(X_j²) = (1/n) E(X²) .   (8.10)
The above equation already demonstrates that for very large n the variance of S_n is very small, indicating that S_n concentrates around a single value in the limit n → ∞. On the other hand, E(S_n²) can be split into two terms, those for which the value of S_n is close to zero and
Hoeffding's Inequality

Theorem 8.1.2. Let X₁, …, Xₙ be n independent random variables satisfying a_j ⩽ X_j ⩽ b_j for all j = 1, …, n. Then,

Pr( (1/n) Σ_{j∈[n]} (X_j − E(X_j)) > ε ) ⩽ exp( − 2n²ε² / Σ_{j∈[n]} (b_j − a_j)² ) .   (8.14)
Note that Hoeffding’s inequality above does not assume that the random variables X1 , . . . , Xn
are identically distributed. If we add this assumption (so that X1 , . . . , Xn are i.i.d.), then
we get a simplified version of Hoeffding’s inequality given by
Pr( (1/n) Σ_{j∈[n]} X_j − E(X) > ε ) ⩽ exp( − 2nε²/(b − a)² ) .   (8.15)
Hoeffding’s Lemma
Lemma 8.1.1. Let X be a real valued bounded random variable with expected value
E(X) = µ and a ⩽ X ⩽ b for some a, b ∈ R with b > a. Then, for all t ∈ R we have
E[e^{tX}] ⩽ exp( tµ + t²(b − a)²/8 ) .   (8.16)
Proof. Consider first the case µ = 0. We therefore must have a ⩽ 0 ⩽ b. Also, it's important to note that if a = 0, then the condition E(X) = 0 leads to E[e^{tX}] = 1 (can you see why?). As a result, the inequality (8.16) is valid under these circumstances. Therefore, we will proceed with the assumption that a < 0. The convexity of the function f(x) := e^{tx} implies that for any a ⩽ x ⩽ b we have
e^{tx} ⩽ ((b − x)/(b − a)) e^{ta} + ((x − a)/(b − a)) e^{tb} ,   (8.17)

where we wrote x as the convex combination x = a (b − x)/(b − a) + b (x − a)/(b − a). The key idea of the inequality
. The key idea of the inequality
above is that the right-hand side depends linearly on x. Applying this inequality to the
random variable X we get

E[e^{tX}] ⩽ ((b − E(X))/(b − a)) e^{ta} + ((E(X) − a)/(b − a)) e^{tb}
µ = 0 →  = (b/(b − a)) e^{ta} − (a/(b − a)) e^{tb}   (8.18)
c := −a/(b − a) →  = (1 − c + c e^{t(b−a)}) e^{ta} .

Note that c > 0 since a < 0. Finally, denoting s := t(b − a), the right-hand side of the equation above becomes equal to

(1 − c + c e^s) e^{−cs} = e^{f(s)} ,  where  f(s) := −cs + log(1 − c + c e^s) ,   (8.19)

and we used the equality e^{ta} = e^{−cs}. Consider the Taylor expansion of f(s) up to its second order

f(s) = f(0) + s f′(0) + ½ s² f″(q) ,   (8.20)

where q is some real number between zero and s. By a straightforward calculation, we get that f(0) = f′(0) = 0 and f″(q) ⩽ 1/4 (see Exercise 8.1.2 for more details). Combining everything we conclude that

E[e^{tX}] ⩽ e^{f(s)} = e^{½ s² f″(q)} ⩽ e^{s²/8} = e^{t²(b−a)²/8} .   (8.21)

This completes the proof for the case µ = 0. The proof for the case µ ≠ 0 is obtained immediately by defining X̃ := X − µ and applying the lemma to X̃ (see Exercise 8.1.3).
Exercise 8.1.2. Show that f″(q) ⩽ 1/4. Hint: Calculate the second derivative f″(q) and show that it can be expressed as p(1 − p) for some number p > 0 (that depends on q), and use the fact that p(1 − p) ⩽ 1/4.
Exercise 8.1.3. Show that the proof for the case µ ̸= 0 in the lemma above follows imme-
diately by defining X̃ := X − µ and applying the theorem for X̃.
= e^{−rt} E[ ∏_{j=1}^n e^{r(X_j − E(X_j))} ]
{X_j} are independent →  = e^{−rt} ∏_{j=1}^n E[ e^{r(X_j − E(X_j))} ]   (8.22)
Hoeffding's Lemma →  ⩽ e^{−rt} ∏_{j=1}^n e^{(1/8) r²(b_j − a_j)²}
= e^{g(r)} ,

where g(r) := −rt + (1/8) r² Σ_{j∈[n]} (b_j − a_j)² is a quadratic function whose minimum yields the right-hand side of (8.14).
Exercise 8.1.4. Show that the minimum of the function g(r) above is given by the right-hand
side of (8.14).
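As an illustration (not part of the original development), the short Python sketch below estimates the deviation probability for i.i.d. Bernoulli(1/2) variables and compares it with the one-sided Hoeffding bound of (8.15); NumPy is assumed to be available and the parameters are chosen only for demonstration.

    import numpy as np

    # Numerical check of Hoeffding's inequality (8.15) for i.i.d. Bernoulli(1/2)
    # variables, for which a = 0 and b = 1, so the bound reads exp(-2 n eps^2).
    rng = np.random.default_rng(0)
    n, eps, trials = 200, 0.1, 100_000

    samples = rng.integers(0, 2, size=(trials, n))     # X_1, ..., X_n in {0, 1}
    deviation = samples.mean(axis=1) - 0.5             # (1/n) sum_j X_j - E(X)
    empirical = np.mean(deviation > eps)               # empirical tail probability
    bound = np.exp(-2 * n * eps**2)                    # Hoeffding bound

    print(f"empirical Pr(deviation > {eps}) = {empirical:.5f}")
    print(f"Hoeffding bound                 = {bound:.5f}")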
Typical Sequence
Definition 8.1.1. Let ε > 0 and let X be a random variable with cardinality
|X | = m, corresponding to an i.i.d. source. A sequence of n source outputs
xn := (x1 , . . . , xn ) is called ε-typical if
By taking the log on all sides of (8.23), the condition in (8.23) can be re-expressed as

| (1/n) log₂(1/Pr(X^n = x^n)) − H(X) | ⩽ ε .   (8.24)
More generally, for any set of sequences K_n ⊆ [m]^n we will use the notation Pr(K_n) to denote the probability that a sequence belongs to K_n. That is,

Pr(K_n) := Σ_{x^n∈K_n} Pr(X^n = x^n) .   (8.27)
c := 2 / log²(p_max/p_min) > 0 ,   (8.28)

where p_min > 0 and p_max are the smallest and largest positive (i.e., non-zero) components of p := (p₁, …, p_m)ᵀ.
Theorem 8.1.3. Let p ∈ Prob(m), ε ∈ (0, 1), δ_n := e^{−cε²n} where c is defined in (8.28), X be a random variable associated with an i.i.d.∼p source, and for each n ∈ N, let K_n ⊆ [m]^n be a set of sequences with cardinality |K_n| ⩽ 2^{nr} for some r < H(X). Then, for all n ∈ N the following three inequalities hold:

1. Pr(Tε(X^n)) > 1 − δ_n .
Proof. For the first inequality, we assume without loss of generality that p > 0, since any x with p_x = 0 never occurs and can be removed from the alphabet of X. Let Y := − log₂ p_X be the random variable whose alphabet symbols are given by Y := {− log₂ Pr(X = x)}_{x∈[m]}, with corresponding probabilities p_x := Pr(X = x). Let Y₁, Y₂, … be an i.i.d. sequence of random variables where each Y_j corresponds to X_j as above. By definition, each Y_j satisfies − log p_max ⩽ Y_j ⩽ − log p_min. Therefore, from Hoeffding's inequality, particularly (8.15), we get that

Pr( (1/n) Σ_{j∈[n]} Y_j − E(Y) > ε ) ⩽ e^{−cε²n} .   (8.29)

Observe that

E(Y) = − Σ_{x∈[m]} p_x log₂(Pr(X = x)) = − Σ_{x∈[m]} p_x log₂ p_x = H(X) .   (8.30)

Moreover,

(1/n) Σ_{j∈[n]} Y_j = −(1/n) Σ_{j∈[n]} log₂ Pr(X_j) = −(1/n) log₂ Pr(X^n) = (1/n) log₂(1/Pr(X^n)) .   (8.31)
The equation above, together with (8.29) and (8.30), states that the probability that the random variable X^n = (X₁, …, Xₙ) is not an ε-typical sequence is no greater than e^{−cε²n}. This completes the proof of the first part of the theorem.
For the second inequality, we get from the definition of ε-typical sequences that

1 ⩾ Σ_{x^n∈Tε(X^n)} Pr(X^n = x^n) ⩾ Σ_{x^n∈Tε(X^n)} 2^{−n(H(X)+ε)} = |Tε(X^n)| 2^{−n(H(X)+ε)} .   (8.33)

Therefore,

|Tε(X^n)| ⩽ 2^{n(H(X)+ε)} .   (8.34)

On the other hand, observe that from the first part and the definition of ε-typical sequences we get

1 − δ_n ⩽ Σ_{x^n∈Tε(X^n)} Pr(X^n = x^n) ⩽ Σ_{x^n∈Tε(X^n)} 2^{−n(H(X)−ε)} = |Tε(X^n)| 2^{−n(H(X)−ε)} .   (8.35)

Hence,

(1 − δ_n) 2^{n(H(X)−ε)} ⩽ |Tε(X^n)| .   (8.36)
For the last part of the proof (i.e., the third inequality), let 0 < ε′ < ½(H(X) − r). The probability of K_n can be expressed as:

Σ_{x^n∈K_n} Pr(X^n = x^n) = Σ_{x^n∈K_n∩T_{ε′}(X^n)} Pr(X^n = x^n) + Σ_{x^n∈K_n, x^n∉T_{ε′}(X^n)} Pr(X^n = x^n) .   (8.37)

From the first part of the theorem, the last term cannot exceed δ′_n := e^{−cε′²n}, so that

Σ_{x^n∈K_n} Pr(X^n = x^n) ⩽ Σ_{x^n∈K_n∩T_{ε′}(X^n)} Pr(X^n = x^n) + δ′_n
x^n is ε′-typical →  ⩽ 2^{−n(H(X)−ε′)} |K_n| + δ′_n
|K_n| ⩽ 2^{nr} →  ⩽ 2^{−n(H(X)−ε′−r)} + δ′_n   (8.38)
ε′ < ½(H(X)−r) →  ⩽ 2^{−(n/2)(H(X)−r)} + δ′_n .

Since both δ′_n and 2^{−(n/2)(H(X)−r)} decrease exponentially fast to zero, there exists c′ > 0 sufficiently small such that Pr(K_n) ⩽ e^{−c′n}. This completes the proof.
It's also pertinent to mention that (8.32) can be expressed equivalently as:

(1/n) log₂(1/Pr(X^n)) → H(X)  in probability as n → ∞ .   (8.39)

This expression is the exact formulation of the asymptotic equipartition property, which will be examined in greater detail later in the book.
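For very small alphabets the typical set can be enumerated exhaustively, which gives a concrete feel for Theorem 8.1.3. The following Python sketch (NumPy assumed; the source distribution is an illustrative choice) computes Pr(Tε(X^n)) and |Tε(X^n)| for a binary source and compares the cardinality with the bounds 2^{n(H(X)±ε)}.

    import itertools
    import numpy as np

    # Weak typicality for a small i.i.d. source (exhaustive enumeration is only
    # feasible for small n and alphabet size m).
    p = np.array([0.2, 0.8])          # illustrative source distribution, m = 2
    H = -np.sum(p * np.log2(p))       # Shannon entropy H(X)
    n, eps = 12, 0.1

    typ_prob, typ_count = 0.0, 0
    for xn in itertools.product(range(2), repeat=n):
        prob = np.prod(p[list(xn)])
        # epsilon-typicality condition (8.24)
        if abs(-np.log2(prob) / n - H) <= eps:
            typ_prob += prob
            typ_count += 1

    print(f"H(X) = {H:.4f},  Pr(T_eps) = {typ_prob:.4f}")
    # (8.34)/(8.36): Pr(T_eps) * 2^{n(H-eps)} <= |T_eps| <= 2^{n(H+eps)}
    print(f"|T_eps| = {typ_count},  bounds: "
          f"[{typ_prob * 2**(n*(H-eps)):.1f}, {2**(n*(H+eps)):.1f}]")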
Exercise 8.1.5. Using the same notations as in the theorem above, show that if instead of
Hoeffding’s inequality we use the law of large numbers (i.e., Theorem 8.1.1) then we can still
show that for any δ > 0 and sufficiently large n ∈ N
Exercise 8.1.6 (Variant of Part 3 of Theorem 8.1.3). Prove the following variant of part
3 of the theorem above: Let r < H(X) and let {Kn }n∈N be sets of sequences of size n, and
suppose for each a ∈ N there exists n > a such that |Kn | ⩽ 2nr . Then, for any δ > 0 and
every b ∈ N there exists n > b such that Pr(Kn ) ⩽ δ.
Proof. Suppose r > H(X). We need to show that there exists a reliable compression scheme of rate r. Let δ > 0 and let ε > 0 be such that r > H(X) + ε. Then, from the first part of Theorem 8.1.3, for all n ∈ N we have Pr(Tε(X^n)) ⩾ 1 − e^{−cε²n}, where c > 0 is the constant defined in (8.28). Let k ∈ {1, 2, …, |Tε(X^n)|} be the index labeling all the ε-typical sequences in Tε(X^n). We assume that Alice and Bob agreed on this ordering beforehand. Define the compression map C : X^n → {0,1}^m, with m := ⌈log₂ |Tε(X^n)|⌉, as follows. If x^n is the k-th sequence of Tε(X^n) then C(x^n) is the binary representation of k. If x^n ∉ Tε(X^n) then C(x^n) = (0, …, 0); i.e. if Bob receives the zero sequence he knows there is an error.
Now, from the second part of the theorem of typical sequences we know that
|Tε (X n )| ⩽ 2n(H(X)+ε) < 2nr . (8.42)
Therefore, for large enough n, the sequence y m = C(xn ) is of size m = ⌈log2 |Tε (X n )|⌉ ⩽ nr.
The decoding scheme D : {0, 1}m → X n is defined as follows. If y m is the zero sequence
Bob declares an error. Otherwise, if y m is the binary representation of k, then D(y m ) = xn
with x^n being the k-th sequence of Tε(X^n). It is left to show that the success probability goes to one in the asymptotic limit n → ∞. Indeed, by construction,

Pr(Z^n = x^n | X^n = x^n) = { 0  if x^n is not ε-typical ;  1  if x^n is ε-typical } .   (8.43)
Therefore,

Pr(Z^n = X^n) = Σ_{x^n∈X^n} Pr(X^n = x^n) Pr(Z^n = x^n | X^n = x^n)
(8.43)→  = Σ_{x^n∈Tε(X^n)} Pr(X^n = x^n)
          = Pr(Tε(X^n)) ⩾ 1 − δ_n .
Since limn→∞ δn = 0 we conclude that limn→∞ Pr(Z n = X n ) = 1. Hence, the compression-
decompression scheme above is reliable.
Conversely, suppose there exists a compression-decompression scheme of rate r < H(X). Then, there are at most 2^{nr} possible outputs for D(y^m). Consequently, the set
Kn := { xn : D (C(xn )) = xn } , (8.44)
satisfies |Kn | ⩽ 2nr for all n. From the third part of Theorem 8.1.3 we get that limn→∞ Pr(Kn ) =
0. Hence, a compression-decompression scheme of rate r < H(X) cannot be reliable. This
completes the proof.
Note that in the proof above we showed that if r < H(X), then not only is the scheme not reliable, but in fact the probability that Z^n = X^n goes to zero; i.e. lim_{n→∞} Pr(K_n) = 0. In other words, the error probability goes to one. This type of behaviour is known in classical and quantum Shannon theory as the strong converse, whereas the weak converse corresponds to a proof in which the error probability is shown to be bounded away from zero as n goes to infinity (but does not necessarily go to one).
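The typical-set compression scheme used in the proof above is easy to prototype. The sketch below (Python with NumPy; the source parameters and the extra reserved "error" codeword are illustrative assumptions, not the book's exact construction) enumerates the typical set, assigns each typical sequence a binary index, and reserves the all-zero string for errors.

    import itertools
    import numpy as np

    p = np.array([0.11, 0.89])                 # illustrative binary source
    H = -np.sum(p * np.log2(p))
    n, eps = 14, 0.1

    # Enumerate the eps-typical set and fix an ordering agreed on in advance.
    typical = [xn for xn in itertools.product(range(2), repeat=n)
               if abs(-np.log2(np.prod(p[list(xn)])) / n - H) <= eps]
    index = {xn: k for k, xn in enumerate(typical, start=1)}   # k = 1, 2, ...
    m_bits = int(np.ceil(np.log2(len(typical)))) + 1           # one extra bit so 0 can flag errors

    def compress(xn):
        # Binary representation of the index k, or the all-zero string if not typical.
        return format(index.get(xn, 0), f"0{m_bits}b")

    def decompress(ym):
        k = int(ym, 2)
        return None if k == 0 else typical[k - 1]   # None signals a decoding error

    rng = np.random.default_rng(1)
    xn = tuple(rng.choice(2, size=n, p=p))
    print("rate m/n =", round(m_bits / n, 3), " vs H(X) =", round(H, 3))
    print("decoded correctly:", decompress(compress(xn)) == xn)

For the small n used here the rate is still noticeably above H(X); the gap closes only as n grows, in line with (8.42).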
In contrast to classical sequences y n := (y1 , . . . , yn ), the quantum sequence above may not
be distinguishable, as the states {|ϕy ⟩}y∈[k] of the source are not orthogonal in general.
Imagine that Alice wishes to transmit the aforementioned state |ϕyn ⟩ to Bob. If Alice is
aware of the value of y n , she can employ Shannon’s compression coding to send y n to Bob
over a classical channel at a rate of H(Y ) (meaning the transmission of each source symbol
incurs a cost of H(Y )[c → c]). Upon receiving y n , Bob can recreate the state |ϕyn ⟩ (assuming
Bob knows the quantum source). However, as we will discuss, if Alice and Bob have access to
noiseless quantum channels, not only is it unnecessary for Alice to know y n , but she can also
transmit the state |ϕyn ⟩ to Bob more efficiently! Specifically, each source state transmission
costs H(A)ρ [q → q], where H(A)ρ := −Tr[ρA log ρA ] denotes the von Neumann entropy of
the state X
ρA := qy |ϕy ⟩⟨ϕy |A . (8.46)
y∈[k]
Observe that from the upper bound in (7.120) with ρx replaced by the pure state ϕy and p
replaced by q, we get that H(A)ρ ⩽ H(q) = H(Y ).
Without the classical knowledge of y, the quantum source generates the state ρ as men-
tioned above at each usage. Thus, we will denote by i.i.d.∼ ρ, an i.i.d. quantum source drawn
from an ensemble of states whose average state, as described in (8.46), is ρ. Additionally,
after n uses of the source, the produced state is:
ρ ⊗ ρ ⊗ · · · ⊗ ρ := ρ⊗n . (8.47)
While we assume Alice lacks access to the classical register y of the source, it’s plausible
that this value is recorded in some register system R. If R is classical, each source use
generates the cq-state:
ρ^{RA} := Σ_{y∈[k]} q_y |y⟩⟨y|^R ⊗ ϕ^A_y .   (8.48)

Alternatively, if the register system R is quantum, then each use of the source produces the state

|ψ^{RA}⟩ = Σ_{y∈[k]} √q_y |y⟩^R |ϕ_y⟩^A .   (8.49)
Both the classical and quantum register systems record the value of y. However, without access to R, Alice and Bob cannot distinguish the source {q_y, ϕ_y}_{y∈[k]} from another source {r_z, ψ_z}_{z∈[ℓ]} whose average state satisfies Σ_{z∈[ℓ]} r_z ψ_z = ρ.
Although we assume that Alice and Bob do not have access to system R, it is necessary to
think about the quantum source with a recording system. Otherwise, without the knowledge
of y n , the states |ϕyn ⟩ become equivalent to ρ⊗n so that Bob, in principle, can prepare any
number of copies of ρ without any communication from Alice. However, if other parties have
access to y n , then they can verify that the state ρ⊗n that Bob prepared is not the original
state |ϕyn ⟩ that Alice intended to send.
Note that by applying the completely dephasing map ∆^R on system R, the entangled state |ψ^{RA}⟩ in (8.49) becomes the cq-state in (8.48). This demonstrates that taking the register to be quantum is more general, and we therefore adopt the entangled description |ψ^{RA}⟩ of a quantum source. Note that in this picture, after n uses of the source, Alice shares with the register the state

|ψ^{R^n A^n}⟩ := |ψ^{RA}⟩^{⊗n} .   (8.50)
In the quantum version of the compression scheme discussed above, the task of Alice is to
transfer her system An to Bob using the smallest possible number of noiseless qubit channels
[q → q]. We postpone the full details of this task to volume 2 of the book where we study
quantum Shannon theory in more details.
where {px } are the eigenvalues of ρ and {|x⟩} are the corresponding eigenvectors. We define
the classical system (random variable) X to have alphabet symbols x ∈ [m] corresponding
to a probability distribution {px }. We point out that the alphabet symbols of the system Y
that we discussed above corresponds to a different probability distribution {qy }.
Now, observe that the state

ρ^{⊗n} = Σ_{x₁∈[m]} Σ_{x₂∈[m]} ⋯ Σ_{xₙ∈[m]} p_{x₁} p_{x₂} ⋯ p_{xₙ} |x₁⋯xₙ⟩⟨x₁⋯xₙ|^{A^n}
       := Σ_{x^n∈[m]^n} p_{x^n} |x^n⟩⟨x^n|^{A^n} ,   (8.52)
where |xn ⟩ := |x1 · · · xn ⟩ and pxn := px1 px2 · · · pxn . We use the notation Tε (X n ) to denote
the set of all ε-typical sequences xn with respect to a classical system X corresponding to
an i.i.d.∼ p source. It is important to note that the components of the vector p ∈ Prob(m)
are the eigenvalues of ρ. With this notation, for every such i.i.d.∼ ρA source, we define a
corresponding typical subspace
Tε (An ) := span {|x1 · · · xn ⟩ : xn ∈ Tε (X n )} ⊆ An , (8.53)
and a typical projection

Π^n_ε := Σ_{x^n∈Tε(X^n)} |x^n⟩⟨x^n| .   (8.54)
Theorem 8.2.1. Let ρ ∈ D(A), ε ∈ (0, 1), δ_n := e^{−cε²n} where c is defined in (8.28), and let Tε(A^n) and Π^n_ε be the typical subspaces and projections associated with a quantum i.i.d.∼ρ source. Further, for each n ∈ N, let P_n ∈ Pos(A^n) be an orthogonal projection onto a subspace of dimension Tr[P_n] ⩽ 2^{nr} for some r < H(A)_ρ (r is independent of n). Then, for all n ∈ N the following three inequalities hold:

1. Tr[Π^n_ε ρ^{⊗n}] ⩾ 1 − δ_n .   (8.55)
2. (1 − δ_n) 2^{n(H(A)_ρ−ε)} ⩽ Tr[Π^n_ε] ⩽ 2^{n(H(A)_ρ+ε)} .   (8.56)
3. Tr[P_n ρ^{⊗n}] ⩽ e^{−c′n} for some c′ > 0 .   (8.57)
Proof. The proofs of the first two parts of the theorem follow from their classical counterparts. Particularly, for the first part

Tr[Π^n_ε ρ^{⊗n}] = Σ_{x^n∈Tε(X^n)} p_{x^n} = Pr(Tε(X^n)) ⩾ 1 − δ_n .   (8.58)

For the second part,

Tr[Π^n_ε] = dim(Tε(A^n)) = |Tε(X^n)| .   (8.59)

Therefore, this part follows as well from its classical counterpart. It is therefore left to prove the third part.
We first split the trace into two parts:

Tr[P_n ρ^{⊗n}] = Tr[P_n ρ^{⊗n} Π^n_ε] + Tr[P_n ρ^{⊗n}(I − Π^n_ε)] .   (8.60)
Our objective is to show that both of these terms are going to zero as n → ∞. For the first one,

Tr[P_n ρ^{⊗n} Π^n_ε] = Σ_{x^n∈Tε(X^n)} p_{x^n} ⟨x^n|P_n|x^n⟩
p_{x^n} ⩽ 2^{−n(H(A)−ε)} →  ⩽ 2^{−n(H(A)−ε)} Σ_{x^n∈Tε(X^n)} ⟨x^n|P_n|x^n⟩
                             ⩽ 2^{−n(H(A)−ε)} Tr[P_n]   (8.61)
Tr[P_n] ⩽ 2^{nr} →  ⩽ 2^{−n(H(A)−ε−r)} → 0  as n → ∞ ,

where we assumed that ε > 0 is small enough so that H(A) − r > ε. For the second term,

Tr[P_n ρ^{⊗n}(I − Π^n_ε)] = Σ_{x^n∉Tε(X^n)} p_{x^n} ⟨x^n|P_n|x^n⟩ ⩽ Σ_{x^n∉Tε(X^n)} p_{x^n} → 0  as n → ∞ .   (8.62)
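Since the typical projector is built entirely from the classical typical set of the eigenvalue distribution, it is straightforward to construct numerically. The sketch below (Python with NumPy; the qubit state and the small n are illustrative, so the inequalities of Theorem 8.2.1 are satisfied but loose) builds Π^n_ε for a qubit source and evaluates Tr[Π^n_ε ρ^{⊗n}] and Tr[Π^n_ε].

    import itertools
    import numpy as np
    from functools import reduce

    rho = np.array([[0.7, 0.2], [0.2, 0.3]])     # illustrative qubit state
    evals, evecs = np.linalg.eigh(rho)           # p_x and |x>
    H = -np.sum(evals * np.log2(evals))
    n, eps = 8, 0.25

    dim = 2**n
    Pi = np.zeros((dim, dim))
    for xn in itertools.product(range(2), repeat=n):
        prob = np.prod(evals[list(xn)])
        if abs(-np.log2(prob) / n - H) <= eps:                # x^n is eps-typical
            vec = reduce(np.kron, [evecs[:, x] for x in xn])  # |x^n> = |x_1>...|x_n>
            Pi += np.outer(vec, vec)

    rho_n = reduce(np.kron, [rho] * n)
    print("Tr[Pi rho^n] =", round(np.trace(Pi @ rho_n), 4))
    print("Tr[Pi] =", int(round(np.trace(Pi))),
          " vs 2^{n(H-eps)}, 2^{n(H+eps)} =",
          (round(2**(n*(H-eps)), 1), round(2**(n*(H+eps)), 1)))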
| D(p∥q) − (1/n) log(p_{x^n}/q_{x^n}) | ⩽ ε .   (8.63)
We saw in Sec. 8.1.1 that given an i.i.d.∼ p source, all typical sequences with large size n,
that are generated by the source, have approximately the same probability to occur given by
≈ 2−nH(p) . This phenomenon was dubbed as the asymptotic equipartition property (AEP).
Here we study a variant of this property as described in the following theorem.
1. Pr(T^rel_ε(X^n)) > 1 − δ_n .
2. (1 − δ_n) 2^{n(D(p∥q)−ε)} ⩽ |T^rel_ε(X^n)| ⩽ 2^{n(D(p∥q)+ε)} .
3. Pr(K_n) ⩽ e^{−c′n} , for some c′ > 0 .
Remark. Roughly speaking, the above theorem indicates that almost all sequences x^n ∈ [m]^n have the same ratio q_{x^n}/p_{x^n} ≈ 2^{−nD(p∥q)}. Observe also that although we consider two probability distributions p and q, the sequences X₁, X₂, … are drawn from a single p-source.
Proof. Due to the similarity of this theorem to Theorem 8.1.3, we provide here only the proof of the first inequality, leaving the remaining two inequalities as an exercise for the reader. Without loss of generality suppose q > 0 and let Y := log(p_X/q_X) be the random variable with alphabet {log(p_x/q_x)}_{x∈[m]} and corresponding probabilities p = (p₁, …, p_m)ᵀ. Consider the sequence X^n = (X₁, X₂, …, Xₙ) drawn from an i.i.d.∼p source, and for each j ∈ [n] let Y_j := log(p_{X_j}/q_{X_j}).
Therefore, since

(1/n) log(p_{X^n}/q_{X^n}) = (1/n) log( (p_{X₁}⋯p_{Xₙ}) / (q_{X₁}⋯q_{Xₙ}) ) = (1/n) Σ_{j∈[n]} log(p_{X_j}/q_{X_j}) ,   (8.66)

we conclude that

Pr(T^rel_ε(X^n)) = Pr( | (1/n) log(p_{X^n}/q_{X^n}) − D(p∥q) | < ε )
                 = Pr( | (1/n) Σ_{j∈[n]} Y_j − E(Y) | < ε )   (8.67)
Hoeffding's Inequality →  > 1 − e^{−ncε²} .
Exercise 8.3.2. Prove the outstanding inequalities of the aforementioned theorem. Hint:
Utilize a method similar to that applied in deriving the analogous inequalities in Theo-
rem 8.1.3.
The theorem above demonstrates that the probability that a sequence is relative ε-typical is very high. Note, however, that the probability Pr(T^rel_ε(X^n)) is computed with respect to an i.i.d.∼p source. If, on the other hand, we replace p with q we would get the probability

Pr(T^rel_ε(X^n))_q := Σ_{x^n∈T^rel_ε(X^n)} q_{x^n} .   (8.68)
In the following exercise you show that this probability goes to zero exponentially fast with
n.
Exercise 8.3.3. Let p, q ∈ Prob(m) with supp(p) ⊆ supp(q), and ε ∈ (0, 1). Show that for all n ∈ N

(1 − δ_n) 2^{−n(D(p∥q)+ε)} ⩽ Pr(T^rel_ε(X^n))_q ⩽ 2^{−n(D(p∥q)−ε)} .   (8.69)

Hint: Take the sum over all x^n ∈ T^rel_ε(X^n) in all sides of (8.64), and use the inequalities 1 ⩾ Pr(T^rel_ε(X^n)) > 1 − δ_n.
The pair of probability vectors (p⊗n , q⊗n ) becomes more distinguishable as we increase
n. In the following corollary, we use the theorem above to characterize this distinguishability
with relative majorization.
Proof. Let ε > 0 be a small number and define the stochastic evolution matrix E ∈ STOCH(2, 2^n) by its action on the standard basis {e_{x^n} := e_{x₁} ⊗ ⋯ ⊗ e_{xₙ}}_{x^n∈[m]^n} as

E e_{x^n} := e₁^{(2)}  if x^n ∈ T^rel_ε(X^n) ,  and  E e_{x^n} := e₂^{(2)}  if x^n ∉ T^rel_ε(X^n) ,   (8.71)

where e₁^{(2)} := (1, 0)ᵀ and e₂^{(2)} := (0, 1)ᵀ form the standard basis of R².
2−n(D(p∥q)−ε) (see (8.69)) the pair of vectors on the right-hand side approaches the pair
(e1 , e2 ) as n → ∞, where {e1 , e2 } is the standard basis of R2 . Therefore, combining this
with Exercise 4.3.22 we conclude that for any s, t ∈ Prob>0 (2), and sufficiently large n ∈ N,
Similar to the notations in the previous section, for any integer n we will denote by y n :=
(y1 , . . . , yn ), qyn = qy1 · · · qyn , and |ϕyn ⟩ := |ϕy1 ⟩ ⊗ · · · ⊗ |ϕyn ⟩. Then, for any ε > 0 and n ∈ N
We also denote by Πrel,nε the projection to the relative typical subspace Trel n
ε (A ). Note that
Tr [ρ log σ] is well defined since supp(ρ) ⊆ supp(σ).
The definition provided above clearly does not revert to the classical definition of a
relative typical sequence when ρ and σ commute. Nevertheless, as we will explore in the
theorem below and the subsequent sections, this definition proves to be an effective tool for
examining the distinguishability of quantum states. Furthermore, as demonstrated in the
upcoming exercise, relative typical subspaces do indeed converge to typical subspaces in the
case where ρ = σ.
Exercise 8.3.4. Show that if ρ = σ then the relative typical subspace reduces to the typical subspace of ρ as defined in the previous section. That is, show that in this case T^rel_ε(A^n) = Tε(A^n).
Theorem 8.3.2. Let ε > 0, ρ, σ ∈ D(A) with supp(ρ) ⊆ supp(σ), and c > 0 as defined in (8.82). Then, for all n ∈ N

Tr[Π^{rel,n}_ε ρ^{⊗n}] ⩾ 1 − e^{−ncε²} .   (8.76)
Therefore, denoting the relative distribution r_y := ⟨ϕ_y|ρ|ϕ_y⟩, and by Y the random variable whose alphabet is [m] and whose corresponding distribution is {r_y}_{y∈[m]}, we get that
Then, by definition,

Tr[Π^{rel,n}_ε ρ^{⊗n}] = Σ_{y^n∈C^n_ε} ⟨ϕ_{y^n}|ρ^{⊗n}|ϕ_{y^n}⟩ = Σ_{y^n∈C^n_ε} r_{y^n} ,   (8.80)

where the last term is the probability that a sequence Y^n belongs to C^n_ε. Therefore,

Tr[Π^{rel,n}_ε ρ^{⊗n}] = Pr{C^n_ε}
                       = Pr{ | E(log q_Y) − (1/n) Σ_{i∈[n]} log q_{Y_i} | ⩽ ε }   (8.81)
Hoeffding's Inequality →  ⩾ 1 − e^{−cnε²} ,
where qmin is the smallest non-zero eigenvalue of σ, and qmax is the largest eigenvalue
of σ.
Type of a Sequence
Definition 8.4.1. Let n, m ∈ N. For every x^n := (x₁, …, xₙ) ∈ [m]^n and z ∈ [m], let N(z|x^n) be the number of elements in the sequence x^n that are equal to z. The type of the sequence x^n is a probability vector in Prob(m) given by

t(x^n) := ( t₁(x^n), …, t_m(x^n) )ᵀ ,  where  t_z(x^n) := (1/n) N(z|x^n)  ∀ z ∈ [m] .   (8.84)
For example, for m = 3, the type of the sequence x6 = (2, 1, 1, 3, 2, 2) is the probability
vector t(x6 ) = (1/3, 1/2, 1/6).
The significance of types comes into play when considering an i.i.d.∼p source. In this case, the probability of a sequence x^n ∈ [m]^n drawn from the source is given by

p_{x^n} := p_{x₁} ⋯ p_{xₙ} = p₁^{N(1|x^n)} ⋯ p_m^{N(m|x^n)}
∀ r > 0, r = 2^{log r} →  = 2^{Σ_{z∈[m]} N(z|x^n) log₂ p_z}
N(z|x^n) = n t_z(x^n) →  = 2^{n Σ_{z∈[m]} t_z(x^n) log₂ p_z}   (8.85)
                           = 2^{−n( H(t(x^n)) + D(t(x^n)∥p) )} ,

where H(t(x^n)) is the Shannon entropy of the type of the sequence x^n, and D(t(x^n)∥p) is the KL-divergence between t(x^n) and p (see (5.24)). The above formula shows that the probability of a sequence drawn from an i.i.d. source depends only on the type of the sequence. As we will see below, this property can lead to a significant simplification in some applications.
We denote by Type(n, m) ⊆ Prob(m) the set of all types of sequences in [m]^n. For example, for sequences of bits (i.e. m = 2)

Type(n, 2) = { (1, 0)ᵀ , ((n−1)/n, 1/n)ᵀ , ((n−2)/n, 2/n)ᵀ , … , (0, 1)ᵀ } .   (8.86)

Note that any type t ∈ Type(n, m) has m components of the form k/n where k ∈ {0, …, n}. Therefore, the number of types in Type(n, m) cannot exceed (n + 1)^m, which is polynomial in n. The exact number of types can be computed using the "stars and bars" method in combinatorics. It is given by

|Type(n, m)| = (n+m−1 choose n) ⩽ (n + 1)^m .   (8.87)

On the other hand, the number of all sequences of size n is m^n, which is exponential in n.
The set of all sequences x^n of a given type t = (t₁, …, t_m) will be denoted as X^n(t). We emphasize that X^n(t) denotes the set of all sequences in [m]^n whose type is t, whereas t(x^n) denotes a single probability vector (i.e. the type of a specific sequence x^n). The number of sequences in the set X^n(t) is given by the combinatorial formula for arranging nt₁, …, nt_m objects in a sequence,

|X^n(t)| = (n choose nt₁, …, nt_m) := n! / ∏_{x=1}^m (nt_x)! .   (8.88)
The above formula is somewhat cumbersome, but by using Stirling’s approximation we can
find simpler lower and upper bound.
Proof. Let x^n be a sequence of size n drawn from an i.i.d. source according to the distribution t. Then,

1 = Σ_{x^n∈[m]^n} t_{x^n} ⩾ Σ_{x^n∈X^n(t)} t_{x^n}
(8.85) with p = t →  = Σ_{x^n∈X^n(t)} 2^{−nH(t)}   (8.90)
                      = |X^n(t)| 2^{−nH(t)} .

This proves that |X^n(t)| ⩽ 2^{nH(t)}. For the other inequality, we make use of Stirling's bounds

√(2π) n^{n+1/2} e^{−n} ⩽ n! ⩽ e n^{n+1/2} e^{−n} .   (8.91)

By using the lower bound for n! and the upper bound for each (nt_x)! in (8.88) we get that

|X^n(t)| = n! / ∏_{x=1}^m (nt_x)! ⩾ √(2π) n^{n+1/2} e^{−n} / ∏_{x=1}^m [ e (nt_x)^{nt_x+1/2} e^{−nt_x} ]
         = √(2π) n^{1/2} / ( e^m n^{m/2} ∏_{x=1}^m t_x^{nt_x+1/2} )   (8.92)
         = ( √(2π) √n / ( (e√n)^m √(t₁⋯t_m) ) ) 2^{nH(t)} .

It is left as an exercise (see Exercise 8.4.1) to show that

√(2π) √n / ( (e√n)^m √(t₁⋯t_m) ) ⩾ 1/(n + 1)^m   (8.93)

for all n, m, and t₁, …, t_m.
Exercise 8.4.1. Prove the inequality in (8.93). Hint: Use the fact that the product t₁⋯t_m is Schur concave and achieves its maximum when t₁ = t₂ = ⋯ = t_m = 1/m.
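The counting statements above are easy to verify by brute force for small n and m. The following Python sketch (illustrative parameters; standard library plus NumPy-free) groups all sequences in [m]^n by their type and checks |Type(n, m)| ⩽ (n+1)^m together with the bounds (n+1)^{−m} 2^{nH(t)} ⩽ |X^n(t)| ⩽ 2^{nH(t)} established in the proof above.

    import itertools
    import math
    from collections import Counter

    n, m = 8, 3                                   # illustrative sizes

    def entropy(t):
        return -sum(ti * math.log2(ti) for ti in t if ti > 0)

    # Group all sequences in [m]^n by their type t(x^n).
    counts = Counter()
    for xn in itertools.product(range(m), repeat=n):
        counts[tuple(xn.count(z) / n for z in range(m))] += 1

    print("|Type(n,m)| =", len(counts), " <= (n+1)^m =", (n + 1)**m)
    for t, size in list(counts.items())[:3]:
        H = entropy(t)
        print(f"t = {t}:  |X^n(t)| = {size},  "
              f"bounds ({2**(n*H) / (n+1)**m:.2f}, {2**(n*H):.2f})")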
Exercise 8.4.2. Let K ⊂ Type(n, m) be a set of probability distributions (that are types), and define C_n := {x^n ∈ [m]^n : t(x^n) ∈ K}. Fix q ∈ K. Show that

Pr(C_n)_q := Σ_{x^n∈C_n} q_{x^n}   (8.94)

approaches one in the limit n → ∞. Hint: Denote by Cᶜ_n the complement of C_n in [m]^n and show that Pr(Cᶜ_n)_q approaches zero in the limit n → ∞. Use (8.85), (8.89), and the fact that D(t∥q) > 0 for any type t ≠ q.
Exercise 8.4.3. Let n ∈ N. Use the strong Stirling approximation, which states that

√(2πn) (n/e)^n ⩽ n! ⩽ √(2πn) (n/e)^n e^{1/(12n)} ,   (8.95)

to show that:

(n choose np) ⩽ 2^{nh(p)} / √(πnp(1 − p)) ,   (8.96)
where spec(σ) = {q₁, …, q_m}, and {P_x}_{x∈[m]} are orthogonal projectors. For n copies of σ,

σ^{⊗n} = Σ_{x^n∈[m]^n} q_{x^n} P_{x^n} ,   (8.99)

where q_{x^n} := q_{x₁}⋯q_{xₙ} and P_{x^n} := P_{x₁} ⊗ ⋯ ⊗ P_{xₙ}. From (8.85) the probability q_{x^n} = 2^{−n(H(t(x^n))+D(t(x^n)∥q))} depends only on the type of x^n. Therefore, we can express σ^{⊗n} as

σ^{⊗n} = Σ_{t∈Type(n,m)} 2^{−n(H(t)+D(t∥q))} P_t ,   (8.100)
Note that the set {Pt }t∈Type(n,m) is itself a set of orthogonal projectors. The significance of
the formula above is that the number of terms in the sum is given by
which is polynomial in n. Therefore, the original sum in (8.99) that consists of mn terms,
has been reduced to a sum with a polynomial number of terms.
This exponential reduction in the number of terms can be applied to the pinching map
PH given in Eqs. (3.227,3.233). For the case that H = σ ⊗n for some σ ∈ D(A) we have
|spec(H)| = |spec (σ ⊗n )| ⩽ (n + 1)m . Therefore, when combined with the pinching inequal-
ity (3.235) we conclude that for all ρ ∈ D(An )
P_{σ^{⊗n}}(ρ^{A^n}) ⩾ (1/(n + 1)^m) ρ^{A^n} .   (8.103)
We will see later on that this inequality can be very useful as polynomial terms such as
(n + 1)m turns out to be “negligible” in some applications.
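As an illustration of the pinching argument, the following Python sketch (NumPy assumed; the pinching map is implemented directly from its definition as a sum of eigenprojector conjugations, and the qubit σ and random ρ are illustrative) checks that σ^{⊗n} has at most (n+1)^m distinct eigenvalues and that the operator inequality (8.103) holds numerically.

    import numpy as np
    from functools import reduce

    def pinch(rho, H, tol=1e-6):
        """Sum of P rho P over the eigenprojectors P of H (the pinching of rho)."""
        evals, evecs = np.linalg.eigh(H)
        out = np.zeros_like(rho, dtype=complex)
        for lam in np.unique(np.round(evals, 8)):          # distinct eigenvalues of H
            cols = np.abs(evals - lam) < tol
            P = evecs[:, cols] @ evecs[:, cols].conj().T    # eigenprojector of H
            out += P @ rho @ P
        return out

    rng = np.random.default_rng(2)
    n = 4
    sigma = np.diag([0.6, 0.4])                             # qubit, so m = 2
    H = reduce(np.kron, [sigma] * n)

    # A random density matrix on n qubits.
    G = rng.normal(size=(2**n, 2**n)) + 1j * rng.normal(size=(2**n, 2**n))
    rho = G @ G.conj().T
    rho /= np.trace(rho).real

    gap = pinch(rho, H) - rho / (n + 1)**2                  # (8.103) with m = 2
    n_spec = len(np.unique(np.round(np.linalg.eigvalsh(H), 8)))
    print("distinct eigenvalues of sigma^n:", n_spec, " <= (n+1)^m =", (n + 1)**2)
    print("min eigenvalue of P(rho) - rho/(n+1)^m =",
          np.round(np.linalg.eigvalsh(gap).min(), 6))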
where

K_n := { x^n ∈ [m]^n : t(x^n) ∈ C ∩ Type(n, m) } .   (8.105)
When n is very large one can expect the type of x^n to be relatively close to q (we will make this notion precise in the next subsection when we study strong typicality). Therefore, if q ∉ C one can expect the probability Pr_n(C) to decrease with n. Indeed, Sanov's theorem states that for large n we have Pr_n(C) ≈ 2^{−nD(p⋆∥q)}, where the probability vector p⋆ ∈ Prob(m) is defined as

p⋆ := arg min_{p∈C} D(p∥q) ,   (8.106)

where D is the KL-divergence. This result has the geometrical interpretation that the exponential decay rate of Pr_n(C) increases with the "distance" (as measured by the KL-divergence) of q from the set C (see Fig. 8.2).
Figure 8.2: Sanov’s Theorem. The exponential decay factor is determined by the distance of q
from C (as measured by the KL-divergence). The green triangle represents the probability simplex
Prob(m), and the oval shape the set C.
Sanov's Theorem

Theorem 8.4.1. Let C ⊆ Prob(m) be a non-empty set of probability distributions such that C is the closure of its interior, and consider an i.i.d.∼q source. Using the same notations as above,

lim_{n→∞} −(1/n) log Pr_n(C) = min_{p∈C} D(p∥q) := D(p⋆∥q) .   (8.107)

Remark. Note that while the vector p⋆ is not necessarily in ∪_{n∈N} Type(n, m), there exists a sequence of types {p_n}_{n∈N}, with p_n ∈ C ∩ Type(n, m) for sufficiently large n, such that p_n → p⋆ as n → ∞.
Proof. We will prove the theorem by finding upper and lower bounds for Pr_n(C). For the upper bound, using (8.85) we get

Pr_n(C) = Σ_{x^n∈K_n} 2^{−n(H(t(x^n))+D(t(x^n)∥q))} = Σ_{t∈C∩Type(n,m)} |X^n(t)| 2^{−n(H(t)+D(t∥q))}
(8.89)→  ⩽ Σ_{t∈C∩Type(n,m)} 2^{−nD(t∥q)}
By definition of p⋆ →  ⩽ Σ_{t∈C∩Type(n,m)} 2^{−nD(p⋆∥q)}   (8.108)
                        ⩽ |Type(n, m)| 2^{−nD(p⋆∥q)}
(8.87)→  ⩽ (n + 1)^m 2^{−nD(p⋆∥q)} .
Note that we got this upper bound without assuming that C is the closure of its interior.
We only assumed that C is non-empty so that p⋆ exists.
For the lower bound, let {p_n}_{n∈N} be a sequence with p_n ∈ C ∩ Type(n, m) for sufficiently large n, such that p_n → p⋆ as n → ∞ (see the remark above). Then, for sufficiently large n we have

Pr_n(C) = Σ_{t∈C∩Type(n,m)} |X^n(t)| 2^{−n(H(t)+D(t∥q))}
Taking only the term t = p_n in the sum →  ⩾ |X^n(p_n)| 2^{−n(H(p_n)+D(p_n∥q))}   (8.109)
(8.89)→  ⩾ (1/(n + 1)^m) 2^{−nD(p_n∥q)} .
In other words, x^n is ε-typical if the entropy of its type approximates the entropy of p, with a negligible correction term D(t(x^n)∥p) as t(x^n) converges to p. To establish a more robust concept of typicality, one might require that the type of x^n is ε-close to p. This is a somewhat more natural requirement, and the question is then: which metric should we use? The fundamental requirement is that each element of t(x^n) should be ε-close to its counterpart in p, necessitating

|p_z − t_z(x^n)| ⩽ ε  ∀ z ∈ [m] .   (8.112)

Note that any sequence x^n that satisfies ∥p − t(x^n)∥_p ⩽ ε (for some p ⩾ 1) will satisfy the above equation, and therefore this imposes a slightly stronger condition on the sequence x^n. On the other hand, the condition ∥p − t(x^n)∥_∞ ⩽ ε is precisely equivalent to the above condition, and we will therefore use it to measure the distance between t(x^n) and p.
Note that, in accordance with our previous notation, the condition stipulated in the above definition can be equivalently expressed as:

| p_z − N(z|x^n)/n | ⩽ ε  ∀ z ∈ [m] such that p_z > 0 ,   (8.114)

and N(z|x^n) = 0 whenever p_z = 0. This latter condition intuitively implies that a typical sequence should not include any alphabet characters that have a zero probability of occurrence.
In the next theorem we prove several properties of strongly typical sequences. We will use the notation

Pr(Tˢᵗ_ε(X^n)) := Σ_{x^n∈Tˢᵗ_ε(X^n)} p_{x^n}   (8.115)

to denote the probability that a sequence is strongly ε-typical (with respect to an i.i.d.∼p source). Moreover, we set the constant a > 0 to be

a := − log ∏_{z∈supp(p)} p_z .   (8.116)
Remark. Observe that the probability that a sequence is strongly ε-typical approaches one exponentially fast with n. The second property highlights the equipartition property, indicating that the probability of every typical sequence is approximately 2^{−nH(X)}. The first and third properties bear resemblance to their counterparts for weakly typical sequences. However, due to the subtle distinctions between the two concepts of typicality, the upcoming proof will incorporate additional tools to address these differences.
Proof. For any z ∈ [m], let 1_z(X) be the indicator random variable that equals 1 if X = z and 0 otherwise. Fix z ∈ [m], and let Z₁, Z₂, … be an i.i.d. sequence of random variables where each Z_j := 1_z(X_j) is an indicator random variable as above that corresponds to X_j. From the law of large numbers, particularly the application of Hoeffding's inequality (8.15) to the sequence Z₁, …, Zₙ, we get

Pr( (1/n) Σ_{j∈[n]} Z_j − E(Z) > ε ) ⩽ e^{−2nε²} ,   (8.117)

where we used the fact that 0 ⩽ Z_j ⩽ 1, so that the constants a and b in (8.15) are given by a = 0 and b = 1. By definition,

E(Z) = Σ_{x∈[m]} p_x δ_{xz} = p_z ,   (8.118)

and

(1/n) Σ_{j∈[n]} Z_j = (1/n) Σ_{j∈[n]} δ_{x_j z} = (1/n) N(z|x^n) = t_z(x^n) .   (8.119)

The equation above holds for all z ∈ [m] and all n ∈ N. Hence, it states that the probability that the random variable X^n = (X₁, …, Xₙ) is not a strongly ε-typical sequence is no greater than e^{−2nε²}. This completes the proof of the first part of the theorem.
Suppose now that x^n ∈ Tˢᵗ_ε(X^n) and observe that

p_{x^n} = p_{x₁} p_{x₂} ⋯ p_{xₙ} = ∏_{z∈supp(p)} p_z^{N(z|x^n)} .   (8.121)
p_z − ε ⩽ t_z(x^n) ⩽ p_z + ε .   (8.123)

That is,

−εa − H(X) ⩽ (1/n) log p_{x^n} ⩽ εa − H(X) .   (8.125)

Multiplying all sides by n and exponentiating (base 2) completes the proof of the second part.
To prove the third part, observe that from the lower bound of Part 2 we get

1 ⩾ Σ_{x^n∈Tˢᵗ_ε(X^n)} p_{x^n} ⩾ Σ_{x^n∈Tˢᵗ_ε(X^n)} 2^{−n(H(X)+εa)} = |Tˢᵗ_ε(X^n)| 2^{−n(H(X)+εa)} .   (8.126)

Therefore, |Tˢᵗ_ε(X^n)| ⩽ 2^{n(H(X)+εa)}. On the other hand, from Part 1 we get

1 − e^{−2nε²} ⩽ Σ_{x^n∈Tˢᵗ_ε(X^n)} p_{x^n}
Upper bound of Part 2 →  ⩽ Σ_{x^n∈Tˢᵗ_ε(X^n)} 2^{−n(H(X)−εa)}   (8.127)
                          = |Tˢᵗ_ε(X^n)| 2^{−n(H(X)−εa)} .

Hence, (1 − e^{−2nε²}) 2^{n(H(X)−εa)} ⩽ |Tˢᵗ_ε(X^n)|. This completes the proof of the third part.
Exercise 8.5.1. Let xn ∈ [m]n and y k ∈ [m]k be two sequences of size n and k, respectively,
drawn from the same i.i.d.∼ p source. Show that if both xn and y k are strongly ε-typical then
also the joint sequence (xn , y k ) ∈ [m]n+k is strongly ε-typical with respect to the i.i.d.∼ p
source.
Exercise 8.5.2. Let 1 < n ∈ N, ε ∈ (1/n, 1), and consider an i.i.d.∼p source, where p ∈ Prob(m).

1. Show that if a sequence x^{n−1} is strongly ε-typical then for any y ∈ [m] the sequence x^n := (y, x^{n−1}) is strongly (ε + 1/n)-typical.

2. Let ε′ := ε − 1/n. Show that T′ ⊂ Tˢᵗ_ε(X^n), where
where {px }x∈[m] are the eigenvalues of ρ and {|x⟩}x∈[m] are the corresponding eigenvectors.
As before we denote
ρ^{⊗n} = Σ_{x^n∈[m]^n} p_{x^n} |x^n⟩⟨x^n|^{A^n} ,   (8.131)

where |x^n⟩ := |x₁, …, xₙ⟩ and p_{x^n} := p_{x₁} p_{x₂} ⋯ p_{xₙ}. For any i.i.d. quantum source ρ we define a corresponding strongly typical subspace

Tˢᵗ_ε(A^n) := span{ |x^n⟩ ∈ A^n : x^n ∈ Tˢᵗ_ε(X^n) } ,   (8.132)

where Tˢᵗ_ε(X^n) is the set of strongly typical (classical) sequences of size n, drawn from an i.i.d.∼p source (with p being the probability vector whose components are the eigenvalues of ρ). The strongly typical projection onto this subspace is given by

Π^{n,st}_ε := Σ_{x^n∈Tˢᵗ_ε(X^n)} |x^n⟩⟨x^n| .   (8.133)
Theorem 8.5.2. Let ρ ∈ D(A), ε > 0, and for each n ∈ N let Tˢᵗ_ε(A^n) and Π^{n,st}_ε be the strongly typical subspace and projection associated with a quantum i.i.d.∼ρ source. The following inequalities hold for all n ∈ N:

1. Tr[Π^{n,st}_ε ρ^{⊗n}] ⩾ 1 − e^{−2ε²n} .

3. (1 − δ) 2^{n(H(ρ)−εa)} ⩽ dim Tˢᵗ_ε(A^n) ⩽ 2^{n(H(ρ)+εa)} .
The proof follows directly from the classical version of this theorem and is left as an exercise.
Exercise 8.5.4. Let ρ ∈ D(A), ε > 0, integer m = o(n) (e.g. m = ⌊ns ⌋ for some 0 < s < 1),
and σm ∈ D(Am ). Let also Πn,st
ε be the strongly typical projection associated with the quantum
i.i.d.∼ ρ source. Show that
assumption can be high. However, with an increase in the number of rolls, her probability
of making a mistake significantly reduces. This prompts a natural question: how rapidly
does the error probability decrease as the number of rolls, n, gets larger and larger? This
situation encapsulates the essence of a classical hypothesis testing problem.
In the realm of hypothesis testing, an observer or player aims to decide between two
hypotheses related to two i.i.d. sources. These hypotheses are represented as the p-source
and the q-source. Upon n independent interactions with this source, the observer receives
a sequence denoted as xn = (x1 , . . . , xn ) that belongs to the set [m]n . The challenge is to
ascertain the correct hypothesis based on this sequence.
The observer’s decision-making process can be represented by a function gn : [m]n →
{0, 1}. This function divides all potential sequences into two distinct groups:
1. The set {xn ∈ [m]n : gn (xn ) = 0} corresponds to the first hypothesis. Here, the
observer believes the sequences in this set are from the p-source.
2. The set {xn ∈ [m]n : gn (xn ) = 1} pertains to the second hypothesis, indicating that
the observer surmises the sequences are from the q-source.
Given this decision-making framework, two potential errors can emerge:

1. Type I Error. The observer incorrectly concludes that the sequence is from the q-source when, in reality, it is from the p-source. The probability of this error occurring is:

α(g_n) := Σ_{x^n∈[m]^n : g_n(x^n)=1} p_{x^n} .   (8.135)

2. Type II Error. The observer mistakenly assumes the sequence is from the p-source when it actually originates from the q-source. The likelihood of this error is:

β(g_n) := Σ_{x^n∈[m]^n : g_n(x^n)=0} q_{x^n} .   (8.136)

In the two errors above, we considered a deterministic hypothesis test, where the function g_n : [m]^n → {0, 1} remains fixed. A more general approach introduces an element of randomness to the problem. Here, the observer randomly selects the function g_n based on a specific probability distribution.
To illustrate, consider a set of ℓ functions denoted as {g_{n,k}}_{k∈[ℓ]}, where each g_{n,k} : [m]^n → {0, 1}. Accompanying these functions is a probability vector s ∈ Prob(ℓ). In this probabilistic framework, when given the sequence x^n, the observer first samples a value k according to the distribution s. The observer then attributes the sequence to the p-source if g_{n,k}(x^n) = 0, and to the q-source if g_{n,k}(x^n) = 1. It's crucial to note that for each k ∈ [ℓ]:

β(g_{n,k}) = Σ_{x^n∈[m]^n : g_{n,k}(x^n)=0} q_{x^n} = q^{⊗n} · b_k ,   (8.137)

where the x^n-component of the bit vector b_k ∈ {0,1}^{m^n} is one if g_{n,k}(x^n) = 0 and zero otherwise. The vector

t := Σ_{k∈[ℓ]} s_k b_k   (8.138)
is termed the probabilistic hypothesis test. With the aforementioned notations and consider-
ing this broader context, the two types of errors can be described as follows:
1. Type I Error. This pertains to the likelihood of the observer incorrectly attributing the sequence to the q-source when it originates from the p-source:

α(t) := Σ_{k∈[ℓ]} s_k Σ_{x^n∈[m]^n : g_{n,k}(x^n)=1} p_{x^n} = 1 − p^{⊗n} · t .   (8.139)

2. Type II Error. This represents the chance of the observer mistakenly deducing the sequence belongs to the p-source when it is from the q-source:

β(t) := Σ_{k∈[ℓ]} s_k Σ_{x^n∈[m]^n : g_{n,k}(x^n)=0} q_{x^n} = q^{⊗n} · t .   (8.140)
From its definition (8.138), all the components of the probabilistic hypothesis test vector t are between zero and one (i.e. t ∈ [0,1]^{m^n}). Conversely, any vector in [0,1]^{m^n} can be expressed as a convex combination of bit vectors in {0,1}^{m^n}. Hence, t uniquely characterizes the probabilistic hypothesis test performed by the observer.
The goal of the observer is therefore to choose a probabilistic test vector t such that both types of error are very small. There are two common ways to do that, and we discuss both now. The first one is the asymmetric method, in which the observer minimizes the type II error, β(t), while at the same time keeping the type I error, α(t), below a certain threshold ε > 0. The optimal way to do this is characterized by Stein's lemma. The second method, also known as the symmetric method, assumes a prior {s₀, s₁} known to the observer, in which the p-source occurs with probability s₀ and the q-source with probability s₁. In this case, the goal is to minimize the error probability, which is given by s₀α(t) + s₁β(t). The optimal value of this probability of error is characterized by the Chernoff information. A fundamental instrument in these methods is the divergence used in hypothesis testing.
Definition 8.6.1. For any p, q ∈ Prob(m) and ε ∈ [0, 1), the classical hypothesis testing divergence is defined as

D^ε_min(p∥q) := − log min{ q · t : p · t ⩾ 1 − ε , t ∈ [0,1]^m } ,   (8.141)

where the minimization is over all probabilistic hypothesis test vectors t whose components are in the interval [0, 1].
= Dmin (p∥q) .
To see that D^ε_min in the definition above is indeed an (unnormalized) divergence, let p, q ∈ Prob(m), E ∈ STOCH(n, m), and observe that

D^ε_min(Ep∥Eq) = − log min{ (Eq)ᵀ s : (Ep)ᵀ s ⩾ 1 − ε , s ∈ [0,1]^n }
Replacing Eᵀs ∈ [0,1]^m with an arbitrary t ∈ [0,1]^m →  ⩽ − log min{ q · t : p · t ⩾ 1 − ε , t ∈ [0,1]^m }   (8.143)
= D^ε_min(p∥q) .
Exercise 8.6.1. Show that the constraint p · t ⩾ 1 − ε in (8.141) can be replaced with p · t = 1 − ε (i.e. both constraints lead to the same value of D^ε_min(p∥q)).

Exercise 8.6.2. Show that for all p, q ∈ Prob(m), D^ε_min(p∥q) is non-decreasing in ε, and

D^ε_min(p∥q) ⩾ − log(1 − ε) ,   (8.144)

with equality if p = q.
The classical hypothesis testing divergence is closely related to the testing region defined
in (4.139). To see the connection, first observe that we can replace the condition p · t ⩾ 1 − ε
in (8.141) with the equality p·t = 1−ε (since any t that satisfies p·t > 1−ε is not optimal).
With this change, the optimal q · t in (8.141) can be interpreted as the lowest point of the
intersection of the testing region T(p, q) with the vertical line x = 1 − ε (see Fig. 8.3). That
is, the optimal q · t is the y-component of the lower Lorenz curve LC(p, q) at x = 1 − ε.
We can use the above geometrical interpretation of the hypothesis testing divergence to obtain a closed formula for D^ε_min. Without loss of generality, suppose that the components of p and q are ordered as in (4.116). Then, from Theorem 4.3.3 we know that the vertices of the lower Lorenz curve of (p, q) are given by {(a_k, b_k)}_{k=0}^m as defined in (4.142). Let ℓ be
Figure 8.3: The location of the point in the testing region T(p, q) with the optimal testing vector
t that minimizes (8.141).
an integer such that a_ℓ < 1 − ε ⩽ a_{ℓ+1}. Then, the optimal point on the lower Lorenz curve is located between the ℓ-th and the (ℓ+1)-th vertices. The line between these two vertices has a slope

(b_{ℓ+1} − b_ℓ)/(a_{ℓ+1} − a_ℓ) = q_{ℓ+1}/p_{ℓ+1} .   (8.145)

Hence, the y-component of the optimal point is given by

q · t = b_ℓ + (1 − ε − a_ℓ) q_{ℓ+1}/p_{ℓ+1} .   (8.146)

To summarize, we can express the hypothesis testing divergence as

D^ε_min(p∥q) = − log( b_ℓ + (1 − ε − a_ℓ) q_{ℓ+1}/p_{ℓ+1} ) ,   (8.147)
where ℓ ∈ {0, …, m − 1} is the integer satisfying a_ℓ < 1 − ε ⩽ a_{ℓ+1}. Recall that D^ε_min is non-decreasing with ε (see Exercise 8.7.4), as is also evident from the equation above. Therefore, we can bound the hypothesis testing divergence by taking the two extreme cases 1 − ε = a_ℓ and 1 − ε = a_{ℓ+1}, which gives the simpler bounds

− log b_{ℓ+1} ⩽ D^ε_min(p∥q) ⩽ − log b_ℓ ,   (8.148)

where ℓ, as before, is the integer satisfying a_ℓ < 1 − ε ⩽ a_{ℓ+1}.
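The closed formula (8.147) is easy to implement once the vertices (a_k, b_k) are available. The sketch below (Python with NumPy; it assumes, as in (4.116) and (4.142), that p and q are reordered so that p_x/q_x is non-increasing and that a_k and b_k are the corresponding partial sums of p and q, and it further assumes q > 0) computes D^ε_min(p∥q) for a small example.

    import numpy as np

    def Dmin_eps(p, q, eps):
        # Reorder so that p_x/q_x is non-increasing (assumed convention of (4.116)).
        order = np.argsort(-p / q)
        p, q = p[order], q[order]
        a = np.concatenate(([0.0], np.cumsum(p)))      # a_0, ..., a_m
        b = np.concatenate(([0.0], np.cumsum(q)))      # b_0, ..., b_m
        ell = np.searchsorted(a, 1 - eps, side="left") - 1   # a_ell < 1-eps <= a_{ell+1}
        val = b[ell] + (1 - eps - a[ell]) * q[ell] / p[ell]  # q_{ell+1}/p_{ell+1} in 0-based indexing
        return -np.log2(val)

    p = np.array([0.5, 0.3, 0.2])
    q = np.array([0.2, 0.3, 0.5])
    for eps in (0.05, 0.1, 0.3):
        print(f"eps = {eps}:  D_min^eps(p||q) = {Dmin_eps(p, q, eps):.4f}")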
Remark. The theorem above states that the type II error can be made as small as ≈ 2^{−nD(p∥q)} while at the same time keeping the type I error below the threshold ε. The rate of this exponential decay is given by the KL-divergence. We postpone the proof of this theorem to the next section, in which we prove the more general result known as the quantum Stein's lemma. For the interested reader, we also provide two more direct proofs of this theorem (applicable only to the classical case) in Appendix D.4.
= s₀ − max_{t∈[0,1]^m} (s₀p − s₁q) · t
= s₀ − Σ_{x∈[m]} (s₀p_x − s₁q_x)₊   (8.151)
λ := s₁/s₀ →  = s₀( 1 − Σ_{x∈[m]} (p_x − λq_x)₊ )
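As a quick sanity check of the optimal-error expression just derived, the following Python sketch (illustrative distributions; NumPy assumed) compares (8.151) with a brute-force minimization of s₀α(g) + s₁β(g) over all deterministic tests g : [m] → {0, 1}; the two agree because the optimum is attained by a deterministic test.

    import itertools
    import numpy as np

    def pr_error_formula(p, q, s0):
        s1 = 1 - s0
        return s0 - np.sum(np.maximum(s0 * p - s1 * q, 0.0))

    def pr_error_brute(p, q, s0):
        s1, m = 1 - s0, len(p)
        best = np.inf
        for g in itertools.product((0, 1), repeat=m):   # g(x) = 1 means "guess the q-source"
            g = np.array(g)
            alpha = np.sum(p[g == 1])                    # type I error
            beta = np.sum(q[g == 0])                     # type II error
            best = min(best, s0 * alpha + s1 * beta)
        return best

    p = np.array([0.5, 0.3, 0.2])
    q = np.array([0.1, 0.2, 0.7])
    for s0 in (0.5, 0.8):
        print(s0, round(pr_error_formula(p, q, s0), 6), round(pr_error_brute(p, q, s0), 6))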
3. Show that

lim inf_{n→∞} −(1/n) log Pr_error(p^{⊗n}, q^{⊗n}, s₀) ⩾ − log Σ_{x∈[m]} p_x^α q_x^{1−α} .   (8.156)
The exercise above demonstrates that in the asymptotic limit the optimal probability of
error is bounded by
lim inf_{n→∞} −(1/n) log Pr_error(p^{⊗n}, q^{⊗n}, s₀) ⩾ ξ(p, q) ,   (8.157)

where

ξ(p, q) := − log min_{α∈[0,1]} Σ_{x∈[m]} p_x^α q_x^{1−α}
Definition 6.2.2 →  = max_{α∈[0,1]} (1 − α) D_α(p∥q) .   (8.158)
In the following theorem we show that the inequality in (8.157) is in fact an equality.
Remark. Note that the Chernoff bound ξ(p, q) does not depend on s0 . Moreover, the
theorem above also states that for very large n we have that the probability of error,
Prerror (p⊗n , q⊗n , s0 ) ≈ 2−nξ(p,q) , decays exponentially fast with n with an exponential factor
given by the Chernoff bound.
Proof. We need to prove the opposite inequality of (8.157). We will establish it by finding a lower bound on the probability of error Pr_error(p^{⊗n}, q^{⊗n}, s₀). For this purpose, set λ := s₁/s₀ and K := {x ∈ [m] : p_x ⩾ λq_x}. From (8.151) we have

Pr_error(p, q, s₀) = s₀ Σ_{x∈Kᶜ} p_x + s₁ Σ_{x∈K} q_x ⩾ min{ Σ_{x∈Kᶜ} p_x , Σ_{x∈K} q_x } ,   (8.160)
Next, we characterize the set K_n. From (8.85) the inequality p_{x^n} ⩾ λq_{x^n} holds if and only if

2^{−n(H(t(x^n))+D(t(x^n)∥p))} ⩾ λ 2^{−n(H(t(x^n))+D(t(x^n)∥q))} ,   (8.162)

which is equivalent to

D(t(x^n)∥q) − D(t(x^n)∥p) ⩾ (1/n) log λ .   (8.163)

Therefore, denoting by

C_n := { t ∈ Prob(m) : D(t∥q) − D(t∥p) ⩾ (1/n) log λ } ,   (8.164)
The first sum corresponds to the probability (with respect to an i.i.d.∼p source) that a sequence of size n has a type belonging to Cᶜ_n, whereas the second sum corresponds to the probability (with respect to an i.i.d.∼q source) that a sequence of size n has a type belonging to C_n. These probabilities are very similar to the ones appearing in Sanov's theorem (see Theorem 8.4.1) except that here the set C_n depends on n. To remove this dependence on n, observe that in the limit n → ∞ the set C_n approaches the set
and similarly, the set Cᶜ_n (in Prob(m)) approaches the set

D(p⋆∥p) := min_{r∈S} D(r∥p)  and  D(q⋆∥q) := min_{r∈C} D(r∥q) .   (8.168)

From these definitions and the continuity of the relative entropy, there exist two sequences of vectors {p_n}_{n∈N} and {q_n}_{n∈N}, with limits p_n → p⋆ and q_n → q⋆ as n → ∞, such that for each n ∈ N the vector p_n ∈ Cᶜ_n ∩ Type(n, m) and the vector q_n ∈ C_n ∩ Type(n, m). Therefore,
following similar lines as given in Sanov's theorem, we get for the first sum

Σ_{x^n∈[m]^n : t(x^n)∈Cᶜ_n} p_{x^n} = Σ_{t∈Cᶜ_n∩Type(n,m)} |X^n(t)| 2^{−n(H(t)+D(t∥p))}
Taking only the term t = p_n in the sum →  ⩾ |X^n(p_n)| 2^{−n(H(p_n)+D(p_n∥p))}   (8.169)
(8.89)→  ⩾ (1/(n + 1)^m) 2^{−nD(p_n∥p)} .
∂L(r)/∂r_x = log(e) + log(r_x/p_x) + µ log(q_x/p_x) + ν = 0 ,   (8.174)

which gives, after isolating r_x,

r_x = a p_x (q_x/p_x)^µ = a p_x^{1−µ} q_x^µ   (8.175)

r_x = p_x^{1−µ} q_x^µ / Σ_{x′∈[m]} p_{x′}^{1−µ} q_{x′}^µ .   (8.176)

Hence, denoting by r_µ the probability vector whose components are as above, we conclude that

min_{r∈K} D(r∥p) = D(r_µ∥p) ,   (8.177)
Exercise 8.6.6. Prove the second equality of (8.172). Hint: Suppose that the minimum is obtained for some r ∈ Prob(m) that satisfies D(r∥q) < D(r∥p) and get a contradiction by showing that the vector t = (1 − ε)r + εp (with small ε > 0) satisfies D(t∥p) < D(r∥p).

Exercise 8.6.7. Prove Eq. (8.180). Hint: Show that the condition D(r_µ∥p) = D(r_µ∥q) is equivalent to Σ_{x∈[m]} p_x^{1−µ} q_x^µ log(q_x/p_x) = 0, and compare it with the derivative of the function f(s) := Σ_{x∈[m]} p_x^{1−s} q_x^s.
the measurement outcome x she may infer the state to be ρ or σ. In this section, we delve
into the best strategy Alice can employ to accurately determine the state of her quantum
system.
We note that it’s adequate to contemplate POVMs composed of only two elements. We
can define Λ ∈ Eff(A) to be the sum of all effects {Λx }, where x leads Alice to infer ρ.
Conversely, I − Λ is the sum of the remaining POVM elements, corresponding to x values
that result in Alice inferring σ. Two types of errors might arise:
1. Type I Error: Alice possesses the state ρ but incorrectly infers it as σ. The associated
probability is:
α(Λ) := Tr [ρ(I − Λ)] (8.181)
2. Type II Error: Alice has the state σ but mistakenly deduces it as ρ. The correspond-
ing probability is:
β(Λ) := Tr [σΛ] (8.182)
As in the classical scenario, we explore strategies to minimize the error probabilities α(Λ)
and β(Λ). With the asymmetric approach, the objective is to minimize the Type II error,
β(Λ), while ensuring that the Type I error, α(Λ), stays beneath a specific threshold ε > 0.
The optimal approach is encapsulated by the quantum Stein’s lemma. In the symmetric
strategy, the observer is aware of a prior {s0 , s1 } where ρ occurs with probability s0 , and σ
with probability s1 . The aim here is to reduce the overall error probability represented by
s0 α(Λ) + s1 β(Λ).
It’s worth noting that any pair of quantum states ρ, σ ∈ D(A) that aren’t identical satisfy
Tr[ρσ] < 1. This implies:
In essence, as n approaches infinity, the states ρ⊗n and σ ⊗n become orthogonal with respect
to the Hilbert-Schmidt inner product. Naturally, we might question the rate at which these
states turn distinguishable. Both the quantum Stein’s lemma and the quantum Chernoff
bound address this question, in the asymmetric and symmetric contexts, respectively.
Exercise 8.7.1. Show that the optimal probability β⋆(ε) = 0 for all ε ⩾ 0 if and only if ρ and σ satisfy ρσ = 0.
By taking the − log of the above error probability one obtains a quantity known as the
quantum hypothesis testing divergence.
Definition 8.7.1. The quantum hypothesis testing divergence is defined for all ρ, σ ∈ D(A) and ε ∈ [0, 1) as

D^ε_min(ρ∥σ) := − log min_{Λ∈Eff(A)} { Tr[σΛ] : Tr[ρΛ] ⩾ 1 − ε } .   (8.185)
The hypothesis testing divergence is invariably non-negative and reaches infinity when ρ and σ are orthogonal. Furthermore, Exercise 8.7.2 guides you to demonstrate that when ρ and σ are diagonal in the same basis, the quantum hypothesis testing divergence, denoted as D^ε_min(ρ∥σ), simplifies to its classical equivalent, D^ε_min(p∥q). Here, p and q represent the diagonal elements of ρ and σ, respectively. In the classical context, we observed that for ε = 0, the divergence D^ε_min reduces to the min relative entropy. This observation is consistent in the quantum scenario as well (refer to Exercise 8.7.2). Such parallels justify the use of the 'min' subscript in naming the quantum hypothesis testing divergence. Next, we aim to establish that this function indeed qualifies as an (unnormalized) divergence.
Exercise 8.7.2. Consider the definition above of the quantum hypothesis testing divergence.

1. Show that if ρ, σ ∈ D(A) are diagonal in the same basis of A then D^ε_min(ρ∥σ) reduces to its classical counterpart D^ε_min(p∥q), where p and q are the diagonals of ρ and σ, respectively.

2. Show that for ε = 0, the quantum hypothesis testing divergence simplifies to the quantum min relative entropy. That is, show that for all ρ, σ ∈ D(A)

D^{ε=0}_min(ρ∥σ) = D_min(ρ∥σ) = − log Tr[σΠ_ρ] .   (8.189)

Exercise 8.7.3. Show that the constraint Tr[ρΛ] ⩾ 1 − ε in (8.185) can be replaced with Tr[ρΛ] = 1 − ε (i.e. both constraints lead to the same value of D^ε_min(ρ∥σ)).
Exercise 8.7.4.

1. Show that for all ρ, σ ∈ D(A) we have

D^ε_min(ρ∥σ) ⩾ − log(1 − ε) ,   (8.190)

with equality if ρ = σ.

2. Show that D^ε_min(ρ∥σ) is non-decreasing in ε.

Exercise 8.7.5. Show that the quantum hypothesis testing divergence equals its minimal extension from classical states. That is, show that for all ρ, σ ∈ D(A)

D^ε_min(ρ^A∥σ^A) = sup_{E∈CPTP(A→X)} D^ε_min( E^{A→X}(ρ^A) ∥ E^{A→X}(σ^A) ) ,   (8.191)

where the supremum is over all classical systems X and POVM channels E ∈ CPTP(A → X) that take ρ and σ to diagonal density matrices (i.e. probability vectors).
Note that N is a linear map and its dual map N* : R ⊕ Herm(A) → Herm(A) is given by (see the exercise below)

N*(t ⊕ ω) = tρ − ω .   (8.194)

From (A.57) it then follows that the dual to the above SDP optimization problem is given by

2^{−D^ε_min(ρ∥σ)} = max{ Tr[(t ⊕ ω)H₂] : H₁ − N*(t ⊕ ω) ⩾ 0 , t ∈ R₊ , ω ∈ Pos(A) } .   (8.195)

The maximization above is over all t ∈ R₊ and ω ∈ Pos(A). For a fixed t, we want to minimize Tr[ω] such that ω ⩾ 0 and ω ⩾ tρ − σ. Under these constraints, it follows from Exercise 8.7.7 that the choice ω := (tρ − σ)₊ has the minimal trace. We therefore conclude that

D^ε_min(ρ∥σ) = − log max_{t∈R₊} f(t) ,   (8.197)
Remark. Given that D̃α represents the minimal quantum extension of the classical α-Rényi
relative entropy, it follows that D̃α (ρ∥σ) ⩽ Dα (ρ∥σ). Consequently, in the upper bound
of (8.201), we can substitute D̃α (ρ∥σ) with Dα (ρ∥σ).
Proof. Let Λ ∈ Eff(A) be such that 2^{−D^ε_min(ρ∥σ)} = Tr[Λσ] and Tr[Λρ] = 1 − ε. Set p := Tr[Λσ], and define the binary POVM channel E ∈ CPTP(A → X) via

By definition →  = (1/(α−1)) log( (1 − ε)^α p^{1−α} + ε^α (1 − p)^{1−α} )
Removing ε^α(1 − p)^{1−α} →  ⩾ (1/(α−1)) log( (1 − ε)^α p^{1−α} )   (8.204)
                              = (α/(α−1)) log(1 − ε) − log p
By definition of p →  = (α/(α−1)) log(1 − ε) + D^ε_min(ρ∥σ) .
This concludes the proof of (8.201).
To prove (8.202), let α ∈ (0, 1). We will use the expression for D^ε_min(ρ∥σ) as given in (8.197) and (8.198). To bound the expression Tr(tρ − σ)₊ in equation (8.198), we employ the quantum weighted geometric-mean inequality given by (B.68). This inequality asserts that for any pair of matrices M, N ∈ Pos(A) and any value of α within the range [0, 1]:

(1/2) Tr[ M + N − |M − N| ] ⩽ Tr[ M^α N^{1−α} ] .   (8.205)

Since the term |M − N| can be expressed as |M − N| = 2(M − N)₊ − (M − N), the above inequality is equivalent to
It is straightforward to check that for fixed α, ρ, σ, ε, the function t ↦ −tε + t^α 2^{(α−1)D_α(ρ∥σ)} attains its maximal value at

t = (α/ε)^{1/(1−α)} 2^{−D_α(ρ∥σ)} .   (8.209)

Substituting this value into the optimization in (8.208) gives

2^{−D^ε_min(ρ∥σ)} ⩽ (1 − α) (α/ε)^{α/(1−α)} 2^{−D_α(ρ∥σ)} .   (8.210)

By taking − log on both sides we get (8.202). This concludes the proof.
where D(ρ∥σ) := Tr[ρ log ρ] − Tr[ρ log σ] is known as the Umegaki relative entropy.
Remark. The quantum Stein's lemma indicates that the optimal type II error behaves approximately as ≈ 2^{−nD(ρ∥σ)} with respect to the number of copies, n, of ρ and σ. Specifically, the lemma offers an operational interpretation of the Umegaki divergence, D(ρ∥σ), as the maximal rate at which the type II error diminishes to zero exponentially with increasing n. Additionally, it's worth noting that the theorem above implies that the limit on the right-hand side of (8.211) exists and is independent of ε.
Proof. The proof follows from the bounds in Theorem 8.7.2. Specifically, from (8.201) we get for any ε ∈ (0, 1) and any α > 1

lim sup_{n→∞} (1/n) D^ε_min(ρ^{⊗n}∥σ^{⊗n}) ⩽ lim sup_{n→∞} (1/n)[ D̃_α(ρ^{⊗n}∥σ^{⊗n}) + (α/(α−1)) log(1/(1−ε)) ]   (8.212)
                                            = D̃_α(ρ∥σ) ,

where in the last equality we used the additivity (under tensor products) of D̃_α. Since the equation above holds for all α > 1 we conclude that

lim sup_{n→∞} (1/n) D^ε_min(ρ^{⊗n}∥σ^{⊗n}) ⩽ lim_{α→1⁺} D̃_α(ρ∥σ) = D(ρ∥σ) ,   (8.213)

where the equality above follows from the continuity in α of the function α ↦ D̃_α(ρ∥σ).
For the opposite inequality, we use the bound (8.202) to get for all α ∈ (0, 1)

lim inf_{n→∞} (1/n) D^ε_min(ρ^{⊗n}∥σ^{⊗n}) ⩾ lim inf_{n→∞} (1/n)[ D_α(ρ^{⊗n}∥σ^{⊗n}) + h(α)/(1−α) + (α/(1−α)) log ε ]   (8.214)
                                            = D_α(ρ∥σ) ,

where we used the additivity of D_α. Since D_α is continuous in α, and since the equation above holds for all α ∈ (0, 1), it must also hold for α = 1; that is,

lim inf_{n→∞} (1/n) D^ε_min(ρ^{⊗n}∥σ^{⊗n}) ⩾ D(ρ∥σ) .   (8.215)

Combining this with the inequality (8.213), we conclude that the limit

lim_{n→∞} (1/n) D^ε_min(ρ^{⊗n}∥σ^{⊗n})   (8.216)

exists and is equal to D(ρ∥σ).
Exercise 8.7.8 (The Umegaki Relative Entropy). Let D be the Umegaki relative entropy.

1. Show that D satisfies the DPI. Hint: Use (8.211) and the fact that D^ε_min satisfies the DPI.

2. Show by direct calculation that for any two cq-states ρ^{AX} := Σ_{x∈[n]} p_x ρ^A_x ⊗ |x⟩⟨x|^X and σ^{AX} := Σ_{x∈[n]} q_x σ^A_x ⊗ |x⟩⟨x|^X in D(AX) we have

D(ρ^{AX}∥σ^{AX}) = Σ_{x∈[n]} p_x D(ρ^A_x∥σ^A_x) + D(p∥q) ,   (8.217)

where the components of the probability vectors p and q are {p_x}_{x∈[n]} and {q_x}_{x∈[n]}, respectively.

3. Use the above two properties to show that for any two ensembles of states {p_x, ρ_x}_{x∈[n]} and {q_x, σ_x}_{x∈[n]} we have

D( Σ_{x∈[n]} p_x ρ_x ∥ Σ_{x∈[n]} q_x σ_x ) ⩽ Σ_{x∈[n]} p_x D(ρ_x∥σ_x) + D(p∥q) .   (8.218)
2^{−D^ε_min(ρ∥σ)} = sup_{t∈(0,1)} ( Pr_error(ρ, σ, t) − tε ) / (1 − t) .   (8.222)

Hint: Recall that Tr(ρ − rσ)₊ = sup_{Λ∈Eff(A)} Tr[Λ(ρ − rσ)] and split the supremum over all ε ∈ (0, 1) and all Λ ∈ Eff(A) such that Tr[Λρ] = 1 − ε.
As previously discussed, with increasing n copies of ρ and σ, the states ρ⊗n and σ ⊗n
become more distinguishable. We will demonstrate in the upcoming theorem that the error
probability, Prerror (ρ⊗n , σ ⊗n , t), diminishes at an exponential rate as n approaches infinity.
This rate is characterized by what is known as the quantum Chernoff bound. The classical
counterpart of the subsequent theorem, along with its proof, can be found in Section 8.6 (see
Theorem 8.6.2). We will use the notation ξQ (ρ, σ) to denote the quantum extension of the
classical Chernoff bound ξ(p, q) as given in (8.158). In the quantum domain it is defined as
Proof. In the proof of Theorem 8.7.2 we used (8.207) to bound Tr(tρ − σ)₊. Dividing both sides of (8.207) by t and denoting r := 1/t we get that (8.207) is equivalent to
Hence,

Pr_error(ρ^{⊗n}, σ^{⊗n}, t) ⩽ t₀^α t₁^{1−α} ( Tr[ρ^α σ^{1−α}] )^n ,   (8.228)

so that

lim inf_{n→∞} −(1/n) log Pr_error(ρ^{⊗n}, σ^{⊗n}, t) ⩾ − log Tr[ρ^α σ^{1−α}] .   (8.229)
Since the above equation holds for all 0 ⩽ α ⩽ 1 we have

lim_{n→∞} −(1/n) log Pr_error(ρ^{⊗n}, σ^{⊗n}, t) ⩾ max_{α∈[0,1]} ( − log Tr[ρ^α σ^{1−α}] )
                                                  = − log min_{α∈[0,1]} Tr[ρ^α σ^{1−α}] .   (8.230)
be the spectral decompositions of ρ and σ (here m := |A|), where ψ_x, ϕ_y ∈ Pure(A) for all x, y ∈ [m]. Then, for any projection Π ∈ Pos(A) (i.e. Π² = Π) we have

Tr[Πρ] = Σ_{x∈[m]} p_x ⟨ψ_x|Π|ψ_x⟩ = Σ_{x∈[m]} p_x ⟨ψ_x|Π²|ψ_x⟩
Σ_{y∈[m]} ϕ_y = I^A →  = Σ_{x∈[m]} p_x Σ_{y∈[m]} ⟨ψ_x|Π|ϕ_y⟩⟨ϕ_y|Π|ψ_x⟩   (8.232)
                         = Σ_{x,y∈[m]} p_x |⟨ψ_x|Π|ϕ_y⟩|²
Moreover, since any two complex numbers c₁ and c₂ satisfy |c₁|² + |c₂|² ⩾ ½|c₁ + c₂|², we get that

Pr_error(Π, ρ, σ, t₀) ⩾ (1/2) Σ_{x,y∈[m]} min{t₀p_x, t₁q_y} | ⟨ψ_x|I − Π|ϕ_y⟩ + ⟨ψ_x|Π|ϕ_y⟩ |²
                       = (1/2) Σ_{x,y∈[m]} min{t₀p_x, t₁q_y} |⟨ψ_x|ϕ_y⟩|²   (8.235)
                       = (1/2) Σ_{x,y∈[m]} min{t₀p_{xy}, t₁q_{xy}}
(8.154)→               = (1/2) Pr_error(p, q, t) ,
where p = (pxy ) ∈ Prob(m2 ) and q = (qxy ) ∈ Prob(m2 ) are probability vectors with
components
pxy := px |⟨ψx |ϕy ⟩|2 and qxy := qy |⟨ψx |ϕy ⟩|2 . (8.236)
Moreover, note that the relation (8.236) respects tensor products. That is, for ρ⊗n and σ ⊗n
the corresponding probability vectors are p⊗n and q⊗n , respectively. Hence,
lim inf_{n→∞} −(1/n) log Pr_error(ρ^⊗n, σ^⊗n, t) ⩽ lim inf_{n→∞} −(1/n) log ( ½ Pr_error(p^⊗n, q^⊗n, t) )
Theorem 8.6.2→ = max_{0⩽α⩽1} (1 − α)D_α(p∥q)   (8.237)
(6.103)→ = max_{0⩽α⩽1} (1 − α)D_α(ρ∥σ)
= ξ_Q(ρ, σ) .
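As a quick numerical illustration of the quantity appearing in (8.230) and (8.237), the following Python sketch (using numpy; logarithms taken base 2, the minimization done by a simple grid search, and the two qubit states chosen only for illustration) evaluates −log min_{0⩽α⩽1} Tr[ρ^α σ^{1−α}].

import numpy as np

def mat_power(A, t):
    # A^t for a positive semidefinite matrix A (eigenvalues clipped at zero)
    w, V = np.linalg.eigh(A)
    w = np.clip(w, 0.0, None)
    return (V * w**t) @ V.conj().T

def chernoff_quantity(rho, sigma, num=199):
    # xi_Q(rho, sigma) = -log_2 min_{0<=alpha<=1} Tr[rho^alpha sigma^(1-alpha)],
    # evaluated on an interior grid of alpha values (a sketch, not an exact optimizer)
    alphas = np.linspace(0.005, 0.995, num)
    traces = [np.real(np.trace(mat_power(rho, a) @ mat_power(sigma, 1 - a)))
              for a in alphas]
    return -np.log2(min(traces))

rho = np.array([[0.9, 0.0], [0.0, 0.1]])
sigma = np.array([[0.5, 0.45], [0.45, 0.5]])   # a non-commuting pair, chosen only for illustration
print(chernoff_quantity(rho, sigma))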
CHAPTER 9

Static Quantum Resource Theories
In this chapter, we present a precise definition of a quantum resource theory (QRT) and
explore its general characteristics. As mentioned in the introduction, any set of natural
constraints on a physical system results in a QRT. A prime example is the spatial separation
between two individuals, Alice and Bob, which naturally leads to the LOCC (Local Oper-
ations and Classical Communication) constraint, forming the basis of entanglement theory.
In this theory, every physical system is analyzed in the context of spatial separation. This
implies that any physical system, for instance, system A, is considered a bipartite composite
system, denoted as A = (AA , AB ). Here, AA represents a subsystem on Alice’s side, and
AB is a subsystem on Bob’s side. It’s important to note that even if A is not inherently
a composite system and is solely located on Alice’s side, it can still be regarded in this
framework with AA := A and AB being a trivial subsystem (i.e., |AB | = 1). For simplicity,
in entanglement theory, the notations A for AA and B for AB are often used. However,
in the context of general resource theories, it is crucial to remember that physical systems,
symbolized as A, B, C, etc., are interpreted in relation to the constraints applied to them.
and conversely. By adopting this identification, we can interpret all entities in quantum
mechanics — such as states, POVMs, quantum instruments, and others — as specific forms
of quantum channels. This integrative perspective aligns with the methodologies utilized in
resource theories. We will embrace this approach in our discussions throughout the book.
1. Doing nothing is free. For any physical system A, the identity channel
idA ∈ F(A → A).
3. Discarding a system is free. For any system A, the set F(A → 1) ̸= ∅; i.e.
F(A → 1) = CPTP(A → 1) = {Tr}.
Moreover, the set F(A → B) is called the set of free operations from system A to
system B, and the set F(A) := F(1 → A), is identified as the set of free states.
(i.e. E ∈ CPTP(A → B) but E ̸∈ F(A → B)). In particular, the second property above implies the following rule, known as the “golden” rule of QRTs.
We included in the definition above the property that the trace is a free operation. In all QRTs studied in the literature this is indeed the case, although one can consider a QRT in which “waste” or “trash” is treated as a resource. In this book we will not consider such resource theories, and will always take the trace to be a free operation. This assumption also leads to the following very useful property of QRTs.
Suppose σ ∈ F(B) is a free state, and define the replacement channel
NσA→B (ρA ) := Tr[ρA ]σ B ∀ ρ ∈ L(A) . (9.1)
Then, if we view σ B as a channel, σ 1→B , from the trivial system 1 to B, the channel NσA→B
can be expressed as a combination of the trace channel and the channel σ 1→B ; specifically,
NσA→B = σ 1→B ◦ Tr , (9.2)
and since both Tr and σ 1→B are free, it follows that NσA→B is free. Note that this means
that we can convert any state ρ ∈ D(A) to any free state σ ∈ F(B) by free operations (as
intuitively expected).
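To make the composition in (9.2) concrete, here is a minimal Python sketch (function names and the particular choice of σ are illustrative only) in which the replacement channel is literally built by composing the discarding channel with a preparation channel.

import numpy as np

def trace_channel(rho):
    # the discarding channel A -> 1: the output is the number Tr[rho]
    return np.trace(rho)

def preparation_channel(sigma):
    # the preparation channel 1 -> B: a scalar c is mapped to c * sigma
    return lambda c: c * sigma

def replacement_channel(sigma):
    # N_sigma = (preparation of sigma) composed with the trace, cf. (9.2)
    prepare = preparation_channel(sigma)
    return lambda rho: prepare(trace_channel(rho))

sigma = np.diag([0.5, 0.5])                  # an illustrative free state on B
rho = np.array([[0.7, 0.2], [0.2, 0.3]])     # an arbitrary input state on A
N = replacement_channel(sigma)
print(np.allclose(N(rho), sigma))            # True: every density matrix is mapped to sigma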
QRTs emerge from a specific set of limitations or constraints applied to the entire spec-
trum of quantum operations. The mapping F exemplifies this, as the set F(A → B) generally
forms a strict subset of all channels in CPTP(A → B). While every QRT is linked to a unique
set of restrictions, these restrictions frequently share common characteristics that contribute
to extra structural complexity. These characteristics are so prevalent that some researchers
have integrated them into the foundational definition of a QRT.
According to the fundamental principle of QRTs, if ρ ∈ F(A) is a free state, then the state
E A→BX (ρA ) must also be a free state in F(BX). This state, expressed as
E^{A→BX}(ρ^A) = Σ_{x∈[m]} p_x σ_x^B ⊗ |x⟩⟨x|^X ,   (9.4)

is a classical-quantum (cq) state, where for each x ∈ [m], p_x := Tr[E_x(ρ)] and σ_x^B := (1/p_x) E_x^{A→B}(ρ^A). If there existed an x ∈ [m] for which p_x ̸= 0 and σ_x ̸∈ F(B), then the quantum instrument E^{A→BX} would create a resource σ_x^B from the free state ρ ∈ F(A) with non-zero probability. To prevent such scenarios, in this book we always assume the axiom of free instruments.
Note that the axiom of free instruments (AFI) reduces to the golden rule of QRTs when
|X| = 1, thus serving as an extension of this rule to encompass quantum measurements.
Additionally, when |X| > 1, the golden rule of QRTs only ensures that E^{A→BX}(ρ^A) is a free cq-state. Without further assumptions like the AFI, we cannot infer that each E_x^{A→B}(ρ^A)/Tr[E_x(ρ)] is a free state. Since physical QRTs comply with the AFI (as do all QRTs studied in the literature), the rest of this book will proceed under the assumption that QRTs adhere to the AFI, without explicitly stating it each time. We will use the notation F_⩽(A → B) ⊂ CP_⩽(A → B) for the set of trace non-increasing CP maps that are part of free quantum instruments. Specifically, E ∈ F_⩽(A → B) if there exists a classical system X with dimension m ∈ N and maps E₁, . . . , E_m ∈ CP_⩽(A → B), with the properties that (1) E_x = E for some x ∈ [m], and (2) Σ_{x∈[m]} E_x ⊗ |x⟩⟨x| ∈ F(A → BX).
4. Completely free operations: For any three systems A, B, and C, and a channel
E ∈ F(A → B), it holds that E A→B ⊗ idC ∈ F(AC → BC).
Condition 5 above is very intuitive, as it just states that relabeling the subsystems of Aⁿ does not affect whether a channel is free (see Fig. 9.1).
Figure 9.1: Illustration of the fifth condition. Relabeling maintains the “freeness” of N .
Conditions 1-5 above have several additional implications. First, note that since the trace
is a free channel (property 3), the partial trace is also free (since id ⊗ Tr is free). Second,
note that if N ∈ F(A → B) and M ∈ F(A′ → B′) then N ⊗ M ∈ F(AA′ → BB′).
In particular, this means that if two states are free then their tensor product is also free.
Finally, appending a free state is also a free channel. Specifically, let σ ∈ F(B) and define N^{A→AB}(ρ^A) := ρ^A ⊗ σ^B for all ρ ∈ L(A). This channel can be viewed as a tensor product of two free channels, namely N^{A→AB} = id^A ⊗ σ^{1→B}, and therefore is free.
6. For any physical system A the set of free states F(A) is closed.
7. For any physical system A the set of free states F(A) is convex.
Property 6 states that if for a sequence of states {ρ_n}_{n∈N} ⊂ F(A) the limit ρ := lim_{n→∞} ρ_n exists, then that limit is in F(A) as well. Equivalently, if {ρ_n}_{n∈N} ⊂ F(A) and there exists ρ ∈ D(A) such that lim_{n→∞} T(ρ, ρ_n) = 0, where T is the trace distance (or any other distance measure), then ρ ∈ F(A). Note that if this property did not hold, there would exist a sequence of free states approaching a resource ρ. However, if T(ρ, ρ_n) is extremely small, say 10⁻¹⁰⁰, then for all practical purposes it is not possible to distinguish between ρ and ρ_n. Therefore, the assumption that F(A) is closed is very practical and consequently satisfied by all the QRTs studied in the literature so far.
Property 7 is not satisfied by all QRTs (e.g. non-Gaussianity in quantum optics), although many resource theories, like entanglement, do satisfy it, and it is quite common. Besides being a convenient mathematical property, we can develop some intuition for it. Consider a QRT in which an agent, say Alice, has access to an unbiased coin. She can flip the unbiased coin and prepare the state ρ ∈ F(A) if she gets heads and otherwise prepare the state σ ∈ F(A). Since ρ and σ are free, she can prepare them at no cost. Suppose now that Alice forgets which state she prepared. We will assume here that this “forgetting” is itself a free operation. Then, her description of the system is now ½ρ + ½σ. Therefore, we can assume that the convex combination ½ρ + ½σ := τ is also free, since Alice prepared it at no cost. Moreover, if τ is free, Alice can repeat the same process with ρ and τ to get that ¾ρ + ¼σ is also free. Repeating this process, Alice can prepare any combination (k/2ⁿ)ρ + (1 − k/2ⁿ)σ, with n ∈ N and k ∈ [2ⁿ]. Therefore, such convex combinations must also be free. Finally, since the set {k/2ⁿ} is dense in [0, 1], Property 6 implies that for any t ∈ [0, 1] the convex combination tρ + (1 − t)σ is free.
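The dyadic construction described above is easy to simulate. The short Python sketch below (purely illustrative; the target weight and states are arbitrary) prepares the weight k/2ⁿ by exactly the repeated fair-coin mixing procedure, feeding the binary digits of k least-significant first.

import numpy as np

def coin_flip_mixture(rho, sigma, k, n):
    # build (k/2^n) * rho + (1 - k/2^n) * sigma using only repeated 1/2-1/2 mixing steps
    bits = [(k >> j) & 1 for j in range(n)]      # binary digits of k, least significant first
    state = sigma.copy()                         # start with weight 0 on rho
    for b in bits:
        ingredient = rho if b else sigma
        state = 0.5 * state + 0.5 * ingredient   # one fair-coin mixing step
    return state

rho = np.diag([1.0, 0.0])
sigma = np.diag([0.0, 1.0])
approx = coin_flip_mixture(rho, sigma, k=5, n=3)            # target weight 5/8
print(np.allclose(approx, (5/8) * rho + (3/8) * sigma))     # True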
2. Suppose that Alice has access to a biased coin, with probability 0 < p < 1 to get a
head (and 1 − p to get a tail). Show that Alice can use the coin to prepare any convex
combination of free states.
The two properties of closedness and convexity can also be applied to quantum channels.
That is, we can require that for any two physical systems A and B the set F(A → B) is
both closed and convex. However, we postpone the discussion of them to the second volume
of this book where we study dynamical QRTs.
states. It’s interesting to note that these free states, represented as F(A) = F(1 → A),
can themselves be considered a unique kind of free operations, specifically as preparation
channels. This approach affords substantial flexibility in choosing a consistent set of free
operations for any given set of free states, even within QRTs that admit a tensor-product structure.
Consider, for example, the phenomenon of quantum coherence. Quantum coherence epit-
omizes a key aspect of quantum mechanics, illustrating the principle that particles, such as
electrons or photons, can simultaneously exist in multiple states. This phenomenon stems
from the principle of superposition, enabling particles to exist in a mixture of states, or
in coherent superposition, thus allowing them to interfere with one another in predictable
manners. However, coherence is a fragile state, easily disturbed by external influences in a
process known as decoherence, where quantum systems relinquish their superposition and
adopt more classical behaviors. In recent developments, the capability to control and pre-
serve quantum coherence has become crucial for the advancement of cutting-edge quantum
technologies, including quantum computing and quantum cryptography, empowering the
execution of tasks that surpass the capabilities of classical physics.
Considering the significance of this pivotal phenomenon, extensive efforts have been
dedicated to characterizing it within the realm of quantum resource theories. How is this
achieved? We start by identifying the set of free states in D(A). This is accomplished as fol-
lows: for any system A, a classical basis of the system is identified, denoted as {|x⟩}x∈[m] ⊂ A.
Subsequently, the set of free states, or incoherent states, is defined as all diagonal density
matrices in D(A) with respect to the classical basis. Thus, in the QRT of coherence, the set
of free states is clearly defined and is specified for any system A with dimension m := |A| as:

F(A) = { Σ_{x∈[m]} p_x |x⟩⟨x| : p ∈ Prob(m) } .   (9.7)
Consequently, the primary challenge in the resource theory of quantum coherence lies in
identifying a set of free operations that aligns consistently with the above set of free states.
Exercise 9.2.1. Let F(A) be the set of free states defined in (9.7), and let ∆ ∈ CPTP(A →
A) be the completely dephasing map defined with respect to the classical basis. Show that for
all ρ ∈ D(A) we have that ρ ∈ F(A) if and only if ∆(ρ) = ρ.
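For the QRT of coherence the membership test of Exercise 9.2.1 is a one-liner. The following Python sketch (the classical basis is taken to be the standard basis, and the example states are chosen only for illustration) checks whether a state is free by comparing it with its completely dephased version.

import numpy as np

def dephase(rho):
    # completely dephasing map with respect to the standard (classical) basis
    return np.diag(np.diag(rho))

def is_incoherent(rho, tol=1e-12):
    # a state is free (incoherent) if and only if Delta(rho) = rho, cf. Exercise 9.2.1
    return np.allclose(dephase(rho), rho, atol=tol)

p = np.array([0.2, 0.3, 0.5])
free_state = np.diag(p)                          # a state of the form (9.7)
max_coherent = np.full((3, 3), 1/3)              # the maximally coherent qutrit state
print(is_incoherent(free_state), is_incoherent(max_coherent))    # True False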
Physical factors often play a pivotal role in determining the choice of free operations
within the realm of quantum mechanics. Nonetheless, even when these free operations are
well-defined and grounded in physical principles, it is advantageous to investigate other
classes of free operations that correspond to the same set of free states. This exploration
can provide valuable insights and potentially reveal alternative mathematical or theoretical
frameworks, as alternate classes might offer simpler or more elegant solutions that are not
immediately apparent in operations primarily motivated by physical factors. A pertinent
example is found in entanglement theory, where characterizing the class of LOCC is notably
complex. To circumvent these complexities, considerable research has focused on entangle-
ment theory within broader and more mathematically accessible sets of operations, such as
RNG Operations
Definition 9.2.1. Let F(A) ⊂ D(A) be the set of free states on any physical system
A. The set of resource non-generating operations (RNG) between two physical
systems A and B is defined as:
RNG(A → B) := { N ∈ CPTP(A → B) : N(ρ) ∈ F(B) ∀ ρ ∈ F(A) } .   (9.8)
RNG operations form the maximal set of free operations. That is, every other QRT F with the same set of free states satisfies F(A → B) ⊆ RNG(A → B) for all systems A and B.
In the QRT of coherence this set of RNG operations is denoted by MIO(A → B), where the acronym MIO stands for maximally incoherent operations. Denoting by ∆^A ∈ CPTP(A → A) and ∆^B ∈ CPTP(B → B) the completely dephasing channels with respect to the classical bases of A and B, respectively, we get from the definition above, in conjunction with Exercise 9.2.1, that

MIO(A → B) = { N ∈ CPTP(A → B) : ∆^B ∘ N^{A→B} ∘ ∆^A = N^{A→B} ∘ ∆^A } .   (9.10)
Exercise 9.2.3. Consider the QRT of coherence where F(A) and F(B) are the sets of diagonal density matrices with respect to some fixed bases {|x⟩^A}_{x∈[m]} and {|y⟩^B}_{y∈[n]} of A and B, respectively. Show that a quantum channel N ∈ CPTP(A → B) belongs to MIO(A → B) if and only if there exists a conditional probability distribution {p_{y|x}} such that for all x ∈ [m]

N^{A→B}(|x⟩⟨x|^A) = Σ_{y∈[n]} p_{y|x} |y⟩⟨y|^B .   (9.11)
nature. This leads us to a new definition that not only adheres to the golden rule of Quantum
Resource Theories (QRTs) but also integrates the tensor product structure.
The definition above generalizes the concepts of k-positivity and complete-positivity (see
Definition 3.4.1) to QRTs. Specifically, if we take the free set F(A) = D(A) to be the set
of all density matrices acting on A, then maps that are k-RNG and completely-RNG are
equivalent to maps that are k-positive and completely positive, respectively. Moreover, the
set of k-RNG maps with k = 1 is simply the set of RNG maps.
As an example, let F(AB) := SEP(AB) ⊂ D(AB) be the set of all separable states, and
let N ∈ CRNG(AB → A′ B ′ ) be a (bipartite) quantum channel that takes separable states
to separable states even when acting on subsystems. Specifically, for any composite reference
system R = R_A R_B the channel id^R ⊗ N^{AB→A′B′} is non-entangling (i.e. RNG). Recall from the discussion at the beginning of this chapter that every system R in entanglement theory is viewed as a bipartite system R_A R_B with R_A on Alice's side and R_B on Bob's side. Taking R_A ≅ A and R_B ≅ B we get that the state Φ^{R_A A} ⊗ Φ^{R_B B} is a product state between Alice's composite system R_A A and Bob's composite system R_B B (see Fig. 9.2a). Therefore, since product states are in particular separable, and since id^R ⊗ N^{AB→A′B′} is non-entangling, we get that

N^{AB→A′B′}(Φ^{R_A A} ⊗ Φ^{R_B B}) ∈ SEP(R_A A′ R_B B′) .   (9.12)
Figure 9.2: The state ΦRA A ⊗ ΦRB B in the lens of two bipartite cuts.
A key observation in this example is that the state Φ^{R_A A} ⊗ Φ^{R_B B} can be viewed as a maximally entangled state between system R = R_A R_B ≅ AB and system AB (see Fig. 9.2b); i.e.

Φ^{(R_A R_B)(AB)} = Φ^{R_A A} ⊗ Φ^{R_B B} .   (9.13)
Therefore, the state in (9.12) is proportional to the Choi matrix of N. Since R_A ≅ A and R_B ≅ B we conclude that the Choi matrix

J_N^{ABA′B′} := N^{ÃB̃→A′B′}(Ω^{AÃ} ⊗ Ω^{BB̃})   (9.14)
is an unnormalized separable state between Alice’s system AA′ and Bob’s system BB ′ . From
Exercise 3.2.4 it follows that the Choi matrix can be expressed as
J_N^{ABA′B′} = Σ_{j∈[k]} ψ_j^{AA′} ⊗ ϕ_j^{BB′} ,   (9.15)

where the sets {ψ_j^{AA′}}_{j∈[k]} and {ϕ_j^{BB′}}_{j∈[k]} consist of (possibly unnormalized) pure states (i.e. rank-one operators) in Pos(AA′) and Pos(BB′), respectively. For each j ∈ [k], we can write

|ψ_j^{AA′}⟩ = (I^A ⊗ M_j)|Ω^{AÃ}⟩ and |ϕ_j^{BB′}⟩ = (I^B ⊗ N_j)|Ω^{BB̃}⟩ ,   (9.16)

for some complex matrices M_j ∈ L(A, A′) and N_j ∈ L(B, B′). Using this notation in (9.15) and comparing it with (9.14) we conclude that the channel N has the following operator-sum representation:

N^{AB→A′B′}(ρ^{AB}) = Σ_{j∈[k]} (M_j ⊗ N_j) ρ^{AB} (M_j ⊗ N_j)^* ∀ ρ ∈ L(AB) .   (9.17)
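The operator-sum form (9.17) makes it transparent why such channels cannot create entanglement from product inputs: every Kraus term is itself a product operator. The Python sketch below (random illustrative Kraus operators, rescaled only so that the map is trace non-increasing) verifies this term-by-term structure numerically.

import numpy as np

rng = np.random.default_rng(0)
dA = dB = 2

def random_product_kraus(k):
    # random product Kraus operators M_j (x) N_j, rescaled so the map is trace non-increasing
    pairs = [(rng.normal(size=(dA, dA)), rng.normal(size=(dB, dB))) for _ in range(k)]
    S = sum(np.kron(M, N).conj().T @ np.kron(M, N) for M, N in pairs)
    c = np.linalg.eigvalsh(S).max()
    return [(M / c**0.25, N / c**0.25) for M, N in pairs]

def channel(rho, pairs):
    # the map of Eq. (9.17)
    return sum(np.kron(M, N) @ rho @ np.kron(M, N).conj().T for M, N in pairs)

pairs = random_product_kraus(3)
psi = np.diag([1.0, 0.0]); phi = np.diag([0.3, 0.7])          # a product (separable) input
out = channel(np.kron(psi, phi), pairs)
# term by term, the output is a sum of product positive operators, hence separable:
sep = sum(np.kron(M @ psi @ M.conj().T, N @ phi @ N.conj().T) for M, N in pairs)
print(np.allclose(out, sep))                                   # True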
Exercise 9.2.4. Let RNG(AB → A′ B ′ ) be the set of non-entangling operations (i.e. RNG
with respect to the set F(AB) = SEP(AB)).
2. Show that
We saw above that for F(AB) = SEP(AB) we have CRNG ⊂ RNG where the inclusion
is strict since the global swap operator is non-entangling but also not a separable channel.
Moreover, we will see in Chapter 12 that also the inclusion RNG ⊇ 2-RNG in the exercise
above can be strict. However, in other resource theories some of these inclusions can be
equalities. For example, in the QRT of coherence, in which F(A) ⊂ D(A) consists of diagonal
states with respect to a fixed basis {|x⟩A }x∈[m] , we have that RNG = CRNG. To see this,
let N ∈ MIO(A → B) (recall that in the QRT of coherence we denote all RNG operations
from system A to B by MIO(A → B)). According to (9.10), ∆B ◦ N ◦ ∆A = N ◦ ∆A . We
need to show that for any system C, we have N ⊗ idC ∈ MIO(AC → BC). Let ∆C be the
completely dephasing channel with respect to the classical basis of system C. In the exercise
below you show that ∆AC = ∆A ⊗ ∆C . Therefore,
Exercise 9.2.5. Let {|x⟩^A}_{x∈[m]} and {|y⟩^B}_{y∈[n]} be, respectively, two orthonormal bases of
A and B. Further, let ∆AB be the completely dephasing channel with respect to the basis
{|xy⟩AB }x,y . Show that ∆AB = ∆A ⊗ ∆B , where ∆A and ∆B are the completely dephasing
channels with respect to the bases {|x⟩A }x∈[m] and {|y⟩B }y∈[n] , respectively.
Now, from Exercise 9.2.3 we know that a channel V ∈ CPTP(A → A) is MIO if and only if
V(|x⟩⟨x|A ) is a diagonal state in D(A) for all x ∈ [m] (here m := |A|). Therefore, if V is a
where π is some permutation on m elements. This relation implies that the unitary matrix V
satisfies V |x⟩A = eiθx |π(x)⟩A . In other words, up to phases, all the free unitary operations in
the QRT of coherence are permutations. Given that permutation matrices form an extremely
small set of operations relative to the set of all unitary channels, it is not too hard to show (see
the relevant references at the end of this chapter) that there exist channels in MIO(A → A)
that do not have the form (9.20) with free (i.e. incoherent) U AE and free (i.e. diagonal)
γ E . In other words, it costs coherence (i.e. resources) to implement some free channels in
MIO(A → A).
The above problem does not occur in the QRT of entanglement in which the set of
free operations is LOCC. This is always the case whenever a QRT is defined in terms of a
physical restriction (e.g. distant labs in entanglement theory) that is imposed on the set of
free operations. On the other hand, any QRT such as quantum coherence, in which first the
free states are identified, and only then consistent free operations are proposed, may face
such an implementation problem. Aside from the QRT of coherence, all the QRTs studied in this book will have a physically implementable set of free operations.
Definition 9.2.3. Let F be a QRT, and A and B two physical systems. We say that
F(A → B) is physically implementable if any channel in F(A → B) can be generated
by a sequence of unitary channels (possibly on composite systems), projective
measurements, appending of free states, and processing of the classical outcomes,
where each element in the sequence is itself a free action (see Fig. 9.3 for an
illustration).
Remark. In the definition above we added classical processing as a possible free physically implementable operation. This includes, for example, classical communication between subsystems, if it is allowed in the QRT (e.g. entanglement theory). Note also that if
the free operations in a QRT are not physically implementable (according to the definition
above), then the QRT would identify certain maps as being free with no way to physically
implement these processes using free operations.
For a given designation of free states F(A), it is possible to construct a unique physically
implementable QRT that admits a tensor-product structure. Simply define the free opera-
tions to be any composition of (i) appending arbitrary free states, (ii) CRNG unitaries and
projective measurements, (iii) discarding subsystems, and (iv) all free classical-processing
maps. For any two systems A and B we denote this set of physically implementable operations (PIO) by PIO(A → B). By design, PIO(A → B) is physically implementable and has a tensor-product structure. Most QRTs that have been studied in the literature have the property
that all the isometries in RNG are completely free. In such QRTs, PIO is the minimal set
of free operations that is consistent with the set of free states F(A). The class PIO(A → B)
Figure 9.3: Example of a physically implementable free operation on a composite system AB.
Exercise 9.2.6. Consider the resource theory of quantum entanglement. Show that if E ∈
SEP(AB → A′ B ′ ) then
E*(σ^{A′B′}) / Tr[E*(σ^{A′B′})] ∈ SEP(AB) ∀ σ ∈ SEP(A′B′) .   (9.24)
The need for normalization in (9.23) can be eliminated by broadening the definition of
RNG operations to include cone-preserving operations. Specifically, for each system A, let
us define K(A) ⊆ Pos(A) as the cone

K(A) := { tσ : σ ∈ F(A) , t ∈ R₊ } .   (9.25)
We then classify a map E ∈ CP(A → B) as a K-preserving operation if, for every η ∈ K(A),
it holds that E(η) ∈ K(B). By extending the scope of RNG operations to maps that are
not necessarily trace-preserving, the condition in (9.23) essentially signifies that E ∗ is a K-
preserving operation.
Definition 9.2.4. Using the same notations as above, we say that a quantum
channel E ∈ CPTP(A → B) is dually resource non-generating if both E and its dual
map E ∗ are K-preserving operations.
Tr[σ^B E(ρ^A)] = Tr[σ^B (E ∘ ∆^A)(ρ^A)] .   (9.28)

Since σ ∈ F(B) if and only if σ = ∆^B(τ) for some τ ∈ D(B), we can rewrite the left-hand side of the equation as:

Tr[σ^B E(ρ^A)] = Tr[τ^B (∆^B ∘ E)(ρ^A)] ,   (9.29)

and the right-hand side as:

Tr[σ^B (E ∘ ∆^A)(ρ^A)] = Tr[τ^B (∆^B ∘ E ∘ ∆^A)(ρ^A)]   (9.30)
E ∈ MIO(A → B)→ = Tr[τ^B (E ∘ ∆^A)(ρ^A)] .

Combining these equations, it is concluded that for all ρ ∈ D(A) and τ ∈ D(B), the following holds:

Tr[τ^B (E ∘ ∆^A)(ρ^A)] = Tr[τ^B (∆^B ∘ E)(ρ^A)] .   (9.31)

Hence, the condition becomes:

∆^B ∘ E^{A→B} = E^{A→B} ∘ ∆^A .   (9.32)
This means that if E ∈ dRNG(A → B), it must satisfy the above condition. The exercise
below demonstrates that the converse is also true, leading to the conclusion that a quantum
channel E ∈ CPTP(A → B) is in dRNG(A → B) if and only if it satisfies the condition
in Equation (9.32). Notably, for the case where A = B, this condition simplifies to the
commutation relation [E, ∆] = 0.
We note that in the QRT of coherence, quantum channels that satisfy Equation (9.32)
are identified as Dephasing-covariant Incoherent Operations, abbreviated as DIO. Therefore,
we have demonstrated that within the resource theory of coherence, the set dRNG(A → B),
is equivalent to the set of DIO channels, denoted as DIO(A → B).
Exercise 9.2.7. Show that DIO(A → B) ⊆ dRNG(A → B); i.e., let E ∈ CPTP(A → B) be
a quantum channel satisfying (9.32), and show that E ∈ dRNG(A → B).
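The commutation condition (9.32) is straightforward to test numerically. In the Python sketch below (qubit examples chosen only for illustration) a diagonal-phase unitary passes the test while the Hadamard gate, which creates coherence, fails it.

import numpy as np

def dephase(rho):
    return np.diag(np.diag(rho))

def is_dephasing_covariant(kraus, d, tol=1e-10):
    # check Delta o E = E o Delta, Eq. (9.32), on the matrix units (enough by linearity)
    def E(rho):
        return sum(K @ rho @ K.conj().T for K in kraus)
    for i in range(d):
        for j in range(d):
            unit = np.zeros((d, d), dtype=complex); unit[i, j] = 1.0
            if not np.allclose(dephase(E(unit)), E(dephase(unit)), atol=tol):
                return False
    return True

Z = np.diag([1.0, -1.0])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
print(is_dephasing_covariant([Z], 2))   # True: a diagonal (phase) unitary is DIO
print(is_dephasing_covariant([H], 2))   # False: the Hadamard gate creates coherence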
An Affine Set
Definition 9.3.1. Let A be a physical system, and F(A) ⊆ D(A) be a set of density matrices. The set F(A) is called an affine set if every affine combination of n free states,

σ := Σ_{x∈[n]} t_x ρ_x ,  ρ₁, . . . , ρ_n ∈ F(A) ,  t₁, . . . , t_n ∈ R ,  Σ_{x∈[n]} t_x = 1 ,   (9.33)

that is also a density matrix (i.e. σ ∈ D(A)) belongs to F(A).
As an example of an affine set, let F be the QRT of coherence in which F(A) is the set of all diagonal states in D(A) with respect to a fixed basis of A. The set of diagonal states F(A) is affine since any affine combination of diagonal states is diagonal; that is, if Σ_{x∈[n]} t_x ρ_x ⩾ 0, where each state ρ_x ∈ F(A) (i.e. each ρ_x is diagonal) and Σ_{x∈[n]} t_x = 1, then Σ_{x∈[n]} t_x ρ_x is also a diagonal state in D(A). Therefore, the set of free states in the QRT of coherence is affine.
Exercise 9.3.1. Let F(A) be the set of all density matrices in D(A) with real components
with respect to a fixed basis {|x⟩}x∈[m] of A. That is, ρ ∈ F(A) ⊂ D(A) if and only if the
number ⟨x|ρ|x′ ⟩ is real for all x, x′ ∈ [m]. Show that F(A) is affine.
Exercise 9.3.2. Let F(A) ⊆ D(A) be a set of density matrices, and let K(A) := span_R{F(A)} be the subspace of Herm(A) consisting of all linear combinations of the elements of F(A). Show that F(A) is affine if and only if F(A) = K(A) ∩ D(A).
Not all convex sets are affine. For example, the set of separable states in entanglement
theory is not affine. To see why, recall that product states of the form ψ A ⊗ϕB are free states.
Let m := |A|, n := |B|, let {ψ_x^A}_{x∈[m²]} be a rank-one basis of Herm(A), and let {ϕ_y^B}_{y∈[n²]} be a rank-one basis of Herm(B). Then, the m²n² states {ψ_x^A ⊗ ϕ_y^B}_{x,y} form a basis of Herm(AB). This means that any density matrix ρ ∈ D(AB) can be expressed as an affine combination of product states; i.e.

ρ^{AB} = Σ_{x∈[m²]} Σ_{y∈[n²]} t_{xy} ψ_x^A ⊗ ϕ_y^B ,   (9.35)

where {t_{xy}} is a set of real numbers. Hence, the set of separable states is not affine, since even entangled states can be expressed as affine combinations of product states. In this sense, the set of separable states is maximally non-affine. More generally, we say that a set F(A) is maximally non-affine if

Herm(A) = span_R{F(A)} .   (9.36)

Exercise 9.3.3. Show that the set {t_{xy}}_{x,y} in (9.35) must satisfy Σ_{x,y} t_{xy} = 1.
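For two qubits the claim that product states span all of Herm(AB) can be verified directly. The Python sketch below (a standard choice of four rank-one projectors per qubit, not taken from the text) checks that the sixteen product projectors have full rank as vectors in the real vector space Herm(AB).

import numpy as np

def rank_one_basis():
    # four pure-state projectors that span Herm(C^2) over the reals
    vecs = [np.array([1, 0]), np.array([0, 1]),
            np.array([1, 1]) / np.sqrt(2), np.array([1, 1j]) / np.sqrt(2)]
    return [np.outer(v, v.conj()) for v in vecs]

prods = [np.kron(P, Q) for P in rank_one_basis() for Q in rank_one_basis()]
# Herm(AB) for two qubits is a 16-dimensional real vector space; stack real and
# imaginary parts so real-linear (in)dependence becomes ordinary matrix rank.
M = np.array([np.concatenate([P.real.ravel(), P.imag.ravel()]) for P in prods])
print(np.linalg.matrix_rank(M))   # 16: product states span Herm(AB), cf. (9.35)-(9.36)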
such that N ∈ CPTP(A → B), the channel N is free (i.e. N ∈ F(A → B)).
We will see in the next two chapters that ARTs have several properties that make them
much easier to study. Particularly, many problems in ARTs can be solved with semidefinite
programming, unlike certain convex QRTs, such as entanglement theory, in which even determining whether a given state is free is very hard (more precisely, the problem is NP-hard). In the following theorem we show that if the set of
free operations is RNG or CRNG then the QRT is affine if and only if the set of free states
is affine.
Theorem 9.3.1. Let F(A) ⊆ D(A) be a set of free states on any physical system A,
and for any two physical systems A and B, let RNG(A → B) and CRNG(A → B) be
as defined in Definitions 9.2.1 and 9.2.2 (with respect to the sets F(A) and F(B)).
Then, the following statements are equivalent:
We need to show that N ∈ RNG(A → B). Let σ ∈ F(A) be a free state. Since N is a
quantum channel N A→B (σ A ) ∈ D(B). On the other hand, from the definition of N , we can
write this relation as
N^{A→B}(σ^A) = Σ_{x∈[n]} t_x ω_x^B ∈ D(B) , where ω_x^B := E_x^{A→B}(σ^A) .   (9.39)

Since σ ∈ F(A) and each E_x ∈ RNG(A → B), it follows that each ω_x ∈ F(B). Finally, using the assumption that F(B) is affine we get that N^{A→B}(σ^A) ∈ F(B). Since σ was an arbitrary state in F(A) we conclude that N ∈ RNG(A → B).
The implication 2 ⇒ 3: Let E1 , . . . , En ∈ CRNG(A → B) be a set of n CRNG channels,
and let t1 , . . . , tn ∈ R be such that (9.38) holds. We need to show that N ∈ CRNG(A → B)
or equivalently, that for any reference system R, idR ⊗ N ∈ RNG(RA → RB). Since each
idR ⊗ Ex ∈ RNG(RA → RB) and since we assume that RNG(RA → RB) is affine it follows
that

id^R ⊗ N = Σ_{x∈[n]} t_x (id^R ⊗ E_x^{A→B}) ∈ RNG(RA → RB) .   (9.40)
state, essentially ”destroying” the resource. At the same time, it functions as the identity
channel when applied to free states. This dual capability highlights its unique role in these
particular QRTs. We point out that such maps do not necessarily have to be channels, and
they might not even be linear. However, in the context of this book, where our exploration
is confined to theories within the realm of quantum mechanics, we will consistently assume
that these resource-destroying maps are at least linear.
Remark. It is important to note that the definition of a RDM is relative to the set of free
states F(A). Consequently, different QRTs with an identical set of free states F(A) may
share the same RDM. Conversely, there could exist multiple RDMs corresponding to the
same set F(A). Furthermore, it is evident that ∆ ∈ Pos(A → A). This is because for any
positive operator Λ ∈ Pos(A), the transformed state ∆(Λ/Tr[Λ]) is a free state and therefore
positive semidefinite. However, it is crucial to understand that a RDM is not necessarily
completely positive.
As an example of a RDM, consider the QRT of coherence in which F(A) ⊂ D(A) is the
set of diagonal states with respect to the basis {|x⟩}x∈[m] (here m := |A|). Relative to this
basis, define the completely dephasing channel
∆(ρ) := Σ_{x∈[m]} ⟨x|ρ|x⟩ |x⟩⟨x| .   (9.41)
It is simple to check that ∆ as defined above is a RDM with respect to the set of diagonal
matrices. In this example ∆ is a quantum channel.
Exercise 9.3.5. Show that ∆ as defined in (9.41) is a RDM. Moreover, show that it is
self-adjoint; i.e. ∆∗ = ∆.
Exercise 9.3.6. Show that any RDM ∆ ∈ L(A → A) is idempotent; i.e. ∆ ◦ ∆ = ∆.
In the following lemma we show that not all QRTs have a RDM.
Lemma 9.3.1. Let F be a QRT and let A be a quantum system. If F(A) is not
affine then the QRT F does not have a RDM.
Proof. Suppose by contradiction that F(A) is not affine and yet there exists a RDM ∆ ∈ L(A → A). Then, by definition, there exist t₁, . . . , t_n ∈ R with Σ_{x∈[n]} t_x = 1 and states σ₁, . . . , σ_n ∈ F(A) such that

σ := Σ_{x∈[n]} t_x σ_x ∈ D(A) but σ ̸∈ F(A) .   (9.42)
Moreover, since ∆ is a RDM we must have ∆(σ) ∈ F(A). On the other hand,

∆(σ) = Σ_{x∈[n]} t_x ∆(σ_x) = Σ_{x∈[n]} t_x σ_x = σ ̸∈ F(A) ,   (9.43)

in contradiction with the fact that ∆(σ) ∈ F(A). Therefore, if a QRT admits a RDM then its set of free states must be affine.
Self-Adjoint RDM
Definition 9.3.4. Let F(A), K(A), and K(A)⊥ be as defined above. The linear map
∆ : Herm(A) → Herm(A)
Remark. Note that in the context above, the map ∆ is referred to as the self-adjoint RDM.
This designation is justified by demonstrating that for every ART, there exists a unique
self-adjoint RDM (see theorem below). This uniqueness within each ART underscores the
specific role that such a map plays in the theory.
Exercise 9.3.7. Verify that ∆ in the definition above is indeed a RDM which is self-adjoint
(with respect to the Hilbert-Schmidt inner product).
Theorem 9.3.2. Let F(A) ⊆ D(A) be an affine set. Then, there exists a unique
self-adjoint RDM associated with F(A) which is given by ∆ as defined in (9.44).
Proof. The existence of ∆ follows from Definition 9.3.4 and Exercise 9.3.7. To prove unique-
ness, let ∆ : Herm(A) → Herm(A) be a self-adjoint RDM. We would like to show that it
coincides with the RDM given in Definition 9.3.4. Indeed, from the linearity of ∆, and the
fact that ∆ is a RDM, we must have ∆(η) = η for all η ∈ K(A). Moreover, since ∆ is
self-adjoint for every ζ ∈ K(A)⊥ and η ∈ K(A) we have
0 = Tr[ηζ] = Tr[∆(η)ζ] = Tr[η∆(ζ)] . (9.45)
Since the above equation holds for all η ∈ K(A) we conclude that ∆(ζ) ∈ K(A)⊥ . However,
since ∆ is a RDM we must have ∆(ζ) ∈ K(A). Both conditions hold only if ∆(ζ) = 0. To
summarize, we get that for all η ∈ K(A) and all ζ ∈ K(A)^⊥ we have

∆(η + ζ) = ∆(η) + ∆(ζ)
∆(ζ) = 0 → = ∆(η)   (9.46)
= η .
Hence, ∆ coincides with the map defined in (9.44). This completes the proof.
Exercise 9.3.8. Let F(A) ⊆ D(A) be an affine set, and let {η_x}_{x∈[m]} be an orthonormal basis of span_R{F(A)} (with respect to the Hilbert-Schmidt inner product). Show that the linear map

∆(ω) := Σ_{x∈[m]} Tr[η_x ω] η_x ∀ ω ∈ L(A) ,   (9.47)

coincides with the self-adjoint RDM defined in (9.44).
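Equation (9.47) says that the self-adjoint RDM is simply the orthogonal (Hilbert-Schmidt) projection onto span_R{F(A)}. The Python sketch below (coherence on a qutrit, chosen because the projection then reduces to complete dephasing; states are illustrative) builds the map from an orthonormal basis and checks two of its defining properties.

import numpy as np

def self_adjoint_rdm(free_basis):
    # the map of Eq. (9.47): Delta(w) = sum_x Tr[eta_x w] eta_x, with {eta_x} an
    # orthonormal (Hilbert-Schmidt) basis of span_R{F(A)}
    def Delta(w):
        return sum(np.trace(eta.conj().T @ w) * eta for eta in free_basis)
    return Delta

d = 3
etas = []
for x in range(d):                       # {|x><x|} is an orthonormal basis of the free span
    eta = np.zeros((d, d)); eta[x, x] = 1.0
    etas.append(eta)
Delta = self_adjoint_rdm(etas)

rho = np.array([[0.5, 0.2, 0.1], [0.2, 0.3, 0.0], [0.1, 0.0, 0.2]])
print(np.allclose(Delta(rho), np.diag(np.diag(rho))))    # reduces to complete dephasing
print(np.allclose(Delta(Delta(rho)), Delta(rho)))         # idempotent, cf. Exercise 9.3.6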
Resource Witness
Definition 9.4.1. Let F be a QRT and let A be a physical system. An operator W ∈ Herm(A) is called a resource witness if the following two conditions hold:

1. Tr[Wσ] ⩾ 0 for all σ ∈ F(A).

2. There exists ρ ∈ D(A) such that

Tr[Wρ] < 0 .   (9.50)
Since Pos(A) ⊆ F(A)∗ we conclude that the set of all resource-witnesses can be viewed as
the non-positive semidefinite matrices in F(A)∗ . If F(A) is closed and convex then the set of
all witnesses completely determines the set of free states.
Theorem 9.4.1. Let A be a physical system, F(A) ⊆ D(A) be a closed and convex
subset of density matrices, and σ ∈ D(A). Then, σ ∈ F(A) if and only if

Tr[Wσ] ⩾ 0   (9.52)

for all W ∈ F(A)∗.
Proof. This theorem follows from the property that any closed and convex set K ⊂ Herm(A)
satisfies K∗∗ = K (see Theorem A.8.1). Hence, in particular, F(A)∗∗ = F(A). The latter
Note that the inequality above holds trivially for all W ⩾ 0. Therefore, it is sufficient to
check it for all 0 ̸⩽ W ∈ F(A)∗ ; i.e. for all resource witnesses.
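As a concrete numerical illustration of Definition 9.4.1 (a standard two-qubit example, not taken from the text), the operator W = ½I − Φ⁺ is non-negative on product states, and hence on all separable states, while it is negative on the maximally entangled state it detects. The Python sketch below samples random product states to exhibit both properties.

import numpy as np

phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
Phi = np.outer(phi, phi)
W = 0.5 * np.eye(4) - Phi                        # a resource (entanglement) witness

rng = np.random.default_rng(1)
def random_product_state():
    a = rng.normal(size=2) + 1j * rng.normal(size=2); a /= np.linalg.norm(a)
    b = rng.normal(size=2) + 1j * rng.normal(size=2); b /= np.linalg.norm(b)
    return np.kron(np.outer(a, a.conj()), np.outer(b, b.conj()))

vals = [np.real(np.trace(W @ random_product_state())) for _ in range(1000)]
print(min(vals) >= -1e-12)                       # Tr[W sigma] >= 0 on sampled product states
print(np.real(np.trace(W @ Phi)))                # -0.5 < 0: W detects the Bell state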
We point out that for affine QRTs, determining whether a quantum state is free is a relatively easy task. Since any affine set F(A) has a self-adjoint resource destroying map
(see Definition 9.3.4), to determine if a state ρ ∈ D(A) is free or not, all we have to do is to
check if ∆(ρ) = ρ. Such a simplification does not occur in certain important convex QRTs
(e.g. entanglement theory).
One of the most useful aspects of a QRT is that it generates precise and operationally
meaningful ways to quantify a given physical resource. Here we study a variety of resource
measures that can be introduced in any QRT. We start with the definition of a resource
measure, and discuss some additional desirable properties that any resource measure should
satisfy. After that, we study different families of specific resource measures applicable to
any QRT. We put emphasis on the Umegaki relative entropy of a resource, as it turns out
that this resource measure has several operational interpretations, and it plays a major role
throughout this book.
where the union is over all Hilbert spaces A. This union also includes the trivial system
A = C (i.e. |A| = 1) in which case D(A) = {1} consists of only one element, namely the
number 1.
2. Normalization: M(1) = 0.
The first property is fundamental in resource theories. It asserts that the value of any
resource measure cannot be increased through the use of free operations. This principle,
known as monotonicity, is consistent with the “golden rule” of QRTs that free operations
cannot generate resources. The normalization condition, in conjunction with monotonicity,
leads to the positivity of every resource measure M , which can be expressed as
The positivity follows from the fact that the trace is a free operation in any QRT, and
therefore, from the monotonicity of M under free operations we have
Similarly, the two conditions of normalization and monotonicity imply that for any finite dimensional system A,

σ ∈ F(A) ⇒ M(σ) = 0 .   (10.4)

Indeed, any state σ ∈ F(A) can be viewed as a free channel σ^{1→A}, where 1 represents the trivial system corresponding to the Hilbert space C. Hence,
where the inequality follows from the monotonicity property of M. Therefore, since M is
non-negative we must have M(σ) = 0 for all σ ∈ F(A) and all finite dimensional systems A.
We discuss now several properties that are satisfied by some, but not all, resource measures.
Faithfulness
The condition expressed in (10.4) quantitatively defines the notion of “no resource.” Intu-
itively, one might be inclined to consider that the reverse of (10.4) should also hold true.
This concept is referred to as faithfulness. A general resource measure M is deemed faithful
if M (ρ) = 0 necessarily implies that ρ is a free state.
However, it’s important to recognize that for certain tasks, some resource states may not
offer any operational advantage over free states. In such scenarios, these states should be
assigned a zero value by any measure that quantifies their utility for performing the specified
task. For instance, as we will explore later, the measure of distillable entanglement, which
is a significant measure of entanglement, is zero for all bound entangled states. Therefore,
although faithfulness is an intuitively attractive property, it is not an essential requirement
for a resource measure. This perspective allows for a more nuanced understanding of resource
measures and their application in various contexts within quantum resource theories.
Resource Monotones
In certain QRTs, quantum measurements do not belong to the set of free operations. One
such example is quantum thermodynamics as we will see in Chapter 17. However, in many
other QRTs, like entanglement, quantum measurements can be free, and they represent
an important component of the theory. In such QRTs, the set of quantum instruments
F(A → BX), where X is a classical ‘flag’ system, is not empty. In particular, any such channel E ∈ F(A → BX) can be expressed as E^{A→BX} = Σ_{x∈[m]} E_x^{A→B} ⊗ |x⟩⟨x|^X, and consequently, any resource measure M satisfies for all ρ ∈ D(A)

M(ρ^A) ⩾ M(E^{A→BX}(ρ^A)) = M( Σ_{x∈[m]} p_x σ_x^B ⊗ |x⟩⟨x|^X ) ,   (10.6)

where

p_x := Tr[E_x^{A→B}(ρ^A)] and σ_x^B := (1/p_x) E_x^{A→B}(ρ^A) .   (10.7)

M( Σ_{x∈[m]} p_x σ_x^B ⊗ |x⟩⟨x|^X ) = Σ_{x∈[m]} p_x M(σ_x^B ⊗ |x⟩⟨x|^X) .   (10.8)
Almost all resource measures studied in the literature are convex linear on QC states. One reason is that the equality above is satisfied by many functions, like the von Neumann entropy, the Rényi entropies, all the Schatten p-norms, etc. If a QRT admits a tensor-product structure then the partial trace is free, so that for every x ∈ [m]

M(σ_x^B ⊗ |x⟩⟨x|^X) ⩾ M(σ_x^B) .   (10.9)

Combining this with (10.6) and (10.8) we get that in such QRTs

M(ρ^A) ⩾ Σ_{x∈[m]} p_x M(σ_x^B) .   (10.10)
Resource Monotone
Definition 10.1.2. A resource measure M is called a resource monotone if it
satisfies:
As we delve further, we will see that the convexity property above is extremely useful
from a mathematical perspective in calculating the resource monotone for a specific state.
Concurrently, a common physical interpretation of convex measures is that the process of
mixing states does not result in an increase in the resource quantity. However, it’s important
to be cautious in drawing parallels between this mathematical notion of convexity and the
physical process of mixing states, as the latter typically involves discarding information. We
have previously discussed this important distinction in Section 9.1.3. Additionally, it’s worth
noting that in QRTs where a freely available classical (flag) basis does not exist, and thus
strong monotonicity is not a relevant concept, convex resource measures will be referred to
as resource monotones.
Subadditivity
Some resource measures have additional properties that are mathematically convenient. One
of such properties is subadditivity. A resource measure M is said to be subadditive if for any
ρ ∈ D(A) and σ ∈ D(B),
M(ρ ⊗ σ) ⩽ M(ρ) + M(σ) . (10.13)
While subadditivity is a natural property to expect from a resource measure, it will not
hold for all measures in a general QRT. In particular, we will see examples of that when we
discuss superactivation.
Additivity
An even stronger property of a resource measure is additivity. That is, M is said to be
additive when equality holds in (10.13) for all states. While most resource measures do not
satisfy this property, there exists a procedure known as regularization that allows for the
general construction of measures that are additive on multiple copies of the same state. We
have already encountered this procedure implicitly in a few places in previous chapters. The regularization of a resource measure M is defined for all ρ ∈ D(A) as

M^reg(ρ) := lim_{n→∞} (1/n) M(ρ^⊗n) ,   (10.14)

provided the limit exists. In the following exercise you show that the limit above exists if M satisfies a weaker form of subadditivity.

Exercise 10.1.1. Show that the limit in (10.14) exists if M satisfies for all n, m ∈ N and any density matrix ρ

M(ρ^⊗(m+n)) ⩽ M(ρ^⊗m) + M(ρ^⊗n) .   (10.15)

Hint: Use Exercise 6.4.2.
Asymptotic Continuity
It’s a reasonable expectation for any resource measure with physical significance to exhibit
continuity. This expectation stems from the idea that if one quantum state is a slight per-
turbation of another, their resource contents should be very similar. However, it's important to note that a function f : ∪_A D(A) → R₊ satisfying the following condition:
is indeed continuous. Yet, in the context of very large dimensions (i.e., |A| ≫ 1), this type of
continuity may not be practically useful. This is because, for the difference |f (ρ)−f (σ)| to be
small, ρ and σ need to be so closely aligned that they are virtually identical for all practical
purposes. Therefore, a more robust notion of continuity, known as asymptotic continuity,
is often considered. Asymptotic continuity is especially pertinent in the realm of large
dimensions. It limits the dependence on dimension to a logarithmic scale, thereby providing
a more practical and realistic measure of continuity when dealing with high-dimensional
quantum states. This concept is particularly useful in assessing the continuity of resource
measures in quantum systems where the dimensionality plays a significant role.
Asymptotic Continuity
Definition 10.1.3. A resource measure M is said to be asymptotically continuous if
for any ρ, σ ∈ D(A), and ε := ½∥ρ − σ∥₁,
Note that the above notion of continuity is stronger than the regular notion of continuity in the sense that the right-hand side of (10.17) depends on the dimension only through a logarithmic factor.
Moreover, since f is continuous, c := max_{0⩽ε⩽1} f(ε) < ∞. Hence, taking σ ∈ F(A) to be free we get from the above equation that for all n ∈ N

(1/n) M(ρ^⊗n) ⩽ c log |A| .   (10.18)

Hence, taking the limit n → ∞ we get M^reg(ρ) < ∞.
Exercise 10.1.2. Let f : ∪_A D(A) → R₊ be a function that satisfies (10.16), and suppose there exists a state σ ∈ D(A) such that lim_{n→∞} (1/n) f(σ^⊗n) < ∞. Show that for all other ρ ∈ D(A) we must have

lim_{n→∞} (1/n) f(ρ^⊗n) = ∞ .   (10.19)
If the set of free states, F(A), contains a full rank state for any system A, then one can
define a slightly weaker version of asymptotic continuity that will be very useful for our
study, since most QRTs have this property.
Observe that any density matrix η ∈ D(A) satisfies ∥η −1 ∥∞ ⩾ |A|. Therefore, the above
notion of asymptotic continuity is a weaker one than the version given in Definition 10.1.3.
On the other hand, if the QRT F has the property that there exists a constant 0 < c < ∞,
independent of the dimensions, such that
for any choice of system A (and c is independent of |A|), then the two notions of asymptotic continuity become equivalent. Since all the QRTs studied in this book satisfy the above condition, we will use these two notions of asymptotic continuity interchangeably.
Exercise 10.1.3. Let F be a QRT in which the maximally mixed state is free. Show that
the two notions of asymptotic continuity coincide in this case.
Asymptotic continuity is a property that is extensively utilized in QRTs, especially in
the asymptotic regime. Functions that are asymptotically continuous often incorporate the
von Neumann entropy or the Umegaki relative entropy. This reliance is partly because the
Umegaki relative entropy is the only asymptotically continuous relative entropy, making it
a unique and pivotal tool in QRTs. The proof of this uniqueness theorem, which establishes
the singular nature of the Umegaki relative entropy in terms of asymptotic continuity, is
an important aspect of these theories. However, we will delve into the details of this proof
later in Section 11.4. For now, our focus will shift to introducing key examples of resource measures. These examples will provide a practical illustration of how the theoretical concepts discussed above are applied in QRTs.
Remark. In this book, we consistently regard the set of free states, F(A), as a closed and
compact set. Consequently, the infimum in (10.22) can be substituted with a minimum. This
means that there exists an optimal state σ⋆ ∈ F(A) which fulfills

D(ρ∥F) = D(ρ∥σ⋆) .   (10.23)

The state σ⋆ is called a closest free state (CFS); see Fig. 10.1 for an illustration.
Remarkably, the two conditions that a quantum divergence has to satisfy, namely, DPI
and normalization, are sufficient to guarantee that D(·∥F) is a resource measure. Indeed,
D(·∥F) is non-negative since D is non-negative, and if ρ ∈ F(A) then D(ρ∥F) = 0. To see the
monotonicity of D(·∥F) under free operations observe that for any E ∈ F(A → B) and any
ρ ∈ D(A) we have
D(E^{A→B}(ρ^A)∥F) := inf_{ω∈F(B)} D(E^{A→B}(ρ^A)∥ω^B)
restricting ω = E(σ) → ⩽ inf_{σ∈F(A)} D(E^{A→B}(ρ^A)∥E^{A→B}(σ^A))   (10.24)
DPI → ⩽ inf_{σ∈F(A)} D(ρ^A∥σ^A)
= D(ρ^A∥F) .
D(ρ₁^A ⊗ ρ₂^B∥F)
restricting σ = σ₁ ⊗ σ₂ → ⩽ inf_{σ₁∈F(A), σ₂∈F(B)} D(ρ₁^A ⊗ ρ₂^B ∥ σ₁^A ⊗ σ₂^B)   (10.27)
subadditivity of D → ⩽ inf_{σ₁∈F(A)} D(ρ₁^A∥σ₁^A) + inf_{σ₂∈F(B)} D(ρ₂^B∥σ₂^B)
= D(ρ₁^A∥F) + D(ρ₂^B∥F) .
D^reg(ρ∥F) := lim_{n→∞} (1/n) D(ρ^⊗n∥F)   (10.28)

exists. Note that D^reg(·∥F) is at least weakly additive in the sense that D^reg(ρ^⊗m∥F) = m D^reg(ρ∥F) for all m ∈ N.
where the Umegaki relative entropy is D(ρ∥σ) := Tr[ρ log ρ] − Tr[ρ log σ], and the infimum is taken over all free states σ ∈ F(A). For a general divergence D, the D-divergence of a resource constructed in this way is not guaranteed to satisfy strong monotonicity (10.11), but for the relative entropy of a resource this is indeed the case.
Theorem 10.2.1. Let D be the Umegaki relative entropy and F be a convex QRT.
Then, the relative entropy of a resource, D(·∥F), is a resource monotone.
Proof. Since F is convex, and since the Umegaki relative entropy is jointly convex (see
Exercise 8.7.8), it follows from (10.29) that D(·∥F) is convex. It is therefore left to show
that D(·∥F) satisfies the strong monotonicity property. Consider a free quantum instrument E = Σ_{x∈[m]} E_x ⊗ |x⟩⟨x|^X ∈ F(A → BX), with each E_x ∈ CP(A → B) and Σ_{x∈[m]} E_x trace-preserving. For any resource state ρ ∈ D(A), denote σ^{BX} := Σ_{x∈[m]} p_x σ_x^B ⊗ |x⟩⟨x|^X. Then,

D(ρ^A∥F) ⩾ D(E^{A→BX}(ρ^A)∥F) = D(σ^{BX}∥F) = min_{ω∈F(BX)} D(σ^{BX}∥ω^{BX}) ,   (10.31)
where we used the monotonicity under free operations of the relative entropy of a resource. Denoting ω^{BX} := Σ_{x∈[m]} q_x ω_x^B ⊗ |x⟩⟨x|^X we continue

D(ρ^A∥F) ⩾ min_{ω∈F(BX)} D( Σ_{x∈[m]} p_x σ_x^B ⊗ |x⟩⟨x|^X ∥ Σ_{x∈[m]} q_x ω_x^B ⊗ |x⟩⟨x|^X )
Exercise 8.7.8 → = min_{ω∈F(BX)} [ Σ_{x∈[m]} p_x D(σ_x^B∥ω_x^B) + D(p∥q) ]
D(p∥q) ⩾ 0 → ⩾ min_{{ω_x}⊂F(B)} Σ_{x∈[m]} p_x D(σ_x^B∥ω_x^B)
= Σ_{x∈[m]} p_x min_{ω∈F(B)} D(σ_x^B∥ω^B) = Σ_{x∈[m]} p_x D(σ_x^B∥F) .
So far we have learned that the relative entropy of a resource is a faithful, subadditive resource monotone, assuming F is a QRT whose set of free states F(A) is convex. The final property that we prove is that the Umegaki relative entropy of a resource is also asymptotically continuous. Later on, we will see that this measure is the only asymptotically continuous relative entropy of a resource, which makes the Umegaki relative entropy of a resource unique.
Moreover, from the following chapters it will follow that asymptotic continuity has several
important applications in QRTs.
The relative entropy of a resource is not always bounded. As a very simple example,
suppose the set of free states F(A) consists of only one pure state |0⟩⟨0|. In this case, we get
that
D(|1⟩⟨1|∥F) = ∞ . (10.32)
Hence, for such a (pathological) example the relative entropy of a resource is not bounded,
and in particular, cannot be asymptotically continuous. However, in most QRTs, the set of
free states F(A) contains a full rank state. If such a state exists, say η, then since η⁻¹ exists and satisfies η⁻¹ ⩽ ∥η⁻¹∥_∞ I^A, it follows that

D(ρ^A∥F) ⩽ D(ρ^A∥η) ⩽ log ∥η⁻¹∥_∞ .

For example, if the set of free states contains the maximally mixed state then D(ρ^A∥F) ⩽ log |A|.
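The bound above is easy to test numerically. The Python sketch below (logarithms base 2; the maximally coherent and maximally mixed states are used only as an illustration) computes D(ρ∥η) for a full-rank free state η and compares it with log∥η⁻¹∥_∞.

import numpy as np

def relative_entropy(rho, sigma):
    # Umegaki relative entropy D(rho||sigma) in bits; assumes sigma > 0
    lam = np.linalg.eigvalsh(rho)
    tr_rho_log_rho = sum(l * np.log2(l) for l in lam if l > 1e-12)
    mu, W = np.linalg.eigh(sigma)
    log_sigma = W @ np.diag(np.log2(mu)) @ W.conj().T
    return float(np.real(tr_rho_log_rho - np.trace(rho @ log_sigma)))

d = 4
psi = np.ones(d) / np.sqrt(d)
rho = np.outer(psi, psi)                         # maximally coherent state (a resource)
eta = np.eye(d) / d                              # full-rank free state (maximally mixed)
bound = np.log2(np.linalg.norm(np.linalg.inv(eta), 2))   # log ||eta^{-1}||_inf = log|A|
print(relative_entropy(rho, eta), "<=", bound)            # here both equal log|A| = 2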
then

|D(ρ∥F) − D(σ∥F)| ⩽ εκ + (1 + ε)h( ε/(1+ε) ) ,   (10.34)

where h(x) = −x log x − (1 − x) log(1 − x) is the binary Shannon entropy.
where for the second equality we used (5.170). The key idea of the proof is to find lower and upper bounds for D(γ∥F) in terms of D(ρ∥F), D(σ∥F), and κ.
Since F(A) is convex, we saw above that the relative entropy of a resource is convex, so that

D(γ∥F) ⩽ tD(σ∥F) + (1 − t)D(ω₊∥F) ⩽ tD(σ∥F) + (1 − t)κ .   (10.36)
To get a lower bound, let η be such that D(γ∥F) = D(γ∥η). Then, from the definition of
the Umegaki relative entropy we have
From Corollary 7.5.1 we have that the first term on the right-hand side above satisfies
where in the last line we removed the term (1 − t)D(ω− ∥η) ⩾ 0, and replaced D(ρ∥η) with
D(ρ∥F). Combining the lower bound above with the upper bound in (10.36) we conclude
that

D(ρ∥F) − D(σ∥F) ⩽ t⁻¹ h(t) + ((1 − t)/t) κ = (1 + ε)h( ε/(1+ε) ) + εκ .   (10.40)
The same upper bound holds for D(σ∥F) − D(ρ∥F) by exchanging between ρ and σ every-
where above. Hence, this completes the proof.
Exercise 10.2.1. Let F be as in Theorem 10.2.2 and suppose further that there exists a free state η ∈ F(A) that is full rank; i.e. η > 0. Show that there exists a continuous function f : [0, 1] → R₊, independent of the dimension of A, such that f(0) = 0 and
We now argue that the Umegaki relative entropy is asymptotically continuous. This is a
simple consequence of Theorem 10.2.2.
Proof. Let ρ, ρ′, σ ∈ D(A) and set ε := ½∥ρ − ρ′∥₁. Since supp(ρ) ⊆ supp(σ) and supp(ρ′) ⊆
supp(σ) we can assume without loss of generality that σ > 0. Let F(A) := {σ} be the
set consisting of σ (i.e., F(A) contains only one density matrix). The set F(A) is trivially
closed and convex. Moreover, note that for this F we get D(ρ∥F) = D(ρ∥σ) and similarly
D(ρ′∥F) = D(ρ′∥σ). Therefore, applying Theorem 10.2.2 gives

|D(ρ∥σ) − D(ρ′∥σ)| ⩽ εκ + (1 + ε)h( ε/(1+ε) ) ,   (10.43)

with

κ = max_{ω∈D(A)} D(ω∥σ) ⩽ max_{ω∈D(A)} −Tr[ω log σ] ,   (10.44)
where we dropped the term Tr[ω log ω] = −H(ω) as it is negative. Moreover, note that −Tr[ω log σ] = Tr[ω log(σ⁻¹)], and since σ⁻¹ ⩽ ∥σ⁻¹∥_∞ I^A we get that κ ⩽ log ∥σ⁻¹∥_∞, where we used the operator monotonicity of the log function. This completes the proof (see Exercise 10.2.2).
Exercise 10.2.2. Show that there exists a function f : [0, 1] → R₊, independent of the dimension of A, such that f(0) = 0 and

ε log ∥σ⁻¹∥_∞ + (1 + ε)h( ε/(1+ε) ) ⩽ f(ε) log ∥σ⁻¹∥_∞ .   (10.46)
Exercise 10.2.3. Show that any conditional entropy H (see Definition 7.2.1) is a resource
measure in the QRT in which F(AB → AB ′ ) = CMO(AB → AB ′ ).
cf. (7.132) → = D(ρ^{AB}∥u^A ⊗ ρ^B)   (10.48)
(7.117) → = log |A| − H(A|B)_ρ .
κ := max_{ω∈D(AB)} D(ω^{AB}∥F)
Exercise 10.2.4. Use Theorem 10.2.2 and the expressions above to prove the corollary.
Exercise 10.2.5. Using the same notations as in the corollary above, show that there exists
a continuous function f : [0, 1] → R+ satisfying limδ→0+ f (δ) = 0 and
mixing with noise. In other words, they quantify the ability of the resource to maintain its
usefulness in the presence of disturbances. The term “global” refers to the fact that the noise
can be any density matrix ω ∈ D(A), which represents a wide range of possible disturbances.
However, if we limit the density matrix ω to only represent free states, then the resulting
quantity is called the robustness.
By definition, Rg (ρ) ⩽ R(ρ) (see exercise below) and if ρ ∈ F(A) then R(ρ) = Rg (ρ) = 0
since s above can be taken to be zero. Furthermore, from the following exercise, the converse
of this statement is also true; that is, Rg is faithful.
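In the QRT of coherence the global robustness can be computed with a small semidefinite program. The Python sketch below (it assumes the cvxpy package is available; the pure state is chosen only for illustration) minimizes Tr[X] − 1 over diagonal X ⩾ ρ, which is exactly the smallest s with ρ ⩽ (1 + s)σ for an incoherent σ.

import numpy as np
import cvxpy as cp                                # assumes cvxpy is installed

def global_robustness_coherence(rho):
    # R_g(rho) = min{ s >= 0 : rho <= (1+s) sigma, sigma incoherent }
    d = rho.shape[0]
    x = cp.Variable(d, nonneg=True)               # diagonal of the unnormalized free operator
    prob = cp.Problem(cp.Minimize(cp.sum(x) - 1), [cp.diag(x) - rho >> 0])
    prob.solve()
    return prob.value

psi = np.array([0.8, 0.5, 0.33]); psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi)                          # a real pure state with coherence
# For pure states the value is expected to match (sum_i psi_i)^2 - 1, a closed form
# known from the coherence literature (not derived in the text here).
print(global_robustness_coherence(rho))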
Exercise 10.2.7. Consider the robustness and global robustness as defined above.
1. Show that Rg (ρ) ⩽ R(ρ) for all ρ ∈ D(A).
2. Show that if F is affine then R(ρ) = ∞ for all ρ ∈ D(A) that is not free.
3. Show that Rg (ρ) = 0 if and only if ρ ∈ F(A).
Exercise 10.2.8. Let ρ ∈ D(A) and suppose R(ρ) < ∞.

1. Show that there exist τ, ω ∈ F(A) such that

ρ = (1 + R(ρ))τ − R(ρ)ω .   (10.55)
Proof. Since we already saw that Rg(ρ) = 0 for all ρ ∈ F(A), we prove now the strong monotonicity property. Let E := Σ_{x∈[m]} E_x ⊗ |x⟩⟨x| ∈ F(A → BX) be a free quantum instrument (if F(A → BX) is an empty set then strong monotonicity holds trivially). Let ρ ∈ D(A) be as in (10.56) and for all x ∈ [m] denote

σ_x := E_x(ρ)/Tr[E_x(ρ)] = (1/Tr[E_x(ρ)]) [ (1 + Rg(ρ)) E_x(τ) − Rg(ρ) E_x(ω) ]   (10.58)
= (1 + s) E_x(τ)/Tr[E_x(τ)] − s E_x(ω)/Tr[E_x(ω)] ,

where s := Rg(ρ) Tr[E_x(ω)]/Tr[E_x(ρ)]. On the other hand, σ_x can also be written in terms of its optimal pseudo-mixture,

σ_x = (1 + Rg(σ_x))τ_x − Rg(σ_x)ω_x ,   (10.59)

for some ω_x ∈ D(B) and τ_x ∈ F(B). Therefore, from the two expressions above for σ_x, and the optimality of the pseudo-mixture in (10.59), we get that Rg(σ_x) is no greater than s; that is,
Rg(σ_x) ⩽ Rg(ρ) Tr[E_x(ω)]/Tr[E_x(ρ)] .   (10.60)

From the above inequality we conclude that

Σ_{x∈[m]} Tr[E_x(ρ)] Rg(σ_x) ⩽ Σ_{x∈[m]} Tr[E_x(ω)] Rg(ρ) = Rg(ρ) ,   (10.61)

where in the last equality we used the fact that Σ_{x∈[m]} E_x is trace-preserving. This completes the proof of strong monotonicity.
Next, suppose that F(A) is convex, and let {px , ρx }x∈[m] be an ensemble of quantum
states in D(A). Express each ρ_x as a pseudo-mixture

ρ_x = (1 + Rg(ρ_x))τ_x − Rg(ρ_x)ω_x   (10.62)

for some τ_x ∈ F(A) and ω_x ∈ D(A). Denote ρ̄ := Σ_{x∈[m]} p_x ρ_x. Then, from the equation above we have

ρ̄ = Σ_{x∈[m]} p_x [ (1 + Rg(ρ_x))τ_x − Rg(ρ_x)ω_x ] = (1 + r)τ − rω ,   (10.63)

where

r := Σ_{x∈[m]} p_x Rg(ρ_x) ,  τ := (1/(1+r)) Σ_{x∈[m]} p_x (1 + Rg(ρ_x))τ_x ,  ω := (1/r) Σ_{x∈[m]} p_x Rg(ρ_x)ω_x .   (10.64)
Note that τ ∈ F(A) since each τ_x ∈ F(A) and F(A) is convex. Since the pseudo-mixture in (10.63) is not necessarily the optimal one, we conclude that

Rg(ρ̄) ⩽ r = Σ_{x∈[m]} p_x Rg(ρ_x) .   (10.65)
Exercise 10.2.9. Let F be a convex QRT, and let R be the corresponding robustness mea-
sure. Prove that R is a resource monotone. Hint: Follow similar steps as in the proof of
Theorem 10.2.3.
The terminology of Dmax (ρ∥F) is due to the following connection between Dmax (ρ∥F) and
Rg .
Proof. By definition of Dmax and the logarithmic global robustness of a resource, we have for all ρ ∈ D(A)

Dmax(ρ∥F) = min{ log t : tσ ⩾ ρ , σ ∈ F(A) }
= min{ log t : tσ − ρ = (t − 1)ω , σ ∈ F(A) , ω ∈ D(A) , t ⩾ 1 } ,   (10.68)
For any α ∈ [0, 2], we also define the α-Rényi relative entropy of a resource as
Note that the case α = 0 corresponds to Dmin(ρ∥F). The special case α = 1 is Dα=1(ρ∥F) = D(ρ∥F) (the relative entropy of a resource). Since the Petz quantum Rényi divergence Dα(·∥·) is non-decreasing in α, also Dα(·∥F) is non-decreasing in α. The continuity of Dα(·∥·) in α also carries over to Dα(·∥F), including the continuity at α = 1. This result is a simple consequence of Sion's minimax theorem.
Lemma 10.2.2 (Sion’s Minimax Theorem). Let X be a compact convex subset of a linear
topological space, and let Y be a convex subset of a topological space. Let f : X × Y →
R ∪ {−∞, +∞} be a real valued function satisfying
1. For every fixed y ∈ Y , the function x 7→ f (x, y) is lower semicontinuous and quasi-
convex on X.
2. For every fixed x ∈ X, the function y 7→ f (x, y) is upper semicontinuous and quasi-
concave on Y.
Then

min_{x∈X} sup_{y∈Y} f(x, y) = sup_{y∈Y} min_{x∈X} f(x, y) .   (10.72)
Lemma 10.2.3. Let ρ ∈ D(A) be a fixed density matrix, and define g : [0, 2] → R+
as g(α) := Dα (ρ∥F) for all α ∈ [0, 2]. Then, g(α) is a continuous function.
In order to switch the order between the sup and min above we need to verify that all the
conditions in Sion’s minimax theorem are satisfied. Indeed, the function f (ω, α) := Dα (ρ∥ω)
has the property that it is continuous in ω (and therefore lower semi-continuous). Moreover,
note that for a fixed α ∈ [0, 2], the function ω ↦ f(ω, α) is a quasi-convex function since for any t ∈ [0, 1] and ω₀, ω₁ ∈ F(A) we have

f(tω₀ + (1 − t)ω₁, α) = Dα(ρ∥tω₀ + (1 − t)ω₁)
(6.111) → ⩽ max{ Dα(ρ∥ω₀), Dα(ρ∥ω₁) }   (10.75)
= max{ f(ω₀, α), f(ω₁, α) } .

On the other hand, for a fixed ω ∈ F(A) the function α ↦ f(ω, α) is a continuous function (and therefore upper semi-continuous) and quasi-concave, since for any t ∈ [0, 1] and α₀, α₁ ∈ [0, 2] we have

f(ω, tα₀ + (1 − t)α₁) = D_{tα₀+(1−t)α₁}(ρ∥ω)
monotonicity of Dα in α → ⩾ D_{min{α₀,α₁}}(ρ∥ω)   (10.76)
= min{ f(ω, α₀), f(ω, α₁) } .
Therefore, f (ω, α) satisfies all the requirements of Sion’s minimax theorem. This means that
we can switch the order of the sup and min in (10.74) to get
The limit above exists since the α-Rényi relative entropy of a resource is subadditive (see
Exercise 10.1.1). From Corollary 10.2.3 it follows that
lim sup_{n→∞} (1/n) D^ε_min(ρ^⊗n∥F) ⩽ lim_{α→1⁺} D_α^reg(ρ∥F) ,   (10.81)

and similarly

lim inf_{n→∞} (1/n) D^ε_min(ρ^⊗n∥F) ⩾ lim_{α→1⁻} D_α^reg(ρ∥F) .   (10.82)
Observe that in general we do not know if Dαreg (ρ∥F) is continuous at α = 1, but we can
show continuity from the right.
Lemma 10.2.4. Let ρ ∈ D(A) and let F be a quantum resource theory admitting a
tensor product structure, and has the property that F(A) ⊆ D(A) is closed and
convex. Then,
lim_{α→1⁺} D_α^reg(ρ∥F) = D^reg(ρ∥F) .   (10.83)
In several resource theories the opposite inequality also holds, but in general we do not know whether the limit lim_{α→1⁻} D_α^reg(ρ∥F) equals D^reg(ρ∥F). At the time of writing this book, it is a big open problem in the field to determine under what conditions the inequality in the equation above can be replaced with an equality.
where Bε (ω) is the set of all density matrices that are ε-close (in trace distance) to ω. That
is, in any neighbourhood of a state on the boundary of F(A) there exists at least one state
in F(A) and at least one state not in F(A).
The state σ can be thought of as the closest free state (CFS) to ρ, when we measure the
“distance” with the relative entropy. As we already mentioned, the computation of σ can
be very hard. However, for a given state ω ∈ ∂F(A) we can compute all the resource states
in D(A) for which ω is the CFS. This converse problem has several applications and can be
used to produce examples of resource states for which one knows the value of the relative
entropy of a resource.
Note that if 0 < ρ ̸∈ F(A) and D(ρ∥F) = D(ρ∥σ) (i.e. σ is a CFS), then σ > 0, since otherwise D(ρ∥F) = D(ρ∥σ) = ∞. For simplicity of the exposition here, we will always assume that σ has full rank, and refer the interested reader to the end of this chapter for more details and references on the singular case. We start by showing that if 0 < σ ∈ F(A) is a CFS then σ ∈ ∂F(A).
Theorem 10.3.1. Let 0 < σ ∈ F(A) be a closest free state of a resource state
ρ ∈ D(A). Then, σ ∈ ∂F(A).
Proof. Consider the following Taylor expansion of the logarithmic function. This expansion
is based on the divided difference approach discussed in Appendix D.1. For any t > 0,
0 < σ ∈ D(A), and η ∈ Herm(A) we have
where Lσ : Herm(A) → Herm(A) is a linear operator defined as follows. Let {px }x∈[m] (with
m := |A|) be the eigenvalues of σ, and let {ηxy }x,y∈[m] be the matrix components of a matrix
η ∈ Herm(A) in the eigenbasis of σ. Then, the matrix components of Lσ (η) are given by
[L_σ(η)]_{xy} := ((log p_x − log p_y)/(p_x − p_y)) η_{xy}        ∀ x, y ∈ [m] .    (10.88)
In Exercise 10.3.1 below you show that Lσ is a linear self-adjoint map that satisfies Lσ (σ) = I.
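For readers who wish to experiment numerically, the following minimal sketch (not from the book) builds the map L_σ of (10.88) from the eigendecomposition of a full-rank state and checks the property L_σ(σ) = I used in the proof below; the natural logarithm is used so that the identity holds exactly (a different base only changes an overall constant).

```python
# Illustrative sketch of the divided-difference map L_sigma of Eq. (10.88), with the check L_sigma(sigma) = I.
import numpy as np

def L_sigma(sigma, eta):
    """[L_sigma(eta)]_{xy} = (log p_x - log p_y)/(p_x - p_y) * eta_{xy} in the eigenbasis of sigma,
    with the divided difference of log at equal arguments taken to be 1/p_x."""
    p, U = np.linalg.eigh(sigma)                    # sigma = U diag(p) U^dagger, assumed full rank
    eta_eig = U.conj().T @ eta @ U                  # components of eta in the eigenbasis of sigma
    P, Q = np.meshgrid(p, p, indexing="ij")
    with np.errstate(divide="ignore", invalid="ignore"):
        coeff = (np.log(P) - np.log(Q)) / (P - Q)
    equal = np.isclose(P, Q)
    coeff[equal] = 1.0 / P[equal]                   # diagonal convention: divided difference of log is 1/p
    return U @ (coeff * eta_eig) @ U.conj().T

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
sigma = A @ A.conj().T
sigma /= np.real(np.trace(sigma))                   # a generic full-rank density matrix
print(np.allclose(L_sigma(sigma, sigma), np.eye(3)))   # True: L_sigma(sigma) = I
```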
Now, suppose by contradiction that 0 < σ ∈ F(A) is a CFS of ρ ∈ D(A), and σ ̸∈ ∂F(A).
This means that σ is in the interior of F(A), and in particular, there exists ε > 0 such that
Bε(σ) does not contain any resource state (i.e. Bε(σ) ⊂ F(A)). Moreover, since σ > 0 it
follows that for any σ′ ∈ D(A) (i.e. not necessarily free) and small enough |t|, where t ∈ R
can be negative, the state ω := (1 − t)σ + tσ′ ∈ Bε(σ) ⊂ F(A). Hence, for small enough |t|,
since σ is a CFS of ρ. The above expression is equivalent to f (t) ⩽ Tr[ρ log σ] , where
Since f (0) = Tr[ρ log σ] achieves the maximum value, we must have f ′ (0) = 0. Using (10.87)
we get
f′(0) = Tr[ρ L_σ(σ′ − σ)]
      = Tr[ρ (L_σ(σ′) − I)]        (10.92)
      = Tr[L_σ(ρ)σ′] − 1 .
Therefore, the condition that f ′ (0) = 0 implies that Tr [Lσ (ρ)σ ′ ] = 1 for all σ ′ ∈ D(A). This
means that Lσ (ρ) = I which is possible only if ρ = σ. But since we assume that ρ ̸∈ F(A)
we get a contradiction. This completes the proof.
2. Show that Lσ is a linear self-adjoint map. That is, show that for any η, ζ ∈ Herm(A)
[L_σ^{-1}(ζ)]_{xy} := ((p_x − p_y)/(log p_x − log p_y)) ζ_{xy}        ∀ x, y ∈ [m] , ∀ ζ ∈ Herm(A) .    (10.94)
The next theorem provides a formula for all the resource states that have the same CFS.
We will use the notation WITF (A) to denote the subset of Herm(A) that consists of all the
normalized resource witnesses of the QRT F. Explicitly,
WIT_F(A) := { η ∈ F(A)* : η ̸⩾ 0 , ∥η∥₁ = 1 } .    (10.95)
Note that we normalized the resource witnesses to have a unit trace norm since if η ∈
Herm(A) is a resource witness also aη with 0 < a ∈ R is a resource witness, and for our
purposes it will be sufficient to consider only one representative of the set {aη}a>0 .
where a_max is the largest positive number that satisfies a_max L_σ^{-1}(η) ⩽ σ.
Remark. The conditions Tr[ση] = 0 and a ⩽ a_max ensure that the state σ − a L_σ^{-1}(η) is a
density matrix. Indeed, the condition a ⩽ a_max ensures that it is positive semidefinite, and
its trace is one since the self-adjointness of L_σ^{-1} gives
Tr[L_σ^{-1}(η)] = Tr[L_σ^{-1}(I)η] = Tr[ση] = 0 .    (10.97)
Proof. From the supporting hyperplane theorem, (see Theorem A.6.4) it follows that for any
(fixed) σ ∈ ∂F(A) there exists an Hermitian matrix η ∈ Herm(A) such that
Moreover, since both σ and σ ′ are normalized, if η satisfies the equation above, also η + aI
satisfies it for any a ∈ R. We will therefore assume without loss of generality that Tr[ση] = 0
which means that Tr[σ ′ η] ⩾ 0 for all σ ′ ∈ F(A); i.e. η is a resource witness (observe that
the condition Tr[ση] = 0 implies that η ̸⩾ 0 since σ > 0). Note also that we can always
normalize η such that ∥η∥1 = 1. Quite often, such a resource witness that satisfies these
three conditions (i.e. Tr[ση] = 0, Tr[σ ′ η] ⩾ 0 for all σ ′ ∈ F(A), and ∥η∥1 = 1) is unique,
although for some special boundary points σ ∈ ∂F(A), there is a cone of such witnesses of
dimension greater than one (see Fig. 10.2).
Figure 10.2: A schematic diagram of free states (red) and resource states (green). Most points
on the boundary, like the points D and E, have a unique supporting hyperplane (which is also the
tangent plane). The point E is the closest free state of all the points on the vertical line from it.
Some of the points, like the points C and F, have more than one supporting hyperplane. The point
F is the closest free state of all the points in the shaded black area. Some points on the boundary,
like the points A and B, cannot be closest free states; for example, separable states of rank 1
(i.e. product states) are on the boundary of the set of separable states, but can never be the closest
separable states of any entangled state.
Let ρ be a resource state in D(A) for which σ is the closest free state. The main idea of
the proof is the observation that η ′ := I A − Lσ (ρ) is a resource witness. To see that, first
observe that
Tr[η′σ] = 1 − Tr[σ L_σ(ρ)]
        = 1 − Tr[L_σ(σ)ρ]        (L_σ is self-adjoint)       (10.99)
        = 1 − Tr[ρ] = 0 .        (L_σ(σ) = I^A)
Moreover, for every σ ′ ∈ F(A), define f (t) as in (10.91), but with non-negative t ∈ [0, 1]
(recall that here σ is a boundary point not in the interior of F(A), so that we can only
conclude that ω := (1 − t)σ + tσ ′ is a free state for non-negative t ∈ [0, 1]). Since σ is the
closest free state to ρ, we must have that f ′ (0) ⩽ 0 (we cannot conclude that the derivative
is zero since t cannot be negative). From (10.92) we get for all σ ′ ∈ F(A)
0 ⩽ −f′(0) = 1 − Tr[L_σ(ρ)σ′]
           = Tr[(I − L_σ(ρ))σ′]        (10.100)
           = Tr[η′σ′] .
Hence, η′ is a resource witness. We can then normalize it as η := (1/a)η′ with a > 0 such that
∥η∥₁ = 1. We then conclude from the definition η′ := I^A − L_σ(ρ) that
ρ = L_σ^{-1}(I^A − η′)
  = L_σ^{-1}(I^A − aη)          (η′ = aη)       (10.101)
  = σ − a L_σ^{-1}(η) .         (L_σ^{-1}(I) = σ)
Conversely, suppose ρ = σ −aL−1 σ (η) for some η ∈ WITF (A) and a > 0. We need to show
that D(ρ∥F) = D(ρ∥σ). For this purpose, let σ ′ ∈ F(A) be any free state, and observe that
D(ρ∥σ) ⩽ D(ρ∥σ ′ ) if and only if f (0) ⩾ f (1), where f (t) is defined in (10.91). From the joint
convexity of the relative entropy (and particularly its convexity in the second argument) it
follows that the function f (t) is concave (see Exercise 10.3.2). This means that if f ′ (0) ⩽ 0
then we must have f (0) ⩾ f (1) (Exercise 10.3.2). Now, note that from (10.92)
f′(0) = Tr[L_σ(ρ)σ′] − 1
      = Tr[L_σ(σ − a L_σ^{-1}(η))σ′] − 1        (ρ = σ − a L_σ^{-1}(η))       (10.102)
      = −a Tr[ησ′]
      ⩽ 0 .                                     (η is a resource witness)
Hence, f(0) ⩾ f(1), which is equivalent to D(ρ∥σ) ⩽ D(ρ∥σ′). Since σ′ was an arbitrary state
in F(A), this completes the proof.
The significance of the theorem above is that if for a given resource state ρ we have a
candidate σ that we believe to be a closest free state, then we can check it with the formula
in (10.96). Specifically, what needs to be checked is whether the matrix I − Lσ (ρ) is a
resource witness. We will see how this can be done when we compute the relative entropy
of entanglement on pure bipartite states. We also point out that the techniques used above
are not limited to the Umegaki relative entropy, and similar results can be obtained for the
α-Rényi relative entropy of a resource as defined in (10.71).
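As an illustration of this criterion, the following sketch (not from the book) uses the QRT of coherence, in which the free states are the density matrices that are diagonal in a fixed basis; for that theory the dephased state ∆(ρ) is known to be the closest free state under the Umegaki relative entropy, and the check below verifies that η′ := I − L_σ(ρ) with σ = ∆(ρ) indeed satisfies Tr[η′σ] = 0 and Tr[η′σ′] ⩾ 0 for every diagonal σ′ (the latter reduces to the diagonal of η′ being non-negative).

```python
# Illustrative closest-free-state check for the QRT of coherence (free = diagonal states).
import numpy as np

def cfs_witness_check(rho, atol=1e-10):
    """Check that eta' := I - L_sigma(rho), with sigma = Delta(rho), satisfies Tr[eta' sigma] = 0
    and Tr[eta' sigma'] >= 0 for every diagonal sigma' (i.e. the diagonal of eta' is non-negative)."""
    p = np.real(np.diag(rho))                      # sigma = Delta(rho) is diagonal with entries p
    P, Q = np.meshgrid(p, p, indexing="ij")
    with np.errstate(divide="ignore", invalid="ignore"):
        coeff = (np.log(P) - np.log(Q)) / (P - Q)  # divided differences of log at the entries of p
    equal = np.isclose(P, Q)
    coeff[equal] = 1.0 / P[equal]
    L_sigma_rho = coeff * rho                      # sigma is already diagonal in this basis
    d = np.real(np.diag(np.eye(len(p)) - L_sigma_rho))
    return abs(np.dot(p, d)) < atol and np.all(d >= -atol)

rng = np.random.default_rng(1)
B = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
rho = B @ B.conj().T
rho /= np.real(np.trace(rho))
print(cfs_witness_check(rho))                      # True: Delta(rho) passes the closest-free-state test
```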
Exercise 10.3.2. Let f (t) be the function defined in (10.91).
1. Show that f (t) is concave. Hint: Use the convexity of D(ρ∥σ) in σ (with fixed ρ).
2. Show that if f ′ (0) ⩽ 0 then f (0) ⩾ f (1).
Exercise 10.3.3. Let 0 < ρ ∈ D(A) be a full rank resource state (i.e. ρ ̸∈ F(A)). Show
that the closest free state to ρ is unique. Hint: Let σ ̸= σ ′ be two closest free states, define
tσ + (1 − t)σ ′ , and use the strict concavity of the function f (σ) = Tr[ρ log σ].
This quantity has a simple closed formula if the set of free states is affine (see Sec. 9.3) and,
in addition, satisfies for any α ∈ [0, 2]
σ^α / Tr[σ^α] ∈ F(A)        ∀ σ ∈ F(A) .    (10.104)
Tr[ρ^α σ^{1−α}] = Tr[ρ^α ∆(σ^{1−α})]
               = Tr[∆(ρ^α) σ^{1−α}]        (10.106)
               = ∥∆(ρ^α)∥_{1/α} Tr[γ^α σ^{1−α}] ,
where
γ := ∆(ρ^α)^{1/α} / Tr[∆(ρ^α)^{1/α}] .    (10.107)
The proof is concluded with the observation that for α ⩽ 1 we have Tr[γ^α σ^{1−α}] ⩽ 1 (Hölder's
inequality), and for α > 1 we have Tr[γ^α σ^{1−α}] ⩾ 1 (reverse Hölder inequality), where
equality holds in both cases for σ = γ. Therefore, σ = γ is the optimizer.
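A quick numerical sanity check of this optimizer (not from the book) can be done in the QRT of coherence, where the free states are diagonal and the conditions above hold. Assuming the α-relative entropy in (10.71) is based on the Petz divergence D_α(ρ∥σ) = (1/(α−1)) log Tr[ρ^α σ^{1−α}], the value at γ should never exceed a random search over diagonal free states.

```python
# Illustrative check that gamma of Eq. (10.107) minimizes the Petz divergence over diagonal free states.
import numpy as np
from scipy.linalg import fractional_matrix_power as mpow

def petz_renyi(rho, sigma, alpha):
    val = np.real(np.trace(mpow(rho, alpha) @ mpow(sigma, 1 - alpha)))
    return np.log2(val) / (alpha - 1)

rng = np.random.default_rng(2)
d, alpha = 3, 0.7
B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = B @ B.conj().T
rho /= np.real(np.trace(rho))

q = np.real(np.diag(mpow(rho, alpha))) ** (1 / alpha)    # Delta(rho^alpha)^(1/alpha)
gamma = np.diag(q / q.sum())                              # the claimed optimizer (10.107)

best = min(petz_renyi(rho, np.diag(rng.dirichlet(np.ones(d))), alpha) for _ in range(20000))
print(petz_renyi(rho, gamma, alpha), best)   # the value at gamma does not exceed the random-search minimum
```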
Exercise 10.3.4. Let α ∈ [0, 2]. Give a closed expression for Dα (ρ∥F) for the following
cases:
1. ρ ∈ D(A) and F(A) consists of a set of diagonal density matrices in some fixed basis.
3. ρ ∈ D(A) and F(A) consists of a set of symmetric density matrices (i.e. σ ∈ F(A) if
and only if σ = σ T where the transpose is taken in some fixed basis).
4. G is a unitary group, ρ ∈ D(A), and F(A) consists of the set of G-invariant states
(i.e. σ ∈ F(A) if and only if σ = U σU ∗ for all U ∈ G).
Exercise 10.3.5. Show that the expression given in (10.105) for the α-relative entropy of a
resource can be rewritten as
Dα (ρ∥F) = H1/α ∆(ρα ) − Hα (ρ) (10.108)
This lack of smoothness in the logarithmic robustness can be traced back to the discontinuity
present in the max relative entropy. In contrast to the Umegaki relative entropy, Dmax does
not exhibit asymptotic continuity. In fact, more broadly, it is not continuous with respect to
its first argument. For example, take
σ = ½|0⟩⟨0| + ½|1⟩⟨1|    and    ρ_ε = (½ − ε)|0⟩⟨0| + ½|1⟩⟨1| + ε|2⟩⟨2| .    (10.111)
For these choices we have D(ρ_ε∥σ) = ∞ for all ε ∈ (0, 1/2] whereas D(ρ_{ε=0}∥σ) = 0. On the
other hand, in the laboratory, the preparation of a physical system in a state ρ always results
in some error, so that the intended state ρ differs (in trace distance) from the prepared state
by some small ε > 0. Therefore, discontinuous resource measures are unlikely to have
practical physical significance unless some smoothing procedure has been applied to them. In
the following definition we provide a simple method to smooth a resource measure.
Remark. In the definition we employed the notation Bε(ρ) := {ρ′ ∈ D(A) : ½∥ρ′ − ρ∥₁ ⩽ ε}
to denote the “ball” of states ρ′ that are ε-close to ρ in trace distance. The rationale for
choosing the minimum over the ball Bε (ρ) stems from the intention to identify the minimum
amount of resource present within this ball. This approach ensures that the value Mε (ρ)
represents the minimum guaranteed resource level in the system, even when our knowledge
is limited to the state of the system being ε-close to ρ. Essentially, this method accounts
for uncertainty in the system’s state by considering the least amount of resource that can
be confidently ascribed to states within an ε-radius of ρ. This approach is both cautious
and practical, as it provides a conservative estimate of the resource quantity in the practical
situations where exact state information is not available.
= inf_{σ∈F(A)} D^ε(ρ∥σ) ,
where D^ε is defined as
D^ε(ρ∥σ) := min_{ρ′∈Bε(ρ)} D(ρ′∥σ) .    (10.115)
The quantity Dε is called the ε-smoothed version of the quantum divergence D. Smoothed di-
vergences play key roles in QRTs and in the next theorem we prove some useful relationships
among some of them.
Exercise 10.4.1. Let D be a quantum divergence and ε > 0. Show that Dε is itself a quantum
divergence.
We denote the ε-smoothed version of Dmax by D_max^ε. For Dmin, the notation D_min^ε already
signifies the quantum hypothesis testing divergence (see Sec. 8.7.1). This aligns with the
notion that the quantum hypothesis testing divergence is a smoothed version of Dmin (refer
to Exercise 8.7.4). Additionally, when smoothing Dmin in the form min_{ρ′∈Bε(ρ)} Dmin(ρ′∥σ),
the result is always zero. This is because for any ε > 0 and ρ ∈ D(A), there is a ρ′ ∈ D(A)
that is ε-close to ρ with ρ′ > 0, making Dmin(ρ′∥σ) = 0. Henceforth, D_min^ε will exclusively
represent the quantum hypothesis testing divergence in this book.
In the forthcoming theorem, we will establish specific inequalities that involve D_min^ε and
D_max^ε. These inequalities are crucial and will play a significant role in the subsequent dis-
cussions and analyses. The relationships between D_min^ε and D_max^ε are fundamental in un-
derstanding various aspects of quantum resources, both in the single-shot regime and in
the asymptotic domain. Furthermore, later on we will use some of these relationships to
provide operational interpretations of both D_min^ε and D_max^ε.
ρ̃ = ρ + ε1(ω_+ − ω_−) ,    (10.119)
where without loss of generality we assumed the equality ½∥ρ̃ − ρ∥₁ = ε1. Denote r := D_max^{ε1}(ρ∥σ),
so that ρ̃ ⩽ 2^r σ. Combining this with the inequality ρ ⩽ ρ̃ + ε1 ω_− (which follows
from the equation above) gives
ρ ⩽ 2^r σ + ε1 ω_− .    (10.120)
Hence, Λ ∈ Eff(A). Finally, denoting ρ′ := GρG*/Tr[Λρ] (so that ρ′ ∈ D(A)), the condition
tρ ⩽ σ + ω implies that t Tr[Λρ] ρ′ ⩽ σ, so that
Dmax(ρ′∥σ) ⩽ −log( t Tr[Λρ] ) .    (10.130)
Next, we estimate Tr[Λρ]:
Tr[Λρ] = 1 − Tr[(I − Λ)ρ]
       ⩾ 1 − (1/t) Tr[(I − Λ)(σ + ω)]        (tρ ⩽ σ + ω)
       = 1 − Tr[ω]/t                          (Tr[Λ(σ + ω)] = 1)       (10.131)
       ⩾ ε ,                                  (by (10.127))
where we used the expression for Λ in (10.129) to get that Tr [Λ(σ + ω)] = 1. Combining
the two equations above gives Dmax(ρ′∥σ) ⩽ −log(tε). To estimate t, we use the fact that
Tr[ω] ⩾ 0 to get from the relation in (10.126) that 2^{−D_min^ε(ρ∥σ)} ⩽ (1 − ε)t. Substituting this
lower bound on t into the inequality Dmax(ρ′∥σ) ⩽ −log(tε) gives
Dmax(ρ′∥σ) ⩽ −log( (ε/(1 − ε)) 2^{−D_min^ε(ρ∥σ)} )
           = D_min^ε(ρ∥σ) − log( ε/(1 − ε) ) .    (10.132)
It is therefore left to show that ρ′ is √(1 − ε²)-close to ρ.
Let |ψ^{AÃ}⟩ := (√ρ ⊗ I)|Ω^{AÃ}⟩ and |ψ̃^{AÃ}⟩ := (G ⊗ I)|ψ^{AÃ}⟩. Observe that ψ^{AÃ} and ψ̃^{AÃ} are
purifications of ρ and ρ̃ := GρG*, respectively. Moreover, observe that |ψ′^{AÃ}⟩ := (1/√Tr[ρ̃]) |ψ̃^{AÃ}⟩
is a purification of ρ′. From Uhlmann's theorem the fidelity between ρ and ρ′ satisfies:
F(ρ, ρ′) ⩾ |⟨ψ′^{AÃ}|ψ^{AÃ}⟩|
         ⩾ |⟨ψ̃^{AÃ}|ψ^{AÃ}⟩|                                 (Tr[ρ̃] ⩽ 1)
         ⩾ ½( ⟨ψ̃^{AÃ}|ψ^{AÃ}⟩ + ⟨ψ^{AÃ}|ψ̃^{AÃ}⟩ )           (real part)       (10.133)
         = ⟨ψ^{AÃ}| P ⊗ I^{Ã} |ψ^{AÃ}⟩                        (P := ½(G + G*))
         = Tr[ρP] .
Combining this with the fact that P ⩽ I A (see Exercise 2.3.14) we obtain
F(ρ, ρ′) ⩾ Tr[ρP] = 1 − Tr[ρ(I − P)]
         ⩾ 1 − (1/t) Tr[(σ + ω)(I − P)]                               (tρ ⩽ σ + ω)
         = 1 − 1/t − Tr[ω]/t + (1/t) Tr[σ^{1/2}(σ + ω)^{1/2}]         (by the definition of P)       (10.134)
         ⩾ 1 − Tr[ω]/t                                                ((σ + ω)^{1/2} ⩾ σ^{1/2})
         ⩾ ε .                                                        (by (10.127))
Therefore, from the relation (5.202) between the trace distance and the fidelity we get
½∥ρ − ρ′∥₁ ⩽ √(1 − ε²). This completes the proof.
The relation between the smoothed max and min relative entropies can be used to obtain a
generalized version of the AEP property.
Corollary 10.4.1. For any ρ, σ ∈ D(A) with supp(ρ) ⊆ supp(σ), and any ε ∈ (0, 1),
lim_{n→∞} (1/n) D_max^ε(ρ^{⊗n}∥σ^{⊗n}) = D(ρ∥σ) .    (10.135)
Proof. From (10.116) we get that for any ε1, ε2 ∈ (0, 1) with ε1 + ε2 < 1
lim inf_{n→∞} (1/n) D_max^{ε1}(ρ^{⊗n}∥σ^{⊗n}) ⩾ lim inf_{n→∞} [ (1/n) D_min^{ε2}(ρ^{⊗n}∥σ^{⊗n}) + (1/n) log(1 − ε1 − ε2) ]
    = lim inf_{n→∞} (1/n) D_min^{ε2}(ρ^{⊗n}∥σ^{⊗n})        (10.136)
    = D(ρ∥σ) .                                              (by (8.211))
Conversely, from (10.118), after exchanging the roles of δ := √(1 − ε²) and ε, we get for
every ε ∈ (0, 1)
lim sup_{n→∞} (1/n) D_max^ε(ρ^{⊗n}∥σ^{⊗n}) ⩽ lim sup_{n→∞} [ (1/n) D_min^δ(ρ^{⊗n}∥σ^{⊗n}) − (1/n) log(δ/(1 − δ)) ]
    = lim sup_{n→∞} (1/n) D_min^δ(ρ^{⊗n}∥σ^{⊗n})           (10.137)
    = D(ρ∥σ) .                                              (by (8.211))
From the two equations above it follows that (10.135) must hold.
The technique applied in the aforementioned theorem, especially in the proof of (10.118),
is also applicable for upper-bounding the smoothed max relative entropy. This can be
achieved by smoothing the second argument of Dmax . For further details, please refer to
Appendix D.3.
Exercise 10.4.2. Let ρ ∈ D(AB) and ε ∈ (0, 1). The smoothed version of Hmin and Hmax
(see Definition 7.5.1) are defined, respectively, as
H_min^ε(A|B)_ρ := max_{ρ′∈Bε(ρ)} H_min(A|B)_{ρ′}    and    H_max^ε(A|B)_ρ := min_{ρ′∈Bε(ρ)} H_max(A|B)_{ρ′} .    (10.138)
1. Show that
lim_{n→∞} (1/n) H_min^ε(A^n|B^n)_{ρ^{⊗n}} = H(A|B)_ρ .    (10.139)
Hint: Use Corollary 10.4.1.
2. Show that
lim_{n→∞} (1/n) H_max^ε(A^n|B^n)_{ρ^{⊗n}} = H(A|B)_ρ .    (10.140)
Hint: Use the duality relation between H_min(A|B) and H_max(A|B).
Observe that this definition is consistent with the definition of a smoothed relative entropy.
Specifically, if H is related to a quantum divergence D as H(ρ) = log |A| − D(ρ∥u) then Hε
as defined above is related to the smooth relative entropy Dε as Hε (ρ) = log |A| − Dε (ρ∥u).
Theorem 10.4.2. Let H be a quantum entropy and let ε ∈ [0, 1). Then, the
ε-smoothed version of H is given by
H^ε(ρ) = H(p^{(ε)})        ∀ ρ ∈ D(A) ,    (10.142)
Proof. Let ρ′ be an optimal quantum state in Bε (ρ) such that Hε (ρ) = H(ρ′ ). We first argue
that without loss of generality we can assume that ρ′ commutes with ρ. To see this, let
∆ ∈ CPTP(A → A) be the completely dephasing channel in the eigenbasis of ρ. Then, since
∆ is a doubly stochastic channel we have H (∆(ρ′ )) ⩾ H (ρ′ ). Moreover, since ∆(ρ) = ρ we
have
½∥∆(ρ′) − ρ∥₁ = ½∥∆(ρ′) − ∆(ρ)∥₁
              ⩽ ½∥ρ′ − ρ∥₁ ⩽ ε .        (DPI)       (10.143)
Hence, ∆(ρ′) is also ε-close to ρ, so that ∆(ρ′) is also an optimizer of (10.141). Hence,
without loss of generality we can assume that ρ′ is diagonal in the same eigenbasis of ρ.
Let p = p↓ be the vector consisting of the eigenvalues of ρ. From the argument above
we can express the smoothed entropy in (10.141) as
Now, since Bε (p) has the property that for every p′ ∈ Bε (p) the vector p(ε) as defined
in (4.76) satisfies p′ ≻ p(ε) so that H(p′ ) ⩽ H(p(ε) ). Therefore, the choice p′ = p(ε) gives
the maximum value.
As a simple example, consider the min-entropy as defined in (6.22); i.e., H_min(ρ) =
−log∥ρ∥∞ for all ρ ∈ D(A). Note that this entropy is related to the max relative entropy
via H_min(ρ) = log|A| − Dmax(ρ∥u). From the theorem above, H_min^ε(ρ) = −log∥p^{(ε)}∥∞.
Using the definition of p^{(ε)} in (4.76) we get from (10.142) that
H_min^ε(ρ) = −log(a)
           = log( k / (∥p∥_{(k)} − ε) ) ,        (by (4.81))       (10.145)
where k is the integer satisfying (4.82), which is equivalent to
p_{k+1} < (∥p∥_{(k)} − ε)/k ⩽ p_k .    (10.146)
Alternatively, observe that from (4.87) we can also express H_min^ε(ρ) as
H_min^ε(ρ) = −log max_{ℓ∈[n]} (∥p∥_{(ℓ)} − ε)/ℓ .    (10.147)
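The formula (10.147) is straightforward to evaluate numerically; the following small sketch (not from the book) computes H_min^ε from the eigenvalues of a state, assuming ε < 1 so that the maximum in (10.147) is positive.

```python
# Illustrative implementation of Eq. (10.147): H_min^eps(rho) = -log2 max_l (||p||_(l) - eps)/l,
# where ||p||_(l) is the sum of the l largest eigenvalues of rho.
import numpy as np

def smoothed_hmin(p, eps):
    p = np.sort(np.asarray(p, dtype=float))[::-1]
    vals = (np.cumsum(p) - eps) / np.arange(1, len(p) + 1)   # (||p||_(l) - eps)/l for each l
    return -np.log2(vals.max())

p = [0.5, 0.25, 0.15, 0.1]
print(smoothed_hmin(p, 0.0))    # 1.0, i.e. -log2(max p): no smoothing recovers H_min
print(smoothed_hmin(p, 0.1))    # ~1.32: smoothing increases the min-entropy
```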
It is worth noting that for the case H = Hmax (recall that Hmax(A)_ρ := log Rank(ρ)
for all ρ ∈ D(A)), the definition in (10.141) results in a quantity that always equals log|A|,
since for any ε ∈ (0, 1) and any ρ ∈ D(A) there always exists a full-rank state ρ′ ∈ D(A)
that is ε-close to ρ. Therefore, in this case, instead of taking the maximum in (10.141) we
take the minimum, so that the smoothed version of Hmax is defined as
H_max^ε(A)_ρ := min_{ρ′∈Bε(ρ)} H_max(A)_{ρ′} .    (10.148)
Lemma 10.4.2. Let ε ∈ [0, 1) and ρ ∈ D(A). Then, the ε-smoothed max-entropy is
given by
H_max^ε(A)_ρ = log(m) ,    (10.149)
where m is the integer satisfying ∥ρ∥_{(m−1)} < 1 − ε ⩽ ∥ρ∥_{(m)}.
where we used the notation D_m(A) to denote the set of all density matrices in D(A) whose
rank is not greater than m. In Theorem 5.4.3 we showed that T(ρ, D_m(A)) = 1 − ∥ρ∥_{(m)}.
Substituting this into the equation above we conclude that
H_max^ε(A)_ρ = min{ log m : ∥ρ∥_{(m)} ⩾ 1 − ε } .    (10.151)
This completes the proof.
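As with the min-entropy, this characterization is easy to evaluate; the sketch below (not from the book) computes H_max^ε directly from (10.151).

```python
# Illustrative implementation of Eq. (10.151): H_max^eps = log2(m) for the smallest m with ||p||_(m) >= 1 - eps.
import numpy as np

def smoothed_hmax(p, eps):
    p = np.sort(np.asarray(p, dtype=float))[::-1]
    m = int(np.searchsorted(np.cumsum(p), 1 - eps - 1e-12)) + 1
    return np.log2(m)

p = [0.5, 0.25, 0.15, 0.1]
print(smoothed_hmax(p, 0.0), smoothed_hmax(p, 0.25))   # log2(4) = 2 and log2(2) = 1
```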
It’s crucial to emphasize, however, that this quantity is not necessarily a divergence, given the
optimization uses a maximum instead of a minimum (thus, Lemma 10.4.1 isn’t applicable).
Nevertheless, in this book, we will encounter this function in some applications.
Exercise 10.4.4. Show that for every ρ ∈ D(A) we have
H_max^ε(A)_ρ = log|A| − D_min^{(ε)}(ρ∥u) .    (10.155)
Proof. We first prove the theorem for the classical case. For every ℓ ∈ [m] denote by Probℓ (m)
the set of probability vectors in Prob(m) with at most ℓ non-zero components. Then, by
definition,
D_min^{(ε)}(p∥u^{(m)}) = max_{q∈Bε(p)} D_min(q∥u^{(m)})
    = −min_{q∈Bε(p)} log( |supp(q)| / m )                              (10.157)
    = log m − log min{ ℓ : T(p, Prob_ℓ(m)) ⩽ ε }        (ℓ := |supp(q)|)
    = log m − log min{ ℓ : ∥p∥_{(ℓ)} ⩾ 1 − ε } .        (Theorem 5.4.3)
Hence,
D_min^{(ε)}(p∥u^{(m)}) = log(m/ℓ) ,    (10.158)
where ℓ ∈ [m] is the smallest integer satisfying ∥p∥_{(ℓ)} ⩾ 1 − ε. The above expression coincides
with the lower bound in (8.148) for the case q = u^{(m)}. Hence, D_min^ε(p∥u^{(m)}) ⩾ D_min^{(ε)}(p∥u^{(m)}).
For the case that q has positive rational components (as given in (4.133)) we use Theo-
rem 4.3.2, particularly the relation (p, q) ∼ (r, u(k) ), where r is defined in (4.134) to get
D_min^ε(p∥q) = D_min^ε(r∥u^{(k)})
    ⩾ D_min^{(ε)}(r∥u^{(k)})                          (10.159)
    = max_{r′∈Bε(r)} D_min(r′∥u^{(k)}) .
It is crucial to observe that since D_min^{(ε)} is not a divergence we cannot conclude that
D_min^{(ε)}(r∥u^{(k)}) = D_min^{(ε)}(p∥q). Instead, let C be the set of all vectors r′ ∈ Prob(k) that satisfy
(r′, u^{(k)}) ∼ (p′, q) for some p′ ∈ Bε(p), and observe that C ⊂ Bε(r). Combining this with the
equation above gives
D_min^ε(p∥q) ⩾ max_{r′∈C} D_min(r′∥u^{(k)})
    = max_{p′∈Bε(p)} D_min(p′∥q)                      (10.160)
    = D_min^{(ε)}(p∥q) .
Finally, the case of arbitrary q ∈ Prob(m) follows from the continuity of both D_min^ε and
D_min^{(ε)} in their second argument. This completes the proof for the classical case.
For the quantum case we get from (8.191) that
D_min^ε(ρ∥σ) = sup_{E∈CPTP(A→X)} D_min^ε( E(ρ) ∥ E(σ) )
    ⩾ sup_{E∈CPTP(A→X)} D_min^{(ε)}( E(ρ) ∥ E(σ) )        (from the classical case)
where the suprema are over all classical systems X and POVM channels E ∈ CPTP(A → X)
that take ρ and σ to diagonal density matrices (i.e. probability vectors).
Exercise 10.4.5. Let ρ, σ ∈ D(A) and ε ∈ (0, 1). Show that
D_min^{(ε)}(ρ∥σ) ⩾ D_min( √Λ ρ √Λ / Tr[Λρ] ∥ σ ) ,    (10.162)
for any Λ ∈ Eff(A) that satisfies Tr[Λρ] ⩾ 1 − ε². Hint: Use the gentle measurement lemma
(Lemma 5.4.3).
In the quantum Stein's lemma (Theorem 8.7.3) we saw that the regularization of D_min^ε
yields the Umegaki relative entropy. We now show that the same holds also for D_min^{(ε)}. We
will use this result later on when we discuss the uniqueness of the Umegaki relative entropy.
Theorem 10.4.4. Let ε ∈ (0, 1), and ρ, σ ∈ D(A) with supp(ρ) ⊆ supp(σ). Then,
lim_{n→∞} (1/n) D_min^{(ε)}(ρ^{⊗n}∥σ^{⊗n}) = D(ρ∥σ) .    (10.163)
lim sup_{n→∞} (1/n) D_min^{(ε)}(ρ^{⊗n}∥σ^{⊗n}) ⩽ lim sup_{n→∞} (1/n) D_min^ε(ρ^{⊗n}∥σ^{⊗n})
    = D(ρ∥σ) .        (Theorem 8.7.3)       (10.164)
In order to prove the opposite inequality, we make use of the method of the relative typical
subspace introduced in Sec. 8.3.2. Set ε, δ ∈ (0, 1), let Π_δ^{rel,n} be the projection to the
relative typical subspace given in (8.75), let P_δ^n be the projection to the δ-typical subspace
associated with ρ, and define
ρ_n := Π_δ^{rel,n} P_δ^n ρ^{⊗n} P_δ^n Π_δ^{rel,n} / Tr[ Π_δ^{rel,n} P_δ^n ρ^{⊗n} ] .    (10.165)
… ⩾ 1 − δ1 − δ2 ,        ((8.76), (8.55))
Due to the first property, it follows by the definition (8.75) of the relative typical subspace that
Tr[Π_{ρ_n} σ^{⊗n}] ⩽ 2^{n(Tr[ρ log σ]+δ)} Tr[Π_{ρ_n}]. Combining this with the above equation we conclude
that for sufficiently large n
D_min^{(ε)}(ρ^{⊗n}∥σ^{⊗n}) ⩾ −log( 2^{n(Tr[ρ log σ]+δ)} Tr[Π_{ρ_n}] )
As a simple application of the result above, consider the smoothed max-entropy as defined
in (10.148). Then, from Exercise 10.4.4 and the theorem above we get the following version
of the AEP: for all ε ∈ (0, 1) and all ρ ∈ D(A)
lim_{n→∞} (1/n) H_max^ε(A^n)_{ρ^{⊗n}} = H(A)_ρ ,    (10.171)
where H(A)_ρ is the von Neumann entropy of ρ.
Exercise 10.4.6. Prove this AEP version, and compare it with (10.140) for the case |B| = 1.
∫_{U(A)} dU^A ∥ E^{A→B}(U^A ρ^{AE} U^{A*}) − τ^B ⊗ ρ^E ∥₁ ⩽ 2^{−½(H_min^ε(A|E)_ρ + H_min^ε(A|B)_τ)} + 8ε ,    (10.172)
where U(A) is the group of all unitary matrices acting on A, and ∫_{U(A)} dU denotes
the integral over the Haar measure on U(A).
Proof. This corollary concerns the replacement of the terms involving H₂ in the decoupling
theorem with the smoothed min-entropy. For this purpose, let ρ̃^{AE} and τ̃^{AB} be such
that H_min^ε(A|E)_ρ = H_min(A|E)_{ρ̃} and H_min^ε(A|B)_τ = H_min(A|B)_{τ̃}. Note also that by definition
∥ρ^{AE} − ρ̃^{AE}∥₁ ⩽ 2ε and ∥τ^{AB} − τ̃^{AB}∥₁ ⩽ 2ε. Denoting by Ẽ the CP map whose Choi matrix
is τ̃^{AB}, we get
2^{−½(H_min^ε(A|E)_ρ + H_min^ε(A|B)_τ)} = 2^{−½(H_min(A|E)_{ρ̃} + H_min(A|B)_{τ̃})}
    ⩾ 2^{−½(H₂(A|E)_{ρ̃} + H₂(A|B)_{τ̃})}                                                     (H_min ⩽ H₂)
    ⩾ ∫_{U(A)} dU^A ∥ Ẽ^{A→B}(U^A ρ̃^{AE} U^{A*}) − τ̃^B ⊗ ρ̃^E ∥₁                            (Theorem 7.7.1)       (10.173)
    ⩾ ∫_{U(A)} dU^A ∥ Ẽ^{A→B}(U^A ρ̃^{AE} U^{A*}) − τ^B ⊗ ρ^E ∥₁ − 4ε ,                       (see (10.174) below)
where in the last inequality we used the fact that η := Ẽ (U ρ̃U ∗ ) ∈ Pos(BE) satisfies
Ẽ(Uρ̃U*) = E(UρU*) + E(U(ρ̃ − ρ)U*) + (Ẽ − E)(Uρ̃U*) .    (10.175)
∥ Ẽ(Uρ̃U*) − τ ⊗ ρ ∥₁ ⩾ ∥ E(UρU*) − τ ⊗ ρ ∥₁ − ∥ E(U(ρ̃ − ρ)U*) ∥₁ − ∥ (Ẽ − E)(Uρ̃U*) ∥₁ .    (10.176)
It is therefore left to bound the average of the last two terms over the group U(A). Denote
by η± := (ρ̃AE − ρAE )± and ζ± := (τ̃ AB − τ AB )± . Since ρ̃ and τ̃ are ε-close to ρ and τ ,
respectively, we have Tr[η+ + η− ] ⩽ 2ε and Tr[ζ+ + ζ− ] ⩽ 2ε. Now, denote by N± the
CP maps whose Choi matrices are ζ± , respectively. We then have Ẽ − E = N+ − N− and
ρ̃ − ρ = η+ − η− , so that
∫_{U(A)} dU ∥ E(U(ρ̃ − ρ)U*) ∥₁ = ∫_{U(A)} dU ∥ E(U(η_+ − η_−)U*) ∥₁
    ⩽ ∫_{U(A)} dU Tr[E(Uη_+U*)] + ∫_{U(A)} dU Tr[E(Uη_−U*)]            (triangle inequality)
    = Tr[E(u)] Tr[η_+] + Tr[E(u)] Tr[η_−]                               (∫_{U(A)} dU U^A η_±^{AE} U^{A*} = u^A ⊗ η_±^E)
    = Tr[τ] ( Tr[η_+] + Tr[η_−] )
    ⩽ 2ε .    (10.177)
Similarly,
∫_{U(A)} dU ∥ (Ẽ − E)(UρU*) ∥₁ = ∫_{U(A)} dU ∥ (N_+ − N_−)(UρU*) ∥₁
    ⩽ ∫_{U(A)} dU Tr[N_+(UρU*)] + ∫_{U(A)} dU Tr[N_−(UρU*)]
    = Tr[N_+(u^A) ⊗ ρ^E] + Tr[N_−(u^A) ⊗ ρ^E]
    = Tr[ζ_+ + ζ_−] Tr[ρ^E]
    ⩽ 2ε .    (10.178)
Combining everything we get
2^{−½(H_min^ε(A|E)_ρ + H_min^ε(A|B)_τ)} ⩾ ∫_{U(A)} dU ∥ E(UρU*) − τ ⊗ ρ ∥₁ − 8ε .    (10.179)
G_η(N^{A→A′}(ρ^A)) = sup_{M∈F(A′→B)} Tr[η^B M^{A′→B}(N^{A→A′}(ρ^A))] − c_η
    ⩽ sup_{E∈F(A→B)} Tr[η^B E^{A→B}(ρ^A)] − c_η        (replacing M ∘ N with E)       (10.181)
    = G_η(ρ^A) .
sup_{E∈F(A→B)} Tr[η^B E^{A→B}(σ^A)] = sup_{ω∈F(B)} Tr[η^B ω^B] ,    (10.182)
where ω^B = E^{A→B}(σ^A) ∈ F(B) can be taken to be any free state (by choosing E to
be a replacement channel in F(A → B) that outputs ω^B). Hence, G_η(σ^A) = 0 for all
σ ∈ F(A).
instrument. Then, observe that for any free channel M ∈ F(A′X → B) we have
M^{A′X→B} ∘ N^{A→A′X} = Σ_{x∈[m]} M_x^{A′→B} ∘ N_x^{A→A′} ,    (10.183)
where for every x ∈ [m]
M_x^{A′→B}(ω^{A′}) := M^{A′X→B}(ω^{A′} ⊗ |x⟩⟨x|^X)        ∀ ω ∈ L(A′) .    (10.184)
where σ_x^{A′} := (1/p_x) N_x^{A→A′}(ρ^A) and p_x := Tr[N_x^{A→A′}(ρ^A)]. Combining this with the
On the other hand, since the partial trace is a free operation we get that
G_η(ρ^A) ⩽ G_η(ρ^{AX}) .    (10.188)
Recall that the combination of the monotonicity and normalization properties ensures that
G_η(ρ) ⩾ 0 for all density matrices. Additionally, if we define C_ρ := {E(ρ) : E ∈ F(A → B)},
then the support function of C_ρ in the space of Hermitian matrices Herm(B) is described by:
As we will explore later, this family of resource monotones is complete, meaning that it can
be utilized to fully determine exact interconversions among resources. Furthermore, these
monotones are formulated as conic linear programming problems, and in some QRTs they
reduce to semidefinite programs, which are comparatively simpler to compute.
F(AB) = {u^A ⊗ σ^B : σ ∈ D(B)} .    (10.191)
Therefore, for this QRT, for any η ∈ D(AB′) the coefficient c_η is given by
c_η = sup_{σ∈D(B′)} Tr[η^{AB′}(u^A ⊗ σ^{B′})] = (1/|A|) ∥η^{B′}∥∞ .    (10.192)
Hence, for every η ∈ D(AB′) the function G_η as defined above can be expressed as
G_η(A|B)_ρ := sup_{N∈CMO(AB→AB′)} Tr[η^{AB′} N^{AB→AB′}(ρ^{AB})] − (1/|A|) ∥η^{B′}∥∞ .    (10.193)
We denote by f_η the first term on the right-hand side above. In terms of the Choi matrix of
N, this function can be expressed as
f_η(A|B)_ρ = sup_{J^{ABÃB′}} Tr[ J^{ABÃB′}( ρ^{AB} ⊗ η^{ÃB′} ) ] ,    (10.194)
Exercise 10.5.1. Use the strong duality relation of an SDP to show that the function f_η
can also be expressed as:
f_η(A|B)_ρ = |A| inf_{ξ^{ABÃB′}⩾0} { Tr[ξ^{B′}] : u^{AÃB′} ⊗ ξ^B + Υ(ξ^{ABÃB′}) ⩾ ρ^{AB} ⊗ η^{ÃB′} } .    (10.196)
Exercise 10.5.2. Show that if |A| = |B| then for the maximally entangled state ρ^{AB} = Φ^{AB},
G_η as defined in (10.193) is given by
G_η(A|B)_Φ = ∥η^{AB′}∥∞ − (1/|A|) ∥η^{B′}∥∞ .    (10.197)
Hint: Recall that for all states ρ ∈ D(AB) we have Φ^{AB} ≻_A ρ^{AB}.
Manipulation of Resources
One of the central goals of QRTs is to understand optimal and efficient ways to convert one
resource to another. A resource in this context corresponds to a class of equivalent resource
states. We say that two resource states ρ, σ ∈ D(A) are equivalent if both ρ −F→ σ (i.e. ρ can
be converted to σ by free operations) and σ −F→ ρ. In this chapter we study the conversion of
resources in two regimes: the single-shot regime and the asymptotic regime.
Theorem 11.1.1. Let F be a closed convex QRT, ρ ∈ D(A), and σ ∈ D(B). The
following are equivalent:
1. There exists N ∈ F(A → B) such that σ B = N A→B ρA .
Proof. Let C_ρ := {E^{A→B}(ρ^A) : E ∈ F(A → B)}. Observe that C_ρ is a convex set in Herm(B).
From the hyperplane separation theorem (see Theorem A.1.1), σ ̸∈ C_ρ if and only if there
exists a hyperplane η ∈ Herm(B) that separates them; that is,
Tr[η^B σ^B] > max_{ω∈C_ρ} Tr[η^B ω^B] .    (11.2)
Tr[η^B σ^B] ⩽ max_{ω∈C_ρ} Tr[η^B ω^B]
            = max_{E∈F(A→B)} Tr[η^B E^{A→B}(ρ^A)] .    (11.3)
Note that if the equation above holds for some η ∈ Herm(B) then it also holds if we replace
η B with η B + cI B and vice versa (here c is any real number). Therefore, the equation above
holds for all η ∈ Herm(B) if and only if it holds for all η ∈ Pos(B). Similarly, by dividing
both sides of the equation above by Tr[η] we conclude that σ ∈ Cρ if and only if (11.3) holds
for all density matrices η ∈ D(B).
Now, observe that σ ∈ C_ρ if and only if for all M ∈ F(B → B) we have that M(σ) ∈ C_ρ.
To see this, suppose σ ∈ C_ρ, so that σ^B = E^{A→B}(ρ^A) for some E ∈ F(A → B). Then,
M^{B→B}(σ^B) = M^{B→B} ∘ E^{A→B}(ρ^A), and since M ∘ E ∈ F(A → B) we conclude that M(σ) ∈
C_ρ. Conversely, if M(σ) ∈ C_ρ for all M ∈ F(B → B), then by taking the identity channel
M = id^B ∈ F(B → B) we get immediately that σ ∈ C_ρ.
Finally, from (11.3) we get that for any M ∈ F(B → B) we have M(σ) ∈ C_ρ if and only
if for all η ∈ D(B)
Tr[η^B M^{B→B}(σ^B)] ⩽ max_{E∈F(A→B)} Tr[η^B E^{A→B}(ρ^A)] .
Hence, σ ∈ C_ρ if and only if the above equation holds for all M ∈ F(B → B). Taking
the maximum over all such M ∈ F(B → B) we conclude that σ ∈ C_ρ if and only if for all
η ∈ D(B)
max_{M∈F(B→B)} Tr[η^B M^{B→B}(σ^B)] ⩽ max_{E∈F(A→B)} Tr[η^B E^{A→B}(ρ^A)] .    (11.5)
The proof is concluded by recognizing that the above inequality is equivalent to (11.1).
In general, the theorem above does not provide an efficient way to determine if one
resource can be converted to another by free operations. This is the case even if the resource
monotones Gη themselves can be computed efficiently, as we need to check the conditions
for all η ∈ D(B). Therefore, instead, we can use (11.3) to conclude that ρA can be converted
to σ B by free operations if and only if
min_{η∈D(B)} max_{E∈F(A→B)} Tr[ η^B ( E^{A→B}(ρ^A) − σ^B ) ] ⩾ 0 .    (11.6)
For some QRTs, the optimization problem above is an SDP and therefore can be solved
efficiently.
Theorem 11.1.2. Let σ ∈ D(B) be such that σ ̸∈ F(B) (i.e. σ is a resource state).
Then, the function fσ : D(A) → [0, 1], defined via
f_σ(ρ) := Pr(ρ −F→ σ)        ∀ ρ ∈ D(A) ,    (11.10)
Proof. First observe that from the axiom of free instruments and the fact that σ is a resource
state, we must have fσ (ρ) = 0 for all ρ ∈ F(A). Next, we show that fσ is a resource measure.
Let N ∈ F(A → C) be a free channel. Let M ∈ F⩽ (C → B) be an optimal free instrument
satisfying
Pr(N(ρ) −F→ σ) = Tr[M(N(ρ))] .    (11.11)
Define E := M ∘ N, and observe that E ∈ F_⩽(A → B) and Pr(N(ρ) −F→ σ) = Tr[E(ρ)].
Hence, from the definition of Pr(ρ −F→ σ) in (11.9) we get
Pr(ρ −F→ σ) ⩾ Pr(N(ρ) −F→ σ) .    (11.12)
By definition, this is equivalent to fσ (ρ) ⩾ fσ N (ρ) . We therefore established that fσ is a
resource measure.
To prove strong monotonicity, let N ∈ F(A → CY) and denote
τ^{CY} := N^{A→CY}(ρ^A) = Σ_{y∈[n]} t_y τ_y^C ⊗ |y⟩⟨y|^Y ,    (11.13)
where each τ_y ∈ D(C) and {t_y}_{y∈[n]} is a probability distribution. From the monotonicity of
f_σ under free channels (in particular, under N) we get
f_σ(ρ^A) ⩾ f_σ(τ^{CY}) = Pr( Σ_{y∈[n]} t_y τ_y^C ⊗ |y⟩⟨y|^Y −F→ σ^B ) .    (11.14)
Let E^{(y)} ∈ F_⩽(C → B) be an optimal trace non-increasing CP map such that Pr(τ_y^C −F→ σ^B) =
Tr[E^{(y)}(τ_y)] and E^{(y)}(τ_y) is proportional to σ. We also define M ∈ F_⩽(CY → B) as:
In Exercise 11.1.1 you show that MCY →B is indeed an element of F⩽ (CY → B). By
Now, let M be a resource measure that satisfies the strong monotonicity property. Then, by
definition M satisfies
M(ρ^A) ⩾ p_1 M(σ^B) + Σ_{x=2}^{|X|} p_x M(ω_x^B) ⩾ p_1 M(σ^B) .    (11.20)
In other words, the probability p_1 to convert ρ^A to σ^B cannot exceed the ratio M(ρ^A)/M(σ^B).
Since this is true for all resource measures that satisfy the strong monotonicity property,
we get that
Pr(ρ^A −F→ σ^B) ⩽ inf_M  M(ρ^A)/M(σ^B) ,    (11.21)
where the infimum is over all resource measures, M, that satisfy the strong monotonicity
property. Moreover, from the theorem above, for a fixed σ, the function M_σ(ω^A) := Pr(ω^A −F→ σ^B)
is itself a resource measure that satisfies the strong monotonicity property. Hence,
inf_M  M(ρ^A)/M(σ^B) ⩽ M_σ(ρ^A)/M_σ(σ^B) = M_σ(ρ^A) = Pr(ρ^A −F→ σ^B) .    (11.22)
Pr(ρ^A −F→ σ^B) = inf_M  M(ρ^A)/M(σ^B) ,    (11.23)
where the infimum is over all resource measures, M, that satisfy the strong
monotonicity property.
It is evident that the conversion distance is zero if ρ −F→ σ is achievable. However, deterministic
conversion from ρ to σ is often not feasible, raising the question of how closely σ can
istic conversion from ρ to σ is often not feasible, raising the question of how closely σ can
be approximated by applying free operations to ρ. As such, conversion distance not only
provides a meaningful way to evaluate the efficiency of these conversions but, as the following
lemma demonstrates, also serves as a resource measure in its own right.
Exercise 11.1.3. Let F be a QRT, ρ, σ ∈ D(A), ε ∈ [0, 1], and k ∈ N. Show that if
T(ρ −F→ σ) ⩽ ε then
T(ρ^{⊗k} −F→ σ^{⊗k}) ⩽ kε .    (11.28)
The subsequent lemma highlights an additional property of the conversion distance: small
changes in ρ result in only minor variations in the conversion distance. This property un-
derscores the stability of the conversion distance measure against slight perturbations in the
resource state.
Lemma 11.1.2. Let ε ∈ (0, 1), ρ ∈ D(A), σ ∈ D(B), and ρ̃ ∈ Bε(ρ). Then,
| T(ρ −F→ σ) − T(ρ̃ −F→ σ) | ⩽ ε .    (11.29)
Definition 11.1.1. The sequence of resource states {Φm }m∈N is called a golden unit
if for all m, n ∈ N the following two conditions hold:
1. Φn ⊗ Φm ∼ Φnm .
2. If n ⩾ m then Φ_n −F→ Φ_m .
A golden unit can be used as a scale to measure the resourcefulness of a given state
ρ ∈ D(A). There are two distinct ways to do that.
Definition 11.1.2. Let ρ ∈ D(A), {Φm }m∈N be a golden unit, and ε ∈ (0, 1).
That is, the ε-single-shot cost can be seen as the smoothed version of its zero-error
counterpart. Why does something similar not hold for Distill^ε(ρ)?
In some resource theories there exists a golden unit {Φ_m}_{m∈N} with the property that
κ := max_{ω∈D(A)} D(ω∥F) = D(Φ_m∥F) = log(m) .    (11.35)
In such QRTs, one can use the asymptotic continuity of the relative entropy of a resource
to obtain an upper bound on the single-shot ε-distillable resource of a resource state
ρ ∈ D(A). Specifically, let m ∈ N be such that T(ρ −F→ Φ_m) ⩽ ε. Then, for such m there
exists σ ∈ D(A′) with m := |A′| such that ρ −F→ σ and σ ≈_ε Φ_m. Now, since the relative
entropy of a resource is a resource monotone we get
D(ρ∥F) ⩾ D(σ∥F)
        ⩾ D(Φ_m∥F) − εκ − (1 + ε) h( ε/(1 + ε) ) .        (by (10.34))       (11.36)
In many resource theories there exists a golden unit {Φ_m}_{m∈N} with the property that κ =
D(Φ_m∥F) = log(m). Therefore, for such resource theories the inequality above takes the
form
(1 − ε) log(m) ⩽ D(ρ∥F) + (1 + ε) h( ε/(1 + ε) ) .    (11.37)
Since m was an arbitrary integer satisfying T(ρ −F→ Φ_m) ⩽ ε, the inequality above implies
that
Distill^ε(ρ) ⩽ (1/(1 − ε)) D(ρ∥F) + ((1 + ε)/(1 − ε)) h( ε/(1 + ε) ) .    (11.38)
Note that for a small ε > 0 the upper bound is close to the relative entropy of a resource.
Recall also that for any ε > 0 the smoothed version of the logarithmic robustness is defined
as
D_max^ε(ρ∥F) := min_{ρ′∈Bε(ρ)} D_max(ρ′∥F) ,    (11.40)
From the exercise below it follows that Dreg (ρ∥F) is well defined since the limit on the
right-hand side of the equation above exists.
Exercise 11.2.1.
1. Show that for any sequence of real numbers {a_n}_{n=1}^∞ that is sub-additive, i.e. a_{n+m} ⩽
a_n + a_m, the limit lim_{n→∞} a_n/n exists. Hint: See the hint given in Exercise 6.4.2.
2. Show that the limit of the sequence {a_n}, with a_n := (1/n) D(ρ^{⊗n}∥F), exists.
Remark. At first glance it may not be very clear why the theorem above corresponds to
the AEP property. Therefore, after the proof we will give examples demonstrating that
for different choices of F(A), the above theorem reduces to the various variants of the AEP
studied in literature. In other words, the above theorem unifies all the variants of AEP into
a single formula.
We divide the proof into two lemmas.
Combining this with the inequality (10.78), we then get for all α ∈ (1, 2)
lim sup_{n→∞} (1/n) D_max^ε(ρ^{⊗n}∥F) ⩽ lim sup_{n→∞} (1/n) D_α(ρ^{⊗n}∥F) = D_α^{reg}(ρ∥F) ,    (11.45)
where D_α(·∥F) is the α-relative entropy of a resource as defined in (10.71). Finally, since the
above inequality holds for all α ∈ (1, 2) we conclude that for all ε ∈ (0, 1)
lim sup_{n→∞} (1/n) D_max^ε(ρ^{⊗n}∥F) ⩽ lim_{α→1+} D_α^{reg}(ρ∥F)
    = D^{reg}(ρ∥F) .        (Lemma 10.2.4)       (11.46)
This completes the proof.
Note that in the conjecture above we removed the limit ε → 0+ that appears in Theorem 11.2.1.
One may be able to prove the conjecture above by first showing that
D_{1−}^{reg}(ρ∥F) := lim_{α→1−} D_α^{reg}(ρ∥F)    (11.53)
is equal to D^{reg}(ρ∥F). From the next lemma we get that this continuity conjecture of
D_α^{reg}(ρ∥F) at α = 1, if true, would imply the strong AEP.
Proof. Recall from Lemma 10.4.1 that for any ε1, ε2 ∈ (0, 1) with ε1 + ε2 < 1 we have
(cf. (10.116))
D_min^{ε1}(ρ∥σ) ⩽ D_max^{ε2}(ρ∥σ) − log(1 − ε1 − ε2) .    (11.55)
Then, from the definitions it follows that for any such ε1, ε2 ∈ (0, 1) with ε1 + ε2 < 1 we have
D_min^{ε1}(ρ∥F) ⩽ D_max^{ε2}(ρ∥F) − log(1 − ε1 − ε2) ,    (11.56)
so that
lim inf_{n→∞} (1/n) D_max^{ε2}(ρ^{⊗n}∥F) ⩾ lim inf_{n→∞} (1/n) D_min^{ε1}(ρ^{⊗n}∥F)
    ⩾ D_{1−}^{reg}(ρ∥F) .        (by (10.82))       (11.57)
Hence, if D_{1−}^{reg}(ρ∥F) = D^{reg}(ρ∥F) then the strong AEP holds. In other words, the conjecture
that the function α ↦ D_α^{reg}(ρ∥σ) is continuous at α = 1 is stronger than the conjecture of
the strong AEP.
However, in this case the inequality above still holds even if we remove the limit over ε.
Indeed, let ε ∈ (0, 1) and let δ > 0 be such that ε + δ < 1. Observe that from (10.116) we have
lim inf_{n→∞} (1/n) D_max^ε(ρ^{⊗n}∥σ^{⊗n}) ⩾ lim inf_{n→∞} (1/n) D_min^{1−ε−δ}(ρ^{⊗n}∥σ^{⊗n})
    = D(ρ∥σ) .        (by (8.211))       (11.60)
Combining this with the general result of Lemma 11.43 we arrive at the following stronger
result for the special case that F(An ) = {σ ⊗n }.
To see how this corollary relates to the AEP discussed in Sec. 8.1.1, take σ^A = u^A and
observe that D(ρ∥u) = log|A| − H(ρ) and D_max^ε(ρ∥u) = log|A| − H_min^ε(ρ), where
H_min^ε(ρ) := max_{ρ′∈Bε(ρ)} H_min(ρ′)    (11.62)
is known as the smoothed min-entropy of ρ. Therefore, the corollary above implies in
particular that for any ε ∈ (0, 1)
lim_{n→∞} (1/n) H_min^ε(ρ^{⊗n}) = H(ρ) .    (11.63)
Recall that H_min(ρ) = −log λ_max(ρ), so the above equation states that for any ε > 0
and sufficiently large n, there exists a state ρ′_n ∈ D(A^n) that is ε-close to ρ^{⊗n} and for which
λ_max(ρ′_n) ≈ 2^{−nH(ρ)}. In other words, only a small perturbation is needed to make all of the
eigenvalues of ρ^{⊗n} bounded from above by (approximately) 2^{−nH(ρ)}.
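A small numerical illustration of this statement (not from the book): using the closed formula (10.147) for the smoothed min-entropy, the rate (1/n) H_min^ε(ρ^{⊗n}) for a qubit with eigenvalues (0.9, 0.1) slowly climbs from H_min(ρ) ≈ 0.152 towards H(ρ) ≈ 0.469 as n grows.

```python
# Illustrative only: the smoothed min-entropy rate of rho^(tensor n) approaches H(rho), Eq. (11.63).
import numpy as np

def smoothed_hmin(p, eps):
    p = np.sort(np.asarray(p, dtype=float))[::-1]
    vals = (np.cumsum(p) - eps) / np.arange(1, len(p) + 1)
    return -np.log2(vals.max())

p, eps = np.array([0.9, 0.1]), 0.05
H = float(-(p * np.log2(p)).sum())                # von Neumann entropy of the qubit state
for n in (4, 8, 12, 16):
    eigs = np.array([1.0])
    for _ in range(n):
        eigs = np.outer(eigs, p).ravel()          # eigenvalues of rho^(tensor n)
    print(n, round(smoothed_hmin(eigs, eps) / n, 3), "vs H =", round(H, 3))
```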
The second example we consider here is a variant of the AEP involving the conditional
entropy. In this variant, we take the set of free states to be
F(AB) := {u^A ⊗ ρ^B : ρ ∈ D(B)} .    (11.64)
= min_{σ∈D(B)} D(ρ^{AB}∥u^A ⊗ σ^B)
= log|A| − H(A|B)_ρ                               (by (7.132))       (11.65)
= D^{reg}(ρ^{AB}∥F) .                             (additivity of the conditional entropy)
Similarly,
D_max^ε(ρ^{AB}∥F) := min_{σ∈F(AB), ρ̃∈Bε(ρ)} D_max(ρ̃^{AB}∥σ^{AB})
H(A|B)_ρ = lim_{ε→0+} lim inf_{n→∞} (1/n) H_min^ε(A^n|B^n)_{ρ^{⊗n}} = lim_{ε→0+} lim sup_{n→∞} (1/n) H_min^ε(A^n|B^n)_{ρ^{⊗n}} .    (11.67)
One of the open problems in the field is whether or not the following generalization of the
quantum Stein’s lemma holds.
Recall from Lemma 10.4.1 that for any ε1, ε2 ∈ (0, 1) with ε1 + ε2 < 1 we have (cf. (10.116))
D_min^{ε1}(ρ∥σ) ⩽ D_max^{ε2}(ρ∥σ) − log(1 − ε1 − ε2) .    (11.71)
Then, from the definitions it follows that for any such ε1, ε2 ∈ (0, 1) with ε1 + ε2 < 1 we have
D_min^{ε1}(ρ∥F) ⩽ D_max^{ε2}(ρ∥F) − log(1 − ε1 − ε2) ,    (11.72)
so that
lim sup_{n→∞} (1/n) D_min^{ε1}(ρ^{⊗n}∥F) ⩽ lim sup_{n→∞} (1/n) D_max^{ε2}(ρ^{⊗n}∥F)
    ⩽ D^{reg}(ρ∥F) .        (Lemma 11.43)       (11.73)
This provides a proof for the strong converse of the conjecture above (note that we already
showed it in (10.85) using a different approach). However, a proof for the direct part is
unknown at the time of writing this book.
Equivalence of Conjectures
Theorem 11.3.1. Let F be a QRT and ρ ∈ D(A). Then the following two
statements are equivalent:
Proof. Recall the relation (10.118) from Lemma 10.4.1. This relation implies that
for any ε ∈ (0, 1) we have
D_min^ε(ρ∥F) ⩾ D_max^{√(1−ε²)}(ρ∥F) + log( ε/(1 − ε) ) .    (11.74)
Therefore, if Eq. (11.52) holds for all ε ∈ (0, 1) we get from the above equation that
lim inf_{n→∞} (1/n) D_min^ε(ρ^{⊗n}∥F) ⩾ D^{reg}(ρ∥F) .    (11.76)
Combining this with (11.73) gives (11.70).
Conversely, suppose (11.70) holds. Then, from (11.72) it follows that
lim inf_{n→∞} (1/n) D_max^{ε2}(ρ^{⊗n}∥F) ⩾ lim inf_{n→∞} (1/n) D_min^{ε1}(ρ^{⊗n}∥F)
    = D^{reg}(ρ∥F) .        (assuming (11.70) holds)       (11.77)
Since the above inequality holds for all ε2 ∈ (0, 1), by combining it with Lemma 11.43 we
get that (11.52) must hold for all ε ∈ (0, 1). This completes the proof.
We now give an example in which the generalized quantum Stein’s lemma does hold.
Consider the set of free states defined in (11.64); i.e.,
F(AB) := {u^A ⊗ ρ^B : ρ ∈ D(B)} .    (11.79)
Since H_α^↑ is additive under tensor products (see Theorem 7.5.1 and Exercise 7.5.4) we conclude
that D_α(ρ∥F) = D_α^{reg}(ρ∥F). Combining this with (11.81) gives
lim inf_{n→∞} (1/n) D_min^ε(ρ^{⊗n}∥F) ⩾ lim_{α→1−} D_α(ρ∥F)
    = D(ρ∥F) .    (11.83)
Finally, combining this with (11.73) we get that Conjecture 11.3.1 does hold for the set of
free states defined in (11.79). We summarize it in the following theorem.
Theorem 11.3.2. Let F(AB) be the set given in (11.79). Then, for all ρ ∈ D(AB)
and all ε ∈ (0, 1) we have
lim_{n→∞} (1/n) D_min^ε(ρ^{⊗n}∥F) = D(ρ∥F) .    (11.84)
Observe that the combination of the above theorem with Theorem 11.3.1 implies that
for F(AB) as in (11.79) we also have
lim_{n→∞} (1/n) D_max^ε(ρ^{⊗n}∥F) = D(ρ∥F) .    (11.85)
The above relation is equivalent to (11.68). Hence, Theorem 11.2.2 can be viewed as a
corollary of Theorem 11.3.2.
This quantum relative entropy plays a key role in numerous applications in quantum in-
formation theory and beyond. We already saw in the quantum Stein’s lemma that it can
be interpreted as the optimal decay rate of the type-II error exponent. Among all relative
entropies, it is the most well known, and in this section we show that the Umegaki rela-
tive entropy can be singled out as the only quantum relative entropy that is asymptotically
continuous. We will see later on in the book that this is the key reason of its “popularity”.
Following Definition 10.2.2, we say that a relative entropy D is asymptotically continuous
if there exists a continuous function f : [0, 1] → R+ such that f (0) = 0 and for all ρ, ρ′ , σ ∈
D(A), with supp(ρ) ⊆ supp(σ) and supp(ρ′ ) ⊆ supp(σ)
|D(ρ∥σ) − D(ρ′ ∥σ)| ⩽ f (ε) log ∥σ −1 ∥∞ (11.87)
where ε := 21 ∥ρ − ρ′ ∥1 , and σ −1 is the generalized inverse of σ. We emphasize that f is
independent of |A|.
Recall from Corollary 10.2.1 that the Umegaki relative entropy is shown to be asymp-
totically continuous. The theorem we are discussing asserts that no other relative entropy
possesses this property of asymptotic continuity. To substantiate this claim, we will utilize
the following lemma, which introduces a notation for any relative entropy D:
D^{(ε)}(ρ∥σ) := max_{ρ′∈Bε(ρ)} D(ρ′∥σ) .    (11.88)
In other words, D(ε) represents a form of smoothing, albeit using the maximum rather than
the minimum over all states that are ε-close to ρ. Consequently, unlike Dε , the function
D(ε) does not qualify as a divergence. It’s also worth noting that this specific notation was
previously used in the context of the min relative entropy in Theorem 10.4.4. Both the
lemma and this notation are crucial for our proof, as they facilitate the examination of how
relative entropies respond to minor perturbations in the state ρ.
Proof. Let ε ∈ (0, 1) and for each n ∈ N let ρ′_n ∈ D(A^n) be such that ½∥ρ′_n − ρ^{⊗n}∥₁ ⩽ ε.
Then, applying (11.87) to n copies of ρ and σ gives
| D(ρ∥σ) − (1/n) D(ρ′_n∥σ^{⊗n}) | ⩽ f(ε) log∥σ^{-1}∥∞ .    (11.90)
Therefore, taking the lim inf_{n→∞} or lim sup_{n→∞} on both sides of the equation above,
followed by lim_{ε→0+}, completes the proof.
Proof of Theorem 11.4.1. Since the lemma above states that (11.87) implies (11.89), it is
sufficient to prove that the Umegaki relative entropy is the only relative entropy that satis-
fies (11.89). Let D(ρ∥σ) be a relative entropy satisfying (11.89). Therefore,
D(ρ∥σ) = lim_{ε→0+} lim inf_{n→∞} (1/n) D^ε(ρ^{⊗n}∥σ^{⊗n})
       ⩽ lim_{ε→0+} lim inf_{n→∞} (1/n) D_max^ε(ρ^{⊗n}∥σ^{⊗n})        (by (6.113))       (11.91)
Conversely,
D(ρ∥σ) = lim_{ε→0+} lim sup_{n→∞} (1/n) D^{(ε)}(ρ^{⊗n}∥σ^{⊗n})
       ⩾ lim_{ε→0+} lim sup_{n→∞} (1/n) D_min^{(ε)}(ρ^{⊗n}∥σ^{⊗n})     (by (6.113))       (11.92)
       = D(ρ∥σ) .                                                       (Theorem 10.4.4)
Remark. The two definitions above are not independent of each other. Specifically, observe
that
Distill(ρ → σ) = 1 / Cost(ρ → σ) .    (11.95)
This relationship is consistent with the intuition that if ρ is a free state and σ is a resource
state, then Cost(ρ → σ) is equal to infinity, while Distill(ρ → σ) equals zero. This is because,
in the former case, no matter how many copies of ρ you have, they are insufficient to prepare
even a single copy of σ. In the latter case, it is impossible to distill or extract a resource
state σ from a free state ρ.
In the above definitions of cost and distillation, we did not impose any constraints on the
integers m and n. However, as intuition suggests, it is typically the case that m and n are
both very large. In fact, for any natural number a, we can include the condition n, m ⩾ a
in the aforementioned definitions without altering their value. Specifically, we argue that:
Cost(ρ → σ) = lim_{ε→0+} inf_{n,m∈N, n,m⩾a} { n/m : T(ρ^{⊗n} −F→ σ^{⊗m}) ⩽ ε } ,    (11.96)
(and similarly we can add n, m ⩾ a to Distill(ρ → σ)). To see why, observe first that the
left-hand side of the equation above cannot be greater than the right-hand side, since by
adding the restriction n, m ⩾ a one can only increase the infimum. To prove that we must
have equality, recall from Exercise 11.1.3 that for any such a ∈ N, if T(ρ^{⊗n} −F→ σ^{⊗m}) ⩽ ε
then T(ρ^{⊗na} −F→ σ^{⊗ma}) ⩽ aε. Therefore,
Cost(ρ → σ) ⩾ lim_{ε→0+} inf_{n,m∈N} { n/m : T(ρ^{⊗na} −F→ σ^{⊗ma}) ⩽ aε }
    ⩾ lim_{ε→0+} inf_{n′,m′∈N, n′,m′⩾a} { n′/m′ : T(ρ^{⊗n′} −F→ σ^{⊗m′}) ⩽ aε }        (replacing na, ma with n′, m′)       (11.97)
    = lim_{ε′→0+} inf_{n′,m′∈N, n′,m′⩾a} { n′/m′ : T(ρ^{⊗n′} −F→ σ^{⊗m′}) ⩽ ε′ }        (ε′ := aε)
Exercise 11.5.1. Consider the asymptotic cost and distillable rates defined above.
1. Show that for a fixed resource state σ ∈ D(B), the function fσ (ρ) := Distill(ρ → σ) is
a resource measure.
2. Show that for a fixed resource state ρ ∈ D(B), the function gρ (σ) := Cost(ρ → σ) is a
resource measure.
Exercise 11.5.2. Let T ′ be another metric that is topologically equivalent to the trace dis-
tance T (i.e., there exists a, b > 0 such that aT ⩽ T ′ ⩽ bT ). Further, for every ρ ∈ D(A)
and σ ∈ D(B) let Distill′ (ρ → σ) and Cost′ (ρ → σ) be the distillation and cost rates obtained
by replacing the trace distance in (11.93) and (11.94)with the metric T ′ . Show that for all
ρ ∈ D(A) and all σ ∈ D(B)
Note that due to (11.95) the condition that a QRT is reversible can also be expressed as
or as
Cost(ρ → σ) Cost(σ → ρ) = 1 .    (11.101)
Distill(ρ → σ) ⩽ D^{reg}(ρ∥F) / D^{reg}(σ∥F) .    (11.102)
Remark. The theorem above can also be expressed in terms of the asymptotic cost rate.
Specifically, we have the bound
Cost(ρ → σ) ⩾ D^{reg}(σ∥F) / D^{reg}(ρ∥F) ,    (11.103)
⩾ D(σ^{⊗m_n}∥F) − ε_n κ_n − (1 + ε_n) h( ε_n/(1 + ε_n) ) ,        (by (10.34))       (11.105)
where κ_n := max_{ω∈D(B^{m_n})} D(ω∥F). Dividing both sides by n and taking the limit n → ∞
yields
D^{reg}(ρ∥F) ⩾ lim_{n→∞} (m_n/n)(1/m_n) D(σ^{⊗m_n}∥F) = Distill(ρ → σ) D^{reg}(σ∥F) ,    (11.106)
Distill(ρ → σ) ⩽ D^{reg}(ρ∥F)/D^{reg}(σ∥F) ⩽ Cost(σ → ρ) .    (11.108)
Hence, if F is reversible then both of the inequalities above must be equalities. This completes
the proof.
of Φ2 ) that can be distilled from each copy of ρ. For this reason the quantity Distill(ρ → Φ2 )
is called the distillable resource of ρ, and denoted by
Distill(ρ) := Distill(ρ → Φ2 ) . (11.109)
Conversely, one can use the asymptotic cost rate to quantify the cost (in resource units Φ2 )
of a resource state ρ. Specifically, the quantity
Cost(ρ) := Cost(Φ2 → ρ) (11.110)
quantifies the cost in resource units (i.e. copies of Φ2 ) that are needed to prepare each copy of
ρ. The asymptotic cost and distillation of a resource are related to their single-shot versions
as follows.
Proof. We prove the first equality and leave the second one to Exercise 11.5.5. By definition,
inf_{n∈N} (1/n) Cost^ε(ρ^{⊗n}) = inf { (log m)/n : T(Φ_m −F→ ρ^{⊗n}) ⩽ ε , n, m ∈ N }
    ⩽ inf { k/n : T(Φ_{2^k} −F→ ρ^{⊗n}) ⩽ ε , n, k ∈ N }        (restricting m = 2^k)       (11.113)
    = inf { k/n : T(Φ_2^{⊗k} −F→ ρ^{⊗n}) ⩽ ε , n, k ∈ N } .     (property of a golden unit)
Hence,
Cost(ρ) ⩾ lim_{ε→0+} inf_{n∈N} (1/n) Cost^ε(ρ^{⊗n})
        = lim_{ε→0+} inf_{a⩽n∈N} (1/n) Cost^ε(ρ^{⊗n})        ∀ a ∈ N .        (see Exercise 11.5.4 below)       (11.114)
Conversely, let ε, δ ∈ (0, 1) and let a ∈ N be large enough so that 1/a < δ. Then,
lim inf_{n→∞} (1/n) Cost^ε(ρ^{⊗n}) ⩾ inf_{a⩽n∈N} (1/n) Cost^ε(ρ^{⊗n})
    = inf_{n,m∈N, n⩾a} { (log m)/n : T(Φ_m −F→ ρ^{⊗n}) ⩽ ε } ,        (by definition)       (11.116)
where we used the fact that m ⩽ 2^k, so that Φ_{2^k} −F→ Φ_m and consequently T(Φ_{2^k} −F→ ρ^{⊗n}) ⩽
T(Φ_m −F→ ρ^{⊗n}). Now, observe that for n ⩾ a we have (k − 1)/n ⩾ k/n − δ, so that
lim_{ε→0+} lim inf_{n→∞} (1/n) Cost^ε(ρ^{⊗n}) ⩾ lim_{ε→0+} inf_{n,k∈N} { k/n : T(Φ_{2^k} −F→ ρ^{⊗n}) ⩽ ε } − δ
    = Cost(ρ) − δ .    (11.118)
Since the above inequality holds for all δ ∈ (0, 1) we conclude that
lim_{ε→0+} lim inf_{n→∞} (1/n) Cost^ε(ρ^{⊗n}) ⩾ Cost(ρ) .    (11.119)
The two inequalities (11.115) and (11.119) then give the desired equality (11.111).
(1/m) Cost(ρ^{⊗m}) ⩾ Cost(ρ)    and    (1/m) Distill(ρ^{⊗m}) ⩽ Distill(ρ) .    (11.120)
Exercise 11.5.4. Let ρ ∈ D(A). Show that for any a ∈ N we have
lim_{ε→0+} inf_{n∈N} (1/n) Cost^ε(ρ^{⊗n}) = lim_{ε→0+} inf_{a⩽n∈N} (1/n) Cost^ε(ρ^{⊗n}) .    (11.121)
Hint: Use similar arguments to those used to prove the equality in (11.96).
In many QRTs it is possible to choose the golden unit such that DFreg (Φ2 ) = 1. With this
normalization we get from Theorem 11.5.1 that
Particularly, if the QRT F is reversible then both the asymptotic cost and the asymptotic
distillation of the resource ρ equals Dreg (ρ∥F). Therefore, for reversible QRTs, the regularized
relative entropy of a resource is the unique measure of a resource in the asymptotic domain.
We make this statement rigorous in the following corollary.
Corollary 11.5.1. Let F be a reversible QRT with a golden unit {Φk }k∈N such that
D(Φ2 ∥F) = 1, and let M be a resource measure that is asymptotically continuous
and normalized such that M(Φ2 ) = 1. Then,
Exercise 11.5.6. Prove the corollary above. Hint: Follow all the lines leading to (11.122),
but with M replacing everywhere D(·∥F).
Definition 11.5.3. Let δ ∈ [0, 1] and F be a QRT. We say that a quantum channel
N ∈ CPTP(A → B) is RNGδ if it belongs to the set
RNGδ(A → B) := { E ∈ CPTP(A → B) : R_g(E(σ)) ⩽ δ  ∀ σ ∈ F(A) } ,    (11.124)
We used the global robustness in the definition above since it is a resource monotone
that is faithful (see Exercise 10.2.7), so the inequality R_g(E(σ)) ⩽ δ implies that E(σ) is
close to a free state. Specifically, suppose µ := R_g(E(σ)) ⩽ δ. Then, from (10.56) it follows
that
E(σ) = (1 + µ)τ − µω    (11.125)
for some τ ∈ F(B) and ω ∈ D(B). Hence, from the above equality we get
½∥E(σ) − τ∥₁ = ½∥µ(τ − ω)∥₁ ⩽ µ ⩽ δ .    (11.126)
In other words, if R_g(E(σ)) ⩽ δ then E(σ) is δ-close to a free state.
Definition 11.5.4. Let F be a QRT, and for each n ∈ N let An and Bn be two
physical systems. A sequence of quantum channel {En }n∈N , with
En ∈ CPTP(An → Bn ), is said to be asymptotically RNG if there exists a sequence of
non-negative real numbers {δn }n∈N with limn→∞ δn = 0 such that for each n ∈ N,
En ∈ RNGδn (An → Bn ).
Note that in the definition above we do not specify how quickly δ_n goes to zero. The
main result of this section would not be affected even if we required in addition that δ_n goes to
zero exponentially fast with n. However, to keep the notion of asymptotically RNG in its
full generality we did not include such a condition in the definition above.
That is, the sequence {mn }n∈N is such that ρ⊗n can be converted by RNGδn to a state that
is ε-close to σ ⊗mn . Hence, the set Rε (ρ → σ) ⊂ R+ consists of all achievable conversion
rates under asymptotically RNG that tolerate an ε-error. To get the optimal distillable rate
we will have to take the limit ε → 0+ .
Exercise 11.5.7. Let ρ ∈ D(A), σ ∈ D(B), and
rε := sup r : r ∈ Rε (ρ → σ) . (11.128)
1. Show that Rε (ρ → σ) = [0, rε ]; in particular, show that the supremum in the definition
of rε can be replaced with a maximum.
Definition 11.5.5. Let F be a QRT, ρ ∈ D(A) and σ ∈ D(B). Using the notation
given in (11.128) of the exercise above, the asymptotically RNG distillable rate is
defined as
Distill(ρ → σ) := lim+ rε . (11.129)
ε→0
The condition (11.127) implies that there exists En ∈ RNGδn (An → B mn ) such that
En (ρ⊗n ) ≈ε σ ⊗mn . Moreover, the condition that limn→∞ δn = 0 implies that this sequence
of channels {En } is asymptotically RNG. Therefore, for a given ε ∈ (0, 1), we get that
ρ⊗n can be converted by RNGδn to σ ⊗mn up to an ε-error. The reason that we require
lim_{n→∞} m_n/n = r instead of just sup{m_n/n} = r is that the supremum can be achieved with a
finite n, in which case δ_n may not be very small. Taking the limit n → ∞ ensures that the
conversion ρ^{⊗n} → σ^{⊗m_n} under RNG_{δ_n} (up to an ε-error) is achieved with a very small δ_n.
Exercise 11.5.8. Show that if in the definition of R_ε(ρ → σ) we require sup{m_n/n} = r
instead of lim_{n→∞} m_n/n = r, then we get Distill(ρ → σ) = ∞. Hint: Let n_0 be a large
integer and take δ_n = n_0 for n ⩽ n_0 and δ_n = 0 if n > n_0.
For simplicity of the notation, we did not include a subscript in Distill(ρ → σ) to indi-
cate that the asymptotic distillable rate is calculated with respect to asymptotically RNG
operations. Similarly, we denote by Cost(ρ → σ) = 1/Distill(ρ → σ) the asymptotic cost
rate of ρ into σ under asymptotic RNG operations.
Towards Reversibility
In this book, we will restrict our attention to QRTs that meet the following condition:
κ(A) := lim sup_{n→∞} (1/n) max_{ω∈D(A^n)} D(ω∥F) < ∞ .    (11.130)
It’s worth noting that this assumption is extremely lenient and is fulfilled by the majority, if
not all, of the QRTs discussed in the existing literature. In fact, for many QRTs κ(A) = 0.
Theorem 11.5.2. For any ρ ∈ D(A) and σ ∈ D(B), the asymptotic distillable rate
of ρ into σ under asymptotically RNG operations is bounded by
Distill(ρ → σ) ⩽ D^{reg}(ρ∥F) / D^{reg}(σ∥F) .    (11.131)
Remark. The theorem above does not follow from Theorem 11.5.1 since Distill(ρ → σ) is
calculated with respect to asymptotically RNG operations. Since these operations allow
for the generation of a resource (although small amount that vanishes asymptotically), the
proof of Theorem 11.5.1 cannot be applied directly, and a revised version is necessary to
accommodate this case.
Distill(ρ → σ) > D^{reg}(ρ∥F)/D^{reg}(σ∥F) + 2δ    (11.132)
for some small positive δ. By definition, this means in particular that for sufficiently small
ε ∈ (0, 1) there exists r ∈ R_ε(ρ → σ) such that
r > D^{reg}(ρ∥F)/D^{reg}(σ∥F) + δ .    (11.133)
Since r ∈ R_ε(ρ → σ) there exists a sequence {m_n}_{n∈N} ⊂ N satisfying both r = lim_{n→∞} m_n/n
and (11.127). From (11.127) it follows that there exists E_n ∈ RNG_{δ_n}(A^n → B^{m_n}) such that
E_n(ρ^{⊗n}) ≈_ε σ^{⊗m_n} ,    (11.134)
where
c_n := max_{ω∈D(B^{m_n})} D(ω∥F) .    (11.136)
D^{reg}(σ∥F) ⩽ lim_{n→∞} (1/m_n) D(E_n(ρ^{⊗n})∥F) + κ(B)ε .    (11.137)
D(ρ^{⊗n}∥ω_n) = D(ρ^{⊗n}∥F) .    (11.138)
= (1/r) D^{reg}(ρ∥F) + κ(B)ε .        (lim_{n→∞} m_n/n = r)
However, since r > D^{reg}(ρ∥F)/D^{reg}(σ∥F) + δ, for sufficiently small ε ∈ (0, 1) we get the
contradiction
D^{reg}(σ∥F) ⩽ (1/r) D^{reg}(ρ∥F) + κ(B)ε
            < D^{reg}(σ∥F) .        (Exercise 11.5.9)       (11.142)
Recall from Eqs. (10.85) (see also (11.73)) and (10.82) that for all ε ∈ (0, 1)
D_{1−}^{reg}(ρ∥F) ⩽ D_min^{ε,reg}(ρ∥F) ⩽ D^{reg}(ρ∥F) .    (11.147)
If the generalized quantum Stein's lemma is valid, then the upper bound simplifies to an
equality.
Theorem 11.5.3. For any ρ ∈ D(A), σ ∈ D(B), and ε ∈ (0, 1), the asymptotic
distillable rate of ρ into σ under asymptotically RNG operations is bounded by
Distill(ρ → σ) ⩾ lim_{ε→0+} D_min^{ε,reg}(ρ∥F) / D^{reg}(σ∥F) ⩾ D_{1−}^{reg}(ρ∥F) / D^{reg}(σ∥F) .    (11.148)
Proof. The second inequality follows from the fact that D_min^{ε,reg}(ρ∥F) ⩾ D_{1−}^{reg}(ρ∥F) for all
and let r be a positive number satisfying r < a/D^{reg}(σ∥F). Our goal is to prove that
Distill(ρ → σ) ⩾ r. For this purpose, we fix ε ∈ (0, 1) and denote m_n := ⌈nr⌉, so that
lim_{n→∞} m_n/n = r. We need to construct a sequence of channels {E_n}_{n∈N} with the following
two properties:
1. For sufficiently large n ∈ N, the channel En ∈ RNGδn (An → B mn ) with δn := 2−nδ (for
some δ > 0). Hence, the sequence {En }n∈N is asymptotically RNG.
Note that from the definition of Distill(ρ → σ), if for any choice of ε ∈ (0, 1) there exists
a sequence {En }n∈N that satisfies the above two conditions then we must have Distill(ρ →
σ) ⩾ r.
The idea behind the construction of the channels {En }n∈N is to try to achieve the rate r
with (two-outcome) measurement-prepare channels of the form
E_n(η) := Tr[Λ_n η] σ_n + Tr[(I^{A^n} − Λ_n)η] ω_n        ∀ η ∈ L(A^n) ,    (11.150)
for some σ_n, ω_n ∈ D(B^{m_n}) and some Λ_n ∈ Eff(A^n). We therefore need to check whether there
exist σ_n, ω_n, and Λ_n that satisfy both E_n(ρ^{⊗n}) ≈_ε σ^{⊗m_n} and E_n ∈ RNG_{δ_n}(A^n → B^{m_n}). Note
that if we choose Λ_n such that Tr[Λ_n ρ^{⊗n}] is close to one then E_n(ρ^{⊗n}) will be close to σ_n.
Therefore, if σ_n is close to σ^{⊗m_n} we will get in this case that E_n(ρ^{⊗n}) is also close to σ^{⊗m_n}.
We take ω_n ∈ F(B^{m_n}) to be any free density matrix, and define Λ_n and σ_n below.
Combining the two equations above implies that the optimal effect Λn satisfies (for
and δ > 0 and sufficiently large n ∈ N)
ε
maxn Tr [Λn τn ] ⩽ 2−n(a−δ) and Tr[ρ⊗n Λn ] = 1 − . (11.153)
τn ∈F(A ) 2
The intuition behind this choice is that besides being ε/2-close to σ^{⊗m_n}, the density matrix σ_n does not have “too much” robustness. To see why, recall first that from Lemma 11.2.1 it follows that

D^{reg}(σ∥F) ⩾ lim sup_{n→∞} (1/m_n) D_max^{ε/2}(σ^{⊗m_n}∥F)
 = lim sup_{n→∞} (1/m_n) D_max(σ_n∥F)   (11.155)
lim_{n→∞} m_n/n = r → = (1/r) lim sup_{n→∞} (1/n) D_max(σ_n∥F) .
Now, since the inequality r < a/Dreg (σ∥F) is strict, there exists δ > 0 sufficiently small
such that r < (a − 2δ)/Dreg (σ∥F), or equivalently
rDreg (σ∥F) < a − 2δ . (11.156)
Hence, by combining the two equations above we get that for sufficiently large n
Dmax (σn ∥F) ⩽ n (a − 2δ) . (11.157)
To show that for these choices the channel En is RNGδn , let η ∈ F(An ) be a free state,
and denote by tn := Tr [Λn η] and rn := Rg (σn ). Then, from the convexity of the global
robustness we get
R_g(E_n(η)) ⩽ t_n R_g(σ_n) + (1 − t_n) R_g(ω_n)
ω_n ∈ F(B^{m_n}) → = t_n r_n ⩽ t_n (1 + r_n) .   (11.159)
2. Show that the right-hand side of (11.162) is larger than the bound on R(σn ) given
in (11.158).
As a second example, consider the QRT consisting of conditional unital channels. In this QRT the free states are given by

F(AB) = { u^A ⊗ σ^B : σ ∈ D(B) } .   (11.167)

Since this QRT is also an affine QRT it has a self-adjoint resource destroying channel given by

∆^{AB→AB}(ω^{AB}) := u^A ⊗ ω^B   ∀ ω ∈ L(AB) .   (11.168)

Also in this QRT the α-relative entropy of a resource is additive so that we get reversibility. In particular, for this QRT we have

Distill(ρ^{AB} → σ^{AB}) = (log|A| − H(A|B)_ρ) / (log|A| − H(A|B)_σ) .   (11.169)
Entanglement Theory
CHAPTER 12
Pure-State Entanglement
Quantum Entanglement
Definition 12.1.1. Entanglement is a characteristic of a composite physical system
that cannot be created or enhanced through local (quantum) operations and classical
communication (LOCC).
This definition precisely captures the intuition that entanglement is a quantum property
of a composite system that corresponds to correlations that are not classical. Historically,
this intuition led many researchers to associate entanglement with the non-local correlations
exhibited by composite physical systems. These correlations find expression in the proba-
Note that we do not impose any constraint on the classical system Y , only that it is finite
dimensional. Setting n := |Y |, an LOCC1 channel can be expressed as
F^{BY→B′} ∘ E^{A→A′Y} = Σ_{y∈[n]} F_{(y)}^{B→B′} ⊗ E_y^{A→A′} .   (12.2)

Here, for every y ∈ [n], the operation F_{(y)} ∈ CPTP(B → B′), and E_y ∈ CP(A → A′). Furthermore, the sum Σ_{y∈[n]} E_y is trace preserving.
By incorporating an additional round of communication from Bob to Alice, we obtain
channels in LOCC2 . Specifically, a channel N ∈ LOCC2 (AB → A′ B ′ ) can be expressed as
follows:
N^{AB→A′B′} = E_1^{A_1X_1→A′} ∘ F^{BY_1→B′X_1} ∘ E_0^{A→A_1Y_1} ,   (12.3)
where A1 represents an additional system on Alice’s side, E0 and E1 are channels on Alice’s
side, and F is a channel on Bob’s side. It’s important to note that without the second round
of communication, which corresponds to the case when |X1 | = 1, the description reverts to
a channel in LOCC1 .
Exercise 12.1.1. Show that if |X1 | = 1 then the channel in (12.3) belongs to LOCC1 .
In the same fashion, one can continue and express the most general protocol in LOCCn .
Clearly, from the construction above it is obvious that the expression of LOCC protocols
can be very complicated particularly if it involves a large number of classical communication
rounds (see Fig. 12.1). Moreover, it is also known that LOCCn is a strict subset of LOCCn+1
for all n ∈ N. Due to this notorious complexity of LOCC, and despite the enormous body
of work in recent years on the study of LOCC, there are still many open problems in entan-
glement theory. For this reason, it is sometimes convenient to consider a slightly larger class
of operations that contains LOCC and has a simpler characterization. We will consider in
the next chapter two such sets of operations known as the separable set and the PPT set.
However, as we will see in this chapter, the complexity of LOCC is reduced dramatically
when the bipartite system is initially in a pure state.
Figure 12.1: An LOCC operation. The double purple lines represent classical communication between the parties.
1. Show that
LOCC(AB → A′ B ′ ) ⊆ SEP(AB → A′ B ′ ) . (12.5)
where {|ψ_x⟩^A}_{x∈[d]} and {|ϕ_x⟩^B}_{x∈[d]} are orthonormal bases of A and B, respectively, Λ^A = Diag(√p_1, . . . , √p_d) is a diagonal matrix in the basis {|ψ_x⟩^A}_{x∈[d]}, and |Ω^{AB}⟩ = Σ_{x∈[d]} |ψ_x^A⟩ ⊗ |ϕ_x^B⟩ is an (unnormalized) maximally entangled state.
We consider now the effect of a local measurement on Bob's side. For this purpose, let N be some d × d complex matrix, and note that

I^A ⊗ N |ψ^{AB}⟩ = (Λ ⊗ N)|Ω^{AB}⟩ = (ΛN^T ⊗ I^B)|Ω^{AB}⟩ .   (12.8)

Exercise 12.2.1. Show that the matrix ΛN^T has the same singular values as NΛ. Hint: Show that for any square matrix C, the matrices C and C^T have the same singular values.
Since ΛN T and N Λ are two square matrices with the same singular values, it follows from
the singular value decomposition that there exists two unitary matrices U and V such that
ΛN T = U N ΛV T (12.9)
where the transpose on V is for convenience. Substituting this into (12.8) gives
I A ⊗ N |ψ AB ⟩ = U N ΛV T ⊗ I B |ΩAB ⟩
= (U N Λ ⊗ V ) |ΩAB ⟩ (12.10)
= (U N ⊗ V ) |ψ AB ⟩ .
That is, the vectors I A ⊗ N |ψ AB ⟩ and N ⊗ I B |ψ AB ⟩ are equivalent up to the local
unitary map U ⊗ V . This observation leads to the following result.
Lo-Popescu’s Theorem
Theorem 12.2.1. The effect of any LOCC map on a pure bipartite state can be
simulated by the following protocol: Alice performs a generalized quantum
measurement {Myx }x,y , sends the result (x, y) to Bob who then performs a local
unitary map Vyx on his system, and in the final step, Alice and Bob discard the value
of y.
E_x^{B→B}(ψ^{AB}) = Σ_{y∈[n]} (I^A ⊗ N_{yx}) ψ^{AB} (I^A ⊗ N_{yx})^*
(12.10)→ = Σ_{y∈[n]} (U_{yx}N_{yx} ⊗ V_{yx}) ψ^{AB} (N_{yx}^* U_{yx}^* ⊗ V_{yx}^*) ,   (12.11)

where we used (12.10) for each x ∈ [m] and y ∈ [n], with U_{yx} and V_{yx} being unitary matrices. Denoting by M_{yx} := U_{yx}N_{yx} the above equation becomes

E_x^{B→B}(ψ^{AB}) = Σ_{y∈[n]} (M_{yx} ⊗ V_{yx}) ψ^{AB} (M_{yx}^* ⊗ V_{yx}^*) .   (12.12)
Moreover, since Σ_{x∈[m]} Σ_{y∈[n]} M_{yx}^* M_{yx} = I^A, we conclude that any quantum instrument that is performed by Bob can be simulated with the following protocol: Alice performs a generalized quantum measurement {M_{yx}}_{x∈[m],y∈[n]}, sends the outcome (x, y) to Bob, who then performs a unitary matrix V_{yx}. At the end of the protocol, Alice and Bob discard or forget
the value of y. Therefore, in any LOCC protocol, all the local quantum instruments on Bob’s
side can be simulated with unitaries and measurements on Alice’s side. Since a sequence
of quantum instruments (generalized measurements) on Alice’s side can be combined into a
single generalized measurement (followed by coarse graining, i.e. discarding of information),
we conclude that the most general LOCC protocol on a pure bipartite state can be simu-
lated with a single generalized measurement on Alice's side followed by a unitary on Bob's
side that depends on Alice’s measurement outcome, and ends with the discarding of partial
information of the measurement outcome.
Exercise 12.2.2. Show that a sequence of two generalized measurements can be viewed as
a single generalized measurement. That is, given two generalized measurement {Mx }x∈[m]
and {Ny }y∈[n] show that the set of matrices {Lxy := Mx Ny }x∈[m],y∈[n] is also a generalized
measurement.
Exercise 12.2.3. Let ψ ∈ Pure(AB) and σ ∈ D(AB). Show that if there exists a deter-
LOCC
ministic LOCC protocol that converts ψ AB to σ AB , i.e. ψ AB −−−→ σ AB , then there exists a
set {Mx }x∈[m] of complex matrices in L(A), and a set {Ux }x∈[m] of unitary matrices in L(B)
such that X
σ AB = (Mx ⊗ Ux ) ψ AB (Mx ⊗ Ux )∗ . (12.13)
x∈[m]
σ AB = U BX→B ◦ E A→AX ψ AB .
(12.14)
Theorem 12.2.1 can be simplified further if we consider only LOCC protocols that take
pure bipartite states to pure bipartite states. In this case, any LOCC transformation can
be simulated by the following simple protocol: Alice performs a generalized measurement
{Mx }, sends the outcome x to Bob, who then performs a local unitary operation Vx . This
simplification of LOCC will be crucial for the study of pure-state entanglement theory.
Exercise 12.2.4. The Schmidt rank of a pure bipartite state is defined as the number of
non-zero Schmidt coefficients; for example, the Schmidt rank of the state given in (12.7) is
the rank of the matrix ΛA . We denote the Schmidt rank of a bipartite state ψ ∈ Pure(AB)
by SR(ψ). Show that for two bipartite states ψ, ϕ ∈ Pure(AB) with SR(ϕ) > SR(ψ) it is
impossible to convert ψ to ϕ by LOCC (not even with probability less than one).
where {|ψx ⟩A }x∈[d] and {|ϕx ⟩B }x∈[d] are orthonormal bases of A and B, respectively. Let U
and V be unitary matrices such that U |ψx ⟩A = |x⟩A and V |ϕx ⟩B = |x⟩B , where {|x⟩A } and
{|x⟩B } are the standard bases of A and B, respectively. Hence,
U ⊗ V |ψ^{AB}⟩ = Σ_{x∈[d]} √p_x |xx⟩^{AB} .   (12.16)
Note also that by applying additional local permutations (which are unitaries) to the state above we can rearrange the order of the Schmidt coefficients. Therefore, there exist unitary matrices U′ ∈ L(A) and V′ ∈ L(B) such that

|ψ̃^{AB}⟩ := U′ ⊗ V′|ψ^{AB}⟩ = Σ_{x∈[d]} √p_x |xx⟩^{AB}   with   p_1 ⩾ p_2 ⩾ · · · ⩾ p_d .   (12.17)
The above form is called the standard form of |ψ AB ⟩. Note that |ψ AB ⟩ can be converted by
LOCC to another state |ϕAB ⟩ if and only if the standard form of |ψ AB ⟩ can be converted by
LOCC to the standard form of |ϕAB ⟩ (see Fig. 12.2). Therefore, without loss of generality
we will assume here that both |ψ AB ⟩ and |ϕAB ⟩ are given in their standard form.
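For readers who wish to experiment numerically, the standard form can be read off from the singular value decomposition of the coefficient matrix of |ψ^{AB}⟩. The following Python/NumPy sketch (the function name and example state are ours, chosen for illustration only) returns the ordered Schmidt probability vector:

import numpy as np

def schmidt_probabilities(psi, dA, dB):
    # Reshape the state vector into its dA x dB coefficient matrix and take an SVD;
    # the squared singular values are the Schmidt coefficients p_1 >= p_2 >= ...
    s = np.linalg.svd(psi.reshape(dA, dB), compute_uv=False)
    p = s**2
    return np.sort(p)[::-1] / p.sum()

# Example: a 3x3 state with Schmidt vector (0.5, 0.3, 0.2)
psi = np.zeros(9)
psi[[0, 4, 8]] = np.sqrt([0.5, 0.3, 0.2])
print(schmidt_probabilities(psi, 3, 3))     # [0.5 0.3 0.2]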
Figure 12.2: LOCC maps between two pure bipartite states and their standard forms.
The next theorem provides a connection between LOCC conversions and majorization.
For any two density matrices ρ, σ ∈ D(A), we will say that ρ majorizes σ, and write ρ ≻ σ,
if the probability vectors p and q, consisting, respectively, of the eigenvalues of ρ and σ,
satisfy p ≻ q.
Proof. From the argument above we can assume without loss of generality that both ψ AB
and ϕAB are given in their standard forms. Furthermore, Exercise 12.2.4 implies that we
can assume without loss of generality that supp(σ A ) ⊆ supp(ρA ). This in turn implies that
we can assume without loss of generality that ρA > 0 since otherwise we embed both |ψ AB ⟩
and |ϕAB ⟩ in supp(ρA ) ⊗ B.
From Theorem 12.2.1, any LOCC map that takes a pure bipartite state |ψ^{AB}⟩ to another pure bipartite state |ϕ^{AB}⟩ can be simulated by 1-way LOCC (i.e. LOCC_1) of the following
form: Alice performs a single generalized measurement {Mz }z∈[m] on her system, sends the
measurement outcome z to Bob, who then performs the unitary Vz . Therefore, after outcome
z occurred, the post-measurement state is given by
(1/√t_z) (M_z ⊗ V_z)|ψ^{AB}⟩ ,   (12.19)
where tz := ⟨ψ AB |Mz∗ Mz ⊗ I B |ψ AB ⟩ is the probability that Alice’s measurement outcome
is z. Hence, if |ψ AB ⟩ can be converted by LOCC to |ϕAB ⟩ with 100% success rate, then
there must exist a generalized measurement {Mz }z∈[m] and a collection of unitary matrices
{Vz }z∈[m] such that
(1/√t_z) (M_z ⊗ V_z)|ψ^{AB}⟩ = |ϕ^{AB}⟩   ∀ z ∈ [m] .   (12.20)
Since we assume without loss of generality that both |ψ AB ⟩ and |ϕAB ⟩ are given in their
standard form, we have
|ψ^{AB}⟩ = Σ_{x∈[d]} √p_x |xx⟩^{AB} = (√ρ ⊗ I^B)|Ω^{AB}⟩
|ϕ^{AB}⟩ = Σ_{x∈[d]} √q_x |xx⟩^{AB} = (√σ ⊗ I^B)|Ω^{AB}⟩   (12.21)

where ρ and σ are, respectively, the reduced density matrices of |ψ^{AB}⟩ and |ϕ^{AB}⟩. Explicitly, ρ = Σ_{x∈[d]} p_x |x⟩⟨x|^A and σ = Σ_{x∈[d]} q_x |x⟩⟨x|^A. Substituting this into (12.20) gives

(1/√t_z) (M_z √ρ ⊗ V_z)|Ω^{AB}⟩ = (√σ ⊗ I^B)|Ω^{AB}⟩ .   (12.22)
From Exercise 2.3.26 it follows that the above equation holds if and only if

(1/√t_z) M_z √ρ V_z^T = √σ .   (12.23)

Note that the matrix U_z := (V_z^{−1})^T is unitary. With this notation, the above equation is equivalent to

M_z = √t_z √σ U_z ρ^{−1/2} .   (12.24)
The only constraint on M_z is that Σ_{z∈[m]} M_z^* M_z = I^A. We therefore conclude that |ψ^{AB}⟩ can be converted to |ϕ^{AB}⟩ by LOCC if and only if there exists m ∈ N, unitary matrices {U_z}_{z∈[m]}, and probabilities {t_z}_{z∈[m]} such that

Σ_{z∈[m]} t_z ρ^{−1/2} U_z^* σ U_z ρ^{−1/2} = I^A ,   (12.25)

or equivalently,

ρ = Σ_{z∈[m]} t_z U_z^* σ U_z .   (12.26)
In other words, |ψ AB ⟩ can be converted to |ϕAB ⟩ by LOCC if and only if there exists a
mixture of unitaries that transforms the reduced density matrix of |ϕAB ⟩ to the reduced
density matrix of |ψ AB ⟩. Observe that such a random unitary channel is a unital channel. In
Section 3.5.9 we showed that if ρ = E(σ), with E being a unital channel, then there exists a
doubly stochastic matrix D such that p = Dq (recall that p and q are the probability vectors
whose components consist of the eigenvalues of ρ and σ, respectively). From Theorem 4.1.1
it then follows that q ≻ p or equivalently, σ ≻ ρ. We therefore conclude that if ψ AB can be
converted to ϕAB by LOCC then we must have σ ≻ ρ.
Conversely, if σ ≻ ρ then from Theorem 4.1.1 we have p = Dq for some doubly stochastic matrix D. From the Birkhoff/von-Neumann theorem (see Theorem A.5.1) every doubly stochastic matrix can be written as a convex combination of permutation matrices. Therefore, there exist m ∈ N, permutation matrices {Π_z}_{z∈[m]}, and probabilities {t_z}_{z∈[m]}, such that p = Σ_z t_z Π_z q. In Exercise 12.2.5 you will show that this relation can be expressed as

ρ = Σ_{z∈[m]} t_z Π_z σ Π_z^T .   (12.27)
Exercise 12.2.5. Show that if ρ is a diagonal matrix, and p is the vector consisting of its diagonal elements, then Π_z ρ Π_z^T is also a diagonal matrix, with the diagonal elements given by the components of Π_z p. Use this to show that ρ = Σ_{z∈[m]} t_z Π_z σ Π_z^T if and only if p = Σ_{z∈[m]} t_z Π_z q.
Exercise 12.2.6. Show that the maximally entangled state |Φ^{AB}⟩ := (1/√d) Σ_{x∈[d]} |xx⟩ can be converted by LOCC to any other state in Pure(AB). Moreover, show that any state |ψ⟩ ∈ AB can be converted by LOCC to any product state of the form |ϕ⟩|χ⟩ ∈ AB.
Exercise 12.2.7. For any bipartite state ψ ∈ Pure(AB), and for any k = 1, . . . , d, define
E_(k)(ψ^{AB}) := 1 − Σ_{x∈[k]} p_x^↓ = 1 − ∥p∥(k) ,   (12.28)

where p is the Schmidt vector of |ψ^{AB}⟩, and ∥p∥(k) is the Ky Fan norm of p (cf. Definition 2.3.2). Show that Nielsen Majorization Theorem can be expressed as

ψ^{AB} −LOCC→ ϕ^{AB}   ⟺   E_(k)(ψ^{AB}) ⩾ E_(k)(ϕ^{AB})   ∀ k ∈ [d] .   (12.29)
1. Prove the following theorem, assuming that Alice and Bob share the state ψ AB (but no
other entangled systems). Theorem. Faithful teleportation of a d-dimensional qudit
is possible if, and only if,

p_max ⩽ 1/d ,   (12.30)

where p_max is the largest Schmidt coefficient of ψ^{AB}. That is, teleportation is possible
if, and only if, none of the Schmidt coefficients are greater than 1/d. This also implies
that the Schmidt rank m is greater than or equal to d. Hint: Use Nielsen majorization
theorem.
2. Find a protocol for faithful teleportation of a qubit from Alice’s lab to Bob’s lab as-
suming Alice and Bob share the partially entangled state
1 1 1
|ψ AB ⟩ = √ |0⟩A |0⟩B + |1⟩A |1⟩B + |2⟩A |2⟩B . (12.31)
2 2 2
In particular, determine the projective measurement performed by Alice and the unitary
operators performed by Bob. What is the optimal classical communication cost? That
is, how many classical bits Alice has to send to Bob?
Exercise 12.2.9. Consider m bipartite states {ψzAB }z∈[m] in Pure(AB). Find an optimal
state ϕAB ∈ Pure(AB) such that:
1. The state ϕAB can be converted by LOCC to ψzAB , for all z ∈ [m].
2. If another state, χ ∈ Pure(AB), can be converted by LOCC to ψzAB , for all z ∈ [m],
then χAB can also be converted to ϕAB . That is, ϕAB is optimal.
to temporarily use an entangled state during their LOCC protocols. The condition here is
that they must return the entangled systems in their original state at the end of the protocols.
At first glance, it might seem that borrowing an entangled system wouldn’t provide any
advantage for tasks that cannot be accomplished with standard LOCC. However, as we will
now demonstrate, Nielsen’s majorization theorem reveals that this entanglement-assisted
LOCC (eLOCC) actually represents a significantly broader set of operations compared to
LOCC alone.
We start with the following example. Consider the two entangled states
|ψ^{AB}⟩ = √(2/5)|00⟩ + √(2/5)|11⟩ + √(1/10)|22⟩ + √(1/10)|33⟩
|ϕ^{AB}⟩ = √(1/2)|00⟩ + √(1/4)|11⟩ + √(1/4)|22⟩ .   (12.32)
The Schmidt probability vectors associated with the two states above are given by
p := (2/5, 2/5, 1/10, 1/10)^T   and   q := (1/2, 1/4, 1/4, 0)^T ,   (12.33)
respectively. In Exercise 4.4.1, you confirmed that neither p majorizes q nor q majorizes p,
symbolized as p ̸≻ q and q ̸≻ p. Consequently, Nielsen’s majorization theorem implies that
neither ψ AB can be converted to ϕAB , nor can ϕAB be converted to ψ AB using LOCC. Now,
consider the state
|χ^{A′B′}⟩ = √(3/5)|00⟩ + √(2/5)|11⟩ ,   (12.34)
and let its Schmidt vector be denoted by r := (3/5, 2/5)T . Interestingly, it is easy to verify
that
q⊗r≻p⊗r. (12.35)
Therefore, according to Nielsen's theorem, the transformation

|ψ^{AB}⟩ ⊗ |χ^{A′B′}⟩ −LOCC→ |ϕ^{AB}⟩ ⊗ |χ^{A′B′}⟩   (12.36)

is achievable with a 100% success rate. In this context, the state χ^{A′B′} functions as a catalyst for the conversion of ψ^{AB} into ϕ^{AB}, and thus is referred to as an entanglement catalyst.
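The catalysis phenomenon in this example is easy to verify numerically. The following sketch (function name and tolerance are ours) checks that p and q are incomparable while q ⊗ r majorizes p ⊗ r:

import numpy as np

def majorizes(q, p, tol=1e-12):
    # True if q majorizes p (both vectors are padded with zeros to a common length).
    d = max(len(p), len(q))
    ps = np.sort(np.pad(p, (0, d - len(p))))[::-1].cumsum()
    qs = np.sort(np.pad(q, (0, d - len(q))))[::-1].cumsum()
    return bool(np.all(qs >= ps - tol))

p = np.array([2/5, 2/5, 1/10, 1/10])   # Schmidt vector of psi, cf. (12.33)
q = np.array([1/2, 1/4, 1/4, 0])       # Schmidt vector of phi
r = np.array([3/5, 2/5])               # Schmidt vector of the catalyst chi

print(majorizes(q, p), majorizes(p, q))          # False False: p and q are incomparable
print(majorizes(np.kron(q, r), np.kron(p, r)))   # True: the catalyst enables the conversion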
Exercise 12.2.10. Show that there is no entanglement catalyst if both ψ AB and ϕAB have
Schmidt rank 3.
Exercise 12.2.11. Show that the maximally entangled state cannot act as a catalyst in any
eLOCC conversions that are not possible by LOCC.
Entanglement catalysis motivates the definition of a new partial order between probability
vectors that we studied in Sec. 4.4 and is called the trumping relation. Recall that for any
p, q ∈ Prob(n) we say that q trumps p and write
q ≻∗ p , (12.37)
Exercise 12.2.12. Show that any Schur concave function f : Prob(n) → R that is additive
under tensor product, behaves monotonically under the trumping relation. That is, for any
p, q ∈ Prob(n) we have
q ≻∗ p ⇒ f (p) ⩾ f (q) . (12.38)
A well known family of functions that behaves monotonically under the trumping relation
are the Rényi entropies. The set of functions
f_α(p) := (sign(α)/(1−α)) log Σ_{x∈[m]} p_x^α   if 0 ≠ α ∈ [−∞, ∞] ,
f_α(p) := −log(p_1 · · · p_m)   if α = 0 .   (12.39)
satisfies the monotonicity under the trumping relation and additivity. For α ⩾ 0 (i.e. f_α = H_α is the Rényi entropy) these functions are entropy functions as they also satisfy the normalization condition that they are zero for p = (1, 0, . . . , 0). For α ⩽ 0 they are defined to be −∞ if p ̸> 0. Such functions are not entropy functions since they do not satisfy the normalization condition; however, they are useful in the characterization of the trumping relation. Note that any convex combination of the functions above is also additive and monotonic under the trumping relation.
Exercise 12.2.13. Show that the condition (4.182) of Theorem 4.4.1 is equivalent to the
condition that
fα (p) > fα (q) ∀ α ∈ [−∞, ∞] , (12.40)
where fα is defined in (12.39)
For each α ∈ [−∞, ∞] and ψ ∈ Pure(AB) define
Eα ψ AB := fα (p)
(12.41)
where p is the Schmidt probability vector of ψ AB , and fα is defined in (12.39). We will see
later on that the functions {Eα }α are measures of entanglement on pure states. From Theo-
rem 4.4.1 it follows that these functions can be used to characterize eLOCC transformations.
be quantified using functions that are monotonic under LOCC. In this section, we focus on
pure state entanglement and consider a measure of entanglement to be a function
E : ∪_{A,B} Pure(AB) → R   (12.43)
where U and V are d × d unitary matrices, {px }x∈[d] are the Schmidt coefficients of ψ AB ,
and {|x⟩A } and {|x⟩B } are fixed bases of A and B. By definition, since the LOCC map
U ⊗ V is reversible (having an LOCC inverse U ∗ ⊗ V ∗ ), we must have for any measure of
entanglement on pure states:
E ψ AB = E ψ̃ AB = f (p) ,
(12.45)
The entropy of entanglement is arguably the most important measure of pure-state entanglement, with several operational interpretations. It is denoted by E and defined for any ψ ∈ Pure(AB) by

E(ψ^{AB}) := H(p) ,   (12.47)
where H is the Shannon entropy and p is the Schmidt probability vector of ψ AB . We will
see in the next sections that the entropy of entanglement equals both the entanglement cost
and the distillable entanglement in the asymptotic regime.
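As a quick illustration, the entropy of entanglement can be computed directly from the Schmidt coefficients; the sketch below (names ours, logarithms in base 2 so the answer is in ebits) evaluates (12.47) for a Bell state and for a product state:

import numpy as np

def entropy_of_entanglement(psi, dA, dB):
    # Shannon entropy (in bits) of the Schmidt probability vector, cf. (12.47).
    s = np.linalg.svd(psi.reshape(dA, dB), compute_uv=False)
    p = s**2
    p = p[p > 1e-15]
    return float(-(p * np.log2(p)).sum())

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)      # maximally entangled two-qubit state
prod = np.array([1.0, 0, 0, 0])                 # product state |00>
print(entropy_of_entanglement(bell, 2, 2))      # 1.0 (one ebit)
print(entropy_of_entanglement(prod, 2, 2))      # 0.0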
C(ψ^{AB}) = ⟨ψ̄^{AB}| σ_y ⊗ σ_y |ψ^{AB}⟩ ,   (12.51)

where ψ̄^{AB} is defined such that if |ψ^{AB}⟩ = Σ_{x,y∈{0,1}} c_{xy}|x⟩|y⟩ then |ψ̄^{AB}⟩ = Σ_{x,y∈{0,1}} c̄_{xy}|x⟩|y⟩.
where Z is a ‘flag’ system registering the value z. In this view, the LOCC protocol converted
the state ψ AB to the cq-state σ ZAB . The question we study here is to which cq-states the
pure state ψ^{AB} can be transformed into by LOCC. Since the output state σ^{ZAB} is not a pure state, we cannot apply the Nielsen majorization theorem directly. Yet, as we show now, Nielsen's theorem is instrumental in answering this question.
Proof. Let p be the Schmidt probability vector associated with ψ AB . For each z ∈ [n], define
q_z := (q_{1|z}, . . . , q_{d|z})^T as the Schmidt probability vector associated with ϕ_z^{AB}. We can assume without loss of generality that p = p^↓ and q_z = q_z^↓, since the order of the Schmidt vectors can always be rearranged by applying local unitary (permutation) maps to |ψ^{AB}⟩ and |ϕ_z^{AB}⟩. Also, denote by

q := Σ_{z∈[n]} t_z q_z   and by   |ϕ^{AB}⟩ := Σ_{x∈[d]} √q_x |xx⟩^{AB} ,   (12.54)

the bipartite state whose Schmidt vector is q. Since q_z = q_z^↓ for all z ∈ [n], it follows that q = q^↓ as well. The components of q are thus given by

q_x = Σ_{z∈[n]} t_z q_{x|z}   ∀ x ∈ [d] .   (12.55)
Consequently, for each k ∈ [d], the entanglement measure E_(k) for ϕ^{AB} is given by

E_(k)(ϕ^{AB}) = Σ_{x=k+1}^{d} q_x
(12.55)→ = Σ_{z∈[n]} Σ_{x=k+1}^{d} t_z q_{x|z}   (12.56)
 = Σ_{z∈[n]} t_z E_(k)(ϕ_z^{AB}) .
Hence, from Nielsen majorization theorem as expressed in Exercise 12.2.7, it follows that (12.53)
holds if and only if ψ AB can be converted to ϕAB by LOCC. Therefore, to complete the proof
we now show that ϕ^{AB} can be converted by LOCC to the ensemble {ϕ_z^{AB}, t_z}_{z∈[n]}. The conversion is achieved by the following single measurement performed by Alice

M_z := Σ_{x∈[d]} √( t_z q_{x|z} / q_x ) |x⟩⟨x|^A   (12.57)
Remark. Note that for k = 1, E1 (ψ AB ) = 1 for all ψ ∈ Pure(AB). Therefore, the expression
on the right-hand side of the equation above can never exceed one. Note further that the
corollary above is a simplification of the formal result given in Corollary 11.1.1. That is, for
pure bipartite states it is sufficient to check the ratios of only d resource measures in order
to compute the maximum probability of conversion.
Proof. Consider an optimal LOCC protocol that converts ψ^{AB} to ϕ^{AB} with the maximum possible probability. Such an LOCC protocol yields ϕ^{AB} with probability p := Pr(ψ^{AB} −LOCC→ ϕ^{AB})
and other states with probability 1 − p. All such other states can always be converted
deterministically (by LOCC) to the product state |0⟩⟨0|A ⊗ |0⟩⟨0|B . Therefore, without
loss of generality we can assume that ψ AB is converted to ϕAB with probability p, and to
|0⟩⟨0|A ⊗ |0⟩⟨0|B with probability 1 − p. Since Ek (|0⟩⟨0|A ⊗ |0⟩⟨0|B ) = 0 for all k = 2, . . . , d,
Theorem 12.4.1 implies that such an LOCC protocol is possible if and only if
E(k) (ψ AB ) ⩾ pE(k) (ϕAB ) ∀ k ∈ {2, . . . , d} . (12.61)
The proof is concluded by recognizing that the equation above is equivalent to

p ⩽ min_{k∈{2,...,d}} E_(k)(ψ^{AB}) / E_(k)(ϕ^{AB}) .   (12.62)
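A numerical sketch of the optimal conversion probability may also clarify the formula. Below we use the convention that the k-th monotone is the sum of the Schmidt coefficients from position k onward (so that the k = 1 ratio equals one, matching the remark above); the function name and example vectors are ours:

import numpy as np

def max_conversion_probability(p, q):
    # Minimum over k of the ratio of the tails sum_{x>=k} p_x / sum_{x>=k} q_x,
    # with both Schmidt vectors sorted in non-increasing order.
    d = max(len(p), len(q))
    p = np.sort(np.pad(np.asarray(p, float), (0, d - len(p))))[::-1]
    q = np.sort(np.pad(np.asarray(q, float), (0, d - len(q))))[::-1]
    ratios = [p[k:].sum() / q[k:].sum() for k in range(d) if q[k:].sum() > 1e-15]
    return min(1.0, min(ratios))

p = np.array([0.8, 0.2])   # Schmidt vector of psi
q = np.array([0.5, 0.5])   # Schmidt vector of the (more entangled) target phi
print(max_conversion_probability(p, q))   # 0.4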
where the maximum is over all pure-state decompositions of σ (i.e. over all pure-state ensembles {p_x, ϕ_x}_{x∈[m]} that satisfy σ = Σ_{x∈[m]} p_x ϕ_x).

Proof. Suppose the condition in (12.63) holds. Then, there exists an ensemble of states {p_x, ϕ_x^{AB}}_{x∈[m]} such that E_(k)(ψ^{AB}) ⩾ Σ_{x∈[m]} p_x E_(k)(ϕ_x^{AB}) for all k ∈ [d]. From Theorem 12.4.1 it follows that ψ^{AB} can be converted by LOCC to the cq-state

σ^{XAB} = Σ_{x∈[m]} p_x |x⟩⟨x|^X ⊗ ϕ_x^{AB} .   (12.64)
Since tracing out the classical system X is an LOCC operation we conclude that ψ^{AB} −LOCC→ σ^{AB}.
Conversely, suppose ψ^{AB} −LOCC→ σ^{AB}. From Theorem 12.2.1 it follows that there exists a generalized measurement on Alice's system {M_x}_{x∈[m]} and a set of unitary matrices on Bob's system {U_x}_{x∈[m]} such that

σ^{AB} = Σ_{x∈[m]} (M_x ⊗ U_x) ψ^{AB} (M_x ⊗ U_x)^* .   (12.65)
Denote by |ϕ_x^{AB}⟩ := (1/√p_x)(M_x ⊗ U_x)|ψ^{AB}⟩, where p_x := ⟨ψ^{AB}|M_x^*M_x ⊗ I^B|ψ^{AB}⟩. Then, from the equation above we get that the ensemble {p_x, ϕ_x^{AB}}_{x∈[m]} forms a pure-state decomposition of σ^{AB}. Moreover, by definition, ψ^{AB} can be converted by LOCC to ϕ_x^{AB} with probability p_x. Therefore, from Theorem 12.4.1 it follows that

min_{k∈[d]} { E_(k)(ψ^{AB}) − Σ_{x∈[m]} p_x E_(k)(ϕ_x^{AB}) } ⩾ 0 .   (12.66)
It's important to note that the value of T(ψ^{AB} −LOCC→ ϕ^{AB}) is not greater than that of P⋆(ψ^{AB} −LOCC→ ϕ^{AB}). This is because restricting σ^{AB} to be a pure state in the calculation of T(ψ^{AB} −LOCC→ ϕ^{AB}) can only increase the minimum value obtained in the optimization process. Furthermore, as we will explore in the next chapter (specifically, in Lemma 13.3.1), it will be shown that T(ψ^{AB} −LOCC→ ϕ^{AB}) can actually be strictly smaller than P⋆(ψ^{AB} −LOCC→ ϕ^{AB}). However, we will soon discover that the value of P⋆ remains unchanged even when the optimization is extended from Pure(AB) to the full set of density matrices D(AB).
The optimization problem in (12.68) can be simplified as follows. Initially, observe that if p, r ∈ Prob(d) are the Schmidt vectors of ψ^{AB} and φ^{AB}, respectively, then according to Nielsen's theorem, the condition ψ^{AB} −LOCC→ φ^{AB} is equivalent to r ≻ p. Thus, for any given Schmidt vector r of φ^{AB}, we first perform the optimization over all states φ ∈ Pure(AB) with the same Schmidt vector r. Denoting by |φ̃^{AB}⟩ = Σ_{x∈[d]} √r_x |xx⟩, this is equivalent to optimization over the local unitaries U ⊗ V, such that |φ^{AB}⟩ = U ⊗ V |φ̃^{AB}⟩. Due to the relationship between trace distance and fidelity, we have

min_{U,V∈U(d)} P(ϕ^{AB}, φ^{AB}) = √( 1 − max_{U,V∈U(d)} |⟨ϕ^{AB}|φ^{AB}⟩|² )   (12.69)
Denoting by |ϕAB ⟩ = N ⊗ I B |ΩAB ⟩ and by D the diagonal matrix with diagonal r, we obtain
where q ∈ Prob(d) is the Schmidt vector of ϕ^{AB}. Taking everything into consideration we obtain the following simplification:

P⋆(ψ^{AB} −LOCC→ ϕ^{AB}) = min_{r∈Prob(d)} { P(q, r) : r ≻ p } ,   (12.71)

where p is the Schmidt vector of ψ^{AB}, q the Schmidt vector of ϕ^{AB}, and P(q, r) := √(1 − F²(q, r)) is the purified distance between probability vectors.
Observe that we added the subscript ⋆ to P⋆ since the conversion distance between two pure states ψ, ϕ ∈ Pure(AB) as measured by the purified distance is defined as:

P(ψ^{AB} −LOCC→ ϕ^{AB}) := min_{σ∈D(AB)} { P(ϕ^{AB}, σ^{AB}) : ψ^{AB} −LOCC→ σ^{AB} } .   (12.72)

At first glance, this conversion distance may seem to be different than P⋆; however, the following theorem demonstrates that the two are equal.
follows trivially since restricting σ^{AB} in (12.72) to be a pure state φ^{AB} can only increase the quantity. To prove the opposite inequality let σ^{AB} be an optimizer of (12.72). Since ψ^{AB} −LOCC→ σ^{AB} it follows from Corollary 12.4.2 and its proof that there exists an ensemble {t_z, φ_z^{AB}}_{z∈[k]} such that σ^{AB} = Σ_{z∈[k]} t_z φ_z^{AB} and ψ^{AB} −LOCC→ {t_z, φ_z^{AB}}_{z∈[k]}. For each z ∈ [k] let r_z be the Schmidt vector of φ_z^{AB} and define

r := Σ_{z∈[k]} t_z r_z^↓ .   (12.75)
Let φ^{AB} be a pure state with Schmidt vector r, so that ψ^{AB} −LOCC→ φ^{AB}. Observe that the squared fidelity is given by

F²(ϕ^{AB}, σ^{AB}) = ⟨ϕ^{AB}|σ^{AB}|ϕ^{AB}⟩ = Σ_{z∈[k]} t_z |⟨ϕ^{AB}|φ_z^{AB}⟩|²
cf. (12.70)→ ⩽ Σ_{z∈[k]} t_z F²(q^↓, r_z^↓)   (12.76)
(5.188)→ ⩽ F²(q^↓, r^↓) .
Hence,

P(ψ^{AB} −LOCC→ ϕ^{AB}) = √(1 − F(ϕ^{AB}, σ^{AB})²)
(12.76)→ ⩾ √(1 − F(q^↓, r^↓)²)   (12.77)
 ⩾ P⋆(ψ^{AB} −LOCC→ ϕ^{AB}) .

Comparing the above inequality with (12.74) we get the equality in (12.73).
Remark. Observe that the corollary above provides an operational meaning to the entanglement monotones E_(m). That is, E_(m)(ψ^{AB}) measures how close (in terms of the square of the purified distance) Φ_m can get to ψ^{AB} by LOCC. Furthermore, it's noteworthy that when m ⩾ |A|, the conversion distance P(Φ_m −LOCC→ ψ^{AB}) equals zero. This outcome arises because, in this scenario, Nielsen's majorization theorem guarantees that the conversion Φ_m −LOCC→ ψ^{AB} can be accomplished exactly.
Proof. Let p ∈ Prob↓(n), with n := |A|, be the Schmidt vector corresponding to ψ^{AB}. From Theorem 12.5.1 it follows that

P(Φ_m −LOCC→ ψ^{AB}) = min_{r∈Prob↓(n)} { P(p, r) : r ≻ u^{(m)} } ,   (12.79)
where u^{(m)} is the uniform probability vector in Prob(m). Now, observe that the condition
r ≻ u(m) holds if and only if r has at most m non-zero components. Denoting by |r| the
number of non-zero components in r, and using the fact that the square of the purified
distance equals one minus the square of the fidelity, we get
P²(Φ_m −LOCC→ ψ^{AB}) = 1 − max_{r∈Prob(n), |r|=m} ( Σ_{x∈[n]} √(r_x p_x) )²
 = 1 − max_{r∈Prob(m)} ( Σ_{x∈[m]} √(r_x p_x) )²   (12.80)
Exercise 12.5.1→ = 1 − Σ_{x∈[m]} p_x
 = E_(m)(ψ^{AB}) .
Exercise 12.5.1. Let {s_x}_{x∈[n]} be a set of non-negative real numbers. Show that

max_{r∈Prob(n)} Σ_{x∈[n]} √(r_x) s_x = ∥s∥_2 := √( Σ_{x∈[n]} s_x² ) .   (12.81)
Considering the established equivalence between the two conversion distances, it makes sense
to primarily use T⋆ for further analysis, owing to its following closed-form expression.
Closed Formula
Theorem 12.5.2. Let ψ, ϕ ∈ Pure(AB) be two bipartite states with d := |A| = |B|,
and let p, q ∈ Prob(d) be the corresponding Schmidt probability vectors of ψ AB and
ϕAB , respectively. Then,
T⋆(ψ^{AB} −LOCC→ ϕ^{AB}) = max_{k∈[d]} { ∥p∥(k) − ∥q∥(k) } .   (12.87)

Proof. The proof follows directly from Theorem 4.2.2. To see this, observe that

T⋆(ψ^{AB} −LOCC→ ϕ^{AB}) := min_{r∈Majo(p)} (1/2)∥q − r∥_1
 = T(q, Majo(p))   (12.88)
Theorem 4.2.2→ = max_{k∈[d]} { ∥p∥(k) − ∥q∥(k) } .
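The closed formula (12.87) is straightforward to evaluate; the following sketch (names ours) computes T⋆ from the two Schmidt vectors via their cumulative sums:

import numpy as np

def T_star(p, q):
    # Closed formula (12.87): max_k ( ||p||_(k) - ||q||_(k) ).
    d = max(len(p), len(q))
    cp = np.sort(np.pad(np.asarray(p, float), (0, d - len(p))))[::-1].cumsum()
    cq = np.sort(np.pad(np.asarray(q, float), (0, d - len(q))))[::-1].cumsum()
    return float(np.max(cp - cq))   # the k = d term is zero, so the result is >= 0

p = np.array([0.7, 0.3])   # Schmidt vector of psi
q = np.array([0.5, 0.5])   # Schmidt vector of phi (a Bell state)
print(T_star(p, q))        # 0.2: psi cannot reach the Bell state exactly
print(T_star(q, p))        # 0.0: the Bell state converts to psi exactly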
From Nielsen majorization theorem, ψ^{AB} −LOCC→ Φ_m if and only if 1/m ⩾ p_1, where p_1 is the first component of p = p^↓. Hence,

Distill^{ε=0}(ψ^{AB}) = max_{m∈N} { log m : m ⩽ 1/p_1 }
m is an integer→ = log⌊1/p_1⌋ .   (12.90)
The quantity 1/p_1 can be expressed in terms of the min-entropy of p. Specifically,

Distill^{ε=0}(ψ^{AB}) = log⌊2^{H_min(A)_ρ}⌋ ,   (12.91)

where H_min(A)_ρ is the min-entropy (see (6.22)) of the reduced density matrix ρ^A := Tr_B ψ^{AB}.
To extend the formula above to the case that ε > 0, we use the computable conversion distance to calculate the single-shot distillable entanglement. Specifically, for any ε ∈ (0, 1) and ψ ∈ Pure(AB), we define the ε-single-shot distillable entanglement as:

Distill^ε(ψ^{AB}) := max_{m∈N} { log m : T⋆(ψ^{AB} −LOCC→ Φ_m) ⩽ ε } .   (12.92)
Theorem 12.5.3. Using the same notations as above, for every ε ∈ [0, 1) and
ψ ∈ Pure(AB) we have
Distill^ε(ψ^{AB}) = log⌊2^{H^ε_min(A)_ρ}⌋ ,   (12.93)

where H^ε_min(A)_ρ is the smoothed min-entropy as given in (10.147).
Remark. In (10.147) we found a closed form for the smoothed min-entropy. Using this form we can express the ε-single-shot distillable entanglement of ψ^{AB} as:

Distill^ε(ψ^{AB}) = min_{k∈{ℓ,...,d}} log⌊ k/(∥p∥(k) − ε) ⌋ ,   (12.94)

where ℓ ∈ [d] is the integer satisfying ∥p∥(ℓ−1) ⩽ ε < ∥p∥(ℓ).
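The closed form (12.94) can be evaluated directly from the Schmidt vector. The sketch below (names ours; logarithms in base 2; we include the floor implied by the integrality of m) is one possible implementation:

import numpy as np

def distill_eps(p, eps):
    # Evaluate (12.94): the smallest value of log2 floor( k / (||p||_(k) - eps) )
    # over k = ell, ..., d, where ell is the first index with ||p||_(ell) > eps.
    p = np.sort(np.asarray(p, float))[::-1]
    cum = p.cumsum()                               # cum[k-1] = ||p||_(k)
    ell = int(np.argmax(cum > eps)) + 1
    m = min(int(np.floor(k / (cum[k - 1] - eps))) for k in range(ell, len(p) + 1))
    return np.log2(m)

p = np.array([0.4, 0.3, 0.2, 0.1])
print(distill_eps(p, 0.0))    # 1.0 = log2 floor(1/p_1)
print(distill_eps(p, 0.1))    # ~1.585: smoothing buys extra distillable entanglement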
Proof. From Theorem 12.5.2 and Exercise 12.5.2 we have

T⋆(ψ^{AB} −LOCC→ Φ_m) = max_{k∈[d]} { ∥p∥(k) − ∥u^{(m)}∥(k) }
 = max_{k∈[d]} { ∥p∥(k) − k/m } .   (12.95)
Combining this with the definition in (12.92) we obtain

Distill^ε(ψ^{AB}) = max_{m∈N} { log m : ∥p∥(k) − k/m ⩽ ε ∀ k ∈ [d] }
 = max_{m∈N} { log m : ∥p∥(k) − k/m ⩽ ε ∀ k ∈ {ℓ, . . . , d} } ,   (12.96)

since from the definition of ℓ, if k < ℓ then the inequality ∥p∥(k) − k/m ⩽ ε holds trivially. Finally, observe that for each k ∈ {ℓ, . . . , d}, the condition ∥p∥(k) − k/m ⩽ ε can be expressed as m ⩽ k/(∥p∥(k) − ε), and since m is an integer, this condition is equivalent to m ⩽ a_k, where

a_k := ⌊ k/(∥p∥(k) − ε) ⌋ .   (12.97)
Exercise 12.5.3. Use the formula in (12.94) to compute Distillε (ψ AB ) for the two extreme
cases: (1) ψ AB is a maximally entangled state; (2) ψ AB is a product state. Give a physical
interpretation to the results.
lim_{ε→1−} Distill^ε(ψ^{AB}) = ∞ .   (12.99)
Proof. From Theorem 12.5.2 and Exercise 12.5.2 we have for any m ∈ N

T⋆(Φ_m −LOCC→ ψ^{AB}) = max_{k∈[m]} Σ_{x∈[k]} ( 1/m − p_x ) = max_{k∈[m]} { k/m − ∥p∥(k) } .   (12.102)
We therefore have

Cost^ε(ψ^{AB}) = min_{m∈N} { log m : k/m − ∥p∥(k) ⩽ ε ∀ k ∈ [m] }
 = min_{m∈N} { log m : m ⩾ k/(∥p∥(k) + ε) ∀ k ∈ [m] }
Exercise 12.5.5→ = min_{m∈N} { log m : m ⩾ m/(∥p∥(m) + ε) }   (12.103)
 = min_{m∈N} { log m : ∥p∥(m) ⩾ 1 − ε } .
Corollary 12.5.2. Let ε ∈ [0, 1), ψ ∈ Pure(AB), and denote by ρ^A := Tr_B ψ^{AB} its reduced density matrix. Then, the ε-single-shot entanglement cost of ψ^{AB} is given by

Cost^ε(ψ^{AB}) = H^ε_max(A)_ρ .   (12.109)

Proof. The proof follows trivially from a combination of the theorem above and the expression for H^ε_max as given in Lemma 10.4.2.
where the normalization factor H_n := Σ_{x∈[n]} 1/x is known as the harmonic number. We will demonstrate that |χ_n⟩ can serve as a catalyst for the generation of any arbitrary bipartite state |ψ⟩ ∈ C^m ⊗ C^m, with |χ_n⟩ undergoing minimal change. More precisely, for any ε > 0, there exists an n ∈ N such that

T⋆(χ_n −LOCC→ ψ ⊗ χ_n) ⩽ ε .   (12.111)
This remarkable result implies that it’s feasible to ‘embezzle’ a copy of |ψ⟩ from the catalyst
|χn ⟩, effectively borrowing some of its entanglement while leaving it largely unchanged.
To see how it works, recall from Lemma 11.1.1 that

T⋆(χ_n −LOCC→ ψ ⊗ χ_n) ⩽ T⋆(χ_n −LOCC→ Φ_m ⊗ χ_n) ,   (12.112)

since the maximally entangled state |Φ_m⟩ = (1/√m) Σ_{x∈[m]} |xx⟩ can be converted by LOCC to
the state ψ. It is therefore sufficient to show that the right-hand side of the equation above
can be made arbitrarily small as we increase the dimension n. Let p be the Schmidt vector of
χn and q be the Schmidt vector of Φm ⊗ χn . Observe that p ∈ Prob(n) and q ∈ Prob(nm).
From Theorem 12.5.2 we know that

T⋆(χ_n −LOCC→ Φ_m ⊗ χ_n) = max_{k∈[n]} { ∥p∥(k) − ∥q∥(k) } .   (12.113)
Now, observe that the components of q have the form p_x/m. Therefore, for any decomposition k = am + b, with a := ⌊k/m⌋ and some b ∈ {0, 1, . . . , m − 1}, we have

∥q∥(k) = ∥p∥(a) + (b/m) p_{a+1} .   (12.114)
Substituting this into (12.113) gives

T⋆(χ_n −LOCC→ Φ_m ⊗ χ_n) = max_{k∈[n], b∈{0,...,m−1}} { ∥p∥(k) − ∥p∥(⌊k/m⌋) − (b/m) p_{⌊k/m⌋+1} }
 = max_{k∈[n]} { ∥p∥(k) − ∥p∥(⌊k/m⌋) } .   (12.115)
Now, from the specific form of χ_n in (12.110) we have ∥p∥(k) = H_k/H_n so that the above equality is equivalent to

T⋆(χ_n −LOCC→ Φ_m ⊗ χ_n) = max_{k∈[n]} ( H_k − H_{⌊k/m⌋} ) / H_n .   (12.116)
Finally, we use the well known bounds on the harmonic number H_n given by

ln(n) + 1/n ⩽ H_n ⩽ ln(n) + 1 .   (12.117)

Using these bounds we estimate

H_k − H_{⌊k/m⌋} ⩽ ln(k) + 1 − ln⌊k/m⌋ − 1/⌊k/m⌋   (12.118)
Exercise 12.5.7→ ⩽ 1 + ln(2m) .
We therefore conclude that

T⋆(χ_n −LOCC→ Φ_m ⊗ χ_n) ⩽ (1 + ln(2m))/H_n −(n→∞)→ 0 ,   (12.119)

since H_n goes to infinity as n goes to infinity.
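The rate of convergence is easy to observe numerically. The sketch below (names ours) computes the conversion distance T⋆(χ_n → Φ_m ⊗ χ_n) directly from the Schmidt vectors and shows the roughly 1/ln(n) decay:

import numpy as np

def T_star(p, q):
    # max_k ( ||p||_(k) - ||q||_(k) ), cf. Theorem 12.5.2.
    d = max(len(p), len(q))
    cp = np.sort(np.pad(p, (0, d - len(p))))[::-1].cumsum()
    cq = np.sort(np.pad(q, (0, d - len(q))))[::-1].cumsum()
    return float(np.max(cp - cq))

def embezzling_error(n, m):
    # Schmidt vector of chi_n has components proportional to 1/x; the Schmidt vector
    # of Phi_m (x) chi_n consists of the same components, each repeated m times and
    # divided by m.
    p = 1.0 / np.arange(1, n + 1)
    p /= p.sum()
    q = np.kron(np.full(m, 1.0 / m), p)
    return T_star(p, q)

for n in (10, 100, 1000, 10000):
    print(n, round(embezzling_error(n, m=2), 4))   # decays roughly like 1/ln(n)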
Exercise 12.5.7. Prove the second inequality of (12.118).
Exercise 12.5.8. Fix α ∈ R, and consider the bipartite entangled state

|φ_n⟩ := (1/√N_n) Σ_{x∈[n]} √(x^α) |xx⟩ ,   (12.120)

where N_n = Σ_{x∈[n]} x^α is the normalization factor. Show that only for α = −1 can the state |φ_n⟩ be used to embezzle entanglement.
In the general case of an arbitrary family of pure bipartite states {χ_n}_{n∈N}, observe that for any integer ℓ ⩽ a := ⌊n/m⌋ + 1,

max_{k∈{m(ℓ−1),...,mℓ−1}} { ∥p∥(k) − ∥p∥(⌊k/m⌋) } = max_{k∈{m(ℓ−1),...,mℓ−1}} { ∥p∥(k) − ∥p∥(ℓ−1) }
 = ∥p∥(mℓ−1) − ∥p∥(ℓ−1) ,   (12.121)

where we used the convention that ∥p∥(k) := 1 for an integer k > n. With this convention, we conclude that

T⋆(χ_n −LOCC→ Φ_m ⊗ χ_n) = max_{ℓ∈[a]} { ∥p∥(mℓ−1) − ∥p∥(ℓ−1) } .   (12.122)
However, observe that the condition above is in general insufficient to determine if the states
{χn }n∈N form an embezzling family, since the maximizer ℓ in (12.122) can depend on n.
where |Φ_2⟩ := (1/√2)(|00⟩ + |11⟩) is the 2 × 2 dimensional maximally entangled state (i.e. a Bell state). In the following subsections we provide closed formulas for these measures of
entanglement and discuss their relations to the single-shot quantities Costε and Distillε that
we studied in the previous section.
In the following theorem we compute the entanglement cost and prove a stronger version of
the above relation.
Remark. Observe that from the theorem above it follows that there is no need to take the limit ε → 0+ in (12.127); that is, by taking the limit n → ∞ the dependence on ε is eliminated (as long as ε ∈ (0, 1)).
Proof. The proof follows directly from a combination of Corollary 12.5.2 and the variant of
the AEP property given in (10.171). Specifically, denoting by ρA := TrB ψ AB , we get from
Corollary 12.5.2 that
lim_{n→∞} (1/n) Cost^ε(ψ^{⊗n}) = lim_{n→∞} (1/n) H^ε_max(A^n)_{ρ^{⊗n}}   (12.129)
(10.171)→ = H(A)_ρ .
In the following theorem we compute the distillable entanglement and prove a stronger
version of the above relation.
Proof. The proof follows directly from a combination of Theorem 12.5.3 and the variant of the AEP property given in (11.61). Specifically, denoting by ρ^A := Tr_B ψ^{AB}, we get from Theorem 12.5.3 that

lim_{n→∞} (1/n) Distill^ε(ψ^{⊗n}) = lim_{n→∞} (1/n) H^ε_min(A^n)_{ρ^{⊗n}} .   (12.132)
For those readers seeking additional insights, an alternative proof utilizing the concept of
typicality is provided in Appendix D.5. This proof offers a different perspective and leverages
the principles of typicality, which may be of interest to readers who are keen on exploring
diverse approaches and methodologies within the field.
Exercise 12.6.2. Use the theorems above to show that for any ψ, ϕ ∈ Pure(AB)
Distill(ψ^{AB} → ϕ^{AB}) = E(ψ)/E(ϕ)   and   Cost(ψ^{AB} → ϕ^{AB}) = E(ϕ)/E(ψ) .   (12.134)
Mixed-State Entanglement
To gain a better understanding of bipartite entanglement theory, we will delve into its most general form in this chapter. The free states of the theory consist of separable states, denoted for any composite system AB by

SEP(AB) := { Σ_{x∈[n]} p_x σ_x^A ⊗ ω_x^B : σ_x ∈ D(A), ω_x ∈ D(B), p ∈ Prob(n), n ∈ N } ,   (13.1)
where we used the notation p := (p1 , . . . , pn )T . Observe that SEP(AB) is a closed convex
set. Any quantum state ρ ∈ D(AB) that does not belong to SEP(AB) is referred to as
an entangled state. This chapter will reveal the intricate structure of entangled states,
highlighting the complexity of mixed-state entanglement theory.
Entanglement Witness
Definition 13.1.1. An operator Γ ∈ Herm(AB) is called an entanglement witness if the following two conditions hold:
From the condition in (13.2), and the fact that SEP(AB) is the convex hull of product states, it follows that if Γ ∈ Herm(AB) is an entanglement witness then for any product state ψ ⊗ ϕ ∈ Pure(AB) we must have

⟨ψ^A ⊗ ϕ^B| Γ^{AB} |ψ^A ⊗ ϕ^B⟩ ⩾ 0 .   (13.4)
On the other hand, the condition (13.3) also implies that there exists a state χ ∈ Pure(AB) such that

⟨χ^{AB}| Γ^{AB} |χ^{AB}⟩ < 0 .   (13.5)
In other words, the condition (13.3) implies that ΓAB is not positive semidefinite so that we
can take χAB , for example, to be an eigenstate corresponding to a negative eigenvalue of
ΓAB .
Based on Theorem 9.4.1, we can conclude that entanglement witnesses are an effective
tool for detecting entanglement. Specifically, ρ ∈ D(AB) is an entangled state if and only if
there exists an entanglement witness Γ ∈ Herm(AB) such that

Tr[Γ^{AB} ρ^{AB}] < 0 .   (13.6)
This characteristic can be employed to demonstrate that the set of separable states occupies
a non-zero volume.
Theorem 13.1.1. The set SEP(AB) has a non-zero volume in D(AB). Specifically,
there exists ε > 0 such that Bε (uAB ) ⊂ SEP(AB), where Bε (uAB ) is the “ball” of all
states in D(AB) that are ε-close to the maximally mixed state uAB = uA ⊗ uB .
Proof. Suppose by contradiction that the statement in the theorem is false. Then, there
exists a sequence of bipartite entangled states {τnAB }n∈N such that
lim_{n→∞} (1/2) ∥τ_n^{AB} − u^{AB}∥_1 = 0 .   (13.7)
Since we assume that τnAB is entangled, we have τn ̸∈ SEP(AB). Therefore, there exists an
entanglement witness Γ_n^{AB} such that

Tr[τ_n^{AB} Γ_n^{AB}] < 0 .   (13.8)
Without loss of generality we can assume that for each n ∈ N the witness Γ_n^{AB} is normalized with respect to the Hilbert-Schmidt inner product; i.e.

Tr[ (Γ_n^{AB})² ] = 1 .   (13.9)
Therefore, the sequence {Γ_n^{AB}} is a sequence of Hermitian operators in the unit sphere of Herm(AB). Since the unit sphere is compact, there exists a subsequence {n_k}_{k∈N} of integers such that the limit lim_{k→∞} Γ_{n_k}^{AB} exists and equals some normalized operator Γ_⋆^{AB} ∈ Herm(AB). Since each Γ_{n_k}^{AB} is an entanglement witness, the limit Γ_⋆^{AB} must satisfy

Tr[Γ_⋆^{AB} σ^{AB}] ⩾ 0   ∀ σ ∈ SEP(AB) .   (13.10)
On the other hand, taking the limit k → ∞ on both sides of the inequality Tr[τ_{n_k}^{AB} Γ_{n_k}^{AB}] < 0 gives

Tr[Γ_⋆^{AB} u^{AB}] ⩽ 0 ,   (13.11)

so that Tr[Γ_⋆^{AB}] ⩽ 0. Now, let {|ψ_x⟩^A}_{x∈[m]} be an orthonormal basis of A, and {|ϕ_y⟩^B}_{y∈[ℓ]}
be an orthonormal basis of B. Then,

0 ⩾ Tr[Γ_⋆^{AB}] = Σ_{x∈[m]} Σ_{y∈[ℓ]} Tr[Γ_⋆^{AB} (ψ_x^A ⊗ ϕ_y^B)] .   (13.12)
From (13.10) it follows that for each x ∈ [m] and y ∈ [ℓ] we have Tr[Γ_⋆^{AB}(ψ_x^A ⊗ ϕ_y^B)] ⩾ 0. We therefore get that for any x ∈ [m] and y ∈ [ℓ], Tr[Γ_⋆^{AB}(ψ_x^A ⊗ ϕ_y^B)] = 0. Finally, since the orthonormal bases {|ψ_x⟩^A}_{x∈[m]} and {|ϕ_y⟩^B}_{y∈[ℓ]} were arbitrary, we conclude that

Tr[Γ_⋆^{AB}(ψ^A ⊗ ϕ^B)] = 0   ∀ ψ ∈ Pure(A), ∀ ϕ ∈ Pure(B) .   (13.13)
However, from Exercise 3.3.6 it follows that the above equation holds if and only if ΓAB
⋆ =0
in contradiction with the fact that ΓAB
⋆ is normalized, so in particular, cannot be the zero
matrix. This completes the proof.
Exercise 13.1.1. Let A1 , . . . , Am be m physical systems and let SEP(A1 · · · Am ) be the set
of multipartite separable states; i.e. SEP(A1 · · · Am ) is the convex hull of the set of all m-
fold product states of the form ρ1 ⊗ · · · ⊗ ρm , with ρx ∈ D(Ax ) for all x ∈ [m]. Show that
SEP(A1 · · · Am ) has a non-zero volume in D(A1 · · · Am ).
The following theorem shows a close connection between entanglement witnesses and
positive maps.
Theorem 13.1.2. Any entanglement witness is the Choi matrix of a positive map
that is not completely positive. Explicitly, Γ ∈ Herm(AB) is an entanglement
witness if and only if ΓAB = JEAB for some positive map E ∈ Pos(A → B) that is not
completely positive (i.e. E ̸∈ CP(A → B)).
Proof. Suppose first that Γ^{AB} = J_E^{AB} for some positive map E ∈ Pos(A → B) and suppose E ̸∈ CP(A → B). Then, for any product state ρ ⊗ σ ∈ Pure(AB) we have

Tr[Γ^{AB}(ρ^A ⊗ σ^B)] = Tr[J_E^{AB}(ρ^A ⊗ σ^B)]
Tr_A[J_E^{AB}(ρ^A ⊗ I^B)] = E^{A→B}((ρ^A)^T) → = Tr[σ^B E^{A→B}((ρ^A)^T)]   (13.14)
 ⩾ 0 ,
where the last inequality follows from the fact that E ∈ Pos(A → B) so that E ρT ⩾ 0.
Finally, the existence of a state χ ∈ Pure(AB) that satisfies (13.5) follows from the fact that
E is not completely positive so its Choi matrix ΓAB is not positive semidefinite.
Conversely, suppose ΓAB is an entanglement witness and let E ∈ L(A → B) be such
that Γ^{AB} = J_E^{AB} (but we do not assume that E is positive). Then, for any ρ ∈ D(A) and σ ∈ D(B) we have

Tr[σ^B E^{A→B}(ρ^A)] = Tr[J_E^{AB}((ρ^A)^T ⊗ σ^B)]
 = Tr[Γ^{AB}((ρ^A)^T ⊗ σ^B)]   (13.15)
 ⩾ 0 ,

where the last inequality follows from the fact that Γ^{AB} is an entanglement witness and (ρ^A)^T ⊗ σ^B ∈ SEP(AB). Since ρ^A and σ^B were arbitrary states, the above inequality
implies that E ∈ Pos(A → B). The map E is not completely positive since its Choi matrix,
ΓAB , is not positive semidefinite (as ΓAB is an entanglement witness). This completes the
proof.
Observe that the set of all entanglement witnesses consists of all the operators in the dual cone of the set of separable states that are not positive semidefinite. Specifically, for the composite system AB, the set of all entanglement witnesses, denoted by WIT(AB), is given by

WIT(AB) = { Γ ∈ SEP(AB)* : Γ^{AB} ̸⩾ 0 } .   (13.16)
We will now provide two examples of how Theorem 9.4.1, adapted to entanglement theory
with the set WIT(AB) mentioned above, can be used to determine whether a quantum state
is entangled.
ρ_t^{AB} = t Φ_m^{AB} + (1 − t) τ^{AB} ,   (13.17)
Observe that Φ_m τ = τ Φ_m = 0, and furthermore, since Φ_m^{AB} is invariant under the action of the twirling channel G defined in (3.251), also ρ_t^{AB} has this property. In fact, the family {ρ_t^{AB}}_{t∈[0,1]} can be viewed as the set of all quantum states that are invariant under G (see (3.255)). In the following, we will utilize this property to make the argument that the isotropic state ρ_t^{AB} satisfies:

ρ_t^{AB} ∈ SEP(AB)   ⟺   t ⩽ 1/m .   (13.19)
To prove the above statement we follow Theorem 9.4.1. Specifically, ρAB t is separable if
and only if Tr[Γ^{AB} ρ_t^{AB}] ⩾ 0 for all entanglement witnesses Γ ∈ WIT(AB). The key idea is to use the invariance of ρ_t^{AB} under G to get that

Tr[G(Γ^{AB}) σ^{AB}] = Tr[Γ^{AB} G(σ^{AB})] ⩾ 0 .   (13.21)
where a, b ∈ R. In the final equality, we made use of the fact that I^{AB} and Φ_m^{AB} span the subspace of G-invariant operators in Herm(AB).
To ensure that the matrix Γ = aI + bΦ_m is an entanglement witness, we need to appropriately specify the coefficients a and b. Let's start by noting that a must be non-negative, as evidenced by

a = Tr[ Γ (|1⟩⟨1| ⊗ |2⟩⟨2|) ] ⩾ 0 .   (13.23)
Furthermore, it’s important to recognize that Γ exhibits two distinct eigenvalues: a with
multiplicity |AB| − 1, and a + b with multiplicity one. Given that an entanglement witness
has at least one negative eigenvalue, and considering that a ⩾ 0, it is necessary for b to
satisfy b < −a. It's also worth mentioning that the scenario where a = 0 does not yield an entanglement witness (this is an interesting point to ponder – why is this the case?).
Consequently, after rescaling ΓAB by a positive factor a > 0, we can, without loss of
generality, assume that ΓAB takes the form
W = I^{AB} − r Φ_m^{AB} ,   (13.24)
where r > 1. From (13.4) the matrix ΓAB is an entanglement witness if and only if for any
product state ψ ⊗ ϕ ∈ Pure(AB)
Proof. Without loss of generality suppose |A| = 2 and |B| ⩽ 3. From Theorem 13.1.2 there
exists E ∈ Pos(A → B) such that
Γ^{AB} = E^{Ã→B}(Ω^{AÃ}) .   (13.33)
is a positive semidefinite matrix. We will say that ρ^{AB} has positive partial transpose (PPT) if this property holds, and otherwise, we will say that it has a negative partial transpose (NPT) or simply that the state is an NPT state.
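The PPT test is simple to implement numerically; the sketch below (names ours) applies the partial transpose on B by permuting tensor indices and checks the smallest eigenvalue:

import numpy as np

def partial_transpose_B(rho, dA, dB):
    # Swap the two B indices of rho viewed as a (dA, dB, dA, dB) tensor.
    return rho.reshape(dA, dB, dA, dB).transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)

def is_ppt(rho, dA, dB, tol=1e-12):
    return bool(np.linalg.eigvalsh(partial_transpose_B(rho, dA, dB)).min() >= -tol)

phi = np.array([1, 0, 0, 1]) / np.sqrt(2)        # two-qubit maximally entangled state
print(is_ppt(np.outer(phi, phi), 2, 2))          # False: NPT, hence entangled
print(is_ppt(np.eye(4) / 4, 2, 2))               # True: the maximally mixed state is PPT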
We have seen before that the 2-qubit maximally entangled state is an NPT state, and in
Exercise 3.4.2 you showed that all pure entangled states are NPT. Therefore, it is natural
to ask if all entangled states are NPT. In low dimensions, the following theorem states that
this is indeed the case.
Proof. If ρ^{AB} is an NPT state then from (13.34) it cannot be separable. Conversely, suppose ρ^{AB} is a PPT state, and recall from Theorem 9.4.1 (when applied to entanglement theory) that ρ^{AB} is separable if and only if

Tr[ρ^{AB} Γ^{AB}] ⩾ 0   ∀ Γ ∈ WIT(AB) .   (13.35)

Now, fix Γ ∈ WIT(AB). From Theorem 13.1.3, Γ^{AB} has the form (13.32) for some η_1, η_2 ∈ Pos(AB). Hence,

Tr[ρ^{AB} Γ^{AB}] = Tr[ ρ^{AB} ( η_1^{AB} + T^{B→B}(η_2^{AB}) ) ]
ρ^{AB} is PPT→ ⩾ 0 .
Since ΓAB was an arbitrary entanglement witness in WIT(AB) we conclude that the above
equation holds for all Γ ∈ WIT(AB) so that ρAB must be a separable state. This concludes
the proof.
The condition |AB| ⩽ 6 in the theorem above is optimal. Indeed, there are examples of
PPT entangled states in higher dimensions, including the case |A| = 2 and |B| = 4, as well
as the case |A| = |B| = 3.
where I_9 is the 9×9 identity matrix. It then follows that the bipartite density matrix ρ := (1/3)Π is entangled (see Exercise 13.1.6). However, the state ρ is also PPT since

T^{B→B}(ρ^{AB}) = (1/3)( I^{AB} − Σ_{x∈[5]} T^{B→B}(ψ_x^{AB}) ) = (1/3)( I^{AB} − Σ_{x∈[5]} ψ_x^{AB} ) = ρ^{AB} ⩾ 0 .   (13.39)
Exercise 13.1.8. Let K be the set of PPT operators in Pos(AB). Show that K∗ consists of
all operators Γ ∈ Herm(AB) of the form (13.32).
I A ⊗ ρB ⩾ ρAB . (13.41)
In particular, if ρAB is separable then it must satisfy the condition above. This criterion for
separability is known as the reduction criterion.
The reduction criterion can be expressed in terms of the positive map P ∈ Pos(A → A) defined by

P(ω^A) := Tr[ω^A] I^A − ω^A   ∀ ω ∈ L(A) .   (13.42)

Recall from Exercise 3.4.15 that the map described above is positive but not 2-positive, and therefore not completely positive. Utilizing this map, the reduction criterion can be expressed as:

P^{A→A}(ρ^{AB}) ⩾ 0 .   (13.43)
Exercise 13.1.9 below provides an alternative expression for the positive map P^{A→A} when |A| = 2; specifically, for this case

P^{A→A}(ω^A) = σ_y (ω^A)^T σ_y   ∀ ω ∈ L(A) .   (13.44)
Therefore, in this case the reduction criterion is equivalent to the PPT criterion.
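The reduction criterion is likewise easy to check numerically; in the sketch below (names ours) a negative eigenvalue of I^A ⊗ ρ^B − ρ^{AB} certifies entanglement:

import numpy as np

def violates_reduction(rho, dA, dB, tol=1e-12):
    # Entanglement is certified when I_A (x) rho_B - rho_AB has a negative eigenvalue.
    r = rho.reshape(dA, dB, dA, dB)
    rho_B = np.einsum('abad->bd', r)                       # partial trace over A
    gap = np.kron(np.eye(dA), rho_B) - rho
    return bool(np.linalg.eigvalsh(gap).min() < -tol)

phi = np.array([1, 0, 0, 1]) / np.sqrt(2)                  # Bell state
print(violates_reduction(np.outer(phi, phi), 2, 2))        # True
print(violates_reduction(np.eye(4) / 4, 2, 2))             # False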
In general, however, the PPT criterion is a more powerful criterion for detecting entan-
glement than the reduction criterion, since there exist entangled states that violate the PPT
criterion, yet cannot be detected by the reduction criterion, whereas the converse is not true.
This means that the PPT criterion can detect a larger class of entangled states than the
reduction criterion. The reason for that is that for |A| > 2 the map P above has the form
(see Exercise 13.1.9)
P = P1 + T ◦ P2 , (13.45)
where P1 , P2 ∈ CP(A → A) and T ∈ Pos(A → A) is the transpose map.
Although the reduction criterion may not be as effective as the PPT criterion in detecting
entanglement, further investigation reveals that it still holds importance in the field of quan-
tum resource theories. In fact, quantum states that do not satisfy the reduction criterion
possess non-zero distillable entanglement, which emphasizes the usefulness of the criterion
in other aspects of quantum information. In the upcoming sections, we will explore some of
these implications in greater detail.
Exercise 13.1.9. Let P A→A be as in (13.42).
1. Prove the relation (13.44) for the case that |A| = 2.
2. Prove the relation (13.45) for the case |A| ⩾ 2. Hint: Show that the partial transpose
of the Choi matrix of P A→A is positive semidefinite.
Theorem 13.1.5. Using the same notations as above, a quantum state ρ ∈ D(AB)
is entangled if it satisfies

Σ_{x∈[k]} λ_x > 1 .   (13.47)
it is sufficient to show that ΛAB is an entanglement witness. Indeed, let ψ ∈ Pure(A) and
ϕ ∈ Pure(B). Then,
Tr[ Λ^{AB} (ψ^A ⊗ ϕ^B) ] = 1 − Σ_{x∈[k]} Tr[ψη_x] Tr[ϕζ_x] .   (13.49)
Now, let v, u ∈ Rk be the vectors whose components are {Tr[ψηx ]}x∈[k] and {Tr[ϕζx ]}x∈[k] ,
respectively. Then, we need to show that v · u ⩽ 1. Since {ηx }x∈[k] is an orthonormal set in
Herm(A) (which can be completed to a full orthonormal basis of Herm(A)), in terms of the
Frobenius norm (i.e. the norm induced by the Hilbert-Schmidt inner product)
1 = ∥ψ∥_2² ⩾ ∥ Σ_{x∈[k]} Tr[ψη_x] η_x ∥_2²
{η_x}_{x∈[k]} is orthonormal → = Σ_{x∈[k]} |Tr[ψη_x]|² = v · v .   (13.50)
We then argue (see Exercise 13.1.10) that the sum appearing in (13.47) can be expressed as

Σ_{x∈[m²]} λ_x = ∥ρ̃^{AB}∥_1 .   (13.53)

In other words, the realignment criterion can be stated as follows: if the trace norm of the realigned matrix ρ̃^{AB} is greater than one (i.e. ∥ρ̃^{AB}∥_1 > 1) then the state ρ^{AB} is entangled.
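The realignment (trace-norm) test can be implemented with a single reshape; the sketch below (names ours) computes ∥ρ̃^{AB}∥_1 and flags entanglement when it exceeds one:

import numpy as np

def realignment_trace_norm(rho, dA, dB):
    # Rearrange rho[(i,j),(k,l)] into R[(i,k),(j,l)] and sum its singular values.
    R = rho.reshape(dA, dB, dA, dB).transpose(0, 2, 1, 3).reshape(dA * dA, dB * dB)
    return float(np.linalg.svd(R, compute_uv=False).sum())

phi = np.array([1, 0, 0, 1]) / np.sqrt(2)                    # Bell state
print(realignment_trace_norm(np.outer(phi, phi), 2, 2))      # 2.0 > 1: entangled
print(realignment_trace_norm(np.eye(4) / 4, 2, 2))           # 0.5: not detected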
Exercise 13.1.11. Consider the case |A| = 2 and |B| = 4. Let p ∈ [0, 1] and ρ ∈ Herm(AB)
be the matrix
ρ^{AB} = (1/(7p + 1)) [ pI_4 , pξ ; pξ^T , η ]   (13.54)

(written here in 2 × 2 block form).
This extension is considered symmetric since the original state can be obtained by tracing
out either the B or the B̃ systems; i.e., the marginals of σ AB B̃ satisfy σ AB = σ AB̃ . On
the other hand, when dealing with entangled states, it is not immediately clear whether
a symmetric extension, ρAB B̃ , with the property ρAB = ρAB̃ exists for an entangled state
ρ ∈ D(AB). While this property holds trivially for separable states, it doesn’t hold for all
entangled states.
Note that the separable state in (13.56) can also be extended to k copies of B via

ρ^{AB^k} = Σ_{x∈[m]} p_x ψ_x^A ⊗ ϕ_x^{B_1} ⊗ · · · ⊗ ϕ_x^{B_k} ,   (13.58)

where B ≅ B_1 ≅ · · · ≅ B_k. We say that ρ^{AB^k} is a symmetric k-extension of ρ^{AB}.
We saw above that every separable quantum state is k-extendible for all k ∈ N. Sur-
prisingly, the converse of this statement is also true! This means that if a quantum state
ρ ∈ D(AB) is k-extendible for all k ∈ N, then it must be separable. However, proving this
statement requires certain techniques that are beyond the scope of this book. Specifically,
it involves the use of the quantum de Finetti theorem. Interested readers can find more
information in the ‘notes and references’ section at the end of this chapter.
Given a quantum state ρ ∈ D(AB), how can we determine if it is k-extendible? Observe
that the conditions ρAB1 = ρABj , for all j ∈ [k], can be expressed as
Tr[ ρ^{AB^k} Λ_j^{AB^k} ] = 0 ,   (13.60)
where η ∈ Herm(AB1 ) and ξ ∈ Herm(ABj ). Note that the linearity of the condition above
implies that we can restrict η and ξ to belong to orthonormal bases of Herm(AB1 ) and
Herm(ABj ), respectively. Thus, we conclude that there exists a finite number of operators
{Λ_{jℓ}}_{j∈[k],ℓ∈[n]} such that ρ^{AB_1} = ρ^{AB_j}, for all j ∈ [k], if and only if

Tr[ ρ^{AB^k} Λ_{jℓ}^{AB^k} ] = 0   ∀ j ∈ [k], ℓ ∈ [n] .   (13.62)
The conditions specified above indicate that the determination of whether ρAB is k-extendible
requires the solution of an SDP feasibility problem. Therefore, the criterion for k-extendibility
can be computed algorithmically and efficiently.
Exercise 13.1.12. Using the same notations as above:
1. Find an upper bound on n.
2. Use Farkas lemma of Exercise 4.6.16 to express the dual form of (13.62).
2. E(1) = 0, where 1 corresponds to the only element of D(AB) when |A| = |B| = 1.
Exercise 13.2.1. Show that any measure of entanglement E as defined above satisfies the
following two conditions: (1) It is always non-negative, that is, E(ρAB ) ⩾ 0 for all ρ ∈
D(AB), and (2) it satisfies E(σ AB ) = 0 for all σ ∈ SEP(AB).
In general, LOCC can be stochastic, in the sense that ρ^{AB} can be converted to σ_x^{AB} with some probability p_x. In this case, the map from ρ^{AB} to σ_x^{AB} cannot be described by a CPTP map. However, by introducing a classical ‘flag’ system X, we can view the ensemble {σ_x^{AB}, p_x}_{x∈[m]} as a classical-quantum state σ^{XAB} := Σ_{x∈[m]} p_x |x⟩⟨x|^X ⊗ σ_x^{AB}. Hence, if ρ^{AB} can be converted by LOCC to σ_x^{AB} with probability p_x, then there exists a map E ∈ LOCC(AB → XAB) such that E(ρ^{AB}) = σ^{XAB}. Since the ‘flag’ system X is classical, both Alice and Bob have access to it: if Alice holds it she can communicate it to Bob, and vice versa. Therefore, the definition above of a measure of entanglement also captures probabilistic transformations. Particularly, E must satisfy E(σ^{XAB}) ⩽ E(ρ^{AB}).
Almost all measures of entanglement studied in the literature (although not all) satisfy

E(σ^{XAB}) = Σ_{x∈[m]} p_x E(σ_x^{AB}) ,   (13.65)

which is very intuitive since X is just a classical system encoding the value of x. We call the relation in (13.65) the direct sum property since, mathematically, σ^{XAB} can also be viewed as ⊕_{x∈[m]} p_x σ_x^{AB}. If the direct sum property holds then the condition E(σ^{XAB}) ⩽ E(ρ^{AB}) becomes Σ_x p_x E(σ_x^{AB}) ⩽ E(ρ^{AB}), meaning that LOCC cannot increase entanglement on average. Therefore, the direct sum property is in general stronger than the strong monotonicity property (10.10) of a resource measure M. In fact, the condition (13.65) also implies that E is convex (see Exercise 13.2.2). We therefore conclude that any measure of entanglement that satisfies the direct sum property is an entanglement monotone.
Exercise 13.2.2. Let E be a measure of entanglement satisfying the direct sum prop-
erty (13.65). Show that E is convex; i.e. for any ensemble of states {px , σxAB }x∈[m] we
have

E( Σ_{x∈[m]} p_x σ_x^{AB} ) ⩽ Σ_{x∈[m]} p_x E(σ_x^{AB}) .   (13.66)
Entanglement of Formation

Definition 13.2.1. Let E be a measure of pure-state entanglement. The convex roof extension of E is a function E_F : ∪_{A,B} D(AB) → R defined on any ρ ∈ D(AB) via

E_F(ρ^{AB}) = inf Σ_{x∈[m]} p_x E(ψ_x^{AB}) ,   (13.68)

where the infimum is over all pure-state decompositions {p_x, ψ_x^{AB}}_{x∈[m]} of ρ^{AB}. E_F is called the entanglement of formation associated with the pure-state measure E.
Remark. The term “entanglement of formation” originated from historical reasons, as it was
originally believed that EF (ρAB ), with E taken as the entropy of entanglement, represented
the entanglement cost required to create the state ρAB . However, we will discover later on
that it is actually the regularized entanglement of formation that can be interpreted as the
entanglement cost of ρAB .
Exercise 13.2.3. Show that if E is the entropy of entanglement then its corresponding
entanglement of formation satisfies for all ρ ∈ D(AB),
E_F(ρ^{AB}) ⩽ min{ H(ρ^A), H(ρ^B) } .   (13.69)
To show that the entanglement of formation is indeed a measure of entanglement, recall
that any measure of pure state entanglement, E, can be expressed for all ψ ∈ Pure(AB) as
E ψ AB = g ρA with ρA := TrB ψ AB ,
(13.70)
for some Schur concave function g. A slightly stronger condition than Schur concavity is the
condition that g is both symmetric and concave. In this case, the resulting entanglement of
formation is an entanglement monotone.
We first prove the following auxiliary lemma. In this lemma we only consider a quantum
instrument on Bob’s system and consider a pure initial bipartite state. We show that for
this simpler case, the convex roof extension of E satisfies strong monotonicity.
Lemma 13.2.1. Let ψ ∈ Pure(AB), let E^{B→B′X} := Σ_{x∈[m]} E_x^{B→B′} ⊗ |x⟩⟨x|^X be a quantum instrument on Bob's system, and for every x ∈ [m] define

σ_x^{AB′} := (1/p_x) E_x^{B→B′}(ψ^{AB})   where   p_x := Tr[ E_x^{B→B′}(ψ^{AB}) ] .   (13.71)

Then,

Σ_{x∈[m]} p_x E_F(σ_x^{AB′}) ⩽ E_F(ψ^{AB}) .   (13.72)
Remark. In the proof below, we adopt the notations like ϕA := TrB ϕAB to denote the
reduced density matrix of a pure bipartite state. This notation is instrumental in reducing
the number of symbols used, enhancing clarity and conciseness. However, it’s crucial to
remember that ϕA in this context represents a mixed state, despite the notation resembling
that typically used for pure states. This distinction is important for a correct understanding
of the concepts and calculations involved in the proof.
Hence,

Σ_{x∈[m]} p_x E_F(σ_x^{AB′}) ⩽ Σ_{x∈[m]} p_x g(σ_x^A)
   g is concave→ ⩽ g(Σ_{x∈[m]} p_x σ_x^A) .        (13.75)

Finally, observe that the reduced density matrix of ψ^{AB} satisfies ψ^A = Σ_{x∈[m]} p_x σ_x^A. Substituting this into the equation above gives

Σ_{x∈[m]} p_x E_F(σ_x^{AB′}) ⩽ g(ψ^A) = E(ψ^{AB}) = E_F(ψ^{AB}) .        (13.76)
We now prove the strong monotonicity of E_F. The proof strategy is to show that
EF does not increase on average under a general quantum instrument on Bob’s subsystem.
We can then apply a similar argument to Alice’s side, demonstrating that EF remains non-
increasing under quantum instruments on either subsystem. The significance of this lies
in the fact that LOCC consists of such local quantum instruments, coupled with rounds
of classical communication, which do not affect the monotonicity property. Therefore, by
demonstrating the non-increasing nature of EF under a quantum instrument on both Bob’s
side (and, by symmetry arguments, also on Alice’s side), it can be concluded that EF satisfies
the strong monotonicity property under LOCC.
Let {E_z^{B→B′}}_{z∈[k]} be a quantum instrument on Bob's subsystem, and for each z ∈ [k], let ρ_z^{AB′} be the post-measurement state after outcome z occurred. Moreover, for every z ∈ [k] and x ∈ [m] we denote r_z := Tr[E_z^{B→B′}(ρ^{AB})], t_{z|x} := Tr[E_z^{B→B′}(ψ_x^{AB})], and

σ_{xz}^{AB′} := (1/t_{z|x}) E_z^{B→B′}(ψ_x^{AB}) .        (13.78)

We need to show that the average entanglement Σ_{z∈[k]} r_z E_F(ρ_z^{AB′}) cannot exceed E_F(ρ^{AB}). For this purpose, for each state ρ_z^{AB′} we need to find a suitable pure-state decomposition that can be related to ρ^{AB}. Observe that the equation above involves the mixed states {σ_{xz}^{AB′}}_{x∈[m]}. Therefore, for each σ_{xz}^{AB′} we denote by {s_{y|xz}, ϕ_{xyz}^{AB′}}_{y∈[n]} its optimal pure-state decomposition, so that

E_F(σ_{xz}^{AB′}) = Σ_{y∈[n]} s_{y|xz} E(ϕ_{xyz}^{AB′}) .        (13.80)
With this final notation, we get our desired pure-state decomposition of ρ_z^{AB′}:

ρ_z^{AB′} = Σ_{x∈[m]} Σ_{y∈[n]} (p_x t_{z|x} s_{y|xz} / r_z) ϕ_{xyz}^{AB′} .        (13.81)

Since the above pure-state decomposition of ρ_z^{AB′} is not necessarily optimal, we conclude

Σ_{z∈[k]} r_z E_F(ρ_z^{AB′}) ⩽ Σ_{x,y,z} r_z (p_x t_{z|x} s_{y|xz} / r_z) E(ϕ_{xyz}^{AB′}) = Σ_{x,y,z} p_x t_{z|x} s_{y|xz} E(ϕ_{xyz}^{AB′})
   (13.80)→ = Σ_{x,z} p_x t_{z|x} E_F(σ_{xz}^{AB′})        (13.82)
   Lemma 13.2.1→ ⩽ Σ_{x∈[m]} p_x E_F(ψ_x^{AB})
   (13.77)→ = E_F(ρ^{AB}) .
We therefore conclude that EF does not increase on average under a general quantum in-
strument on Bob’s subsystem. This completes the proof of strong monotonicity.
It is therefore left to show that E_F is convex. Indeed, let {p_x, ρ_x^{AB}}_{x∈[m]} be an ensemble of bipartite states, and for each x ∈ [m] let {q_{y|x}, ψ_{xy}^{AB}}_{y∈[n]} be an optimal pure-state decomposition of ρ_x^{AB} such that

E_F(ρ_x^{AB}) = Σ_{y∈[n]} q_{y|x} E(ψ_{xy}^{AB}) .        (13.83)

Now, observe that {p_x q_{y|x}, ψ_{xy}^{AB}}_{x,y} is a pure-state decomposition of Σ_x p_x ρ_x^{AB}. Thus,

E_F(Σ_x p_x ρ_x^{AB}) ⩽ Σ_{x,y} p_x q_{y|x} E(ψ_{xy}^{AB})
   (13.83)→ = Σ_{x∈[m]} p_x E_F(ρ_x^{AB}) .        (13.84)
Exercise 13.2.4. Consider E and C as the entropy of entanglement and the concurrence for pure states, respectively. Let A and B be qubit systems (i.e., |A| = |B| = 2), and define the function g : [0, 1] → [0, 1] as

g(x) = h_2((1 + √(1 − x²))/2) .        (13.85)
The exercise above shows that in order to compute the entanglement of formation of a two-qubit state ρ^{AB} it is sufficient to compute its concurrence of formation. In the following theorem we give a closed formula for the concurrence of formation. The closed formula is given in terms of the density matrix

ρ_⋆^{AB} := (σ_2 ⊗ σ_2) ρ̄^{AB} (σ_2 ⊗ σ_2) ,        (13.88)

where the orthonormal basis {|x⟩}_{x∈{0,1}} (and similarly {|y⟩}_{y∈{0,1}}) is such that σ_2 has the form −i|0⟩⟨1| + i|1⟩⟨0|.
Closed Formula

Theorem 13.2.2. Let ρ ∈ D(AB) be a two-qubit mixed state (i.e. |A| = |B| = 2). Then, the concurrence of formation of ρ^{AB} is given by

C_F(ρ^{AB}) = max{0, λ_1 − λ_2 − λ_3 − λ_4} ,        (13.90)

where λ_1, . . . , λ_4 are the square roots of the four eigenvalues of the matrix √ρ ρ_⋆ √ρ, arranged in non-increasing order.
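The closed formula (13.90) is simple to evaluate numerically. Below is a minimal sketch (not part of the text; it assumes NumPy and the standard computational basis, with the λ_i computed as the square roots of the eigenvalues of ρρ_⋆, which is how the formula is usually evaluated in practice):

    import numpy as np

    SY = np.array([[0, -1j], [1j, 0]])        # sigma_2 in the basis of (13.88)
    YY = np.kron(SY, SY)

    def concurrence_of_formation(rho):
        # Concurrence of formation of a two-qubit state via Eq. (13.90).
        rho_star = YY @ rho.conj() @ YY       # spin-flipped state, Eq. (13.88)
        evals = np.linalg.eigvals(rho @ rho_star).real
        lam = np.sort(np.sqrt(np.clip(evals, 0, None)))[::-1]   # non-increasing
        return max(0.0, lam[0] - lam[1] - lam[2] - lam[3])

    # Example: mixture of a Bell state with white noise.
    phi = np.zeros(4); phi[0] = phi[3] = 1 / np.sqrt(2)
    bell = np.outer(phi, phi)
    for t in (0.2, 0.5, 0.9):
        rho = t * bell + (1 - t) * np.eye(4) / 4
        print(t, concurrence_of_formation(rho))   # entangled iff t > 1/3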
In the derivation of the formula (13.90), we will use a bilinear form denoted ⟨·, ·⟩ on C² ⊗ C². This bilinear form will not only be instrumental in proving the formula but will also be valuable in the analysis of multipartite entanglement. It is defined for any two vectors |ψ⟩, |ϕ⟩ ∈ AB as

⟨|ψ⟩, |ϕ⟩⟩ := ⟨ψ̄^{AB}| σ_2 ⊗ σ_2 |ϕ^{AB}⟩ ,        (13.91)

where ψ̄^{AB} is defined such that if |ψ^{AB}⟩ = Σ_{x,y∈{0,1}} c_{xy}|x⟩|y⟩ then |ψ̄^{AB}⟩ = Σ_{x,y∈{0,1}} c̄_{xy}|x⟩|y⟩. Note that ψ̄^{AB} is well defined only with respect to some fixed bases {|0⟩^A, |1⟩^A} ⊂ A and {|0⟩^B, |1⟩^B} ⊂ B of A and B, respectively. These orthonormal bases are chosen such that σ_2 has the form −i|0⟩⟨1| + i|1⟩⟨0|. The relation of this bilinear form to our study here can be found in Exercise 12.3.2, in which you had to show that the concurrence of a two-qubit pure state ψ ∈ Pure(AB) can be expressed as C(ψ^{AB}) = |⟨|ψ⟩, |ψ⟩⟩|. In the following exercise you prove several additional properties of this bilinear form.
1. Show the linearity of the bilinear form; that is, show that for any vectors |ψ⟩, |ψ1 ⟩,
|ψ2 ⟩, |ϕ⟩, |ϕ1 ⟩, and |ϕ2 ⟩ in C2 ⊗ C2
2. Show that the bilinear form is symmetric; that is, for any two vectors |ψ⟩, |ϕ⟩ ∈ C2 ⊗C2
Proof of Theorem 13.2.2. Consider first the case that λ_1 > λ_2 + λ_3 + λ_4. Let {p_x, ψ_x}_{x∈[4]} and {q_y, ϕ_y^{AB}}_{y∈[n]} be two pure-state decompositions of ρ^{AB}, and for each x ∈ [4] and y ∈ [n] (here n ⩾ 4), let |ψ̃_x⟩ := √p_x |ψ_x⟩ and |ϕ̃_y^{AB}⟩ := √q_y |ϕ_y^{AB}⟩. Recall from Exercise 2.3.15 that there exists an n × 4 isometry V = (v_{yx}) such that for all y ∈ [n]

|ϕ̃_y⟩ = Σ_{x∈[4]} v_{yx} |ψ̃_x⟩ .        (13.95)

Denoting by M_ψ and M_ϕ the matrices whose components are ⟨ψ̃_x, ψ̃_{x′}⟩ and ⟨ϕ̃_y, ϕ̃_{y′}⟩, respectively, we get that the above equation can be written as

M_ϕ = V M_ψ V^T .        (13.97)

Since M_ψ is symmetric (see the second part of Exercise 13.2.5), there exists a 4 × 4 unitary matrix U such that U M_ψ U^T is diagonal. Moreover, by an appropriate choice of U, the diagonal elements of U M_ψ U^T can always be made real and positive (i.e. they are the singular values of M_ψ), and arranged on the diagonal of U M_ψ U^T in non-increasing order. We therefore conclude that there exists a pure-state decomposition that is diagonal with respect to the bilinear form. For simplicity of the exposition, we take it to be {p_x, ψ_x^{AB}}_{x∈[4]} itself; that is, ⟨ψ̃_x, ψ̃_{x′}⟩ = λ_x δ_{xx′}. Hence, for any other pure-state decomposition {q_y, ϕ_y^{AB}}_{y∈[n]},

Σ_{y∈[n]} q_y C(ϕ_y^{AB}) = Σ_{y∈[n]} q_y |⟨|ϕ_y⟩, |ϕ_y⟩⟩| = Σ_{y∈[n]} |⟨|ϕ̃_y⟩, |ϕ̃_y⟩⟩|
   (13.99)→ = Σ_{y∈[n]} |Σ_{x∈[4]} v_{yx}² λ_x|        (13.100)
   ⩾ Σ_{y∈[n]} (|v_{y1}|² λ_1 − |v_{y2}|² λ_2 − |v_{y3}|² λ_3 − |v_{y4}|² λ_4)
   = λ_1 − λ_2 − λ_3 − λ_4 ,
where we used the inequality |a + b + c + d| ⩾ |a| − |b| − |c| − |d| for every a, b, c, d ∈ C, together with the fact that Σ_{y∈[n]} |v_{yx}|² = 1 for each x (since V is an isometry). Moreover, the inequality above can be saturated by taking V to be the unitary matrix (i.e. taking n = 4)

V = ½ [ −1  i  i  i ;  1  −i  i  i ;  1  i  −i  i ;  1  i  i  −i ] .        (13.101)

Indeed, observe that the matrix V above is unitary and has the property that for all y ∈ [4], |Σ_{x∈[4]} v_{yx}² λ_x| = ¼(λ_1 − λ_2 − λ_3 − λ_4). Therefore, with this V we get from (13.100) that the average concurrence of {q_y, ϕ_y}_{y∈[4]} is

Σ_{y∈[4]} |Σ_{x∈[4]} v_{yx}² λ_x| = λ_1 − λ_2 − λ_3 − λ_4 .        (13.102)
Exercise 13.2.9. Let ρ ∈ D(AB) with |A| = |B| = 2, and define the quantity

C_a(ρ^{AB}) := max Σ_{x∈[m]} p_x C(ψ_x^{AB}) ,        (13.107)

where the maximum is over all pure-state decompositions of ρ^{AB} (i.e. C_a is defined similarly to C_F but with a maximum instead of a minimum). Show that C_a(ρ^{AB}) = F(ρ^{AB}, ρ_⋆^{AB}), where F is the fidelity. Hint: Use similar lines as in the proof above. Show also that the square of the fidelity above can be expressed as

F²(ρ^{AB}, ρ_⋆^{AB}) = ∥√(ρ^{AB}) √(ρ_⋆^{AB})∥_1² .        (13.109)
For k ∈ [d], with d := min{|A|, |B|}, the measures E_(k) are defined as

E_(k)(ρ^{AB}) := min Σ_x p_x (1 − ∥ρ_x^A∥_{(k)}) ,

where the minimum is over all pure-state decompositions ρ^{AB} = Σ_x p_x ψ_x^{AB}, with ρ_x^A := Tr_B[ψ_x^{AB}], and ∥·∥_{(k)} is the Ky Fan norm.
Exercise 13.2.10. Show that the functions E(k) as defined above are entanglement mono-
tones.
Note that for the two-qubit case, the only non-trivial measure E_(k) is the one with k = 1. In this case,

E_(1)(ρ^{AB}) := min Σ_{x∈[m]} p_x λ_min(ρ_x^A) ,        (13.111)

since each ρ_x^A is a qubit, so that its minimum eigenvalue is λ_min(ρ_x^A) = 1 − ∥ρ_x^A∥_{(1)}. Moreover, observe that

λ_min(ρ_x^A) = ½(1 − √(1 − 4 det(ρ_x^A)))
             = ½(1 − √(1 − C²(ψ_x^{AB}))) ,        (13.112)

where C²(ψ_x^{AB}) is the square of the concurrence of ψ_x^{AB}. In the following exercise you show that the above relation can be used to prove that for any two-qubit state ρ ∈ D(AB) with |A| = |B| = 2 we have

E_(1)(ρ^{AB}) = ½(1 − √(1 − C_F²(ρ^{AB}))) ,        (13.113)

where C_F²(ρ^{AB}) is the square of the concurrence of formation of ρ^{AB}. Hence, the closed formula for the concurrence of formation can be used to compute E_(1).
Exercise 13.2.11. Let ρ ∈ D(AB) be a two-qubit state with |A| = |B| = 2.
1. Prove the relation (13.112).
2. Prove the relation (13.113). Hint: The proof is similar to the proof of (13.87).
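Once the concurrence of formation C_F has been computed (e.g. via (13.90)), the entanglement of formation of a two-qubit state follows from the relation E_F = g(C_F) established in Exercise 13.2.4, and E_(1) follows from (13.113). The short sketch below (not part of the text; it assumes NumPy and illustrative function names) evaluates both quantities from a given value of C_F:

    import numpy as np

    def h2(p):
        # Binary Shannon entropy in bits.
        if p in (0.0, 1.0):
            return 0.0
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

    def g(c):
        # g(x) of Eq. (13.85): pure-state entropy of entanglement as a
        # function of the concurrence.
        return h2((1 + np.sqrt(1 - c ** 2)) / 2)

    def entanglement_of_formation(c_f):
        # E_F of a two-qubit state from its concurrence of formation.
        return g(c_f)

    def e1_measure(c_f):
        # E_(1) of Eq. (13.113) from the concurrence of formation.
        return (1 - np.sqrt(1 - c_f ** 2)) / 2

    c_f = 0.7   # e.g. the output of the closed formula (13.90)
    print(entanglement_of_formation(c_f), e1_measure(c_f))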
In Corollary 12.4.2 we found necessary and sufficient conditions to convert, by LOCC, a pure bipartite state to a mixed bipartite state. Interestingly, for the case that d := |A| = |B| = 2, the minimization in (12.63) over k ∈ {1, 2} becomes trivial since for all ψ ∈ Pure(AB) we have E_(2)(ψ^{AB}) = 0. We therefore arrive at the following corollary.
Optimal Extensions

In Chapter 5 we introduced a method to extend divergences from classical to quantum systems. This method is in fact quite general and can be slightly modified to incorporate extensions of measures of entanglement from pure to mixed states. Specifically, let E be a measure of pure-state entanglement. For any ρ ∈ D(AB) the maximal extension of E is defined as

E̅(ρ^{AB}) := inf { E(ψ^{A′B′}) : ψ^{A′B′} −LOCC→ ρ^{AB} } ,        (13.117)

where the infimum is over all systems A′B′ and all pure states ψ ∈ Pure(A′B′) for which ψ^{A′B′} can be converted by LOCC to ρ^{AB}. Similarly, the minimal extension is defined as

E̲(ρ^{AB}) := sup { E(ψ^{A′B′}) : ρ^{AB} −LOCC→ ψ^{A′B′} } .        (13.118)
Exercise 13.2.13. Let E be a measure of pure-state entanglement, and let E̅ and E̲ be its maximal and minimal extensions.

3. Show that if E is additive under tensor products of pure bipartite states, then E̅ is sub-additive and E̲ is super-additive under tensor products of mixed bipartite states.

The minimal extension E̲ is not a very useful measure of entanglement, since typically a mixed bipartite state cannot be converted by LOCC to a pure entangled state. Therefore, for such mixed entangled states E̲ takes the value zero. On the other hand, the maximal extension is a faithful measure of entanglement (i.e. it takes the value zero only on separable states).
As an example, recall that the Schmidt rank is a measure of entanglement on pure states. Its maximal extension to mixed states is given by

SR̅(ρ^{AB}) := inf { SR(ψ^{A′B′}) : ψ^{A′B′} −LOCC→ ρ^{AB} } .        (13.120)

In general, the condition ψ^{A′B′} −LOCC→ ρ^{AB} can be very complicated. However, we can replace ψ^{A′B′} in the equation above with the maximally entangled state Φ_k, where k := SR(ψ^{A′B′}) = SR(Φ_k), since whenever ψ^{A′B′} −LOCC→ ρ^{AB} we also have Φ_k −LOCC→ ρ^{AB}. We therefore conclude that

SR(ρ^{AB}) := SR̅(ρ^{AB}) = min { k : Φ_k −LOCC→ ρ^{AB} } ,        (13.121)

where, for simplicity of the exposition, we removed the over-line symbol from SR̅(ρ^{AB}).
1. At least one of the states, in any pure-state decomposition of ρAB , has a Schmidt rank
no smaller than k.
2. There exists a pure-state decomposition of ρAB with all states having Schmidt rank at
most k.
The relative entropy of entanglement is defined for all ρ ∈ D(AB) as

E_R(ρ^{AB}) := min_{σ∈SEP(AB)} D(ρ^{AB} ∥ σ^{AB}) .        (13.123)
As discussed in Sec. 10.3, computing the relative entropy of entanglement can generally be
quite challenging. However, in certain special cases, such as with pure states and symmetric
states, it is feasible to compute this measure. The complexity in computing the relative
entropy of entanglement typically arises from the need to optimize over the large set of
separable states, which can be a demanding task for most mixed states. Yet, for pure
states and certain states with specific symmetrical properties, this complexity is significantly
reduced, making the calculation manageable.
For any pure state ψ ∈ Pure(AB), the relative entropy of entanglement reduces to the entropy of entanglement:

E_R(ψ^{AB}) = E(ψ^{AB}) := H(A)_ψ ,        (13.124)

where H(A)_ψ is the von Neumann entropy of the reduced state ψ^A.
Proof. The proof is based on the closed formula given in Theorem 10.3.2 for the relative entropy of a resource. Let

|ψ⟩ = Σ_{x∈[n]} √p_x |xx⟩        (13.125)

be a Schmidt decomposition of ψ^{AB}, and set σ_⋆^{AB} := Σ_{x∈[n]} p_x |xx⟩⟨xx|.
We argue now that σ_⋆ is the closest separable state (see Fig. 13.1); that is, we argue that

min_{σ∈SEP(AB)} D(ψ^{AB} ∥ σ^{AB}) = D(ψ^{AB} ∥ σ_⋆^{AB}) .        (13.127)

Indeed, from Theorem 10.3.2 it follows that σ_⋆^{AB} satisfies the above equality if and only if there exists an entanglement witness η ∈ WIT(AB) such that

ψ^{AB} = σ_⋆^{AB} − a L_{σ_⋆}^{−1}(η) .        (13.128)

The above equality can be expressed as L_{σ_⋆}^{−1}(aη) = σ_⋆^{AB} − ψ^{AB}, which is equivalent to

aη = L_{σ_⋆}(σ_⋆ − ψ) = I − L_{σ_⋆}(ψ) .        (13.129)

That is, σ_⋆ satisfies (13.127) if and only if the right-hand side of the equation above is an entanglement witness. Now, by direct computation we have (see Exercise 13.2.16)

L_{σ_⋆}(ψ) = Σ_{x,y∈[n]} √(p_x p_y) (log p_x − log p_y)/(p_x − p_y) |xx⟩⟨yy| .        (13.130)

Denote r_{xy} := p_x/p_y and observe that for any x, y ∈ [n]

c_{xy} := √(p_x p_y) (log p_x − log p_y)/(p_x − p_y) = √r_{xy} log(r_{xy})/(r_{xy} − 1) ⩽ 1 ,        (13.131)

where the last inequality follows from the fact that the logarithm function satisfies log(r) ⩽ (r − 1)/√r for r ⩾ 1 and the opposite inequality for 0 < r ⩽ 1. We therefore get for any product state ϕ^A ⊗ φ^B
Tr[(ϕ^A ⊗ φ^B) L_{σ_⋆}(ψ)] = Σ_{x,y∈[n]} c_{xy} ⟨ϕ|x⟩⟨φ|x⟩⟨y|ϕ⟩⟨y|φ⟩
   ⩽ Σ_{x,y∈[n]} c_{xy} |⟨ϕ|x⟩⟨φ|x⟩⟨y|ϕ⟩⟨y|φ⟩|
   c_{xy} ⩽ 1 → ⩽ Σ_{x,y∈[n]} |⟨ϕ|x⟩⟨φ|x⟩⟨y|ϕ⟩⟨y|φ⟩|        (13.132)
   |ab| ⩽ ½(|a|² + |b|²) → ⩽ ¼ Σ_{x,y∈[n]} (|⟨ϕ|x⟩|² + |⟨φ|x⟩|²)(|⟨y|ϕ⟩|² + |⟨y|φ⟩|²)
   = 1 .

Hence, for every product state ϕ^A ⊗ φ^B,

Tr[(ϕ^A ⊗ φ^B)(I − L_{σ_⋆}(ψ))] ⩾ 0 ,        (13.133)

so that I − L_{σ_⋆}(ψ) is an entanglement witness and (13.127) follows.
Now consider a state ρ^{AB} that is invariant under a twirling channel G ∈ LOCC(AB → AB), i.e. ρ^{AB} = G(ρ^{AB}), and recall that G(SEP(AB)) ⊆ SEP(AB). Then,

E_R(ρ^{AB}) ⩽ min_{σ∈SEP(AB)} D(ρ^{AB} ∥ G(σ^{AB}))        (since G(SEP(AB)) ⊆ SEP(AB))
           = min_{σ∈SEP(AB)} D(G(ρ^{AB}) ∥ G(σ^{AB})) .      (since ρ^{AB} = G(ρ^{AB}))        (13.134)

By the data processing inequality the right-hand side is also no greater than E_R(ρ^{AB}), so that

E_R(ρ^{AB}) = min_{σ∈SEP(AB)} D(ρ^{AB} ∥ G(σ^{AB})) .        (13.135)

The importance of the above formula lies in the fact that the minimization over separable states of the form G(σ^{AB}) might involve significantly fewer parameters compared to the minimization over the entire set of separable states SEP(AB). This simplification is what makes the computation of the relative entropy of entanglement feasible for these symmetric states.
As an example, let’s consider the isotropic state defined for all t ∈ [0, 1] (with m := |A| = |B|) as

ρ_t^{AB} = t Φ_m^{AB} + (1 − t) τ^{AB}   where   τ^{AB} := (I^{AB} − Φ_m)/(m² − 1) .        (13.136)

Previously, we observed that this state is invariant under the twirling channel described in (3.251) and is separable if and only if t ⩽ 1/m. At one extreme (t = 0), the isotropic state is the separable state τ^{AB}. At the other extreme of the separable range (t = 1/m), the isotropic state is the separable state

ρ_{1/m}^{AB} = (m/(m+1)) u^{AB} + (1/(m+1)) Φ_m^{AB} .        (13.137)
For an entangled isotropic state ρ_t^{AB} with t > 1/m, equation (13.135) yields

E_R(ρ_t^{AB}) = min_{t′∈[0,1/m]} D(ρ_t^{AB} ∥ ρ_{t′}^{AB}) ,        (13.138)

since for every separable state σ ∈ SEP(AB), the twirled state G(σ^{AB}) = ρ_{t′}^{AB} for some t′ ∈ [0, 1/m]. This demonstrates how the symmetry of ρ_t^{AB} simplifies the optimization problem. Furthermore, in Exercise 13.2.17, you will show that the optimal t′ is t′ = 1/m, resulting in

E_R(ρ_t^{AB}) = D(ρ_t^{AB} ∥ ρ_{1/m}^{AB})
             = log m + t log t + (1 − t) log((1 − t)/(m − 1)) .        (13.139)

This result is intuitive, as the separable state ρ_{t′}^{AB} with t′ = 1/m is on the boundary of the set of separable states and, roughly speaking, is the closest to ρ_t^{AB}.
Exercise 13.2.17. Prove the equalities in (13.139). Hint: Observe that [ρ_t^{AB}, ρ_{t′}^{AB}] = 0 for all t, t′ ∈ [0, 1] and use it to show that D(ρ_t^{AB} ∥ ρ_{t′}^{AB}) = D(t ∥ t′), where t := (t, 1 − t)^T and t′ := (t′, 1 − t′)^T.
A similar symmetry argument applies to Werner states. For an entangled Werner state ρ_W^{AB} one finds

E_R(ρ_W^{AB}) = D(ρ_W^{AB} ∥ ω^{AB}) ,        (13.140)

where

ω^{AB} := (1/(m(m+1))) Π_{Sym}^{AB} + (1/(m(m−1))) Π_{Asy}^{AB} .        (13.141)
The negativity of a bipartite state ρ ∈ D(AB) is defined as

N(ρ^{AB}) := (∥ρ^Γ∥_1 − 1)/2 .        (13.149)

Remark. We chose as a convention for this book that ρ^Γ represents the partial transpose on Bob's side. This convention does not affect the definition above, since the partial transpose on Alice's side is the (full) transpose of ρ^Γ and the trace norm is invariant under transposition. Therefore, the definition of the negativity above does not depend on whether the partial transpose is taken on Bob's side or on Alice's side.
Note that ρ^Γ has trace one, since the partial transpose does not affect the trace. Therefore, if λ_1, . . . , λ_n are the eigenvalues of ρ^Γ then they sum to one. Suppose, without loss of generality, that the first k eigenvalues of ρ^Γ are non-negative and the remaining n − k are negative. We then get

∥ρ^Γ∥_1 = Σ_{x∈[n]} |λ_x| = Σ_{x∈[k]} λ_x − Σ_{x=k+1}^{n} λ_x
        = 1 − 2 Σ_{x=k+1}^{n} λ_x ,        (13.151)

where we used Σ_{x∈[n]} λ_x = 1. That is, the negativity of ρ^{AB} is the absolute value of the sum of all the negative eigenvalues of ρ^Γ. We can therefore express it also as

N(ρ^{AB}) = Tr[ρ^Γ_−] ,        (13.153)

where ρ^Γ_− := (ρ^Γ)_− is the negative part of ρ^Γ. Note that this also demonstrates that the negativity is zero on separable states.
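Since the negativity only requires the spectrum of ρ^Γ, it is one of the simplest entanglement measures to compute. The sketch below (not part of the text; it assumes NumPy, with illustrative helper names) evaluates (13.149) and the quantity log∥ρ^Γ∥_1, which is discussed below as the logarithmic negativity:

    import numpy as np

    def partial_transpose_B(rho, dA, dB):
        # Partial transpose on Bob's subsystem.
        r = rho.reshape(dA, dB, dA, dB)
        return r.transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)

    def negativity(rho, dA, dB):
        # N(rho) = (||rho^Gamma||_1 - 1)/2, Eqs. (13.149)/(13.153).
        evals = np.linalg.eigvalsh(partial_transpose_B(rho, dA, dB))
        return float(np.sum(np.abs(evals)) - 1) / 2

    def log_negativity(rho, dA, dB):
        # log2 ||rho^Gamma||_1 = log2(2 N(rho) + 1).
        return np.log2(2 * negativity(rho, dA, dB) + 1)

    phi = np.zeros(4); phi[0] = phi[3] = 1 / np.sqrt(2)
    rho = np.outer(phi, phi)            # maximally entangled two-qubit state
    print(negativity(rho, 2, 2))        # -> 0.5
    print(log_negativity(rho, 2, 2))    # -> 1.0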
Exercise 13.2.24. Let ρ ∈ D(AB). Show that there exist density matrices ρ_+, ρ_− ∈ D(AB) such that ρ_+ ρ_− = ρ_− ρ_+ = 0 and

ρ^Γ = (1 + N(ρ^{AB})) ρ_+^{AB} − N(ρ^{AB}) ρ_−^{AB} .        (13.154)

The decomposition (13.154) of ρ^Γ in the exercise above is optimal in the following sense. Suppose there exist σ, τ ∈ D(AB) and a ⩾ 0 such that

ρ^Γ = (1 + a)σ^{AB} − aτ^{AB} .        (13.155)

Denoting by Π_− the projection onto the strictly negative eigenspace of ρ^Γ, so that Tr[Π_− ρ^Γ] = −N(ρ^{AB}), we get

−N(ρ^{AB}) = Tr[Π_− ((1 + a)σ^{AB} − aτ^{AB})]
           ⩾ −a Tr[Π_− τ^{AB}]        (13.157)
           ⩾ −a .

Hence, we must have a ⩾ N(ρ^{AB}). In other words, we can express the negativity of ρ^{AB} as

N(ρ^{AB}) = inf { a ∈ R : ∃ σ, τ ∈ D(AB) s.t. ρ^Γ = (1 + a)σ^{AB} − aτ^{AB} } .        (13.158)
Proof. To prove the strong monotonicity property, let {E_x}_{x∈[m]} be a quantum instrument on Alice's system, with each E_x ∈ CP(A → A′) being trace non-increasing and Σ_{x∈[m]} E_x ∈ CPTP(A → A′). For each x ∈ [m], denote ρ_x^{A′B} := (1/p_x) E_x^{A→A′}(ρ^{AB}), where p_x := Tr[E_x^{A→A′}(ρ^{AB})]. Finally, set ν := N(ρ^{AB}). By definition we have

ρ_x^Γ = (1/p_x) (E_x^{A→A′}(ρ^{AB}))^Γ
   partial transpose acts on Bob's side → = (1/p_x) E_x^{A→A′}(ρ^Γ)        (13.159)
   (13.154)→ = ((1 + ν)/p_x) E_x^{A→A′}(ρ_+^{AB}) − (ν/p_x) E_x^{A→A′}(ρ_−^{AB}) .
Since the above decomposition of ρ_x^Γ is not necessarily optimal (in the sense of (13.158)), we must have

N(ρ_x^{A′B}) ⩽ (ν/p_x) Tr[E_x^{A→A′}(ρ_−^{AB})] .        (13.160)

We therefore get that

Σ_{x∈[m]} p_x N(ρ_x^{A′B}) ⩽ ν Σ_{x∈[m]} Tr[E_x^{A→A′}(ρ_−^{AB})]
                           = ν = N(ρ^{AB}) ,        (13.161)
where we used the fact that Σ_{x∈[m]} E_x is trace preserving. That is, the negativity of en-
tanglement cannot increase on average by a quantum instrument on Alice’s side. Since
the negativity is not affected if we take the partial transpose on Alice’s system (instead of
Bob’s), using similar arguments as above, we get that the negativity cannot increase on
average under any quantum instrument applied on Bob’s side. We therefore conclude that
the negativity satisfies the strong monotonicity condition of an entanglement monotone. It
is left to show that the negativity is convex.
Let {p_x, ρ_x^{AB}}_{x∈[m]} be an ensemble of density matrices in D(AB). Then, by definition,

N(Σ_{x∈[m]} p_x ρ_x^{AB}) = ½ ∥(Σ_{x∈[m]} p_x ρ_x)^Γ∥_1 − ½
   = ½ ∥Σ_{x∈[m]} p_x ρ_x^Γ∥_1 − ½        (13.162)
   ⩽ ½ Σ_{x∈[m]} p_x ∥ρ_x^Γ∥_1 − ½ = Σ_{x∈[m]} p_x N(ρ_x^{AB}) .

This establishes the convexity of the negativity and completes the proof.
Exercise 13.2.25. Show that the negativity of a pure bipartite state ψ ∈ Pure(AB), with m := |A| = |B| and Schmidt coefficients {p_x}_{x∈[m]}, is given by

N(ψ^{AB}) = Σ_{x<y; x,y∈[m]} √(p_x p_y) .        (13.163)

The logarithmic negativity is defined for all ρ ∈ D(AB) as

LN(ρ^{AB}) = log ∥ρ^Γ∥_1 .        (13.164)

Note that the logarithmic negativity can be expressed as a function of the negativity, namely,

LN(ρ^{AB}) = log(2 N(ρ^{AB}) + 1) .        (13.165)
Therefore, the logarithmic negativity is a measure of entanglement, since the negativity is an entanglement monotone. On the other hand, the logarithmic negativity is not itself an entanglement monotone; in particular, it is in general not convex (Exercise 13.2.26).
The logarithmic negativity is additive under tensor products. To see why, let ρ ∈ D(AB) and σ ∈ D(A′B′). Then,

LN(ρ^{AB} ⊗ σ^{A′B′}) = log ∥(ρ ⊗ σ)^Γ∥_1 = log ∥ρ^Γ ⊗ σ^Γ∥_1        (13.166)
   = log(∥ρ^Γ∥_1 ∥σ^Γ∥_1) = log ∥ρ^Γ∥_1 + log ∥σ^Γ∥_1
   = LN(ρ^{AB}) + LN(σ^{A′B′}) .
We will see later on that the logarithmic negativity provides an upper bound to the distillable
entanglement.
Exercise 13.2.26. Show that the logarithmic negativity is not convex.
The κ-Entanglement

The κ-entanglement is another measure of entanglement that is based on the partial transpose. In Sec. 13.9 we will see that the regularized version of this measure has an operational meaning as the zero-error entanglement cost under PPT operations. The κ-entanglement is defined for all ρ ∈ D(AB) as

E_κ(ρ^{AB}) := log min { Tr[Λ] : −Λ^Γ ⩽ ρ^Γ ⩽ Λ^Γ , Λ ∈ Pos(AB) } .

In Sec. 13.9 we will see that E_κ behaves monotonically under a set of operations that is larger than LOCC. Moreover, if ρ ∈ PPT(AB) then we can take Λ = ρ in the optimization above, so that E_κ vanishes on PPT states.

Remark. From the lemma above it follows that the κ-entanglement equals the logarithmic negativity if

|ρ^Γ|^Γ ⩾ 0 .        (13.169)

To see why, note that in this case the state ρ_⋆ := |ρ^Γ|/∥ρ^Γ∥_1 ∈ PPT(AB), so by taking σ = ρ_⋆ we get that the upper bound satisfies

min_{σ∈PPT(AB)} D_max(|ρ^Γ| ∥ σ) ⩽ D_max(|ρ^Γ| ∥ ρ_⋆)
   Exercise 13.2.27→ = LN(ρ) .        (13.170)
Tr[Λ] = Tr[Λ^Γ] = Tr[Π_+ Λ^Γ Π_+] + Tr[Π_− Λ^Γ Π_−]        (13.174)
   (13.173)→ ⩾ 1 + 2N(ρ) = ∥ρ^Γ∥_1 .

Since the above inequality holds for all Λ ∈ Pos(AB) that satisfy −Λ^Γ ⩽ ρ^Γ ⩽ Λ^Γ, we conclude that the lower bound in (13.168) must hold.

To get an upper bound, observe that for all ρ ∈ D(AB)

E_κ(ρ) ⩽ min_{Λ∈Pos(AB)} { log Tr[Λ] : Λ^Γ ⩾ |ρ^Γ| }
   Λ^Γ = tσ → = min_{σ∈PPT(AB)} { log(t) : tσ ⩾ |ρ^Γ| }        (13.175)
This function quantifies the total (i.e. both quantum and classical) amount of correlation between Alice and Bob (see Fig. 13.2a). An extension of this quantity, known as the conditional mutual information (CMI), is a function on a tripartite density matrix defined by

I(A : B|R)_ρ := H(A|R)_ρ − H(A|BR)_ρ .

Since the CMI is defined with respect to the conditional von Neumann entropy, it can also be expressed for all ρ ∈ D(ABR) as

I(A : B|R)_ρ = H(AR)_ρ + H(BR)_ρ − H(ABR)_ρ − H(R)_ρ .
2. Show that

I(AA′ : B|R)_ρ ⩾ I(A : B|R)_ρ .        (13.181)

That is, tracing out a local subsystem cannot increase the CMI.
Exercise 13.2.30. Show that for any state of the form

σ^{ABRA′} = Σ_{x∈[n]} p_x σ_x^{ABR} ⊗ |x⟩⟨x|^{A′} ,        (13.182)

we have that

I(A : B|RA′)_σ = Σ_{x∈[n]} p_x I(A : B|R)_{σ_x} .        (13.183)
Figure 13.2: Venn Diagrams. (a) The intersection area (white) illustrates the mutual information.
(b) The white area illustrates the conditional mutual information.
The mutual information measures the overall correlations between Alice and Bob, and
only takes on the value zero for product states. However, it does not necessarily vanish for
separable states that are not products. In contrast, the CMI can be zero even for separable
states. For instance, consider the state

ρ^{ABR} := Σ_{x∈[n]} p_x ρ_x^A ⊗ ρ_x^B ⊗ |x⟩⟨x|^R ,        (13.184)

where |x⟩⟨x|^R represents a pure state in a third system R that depends on the discrete variable x. Although ρ^{AB} is a separable state, its CMI is zero. This is because knowing the value of x through system R allows Alice and Bob to share the product state ρ_x^A ⊗ ρ_x^B. The state ρ^{ABR} above belongs to a special type of states known as quantum Markov states.
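The vanishing of the CMI on states of the form (13.184) is easy to verify numerically from the entropic expression I(A:B|R) = H(AR) + H(BR) − H(ABR) − H(R). The sketch below (not part of the text; it assumes NumPy, and the helper names are illustrative) computes the CMI of a small quantum Markov state:

    import numpy as np

    def entropy(rho):
        ev = np.linalg.eigvalsh(rho)
        ev = ev[ev > 1e-12]
        return float(-np.sum(ev * np.log2(ev)))

    def ptrace(rho, dims, keep):
        # Partial trace keeping the subsystems listed in `keep`.
        n = len(dims)
        r = rho.reshape(dims + dims)
        for sub in sorted(set(range(n)) - set(keep), reverse=True):
            r = np.trace(r, axis1=sub, axis2=sub + r.ndim // 2)
        d = int(np.prod([dims[k] for k in keep]))
        return r.reshape(d, d)

    def cmi(rho_abr, dims):
        # I(A:B|R) = H(AR) + H(BR) - H(ABR) - H(R), dims = [dA, dB, dR].
        return (entropy(ptrace(rho_abr, dims, [0, 2]))
                + entropy(ptrace(rho_abr, dims, [1, 2]))
                - entropy(rho_abr)
                - entropy(ptrace(rho_abr, dims, [2])))

    # A state of the form (13.184): rho_x^A (x) rho_x^B (x) |x><x|^R.
    rho0 = np.diag([1.0, 0.0]); rho1 = np.diag([0.3, 0.7])
    rho = 0.5 * np.kron(np.kron(rho0, rho1), np.diag([1.0, 0.0])) \
        + 0.5 * np.kron(np.kron(rho1, rho0), np.diag([0.0, 1.0]))
    print(cmi(rho, [2, 2, 2]))   # ≈ 0 for this quantum Markov state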
Quantum Markov states are a type of quantum state that exhibit a special type of corre-
lation structure between different subsystems. In a quantum Markov state, the correlation
between two subsystems is entirely mediated by a third subsystem R, which serves as a kind
of “bridge” or mediator between A and B. This correlation structure is analogous to the
Markov property in classical probability theory, where the future state of a system depends
only on its present state and not on its past states. A quantum Markov state ρ ∈ D(ABR)
is defined as follows.
Exercise 13.2.31.
Remark: The converse of this statement is also true! That is, any density matrix
ρ ∈ D(ABR) with zero CMI is necessarily a quantum Markov state.
The exercise above demonstrates that if ρ^{AB} is a separable state then it has a tripartite extension ρ^{ABR} as in (13.184) for which the conditional mutual information is zero. This observation motivates the following definition of a measure of entanglement known as the squashed entanglement.
For pure states, the squashed entanglement reduces to the entropy of entanglement:

E_sq(ψ^{AB}) = H(A)_ψ ,        (13.188)

where H(A)_ψ is the von Neumann entropy of the reduced density matrix on system A.
Proof. Let ρ ∈ D(AB) and let ω^{ABR} be an extension of ρ^{AB}; i.e. ω^{AB} = ρ^{AB}. Suppose Alice applies on her system A a quantum instrument E ∈ CPTP(A → A′A′′) of the form

E^{A→A′A′′} = Σ_{x∈[n]} E_x^{A→A′} ⊗ |x⟩⟨x|^{A′′} .        (13.189)

Then, the state of the composite system ABR after Alice's measurement is given by

σ^{A′A′′BR} := E^{A→A′A′′}(ω^{ABR}) .        (13.190)

From Stinespring's dilation theorem there exists an isometry V : A → A′A′′E such that

σ^{A′A′′BR} = Tr_E[V ω^{ABR} V^*] .        (13.191)

Since the CMI is invariant under local isometries,

½ I(A : B|R)_ω = ½ I(A′A′′E : B|R)_{VωV^*} .        (13.192)

Combining the equality above with the fact that tracing out a local subsystem cannot increase the CMI (see (13.181)), we get by tracing out system E

½ I(A : B|R)_ω ⩾ ½ I(A′A′′ : B|R)_σ .        (13.193)

Combining this with the chain rule (13.180) gives

½ I(A : B|R)_ω ⩾ ½ I(A′′ : B|R)_σ + ½ I(A′ : B|RA′′)_σ
   I(A′′ : B|R)_σ ⩾ 0 → ⩾ ½ I(A′ : B|RA′′)_σ        (13.194)
   Exercise 13.2.30→ = ½ Σ_{x∈[n]} p_x I(A′ : B|R)_{σ_x} ,

where p_x := Tr[E_x^{A→A′}(ω^{ABR})] and σ_x^{A′BR} := (1/p_x) E_x^{A→A′}(ω^{ABR}). Finally, by definition, for any x ∈ [n] we have ½ I(A′ : B|R)_{σ_x} ⩾ E_sq(σ_x^{A′B}), so that the inequality above gives

½ I(A : B|R)_ω ⩾ Σ_{x∈[n]} p_x E_sq(σ_x^{A′B}) .        (13.195)
In other words, the squashed entanglement does not increase on average under any local quan-
tum instrument on system A. From symmetry, the same holds for any quantum instrument
on Bob’s system. Therefore, the squashed entanglement satisfies the strong monotonicity
property of an entanglement monotone.
To prove the convexity of E_sq, let ρ_1, ρ_2 ∈ D(AB) and t ∈ [0, 1]. Let ω_1^{ABR} and ω_2^{ABR} be extensions of ρ_1^{AB} and ρ_2^{AB}, respectively. Note that we can always assume that these extensions have the same reference system R, as otherwise we embed the lower-dimensional reference system in the higher-dimensional one. Finally, let R′ be a qubit system and denote

ω^{ABRR′} := t ω_1^{ABR} ⊗ |0⟩⟨0|^{R′} + (1 − t) ω_2^{ABR} ⊗ |1⟩⟨1|^{R′} .        (13.197)

Since ω^{AB} = t ρ_1^{AB} + (1 − t) ρ_2^{AB}, the state ω^{ABRR′} is an extension of this mixture, and by Exercise 13.2.30,

E_sq(t ρ_1^{AB} + (1 − t) ρ_2^{AB}) ⩽ ½ I(A : B|RR′)_ω = t · ½ I(A : B|R)_{ω_1} + (1 − t) · ½ I(A : B|R)_{ω_2} .

Since the extensions ω_1^{ABR} and ω_2^{ABR} were arbitrary, we conclude that E_sq is convex.
with equality if ρ^{AA′BB′} = ρ^{AB} ⊗ ρ^{A′B′}.

Proof. Let ω ∈ D(AA′BB′R) be a quantum extension of ρ^{AA′BB′}. Then, applying the chain rule in (13.180) once with respect to Alice's systems and once with respect to Bob's systems yields the stated inequality. Combining this with (13.200) gives equality in the product case. This completes the proof.
Exercise 13.2.33. Prove (13.202) and (13.204).
The most general one-way LOCC operation that Alice and Bob can perform is for Alice to apply a quantum instrument {E_x}_{x∈[m]}, with E_x ∈ CP(A → A′) and Σ_{x∈[m]} E_x ∈ CPTP(A → A′), send the outcome x to Bob, who then applies a quantum channel F_x ∈ CPTP(B → B′) that depends on the outcome x received from Alice. The overall operation can be described by the quantum channel

N^{AB→A′B′} := Σ_{x∈[m]} E_x^{A→A′} ⊗ F_x^{B→B′} ,        (13.207)

where the second equality is due to the duality relation of the conditional von Neumann entropy given in (7.158). Moreover, the coherent information of entanglement of the state ρ^{AB} is defined as

E_→(ρ^{AB}) := sup_{E∈CPTP(A→AX)} I(A⟩BX)_{E(ρ)} ,        (13.209)

where the supremum is also over all finite dimensions of the classical system X.

2. Show that the coherent information is convex. That is, prove that

I(A⟩B)_ρ ⩽ Σ_{x∈[m]} p_x I(A⟩B)_{ρ_x} .        (13.211)

Hint: Either use the DPI directly, or recall that a single channel on Bob's system is a conditionally mixing operation.

Exercise 13.2.35. Show that the supremum in (13.209) can be restricted to quantum channels of the form E^{A→AX} = Σ_{x∈[n]} E_x^{A→A} ⊗ |x⟩⟨x|^X, where each E_x^{A→A} is a CP map with a single Kraus operator. Hint: Use the joint convexity of the relative entropy.
with the supremum extending over all dimensions of system A′. To understand this, first consider that if |A′| ⩽ |A|, every channel E ∈ CPTP(A → A′X) can be embedded in CPTP(A → AX), as the coherent information is invariant under local isometries (a property shared by all conditional entropies). Consequently, in this case, the supremum over CPTP(A → AX) is at least as great as that over CPTP(A → A′X).

Conversely, if |A′| > |A|, consider a quantum instrument E^{A→A′X} = Σ_{x∈[n]} E_x^{A→A′} ⊗ |x⟩⟨x|^X, where each E_x^{A→A′}(·) = M_x(·)M_x^* is a CP map with a single Kraus operator M_x : A → A′. Through the polar decomposition, each M_x can be written as M_x = V_x N_x, with each N_x : A → A being part of a generalized measurement, and each V_x : A → A′ an isometry. Due to the invariance of the coherent information under isometries, the CP maps E_x^{A→A′}(·) = M_x(·)M_x^* can be substituted with N_x^{A→A}(·) = N_x(·)N_x^*, allowing the optimization over all channels in CPTP(A → A′X) to be replaced with an optimization over all quantum instruments in CPTP(A → AX).
This observation is significant as it can be used to prove that the coherent information
of entanglement exhibits monotonic behavior under one-way LOCC.
where we replaced E ◦ N with an arbitrary M ∈ LOCC_1(AB → A′B′X). Now, recall that every element of LOCC_1(AB → A′B′X) can be expressed as

M^{AB→A′B′X} := Σ_{y∈[n]} E_y^{A→A′X} ⊗ F_y^{B→B′} ,        (13.216)

with each E_y^{A→A′X} being a CP map such that Σ_{y∈[n]} E_y ∈ CPTP(A → A′X), and each F_y ∈ CPTP(B → B′). Thus,

M^{AB→A′B′X}(ρ^{AB}) = Σ_{y∈[n]} q_y F_y^{B→B′}(σ_y^{A′BX}) ,        (13.217)

where σ_y^{A′BX} := (1/q_y) E_y^{A→A′X}(ρ^{AB}) and q_y := Tr[E_y^{A→A′X}(ρ^{AB})]. Combining this with the convexity of the coherent information (see (13.211)), we get from (13.215) that

E_→(N^{AB→A′B′}(ρ^{AB})) ⩽ sup_{M∈LOCC_1} Σ_{y∈[n]} q_y I(A′⟩B′X)_{F_y(σ_y)}
   cf. (13.212)→ ⩽ sup_{M∈LOCC_1} Σ_{y∈[n]} q_y I(A′⟩BX)_{σ_y} .        (13.218)

Finally, denoting Z := XY and E^{A→A′Z} := Σ_{y∈[n]} E_y^{A→A′X} ⊗ |y⟩⟨y|^Y, we conclude that

E_→(N^{AB→A′B′}(ρ^{AB})) ⩽ sup_{E∈CPTP(A→A′Z)} I(A′⟩BZ)_{E(ρ)}        (13.219)
   (13.213)→ = E_→(ρ^{AB}) .
Exercise 13.2.37. Show that E→ is an entanglement monotone under LOCC1 . That is,
prove the strong monotonicity property and convexity.
Exercise 13.2.38. Let Φ_m ∈ D(AB) be the maximally entangled state with m := |A| = |B|. Show that

E_→(Φ_m^{AB}) = log(m) .        (13.220)

The coherent information of entanglement is superadditive. That is, for any ρ ∈ D(AB) and σ ∈ D(A′B′) we have (see Exercise 13.2.39)

E_→(ρ^{AB} ⊗ σ^{A′B′}) ⩾ E_→(ρ^{AB}) + E_→(σ^{A′B′}) .        (13.221)
exists. We will see in the next section that this regularized coherent information of entanglement has an operational meaning as the one-way distillable entanglement of ρ^{AB}.
Computing the above quantity in general is a highly challenging task, so we often rely on
establishing lower and upper bounds. In this section, we will narrow our focus to the special
cases where either ρ or σ is maximally entangled. Recall that these cases are particularly
relevant for calculating entanglement distillation and entanglement cost.
Remark. In Sec. 12.5.1, we explored various conversion distances among pure bipartite states.
It was established that the P⋆ -conversion distance is equal to the P -conversion distance.
Additionally, we speculated, albeit without formal proof, that the T -conversion distance
might be strictly smaller than the P -conversion distance. The lemma above confirms this
speculation by demonstrating that, when the target state is Φm , the T -conversion distance
actually aligns with the P 2 -conversion distance, which is indeed strictly smaller than the P -
conversion distance. This outcome is based on the understanding that the purified distance
is no greater than one; thus, squaring it effectively reduces its magnitude.
Proof. Let G ∈ LOCC(A′B′ → A′B′) be the twirling map given in (3.251). That is, for any ω ∈ D(A′B′)

G(ω) := ∫_{U(m)} dU (U ⊗ Ū) ω (U ⊗ Ū)^*
   (3.255)→ = (1 − Tr[Φ_m ω]) τ + Tr[Φ_m ω] Φ_m ,        (13.225)

where τ ∈ D(A′B′) is given by τ = (I − Φ_m)/(m² − 1). In particular, observe that for all ω ∈ D(A′B′)

½ ∥G(ω) − Φ_m∥_1 = (1 − Tr[Φ_m ω]) ½ ∥τ − Φ_m∥_1 = 1 − Tr[Φ_m ω] ,        (13.226)

where the last equality follows from the fact that τ Φ_m = Φ_m τ = 0. From the DPI of the trace distance, and the invariance of Φ_m under the twirling map G, it follows that for all N ∈ LOCC(AB → A′B′) and all ρ ∈ D(AB)

½ ∥N(ρ) − Φ_m∥_1 ⩾ ½ ∥G ◦ N(ρ) − Φ_m∥_1 .        (13.227)

Since G ◦ N is also an LOCC channel, it follows from the inequality above that the conversion distance can be expressed as

T(ρ −LOCC→ Φ_m) = inf_{N∈LOCC} ½ ∥G ◦ N(ρ) − Φ_m∥_1
   (13.226)→ = 1 − sup_{N∈LOCC} Tr[Φ_m N(ρ)] .        (13.228)
Exercise 13.3.1. Let k := |A| = |B|, m := |A′| = |B′|, and ρ ∈ D(AB). Show that

T(ρ^{AB} −LOCC→ Φ_m^{A′B′}) ⩾ 1 − k/m .        (13.229)

Note that this bound is non-trivial for k < m. Hint: Estimate T(Φ_k^{AB} −LOCC→ Φ_m^{A′B′}).
where the supremum is over all N ∈ LOCC_1(AB → A′B′). It is evident that since LOCC_1 is a subset of LOCC, we have the following inequality for all ρ ∈ D(AB):

T(ρ −LOCC→ Φ_m) ⩽ T(ρ −LOCC_1→ Φ_m) .        (13.231)
Given that one-way LOCC is significantly easier to characterize than LOCC, we can rep-
LOCC1
resent the conversion distance T (ρ −−−→ Φm ) as an optimization problem over quantum
instruments (see the lemma below). This simplification is advantageous because it reduces
the complexity involved in the calculation and allows for a more straightforward analysis
of the conversion distance. By focusing on one-way LOCC, we limit the operations to a
sequence where one party, say Alice, performs a quantum operation and communicates the
outcome classically to the other party (Bob), who then performs a quantum operation based
on that information. This constraint narrows down the set of operations to be considered
in the optimization problem, making the task of determining the conversion distance more
manageable and conceptually clearer.
In the following lemma we relate the conversion distance under one-way LOCC to the optimized conditional min-entropy H_min^↑ as defined in (7.143). The relation will be given in terms of the function

Q_min(A′|BX)_τ := 2^{−H_min^↑(A′|BX)_τ}   ∀ τ ∈ D(A′BX) ,        (13.232)

where we extend the definition of Q_min to subnormalized states such that for any σ ∈ D_⩽(A′B)

Q_min(A′|B)_σ := 2^{−H_min^↑(A′|B)_σ}
              := min { Tr[Λ^B] : I^{A′} ⊗ Λ^B ⩾ σ^{A′B} , Λ ∈ Pos(B) } .        (13.234)
Remark. Note that Q_min(A′|BX)_{E(ρ)} depends on m, as m := |A′|. Replacing CPTP(A → A′X) in (13.235) with CPTP(A → AX), we obtain a lower bound (see Exercise 13.3.3):

T(ρ −LOCC_1→ Φ_m) ⩾ 1 − (1/m) sup_{E∈CPTP(A→AX)} Q_min(A|BX)_{E(ρ)} .        (13.236)

Exercise 13.3.3. Prove (13.236). Hint: Use the property that any conditional entropy is invariant under local isometries (particularly on Alice's system).
Building on Lemma 13.3.2 and Exercise 13.3.2, the conversion distance can be re-expressed as follows:

T(ρ −LOCC_1→ Φ_m) = 1 − (1/m) sup_{E∈CPTP(A→A′X)} Σ_{x∈[k]} Q_min(A′|B)_{E_x(ρ)} .        (13.239)

Combining this with the relation P² = 1 − F² between the purified distance and the fidelity, we conclude that (13.235) can also be rewritten as

T(ρ^{AB} −LOCC_1→ Φ_m) = inf_{{E_x}} min_{τ∈D(E)} Σ_{x∈[k]} P²(u^{A′} ⊗ τ^E , E_x^{A→A′}(ρ^{AE})) ,        (13.241)

where the infimum is over all k ∈ N and all quantum instruments {E_x^{A→A′}}_{x∈[k]}. We will use this form of the conversion distance to get the following upper bound.
Upper Bound
The main result of this subsection is the following upper bound on the right-hand side of
the equation above.
Upper Bound

Theorem 13.3.1. Let ρ ∈ Pure(ABE) and m ∈ N. Then,

T(ρ^{AB} −LOCC_1→ Φ_m) ⩽ √m · 2^{−½ H̃_2^↑(A|E)_ρ} .        (13.242)

Remark. When m ⩾ |A|, the upper bound above is trivial, since in this case

√m · 2^{−½ H̃_2^↑(A|E)_ρ} = 2^{½(log(m) − H̃_2^↑(A|E)_ρ)} ⩾ 1 ,        (13.243)

where we used the fact that H̃_2^↑(A|E)_ρ ⩽ log |A| ⩽ log(m). However, as we will soon see, this upper bound is very useful when |A| > m. Specifically, we will use it to derive a tight lower bound on the distillable entanglement.
Proof. We get the upper bound on the conversion distance in two stages. First, by taking τ^E = ρ^E in (13.241) we get

T(ρ^{AB} −LOCC_1→ Φ_m^{A′B′}) ⩽ inf_{{E_x}} Σ_{x∈[k]} P²(u^{A′} ⊗ ρ^E , E_x^{A→A′}(ρ^{AE})) .        (13.244)
Second, we replace the infimum above over all quantum instruments {E_x}_{x∈[k]} with a specific choice of a quantum instrument to get a simpler upper bound. We will denote n := |A| and assume that m ⩽ n (see the remark above).

Observe that the expression in (13.244) has a form that is somewhat similar to the decoupling theorem studied in Sec. 7.7. Therefore, our strategy is to choose {E_x}_{x∈[k]} in such a way that we will be able to use the upper bound given in the decoupling theorem. For this purpose, recall that the twirling operation G ∈ CPTP(AÃ → AÃ) as defined in (7.199) can be expressed as a finite convex combination of product unitary channels as given in (7.211). With these k ∈ N, p ∈ Prob(k), and {U_x}_{x∈[k]} ⊂ U(A), we define

E_x^{A→A′} := p_x N^{A→A′} ◦ U_x^{A→A} ,        (13.245)

where U_x(·) = U_x(·)U_x^*, and N^{A→A′}(·) := (n/m) V^*(·)V, where V : A′ → A is some isometry.

We now discuss the properties of the set {E_x^{A→A′}}_{x∈[k]}. First, observe that by definition, the channel

R^{A→A} := Σ_{x∈[k]} p_x U_x^{A→A}        (13.246)

corresponds to the completely randomizing channel that outputs the maximally mixed state irrespective of the input state. This follows from the fact that both (7.199) and (7.211) correspond to the same twirling channel, so their marginal channels are also the same (see (3.242)). With this at hand, we get that

E^{A→A′} := Σ_{x∈[k]} E_x^{A→A′} = N^{A→A′} ◦ R^{A→A} .        (13.247)
Next, we argue that E^{A→A′} is trace preserving, so that {E_x^{A→A′}}_{x∈[k]} as defined above is indeed a quantum instrument. To see it, observe that for all ω ∈ L(A) we have

Tr[E^{A→A′}(ω^A)] = Tr[N^{A→A′} ◦ R^{A→A}(ω^A)]
   R(ω^A) = Tr[ω^A] u^A → = Tr[ω^A] Tr[N^{A→A′}(u^A)]
   N(u^A) = (1/n) N(I^A) = (1/m) V^*V → = (1/m) Tr[ω^A] Tr[V^*V]        (13.248)
   = Tr[ω^A] ,

where in the last line we used the fact that Tr[V^*V] = Tr[VV^*] = m, since VV^* is a projection of rank m = |A′| (recall V : A′ → A is an isometry).
Therefore, with this choice of quantum instrument, Eq. (13.244) becomes

T(ρ^{AB} −LOCC_1→ Φ_m^{A′B′}) ⩽ Σ_{x∈[k]} p_x P²(u^{A′} ⊗ ρ^E , N^{A→A′} ◦ U_x^{A→A}(ρ^{AE}))
   P²(ρ, σ) ⩽ ∥ρ − σ∥_1 → ⩽ Σ_{x∈[k]} p_x ∥N^{A→A′}(U_x^A ρ^{AE} U_x^{A*}) − u^{A′} ⊗ ρ^E∥_1 ,        (13.249)

where the last line follows from (5.202). Finally, to apply the decoupling theorem as given in (7.221), we define

τ^{AA′} := (1/n) J_N^{AA′} = (1/n) N^{Ã→A′}(Ω^{AÃ}) = (1/m)(I^A ⊗ V^*) Ω^{AÃ} (I^A ⊗ V) .        (13.250)

Note in particular that the marginal τ^{A′} = u^{A′}, so that the right-hand side of Eq. (13.249) has the exact same form as given on the left-hand side of (7.221) (in the decoupling theorem). Hence, we can apply the decoupling theorem to get

T(ρ^{AB} −LOCC_1→ Φ_m^{A′B′}) ⩽ 2^{−½(H̃_2^↑(A|E)_ρ + H̃_2^↑(A|A′)_τ)} .        (13.251)

Moreover, since τ^{AA′} is maximally entangled we get that H̃_2^↑(A|A′)_τ = −log(m). Substituting this into the previous equation gives (13.242). This completes the proof.
In what follows we take the source state σ^{A′B′} to be the maximally entangled state Φ_m.

Exercise 13.3.4. Show that for any ρ ∈ D(AB) and any σ ∈ D(A′B′) we have

T²(σ −LOCC→ ρ) ⩽ P²(σ −LOCC→ ρ) ⩽ 2 T(σ −LOCC→ ρ) .        (13.254)
Proof. Recall first that for the case that ρ^{AB} is a pure state, the theorem follows from Corollary 12.5.1. We therefore need to generalize this result to the case that ρ^{AB} is a mixed state. We start by showing that

{ E(Φ_m) : E ∈ LOCC(A′B′ → AB) } = { ω ∈ D(AB) : SR(ω^{AB}) ⩽ m } ,        (13.256)

where SR(ω^{AB}) is the Schmidt rank as defined in (13.120). Indeed, since the Schmidt rank SR as defined in (13.120) is a measure of entanglement, it follows that for ω^{AB} = E(Φ_m)

SR(ω^{AB}) ⩽ SR(Φ_m) = m .        (13.257)

Therefore, the left-hand side of (13.256) is contained in the right-hand side. On the other hand, from Exercise 13.2.14 it follows that every state ω^{AB} with Schmidt rank no greater than m has a pure-state decomposition with all states having Schmidt rank no greater than m. As a consequence of Nielsen's theorem, such a state ω^{AB} can be generated by LOCC from Φ_m. That is, the right-hand side of (13.256) is contained in the left-hand side. This completes the proof of the equality in (13.256).
Let E be a purifying system of dimension n := |E| ⩽ |AB|. From the equivalence of the two sets in (13.256) we get

max_{E∈LOCC(A′B′→AB)} F²(E(Φ_m), ρ^{AB}) = max_{ω∈D(AB), SR(ω)⩽m} F²(ω^{AB}, ρ^{AB})
   Uhlmann's theorem→ = max_{ψ,ϕ∈Pure(ABE); SR(ϕ^{AB})⩽m, ψ^{AB}=ρ^{AB}} |⟨ϕ^{ABE}|ψ^{ABE}⟩|² .        (13.258)
Let {|x⟩^E}_{x∈[n]} be a fixed orthonormal basis of the purifying system E, and observe that every purification ψ^{ABE} of ρ^{AB} can be expressed as

|ψ^{ABE}⟩ := Σ_{x∈[n]} √p_x |ψ_x^{AB}⟩|x⟩^E ,        (13.259)

where {p_x, ψ_x^{AB}}_{x∈[n]} is a pure-state decomposition of ρ^{AB}. Specifically, there is a one-to-one correspondence between all purifications ψ^{ABE} of ρ^{AB} that have the form (13.259), and all pure-state decompositions {p_x, ψ_x^{AB}}_{x∈[n]} of ρ^{AB} = Σ_{x∈[n]} p_x ψ_x^{AB}.
Similarly, for every ω^{AB} with Schmidt rank SR(ω^{AB}) ⩽ m let

|ϕ^{ABE}⟩ := Σ_{x∈[n]} √q_x |ϕ_x⟩^{AB} |x⟩^E        (13.260)

be a purification of ω^{AB} with the property that each |ϕ_x⟩^{AB} has Schmidt rank no greater than m (see Exercise 13.2.14). With this at hand, we get from (13.258) that

max_{E∈LOCC(A′B′→AB)} F²(E(Φ_m), ρ^{AB}) = max |Σ_{x∈[n]} √(q_x p_x) ⟨ϕ_x^{AB}|ψ_x^{AB}⟩|² ,        (13.261)

where the maximum on the right-hand side is over all pure-state decompositions of ρ^{AB} = Σ_x p_x ψ_x^{AB}, all probability vectors q ∈ Prob(n), and all pure states {ϕ_x^{AB}}_{x∈[n]} with Schmidt rank no greater than m. Now, from Corollary 12.5.1 it follows that (see Exercise 13.3.5) for every ψ ∈ Pure(AB)

max_{ϕ∈Pure(AB), SR(ϕ)⩽m} |⟨ϕ^{AB}|ψ^{AB}⟩|² = ∥ψ^A∥_{(m)} .        (13.262)
Combining this with (13.261) (optimizing over q via the Cauchy–Schwarz inequality and then over each ϕ_x) gives

max_{E∈LOCC(A′B′→AB)} F²(E(Φ_m), ρ^{AB}) = max Σ_{x∈[n]} p_x ∥ψ_x^A∥_{(m)} ,

where the maximum on the right-hand side stands for a maximum over all pure-state decompositions {p_x, ψ_x^{AB}}_{x∈[n]} of ρ^{AB} = Σ_{x∈[n]} p_x ψ_x^{AB}. In terms of the purified distance we have

min_{E∈LOCC(A′B′→AB)} P²(E(Φ_m), ρ^{AB}) = 1 − max Σ_{x∈[n]} p_x ∥ψ_x^A∥_{(m)}
   = min Σ_{x∈[n]} p_x (1 − ∥ψ_x^A∥_{(m)})        (13.265)
   = E_{(m)}(ρ^{AB}) ,

where both the min and max above are over all pure-state decompositions {p_x, ψ_x^{AB}}_{x∈[n]} of ρ^{AB}. This completes the proof.
Since the computation of the distillable entanglement is hard, we start with the single-shot
one-way distillable entanglement. As LOCC1 is a subset of LOCC, any lower bound on the
one-way distillable entanglement will automatically provide a lower bound on the distillable
entanglement defined above.
Exercise 13.4.1. Let k := |A| = |B|, m = |A′ | = |B ′ |, and ρ ∈ D(AB). Show that
A simple formula for the above expression is not presently available. However, we can provide
some useful lower and upper bounds.
In the following theorem, we present an upper bound on the single-shot one-way distillable
entanglement. It is worth noting that the upper bound given in (11.38) will not be helpful
in this case, as we are considering a subset of LOCC. Therefore, we can expect to obtain a
tighter upper bound, particularly since the upper bound given in (11.38) remains valid even
if we replace LOCC with non-entangling operations.
The upper bound presented in the following theorem is expressed in terms of the coherent
information of entanglement, denoted as E→ . This particular measure of entanglement has
been defined and extensively examined in Sec. 13.2.6.
Theorem 13.4.1. Let ρ ∈ D(AB) and ε ∈ (0, 1/2). Then, the one-way ε-single-shot distillable entanglement is bounded by

Distill_→^ε(ρ^{AB}) ⩽ (1/(1 − 2ε)) E_→(ρ^{AB}) + ((1 + ε)/(1 − 2ε)) h(ε/(1 + ε)) .        (13.269)

Proof. Let m ∈ N be such that Distill_→^ε(ρ^{AB}) = log m, so that T(ρ^{AB} −LOCC_1→ Φ_m^{A′B′}) ⩽ ε. This means that ρ^{AB} −LOCC_1→ σ^{A′B′} for some state σ ∈ D(A′B′) that is ε-close to Φ_m^{A′B′}. Therefore, from the monotonicity of E_→ under one-way LOCC we get that

E_→(ρ^{AB}) ⩾ E_→(σ^{A′B′}) .        (13.270)

Next, we use the fact that σ^{A′B′} is ε-close to Φ_m^{A′B′} to show that the right-hand side of the equation above cannot be much smaller than log(m). Indeed, combining the continuity property of the function I(A′⟩B′)_ρ := −H(A′|B′)_ρ (see (10.50)) with the second part of Exercise 13.2.36 gives

E_→(σ^{A′B′}) ⩾ I(A′⟩B′)_σ
   (10.50)→ ⩾ I(A′⟩B′)_{Φ_m} − 2ε log m − (1 + ε) h(ε/(1 + ε))        (13.271)
   = (1 − 2ε) log(m) − (1 + ε) h(ε/(1 + ε)) .

The proof is concluded by noting that the inequality above, in conjunction with the inequality (13.270), yields the desired inequality (13.269).
Exercise 13.4.2. Use similar lines as in the proof above to prove the following bound on the ε-single-shot distillable entanglement:

Distill^ε(ρ^{AB}) ⩽ (1/(1 − 2ε)) sup_{E∈LOCC(AB→A′B′)} I(A′⟩B′)_{E(ρ)} + ((1 + ε)/(1 − 2ε)) h(ε/(1 + ε)) .        (13.272)
In the context where the single-shot distillable entanglement (and the entanglement cost) is expressed as log(m) for some integer m ∈ N, it is useful to introduce a specific notation, ⪆, to denote a particular type of inequality between two real numbers a, b ∈ R. This notation is defined as follows:

a ⪆ b  ⟺  a ⩾ log⌊2^b⌋ .        (13.273)
This definition provides a convenient way to express inequalities that are relevant in the quantification of entanglement, especially in scenarios involving logarithmic expressions and integer values. With this in mind, our next goal is a lower bound on Distill_→^ε(ρ^{AB}). The lower bound given below is known as the single-shot hashing bound.

Theorem 13.4.2. Let ρ ∈ Pure(ABE) and ε ∈ (0, 1). Then, for every 0 < δ < ε we have

Distill_→^ε(ρ^{AB}) ⪆ H_min^δ(A|E)_ρ + log(ε − δ)² .        (13.274)
(13.274)
Remark. Observe that unlike the upper bound which is given in terms of the coherent in-
formation of entanglement, the lower bound above does not involve an optimization over
channels in CPTP(A → AX).
Proof. The main strategy of the proof is to use the upper bound given in Theorem 13.3.1 for the one-way conversion distance. Specifically, let δ ∈ (0, 1) and let ρ̃ ∈ B^δ(ρ^{AE}) be such that H_min^{↑,δ}(A|E)_ρ = H_min^↑(A|E)_{ρ̃}. With these definitions we get

√m · 2^{−½ H_min^{↑,δ}(A|E)_ρ} = √m · 2^{−½ H_min^↑(A|E)_{ρ̃}}
   H_min^↑ ⩽ H̃_2^↑ → ⩾ √m · 2^{−½ H̃_2^↑(A|E)_{ρ̃}}
   (13.242)→ ⩾ T(ρ̃^{AB} −LOCC_1→ Φ_m^{A′B′})        (13.275)
   Lemma 11.1.2→ ⩾ T(ρ^{AB} −LOCC_1→ Φ_m^{A′B′}) − δ .

That is,

T(ρ^{AB} −LOCC_1→ Φ_m^{A′B′}) ⩽ √m · 2^{−½ H_min^{↑,δ}(A|E)_ρ} + δ .        (13.276)

Therefore, for any ε ∈ (0, 1) and 0 < δ < ε we get that the one-way ε-distillable entanglement satisfies

Distill_→^ε(ρ^{AB}) := max { log m : T(ρ^{AB} −LOCC_1→ Φ_m^{A′B′}) ⩽ ε }
   (13.276)→ ⩾ max { log m : √m · 2^{−½ H_min^δ(A|E)_ρ} + δ ⩽ ε }        (13.277)
   = max { log m : log m ⩽ H_min^δ(A|E)_ρ + log(ε − δ)² }
   = log⌊(ε − δ)² 2^{H_min^δ(A|E)_ρ}⌋ .
Exercise 13.5.1. Show that for any n ∈ N and any ρ ∈ D(AB) we have

Distill(ρ) ⩾ (1/n) Distill(ρ^{⊗n})   and   Distill_→(ρ) ⩾ (1/n) Distill_→(ρ^{⊗n}) .        (13.282)

Remark. Since the distillable entanglement is always no smaller than the one-way distillable entanglement, we also have

Distill(ρ^{AB}) ⩾ I(A⟩B)_ρ .        (13.284)
Proof. Let ρ^{ABE} ∈ Pure(ABE) be a purification of ρ^{AB}, and let ε, δ ∈ (0, 1) be such that δ < ε. From the lower bound in (13.274) we get

lim inf_{n→∞} (1/n) Distill_→^ε(ρ^{⊗n}) ⩾ lim inf_{n→∞} (1/n) [ H_min^δ(A^n|E^n)_{ρ^{⊗n}} + 2 log(ε − δ) ]
   = lim inf_{n→∞} (1/n) H_min^δ(A^n|E^n)_{ρ^{⊗n}}        (13.285)
   Theorem 11.2.2→ = H(A|E)_ρ
   Duality relation (7.158)→ = −H(A|B)_ρ = I(A⟩B)_ρ .

Since the equation above holds for all ε ∈ (0, 1), it also holds if we take the limit ε → 0^+. This completes the proof.
It is worth noting that the Hashing bound reveals that the distillable entanglement is
non-zero whenever the conditional entropy of ρAB is negative. As we discussed earlier, the
conditional entropy can only be negative for entangled states, which aligns with the fact
that only entangled states can possess non-zero distillable entanglement. However, in the
upcoming sections, we will discover that the converse statement is not true. Specifically,
there exist entangled states with zero distillable entanglement.
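The hashing rate I(A⟩B)_ρ = H(B)_ρ − H(AB)_ρ is straightforward to evaluate numerically. The sketch below (not part of the text; it assumes NumPy, with illustrative helper names) computes this lower bound on the (one-way) distillable entanglement for a family of isotropic two-qubit states; the bound becomes positive only once t is large enough (around t ≈ 0.81 for two qubits):

    import numpy as np

    def entropy(rho):
        ev = np.linalg.eigvalsh(rho)
        ev = ev[ev > 1e-12]
        return float(-np.sum(ev * np.log2(ev)))

    def coherent_information(rho_ab, dA, dB):
        # I(A>B) = H(B) - H(AB) = -H(A|B); by the hashing bound this is a
        # lower bound on the (one-way) distillable entanglement.
        r = rho_ab.reshape(dA, dB, dA, dB)
        rho_b = np.trace(r, axis1=0, axis2=2)
        return entropy(rho_b) - entropy(rho_ab)

    phi = np.zeros(4); phi[0] = phi[3] = 1 / np.sqrt(2)
    bell = np.outer(phi, phi)
    for t in (0.7, 0.85, 0.95):
        rho = t * bell + (1 - t) * (np.eye(4) - bell) / 3   # isotropic state
        print(t, coherent_information(rho, 2, 2))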
Remark. It is worth noting that the theorem provides an operational interpretation for the
coherent information of entanglement as the one-way distillable entanglement.
Proof. For the direct part (i.e. achievability), observe that any quantum instrument E ∈ CPTP(A → AX) can be considered as a special type of LOCC_1 (i.e. Alice implements the instrument {E_x}_{x∈[m]} on her system and sends the outcome x to Bob). Since the single-shot one-way distillable entanglement behaves monotonically under such LOCC_1, we get that for all ε ∈ (0, 1)

Distill_→^ε(ρ^{AB}) ⩾ Distill_→^ε(σ^{ABX}) ,        (13.288)

where σ^{ABX} := E^{A→AX}(ρ^{AB}). Since for every n ∈ N the inequality above also holds with n copies of ρ and σ, we get

lim inf_{n→∞} (1/n) Distill_→^ε(ρ^{⊗n}) ⩾ lim inf_{n→∞} (1/n) Distill_→^ε(σ^{⊗n})
   (13.285)→ ⩾ I(A⟩BX)_σ .        (13.289)

Since the inequality above holds for all E ∈ CPTP(A → AX), we conclude that

lim inf_{n→∞} (1/n) Distill_→^ε(ρ^{⊗n}) ⩾ E_→(ρ^{AB}) .        (13.290)
We would like to replace the right-hand side above with the regularized version of E_→. For this purpose, fix k ∈ N and observe that by applying the above inequality to ρ^{⊗k} ∈ D(A^k B^k) we get

lim inf_{n→∞} (1/(kn)) Distill_→^ε(ρ^{⊗kn}) ⩾ (1/k) E_→(ρ^{⊗k}) .        (13.291)

We next show that the left-hand sides of the two equations above coincide. Indeed, by definition of the lim inf, the left-hand side of (13.290) is no greater than the left-hand side of (13.291). For the converse, let {n_j}_{j∈N} be a subsequence of integers such that

lim inf_{n→∞} (1/n) Distill_→^ε(ρ^{⊗n}) = lim_{j→∞} (1/n_j) Distill_→^ε(ρ^{⊗n_j}) .        (13.292)
Now, for any j ∈ N, set m_j := k⌊n_j/k⌋; i.e. m_j is the largest multiple of k that is no greater than n_j. In particular, note that n_j − k < m_j ⩽ n_j. Then,

lim inf_{n→∞} (1/(kn)) Distill_→^ε(ρ^{⊗kn}) ⩽ lim inf_{j→∞} (1/m_j) Distill_→^ε(ρ^{⊗m_j})
   m_j ⩽ n_j → ⩽ lim inf_{j→∞} (1/m_j) Distill_→^ε(ρ^{⊗n_j})
   n_j − k < m_j ⩽ n_j → = lim_{j→∞} (1/n_j) Distill_→^ε(ρ^{⊗n_j})        (13.293)
   (13.292)→ = lim inf_{n→∞} (1/n) Distill_→^ε(ρ^{⊗n}) ,

where the first inequality follows from the fact that {m_j}_{j∈N} is a subset of {kn}_{n∈N}, the second inequality from the fact that Distill_→^ε(ρ^{⊗m_j}) ⩽ Distill_→^ε(ρ^{⊗n_j}) as m_j ⩽ n_j, and the third equality from the fact that 1/m_j = (n_j/m_j)(1/n_j) and lim_{j→∞} n_j/m_j = 1 (since n_j − k < m_j ⩽ n_j). This completes the proof that the left-hand side of (13.290) equals the left-hand side of (13.291), so that

lim inf_{n→∞} (1/n) Distill_→^ε(ρ^{⊗n}) ⩾ (1/k) E_→(ρ^{⊗k}) .        (13.294)
Since the equation above holds for all k ∈ N, it must also hold in the limit k → ∞; that is, we conclude that

lim inf_{n→∞} (1/n) Distill_→^ε(ρ^{⊗n}) ⩾ E_→^reg(ρ^{AB}) .        (13.295)
For the converse inequality we apply the upper bound (13.269) with ρ^{⊗n} instead of ρ. Explicitly, observe that for any ε ∈ (0, 1)

lim sup_{n→∞} (1/n) Distill_→^ε(ρ^{⊗n}) ⩽ lim sup_{n→∞} (1/n) [ (1/(1 − 2ε)) E_→(ρ^{⊗n}) + ((1 + ε)/(1 − 2ε)) h(ε/(1 + ε)) ]
   = (1/(1 − 2ε)) E_→^reg(ρ^{AB}) .        (13.296)

Taking the limit ε → 0^+ and combining the resulting inequality with (13.295), we conclude that the equality in (13.287) holds.
In the final step of the proof just discussed, we had to consider the limit as ε approaches
0 (from the positive side). It remains unclear to the author whether this step is essential,
and whether the following equality holds true:
lim_{n→∞} (1/n) Distill_→^ε(ρ^{⊗n}) = E_→^reg(ρ^{AB})        (13.297)
for all ε within the interval (0, 1). This point raises an interesting question in the study
of quantum information theory, particularly regarding the behavior of the distillable en-
tanglement under asymptotic conditions. The uncertainty here revolves around whether
reg
the regularized entanglement measure, E→ , aligns with the distillable entanglement rate,
1 ε ⊗n
n
Distill → (ρ ), for any non-zero ε. Resolving this would contribute to a deeper understand-
ing of entanglement properties in quantum systems.
for some two-qubit entangled state σ ∈ D(A′B′) with |A′| = |B′| = 2. However, since the logarithmic negativity is additive, we get

LN(ρ^{⊗n}) = n LN(ρ^{AB}) = 0 ,        (13.300)

whereas

LN(σ^{A′B′}) > 0 ,        (13.301)

since σ^{A′B′} is a two-qubit entangled state, and from Theorem 13.1.4 it follows that it is NPT (recall that the logarithmic negativity is strictly positive on NPT states). We therefore get that

LN(ρ^{⊗n}) < LN(σ^{A′B′}) ,        (13.302)

in contradiction with (13.299) and the fact that the logarithmic negativity is a measure of entanglement and therefore cannot increase under LOCC. This completes the proof.
By applying the hashing bound, we can deduce that if H(A|B)ρ < 0, then the distillable
entanglement of ρAB is strictly positive. Combining this with the theorem mentioned above,
we can conclude that if ρ ∈ D(AB) is PPT, then its conditional entropy H(A|B)ρ ⩾ 0. This
observation is consistent with the reduction criterion discussed in Section 13.1.3, as Corol-
lary 7.3.1 and Theorem 7.3.1 show that states satisfying the reduction criterion (particularly
PPT states) have non-negative conditional entropy.
Exercise 13.5.2. Let ρ ∈ Pure(ABC) be a tripartite pure state. Show that if its marginals
satisfy I A ⊗ ρC > ρAC then Distill ρAB > 0.
Theorem 13.5.3 states that PPT entangled states have zero distillable entanglement. This
result is an important insight into the relationship between entanglement and the partial
transpose operation. However, it raises the question of whether the converse of this property
also holds. That is, are all entangled states with zero distillable entanglement (i.e., bound
entangled states) necessarily PPT? This is one of the most challenging and long-standing
open problems in quantum information theory, and despite significant efforts over the past
two decades, the answer is still unknown. Despite the current lack of a definitive answer to
this question, research in this area continues to progress, with new insights and techniques
being developed to study the properties of entangled states and their relation to the partial
transpose operation.
where Φm is the maximally entangled state in D(A′ B ′ ) with m := |A′ | = |B ′ |. Since all
metrics in finite dimensional Hilbert spaces are topologically equivalent, we chose the square
of the purified distance as it is easier to work with, and in particular, has the form given in
Theorem 13.3.2.
That is, the logarithm of the Schmidt rank of ρAB offers an operational interpretation as the
zero-error entanglement cost of ρAB . Following this, we demonstrate that SR(ρAB ) bears a
close relationship to the conditional max-entropy. To elaborate further, let’s first present an
alternative method for describing the convex roof extension.
ρ^{XAB} := Σ_{x∈[k]} p_x |x⟩⟨x|^X ⊗ ψ_x^{AB} .        (13.305)
Note that if {ψxAB }x∈[k] in the definition above were composed of mixed states instead of
pure states, then ρXAB would not necessarily qualify as a regular extension, even though it
would still be an extension of ρAB . Moreover, we show now that the Schmidt rank of ρAB can
be expressed as an optimization problem over all marginal cq-states ρXA that results from
regular extensions. Explicitly, if ρXAB is a regular extension of ρAB , as given in (13.305),
then the marginal cq-state, ρXA , has the form
ρ^{XA} = Σ_{x∈[k]} p_x |x⟩⟨x|^X ⊗ ρ_x^A ,        (13.306)

where ρ_x^A := Tr_B[ψ_x^{AB}]. By definition, SR(ψ_x^{AB}) = Tr[Π_{ρ_x}^A], where Π_{ρ_x} ∈ Pos(A) is the projection in A onto the support of ρ_x^A. Combining this with the relation (13.122), we can express the Schmidt rank of ρ^{AB} as
where the maximum is over all regular extensions ρABX of ρAB . The expression above can
be rewritten in terms of the conditional max-entropy of the state ρAX . To see this, let ΠXA
ρ
where in the last line we replaced the maximum over all τ ∈ D(X) with a maximum over all
x ∈ [k]. Using this observation in conjunction with (13.307), we can express the logarithm
of the Schmidt rank of ρAB as
where the infimum is over all classical systems X and all regular extensions ρABX of ρAB . In
the following exercise you show that we can remove the restriction to regular extensions.
Exercise 13.6.1. Show that (13.311) still holds even if we take the infimum over all classical
systems X and all extensions ρABX of ρAB .
Exercise 13.6.2. Let ρ ∈ D(AB). Show that the entanglement of formation of ρ^{AB} can be expressed as

E_F(ρ^{AB}) := inf_{ρ^{ABX}} H(A|X)_ρ ,        (13.312)

where H(A|X)_ρ is the conditional von Neumann entropy, and the infimum is over all classical systems X and all extensions ρ^{ABX} of ρ^{AB}.
where the second infimum is over all density matrices ω AB , all classical systems X, and
over all regular extensions ω ABX of ω AB . Given the complexity of the above expression,
we’ll transition to an alternative approach, specifically employing the formula presented in
Theorem 13.3.2 for the conversion distance.
Remark. Exercise 13.6.1 allows us to limit the infimum in (13.314) to regular extensions
ρABX of ρAB . Additionally, from (13.309), the smoothed version of Hmax (A|X)ρ is expressed
as:
H_max^ε(A|X)_ρ = min_{ω∈B^ε(ρ^{XA})} max_{x∈[k]} log Tr[Π_{ω_x}^A] .        (13.315)

For every m ∈ [d], with d := |A|, we define the pruned cq-state

ρ^{(m)} = Σ_{x∈[k]} p_x |x⟩⟨x| ⊗ ρ_x^{(m)} ,        (13.316)

where each ρ_x^{(m)} is the m-pruned version of ρ_x as defined in (5.156).
Exercise 13.6.4. Use (12.60) to show that if ρ^{(m)} ≠ ρ^{XA} then H_max(A|X)_{ρ^{(m)}} = log m.

Lemma 13.6.1. Let ε ∈ (0, 1), d := |A|, ρ ∈ D(XA), and for all m ∈ [d] let ρ^{(m)} ∈ D(XA) be the m-pruned version of ρ^{XA} as defined in (13.316). Then,

H_max^ε(A|X)_ρ = min_{m∈[d]} { log m : ρ^{(m)} ≈_ε ρ^{XA} } .        (13.317)
Remark. Observe that the trace distance between ρ^{XA} and its m-pruned version is given by

½ ∥ρ^{XA} − ρ^{(m)}∥_1 = Σ_{x∈[k]} p_x ½ ∥ρ_x − ρ_x^{(m)}∥_1
   (5.157)→ = Σ_{x∈[k]} p_x (1 − ∥ρ_x∥_{(m)})        (13.318)
   = 1 − Σ_{x∈[k]} p_x ∥ρ_x∥_{(m)} .
Proof. By definition,

H_max^ε(A|X)_ρ = min { H_max(A|X)_ω : ω^{XA} ≈_ε ρ^{XA} }
   Restricting ω = ρ^{(m)} → ⩽ min_{m∈[d]} { H_max(A|X)_{ρ^{(m)}} : ρ^{(m)} ≈_ε ρ^{XA} }        (13.320)
   Exercise 13.6.4→ = min_{m∈[d]} { log m : ρ^{(m)} ≈_ε ρ^{XA} } .

For the opposite inequality, compute

Tr[ρ^{XA} Π_ω^{XA}]
   Tr[ω^{XA} Π_ω^{XA}] = 1 → = 1 + Tr[(ρ^{XA} − ω^{XA}) Π_ω^{XA}]
   η ⩾ −(η)_− ∀ η ∈ Herm(XA) → ⩾ 1 − Tr[(ρ^{XA} − ω^{XA})_− Π_ω^{XA}]        (13.323)
   Π_ω^{XA} ⩽ I^{XA} → ⩾ 1 − Tr[(ρ^{XA} − ω^{XA})_−]
   ⩾ 1 − ε ,

where we used the fact that Tr[(ρ^{XA} − ω^{XA})_−] = ½ ∥ω^{XA} − ρ^{XA}∥_1 ⩽ ε. Therefore, we get a contradiction with the assumption that m was the minimizer of the right-hand side of (13.321). This completes the proof.
We are now ready to prove Theorem 13.6.1.
Proof of Theorem 13.6.1. From Theorem 13.3.2 it follows that the conversion distance that
appears in (13.303) can be expressed as
X
LOCC
P 2 Φm −−−→ ρAB = min px 1 − ρA x (m) (13.324)
ρXAB
x∈[k]
where the minimum is over all regular extensions ρXAB of ρAB , with the same notations as
in (13.306). We therefore get that the ε-single-shot entanglement cost as defined in (13.303)
can be expressed as
n X o
Costε ρAB = min log m : max px ρ A
x (m) ⩾ 1 − ε
m∈[d] ρXAB
x∈[k]
n X o
Exercise 13.6.5→ = min min log m : px ρA
x (m) ⩾1−ε (13.325)
ρXAB m∈[d]
x∈[k]
ε
(13.319)→ = min Hmax (A|X)ρ ,
ρXAB
where the maximum is overl all regular extensions ρXAB of ρAB . This completes the proof.
2. Without assuming (13.314), use Theorem 5.4.3 to show (by direct calculation) that for
the pure state case, the formula in (12.101) can be expressed as
Costε ψ AB = Hmaxε
(ρA ) := inf Hmax (σ A ) ,
(13.327)
σ∈Bε (ρ)
where ρA := TrB ψ AB .
Recall from Exercise 11.5.2 that the above cost does not change if we replace the trace dis-
tance with the square of the purified distance, as the two metrics are topologically equivalent.
Thus, from (11.111) it follows that the asymptotic entanglement cost can be expressed as
1
Cost ρAB := lim+ lim inf Costε ρ⊗n .
(13.329)
ε→0 n→∞ n
Theorem 13.7.1. Let ρ ∈ D(AB). Then, the entanglement cost of ρAB can be
expressed as
1
Cost ρAB = EFreg ρAB := lim EF ρ⊗n
(13.330)
n→∞ n
Proof. We first prove that Cost ρAB ⩽ EFreg ρAB . From Theorem 13.6.1 we have
⊗n
Costε ρAB ε
= inf Hmax (An |Yn )ρn
ρn
⊗n (13.331)
ε
(An |X n )ρ⊗n
taking ρn = ρXAB −−−−→ ⩽ inf Hmax
ρXAB
where the first infimum is over all classical systems Yn and all regular extensions ρn ∈
AB ⊗n
n n
D(Yn A B ) of ρ , and the second infimum is over all regular extensions ρ ∈ D(XAB)
AB XAB
of ρ . For theinequality above we used the fact that if ρ is a regular extension of
AB XAB ⊗n AB ⊗n
ρ then ρ is a regular extension of ρ (see Exercise 13.6.3). Therefore, the
entanglement cost satisfies
1 ε
Cost ρAB ⩽ lim+ inf lim inf Hmax (An |X n )ρ⊗n
ε→0 ρXAB n→∞ n
Thus, EF ρAB ⩾ Cost ρAB . Repeating the same argument with m ∈ N copies of ρAB
gives EF (ρ⊗m ) ⩾ Cost (ρ⊗m ). Combining this with (11.120) we get
1 1
Cost ρAB ⩽ Cost ρ⊗m ⩽ EF ρ⊗m .
(13.333)
m m
Since the inequality
AB
reg AB
holds for all integers m, it also holds in the limit m → ∞. Hence,
above
Cost ρ ⩽ EF ρ .
Now, the asymptotic continuity property (10.51) of the conditional entropy gives for any
n
ω ∈ Bε ρYnn A
H(An |Yn )ω ⩾ H(An |Yn )ρn − n log |A|f (ε) . (13.335)
Substituting this into (13.334) gives
Costε ρ⊗n ⩾ inf H(An |Yn )ρn − n log |A|f (ε)
ρn
(13.336)
Exercise 13.6.2→ = EF ρ⊗n − n log |A|f (ε) .
Dividing both sides by n and taking the limit n → ∞ followed by ε → 0+ gives Cost ρAB ⩾
EFreg ρAB . This completes the proof.
If the entanglement of formation (EF ) were additive under tensor products, determining
EFreg ρAB would be a more straightforward task. For quite some time, a prevalent belief
among researchers in the field was that EF is indeed additive, implying its equivalence to
the entanglement cost. However, in a pivotal development in 2008, Hastings refuted this
additivity conjecture, demonstrating that the entanglement of formation is generally not
additive. From its definition, for any states ρ ∈ D(AB) and σ ∈ D(A′ B ′ ), the following
inequality holds: ′ ′
′ ′
EF ρAB ⊗ σ A B ⩽ EF ρAB + EF σ A B .
(13.337)
Hastings’ result indicates that this inequality can be strict, even when ρ is equal to σ.
Notably, Hastings’ proof is existential, meaning it establishes the existence of such non-
additivity without providing an explicit counterexample. To date, an explicit example where
EF is not additive has not been identified, but Hastings’ contribution significantly altered our
understanding on this problem (further details can be found in the ”Notes and References”
section at the end of this chapter).
While the entanglement of formation (EoF) is not generally additive, it can be additive
for certain specific states, allowing for efficient computation of their entanglement cost. An
interesting concept relevant in this context is that of an “entanglement breaking subspace”.
2. The reduced density matrix of every ϕAB ∈ Pure (K) of the form |ϕAB ⟩ = x∈[3] λx |χAB
P
x ⟩
(with λx ∈ C) can be expressed as
1 1
TrB [ϕAB ] = I A − φT (13.342)
2 2
where |φA ⟩ := λx |x⟩A .
P
x∈[3]
Denote by K := supp ρAB . The key idea is to first show that for every ψ ∈ Pure (K ⊗ A′ B ′ )
we have ′ ′
′ ′
E ψ ABA B ⩾ EF ψ AB + EF ψ A B ,
(13.345)
where E on the left-hand side is the entropy of entanglement between systems AA′ and BB ′ ,
′ ′
and for simplicity of notations we use ψ AB and ψ A B to denote the mixed marginal states of
′ ′ ′ ′
ψ ABA B . To see why the above inequality holds, recall that ψ ABA B belongs to an EBS so
that we can express it as
′ ′
X√
A′ B ′
|ψ ABA B ⟩ := px |x⟩B ⊗ |ϕA
x ⟩ ⊗ |φx ⟩ (13.346)
x∈[m]
Then, from the strong subadditivity as given in (7.134) and the fact that the marginal
′ ′
σ AA = ψ AA we get
H(AA′ )ψ = H(AA′ )σ
Strong subadditivity (7.134)→ ⩾ H(A)σ + H(AA′ C)σ − H(AC)σ
X ′ (13.350)
Exercise 13.7.2→ = H ψ A + px H φAx .
x∈[m]
A
AB
Now, from (13.69) we have H ψ ⩾ E F ψ . Moreover, by definition, for every x ∈ [m]
′ A′ B ′
we have H φA x = E φ x . Combining this with the equation above and with (13.347)
gives ′ ′
′ ′ X
E ψ ABA B ⩾ EF ψ AB + px E φA
x
B
x∈[m] (13.351)
′ ′
AB AB
⩾ EF ψ + EF ψ .
This completes the proof.
Exercise 13.7.2. Prove the last equality in (13.350).
Exercise 13.7.3. Compute the entanglement cost of the state
ρAB := pΦAB AB
+ + (1 − p)Φ− , (13.352)
for all p ∈ [0, 1], where
1
|ΦAB
± ⟩ := √ (|00⟩ ± |11⟩) . (13.353)
2
′ ′
Since ΦA
m
B
is invariant under the action of the (self-adjoint) twirling map
′ ′ Z ′ ′ ∗
G ω A B = dU U ⊗ U ω A B U ⊗ U ∀ ω ∈ L(A′ B ′ ) , (13.355)
we can replace E in (13.354) with E ◦ G, or in other words, we can assume without loss
of generality that E = E ◦ G. Any such non-entangling (RNG) operation has the form
(see (3.256))
E(ω) = Tr [(I − Φm )ω] σ AB + Tr [Φm ω] η AB ∀ ω ∈ D(A′ B ′ ) , (13.356)
where σ, η ∈ D(AB). Note that the channel E is RNG (i.e., non-entangling) if and only if
Tr [(I − Φm )ω] σ + Tr [Φm ω] η ∈ SEP(AB) ∀ ω ∈ SEP(A′ B ′ ) . (13.357)
Now, recall that the density state τ := (I − Φm )/(m2 − 1) is a separable isotropic state
(see (13.19)). Taking ω = τ above we get E(τ ) = σ. Therefore, since τ is a separable
state we getthat σ must be separable as well. More generally, from (13.26) we have that
Tr Φm ω AB ⩽ m1 for all separable states ω ∈ SEP(AB). Therefore, the condition in (13.357)
AB
In other words, the conversion distance above can be interpreted as the distance of ρAB to
the set of states with robustness no greater than m − 1.
Using the compcat expression for the conversion distance above, we get that for any
ε ∈ (0, 1), the ε-single-shot entanglement cost under non-entangling operations is given by
n 1 AB o
Costε (ρAB ) = min log m : ρ − η AB 1 ⩽ ε , R η AB ⩽ m − 1 .
(13.362)
m∈N 2
That is,
Costε ρAB = log 1 + Rε ρAB = LRε ρAB .
(13.363)
The formula above provides an operational interpretation for the smoothed logarithmic ro-
bustness as the single-shot entanglement cost under non-entangling operations.
Due to the symmetry of Φm we can assume without loss of generality that E = G ◦ E so that
the non-entangling operation E has the form (see (3.257))
′ ′ ′B′
E(ω) = 1 − Tr [Λω] τ A B + Tr [Λω] ΦA
m ∀ ω ∈ L(AB) , (13.365)
Hint: Observe that the optimal m in (13.368) is the floor of the reciprocal of maxσ∈SEP(AB) Tr [Λσ].
The results obtained in the single-shot regime lead directly to the following expressions for
the cost and distillation of a bipartite state ρ ∈ D(AB) under non-entangling operations:
1
Cost ρAB = lim lim inf LRε ρ⊗n
ε→0 n→∞ n
1 ε (13.371)
Distill ρAB = lim lim sup Dmin ρ⊗n ∥SEP .
ε→0 n→∞ n
From these expressions, it becomes evident that if the generalized quantum Stein’s lemma
(as proposed in Conjecture 11.3.1) is valid, then the distillable entanglement under non-
entangling operations would be equal to the regularized relative entropy of entanglement.
In contrast, regarding the entanglement cost, it is known (as detailed in the section
on ‘Notes and References’ at the end of this chapter) that there are states for which the
entanglement cost is strictly greater than the distillable entanglement. This implies that
even under a broad range of non-entangling operations, the reversibility of mixed state
entanglement is not guaranteed. In essence, this reflects a fundamental asymmetry in the
processes of creating and extracting entanglement from quantum systems.
In this subsection, we delve into the quantum resource theory where F(AB) is defined as the
set of PPT states within D(AB), and F(AB → A′ B ′ ) as the set of completely PPT preserving
quantum channels. The exploration of this resource theory is not only intriguing from a
theoretical standpoint but is also driven by the fact that the set of completely PPT preserving
quantum channels encompasses LOCC. Consequently, the entanglement cost and distillation
rates determined under these operations offer lower and upper bounds, respectively, on the
corresponding rates under LOCC.
In the framework of completely PPT-preserving operations, entangled states that exhibit
a positive partial transpose are regarded as free resources. Accordingly, in this resource
theory, the focus is on NPT-entanglement, emphasizing interest in entangled states with a
negative partial transpose (NPT), which are states whose partial transpose is not positive
semidefinite. In the rest of this section, we will use the notation PPT(AB) to refer to the
set of all density matrices in D(AB) that have a positive partial transpose. Thus, we obtain
the following relation:
where the superscript Γ indicates partial transpose w.r.t. Bob’s systems (see (13.148)). In the
following exercise you prove some of the key properties of this extension of partial transpose
to linear maps.
Exercise 13.9.1. Let E ∈ L(AB → A′ B ′ ) be a bipartite linear map, E Γ be its partial
′ ′
transpose, and JE := JEABA B be its Choi matrix. Prove the following statements:
1. E is PPT preserving if and only if E Γ is PPT preserving.
2. E is completely PPT preserving if and only if E Γ is completely PPT preserving.
3. E satisfies:
JE Γ = JEΓ , (13.375)
′ ′
where on the right-hand side the superscript Γ denotes the partial transpose of JEABA B
with respect to both B and B ′ .
4. E satisfies: ∗
(E ∗ )Γ = E Γ . (13.376)
2. E Γ is a quantum channel.
Proof. We can use 13.375 to observe that E Γ is a quantum channel if and only if JEΓ ⩾ 0.
Thus, it suffices to show that E is completely PPT preserving if and only if its Choi matrix
′ ′
is PPT. To see this, note that the Choi matrix of E AB→A B can be expressed as
′ ′ ′ ′
JEABA B = E ÃB̃→A B Ω(AB)(ÃB̃) , (13.377)
where Ω(AB)(ÃB̃) is an unnormalized maximally entangled state between system ÃB̃ and
system AB. Furthermore, we can write
where ΩAÃ and ΩB B̃ are unnormalized maximally entangled states between the respective
systems. Since we take the partial transpose with respect to system B B̃, we get from the
equation above that Ω(AB)(ÃB̃) is PPT with respect to B B̃. Therefore, if E is completely
′ ′ ′ ′
PPT preserving, then its Choi matrix JEABA B must be PPT, since ΩABA B is PPT.
′′ ′′
Conversely, suppose JEΓ ⩾ 0 (i.e. E Γ is a quantum channel) and let ρA B AB be a PPT
state with respect to system B ′′ B. For simplicity of the exposition here, we use the su-
perscript Γ to indicate partial transpose on all systems on Bob’s side (i.e. for ρAB , the
′′ ′′
superscript Γ in ρΓ stands for partial transpose w.r.t. B, and for ρA B AB , the superscript
in ρΓ stands for partial transpose w.r.t. to system B ′′ B). Then,
AB→A′ B ′ A′′ B ′′ AB
Γ
= E Γ ρΓ ⩾ 0 ,
E ρ (13.379)
′ ′ ′′ ′′
since ρΓ ⩾ 0 and E Γ is completely positive. Therefore, the state E AB→A B ρA B AB is
′′ ′′
PPT, and since ρA B AB was an arbitrary PPT state in D(A′′ B ′′ AB) we conclude that E is
completely PPT preserving. This completes the proof.
We denote by PPT(AB → A′ B ′ ) the set of all completely PPT preserving channels in
CPTP(AB → A′ B ′ ).
since E Γ is CP. Since (E(ρ))Γ = E Γ (ρΓ ) we get that the above equation is equivalent to
Γ
−Λ′Γ ⩽ E(ρ) ⩽ Λ′Γ where Λ′ := E(Λ) . (13.382)
Exercise 13.9.4. Consider the linear map E ∈ CPTP(AB → AB) with m := |A| = |B|
defined for all ω ∈ L(AB) as
where
1 Γ
Λ := I AB − ΦAB
m . (13.385)
m+1
1. Show that Λ ∈ Eff(AB).
2. Show that E is PPT preserving (but not necessarily completely PPT preserving).
3. Show that for m > 3 we have N E ρAB W > N ρABW , where ρAB
W is the maximally
entangled Werner state (see (13.30) with α = 1)
1
ρAB I AB − F AB .
W = (13.386)
d(d − 1)
The exercise above demonstrates that the negativity measure, in general, does not exhibit
monotonic behavior under PPT-preserving operations, but as we saw earlier, it does exhibit
monotonic behavior under completely PPT-preserving operations.
In order to simplify the expression above for the conversion distance, we will need the
following lemma.
Proof. Suppose first that Λ = E ∗ (Φm ) for some E ∈ PPT(AB → A′ B ′ ). Since E is a CPTP
′ ′
map it follows that E ∗ is a unital CP map. Therefore, since 0 ⩽ Φm ⩽ I A B we have
0 ⩽ Λ ⩽ I AB . Moreover, observe that
Γ
Λ = E (Φm ) = (E ∗ )Γ ΦΓm
Γ ∗
∗ Γ
(13.376)→ = E Γ Φm (13.388)
1 ∗ AB
= EΓ F ,
m
where F AB is the flip operator. Since E Γ is also a CPTP map (see Theorem 13.9.1) it follows
∗
that E Γ is a unital CP map. Combining this with the fact that −I AB ⩽ F AB ⩽ I AB
(recall F 2 = I AB ) we conclude that
1 AB 1
− I ⩽ ΛΓ ⩽ I AB , (13.389)
m m
which is equivalent to ΛΓ ∞ ⩽ m1 .
1
Conversely, suppose Λ ∈ Eff(AB) and ΛΓ ∞
⩽ m
. Define the measurement-prepare
channel E ∈ CPTP(AB → A′ B ′ ) as
where τ := (I − Φm )/(m2 − 1) ∈ D(A′ B ′ ). Observe that Λ = E ∗ (Φm ) (can you see why?).
The intuition behind the definition above comes from the observation that the optimization
over the PPT channels in (13.387) can be further restricted to channels that satisfy E = G ◦E
which according to (3.257) have the form of the channel above. It is therefore left to show
that E as defined above is a PPT quantum channel.
Indeed, observe that by definition for all ω ∈ L(AB) we have
Γ
E Γ (ω) = E ω Γ
= Tr Λω Γ ΦΓm + Tr (I − Λ)ω Γ τ Γ
(13.391)
The partial transpose
→ = Tr ΛΓ ω ΦΓm + Tr I − ΛΓ ω τ Γ .
is self-adjoint
1 A′ B ′ 1
ΦΓm = F = (ΠSym − ΠAsy ) , (13.392)
m m
and (recall I = ΠSym + ΠAsy )
I − ΦΓm 1 1
τΓ = = ΠSym + ΠAsy . (13.393)
m2 − 1 m(m + 1) m(m − 1)
Substituting these expressions for ΦΓm and τ Γ into (13.391) (and rearranging terms) gives
(see Exercise 13.9.5)
where
1 2 2
Λ′ :=
I + mΛ , σSym := ΠSym , σAsy := ΠAsy . (13.395)
2 m(m + 1) m(m − 1)
Hence, since ∥Λ∥∞ ⩽ 1/m we get that Λ′ ∈ Eff(A′ B ′ ) so that E Γ is itself a measurement-
prepare quantum channel. That is, E is indeed a PPT channel. This completes the proof.
Corollary 13.9.1. Using the same notations as above, the PPT conversion distance
to a maximally entangled state is given by
′B′
PPT
T ρAB −−→ ΦA = 1 − max Tr ΛAB ρAB .
m (13.396)
Λ∈Eff(AB)
∥ΛΓ ∥∞ ⩽ m1
′ ′ ′B′
In the next lemma we provide a characterization of the density matrix E A B →AB ΦA
m .
Remark. The condition presented in equation (13.398) leads to the implication that (m +
1)ω Γ ⩾ (1 − m)ω Γ . This inequality can be simplified and equivalently restated as mω Γ ⩾ 0.
Thus, the partial transpose of ω is positive semidefinite, so that ω ∈ PPT(AB).
Proof. Let G ∈ CPTP(A′ B ′ → A′ B ′ ) be the twirling channel as defined in (3.251). Recall
that G is an LOCC channel, and thus it is completely PPT preserving. If σ = E(Φm ) for
some PPT channel E, we can assume without loss of generality that E = E ◦ G. Otherwise,
we can replace E with E ′ = E ◦ G, which has the desired property. From (3.256) it then
follows that for all η ∈ L(AB)
where we replaced ω1 in (3.256) with E(Φm ) = σ and renamed the density matrix ω2 as
ω ∈ D(AB). The partial transpose of E is given for all η ∈ L(AB) as
E Γ (η) = Tr η Γ Φm σ Γ + Tr (I − Φm ) η Γ ω Γ
Partial transpose
Γ Γ (13.400)
I − ΦΓm η ω Γ .
is self-adjoint → = Tr ηΦm σ + Tr
We next use (13.392) to express ΦΓm in terms of the symmetric and antisymmetric projectors.
Hence,
1 1
E Γ (η) = Tr [η (ΠSym − ΠAsy )] σ Γ + Tr (m − 1)ΠSym + (m + 1)ΠAsy η ω Γ
m m (13.401)
1 1
= Tr [ηΠSym ] σ Γ + (m − 1) ω Γ + Tr [ηΠAsy ] (1 + m) ω Γ − σ Γ .
m m
Hence, E Γ is completely positive if and only if the matrices σ and ω satisfy (13.398). This
completes the proof.
From the lemma above it follows that the conversion distance can be expressed as
′ ′
A B PPT AB
1 Γ Γ Γ
T Φm −−→ ρ = min ∥ρ − σ∥1 : (1 − m)ω ⩽ σ ⩽ (1 + m)ω . (13.402)
ω,σ∈D(AB) 2
Now that we have obtained formulas for the two types of PPT-conversion distances in
this subsection, we can use them for the operational tasks of entanglement distillation and
entanglement cost.
For every ε ∈ (0, 1), the ε-single-shot distillable NPT-entanglement of a bipartite state
ρ ∈ D(AB) is defined as
n ′B′
o
PPT
Distillε ρAB = sup log m : T ρAB −−→ ΦA
m ⩽ ε . (13.403)
m∈N
Theorem 13.9.2. Let ρ ∈ D(AB) and ε ∈ (0, 1). Then, the ε-single-shot distillable
NPT-entanglement is given by
j ε AB AB k
Distillε ρAB = min log 2Dmin (ρ ∥η ) ,
(13.404)
∥η Γ ∥1 ⩽1
η∈Herm(AB)
where the definition of the hypothesis testing divergence above has been extended to
operators that are not necessarily density matrices.
Proof. Using the expression for the conversion distance given in (13.396) we get
n 1 o
Distillε (ρ) = sup
log m : Tr [Λρ] ⩾ 1 − ε, ΛΓ ∞ ⩽ , Λ ∈ Eff(AB)
m∈N m
1
n 1 o
m= −−−−→ = max log : Tr [Λρ] ⩾ 1 − ε, Λ ∈ Eff(AB)
∥ΛΓ ∥∞ ∥ΛΓ ∥∞
(13.405)
Now, from Exercise 2.3.22 we have that
ΛΓ Tr ΛΓ η
∞
= max
∥η∥1 ⩽1
η∈Herm(AB)
(13.406)
Tr Λη Γ
Γ is self-adjoint→ = max
∥η∥1 ⩽1
η∈Herm(AB)
Tr Λη Γ
minmax theorem→ = max min
∥η∥1 ⩽1 Tr[Λρ]⩾1−ε
η∈Herm(AB) Λ∈Eff(AB)
(13.407)
2−Dmin (ρ∥η )
ε Γ
= max
∥η∥1 ⩽1
η∈Herm(AB)
ε
= max 2−Dmin (ρ∥η) .
∥η Γ ∥1 ⩽1
η∈Herm(AB)
Substituting this into (13.405) we obtain (13.404). This completes the proof.
Exercise 13.9.7. Show that the ε-single shot distillable NPT-entanglement can be be com-
puted with an SDP program and find the dual problem of (13.404).
Exercise 13.9.8. Let ε ∈ (0, 1), ρ ∈ D(AB), and η ∈ Herm(AB). Show that for all
E ∈ CPTP(AB → A′ B ′ ) we have
ε ε
Dmin E(ρ) E(η) ⩽ Dmin (ρ∥η) . (13.408)
ε
That is, the DPI with Dmin still holds even if η is not positive semidefinite.
For states that have a certain symmetry, the optimization problem given in (13.404)
can be performed analytically. For example, consider the Werner state, ρAB
W , as defined
in (13.29). This state is invariant under the twirling map G ∈ CPTP(AB → AB) as defined
in (7.199). Let η ∈ Herm(AB) be an optimal matrix such that
Distillε ρAB ε
ρAB η AB .
W = Dmin W (13.409)
Due to the invariance property of ρAB
W we have
ε ε
Dmin (ρW ∥η) ⩾ Dmin G(ρW ) G(η)
ε
(13.410)
= Dmin ρW G(η) .
Moreover, since G ∈ PPT(AB → AB) and ∥η Γ ∥1 ⩽ 1 we get that also ζ := G(η) satisfies
ζ Γ 1 = G Γ ηΓ 1
(13.411)
DPI →→ ⩽ η Γ 1 ⩽ 1 .
ε ε
Therefore, since η was optimal, we must have Dmin (ρW ∥η) = Dmin (ρW ∥ζ), which means that
ζ is also optimal. To summarize, without loss of generality, we can restrict the optimization
in (13.404) to Hermitian matrices η ∈ Herm(AB) that satisfy both ∥η Γ ∥1 ⩽ 1 and G(η) = η.
This additional condition implies that η can be written as a linear combination of I AB and
F AB , or equivalently, η Γ can be expressed as
η Γ = aΦAB AB
m + bτm , (13.412)
AB :=
for some a, b ∈ R, and τm (I AB − ΦAB 2
m )/(m − 1).
Corollary 13.9.2. Let ρ ∈ D(AB) and ε ∈ (0, 1). Then, the ε-single-shot distillable
NPT-entanglement is bounded from above by
n o
Distillε ρAB ⩽ min ε
Dmin (ρ∥σ) + LN(σ) , (13.414)
σ∈D(AB)
Proof. By removing the floor function, and replacing Herm(AB) in the right-hand side
of (13.404) with the smaller set Pos(AB), we get
Distillε (ρ) ⩽ min ε
Dmin (ρ∥η)
∥η Γ ∥1 ⩽1
η∈Pos(AB)
n o
ε
η = tσ −−−−→ = min Dmin (ρ∥σ) − log t (13.415)
t∥σ Γ ∥1 ⩽1
σ∈D(AB), t⩾0
n o
Taking the largest ε Γ
possible t=1/∥σ Γ ∥1 → = min Dmin (ρ∥σ) + log ∥σ ∥1 .
σ∈D(AB)
Finally, observe that the second term on the right-hand side of the equation above is the
logarithmic negativity of σ AB as defined in (13.164). This completes the proof.
Proof. The proof follows directly from a combination of Corollary 13.9.2 and the quantum
Steins’ lemma given in (8.211). Explicitly, let ε ∈ (0, 1) and σ ∈ D(AB). Then, from
Corollary 13.9.2 we get
1 ε ⊗n
1 ε ⊗n ⊗n
1 ⊗n
lim sup Distill ρ ⩽ lim sup D ρ σ + LN σ
n→∞ n n→∞ n min n (13.417)
(8.211) + Additivity of LN→ = D(ρ∥σ) + LN(σ) .
Since the inequality above holds for all σ ∈ D(AB) we can take the minimum over all density
matrices so that
1 n o
lim sup Distillε ρ⊗n ⩽ min
D (ρ∥σ) + LN(σ) . (13.418)
n→∞ n σ∈D(AB)
Note that the inequality above is in fact stronger than (13.416) in the sense that it holds for
all ε ∈ (0, 1). This completes the proof.
Exercise 13.9.10. Show that the Rains bound is a measure of entanglement and in particular
does not increase under completely PPT preserving operations.
Exercise 13.9.11. Prove that the Rains’ bound for a pure bipartite state ψ ∈ Pure(AB)
is equal to the entropy of entanglement E(ψ AB ), which means that on pure bipartite states,
the Rains’ bound is equal to the distillable entanglement. Hint: Take a look at the proof of
Theorem 13.2.3.
where
Eκε ρAB = ′ min Eκε ρ′AB
(13.421)
ρ ∈Bε (ρ)
(this can also be verified directly from the expression in (13.419)). Therefore, it is sufficient
to prove the lemma for the case ε = 0. For ε = 0 the entanglement cost given in (13.419)
takes the form
Now, observe that every m ∈ N and ω ∈ PPT(AB) that satisfy (1 − m)ω Γ ⩽ ρΓ also satisfy
−(1 + m)ω Γ ⩽ ρΓ . Therefore, we get that
While it is true that the lower and upper bounds above may appear to be simpler than
computing the Eκε directly using an SDP program, it is important to note that this may not
always be the case. In fact, in many instances, computing the bounds may require solving
non-trivial optimization problems themselves, and as such may not necessarily be any easier
to compute than the original quantity Eκϵ . Therefore, while the bounds can be a useful tool
for gaining insight into the behavior of Eκϵ , they should not be relied upon exclusively as a
substitute for computing the quantity directly using an SDP program. Moreover, it is not
clear to the author how these bounds can be used in deriving computable bounds for the
asymptotic NPT-entanglement cost.
Exercise 13.9.12. Prove the theorem above, and in particular show that the limit
1 ϵ ⊗n
lim Eκ ρ (13.428)
n→∞ n
Therefore, Eq. (13.429) provides a computable upper bound on the NPT-entanglement cost.
entanglement theory is not reversible even under the broad set of non-entangling operations.
Completely PPT preserving operations (sometimes referred to as PPT operations) were
introduced in [185]. In the same work, among many other findings, the Rains’ bound (13.416)
on distillable NPT-entanglement was discovered. The monotonicity of negativity and log-
arithmic negativity under PPT operations was proven in [180]. The expression presented
in (13.404) for the one-shot distillable NPT-entanglement was initially discovered in [76] (see
also [188] for additional results on distillation beyond LOCC). Lastly, the NPT-entanglement
cost was first studied in [8] and developed further in [229].
Multipartite Entanglement
Thus far, our focus has been on entanglement that is shared solely between two parties.
However, entanglement is not restricted to bipartite systems and can exist among any number
of parties. In this section, we will examine the properties of multipartite entanglement,
comparing and contrasting it with bipartite entanglement. It’s important to note that the
theory of multipartite entanglement can be quite complex. Therefore, in this chapter, we
will restrict ourselves to pure multipartite states, and concentrate on simpler cases involving
three and four qubits in greater detail.
The relation above implies that ψ can be converted into ϕ through local measurements with
some non-zero probability, since each Mx can be considered as one Kraus element of a local
generalized measurement on system Ax .
The set of all states |ϕ⟩ in An that can be obtained from |ψ⟩ as in (14.1) is called the
SLOCC class of ψ. The SLOCC class of ψ comprises two types of states: those that can be
645
646 CHAPTER 14. MULTIPARTITE ENTANGLEMENT
written in the form (14.1) with invertible matrices M1 , . . . , Mn , and those in which at least
one of the matrices Mx is non-invertible. In the former case, ψ can be converted to ϕ and
vice versa using SLOCC, whereas in the latter, the resulting state |ϕ⟩ cannot be converted
back to ψ via SLOCC.
If |Ax | = 2 for some x ∈ [n] (i.e. the x-th subsystem is a qubit) and Mx is non-invertible,
then the resulting state |ϕ⟩ is a product state between the qubit system Ax and the remaining
n − 1 subsystems. To demonstrate this, assume x = 1 for simplicity, so that M1 is a 2 × 2
non-invertible matrix. Since M1 is a rank-one matrix, it can be written as M1 = |u⟩⟨v| where
|u⟩ is an unnormalized vector in A1 , and |v⟩ is a normalized vector in A1 . The state |ψ⟩ can
be expressed as
n
|ψ⟩A = a|v⟩A1 |ψ1 ⟩A2 ···An + b|v ⊥ ⟩A1 |ψ2 ⟩A2 ···An ; , (14.2)
where a, b ∈ C, |v ⊥ ⟩ ∈ A1 is an orthogonal vector to |v⟩, and ψ1 and ψ2 are some pure states
in A2 · · · An . Thus,
n
M1 ⊗ M2 ⊗ · · · ⊗ Mn |ψ⟩A = a|u⟩A1 ⊗ M2 ⊗ · · · ⊗ Mn |ψ1 ⟩A2 ···An ; ,
(14.3)
Exercise 14.1.1. Let ψ, ϕ ∈ Pure(An ) and suppose there exists matrices M1 , . . . , Mn such
that (14.1) holds. Let B := A2 · · · An , d := |A1 |, and suppose det(M1 ) = 0. Show that
SR ϕA1 B ⩽ d − 1 ,
(14.4)
1. Determine the n Schmidt ranks between each subsystem and the other n−1 subsystems.
2. Classify the n-partite entanglement based on a fixed set of n Schmidt ranks obtained
in the first step.
It is worth noting that for the second step mentioned above, we only need to consider
reversible SLOCC conversions where all matrices M1 , . . . , Mn in (14.1) are invertible. Hence,
states ψ, ϕ ∈ Pure(An ) belong to the same reversible SLOCC class if and only if there exists
a matrix
M ∈ GLn := GL(A1 ) × · · · × GL(An ) , (14.5)
such that |ϕ⟩ = M |ψ⟩. Here, for each x ∈ [n], the set GL(Ax ) represents the group of
invertible matrices in L(Ax ). It is also noteworthy that M takes the form of M1 ⊗ · · · ⊗ Mn ,
where Mx ∈ GL(Ax ) for each x ∈ [n]. Please note that our notation GLn does not explicitly
specify the system An = A1 · · · An . However, in the rest of this chapter we will assume that
the context makes it clear which system we are referring to.
With these observations, we can use certain tools from representation theory to charac-
terize the reversible SLOCC class of |ψ⟩; i.e., the set of states M |ψ⟩. To achieve this, we
begin by relaxing the normalization condition that M |ψ⟩ = 1 and allowing each Mx to
vary over any element of SL(Ax ). The group SL(Ax ) is a subgroup of GL(Ax ) with the
property that the determinant of its elements is one. The limitation to this group can only
affect the normalization of the states in M |ψ⟩. Therefore, we will consider the “orbit” of ψ
with respect to the group
By working with SL-orbits rather than GL-orbits, we can classify multipartite entanglement
using SL-invariant polynomials.
Remark. The condition that f (0) = 0 is a convention that we will adopt to eliminate trivial
SLIPs that are constant for all vectors in An . Furthermore, we will see shortly that this
convention implies that SLIPs vanish on product states.
The set of all SLIPs forms a vector space over C. Additionally, the following exercise
reveals that this vector space has a basis consisting of homogeneous SLIPs. Therefore, we
will concentrate on homogeneous SLIPs of some fixed degree k ∈ N. For instance, the SLIP
in (14.10) is homogeneous of degree 2 since it satisfies f (c|ψ⟩) = c2 f (|ψ⟩) for any c ∈ C. The
dimension of the space of all homogeneous SLIPs of a fixed degree k is finite, but, as we will
see, it grows exponentially with n.
Exercise 14.2.1. Show that the vector space of SLIPs has a basis consisting of homogeneous
SLIPs.
Exercise 14.2.2. Let AB be a bipartite system with d := |A| = |B|. Show that the function
The degrees of homogeneous SLIPs have a close relationship with the local dimensions
of the subsystems. To understand this connection, consider a multipartite system An =
A1 · · · An , where mx := |Ax | for each x ∈ [n]. Suppose fk : An → C is a homogeneous SLIP
of degree k ∈ N. Note that if c ∈ C satisfies cmx = 1 for some x ∈ [n], then the matrix
n
cI Ax has a determinant of one, implying that cI A ∈ SLn . Hence, we obtain the following
relationship: for every |ψ⟩ ∈ An
n
fk (|ψ⟩) = fk cI A |ψ⟩ = ck fk (|ψ⟩) ,
(14.12)
where the first equality is due to the SL-invariance property of fk , and the second equality is
due to the homogeneity of fk . Therefore, as long as fk is not the zero polynomial it follows
that ck = 1 for any complex number c that satisfies cmx = 1. This means that mx must
divide k. Since this holds for all x we can conclude that k is divisible by the least common
multiple r := lcm(m1 , . . . , mn ).
n
Exercise 14.2.3. Let |ψ⟩ ∈ An be a product state; i.e. |ψ⟩A = |ψ1 ⟩A1 ⊗ · · · ⊗ |ψn ⟩An . Show
that for any f ∈ SLIP(An ) we have f (|ψ⟩) = 0.
SLOCC
Remember that for ψ, ϕ ∈ Pure(An ), the relation ψ −−−−→ ϕ holds if there exists a
matrix Mx ∈ L(Ax ) for every x ∈ [n] such that both Mx∗ Mx ⩽ I Ax and |ϕ⟩ = M |ψ⟩ are
satisfied, where M := M1 ⊗ · · · ⊗ Mn . The subsequent lemma establishes that if any of the
matrices in the set {Mx }x∈[n] exhibits rank deficiency, then any SLIP will be nullified for |ϕ⟩.
Lemma 14.2.1. Let f : An → C be a SLIP and |ψ⟩ and |ϕ⟩ be as above. If there
exists x ∈ [n] such that det(Mx ) = 0 then f (|ϕ⟩) = 0.
Proof. Since every SLIP can be expressed as a linear combination of homogeneous SLIPs,
we will assume without loss of generality that f is a homogeneous SLIP of degree k ∈ N.
Denote by d := |An |, and for every ε ∈ [0, 1), define Mε := M1 (ε) ⊗ · · · ⊗ Mn (ε), where each
Mx (ε) is a slight perturbation of Mx ensuring that det(Mx (ε)) ̸= 0 for all ε ∈ (0, 1). As a
1/d
consequence, µε := det(Mε ) ̸= 0 for all ε ∈ (0, 1), which implies Nε := Mε /µε is an element
of SLn . By definition, we can express:
Upon taking the limit as ε → 0+ on both sides and noting that M = limε→0+ Mε and
limε→0+ µε = det(M ) = 0, we infer that f (M |ψ⟩) = 0, or equivalently, f (|ϕ⟩) = 0. This
concludes the proof.
where for convenience we replaced the second Pauli matrix σ2 that appear in (13.91) with
J2 := −iσ2 = |0⟩⟨1| − |1⟩⟨0|.
1. Use the relation (C.13) to show that for any M ∈ SLn , and any vectors |ψ⟩, |ϕ⟩ ∈ An
M |ψ⟩, M |ϕ⟩ n
= |ψ⟩, |ϕ⟩ n
. (14.15)
|ψ⟩, |ψ⟩ n
=0. (14.16)
The bilinear form defined above can be used to define SLIPs on systems of n qubits. For
an even number of qubits, it follows from the exercise above that
is an homogeneous SLIP of degree 4. To see that the above function is a SLIP, observe that
A2 ···An
(n) (n−1) n (n)
for any M = M1 ⊗ M ∈ SL , we have g4 M |ψ⟩ =g4 M 1⊗I |ψ⟩ since
a b
·, · n−1
is invariant under the action of M (n−1) . Now, let M1 = and observe that:
c d
g4 M1 ⊗ I A2 ···An |ψ⟩
a|ψ0 ⟩ + b|ψ1 ⟩, a|ψ0 ⟩ + b|ψ1 aψ0 ⟩ + b|ψ1 ⟩, c|ψ0 ⟩ + d|ψ1 ⟩
= det
c|ψ0 ⟩ + d|ψ1 ⟩, a|ψ0 ⟩ + b|ψ1 ⟩ c|ψ0 ⟩ + d|ψ1 ⟩, c|ψ0 ⟩ + d|ψ1 ⟩
2
= (a2 µ00 + b2 µ11 + 2abµ01 )(c2 µ00 + d2 µ11 + 2cdµ10 ) − acµ00 + bdµ11 + (ad + cb)µ01
= (ad − bc)2 µ00 µ11 − µ201
(14.21)
where the last line follows from direct algebraic simplification of all the terms involved. Since
M1 ∈ SL(2, C) we have ad − bc = 1 so that
To get other SLIPs, let An be a system of n qubits, and for any choice of m < n of its
qubits, we associate a bipartite cut denoted as Am ⊗ Bn−m , where Am is a system of m qubits
of An , and Bn−m is the system comprising of the remaining n − m qubits of An . With respect
to this bipartite cut, any vector |ψ⟩ ∈ An can be expressed as
n
X X
|ψ A ⟩ = λxy |ux ⟩Am |vy ⟩Bn−m
x∈[2m ] y∈[2n−m ] (14.23)
= Λ ⊗ I B̃n−m ΩBn−m B̃n−m
where {|ux ⟩Am } and {|vy ⟩Bn−m } are orthonormal bases of Am and Bn−m , respectively, and
each coefficient λxy ∈ C. Therefore, and vector |ψ⟩ ∈ An , and every bipartite cut Am ⊗ Bn−m
of An defines a matrix Λ := (λxy ) with x ∈ [2m ] and y ∈ [2n−m ].
Proof. Let M ∈ SLn . We need to show that f (M |ψ⟩) = f (|ψ⟩). Let N ∈ SLm and
L ∈ SLn−m be such that M = N ⊗ L. Then,
n
M |ψ A ⟩ = N Λ ⊗ L ΩBn−m B̃n−m
(14.25)
= N ΛLT ⊗ I B̃n−m ΩBn−m B̃n−m
We therefore get that
ℓ
An ⊗(n−m)
J2⊗m N ΛLT J2 LΛT N T
fℓ M |ψ ⟩ = Tr
ℓ (14.26)
⊗(n−m)
Cyclic permutation→ = Tr N T
J2⊗m N ΛLT J2 LΛT ,
where we have used the invariance of the trace under cyclic permutation (note that the
power over ℓ does not effect this property). To complete the proof we now argue that
⊗(n−m) ⊗(n−m)
N T J2⊗m N = J2⊗m and LT J2 L = J2 so that the right-hand side above equals
A n
f |ψ ⟩ . Indeed, observe that N = N1 ⊗ · · · ⊗ Nm , where Nx ∈ SL(2, C) so that
N T J2⊗m N = N1T J2 N1 ⊗ · · · ⊗ Nm J2 Nm
(14.27)
(C.13)→ = J2 ⊗ · · · ⊗ J2 = J2⊗m .
⊗(n−m) ⊗(n−m)
In the same way, one can prove that LT J2 L = J2 . This completes the proof.
Examples:
1. The case n = 2. In this
P case the only non-trivial m is m = 1. In this case for any
two-qubit state |ψ⟩ = x,y∈{0,1} λxy |xy⟩ we get
h i
T ℓ
fℓ (|ψ⟩) = Tr J2 ΛJ2 Λ
h i
(C.13)→ = Tr (J2 det(Λ)J2 )ℓ (14.28)
ℓ
J22 = I −−−−→ = 2 det(Λ)
The term
T T
Λ1 Λ1 J2 Λ1 Λ1 J2 Λ2
ΛJ2 ΛT = J2 [ΛT1 ΛT2 ] = . (14.32)
T T
Λ2 Λ2 J2 Λ1 Λ2 J2 Λ2
0 J2
Hence, combining this with J2⊗2 = gives
−J2
J2 Λ2 J2 ΛT1 J2 Λ2 J2 ΛT2
J2⊗2 ΛJ2 ΛT = . (14.33)
T T
−J2 Λ1 J2 Λ1 −J2 Λ1 J2 Λ2
Since the trace of the matrix above is zero the case ℓ = 1 is trivial. For the case ℓ = 2
we have h 2 i
Tr J2⊗2 ΛJ2 ΛT = Tr J2 Λ2 J2 ΛT1 J2 Λ2 J2 ΛT1
− Tr J2 Λ2 J2 ΛT2 J2 Λ1 J2 ΛT1
(14.34)
+ Tr J2 Λ1 J2 ΛT2 J2 Λ1 J2 ΛT2
− Tr J2 Λ1 J2 ΛT1 J2 Λ2 J2 ΛT2 .
The above expression is an homogeneous SLIP of degree 4. Since for three qubits there
are no homogeneous SLIPs of degree two, any other SLIP must be proportional to
some power of fℓ=2 . Hence, for three qubits, the above SLIP is essentially the only
one.
3. The case n = 4. Let A2 ⊗ B2 be a bipartite cut with exactly two qubits on each side
and let |ψ⟩ ∈ A4 be given as |ψ⟩ = Λ ⊗ I B2 |ΩA2 B2 ⟩, where Λ is a 4 × 4 matrix. Then,
the function h ℓ i
fℓ (|ψ⟩) = Tr J2⊗2 ΛJ2⊗2 ΛT , (14.36)
is an homogeneous SLIP of degree 2ℓ. Specifically, consider the four-qubit state
|ψ⟩ = λ1 |Ψ+ ⟩|Ψ+ ⟩ + λ2 |Ψ− ⟩|Ψ− ⟩ + λ3 |Φ+ ⟩|Φ+ ⟩ + λ4 |Φ− ⟩|Φ− ⟩ , (14.37)
where λx ∈ C for all x ∈ [4] and the two-qubit states |Ψ± ⟩ and |Φ± ⟩ form the Bell
basis of C2 ⊗ C2 . For this states it follows that
fℓ (|ψ⟩) = λ2ℓ 2ℓ 2ℓ 2ℓ
1 + λ2 + λ3 + λ4 . (14.38)
fχ (ψ) = χ ψ ⊗k ∀ ψ ∈ An , (14.39)
where the coefficient vector |χ⟩ ∈ (An )⊗k . However, this relation is not one-to-one; fχ is
equal to fχ′ if the coefficient vectors |χ⟩ and |χ′ ⟩ are related by a permutation matrix. This
permutation is with respect to the k copies of An . As a result, there exists an isomorphism
between the space of homogeneous polynomials of degree k and the subspace Symk (An ) of
(An )⊗k (see definition in (C.161)).
The polynomial fχ above is SLIP if and only if for any M ∈ SLn for all |ψ⟩ ∈ An we
have fχ (M |ψ⟩) = fχ (|ψ⟩), which is equivalent to
χ|M ⊗k ψ ⊗k = χ ψ ⊗k . (14.40)
Since the above equation has to hold for all |ψ⟩, and since if M ∈ SLn then M ∗ ∈ SLn we
conclude that fχ is SLIP if and only if (see Exercise 14.2.6)
n
where we used the same notations as in (C.175). Our task is therefore to characterize V SL .
For this purpose, observe that the vector space V is isomorphic to
V = (An )⊗k ∼
= A⊗k ⊗k
1 ⊗ · · · ⊗ An . (14.43)
Let P : V → A⊗k ⊗k
1 ⊗ · · · ⊗ An be this isomorphism (permutation) map. Under this isomor-
⊗k
phism any matrix M , with M := M1 ⊗ · · · ⊗ Mn ∈ SLn , goes to
⊗k
P M ⊗k P −1 = P M1 ⊗ · · · ⊗ Mn P −1 = M1⊗k ⊗ · · · ⊗ Mn⊗k . (14.44)
Therefore, for any |χ⟩ ∈ V that satisfies (14.41) the vector |ϕ⟩ := P |χ⟩ satisfies
Exercise 14.2.6. Let |χ⟩ ∈ (An )⊗k . Show that fχ as defined in (14.39) is an homogeneous
SLIP of degree k if and only if (14.41) holds.
SL(A1 ) SL(An )
|ϕ⟩ ∈ W := A⊗k
1 ⊗ · · · ⊗ A⊗k
n , (14.46)
Proof. Clearly, by definition, if |ϕ⟩ ∈ W then |ϕ⟩ satisfies (14.45). Conversely, suppose that
SL(A1 )
|ϕ⟩ satisfies (14.45), and let {|vx ⟩} be an orthonormal basis of A⊗k 1 , and {|uy ⟩} be an
SL(A )
orthonormal basis of the orthogonal complement of A⊗k in A⊗k
1
1 1 . Finally, let {|φz ⟩}
be an orthonormal basis of A2 ⊗· · ·⊗An . With these notations, since |ϕ⟩ ∈ A⊗k
⊗k ⊗k
1 ⊗· · ·⊗An
⊗k
it can be expressed as
X X
|ϕ⟩ = λxz |vx ⟩|φz ⟩ + µyz |uy ⟩|φz ⟩ , (14.47)
x,z y,z
where λxz , µyz ∈ C. Using this expression in (14.45), and taking a special case in which
M2 = I A2 ,. . . ,Mn = I An , gives
X X X X
λxz M1⊗k |vx ⟩|φz ⟩ + µyz M1⊗k |uy ⟩|φz ⟩ = λxz |vx ⟩|φz ⟩ + µyz |uy ⟩|φz ⟩ . (14.48)
x,z y,z x,z y,z
Since M1 ∈ SL(A1 ) we have for all x, M1⊗k |vx ⟩ = |vx ⟩ (by definition of |vx ⟩). Hence, the
above equation can be simplified to
X X
µyz M1⊗k |uy ⟩|φz ⟩ = µyz |uy ⟩|φz ⟩ . (14.49)
y,z y,z
Since the vectors {|φz ⟩} are orthonormal, it follows that for all z and all M1 ∈ SL(A1 ) we
have X X
µyz M1⊗k |uy ⟩ = µyz |uy ⟩ . (14.50)
y y
P
The above equation implies that for each z the vector y µyz |uy ⟩ belongs to the subspace
SL(A1 )
A⊗k
1 . However, by definition, the vectors {|uy ⟩} belong to the orthogonal complement
⊗k SL(A1 )
of A1 . Therefore, the coefficients {µyz } must be zero, so that
X
|ϕ⟩ = λxz |vx ⟩|φz ⟩ . (14.51)
x,z
Denoting by C := A⊗k ⊗k C
2 ⊗ · · · ⊗ An , the above equation can be expressed as Π1 ⊗ I |ϕ⟩ = |ϕ⟩,
where X
Π1 := |vx ⟩⟨vx | (14.52)
x
SL(A1 )
is the orthogonal projection to the subspace A⊗k 1 . Denoting by the Πx the orthogonal
⊗k SL(Ax )
projection to the subspace Ax , and repeating the same argument for any x ∈ [n]
we conclude that
Π1 ⊗ · · · ⊗ Πn |ϕ⟩ = |ϕ⟩ . (14.53)
That is, |ϕ⟩ ∈ W . This completes the proof.
n SL(Ax )
The lemma above shows that characterizing V SL can be done by characterizing A⊗k x
or the orthogonal projection Πx . Therefore, the problem of characterizing all SLIPs of degree
SL(A)
k can be reduced to characterizing A⊗k , which is a classic representation theory prob-
lem that uses the Schur-Weyl duality. This duality connects the irreducible representations
(irreps) of SL(A) to the symmetric group on k elements, with a natural action. Further
information can be found in the ‘Notes and References’ section at the end of this chapter.
Proof. From Exercise 14.2.3, we deduce that E vanishes on product states. Consider an
n
arbitrary m ∈ N and ψ ∈ Pure(An ). If there exists an LOCC protocol that transforms ψ A
n An
to ϕA n
x ∈ Pure(A ) with a probability px , where x ∈ [m], then each ϕx can be represented
as:
n 1 n
ϕA
x = √ Mx ψ A , (14.55)
px
where each matrix Mx is a tensor product of the form MP x = Λx1 ⊗ · · · ⊗ Λxn , and for every
n
y ∈ [n], Λxy ∈ L(Ay ). Additionally, we have the relation x∈[m] Mx∗ Mx = I A .
Leveraging Lemma 14.2.1, we observe that f (Mx |ψ⟩) = 0 when det(Mx ) = 0. We can
then categorize the set {Mx }x∈[m] into two subsets: the matrices that are rank deficient and
those that possess full rank. Without loss of generality, let’s assume the first r ∈ [m] matrices
{Mx }x∈[r] are all of full rank, while the subsequent matrices, for all x = r + 1, . . . , m, satisfy
the condition det(Mx ) = 0.
Thus, for each x ∈ [r], aside from a scalar coefficient, Mx can be interpreted as a member
of SLn . More precisely, we can express Mx as Mx = µx Nx , where µx := (det(Mx ))1/d ,
d := |An |, and the normalized matrix Nx := µ1x Mx belongs to SLn . With these notations,
we can proceed as follows:
2/k X 2/k
X
An
X 1 An µx An
px E ϕx = px fk √ Mx ψ = px fk √ Nx ψ
px px
x∈[m] x∈[r] x∈[r]
n 2/k
X
fk is homogenous of degree k → = |µx |2 fk Nx ψ A
x∈[r]
(14.56)
An 2/k
X
n 2
SL invariance→ = |µx | fk ψ
x∈[r]
n X
= E ψA |µx |2 .
x∈[r]
where we removed the absolute value since Mx∗ Mx ⩾ 0. From the geometric-arithmetic
1/d
∗
inequality we have that det (Mx Mx ) ⩽ d1 Tr [Mx∗ Mx ]. Hence, substituting this into the
equation above gives
X X 1 1
|µx |2 ⩽ Tr [Mx∗ Mx ] = Tr I An = 1 .
d d (14.58)
x∈[r] x∈[m]
n n
where the minimum is over all pure state decompositions of ρA = x∈[m] px ψ A . The above
P
theorem implies that E is an entanglement monotone on mixed states.
Hint: Follow the exact same lines as in the proof of Theorem 13.2.2 by with ·, · n
replacing
the bilinear form given in (13.91).
The reason for this terminology, is that any state |ψ⟩ ∈ Crit(An ) is a critical point of the
function f : SLn |ψ⟩ → R+ defined by f (|ϕ⟩) := ∥|ϕ⟩∥. In fact, we have something that is a
bit stronger.
Proof. We start by proving that 1 ⇒ 2. Suppose |ψ⟩ ∈ Crit(An ) and observe that for any
M ∈ SLn also M ∗ M ∈ SLn . We can therefore write M ∗ M = eX for some X ∈ Lie (SLn ).
Hence,
∥M |ψ⟩∥2 = ⟨ψ|eX |ψ⟩
eX ⩾ I + X −−−−→ ⩾ ⟨ψ| (I + X) |ψ⟩ (14.64)
2
|ψ⟩ is critical→ = ⟨ψ|ψ⟩ = ∥|ψ⟩∥ .
To prove that 2 ⇒ 1, suppose that for any M ∈ SLn we have M |ψ⟩ ⩾ |ψ⟩ . Then, for
any X ∈ Lie (SLn ) and t ∈ R we have
1 2
f (t) := e 2 tX |ψ⟩ = ⟨ψ|etX |ψ⟩ ⩾ ⟨ψ|ψ⟩ = f (0) . (14.65)
Hence, t = 0 must be a critical point of the function f (t) so that f ′ (0) = ⟨ψ|X|ψ⟩ = 0. Since
this holds for any X ∈ Lie (SLn ) we conclude that |ψ⟩ ∈ Crit(An ).
We next prove the equivalence of 1 and 3. Recall that any X ∈ Lie (SLn ) can be written
as a linear combination of matrices, that up to a permutation of the subsystems of An , have
the form X1 ⊗ I A2 ⊗ · · · ⊗ I An . Now, if X = X1 ⊗ I A2 ⊗ · · · ⊗ I An then the condition
⟨ψ|X|ψ⟩ = 0 is equivalent to
Tr ρA1 X1 = 0,
(14.66)
n
where ρA1 := TrA2 ···An ψ A . Since the above condition has to hold for all X1 ∈ Lie SL(A1 ) ,
we conclude that ρA1 is proportional to the identity matrix. In other words, |ψ⟩ ∈ Crit(An )
n
if and only if for any x ∈ [n], the reduced density matrix of ψ A on the xth-subsystem is
proportional to the identity matrix I Ax . This completes the proof.
Exercise 14.3.1. Let |ψ⟩ ∈ Crit(An ) and let M ∈ SLn be such that M |ψ⟩ = |ψ⟩ .
1. Show that there exists a local unitary matrix; i.e., U ∈ SU (d1 ) × · · · × SU (dn ) such
that M |ψ⟩ = U |ψ⟩.
For two qubit states the null cone consists only of product states. This can be easily
verified by noting that the SLIP given by f (|ψ⟩) := |ψ⟩, |ψ⟩ 2 is zero if and only if |ψ⟩ ∈
C2 ⊗ C2 is a product state. For higher number of qubits the null cone is not trivial. As an
example, consider the three-qubit state, known as the W-state,
1
|W ⟩ := √ |100⟩ + |010⟩ + |001⟩ . (14.68)
3
⊗3
t 0
This state has the property that for any 0 ̸= t ∈ C the matrix Mt := satisfies
0 t−1
Mt |W ⟩ = t|W ⟩ . (14.69)
Since t ̸= 0 this means fk (|W ⟩) = 0. Since fk is an arbitrary homogenous SLIP, this implies
that for any f ∈ SLIP(A3 ) (where A is a qubit; i.e. |A| = 2) we have f (|W ⟩) = 0. Therefore,
the W-state belong to the null cone of three qubits.
From (14.69) of the example above it follows that
That is, the orbit SL3 |W ⟩ contains a sequence of vectors approaching the zero vector. This
is precisely the key property of states in the null cone.
2. There exists a sequence of vectors {|ψk ⟩}k∈N ⊂ SLn |ψ⟩ such that
The direction that 2 ⇒ 1 is relatively simple to show. Indeed, suppose there is a sequence
of vectors |ψk ⟩k∈N ⊂ SLn |ψ⟩ that approaches the zero vector in the limit k → ∞. Let
f ∈ SLIP(An ). Then, for any k ∈ N we have f (|ψk ⟩) = f (|ψ⟩). Since this holds for any
integer k it must hold also for the limit k → ∞. Combining this with the continuity of
polynomial functions we get
f (|ψ⟩) = lim f (|ψk ⟩) = f (0) = 0 . (14.73)
k→∞
As f was an arbitrary SLIP we conclude that |ψ⟩ ∈ Null(An ). The other direction can be
found in Theorem 43 of [227].
Exercise 14.3.2. Let λ1 , . . . , λn ∈ C, and let
|ψ⟩ := λ1 |10 . . . 0⟩ + λ2 |01 . . . 0⟩ + · · · + λn |00 . . . 1⟩ ∈ An . (14.74)
Show that |ψ⟩ ∈ Null(An ).
Note that the orbit SLn |ψ⟩ is closed if for any sequence of states {|ϕk ⟩}n∈N ⊂ SLn |ψ⟩
with a limit limk→∞ |ϕk ⟩ = |ϕ⟩ we have that the limit |ϕ⟩ is also in SLn |ψ⟩. Therefore, states
in the null cone are not stable since if |ψ⟩ ∈ Null(An ) is a non-zero vector then SLn |ψ⟩ does
not contain the zero vector. Still, SLn |ψ⟩ contains a sequence of vectors with zero limit since
|ψ⟩ is in the null cone. Hence, the null cone and the set of stable states forms two disjoint
set of states in An . The following theorem shows that any state in An can be written as a
linear combination of these two set of states.
The above result follows from a variant of the Hilbert-Mumford theorem given in Theo-
rem 45 of [227]. The theorem above states that the vector space An can be decomposed into
the direct sum
An = Stable(An ) ⊕ Null(An ) . (14.76)
In addition, it can be shown that almost all vectors in An are stable in the sense that the
closure of Stable(An ) is the whole space; i.e.
An = Stable(An ) . (14.77)
Proof. Suppose |ψ⟩ is stable so that SLn |ψ⟩ is closed. Then, there exists a state |ϕ⟩ ∈ SLn |ψ⟩
with minimal norm; that is, for any M ∈ SLn
But since |ϕ⟩ = N |ψ⟩ for some N ∈ SLn we can express the above equation as
Since any M ′ ∈ SLn can be expressed as M ′ = M N −1 for some M ∈ SLn we conclude that
M ′ |ϕ⟩ ⩾ |ϕ⟩ for all M ′ ∈ SLn . From Theorem 14.3.1 it then follows that |ϕ⟩ is a critical
state. That is, the orbit SLn |ψ⟩ contains a critical state. The proof of the converse part can
be found in Theorem 47 of [227].
n
Therefore, if h ∈ SLIPk (An ) is another homogenous SLIP of degree k such that h ψ A ̸= 0
then the above equation is equivalent to
n n
f ϕA f ψA
= . (14.82)
h ϕ An h ψ An
n n
That is, if ψ A and ϕA belong to the same reversible SLOCC then the above equation must
hold. The following theorem demonstrates that the converse is also true for almost all states
in An .
Theorem 14.3.5. Let |ψ⟩, |ϕ⟩ ∈ Stable(An ). Then, there exists θ ∈ [0, 2π) and
M ∈ SLn such that (14.80) holds
if and only if (14.82) holds for all k ∈ N and all
n An
f, h ∈ SLIPk (A ) with h ψ ̸= 0.
Remark. Note that in the theorem above, k is unbounded. However, since it is known that
the space of SLIPs has a finite dimension, it is possible to restrict k, although the best upper
bound is unknown.
Proof. We showed above that (14.80) implies (14.82). It is therefore left to show the converse.
n n n
If there exists h ∈ SLIPk (An ) such that h ψ A ̸= 0 but h ϕA = 0 then clearly ψ A
n
and ϕA are not in the same invertible SLOCC class. We therefore assume without loss of
n An
generality that there exists k ∈ N and h ∈ SLIP k (A ) such that both h ψ ̸= 0 and
An
h ϕ ̸= 0, and denote by
n
h ϕA
λ := ̸= 0 . (14.83)
h ψ An
With this notation, our assumption in (14.82) implies that for all f ∈ SLIPk (An ),
n n n
f ϕA = λf ψ A = f λ1/k ψ A . (14.84)
Our first goal is to show that up to some phase factors, f above can be replaced with any
SLIP (even not homogeneous). For this purpose, consider the subgroup Gn,k ⊂ GL(An )
defined by
Gn,k := µM : µk = 1 , M ∈ SLn , µ ∈ C ,
(14.85)
and observe that in addition for being SLn -invariant polynomial (i.e., SLIP), h is also Gn,k -
invariant polynomial. Moreover, the degree of any homogeneous Gn,k -invariant polynomial
must be divisible by k. To see this, let g be a Gn,k -invariant polynomial of degree m. Then,
since for any µ ∈ C such that µk = 1 we have µI ∈ Gn,k , it follows that g(|ψ⟩) = g(µI|ψ⟩) =
µm g(|ψ⟩), so that µm = 1. Since m satisfies this property for any such µ (i.e. any k-th root
of unity), we conclude that m = kr for some r ∈ N.
Now, fix k, and let g be a homogenous Gn,k -invariant polynomial of degree kr for some
r ∈ N. Since g is a SLIP, from the assumption of the theorem
n n
g ϕA g ψA
= . (14.86)
hr ϕAn hr ψ An
where in the last equality we used the fact that g is homogeneous of degree kr. Since the above
equation holds for any homogenous SLn -invariant polynomial g (recall that r was arbitrary),
it must also hold for all (possibly non-homogeneous) SLn -invariant polynomials. Hence,
using a result from invariant theory that closed orbits of a reductive algebraic subgroup of
GL(An ) are separated by their invariant polynomials, we conclude that there exists µ ∈ C
n n n n
with µk = 1 and M ∈ SLn such that ϕA = λ1/k µM ψ A . The upshot is ϕA = cM ψ A
n n
for some c ∈ C, and the normalization ϕA = 1 gives c = eiθ / M ψ A . This completes
the proof.
The theorem above demonstrates that SLIPs can be used to classify multipartite entan-
glement. We give two examples of such classifications in three and four qubits systems.
In the previous chapters, we learned that pure bipartite states can always be represented in
their Schmidt form. Specifically, for a two-qubit system AB, any state ψ ∈ Pure(AB) can
be expressed, up to local unitaries, as
√ p
|ψ AB ⟩ = p|00⟩ + 1 − p|11⟩, (14.88)
where p ∈ [0, 1]. We refer to this representation as the canonical form of the state ψ AB .
Now, our goal is to find a canonical form for any three-qubit state in ABC where |A| =
|B| = |C| = 2. To achieve this, we will utilize the following property presented in the
following exercise.
Exercise 14.4.1. Let AB be a two-qubit system and let |ψ0 ⟩, |ψ1 ⟩ ∈ AB be two pure bipartite
vectors. Show that if the vectors |ψ0AB ⟩ and |ψ1AB ⟩ are linearly independent then there exists
numbers a, b ∈ C such that a|ψ0AB ⟩ + b|ψ1AB ⟩ is a product (i.e. non-entangled) state. Hint:
Denote by c := ab and view the determinant of the reduced density matrix of the (non-
normalized) state c|ψ0AB ⟩+|ψ1AB ⟩ as a quadratic polynomial in c. Recall that over the complex
field, all quadratic polynomials have roots.
Remark. The normalization of ψ ABC implies that λ20 + · · · + λ24 = 1. In the proof below, the
fact that we can restrict θ to the domain [0, π] will be left as an exercise.
Proof. Every three-qubit state |ψ⟩ ∈ ABC can be expressed as
ψ ABC = |0⟩A |ψ0BC ⟩ + |1⟩A |ψ1BC ⟩ , (14.90)
where |ψ0 ⟩, |ψ1 ⟩ ∈ BC are two orthogonal (possibly unnormalized) vectors. From the exercise
above it follows that there exists two complex numbers a, b ∈ C such that a|ψ0BC ⟩ + b|ψ1BC ⟩
2 2
is a product state. Note that loss of generality we can assume that |a| + |b| = 1.
without
a b
Therefore, the matrix U = is a unitary matrix, so that by applying U to the first
−b̄ ā
qubit of |ψ ABC ⟩ we get
U A ⊗ I BC ψ ABC = a|0⟩A − b̄|1⟩A ψ0BC + b|0⟩A + ā|1⟩A |ψ1BC ⟩
(14.91)
= |0⟩A a ψ0BC + b ψ1BC + |1⟩A ā ψ1BC − b̄ ψ0BC .
Since a ψ0BC +b ψ1BC is a (possibly unnormalized) product state, there exists a local unitary
on BC that transform it to the state λ0 |00⟩BC where λ0 ∈ C is some normalization factor.
We therefore conclude that, up to local unitaries, the state |ψ ABC ⟩ can be expressed as
|ψ ABC = λ0 |000⟩ + |1⟩|ϕBC ⟩ (14.92)
where |ϕBC ⟩ is some vector in BC. Let λ1 , . . . , λ4 ∈ C be such that
|ϕBC ⟩ = λ1 |00⟩ + λ2 |01⟩ + λ3 |10⟩ + λ4 |11⟩ . (14.93)
Note that by applying to the state above, the local unitary
iθ1 iθ3
e 0 e 0
U BC := ⊗ , (14.94)
iθ2 iθ4
0 e 0 e
we get
U BC |ϕBC ⟩ = λ1 ei(θ1 +θ3 ) |00⟩ + λ2 ei(θ1 +θ4 ) |01⟩ + λ3 ei(θ2 +θ3 ) |10⟩ + λ4 ei(θ2 +θ4 ) |11⟩ . (14.95)
Therefore, by choosing appropriately the four phases θ1 , θ2 , θ3 , θ4 we can make three of the λs
non-negative real numbers. We choose them to be λ2 , λ3 , λ4 ∈ R+ . Moreover, observe that
by applying eiθ |0⟩⟨0| + |1⟩⟨1| to system A in (14.92) we can add a phase to λ0 . Therefore,
we can assume without loss of generality that λ0 is a real non-negative number.
Exercise 14.4.2. Complete the proof above by showing that θ in (14.89) can be restricted to
[0, π].
Exercise 14.4.3. Let ψ ABC be the three-qubit state given in (14.89). Show that its three
local marginals (i.e. reduced density matrices) are given by
2 −iθ 2 2 2 iθ
λ0 λ0 λ1 e λ + λ1 + λ2 λ1 λ3 e + λ2 λ4
ψA = , ψB = 0 (14.96)
λ0 λ1 e−iθ 1 − λ20 λ1 λ3 e−iθ + λ2 λ4 λ23 + λ24
and
+ +λ20 λ21 λ23
λ1 λ2 e + λ3 λ4 iθ
ψC = . (14.97)
λ1 λ2 e−iθ + λ3 λ4 λ22 + λ24
From Theorem 14.4.1 and the exercise above we get that up to local unitaries, there is
only one normalized critical state given by the GHZ state
1
|GHZ⟩ := √ |000⟩ + |111⟩ . (14.98)
2
Proof. From the properties of critical states (see Theorem 14.3.1) we know that if |ψ⟩ ∈
Crit(ABC) is normalized then all three local marginals ψ A , ψ B , and ψ C must be maximally
mixed. Now, from Theorem 14.4.1 we know that up to local unitaries the state ψ ABC can
be expressed as in (14.89). Hence, using this form, we get from Exercise 14.4.3 that the
condition ψ A = 21 I A holds if and only if λ20 = 21 and λ1 = 0. The condition ψ B = 12 I B gives
in particular λ20 + λ21 + λ22 = 21 . Therefore, also λ2 = 0. Finally, the condition ψ C = 12 I C gives
λ3 = 0 and λ24 = 12 . Hence, the state ψ ABC as given in (14.89) is critical if and only in it is
the GHZ state. This concludes the proof.
Recall the homogeneous SLIP of degree four as defined in (14.19) for odd number of
qubits. For three qubit system ABC (with |A| = |B| = |C| = 2), its absolute value is called
the 3-tangle, and it is given for any vector
by
|ψ0 ⟩, |ψ0 ⟩ |ψ0 ⟩, |ψ1 ⟩
Tangle ψ ABC
:= det (14.101)
|ψ1 ⟩, |ψ0 ⟩ |ψ1 ⟩, |ψ1 ⟩
where |ψx ⟩, |ψy ⟩ := ⟨ψ̄xBC |J2 ⊗J2 |ψyBC ⟩ for each x, y ∈ {0, 1}; recall that J2 := |0⟩⟨1|−|1⟩⟨0|.
From the corollary discussed earlier, it follows that all stable vectors in ABC are, up to
normalization, contained in the G3 orbit of the GHZ state |GHZ⟩. In other words, almost
all three-qubit normalized states are in the SLOCC class of the GHZ state. This, in turn,
implies that almost all three-qubit states have a non-zero 3-tangle, which is consistent with
the formula for the 3-tangle given below.
Exercise 14.4.4. Show that the 3-tangle of the state ψ ABC in (14.89) is given by
Tangle ψ ABC = λ0 λ4 .
(14.102)
The formula presented in the exercise above shows that the 3-tangle is zero when λ0 = 0,
which makes sense because in this case, the state ψ ABC is a product state between system
A and system BC. This implies that the state is in the null cone, i.e., it has no genuine
tripartite entanglement. On the other hand, if λ4 = 0, then the 3-tangle is also zero. In this
case, the state ψ ABC can be expressed as
ψ ABC = λ0 |000⟩ + λ1 eiθ |100⟩ + λ2 |101⟩ + λ3 |110⟩ . (14.103)
If we apply the flip operator |10| + |0⟩⟨1| to the first qubit of the state ψ ABC , the resulting
state takes the form:
ψ ABC = λ1 eiθ |000⟩ + λ0 |100⟩ + λ2 |001⟩ + λ3 |010⟩ . (14.104)
⊗3
Moreover, observe that by applying the local unitary matrix e−iθ/3 |0⟩⟨0| + e2iθ/3 |1⟩⟨1| we
can eliminate the phase attached to the |000⟩ term. Therefore, after renaming the coefficients
λ0 , . . . , λ3 we conclude that unless the state ψ ABC is a product state between A and BC, its
3-tangle is zero if and only if, up to local unitaries, it can be expressed as
|ψ ABC ⟩ = λ0 |000⟩ + λ1 |100⟩ + λ2 |010⟩ + λ3 |001⟩ , (14.105)
with λ0 , . . . , λ3 ∈ R+ .
Exercise 14.4.5. Show that for any three-qubit pure state ψ ABC of the form (14.105), there
exists three matrices M, N, L ∈ GL(2, C) such that
ψ ABC = M ⊗ N ⊗ L W (14.106)
where |W ⟩ is the W-state as defined in (14.68).
The preceding discussion and exercise demonstrate that the SLOCC class of the W -state
consists of all states whose 3-tangle vanishes. Furthermore, since the W -state lies in the
null cone (as shown in the discussion below equation (14.68)), we can conclude that the null
cone precisely consists of the SLOCC class of the W -state. This, in turn, implies that a
three-qubit vector lies in the null cone if and only if its 3-tangle vanishes. This also implies
that any other SLIP must be proportional to a power of the 3-tangle. Therefore, the 3-tangle
is essentially the absolute value of the only SLIP in three qubits.
In summary, we can divide the space of three qubits into six invertible SLOCC classes:
• The “genuine” tripartite entanglement classes: the GHZ class and the W-class.
• Three bipartite entanglement classes: the three SL3 -orbits generated by |0⟩A |ΦBC ⟩,
|0⟩B |ΦAC ⟩, and |ΦAB ⟩|0⟩C .
SU (2) ⊗ SU (2) ∼
= SO(4) . (14.107)
T U1 ⊗ U2 T ∗ ∈ SO(4) .
(14.109)
Hint: Show that T T T = J ⊗ J, where J = |0⟩⟨1| − |1⟩⟨0| is the matrix that satisfies (C.13).
We can use the above isomorphism to get the canonical form of a four-qubit state. Let
ψ ABCD ∈ ABCD be a four qubit state, and let M : AB → AB be the 4 × 4 complex
matrix defined via
ψ ABCD = M ⊗ I CD Ω(AB)(CD) (14.110)
where X
Ω(AB)(CD) = |xy⟩AB |xy⟩CD . (14.111)
x,y∈{0,1}
In other words, we view four-qubit states as 4 × 4 complex matrices. Consider now a state
|ϕ^{ABCD}⟩ = (U_1 ⊗ U_2 ⊗ U_3 ⊗ U_4)|ψ^{ABCD}⟩  (14.112)
with each U_x ∈ SU(2). That is, |ψ⟩ and |ϕ⟩ are related by local unitaries. Let N be the
4 × 4 matrix representing ϕ^{ABCD} similarly to (14.110). Then, from the second part of
Exercise 2.3.26 we get that M and N are related by
N = (U_1 ⊗ U_2) M (U_3 ⊗ U_4)^T
  = T^* O_1 T M T^* O_2 T ,  (14.113)
M^* M = D + iΛ ,  (14.115)
where D, Λ ∈ R^{4×4}, with D being a diagonal matrix with non-negative diagonal elements and
Λ being a skew-symmetric matrix.
Exercise 14.4.8. Prove that if Λ_1, Λ_2 ∈ SL(2, C) then
T(Λ_1 ⊗ Λ_2)T^* ∈ SO(4, C) ,  (14.116)
where SO(4, C) is the (non-compact) special orthogonal group over C; i.e., O ∈ SO(4, C) if
and only if O ∈ C^{4×4}, O^T O = I_4, and det(O) = 1. Here T is the same matrix that was used
in Exercise 14.4.6.
Critical States
In this subsection, we will characterize the set Crit(ABCD) of critical states in the four-
qubit system by leveraging the isomorphism described in (14.116). Specifically, we begin
by considering Λ = Λ1 ⊗ Λ2 ⊗ Λ3 ⊗ Λ4 ∈ G4 , ψ ∈ ABCD, and the matrix M defined
in (14.110). We observe that
Λ|ψ^{ABCD}⟩ = (N ⊗ I^{CD})|Ω^{(AB)(CD)}⟩   where   N = (Λ_1 ⊗ Λ_2) M (Λ_3 ⊗ Λ_4)^T .  (14.117)
Next, under the isomorphism in (14.116), the matrix M is transformed into M̃ = TMT^* and
N into Ñ = TNT^*. We can then express the relation between M̃ and Ñ as Ñ = O_1 M̃ O_2,
where
O_1 := T(Λ_1 ⊗ Λ_2)T^*   and   O_2 := T(Λ_3 ⊗ Λ_4)^T T^* .  (14.118)
Note that if O1 and O2 were unitaries, we could have diagonalized M̃ using the singular value
decomposition. However, since they are orthogonal, this is not always possible. Nevertheless,
a somewhat cumbersome canonical form does exist (see, e.g., [218]).
We now focus on four-qubit states in ABCD whose corresponding M̃ matrix has the form
O_1′ D O_2′, where D is a 4 × 4 complex diagonal matrix, and O_1′ and O_2′ are 4 × 4 complex
orthogonal matrices. We will show that all critical states in four qubits belong to this class.
Therefore, by the Kempf-Ness theorem (Theorem 14.3.4) in conjunction with (14.77), this
class of states is dense in ABCD. In other words, almost all four-qubit pure states have this
property.
We begin by noting that the diagonalizable property of M̃ remains invariant under the
action of G4 . This is because we have already shown that for every |ψ⟩ ∈ ABCD, the
transformation |ψ⟩ → Λ|ψ⟩ translates, under the isomorphism, to the transformation of M̃
to O1 M̃ O2 = O1 O1′ DO2′ O2 , which is of the form Q1 DQ2 , where Q1 = O1 O1′ and Q2 = O2 O2′
are two orthogonal matrices.
Next, for a fixed diagonal matrix D = Diag(λ_1, λ_2, λ_3, λ_4), where each λ_x ∈ C (x ∈ [4]),
we take the state corresponding to M̃ = D to represent this G_4 orbit. Note that M =
T^* M̃ T = T^* D T, so the representative state has the form
|ψ^{ABCD}⟩ = (T^* D T ⊗ I^{CD})|Ω^{(AB)(CD)}⟩
           = (T^* D ⊗ T^T)|Ω^{(AB)(CD)}⟩ .  (14.119)
For any j ∈ [4] with binary representation (x, y) (with x, y ∈ {0, 1}), we define |u_j^{AB}⟩ :=
T^*|xy⟩^{AB} and |v_j^{CD}⟩ := T^T|xy⟩^{CD}. With these notations,
|ψ^{ABCD}⟩ = Σ_{j∈[4]} λ_j |u_j^{AB}⟩|v_j^{CD}⟩ .  (14.120)
Since the vectors |u_j⟩ and |v_j⟩ are (up to phases) the four Bell states, the representative state can be written, after absorbing these phases into the coefficients λ_j, as
|ψ_λ^{ABCD}⟩ = λ_1|Φ_+^{AB}⟩|Φ_+^{CD}⟩ + λ_2|Φ_−^{AB}⟩|Φ_−^{CD}⟩ + λ_3|Ψ_+^{AB}⟩|Ψ_+^{CD}⟩ + λ_4|Ψ_−^{AB}⟩|Ψ_−^{CD}⟩ ,  (14.121)
where {|Φ_±⟩, |Ψ_±⟩} denotes the Bell basis of two-qubit maximally entangled states.
Note that if there exists another diagonal matrix D′ = Diag(λ_1′, λ_2′, λ_3′, λ_4′) such that
D′ = O_1 D O_2, we then must have
(D′)^2 = (D′)^T D′ = O_2^T D^2 O_2 .  (14.122)
Theorem 14.4.2. Let |ψλ ⟩ and |ψλ′ ⟩ be two four qubit states as given in (14.121),
with the coefficients λ1 and λ′1 being real positive such that for all x = 2, 3, 4 we have
λ1 ⩾ |λx | and λ′1 ⩾ |λ′x |. Then, the two states |ψλ ⟩ and |ψλ′ ⟩ belong to the same
SLOCC class if and only if λ1 = λ′1 and there exists a permutation, π, on three
elements such that for each x ∈ {2, 3, 4} we have λ′x = λπ(x) or λ′x = −λπ(x) .
The theorem above highlights a stark contrast between three-qubit systems and four-
qubit systems. While three-qubit systems have a finite number of SLOCC classes, the same
cannot be said for four-qubit systems. In fact, the theorem demonstrates that four-qubit
systems have an uncountable number of SLOCC classes.
This has significant implications, as it means that converting |ψλ ⟩ to |ψλ′ ⟩ by LOCC is
impossible, even with a probability less than one, unless λ′ = λ up to a permutation and a
sign change of the components of λ′ and λ. In simpler terms, the components of λ′ and λ
must be identical except for a rearrangement and possibly a change in sign.
Exercise 14.4.9. Show that the state |ψ_λ⟩ in (14.121) is a critical state. Specifically, show
that if ψ_λ^{ABCD} is normalized then its four local marginals are maximally mixed; i.e., show
that
ψ_λ^A = ψ_λ^B = ψ_λ^C = ψ_λ^D = (1/2) I_2 .  (14.123)
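A direct numerical check of this exercise is straightforward. The sketch below (a NumPy illustration with the qubit ordering A, B, C, D and a random normalized λ ∈ C^4 chosen as assumptions) builds the state (14.121) from the Bell basis and verifies that all four single-qubit marginals equal I_2/2.

import numpy as np

rng = np.random.default_rng(2)
lam = rng.normal(size=4) + 1j*rng.normal(size=4)
lam /= np.linalg.norm(lam)

s = 1/np.sqrt(2)
bell = [s*np.array([1,0,0,1]),    # |Φ+⟩
        s*np.array([1,0,0,-1]),   # |Φ−⟩
        s*np.array([0,1,1,0]),    # |Ψ+⟩
        s*np.array([0,1,-1,0])]   # |Ψ−⟩

# |ψ_λ⟩ = Σ_j λ_j |Bell_j⟩^{AB} |Bell_j⟩^{CD}, stored as a (2,2,2,2) tensor over A,B,C,D.
psi = sum(l*np.kron(b, b) for l, b in zip(lam, bell)).reshape(2, 2, 2, 2)

for axis, name in enumerate("ABCD"):
    vec = np.moveaxis(psi, axis, 0).reshape(2, -1)   # qubit `name` versus the rest
    marginal = vec @ vec.conj().T                    # trace over the other three qubits
    print(name, np.round(marginal, 8))               # each marginal is I_2/2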
It is worth noting that in [220, 218, 226] it has been shown that, up to local unitaries, the
set
C := { ψ_λ^{ABCD} : λ_1, λ_2, λ_3, λ_4 ∈ C , Σ_{x∈[4]} |λ_x|^2 = 1 }  (14.124)
contains all the normalized critical states of four qubits; that is, every state in Crit(ABCD) is, up to local unitaries, of the form ψ_λ^{ABCD}.
and for each j ∈ [n], N_j^{(k)} ∈ L(A_j). We denote the set of all such channels by SEP(A^n →
A^n). While the matrices N_j^{(k)} (and, by extension, M_k) might not always be invertible,
in this section our attention is specifically on the conversion of one pure state to
another under the assumption that all {M_k}_{k∈[m]} are non-singular. We use the notation
SEP_1(A^n → A^n) ⊂ SEP(A^n → A^n) to represent all separable channels of this kind. In
essence, our focus is restricted to separable operations as defined earlier, with each M_k being
an element of GL_n. For further insights and references on the relations between LOCC,
SEP_1, and SEP, interested readers are referred to the concluding section of this chapter,
titled "Notes and References."
Definition 14.5.1. Let ψ ∈ Pure(A^n). The stabilizer group of ψ^{A^n} is the subgroup of
GL_n defined by
Stab(ψ) := { Λ ∈ GL_n : Λ|ψ⟩ = |ψ⟩ } .  (14.126)
Note that the set Stab(ψ) is not empty since the identity matrix belongs to it.
Exercise 14.5.1. Let ψ ∈ Pure(An ) and consider the stabilizer group Stab(|ψ⟩).
Exercise 14.5.2. Let AB be a bipartite system with |A| = |B|. Find the stabilizer group of
the maximally entangled state |ΦAB ⟩.
The stabilizer group for ψ is a subgroup of GLn . One may naturally wonder how this
group is related to the same group, but with GLn replaced by SLn . The following theorem
demonstrates that, unless ψ is in the null cone of An , every element in the stabilizer group
of ψ lies in SLn up to a factor given by a root of unity.
Theorem 14.5.1. Let ψ ∈ Pure(An ) and suppose that |ψ⟩ ̸∈ Null(An ). Then, there
exists m ∈ N such that
Stab(ψ) ⊂ G_m := { e^{i2πk/m} Λ : k ∈ [m] , Λ ∈ SL_n } .  (14.128)
Proof. By definition, since |ψ⟩ ̸∈ Null(An ) there exists a homogeneous SLIP, f , with the
property that f (|ψ⟩) ̸= 0. Let m be the degree of f . Now, let Λ′ ∈ Stab(ψ) and observe
that since Stab(ψ) ⊂ GLn there exists a ∈ C such that Λ′ = aΛ, where Λ ∈ SLn . Thus, the
property Λ′ |ψ⟩ = |ψ⟩ gives
Corollary 14.5.2. Let ψ ∈ Crit(An ) be such that Stab(ψ) is a finite group. Then,
there exists m ∈ N such that
Stab(ψ) ⊂ Km . (14.133)
Proof. Let M ∈ Stab(ψ). Since ψ is a critical state it is not in the null cone of An , so that
from Theorem 14.5.1 there exists N ∈ SLn , and a ∈ C with am = 1, such that M = aN .
Moreover, using the polar decomposition we can further express N as N = U Λ, where
U ∈ SU_n and Λ > 0 is a positive definite matrix in SL_n. Hence,
|ψ⟩ = M|ψ⟩ = aUΛ|ψ⟩ ,   and consequently   ∥Λ|ψ⟩∥ = ∥|ψ⟩∥ .  (14.134)
As Λ ∈ SL_n is positive definite, the Kempf-Ness theorem (as described in Exercise 14.3.1)
implies that Λ|ψ⟩ = |ψ⟩. In other words, Λ belongs to the stabilizer group Stab(ψ). Since
Stab(ψ) is a finite group, the sequence {Λ^k}_{k∈N} must contain elements that are equal to each
other, and therefore there exists k ∈ N such that Λ^k = I^{A^n}. Since Λ > 0, we must have
Λ = I^{A^n}. Hence, M = aU ∈ K_m. Since M was an arbitrary element of Stab(ψ), we conclude
that all the elements of Stab(ψ) belong to K_m. This completes the proof.
Now, recall that in three qubits, the 3-tangle is defined in terms of a homogeneous SLIP
of degree 4. Therefore, from Corollary 14.5.1 we get that the quotient group Stab(ϕ)/G is
a group of order at most four. However, note that if Λ ∈ SL3 then also −Λ ∈ SL3 so that
G4 = G2 , where the groups G2 and G4 are defined in (14.128). We therefore conclude that
Stab(ϕ)/G is a group of order at most two. Since X ⊗ X ⊗ X ∈ Stab(ϕ)/G we conclude
that Stab(ϕ)/G contains only the identity matrix and the flip matrix X ⊗ X ⊗ X so that
Stab(ϕ) is the union of G and the coset (X ⊗ X ⊗ X)G.
Exercise 14.5.4. Find the stabilizer group of the W-state of three qubits. Is it compact?
|ψ⟩ = λ1 |Φ+ ⟩|Φ+ ⟩ + λ2 |Φ− ⟩|Φ− ⟩ + λ3 |Ψ+ ⟩|Ψ+ ⟩ + λ4 |Ψ− ⟩|Ψ− ⟩ , (14.136)
where {|Φ± ⟩, |Ψ± ⟩} is the Bell basis of two qubits, λ1 , λ2 , λ3 , λ4 ∈ C, and λ2x ̸= λ2x′ for all
x ̸= x′ ∈ [4]. For this four-qubit state, it can be shown (see [?] and [226]) that the group
G = Stab(ψ) ∩ SL4 is the Klein group consisting of only four elements:
G = {I , X ⊗ X ⊗ X ⊗ X , Y ⊗ Y ⊗ Y ⊗ Y , Z ⊗ Z ⊗ Z ⊗ Z} (14.137)
|ψ⟩ := (1/√3)( |Φ_+⟩|Φ_+⟩ + ω|Φ_−⟩|Φ_−⟩ + ω|Ψ_+⟩|Ψ_+⟩ ) ,  (14.139)
where ω = e^{i2π/3}.
for some c_x ∈ C. After combining the relation above with (14.143), and performing some
algebra, we obtain:
(1/c_x) (∥N_2|χ⟩∥ / ∥N_1|χ⟩∥) N_2^{-1} M_x N_1 |χ⟩ = |χ⟩ ,  (14.146)
with p_x := |c_x|^2. Hence, ψ_1 →_{SEP_1} ψ_2 if and only if Λ_1 and Λ_2 satisfy the relation above. This
completes the proof.
Let χ ∈ Crit(A^n) have a finite stabilizer, i.e., Stab(χ) = {U_x}_{x∈[m]} is a finite set of unitaries,
as established by Corollary 14.5.2. In this situation, we can define the Stab(χ)-twirling
operation as:
G(ω^{A^n}) = (1/m) Σ_{x∈[m]} U_x ω^{A^n} U_x^*   ∀ ω ∈ L(A^n) .  (14.150)
Now, according to Theorem 14.5.2, ψ_1 →_{SEP_1} ψ_2 if and only if there exists a probability
distribution {p_x}_{x∈[m]} such that
Λ_1 = Σ_{x∈[m]} p_x U_x^* Λ_2 U_x .  (14.151)
By applying the twirling map G to both sides of the equation above we get that if ψ_1 →_{SEP_1} ψ_2
then
G(Λ_1) = G(Λ_2) .  (14.152)
In other words, the condition above is a necessary condition for the conversion ψ_1 →_{SEP_1} ψ_2
(but it is not always sufficient). Moreover, if Λ_1 is symmetric, meaning G(Λ_1) = Λ_1, then ψ_1 →_{SEP_1}
ψ_2 if and only if Λ_1 = G(Λ_2).
One special case in which Λ_1 is symmetric is the case ψ_1 = χ. In this case, Λ_1 = I^{A^n},
and χ →_{SEP_1} ψ_2 if and only if G(Λ_2) = I^{A^n}. Conversely, if ψ_2 = χ then Λ_2 = I^{A^n}, so the
condition (14.151) becomes Λ_1 = I^{A^n}. In other words, ψ_1 →_{SEP_1} χ if and only if, up to local
unitaries, ψ_1 = χ. This is consistent with the intuition that the critical state is the maximally
entangled state of its SLOCC orbit.
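For the four-qubit example (14.136), whose stabilizer is (up to phases that cancel under conjugation) the Klein group (14.137), the Stab(χ)-twirling (14.150) is a simple average over four Pauli strings. The following minimal sketch (illustrative only; the random Hermitian operator merely stands in for an APO) implements this twirl and checks that it is idempotent and that it leaves the identity invariant, in line with the necessary condition G(Λ_1) = G(Λ_2) and the special case G(Λ_2) = I discussed above.

import numpy as np

I2 = np.eye(2)
X = np.array([[0,1],[1,0]]); Y = np.array([[0,-1j],[1j,0]]); Z = np.diag([1,-1])

def kron_all(ops):
    out = np.array([[1.0+0j]])
    for op in ops:
        out = np.kron(out, op)
    return out

# The four commuting unitaries of (14.137); stabilizer phases cancel in U(.)U†.
kleins = [kron_all([P]*4) for P in (I2, X, Y, Z)]

def twirl(omega):
    # Stab(χ)-twirling (14.150) for the Klein group: G(ω) = (1/4) Σ_x U_x ω U_x†.
    return sum(U @ omega @ U.conj().T for U in kleins) / len(kleins)

rng = np.random.default_rng(3)
A = rng.normal(size=(16,16)) + 1j*rng.normal(size=(16,16))
Lam = A + A.conj().T                       # a random Hermitian operator

G1 = twirl(Lam)
print(np.max(np.abs(twirl(G1) - G1)))                    # ~0: G ∘ G = G
print(np.max(np.abs(twirl(np.eye(16)) - np.eye(16))))    # ~0: the identity is symmetric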
To demonstrate how the theorem mentioned above generalizes Nielsen’s majorization
theorem, we now apply it to the bipartite case. Let us consider a bipartite system AB with
m := |A| = |B|. The only critical state of this system is the maximally entangled state Φm ,
and its stabilizer is given by:
Stab(Φ_m) = { S^{-1} ⊗ S^T : S ∈ GL(m) } .  (14.153)
The stabilizer group mentioned above is clearly not compact. However, in the derivation
of Nielsen’s majorization theorem, we employed Lo-Popescu’s Theorem (Theorem 12.2.1)
which limits Bob’s operations to be unitary operations. Thus, we can limit S to be a unitary
matrix without loss of generality.
Consider two bipartite states ψ_1, ψ_2 ∈ Pure(AB). As per Exercise 14.5.7, an APO of
ψ_1 has the form Λ_1 = ρ_1^A ⊗ I^B, where ρ_1^A is the reduced density matrix of ψ_1^{AB}. Similarly,
an APO of ψ_2 has the form Λ_2 = ρ_2^A ⊗ I^B, where ρ_2^A is the reduced density matrix of ψ_2^{AB}.
Therefore, from Theorem 14.5.2, we can infer that ψ_1^{AB} →_{SEP_1} ψ_2^{AB} if and only if there exists a
probability distribution {p_x}_{x∈[k]} along with k unitary matrices {U_x}_{x∈[k]} ⊂ U(m) such that:
ρ_1^A ⊗ I^B = Σ_{x∈[k]} p_x (U_x^* ρ_2^A U_x) ⊗ I^B .  (14.154)
This condition is precisely the same as the condition we obtained in (12.26) for Nielsen’s
majorization criterion.
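Condition (14.154) states that ρ_1^A is a mixture of unitary conjugations of ρ_2^A, which forces the spectrum of ρ_1^A to be majorized by that of ρ_2^A — precisely Nielsen's criterion. The following small Monte Carlo sanity check (not taken from the text; the dimensions, seed, and number of unitaries are arbitrary choices) illustrates this implication numerically.

import numpy as np

def haar_unitary(d, rng):
    z = rng.normal(size=(d,d)) + 1j*rng.normal(size=(d,d))
    q, r = np.linalg.qr(z)
    return q @ np.diag(np.diag(r)/np.abs(np.diag(r)))

def majorized_by(a, b, tol=1e-10):
    # True if the spectrum a is majorized by the spectrum b.
    a, b = np.sort(a)[::-1], np.sort(b)[::-1]
    return bool(np.all(np.cumsum(a) <= np.cumsum(b) + tol))

rng = np.random.default_rng(4)
d, k = 4, 6
p = rng.dirichlet(np.ones(k))
rho2 = np.diag(rng.dirichlet(np.ones(d)))          # target marginal ρ_2^A
rho1 = np.zeros((d, d), dtype=complex)
for x in range(k):
    U = haar_unitary(d, rng)
    rho1 += p[x] * U.conj().T @ rho2 @ U           # mixture as in (14.154)

print(majorized_by(np.linalg.eigvalsh(rho1),
                   np.linalg.eigvalsh(rho2)))      # True: spec(ρ_1) ≺ spec(ρ_2)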
Exercise 14.5.8. Let ψ be the 4-qubit state (14.136) and suppose it satisfies (14.138). Classify all the 4-qubit states ϕ for which ψ →_{SEP_1} ϕ holds.
Lemma 14.6.1. Let ψ ∈ Pure(AB) and let ρ^B := Tr_A[ψ^{AB}] be its reduced density
matrix on system B. Then, for every pure-state decomposition ρ^B = Σ_{x∈[m]} p_x ϕ_x^B,
there exists a POVM on Alice's system, {Λ_x}_{x∈[m]} ⊂ Eff(A), such that for all x ∈ [m]
ϕ_x^B = (1/p_x) Tr_A[(Λ_x^A ⊗ I^B) ψ^{AB}]   and   p_x = Tr[(Λ_x^A ⊗ I^B) ψ^{AB}] .  (14.156)
where the supremum is taken over all pure-state decompositions of ρ^{AB} = Σ_{x∈[m]} p_x ϕ_x^{AB}.
Note that this definition is similar to the definition of the entanglement of formation given
in (13.68), except that we take the supremum instead of the infimum as taken in (13.68).
Exercise 14.6.1. Compute the entanglement of assistance of the maximally mixed state uAB
and conclude that the entanglement of assistance is not a measure of entanglement.
Exercise 14.6.2. Let E be a measure of pure bipartite entanglement, and let E_F and E_a be its
corresponding entanglement of formation and entanglement of assistance, respectively. Let ψ ∈ Pure(AB_1B_2)
be a tripartite pure state with marginal ρ^{AB_1} := Tr_{B_2}[ψ^{AB_1B_2}]. Show that if ψ^{AB_1B_2} satisfies
the disentangling condition (14.184) then
E_F(ρ^{AB_1}) = E_a(ρ^{AB_1}) .  (14.161)
However, we now show that for certain choices of maximally entangled states {Φ_x^{A_2B}}_{x∈{0,1,2,3}},
the transformation above cannot be achieved (even with probability less than one) if we only
allow system R to perform a measurement.
The reduced density matrix ρ^{AB} := Tr_R[ψ^{ABR}] of the state above can be expressed as
ρ^{AB} = φ_0^{AB} + φ_1^{AB} ,  (14.165)
where
|φ_0^{AB}⟩ := (1/2)|0⟩^{A_1}|Φ_0^{A_2B}⟩ + (1/(2√2))|1⟩^{A_1}( |Φ_2^{A_2B}⟩ + |Φ_3^{A_2B}⟩ )  (14.166)
and
|φ_1^{AB}⟩ := (1/2)|0⟩^{A_1}|Φ_1^{A_2B}⟩ + (1/(2√2))|1⟩^{A_1}( |Φ_2^{A_2B}⟩ − |Φ_3^{A_2B}⟩ ) .  (14.167)
We argue that there exist four maximally entangled states {Φ_x^{A_2B}}_{x∈{0,1,2,3}} such that no
linear combination of |φ_0^{AB}⟩ and |φ_1^{AB}⟩ is maximally entangled. Indeed, take
|Φ_0^{A_2B}⟩ = |Φ_2^{A_2B}⟩ = (1/2)( |00⟩ + |11⟩ + |22⟩ + |33⟩ )  (14.168)
and
|Φ_1^{A_2B}⟩ = (1/2)( |00⟩ − i|11⟩ − |22⟩ + i|33⟩ ) ,
|Φ_3^{A_2B}⟩ = (1/2)( |00⟩ − i|11⟩ + |22⟩ − i|33⟩ ) .  (14.169)
With these choices we get by direct calculation that for any a, b ∈ C the linear combination
a|φ_0^{AB}⟩ + b|φ_1^{AB}⟩  (14.170)
is not proportional to the maximally entangled state (see Exercise 14.6.3). Therefore, the
state ψ ABR cannot be converted to ΦAB (even with probability less than one) by a local
measurement on system R. Alternatively, none of the pure-state decompositions of ρAB
contains a maximally entangled state (i.e. 2-ebits).
Exercise 14.6.3. Show that for any choice of a, b ∈ C with |a|^2 + |b|^2 = 1 the state in (14.170)
is not maximally entangled. Hint: Write the state in (14.170) as a linear combination
Σ_{x=0}^{3} |ϕ_x^A⟩|x⟩^B and show that the vectors {|ϕ_x^A⟩}_x cannot all have the same norm and
simultaneously be orthogonal to each other.
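The claim of this exercise can also be explored numerically. The sketch below constructs |φ_0^{AB}⟩ and |φ_1^{AB}⟩ from (14.166)–(14.169) and samples random coefficients a, b; for every sample, the squared Schmidt coefficients across the A : B cut deviate from the maximally entangled value 1/4 by a finite amount (a sampling illustration under the stated basis ordering, not a proof).

import numpy as np

# Basis ordering: A = (A1, A2) with |A1| = 2, |A2| = 4, and |B| = 4.
def ket_A2B(coeffs):                       # Σ_j coeffs[j] |jj⟩ on A2 B, as a 16-vector
    v = np.zeros(16, dtype=complex)
    for j, c in enumerate(coeffs):
        v[4*j + j] = c
    return v

Phi0 = ket_A2B([1, 1, 1, 1]) / 2
Phi1 = ket_A2B([1, -1j, -1, 1j]) / 2
Phi2 = Phi0.copy()
Phi3 = ket_A2B([1, -1j, 1, -1j]) / 2

e0, e1 = np.eye(2)
phi0 = 0.5*np.kron(e0, Phi0) + (1/(2*np.sqrt(2)))*np.kron(e1, Phi2 + Phi3)
phi1 = 0.5*np.kron(e0, Phi1) + (1/(2*np.sqrt(2)))*np.kron(e1, Phi2 - Phi3)

rng = np.random.default_rng(5)
worst = np.inf
for _ in range(2000):
    a, b = rng.normal(size=2) + 1j*rng.normal(size=2)
    v = a*phi0 + b*phi1
    v /= np.linalg.norm(v)
    # Schmidt coefficients across the cut A = (A1 A2) versus B.
    sv = np.linalg.svd(v.reshape(8, 4), compute_uv=False)
    worst = min(worst, np.max(np.abs(sv**2 - 0.25)))
print(worst)   # stays bounded away from 0: no sampled combination is maximally entangled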
Entanglement of Collaboration
Definition 14.6.1. Let E be a measure of bipartite entanglement for mixed states.
Its corresponding measure of tripartite entanglement, known as the "entanglement of
collaboration" and denoted by E_c, is defined as:
E_c(ρ^{ABR}) := sup_{N∈LOCC} E( N^{ABR→A′B′}(ρ^{ABR}) )   ∀ ρ ∈ D(ABR) ,  (14.171)
where the supremum is over all quantum systems A′ and B′, and all LOCC channels
N^{ABR→A′B′}.
Therefore, since the entanglement of collaboration is defined as a supremum over all such LOCC
channels N^{ABR→A′B′}, it must be no smaller than the entanglement of assistance. Furthermore,
unlike the entanglement of assistance, the entanglement of collaboration is a measure of tripartite
entanglement.
where the parenthesis in ρA(BR) indicates that the entanglement is computed between
system A and the composite system BR.
One of the most fundamental questions in entanglement theory is the distillation of Bell
states from multiple copies of a bipartite entangled state. As we saw earlier, for a given
pure state ψ AB , the distillable entanglement is determined by the von-Neumann entropy of
the reduced density matrix ψ A . A similar question arises in the multipartite regime: given
many copies of a tripartite pure entangled state ψ ABR , how many Bell states can be distilled
between Alice and Bob by LOCC of all three parties sharing the state?
The above lemma asserts that the optimal distillation rate cannot exceed the minimum
between the entropies of system A and system B. Remarkably, it has been shown that this
upper bound can be attained. However, we will present the proof of this statement in volume
2 of this book, after introducing the quantum state merging protocol from quantum Shannon
theory.
Proof. Let ρ^{ABR} be an extension of the state ρ^{AB}. Using the chain rule of the conditional
mutual information (see the second equality in (13.180)) we get
(1/2) I(A : B|R)_ρ = (1/2) I(A : B_1|R)_ρ + (1/2) I(A : B_2|RB_1)_ρ
By definition→  ⩾ E_sq(ρ^{AB_1}) + E_sq(ρ^{AB_2}) .  (14.176)
Since the above inequality holds for all extensions ρABR of ρAB we conclude that
It is important to recognize that not all measures of entanglement adhere to the monogamy
condition specified by (14.175). Nevertheless, the squashed entanglement is currently the only
known measure of entanglement that satisfies (14.175) in all finite dimensions, which is a
remarkable property that highlights the unique nature of this measure. Other measures
satisfy (14.175) only in fixed dimensions. For example, on qubit systems, the square of the
concurrence is also a monogamous measure of entanglement that satisfies (14.175) when
|A| = |B_1| = |B_2| = 2.
The equality above implies that for the pure state ψ^{ABC} with marginals ρ^{AB} and ρ^{AC} we
have
C_a(ρ^{AB})^2 + C_a(ρ^{AC})^2 = 4 det(ρ^A) = C(ψ^{A(BC)})^2 .  (14.179)
Denoting by τ(ρ^{AB}) := C(ρ^{AB})^2 (the square of the concurrence of formation, also known
as the 2-tangle) and using the fact that C(ρ^{AB}) ⩽ C_a(ρ^{AB}), we arrive at the following
monogamy inequality
τ(ψ^{A(BC)}) ⩾ τ(ρ^{AB}) + τ(ρ^{AC}) ,  (14.180)
where
τ(ψ^{A(BC)}) := 2[ 1 − Tr[(ρ^A)^2] ] = 4 det(ρ^A) .  (14.181)
where f is a function of two variables that satisfies certain conditions. While this family of
monogamy relations may be more flexible than the original definition, it still lacks a clear
theoretical foundation. Thus, a more desirable solution would be to derive the monogamy
relations from more basic principles, which would provide a deeper understanding of the
nature of this phenomenon.
Recently, such an approach to the monogamy of entanglement has been proposed, one that is more
"fine-grained" in nature and avoids the need to introduce a function f. This approach does
not involve monogamy relations such as (14.175) or (14.183). Instead, it defines a measure
of entanglement E to be monogamous if it satisfies a certain condition that does not involve
inequalities. In particular, this approach takes into account the fact that different measures
of entanglement have varying properties and limitations, rather than attempting to impose
a one-size-fits-all definition. By adopting this more nuanced approach, we can gain a deeper
understanding of the monogamy of entanglement and how it manifests itself across different
measures.
E(ρ^{AB}) = E(ρ^{AB_1})  (14.184)
we have that E(ρ^{AB_2}) = 0.
It is important to note that the relation given in (14.185) is not of the form given
in (14.183), since the monogamy exponent α in (14.185) depends on the dimension d, whereas
f is considered universal in the sense that it does not depend on the dimension. Therefore,
if a measure of entanglement such as the entanglement of formation is not monogamous
according to the class of relations given in (14.183), it does not necessarily mean that it is
not monogamous according to Definition 14.7.1.
In the next theorem we show that all quantum Markov states satisfy the disentangling
condition. For this purpose, we will rename B1 as B and B2 as B ′ , since the theorem
involves further decomposition of system B into subsystems. Specifically, an entangled
Markov quantum state ρ ∈ D(ABB ′ ) is a state of the form (cf. (13.185))
ρ^{ABB′} = ⊕_{x∈[m]} p_x ρ_x^{AB_x^{(1)}} ⊗ ρ_x^{B_x^{(2)}B′}  (14.190)
where
B = ⊕_{x∈[m]} B_x^{(1)} ⊗ B_x^{(2)} ,  (14.191)
and for each x ∈ [m], ρ_x^{AB_x^{(1)}} and ρ_x^{B_x^{(2)}B′} are density matrices in D(AB_x^{(1)}) and D(B_x^{(2)}B′),
respectively.
Proof. Since local ancillary systems are free in entanglement theory, one can append a classical
ancillary system X that encodes the orthogonality of the subspaces B_x^{(1)} ⊗ B_x^{(2)}. This can
be done with an isometry that maps states in B_x^{(1)} ⊗ B_x^{(2)} to states in B^{(1)} ⊗ B^{(2)} ⊗ |x⟩⟨x|^X,
where systems B^{(1)} and B^{(2)} have dimensions max_x |B_x^{(1)}| and max_x |B_x^{(2)}|, respectively.
Therefore, without loss of generality we can write the above Markov state as
σ^{ABB′X} = Σ_{x∈[m]} p_x ρ_x^{AB^{(1)}} ⊗ ρ_x^{B^{(2)}B′} ⊗ |x⟩⟨x|^X .  (14.192)
Now, note that for any entanglement monotone E, the entanglement between A and BB′
is measured by
E(ρ^{ABB′}) = E(σ^{ABB′X})
(13.65)→ = Σ_{x∈[m]} p_x E( ρ_x^{AB^{(1)}} ⊗ ρ_x^{B^{(2)}B′} )  (14.193)
         = Σ_{x∈[m]} p_x E( ρ_x^{AB^{(1)}} ) .
Similarly,
E(ρ^{AB}) = E(σ^{ABX})
         = Σ_{x∈[m]} p_x E( ρ_x^{AB^{(1)}} ⊗ ρ_x^{B^{(2)}} )  (14.194)
         = Σ_{x∈[m]} p_x E( ρ_x^{AB^{(1)}} ) .
We therefore obtain E(ρ^{ABB′}) = E(ρ^{AB}). This completes the proof.
The Markov state mentioned in the theorem has an important property: the marginal
state ρ^{AB′} is separable, which implies E(ρ^{AB′}) = 0. Therefore, Markov states always satisfy
the condition given in Definition 14.7.1. However, one might question whether the converse of
the statement in the theorem holds true. In other words, if a state ρ^{ABB′} satisfies E(ρ^{ABB′}) =
E(ρ^{AB}), is it necessarily a Markov state? For mixed tripartite states, the answer is obviously
"no" because all states that are separable between system A and BB′ satisfy E(ρ^{ABB′}) = E(ρ^{AB}),
but not all separable states are Markov states. Nonetheless, in the following theorem, we will
see that under mild assumptions, the converse of the above theorem holds for pure tripartite
states.
In Section 13.2.1, we observed that every entanglement monotone takes the form (13.70)
when evaluated on pure states. Specifically, the entanglement monotone E can be expressed
as follows:
E(ψ^{AB}) = g(ρ^A)   with   ρ^A := Tr_B[ψ^{AB}] ,  (14.195)
where the function g : D(A) → R+ is Schur concave. Furthermore, we noted that if g
is symmetric (i.e., invariant under unitary channels) and concave, then the convex roof
extension of E corresponds to an entanglement monotone. As a reminder, given any measure
of entanglement E on mixed states, we can construct its convex roof extension as
E_F(ρ^{AB}) := min Σ_{x∈[m]} p_x E(ψ_x^{AB})   ∀ ρ ∈ D(AB) ,  (14.196)
where the minimum is over all pure-state decompositions of ρ^{AB} = Σ_{x∈[m]} p_x ψ_x^{AB}. Moreover,
if E is convex (e.g., an entanglement monotone) then E(ρ^{AB}) ⩽ E_F(ρ^{AB}) for all ρ ∈ D(AB).
Remark. The above theorem states that if the pure state ψ^{ABB′} satisfies the disentangling
condition, then it is a Markov state (up to a local unitary on system B). Additionally, keep
in mind that since E measures entanglement, the function g defined in (14.195) is invariant
under unitary channels. As g is also strictly concave, the convex roof extension of E yields
an entanglement monotone, as stated in Theorem 13.2.1. However, we do not assume E to be
equal to its convex roof extension. Instead, we observe that since E is convex, it is always
less than or equal to its convex roof extension.
Proof. We only prove the implication from the first statement to the second, as the converse
is straightforward and left as an exercise for the reader. The first statement implies that
E(ψ^{ABB′}) = E(ρ^{AB}) ⩽ E_F(ρ^{AB}) ,  (14.197)
where we used the fact that E is no greater than its convex roof extension. On the other
hand, from Lemma 14.6.1 we get that every pure-state decomposition of ρ^{AB} = Σ_{x∈[m]} p_x ψ_x^{AB}
has a corresponding measurement on system B′ of ψ^{ABB′}, where the outcome x occurs with
probability p_x and the post-measurement state on system AB is ψ_x^{AB}. When combined with
the fact that E is an entanglement monotone, this implies that
E(ψ^{ABB′}) ⩾ Σ_{x∈[m]} p_x E(ψ_x^{AB}) .  (14.198)
The two equations above lead to the very strong conclusion that all pure-state decompositions
of ρ^{AB} have the same average entanglement, which equals E(ψ^{ABB′}). In other
words, the inequality in the above equation is in fact an equality, and it holds for every
pure-state decomposition {p_x, ψ_x^{AB}}_{x∈[m]} of ρ^{AB}. This equality can be expressed in terms of
the function g as follows:
g(ρ^A) = Σ_{x∈[m]} p_x g(ρ_x^A) ,  (14.199)
where ρ_x^A := Tr_B[ψ_x^{AB}]. Since g is strictly concave, the equation above holds if and only if
ρ^A = ρ_x^A for all x ∈ [m]. Let B_1 be a system of dimension r := |B_1| = Rank(ρ^A), and let
Our first goal is to show that if {|ψ_x^{AB}⟩}_{x∈[m]} are the eigenvectors of ρ^{AB}, then V_{x′}^* V_x = δ_{xx′} I^{B_1}.
To prove it, let {q_y, ϕ_y^{AB}}_{y∈[m]} be another pure-state decomposition of ρ^{AB} (also with m
elements). Then, for the exact same reasons as stated above, for each y ∈ [m] there exists
an isometry W_y : B_1 → B such that
|ϕ_y^{AB}⟩ = (I^A ⊗ W_y^{B_1→B}) |χ^{AB_1}⟩ .  (14.201)
(I^A ⊗ √q_y W_y) |χ^{AB_1}⟩ = (I^A ⊗ Σ_{x∈[m]} u_{yx} √p_x V_x) |χ^{AB_1}⟩ .  (14.203)
y∈[m]
−1/2
By multiplying both sides by ρA we can replace |χAB1 ⟩ on both sides of the equation
above with the (unnormalized) maximally entangled state |ΩB̃1 B1 ⟩. Therefore, the equation
above gives
√ X √
qy Wy = uyx px Vx . (14.204)
x∈[m]
Now, using the fact that {|ψ_x^{AB}⟩}_{x∈[m]} forms an orthonormal set of vectors, we get from (14.202)
that
q_y = ∥√q_y |ϕ_y^{AB}⟩∥^2 = Σ_{x∈[m]} p_x |u_{yx}|^2 .  (14.206)
The equation above holds for all unitary matrices U = (u_{yx}) and all y ∈ [m]. Setting y = 1
and choosing U to be a unitary matrix with its first row equal to (1/√2)(1, 1, 0, . . . , 0) gives
V_1^*V_2 + V_2^*V_1 = 0. Similarly, choosing the first row of U to be (1/√2)(1, i, 0, . . . , 0) gives V_1^*V_2 − V_2^*V_1 = 0.
Thus, we obtain V_1^*V_2 = V_2^*V_1 = 0. By repeating the same argument with permuted versions
of (1/√2)(1, 1, 0, . . . , 0) and (1/√2)(1, i, 0, . . . , 0), we conclude that for all x, x′ ∈ [m] such that x ̸= x′,
we have V_{x′}^*V_x = 0.
Let {|z⟩}_{z∈[r]} be an orthonormal basis of B_1, and define |φ_{xz}^B⟩ := V_x|z⟩ for all x ∈ [m] and
z ∈ [r]. Using the fact that V_{x′}^*V_x = δ_{xx′} I^{B_1}, we can derive that ⟨φ_{x′z′}^B|φ_{xz}^B⟩ = δ_{xx′}δ_{zz′} for all
x, x′ ∈ [m] and z, z′ ∈ [r]. Let K be the subspace spanned by the orthonormal vectors {|φ_{xz}^B⟩}, with
x ∈ [m] and z ∈ [r], and note that the dimension of K is mr. Thus, there exists a subspace
B_2 of B with |B_2| = m such that K is isomorphic to B_1 ⊗ B_2. This isomorphism implies that
there exists an isometry U^{B_1B_2→B} : B_1B_2 → B such that |φ_{xz}^B⟩ = U^{B_1B_2→B}|z⟩^{B_1}|x⟩^{B_2}. Combining
this with the definition |φ_{xz}^B⟩ := V_x|z⟩^{B_1} gives V_x|z⟩^{B_1} = U^{B_1B_2→B}|z⟩^{B_1}|x⟩^{B_2} for all x ∈ [m] and all
z ∈ [r]. Hence, V_x^{B_1→B} = U^{B_1B_2→B}( I^{B_1} ⊗ |x⟩^{B_2} ), so that |ψ_x^{AB}⟩ = U^{B_1B_2→B}|χ^{AB_1}⟩|x⟩^{B_2} and
ρ^{AB} = U( χ^{AB_1} ⊗ σ^{B_2} )U^*   where   σ^{B_2} := Σ_{x∈[m]} p_x |x⟩⟨x|^{B_2} .  (14.208)
Observe that the state in (14.208) has a purification of the form U^{B_1B_2→B}|χ^{AB_1}⟩|ϕ^{B_2B′}⟩,
where
|ϕ^{B_2B′}⟩ = Σ_{x∈[m]} √p_x |x⟩^{B_2}|x⟩^{B′} .  (14.209)
Therefore, since |ψ^{ABB′}⟩ is also a purification of ρ^{AB}, we conclude that up to a local unitary
on system B and on system B′, the state ψ^{ABB′} has the form |χ^{AB_1}⟩|ϕ^{B_2B′}⟩.
where the minimum is over all pure-state decompositions of ρ^{AB} = Σ_{x∈[m]} p_x ψ_x^{AB}. Show that
if E_h is monogamous on pure tripartite states, then E is also monogamous on pure tripartite
states.
Proof. Let {p_x, |ψ_x^{ABB′}⟩}_{x∈[m]} be the optimal pure-state decomposition of ρ^{ABB′} satisfying
E_F(ρ^{ABB′}) = Σ_{x∈[m]} p_x E(ψ_x^{ABB′}) .  (14.211)
Suppose E_F(ρ^{ABB′}) = E_F(ρ^{AB}). Combining this with the two equations above gives
Σ_{x∈[m]} p_x E(ψ_x^{ABB′}) = E_F(ρ^{AB})
Convexity of E_F→  ⩽ Σ_{x∈[m]} p_x E_F(ρ_x^{AB}) .  (14.213)
since E_F is a measure of entanglement (in fact, an entanglement monotone) and does not
increase under tracing out the local system B′. Hence, from the two inequalities above
we get that for all x ∈ [m] we have
E(ψ_x^{ABB′}) = E_F(ρ_x^{AB}) .  (14.215)
Now, from Theorem 14.7.4 we get that for each x ∈ [m] there exist systems B_x^{(1)} and B_x^{(2)},
and an isometry V_x : B_x^{(1)}B_x^{(2)} → B, such that
|ψ_x^{ABB′}⟩ = V_x^{B_x^{(1)}B_x^{(2)}→B} |χ_x^{AB_x^{(1)}}⟩ |ϕ_x^{B_x^{(2)}B′}⟩ ,  (14.216)
for some χ_x ∈ Pure(AB_x^{(1)}) and ϕ_x ∈ Pure(B_x^{(2)}B′). Tracing out system B on both sides of
the equation above gives
ψ_x^{AB′} = χ_x^A ⊗ ϕ_x^{B′} .  (14.217)
Therefore, the marginal state ρ^{AB′} = Σ_{x∈[m]} p_x ψ_x^{AB′} is separable, so that E(ρ^{AB′}) = 0. This
completes the proof.
while that for four qubits was done by [218]. A detailed analysis of all 4-qubit maximally
entangled states can be found in [100, 203]. The classification of maximally entangled sets
of multipartite systems is presented in [59].
The generalization of Nielsen’s majorization theorem to the multipartite case, as pre-
sented in Theorem 14.5.2, is due to [101]. The generalization of this theorem to the full
set SEP can be found in [117]. Several other generalizations of this result, particularly de-
terministic interconversions of multipartite entanglement under various operations including
LOCC can be found in [204] and [60]. It is worth mentioning that in [92] and [196], it was
shown that the stabilizer group of almost all multipartite entangled states is trivial, and con-
sequently, LOCC conversion between two states in the same SLOCC class is almost never
possible. Nevertheless, certain multipartite states that have symmetry (e.g., GHZ states,
graph states, stabilizer states, etc.) have a non-trivial stabilizer group and consequently rich
entanglement properties [147].
In [117], it was demonstrated that any pure-state transformation attainable by LOCC
using a finite number of communication rounds can also be accomplished using SEP1 . How-
ever, not every pure-state transformation possible with SEP is achievable with SEP1 . These
findings underscore that SEP1 serves as a robust outer approximation of LOCC, particularly
given that infinite rounds of classical communication are less feasible in practice.
The concept of entanglement of assistance was first introduced in [65], and the example
given in (14.163) that demonstrates that it is not a tripartite entanglement monotone was
taken from [96]. The result that asymptotic entanglement of assistance is equal to the smaller
of its two local entropies was discovered in [201]. The concept of localizable entanglement was
first introduced in the context of spin chains in [219], and its comparison with entanglement
of collaboration can be found in [87].
Monogamy of entanglement was first introduced in [53], in which the CKW monogamy
relation was discovered. The monogamy of the squashed entanglement was discovered
in [50]. The concept of "monogamy of entanglement without inequalities" was first introduced
in [104] and developed further in [106]. Additional references on monogamy of
entanglement can be found in those papers.
CHAPTER 15

THE RESOURCE THEORY OF ASYMMETRY
G-Invariant States
In the resource theory of asymmetry, each element g ∈ G is represented by a unitary matrix
U_g. If ρ ∈ D(A) represents the density matrix of a quantum system with respect to Alice's
reference frame, then the state of the same physical system with respect to Bob’s reference
frame is given by
Ug (ρ) := Ug ρUg∗ (15.1)
If Alice and Bob are unaware of the element g ∈ G that establishes the relation between
their reference frames, then the states that Alice can prepare relative to Bob’s reference
frame are those satisfying ρ = Ug (ρ) for all g ∈ G. Such states are referred to as G-invariant
and satisfy [ρ, Ug ] = 0, as indicated by Definition C.3.2.
The absence of a shared reference frame places a limitation on the types of states that
Alice can generate relative to Bob’s reference frame. She is only capable of creating G-
invariant states, which comprise the free states in the QRT of reference frames, denoted
as
F(A) = INVG (A) := {ρ ∈ D(A) : Ug (ρ) = ρ ∀g ∈ G} . (15.2)
For instance, suppose the group G = U (1) corresponds to an optical phase reference
or to dynamics with rotational symmetry around a fixed axis (in which case the group is
SO(2), which is known to be isomorphic to the group U (1)). In this scenario, a unitary
representation of G is provided by Uθ = eiN̂ θ , where θ ∈ U (1) and N̂ is the total number
operator (or in the case of rotational symmetry, N̂ can be replaced with Ln , the angular
momentum P operator in the n direction). In this instance, the free states are given by states
of the form n pn |n⟩⟨n|, where |n⟩ corresponds to the eigenvectors of N̂ .
More generally, it will be observed that the absence of a shared reference frame enforces
a superselection rule regarding the types of states that Alice can generate. This superselec-
tion rule is characterized by the fact that coherent superpositions between states in specific
subspaces are not feasible. For instance, in the U(1) case, coherent superpositions among
eigenstates of the number operator with different eigenvalues are not free and cannot be prepared by Alice.
G-Covariant Channels
The set of free operations in the QRT of reference frames can be defined similarly to the
free states. Let σ ∈ D(B) be an arbitrary density matrix of system B described in Bob’s
reference frame. Suppose Alice performs a quantum operation on this system described
by the channel E ∈ CPTP(A → A) in her reference frame. How would this operation be
described in Bob’s reference frame? If Bob knows that their reference frames are linked by
an element g ∈ G, then Ug∗ (σ) is Alice’s description of the initial state, and E(Ug∗ (σ)) is her
description of the final state. Therefore, the final state in Bob’s reference frame is given by
Ug ◦ E ◦ Ug∗ (σ), and his description of Alice’s operation is Ug ◦ E ◦ Ug∗ .
Hence, if Alice and Bob are unaware of the value of g ∈ G, they will have the same
description of the CPTP map E only if E satisfies
E = U_g ◦ E ◦ U_g^*   ∀ g ∈ G .  (15.3)
Quantum channels of this kind are referred to as G-covariant, and they represent the free
operations in the QRT of asymmetry. Similar to G-invariant states, a quantum channel is
G-covariant if and only if it commutes with Ug for all g ∈ G.
Therefore, the set of free operations in the QRT of reference frames can be expressed as
F(A → A) = { E ∈ CPTP(A → A) : [E, U_g] = 0   ∀ g ∈ G } ,  (15.4)
where [E, U_g] := E ◦ U_g − U_g ◦ E (see Fig. 15.1 below). For instance, for G = U(1), a
G-covariant quantum channel E ∈ CPTP(A → A) satisfies, for all θ ∈ [0, 2π) and all ρ ∈ D(A),
E( e^{iθN̂} ρ e^{−iθN̂} ) = e^{iθN̂} E(ρ) e^{−iθN̂} .  (15.5)
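As a simple sanity check of (15.5), the completely dephasing channel in the number basis is manifestly U(1)-covariant. The following minimal NumPy sketch (which truncates the number basis to a finite dimension purely for illustration) verifies the covariance relation for a random state and a random phase.

import numpy as np

d = 5                                      # truncation of the number basis (illustration only)

def dephase(rho):                          # E(ρ) = Σ_n |n⟩⟨n| ρ |n⟩⟨n|
    return np.diag(np.diag(rho))

rng = np.random.default_rng(6)
A = rng.normal(size=(d,d)) + 1j*rng.normal(size=(d,d))
rho = A @ A.conj().T; rho /= np.trace(rho)

theta = 0.7
U = np.diag(np.exp(1j*theta*np.arange(d)))        # e^{iθN̂} on the truncated space
lhs = dephase(U @ rho @ U.conj().T)
rhs = U @ dephase(rho) @ U.conj().T
print(np.max(np.abs(lhs - rhs)))                  # ~0: E satisfies (15.5)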
If the G-covariant channel is a unitary channel, E(·) = V(·)V^*, covariance implies that for every g ∈ G there exists a phase ω_g ∈ C with |ω_g| = 1 such that
Ug V Ug∗ = ωg V . (15.7)
Since this equation holds for all g ∈ G, it follows that the map g 7→ ωg is a 1-dimensional
representation of G. Specifically, when g = e is the identity element, we have Ug = I which
gives ωg = 1. Furthermore, for g, h ∈ G, we have
∗
ωgh V = Ugh V Ugh
= Ug Uh V Uh∗ Ug∗
(15.8)
= ωh Ug V Ug∗
= ωh ωg V ,
which implies that ωgh = ωg ωh . In other words, the set of all G-covariant unitary chan-
nels can be characterized by unitary matrices that are “almost” G-invariant, meaning they
commute with the elements of the group up to a phase, where this phase itself forms a
1-dimensional representation of G.
In conclusion, we have observed that in the QRT of reference frames, the set of free
states is the set of symmetric states (i.e., those states that commute with Ug for all g ∈ G),
and the set of free operations is the set of symmetric operations (i.e., those operations that
commute with Ug for all g ∈ G). Symmetric evolutions are prevalent in physics and may arise
in various contexts, not just from the absence of a shared reference frame. Therefore, the
set of G-covariant operations defines a resource theory with applications extending beyond
quantum reference frames. It may be referred to as a QRT of asymmetry because in any
QRT in which F specifies a set of G-covariant operations, asymmetric states and asymmetric
operations are the resources of the theory.
Thus far, we have only examined G-covariant channels with the same input and output
dimensions. More generally, a quantum channel E ∈ CPTP(A → B) is G-covariant with
respect to two (unitary) representations of G, {U_g^A}_{g∈G} and {U_g^B}_{g∈G}, if
E^{A→B} ◦ U_g^A = U_g^B ◦ E^{A→B}   ∀ g ∈ G .  (15.9)
Refer to Fig. 15.1 for an illustrative depiction of G-covariant operations. The set of all
G-covariant quantum channels in CPTP(A → B) will be denoted by COVG (A → B). It is
worth noting that this notation does not explicitly specify the two unitary representations
of G, {UgA }g∈G and {UgB }g∈G . The representations used will be clear from the context.
Figure 15.1: Heuristic description of G-covariant operations. The channel E is G-covariant if for
every choice of group element g ∈ G, the blue and purple pathways yield the same outcome.
G-Covariant Measurements
The result obtained from a quantum measurement, often referred to as the classical outcome,
provides a form of information known as “speakable information.” This type of information
can be effectively communicated between parties who do not share a common reference
frame. Let’s consider an example where Alice and Bob do not have a shared Cartesian
reference frame. Suppose Alice performs a measurement on the spin of an electron in the z-
direction relative to her reference frame and obtains an outcome of “up” (indicating that the
electron’s spin is pointing in the positive z-direction). Alice can then transmit this outcome
to Bob, allowing him to determine that the electron’s spin is aligned with the positive z-
direction in relation to Alice’s frame. Therefore, even though the specific information about
the z-direction itself cannot be conveyed, the measurement outcome, i.e., the “up”/“down”
information, can be effectively communicated between the parties involved.
Consequently, we make the assumption that the group G associated with the resource
theory of asymmetry has a trivial action on classical systems that represent measurement
outcomes. Moving forward, in Section 3.5.10, we observed that a general quantum mea-
surement can be characterized by a quantum instrument denoted as E ∈ CPTP(A → BX),
where X represents the classical outcome of the measurement. We refer to E as a G-covariant
quantum instrument if it satisfies the condition:
E A→BX ◦ UgA→A = UgB→B ◦ E A→BX ∀g∈G. (15.10)
The collection of all such G-covariant quantum instruments is denoted by COVG (A → BX).
Every quantum instrument E A→BX as discussed above can be expressed as
E^{A→BX} = Σ_{x∈[m]} E_x^{A→B} ⊗ |x⟩⟨x|^X ,  (15.11)
where m ∈ N, and each Ex ∈ CP(A → B). If E A→BX is G-covariant the relation above in
conjunction with (15.10) implies that for all x ∈ [m] we have
ExA→B ◦ UgA→A = UgB→B ◦ ExA→B ∀g∈G. (15.12)
In other words, the quantum instrument E A→BX is G-covariant if and only if each CP map
ExA→B is G-covariant.
A special type of G-covariant quantum instrument is a G-covariant POVM. We obtain a G-covariant
POVM by taking the system B above to be trivial (i.e., |B| = 1), so that for each
x ∈ [m] and every ρ ∈ L(A), E_x^{A→B}(ρ^A) = Tr[Λ_x^A ρ^A] for some Λ_x^A ∈ Eff(A), and the set
{Λ_x^A}_{x∈[m]} is a POVM. Now, for a trivial system B, the condition given in (15.12) becomes
equivalent to
Tr[Λx ρ] = Tr[Λx Ug (ρ)] = Tr[Ug∗ (Λx ) ρ] ∀ g ∈ G. (15.13)
Since the condition above holds for all ρ ∈ L(A) we must have Λx = Ug∗ (Λx ) for all g ∈ G
and x ∈ [m]. In other words, a POVM {Λx }x∈[m] is G-covariant if and only if each element
Λx is G-invariant; i.e., each Λx satisfies [Λx , Ug ] = 0 for all g ∈ G.
The averaging CPTP map is known as the G-twirling map (see Sec. C.4). If the group G is
finite, the integral is replaced by a discrete sum over the |G| elements of the group, that is,
G(ρ) = (1/|G|) Σ_{g∈G} U_g(ρ).
The free states in this QRT have a very particular structure. First, note that ρ ∈ F(A)
if and only if it is G-invariant, meaning that Ug (ρ) = ρ for all g. In particular, G(ρ) = ρ
for all ρ ∈ F(A). Combining this with the definition of F(A) implies that G-twirling is a
resource-destroying map (see Definition 9.3.3). Additionally, one can characterize the free
states using techniques from representation theory. In particular, Theorem C.3.3 states that
ρ ∈ D(A) is free if and only if ρA has the following form:
ρ^A = ⊕_{λ∈Irr(U)} u^{B_λ} ⊗ ρ_λ^{C_λ}   where   ρ_λ^{C_λ} := Tr_{B_λ}[ Π_λ ρ^A Π_λ ] .  (15.15)
Moreover, note that the above expression implies the following corollary.
The above corollary demonstrates that the G-twirling operation eliminates any correla-
tions among distinct irreducible representations. For example, let us consider the case where
G = U(1). As this group is Abelian, it has only one-dimensional irreps (i.e., |Bλ | = 1). The
irreps of U(1) are labeled by integers λ = k ∈ Z, and the k-th irrep uk : U(1) 7→ C is of the
form:
uk (θ) = eikθ ∀θ ∈ U(1) . (15.17)
In this context, we will consider an infinite dimensional (separable) Hilbert space denoted
by A with basis vectors |n⟩ where n belongs to the set of integers Z. The “number” operator
which generates the U(1) symmetry can be defined as follows:
N̂ := Σ_{n∈Z} n|n⟩⟨n| ,  (15.18)
Note that we allow negative values of n and work with the representation θ 7→ eiN̂ θ .
For each irrep on a single copy of A, the multiplicity space is trivial (i.e., |Cλ | = 1), and
the G-twirling operation can be easily represented as
G(·) = Σ_{k∈Z} |k⟩⟨k|(·)|k⟩⟨k| ,  (15.19)
which means that G is the completely dephasing channel with respect to the basis {|k⟩}_{k∈Z} in
this case.
However, when considering ℓ copies of A, the multiplicity space of a given irrep is usually
not trivial, and as a result, the G-twirling operation is not equivalent to the dephasing
channel. Specifically, let N̂x be the number operators associated with system Ax for each
x ∈ [ℓ]. Consider the unitary representation on A^ℓ = (A_1, . . . , A_ℓ) defined by
θ ↦ ⊗_{x∈[ℓ]} e^{iN̂_x θ} = e^{iN̂_tot θ} ,   where   N̂_tot := Σ_{x∈[ℓ]} N̂_x .  (15.20)
In this case, the irreps are labeled by the eigenvalues n ∈ Z of the total number operator
N̂_tot. While the representation space B_n (i.e., B_λ with λ = n) is trivial (i.e., one dimensional)
for every irrep λ = n, the multiplicity space C_n is not. Let
Π_n^{(ℓ)} := Σ_{k_1+···+k_ℓ=n, k_1,...,k_ℓ∈Z} |k_1⟩⟨k_1| ⊗ · · · ⊗ |k_ℓ⟩⟨k_ℓ|  (15.21)
be the projection onto the eigenspace of N̂_tot corresponding to the eigenvalue n. Using this
notation, the G-twirling operation can be expressed as
G_ℓ(·) = Σ_{n∈Z} Π_n^{(ℓ)} (·) Π_n^{(ℓ)} .  (15.22)
Note that Π_n^{(ℓ)} is the projection onto the multiplicity space C_n. This space is often referred
to as a decoherence-free subspace, as any pure state ψ ∈ Pure(C_n) is U(1)-invariant, i.e.,
G_ℓ(ψ) = ψ. For example, if ℓ = 3, any linear combination of |011⟩, |101⟩, and |110⟩ is an
eigenvector of N̂_tot corresponding to the eigenvalue 2. Therefore, the coherence of any state
in the span of these three vectors remains unaffected by the G-twirling operation.
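The ℓ-copy twirling (15.21)–(15.22) is easy to implement when each mode is truncated to occupations {0, 1} (a qubit truncation, assumed here only for illustration). The sketch below verifies that a coherent superposition inside the multiplicity space C_2 of three modes is left untouched by G_3, while coherence between different total occupation numbers is removed.

import numpy as np
from itertools import product

ell = 3
dim = 2**ell
# Projectors Π_n^{(ℓ)} of (15.21), each mode truncated to occupations {0,1}.
projs = {}
for bits in product([0, 1], repeat=ell):
    n = sum(bits)
    idx = int("".join(map(str, bits)), 2)
    P = projs.setdefault(n, np.zeros((dim, dim)))
    P[idx, idx] = 1.0

def twirl(rho):                           # G_ℓ(ρ) = Σ_n Π_n ρ Π_n, cf. (15.22)
    return sum(P @ rho @ P for P in projs.values())

# A coherent superposition inside the eigenvalue-2 multiplicity space C_2.
v = np.zeros(dim, dtype=complex)
v[0b011], v[0b101], v[0b110] = 1/np.sqrt(3), 1j/np.sqrt(3), -1/np.sqrt(3)
rho = np.outer(v, v.conj())
print(np.max(np.abs(twirl(rho) - rho)))        # ~0: states in C_2 are invariant

# A superposition of different total occupation numbers is dephased.
w = (np.eye(dim)[0b000] + np.eye(dim)[0b111]) / np.sqrt(2)
sigma = np.outer(w, w)
print(abs(twirl(sigma)[0b000, 0b111]))         # 0: coherence between n = 0 and n = 3 is removed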
The G-twirling operation can be applied to quantum channels as well. Suppose Alice
applies a quantum operation E ∈ L(A → A) to her system. If Bob knows the relation
between their reference frames, then he can describe the operation relative to his own system
as U_g ◦ E ◦ U_g^*, where U_g is the unitary channel that relates Alice's and Bob's frames. However,
in the absence of a shared reference frame, Bob cannot use this description. Instead, the
channel E appears to him as a mixture of the form ∫_G dg U_g ◦ E ◦ U_g^*. In order for Alice and
Bob to have the same description of the channel, the condition E = ∫_G dg U_g ◦ E ◦ U_g^* must
be satisfied. This integral is a type of twirling operation applied to the channel E.
Exercise 15.2.1. Show that the G-twirling map is unital and idempotent; i.e. G ◦ G = G.
For compact Lie groups the G-twirling is defined in terms of an integral over the group.
From the Carathéodory theorem (see Theorem A.3.2) it follows that the G-twirling can be
expressed as a finite convex combination of unitary channels U_g. To see why, for every
g ∈ G let |ψ_g^{AÃ}⟩ := (U_g^A ⊗ I^Ã)|Ω^{AÃ}⟩, and let C be the convex hull of the set {ψ_g^{AÃ}}_{g∈G}. Note
that C ⊂ R, where R is a subspace of Herm(AÃ) given by
R := { Λ^{AÃ} ∈ Herm(AÃ) : Λ^A ∝ I^A , Λ^Ã ∝ I^Ã } ,  (15.23)
Show that
G_1^{⊗k} ◦ G_k = G_1^{⊗k} .  (15.28)
Exercise 15.2.4. Let ρ ∈ D(A) and α ∈ R_+. Show that if ρ is G-invariant then so is
ρ^α/Tr[ρ^α]. Hint: Use (15.15).
Exercise 15.2.5. Let g ↦ U_g be a projective unitary representation of a finite or compact
Lie group G. For each λ ∈ Irr(U), let U_g^{(λ)} be the reduction of U_g to the space B_λ as given
in (C.45). Show that for every ρ ∈ L(B_λ) we have
∫_G dg U_g^{(λ)} ρ^{B_λ} U_g^{*(λ)} = Tr[ρ^{B_λ}] u^{B_λ} .  (15.29)
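Equation (15.29) is simply Schur's lemma in integral form, and it can be checked approximately by Monte Carlo sampling of the Haar measure. The sketch below does this for the two-dimensional irrep of SU(2); sampling Haar-random U(2) suffices, since the global phase cancels under conjugation (the sample size and seed are arbitrary choices).

import numpy as np

def haar_u2(rng):
    z = rng.normal(size=(2,2)) + 1j*rng.normal(size=(2,2))
    q, r = np.linalg.qr(z)
    return q @ np.diag(np.diag(r)/np.abs(np.diag(r)))

rng = np.random.default_rng(7)
rho = np.array([[0.8, 0.3-0.2j], [0.3+0.2j, 0.2]])   # an arbitrary ρ on B_λ with d_λ = 2

avg = np.zeros((2,2), dtype=complex)
n_samples = 20000
for _ in range(n_samples):
    U = haar_u2(rng)               # global phase cancels in U ρ U*, so U(2) sampling suffices
    avg += U @ rho @ U.conj().T
avg /= n_samples

print(np.round(avg, 3))            # ≈ Tr[ρ] · I/2, i.e. Tr[ρ] times the maximally mixed u^{B_λ}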
g′ := hgh^{-1} →  = ∫_G dg′ p(hg′h^{-1}) U_{g′} ρ U_{g′}^*  (15.31)
p is a class function →  = ∫_G dg′ p(g′) U_{g′} ρ U_{g′}^* = G_p(ρ) .
Since ρ ∈ L(A) was arbitrary, we conclude that U_h^* ◦ G_p ◦ U_h = G_p for all h ∈ G. This
completes the proof.
In the definition of the weighted G-twirling we assumed that p(g) is an arbitrary prob-
ability density over G. However, as we saw earlier, thanks to Carathéodory’s theorem, the
G-twirling can be expressed as a finite convex combination of unitary channels of the form
Ug (·)Ug∗ . The same arguments can be applied to Gp , so we can assume, without loss of
generality, that p is a discrete probability distribution, i.e., p ∈ Prob(d) for some d ⩽ m^4,
where m := |A|. Hence, the weighted G-twirling takes the form:
G_p(·) = Σ_{x∈[d]} p_x U_{g_x}(·)U_{g_x}^* .  (15.32)
Exercise 15.2.6. Give an example of a group G, a probability distribution p(g) ̸= δ(g), and
a state ρ ∈ D(A) such that Gp (ρ) = ρ but ρ ̸∈ INVG (A).
U_g^E ≅ ⊕_{λ∈Irr(U^E)} U_g^{(λ)} ⊗ I^{C_λ} ,  (15.34)
where each U_g^{(λ)} acts irreducibly on B_λ. We will denote by u_{m′m}^{(λ)}(g) the (m′, m)-component of
U_g^{(λ)}, and use the indices λ, m, and x to label the basis of E as given in (C.46). The index
x corresponds to the multiplicity index.
Definition 15.2.1. Let g 7→ UgE be as above, and let g 7→ UgA and g 7→ UgB be two
additional projective unitary representations of G. We say that a set of operators
{Kλ,m,x }λ,m,x ⊂ L(A, B) is an irreducible tensor operator with respect to the three
projective unitary representations on systems A, B, and E, if its elements are
orthonormal and satisfy for all λ, m, and x,
U_g^B K_{λ,m,x} U_g^{*A} = Σ_{m′} u_{m′m}^{(λ)}(g) K_{λ,m′,x}   ∀ g ∈ G ,  (15.35)
where u_{m′m}^{(λ)}(g) is the (m′, m)-component of the matrix U_g^{(λ)} that appears in (15.34).
Remark. The orthonormality condition for an irreducible tensor operator is defined in terms
of the Hilbert-Schmidt inner product. Specifically, we make the assumption that:
Tr[ K_{λ′,m′,x′}^* K_{λ,m,x} ] = δ_{λλ′} δ_{mm′} δ_{xx′} .  (15.36)
The condition stated in Equation (15.35) imposes constraints not only on the elements
of the irreducible tensor operator, but also on the three representations UgA , UgB , and UgE .
This can be illustrated by the following exercise, where it is shown that the cocycle of the
map g 7→ UgE is entirely determined by the cocycles of g 7→ UgA and g 7→ UgB .
Exercise 15.2.7. Suppose the representations g ↦ U_g^A and g ↦ U_g^B have cocycles
{ e^{iθ^A(g,h)} }_{g,h∈G}   and   { e^{iθ^B(g,h)} }_{g,h∈G} ,  (15.37)
It is important to note that if A = B in the definition given above, then the exercise
shows that the representation g 7→ UgE is non-projective. Additionally, we emphasize that
the irreps λ ∈ Irr(U E ) used in the definition of the irreducible tensor operator may not
necessarily be the same irreps that appear in the decompositions of UgA or UgB . Therefore,
the dimension of the system E = span{|λ, m, x⟩^E} depends on the irreps λ ∈ Irr(U^E) that
appear in the decomposition of U_g^E. Specifically, the components u_{mm′}^{(λ)}(g) of U_g^{(λ)} appear in
the decomposition of U_g^E and not in the decompositions of U_g^A or U_g^B.
Proof. We first prove that the channel given in (15.39) is G-covariant. Indeed, for any
ρ ∈ L(A) we have
U_g^B ◦ E ◦ U_g^{*A}(ρ) = Σ_{λ,m,x} (U_g^B K_{λ,m,x} U_g^{*A}) ρ (U_g^B K_{λ,m,x} U_g^{*A})^*
(15.35)→ = Σ_{λ,m,x,k,k′} u_{km}^{(λ)}(g) ū_{k′m}^{(λ)}(g) K_{λ,k,x} ρ K_{λ,k′,x}^*  (15.40)
U_g^{(λ)} is a unitary matrix→ = Σ_{λ,x,k,k′} δ_{kk′} K_{λ,k,x} ρ K_{λ,k′,x}^* = E(ρ) .
Hence, E is G-covariant.
Conversely, suppose E is a G-covariant quantum channel. Let {K_x}_{x∈[n]} ⊂ L(A, B) be a
canonical Kraus decomposition of E (see Corollary 3.4.2). Since E is G-covariant, it follows
that for all g ∈ G and ρ ∈ L(A)
E(ρ) = U_g^B ◦ E ◦ U_g^{*A}(ρ) = Σ_{x∈[n]} (U_g^B K_x U_g^{*A}) ρ (U_g^B K_x U_g^{*A})^* .  (15.41)
Therefore, the set {U_g^B K_x U_g^{*A}}_{x∈[n]} also forms a canonical Kraus decomposition of E. Now,
recall from Sec. 3.4.4 that every two operator-sum representations of E that have the same
number of elements are related by a unitary matrix. Therefore, for any g ∈ G there exists
an n × n unitary matrix U_g^E = (u_{xz}(g)) ∈ L(E), with n := |E|, such that for all x ∈ [n] we
have
U_g^B K_x U_g^{*A} = Σ_{z∈[n]} u_{zx}(g) K_z .  (15.42)
Furthermore, since {K_z}_{z∈[n]} are linearly independent (as they are orthonormal in the Hilbert-Schmidt
inner product), it follows that for every g ∈ G there is a unique U_g^E that satisfies
equation (15.42). Additionally, using the notation given in (15.37) for the cocycles, we
get that for all g, h ∈ G
Σ_{z∈[n]} (U_{gh}^E)_{zx} K_z = U_{gh}^B K_x U_{gh}^{*A} = e^{i(θ^B(g,h)−θ^A(g,h))} Σ_{z′∈[n]} (U_g^E U_h^E)_{z′x} K_{z′} ,
and hence, by the linear independence of the K_z,
U_{gh}^E = e^{i(θ^B(g,h)−θ^A(g,h))} U_g^E U_h^E .  (15.44)
That is, the mapping g ↦ U_g^E is a projective unitary representation of the group G. Finally,
using the unitary freedom in the choice of the canonical Kraus decomposition {K_z}_{z∈[n]} of
E (see Exercise 3.4.21), we choose it in such a way that U_g^E is block-diagonal with respect
to the irreps of G. In this basis, U_g^E = ⊕_λ U_g^{(λ)} ⊗ I^{C_λ}, so we can denote the Kraus operators
by {K_{λ,m,x}} (with λ the irrep label and x the multiplicity index). This completes the
proof.
Exercise 15.2.8. Extend the theorem above to CP maps that are not necessarily trace pre-
serving. That is, show that E ∈ CP(A → B) is G-covariant if and only if it can be expressed
as in (15.39).
To illustrate the theorem above, we will provide a few examples. Let’s begin with the case
of a covariant unitary channel. As we mentioned earlier, a unitary channel E(·) = V (·)V ∗ ,
where V : A → A is a unitary matrix, is covariant if and only if (15.7) holds for all g ∈ G.
Here, g 7→ ωg is a 1-dimensional unitary representation of G. As we will illustrate now, the
theorem mentioned above can be used to derive the same conclusion.
Indeed, since a unitary channel has only one Kraus operator, the unitary representation
g 7→ UgE must be an irreducible representation (therefore, a single λ), one-dimensional (thus,
a single m), and with no multiplicity (a single x). This implies that |E| = 1, so we have
UgE = ωg for some ωg ∈ C where |ωg | = 1. Let us denote the single Kraus operator of E by
V = K_{λ,m,x}. In this case, the relation (15.35) can be expressed as follows:
U_g V U_g^* = ω_g V   ∀ g ∈ G .  (15.45)
It is worth noting that if there exists g ∈ G such that ω_g ≠ 1 (i.e., V is not G-invariant),
then G(V) = 0. To see why, consider taking the integral over G (with respect to the Haar
measure) on both sides of the equation above:
G(V) = cV   where   c := ∫_G dg ω_g .  (15.46)
Since g ↦ ω_g is a one-dimensional representation of G, the orthogonality of irreducible characters
implies that c = 0 unless ω_g = 1 for all g ∈ G, and c = 1 in this case. Therefore, if V is not G-invariant, we must have c = 0,
and consequently G(V) = 0.
As another example, let's consider the group U(1). The Kraus operators of a U(1)-covariant
channel can be labeled as K_{k,α} ∈ L(A), where α is the multiplicity index. Then,
from (15.35) we get that
e^{iθN̂} K_{k,α} e^{−iθN̂} = e^{ikθ} K_{k,α}   ∀ θ ∈ [0, 2π) .  (15.47)
Note that the irreducible representations of U (1) are one-dimensional. As a result, the
Kraus operators are not mixed with one another under the action of U (1). This provides a
significant simplification compared to the non-Abelian case.
Any Kraus operator K_{k,α} that satisfies (15.47) must have the form (see Exercise 15.2.9)
K_{k,α} = S_k D_{k,α} ,  (15.48)
where
S_k := Σ_{n∈Z} |n + k⟩⟨n|  (15.49)
is the “shift” operator, and Dk,α are diagonal operators in L(A); i.e., ⟨n|Dk,α |n′ ⟩ = 0 for
n ̸= n′ . Note that in the infinite-dimensional Hilbert space A, the shift operator Sk is unitary.
Therefore, in the QRT of U (1)-asymmetry, the set of free unitary operations consists of
diagonal unitaries, shift operators, and combinations of the two.
Exercise 15.2.9. Prove (15.48). Hint: Substitute K_{k,α} = Σ_{n,n′} c_{nn′}^{(k,α)} |n⟩⟨n′| in (15.47).
P
Exercise 15.2.10. Let G be the U(1)-twirling map, and Sk the shift operator. Show that Sk
is not U(1)-invariant by showing that G(Sk ) = 0. Still, we emphasize that E(·) = Sk (·)Sk∗ is
U(1)-covariant.
Find the general form of the Kraus operators constituting the operator sum representation of
a Zn -covariant channel with respect to the representation k 7→ Gk .
Remark. If E is G-covariant then in the proof below we will see that the representation g ↦
U_g^E is given by the induced representation of E. The components of the matrix Ū_g^E = (U_g^{*E})^T
equal the complex conjugates of the corresponding components of U_g^E. Recall that g ↦ Ū_g^E
is also a projective unitary representation (see Exercise C.3.1).
Proof. Suppose E has the form E(ρ) = Tr_E[V ρV^*], where the isometry V satisfies (15.51).
Using the standard Stinespring dilation theorem, we know that E ∈ CPTP(A → B).
where Ū_g^E(·) := Ū_g^E (·)(U_g^E)^T .
Conversely, suppose E ∈ COV_G(A → B). Let {K_{λ,m,x}} be its canonical covariant Kraus
decomposition as given in Theorem 15.2.2, and define the isometry V : A → BE as
V := Σ_{λ,m,x} K_{λ,m,x} ⊗ |λ, m, x⟩^E ,  (15.53)
where E := span{|λ, m, x⟩^E} is the induced space of E, decomposed according to the irreps
of G that appear in the induced representation of E. By definition, for all ρ ∈ L(A) we have
Tr_E[V ρV^*] = Σ_{λ,m,x} K_{λ,m,x} ρ K_{λ,m,x}^*
(15.39)→ = E(ρ) .  (15.54)
Therefore, it is left to show that V satisfies (15.51). Indeed, taking g ↦ U_g^E to be the induced
representation of E we get
(U_g^B ⊗ Ū_g^E) V U_g^{*A} = Σ_{λ,m,x} (U_g^B K_{λ,m,x} U_g^{*A}) ⊗ Ū_g^E |λ, m, x⟩^E
(15.35)→ = Σ_{λ,x} Σ_{m,m′} u_{m′m}^{(λ)}(g) K_{λ,m′,x} ⊗ Ū_g^E |λ, m, x⟩^E  (15.55)
Σ_m u_{m′m}^{(λ)}(g)|λ,m,x⟩ = (U_g^E)^T |λ,m′,x⟩ →  = Σ_{λ,x} Σ_{m′} K_{λ,m′,x} ⊗ Ū_g^E (U_g^E)^T |λ, m′, x⟩^E
Ū_g^E (U_g^E)^T = I^E →  = V .
When considering a channel E ∈ COVG (A → A), a slightly different version of the covari-
ant Stinespring dilation theorem is obtained. Note that in this case, not only is the output
system B replaced with the input system A, but the same projective unitary representation
is also considered on both the input and output systems of E. This enables us to obtain a
covariant Stinespring dilation theorem that involves a G-invariant unitary matrix.
Remark. The matrix W^{AE} in the theorem above is G-invariant with respect to the projective
unitary representation g ↦ U_g^A ⊗ Ū_g^E, where the representation g ↦ U_g^E is the induced
representation of E. Moreover, from Theorem C.4.2 it follows that a G-invariant pure state
always exists. To see this, using the same notation as in Theorem C.4.2, we note that for every ψ ∈
Pure(E) the vector Π|ψ⟩ is proportional to a G-invariant state.
Proof. The proof that (15.56) implies that E is G-covariant follows similar lines to the ones
appearing in the proof of Theorem 15.2.3 (we leave the details to Exercise 15.2.12). For the
converse, if E ∈ COV_G(A → A), Theorem 15.2.3 states that there exists an intertwining
isometry V : A → AE such that
(U_g^A ⊗ Ū_g^E) V = V U_g^A   and   E(ρ) = Tr_E[V ρV^*] .  (15.57)
Now, let |0⟩ ∈ E be a G-invariant state and define à := {|ψ A ⟩ ⊗ |0⟩E : |ψ⟩ ∈ A}. Clearly
à is a subspace of AE and we define the isometry Ṽ : à → AE via
such as communication and computation, it does impose limitations, reducing their practical
efficiency. This often requires more advanced encodings. Hence, parties may prioritize
allocating communication resources to establish a shared reference frame initially. Later,
they can utilize a standard encoding instead of continuously circumventing its absence with
a relational encoding.
In tasks aimed at establishing a shared reference frame, parties can employ quantum par-
ticles to encode information regarding the relative orientation of their frames. For instance,
spin-1/2 particles, like electrons, can encode the orientation of Cartesian frames, while ex-
changing quantum states of an optical mode can align phase references. Hence, in the realm
of quantum reference frames, which involve quantum particles holding information about a
shared reference frame, the usefulness of a quantum state is determined by the amount of
information that can be extracted from it to establish such a reference frame.
The above discussion illustrates that the resource theory of quantum reference frames
introduces certain aspects that differ from what we have encountered thus far. Specifically,
when Alice and Bob do not share a reference frame, Alice can gain at least partial information
about Bob’s reference frame by receiving a resource in the form of a quantum state that
encodes it. As a result, the set of free operations (i.e., G-covariant operations) needs to
be updated to incorporate this partial information. For instance, instead of using regular
G-twirling operations, weighted G-twirling operations can be employed, taking into account
that the parties have partial knowledge about the element g ∈ G that relates their reference
frames.
We now discuss the general approach to aligning reference frames, making the notions discussed
above rigorous. Consider two parties, Alice and Bob, who do not share a reference
frame, with G being the corresponding group describing the reference frame. The goal is for
Bob to learn the element g ∈ G that relates his reference frame to Alice's reference
frame. To accomplish this, Alice sends Bob a quantum reference frame (e.g., spin-1/2
particles pointing in the z-direction of her Cartesian reference frame) in the form of a quantum
state ρ ∈ D(A). From Bob's perspective, he has received one of the states {U_g ρ U_g^*}_{g∈G}, all
occurring with uniform prior.
To determine the specific state he possesses, i.e., to identify the group element g ∈ G,
Bob conducts a POVM, {Λg }g∈G , on his system. Consequently, the probability that Bob
guesses the group element to be g′, given that the actual element is g, is given by
q(g′|g) := Tr[ Λ_{g′} U_g ρ U_g^* ] .  (15.60)
In order to quantify how much information Bob gained after the measurement, consider first
the case that G is a finite group. In this senario, we can use the probability that Bob guess
g correctly as our figure of merit and maximize this function over all states and all POVMs.
For a given state ρ ∈ D(A) and a POVM {Λg }g∈G , this probability is given by:
1 X 1 X
Tr Λg Ug ρUg∗ .
Prguess (ρ, {Λg }g∈G ) := p(g|g) = (15.61)
|G| g∈G |G| g∈G
Thus, Alice and Bob’s objective is to maximize this guessing probability across all possible
ρ (referred to as the fiducial state) and all POVMs {Λg }g∈G .
Conversely, if G is a compact Lie group, the chance of Bob correctly inferring Alice’s
reference frame becomes infinitesimally small. In such instances, the direct likelihood or
guessing probability cannot serve as an effective figure of merit. Instead, the maximum
likelihood of a correct guess is adopted as the figure of merit. This maximum likelihood, akin
to the formula above but integrated over the group, is defined as:
Z Z
dg Tr Λg Ug ρUg∗ ,
µmax := max dg p(g|g) = max (15.62)
G G
with the maximization conducted over all fiducial states ρ ∈ D(A) and all POVMs {Λg }g∈G .
Given that the guessing probability in (15.62) is linear in ρ, the maximal value can always
be achieved with a pure state, allowing us to assume, for simplification, that the fiducial
state ρ = ψ is pure.
L From Theorem C.3.2 it follows that the Hilbert space A can be decomposed as A =
λ∈Irr(U ) Bλ ⊗ Cλ , where for each irrep λ, Bλ denotes the representation space, and Cλ
denotes the multiplicity space. We will denote by dλ := |Bλ | and mλ := |Cλ |. With these
notations we can write any pure state ψ ∈ Pure(A) as
M
|ψ⟩ = cλ |ψλ ⟩ (15.63)
λ∈Irr(U )
where each ψλ ∈ Pure(Bλ Cλ ), and cλ ∈ C with λ∈Irr(U ) |cλ |2 = 1. In the following theorem
P
Theorem 15.2.5. Using the notations above, the maximum likelihood µmax as
defined in (15.62) is given by
X
µmax = dλ nλ . (15.64)
λ∈Irr(U )
Proof. Let ψ and {Λg }g∈G be the optimal state and POVM that maximizes the maximum
likelihood. Expressing ψ as in (15.63), observe that due i Schmidt decomposition of |ψλ ⟩
h to the
there exists an orthogonal projector Πλ such that Tr Πλ = nλ and I Bλ ⊗ ΠC
Cλ Cλ
λ |ψλ ⟩ = |ψλ ⟩.
λ
Substituting the above inequality into (15.62) we get that the maximum likelihood is bounded
from above by: Z
µmax ⩽ max dg Tr [Λg Π]
G
Z X (15.67)
dg Λg = I −−−−→ = Tr[Π] = dλ nλ .
G
λ∈Irr(U )
λ∈Irr(U )
where ΠC Cλ
P
λ := x∈[nλ ] |x⟩⟨x| . Let Π be the projector appearing on the right hand side of
λ
the equation above and observe that it satisfies Π|ψ⟩ = |ψ⟩ and [Ug , Π] = 0. Therefore, the
set {I A − Π} ∪ {Λg }g∈G is a POVM. Moreover, the measurement outcome corresponding to
the element I A − Π occur with probability
Tr I A − Π Ug ψUg∗ = Tr Ug I A − Π ψUg∗ = 0
∀ g ∈ G. (15.70)
Hence µmax ⩾ ν and since we already saw that µmax ⩽ ν we conclude that µmax = ν. This
completes the proof.
⊗n
Here we are interested in the representation of SO(3) on the space An := (C2 ) for
some integer n. We extend here the group of rotations SO(3) to the group SU (2) to allow
for spinor representations. For simplicity, we will assume that n is even, and use some well-
known results from representation theory. Specifically, the representation g 7→ Ug⊗n can be
decomposed into a direct sum of SU (2) irreps, labeled by the total angular momentum j
ranging from 0 to n/2. The decomposition (C.44), for the representation of SU (2) on An ,
has been extensively studied in representation theory, and is given by
n/2
M
n
A = Bj ⊗ Cj (15.72)
j=0
Moreover, in this case the optimal state (15.68) that achieves the maximum likelihood above
is given by
n/2−1
1 X √
|ψ⟩ = √ (2j + 1)|Φj ⟩ + n + 1|n/2, n/2⟩ . (15.75)
µmax j=0
where
j
1 X
|Φj ⟩ := √ |j, m⟩Bj |ϕC
m⟩
j
(15.76)
2j + 1 m=−j
C
and {|ϕmj ⟩}m∈{−j,...,j} is an orthonormal set of vectors in Cj .
As a specific example, suppose n = 2. In this case we have two irreps, corresponding to
total angular momentum j = 0 and j = 1. In this case |Cj | = 1 for both j = 0 and j = 1,
and the representation space B0 = span{|0, 0⟩} is one dimensional spanned by the singlet
state
1
|0, 0⟩B0 := |ΨA Ã
− ⟩ := √ (|01⟩ − |10⟩) (15.77)
2
whereas B1 is three dimensional spanned by the triplet states |1, 1⟩B1 := |0⟩A |0⟩A , |1, 0⟩B1 =
|ΨA Ã A A
+ ⟩, and |1, −1⟩ := |1⟩ |1⟩ . Therefore, the formula above implies to the two 1/2-spin
particle state
1 √ 1 √
|ψ⟩ = |0, 0⟩B0 + 3|1, 1⟩B1 = |ΨA−
Ã
⟩ + 3|11⟩AÃ
(15.78)
2 2
achieves the largest maximum likelihood µmax = 4 as defined in (15.62). It is worth pointing
out that the state above is not unique, and replacing |1, 1⟩B1 with any other normalized state
in B1 would still give the maximum likelihood µmax = 4.
Exercise 15.2.14. Let φ ∈ Pure(B1 ) be a pure state in the triplet space of two spin-1/2
particles. Show that there exists a POVM {Λg }g∈G such that the state
1 √ B Z
B0
dg Tr Λg Ug ρUg∗ = 4 .
|ψ⟩ = |0, 0⟩ + 3|φ ⟩ 1
satisfies (15.79)
2 G
2. Relative Entropies of Asymmetry: Measures that are derived from the general
framework of resource theories using different choices of relative entropies.
While some measures, like the relative entropy of asymmetry, are derived from the gen-
eral framework of quantum resource theories, they have certain drawbacks, such as not being
additive under tensor products and having zero regularized versions. To overcome these lim-
itations, we introduce a new technique to construct measures of asymmetry that involve
taking derivatives of quantum divergences. We refer to these measures as derivatives of
asymmetry. The concept of derivatives of asymmetry encompasses significant measures like
the quantum Fisher information and the Wigner-Yanase-Dyson skew information. These
measures play pivotal roles in fields like quantum metrology, where precision and sensitivity
are paramount. By exploring the derivatives of asymmetry, we can gain a deeper under-
standing of asymmetry in quantum systems and their applications in diverse fields beyond
quantum information.
In order to quantify how much information Bob gained after the measurement, let X be the
random variable corresponding to the element g ∈ G that relates between Alice and Bob’s
reference, and let Y be the random variable associated with Bob’s measurement outcome
g ′ ∈ G. With these notations, any measure of conditional uncertainty can be used to
quantify the uncertainty of X given that Bob has access to Y . Let S(X|Y )q , with q :=
{q(g ′ |g)}g,g′ ∈G denotes the probability distribution, be some measure of conditional certainty
such as the negative of the conditional entropy H(X|Y )q . Then, a measure of quantum
frameness associated with S(X|Y )q is defined for all ρ ∈ D(A) as
where the maximum is over all POVMs that Bob can perform on his system. In other words,
Bob chooses a POVM that maximizes his certainty about X. We say that F is a measure
of quantum reference frame only if it can be written in this way.
Proof. Let ρ ∈ D(A) and {Γg }g∈G ⊂ Eff(A). Let N ∈ COVG (A → B) and observe that
since N is G-covariant we get that
Therefore,
F N (ρ) = max
∗
S(X|Y )q ⩽ max S(X|Y )q = F(ρ) , (15.83)
{N (Γg )} {Λg }
where the first maximum is over all POVMs of the form {N ∗ (Γg )}g∈G which is a subset of
all possible POVMs {Λg }g∈G . This completes the proof.
Note that in the proof above we did not need to use any of properties of the function
S(X|Y )q . However, since S(X|Y )q measures the conditional certainty of X given that Bob
has access to Y , measures of quantum frameness has additional properties. In Chapter 7
we saw that all measures of conditional uncertainty has to behaves monotonically under
conditional majorization. However, in Chapter 7 we only considered finite dimensional,
discrete probability distributions. Therefore, for finite groups in which X and Y are discrete
random variables with |X| = |Y | = |G|, the function S(X|Y )q must behaves monotonically
under conditional majorization. We called such functions in Sec. 4.6.5 conditionally Schur
convex functions.
The extension of conditional majorization to continuous probability distributions is a
complex and currently unresolved issue in the field. However, various functions, such as the
family of conditional R’enyi entropies (including the conditional von-Neumann entropy), can
be utilized to measure the conditional entropy of continuous distributions. For the purpose
of our discussion here, we only need to focus on one common property shared by all such
functions that quantify conditional uncertainty: their invariance under the action of the
group, both from the left and from the right.
Let’s recall that S(X|Y )q represents a function of the conditional probability distribu-
tion q(g ′ |g). Now, suppose Bob rotates his reference frame by an element h ∈ G, causing
the corresponding element g ∈ G relating his frame to Alice’s frame to change to h−1 g.
Consequently, the outcome of the measurement g ′ transforms to h−1 g ′ under this change
in Bob’s reference frame. Since such a transformation should not affect Bob’s uncertainty
about g, we deduce that both distributions q(g ′ |g) and r(g ′ |g) := q(h−1 g ′ |h−1 g) represent the
same conditional uncertainty. Hence, any function S(X|Y )p that quantifies Bob’s certainty
about X must be left-invariant, meaning that S(X|Y )q = S(X|Y )r holds for all conditional
distributions q and all h ∈ G.
Similarly, let’s consider the scenario where Bob changes his reference frame such that
g 7→ gh and g ′ 7→ g ′ h. As before, such a transformation should not affect Bob’s uncertainty
about g. Consequently, both distributions q(g ′ |g) and r(g ′ |g) := q(g ′ h|gh) represent the same
conditional uncertainty. Thus, any function S(X|Y )q that quantifies Bob’s certainty about
X must also be right-invariant, meaning that S(X|Y )q = S(X|Y )r holds for all conditional
distributions q and all h ∈ G.
Many of the functions S(X|Y )q are not linear in q which makes the optimization in (15.81)
very difficult for such choices. We therefore focus here on measure of conditional certainty
that are linear in q. We start with the maximum likelihood that we already encountered in
Sec. 15.2.3
where the maximum is over all POVMs {Λg }g∈G ⊂ Eff(A). Observe that µmax := maxρ∈D(A) µ(ρ)
is the maximum likelihood for Bob’s correct guess of Alice’s reference frame.
In the theorem below we will see that the maximum likelihood µ(ρ) can be expresses
in terms of the max relative entropy. Among other things, this result demonstrates that
the function µ(ρ) behaves monotonically under G-covariant operations and therefore can be
used to define a measure for asymmetry. To be more precise, since for ρ ∈ INVG (A) we have
µ(ρ) = 1, the function ρ 7→ µ(ρ) − 1 is a measure of asymmetry (in fact, it can be shown to
be an asymmetry monotone, see the exercise below).
Theorem 15.3.2. Using the same notations as above, for any ρ ∈ D(A)
so that Z
dg Tr Λg Ug ρUg∗ = Tr [Λρ] .
(15.88)
G
Conversely, let RΛ ∈ Pos(A) be such that G(Λ) = I A , and define , Λg := Ug ΛUg∗ for every
g ∈ G, so that G dg Λg = G(Λ) = I A . By definition, this POVM {Λg }g∈G satisfies (15.88).
We can therefore express µ(ρ) as the following SDP:
The above optimization problem is an SDP. As such, it has a dual given by (see Sec. A.9)
n o
µ(ρ) = min t ⩾ 0 : tσ ⩾ ρ , σ ∈ INVG (A) . (15.91)
That is,
log µ(ρ) = min Dmax (ρ∥σ) . (15.92)
σ∈INVG (A)
Exercise 15.3.2. Use the duality relations discussed in Sec. A.9 to show that the dual
of (15.90) is given by the expression in (15.91).
Exercise 15.3.3. Show that the maximum likelihood µ(ρ) is an asymmetry monotone.
We now use the expression in the theorem above to compute the maximum likelihood
of a pure state ψ ∈ D(A). For this purpose, we will use the fact that any quantum state
σ ∈ INVG (A) has the form M
σA = I Bλ ⊗ σλCλ , (15.93)
λ∈Irr(U )
for some σλ ∈ Pos(Cλ ). Let sλ := Tr [σλ ] and observe that since σ is normalized we must
have X
dλ s λ = 1 , (15.94)
λ∈Irr(U )
where dλ := |Bλ |. We will also use the fact that any pure state ψ ∈ Pure(A) can be expressed
as M √
|ψ⟩ = pλ |ψλ ⟩ (15.95)
λ∈Irr(U )
where {pλ }λ∈Irr(U ) is a probability distribution, and each ψλ ∈ Pure(Bλ Cλ ) is a pure state in
the tensor product of the representation and multiplicity spaces.
Proof. Let t ∈ R+ and σ ∈ D(A) be such that tσ ⩾ ψ. This condition holds if and only if
tI A ⩾ σ −1/2 |ψ⟩⟨ψ|σ −1/2 , (15.97)
(where all inverses are understood as generalized inverses). The above condition holds if and
only if t ⩾ ⟨ψ|σ −1 |ψ⟩. Therefore,
µ(ψ) = min ⟨ψ|σ −1 |ψ⟩ . (15.98)
σ∈INVG (A)
Now, a density matrix σ ∈ INVG (A) if and only if it has the form (15.93). Therefore, the
maximum likelihood of ψ is given by
X
µ(ψ) = min pλ ψλ I Bλ ⊗ σλ−1 ψλ (15.99)
λ∈Irr(U )
where the minimum is over all σλ ∈ Pos(Cλ ) whose traces satisfy (15.94). Using the notations
sλ := Tr [σλ ] and ηλ := s1λ σλ , we split the minimization into two parts: first, we fix the
numbers {sλ }λ∈Irr(U ) and minimize the expression over all ηλ ∈ D(Cλ ), and then we minimize
the resulting expression over all {sλ }λ∈Irr(U ) that satisfy (15.94). That is,
X
µ(ψ) = min pλ s−1
λ min ψλ I Bλ ⊗ ηλ−1 ψλ . (15.100)
{sλ } ηλ ∈D(Cλ )
λ∈Irr(U )
h i
Denote the reduced density matrix of ψλ by ρCλ
λ
:= Tr Bλ ψ Bλ Cλ
λ , and observe that
ψλ I Bλ ⊗ ηλ−1 ψλ = Tr ρλ ηλ−1
√
ρλ √ 2 2 −1
τλ := √ −−−−→ = (Tr [ ρλ ]) Tr τλ ηλ (15.101)
Tr ρλ
√
Definition 6.3.2 with α = 2→ = (Tr [ ρλ ])2 2D2 (τλ ∥ηλ ) .
Therefore, the minimum of the expression above over all η ∈ D(Cλ ) is obtained when η = τλ .
Hence, X √ 2
µ(ψ) = min pλ s−1
λ (Tr [ ρλ ]) . (15.102)
{sλ }
λ∈Irr(U )
For the remaining of the optimization problem, for each λ ∈ Irr(U ) we denote by rλ :=
√ 2
and qλ := 1s sλ , where s := λ∈Irr(U ) sλ . Observe that from (15.94) we get
P
pλ Tr ρλ
that s−1 = λ∈Irr(U ) dλ qλ . Therefore, with these notations we get that
P
X X
µ(ψ) = min dλ qλ rλ qλ−1 . (15.103)
λ∈Irr(U ) λ∈Irr(U )
where the minimum is over all probability distributions {qλ }λ∈Irr(U ) . From the Cauchy-
Schwarz inequality we get that the minimum is given by
X p 2
µ(ψ) = dλ rλ . (15.104)
λ∈Irr(U )
Finally, observe
hp the iexpression inside the log on the right-hand side of the equation above is
given by Tr G(ψ) . Hence, log µ(ψ) is given by (15.96). This completes the proof.
where the maximum is over all POVMs {Λg′ }g′ ∈G ⊂ Eff(A), and q(g ′ |g) := Tr Λg′ Ug ρUg∗ is
the probability of guessing g ′ given that the actual element that relates between the parties’
reference frames is g. Note that by taking f (g ′ , g) = δ(g ′ g −1 ) to be the Dirac delta function
we can get back the maximum likelihood function. We therefore call the function above the
weighted maximum likelihood.
Note that we can write µf (ρ) = max S(X|Y )q as given in (15.81), where q := {q(g ′ |g)}g,g′ ∈G ,
the maximum is over all POVM as above, and S(X|Y )q = Lf (q) is taken to be the linear
functional Z Z
Lf (q) := dg dg ′ f (g, g ′ )q(g ′ |g) . (15.107)
G G
Since the function Lf (q) represents the certainty that Bob has about g, it has to be (see the
discussion above) both left and right invariant. Fix h ∈ G and denote by r(g ′ |g) := q(hg ′ |hg).
Since Lf is left invariant we have Lf (r) = Lf (q) for all conditional distributions p. Since
Z Z Z Z
′ ′ −1 ′ −1
Lf (r) := dg dg f (g, g )q(h g |h g) = dg dg ′ f (hg, hg ′ )q(g ′ |g) , (15.108)
G G G G
As the above condition holds for all conditional probability distributions q(g ′ |g) , we conclude
that f itself is left-invariant. That is,
The left-invariance property of f is consistent with the intuition that the payoff function
should exclusively depend on the relative transformation between the transmitted state,
characterized by the group element g, and the measurement outcome, represented by the
group element g ′ .
Following similar arguments as above, the right-invariance property of Lf implies that
the function f itself is also right-invariant, that is,
The fact that f is both right and left invariant has the following consequences.
First, by taking h = g −1 in (15.110) we get that
That is, f (g, g ′ ) can be viewed as a function of g −1 g ′ . We will denote this function by p so
that f (g, g ′ ) = p(g −1 g ′ ). Now, since f (g, g ′ ) is also right invariant we get that for all h, g ∈ G
we have p(hgh−1 ) = p(g). That is, p is a class function as introduced in Definition C.6.2.
R Since f is non-negative so is p, and consequently, it is natural to normalize p such that
G
dg p(g) = 1. That is, {p(g)}g∈G is a probability distribution over the group. Moreover,
for a function f that is both left and right invariant we have
Z Z
dg ′ p(g −1 g ′ )Tr Λg′ Ug ρUg∗
µf (ρ) = max dg
ZG ZG
dg ′ dh p(h)Tr Λg′ Ug′ Uh∗ ρUh Ug∗′
h := g −1 g ′ −−−−→ = max (15.113)
ZG G
Exercise 15.3.5. Explain why for f that is not left and right invariant, the function µf is
not necessarily a measure of asymmetry.
is a measure of asymmetry. For certain choices of the divergence D, the function above can
be hard to compute. However, for the relative entropy it has a very simple form.
For any α ∈ [0, 2], the α-Rényi relative entropy of asymmetry is defined as
In Exercise 15.2.4 you showed that if ρ is G-invariant then for all α ∈ [0, 2] the state
ρα := ρα /Tr[ρα ] is also G invariant. Therefore, from Theorem 10.105 it follows that the
α-Rényi relative entropy of asymmetry is given by
1
Asyα (ρ) = log ∥G (ρα )∥1/α
α−1 (15.117)
(10.108)→ = H1/α G(ρα ) − Hα (ρ) ,
where Hα is the α-Rényi entropy, and G is the G-twirling map that is also the resource
destroying map of the QRT of asymmetry. The special case of α = 1 is also known as the
G-asymmetry of the state ρ ∈ D(A) and is given by
Asy(ρ) = H G(ρ) − H(ρ) . (15.118)
From Theorem 15.3.2 it follows that the log of the maximum likelihood, log µ(ρ), can
be viewed as the max relative entropy of asymmetry, in which Dα in (15.116) is replace by
Dmax . Since Dmax is the largest relative entropy, the formula in (15.117) with α = 2 can be
used to provide a lower bound for log µ(ρ). Specifically, we have
2
ρ
log µ(ρ) ⩾ H1/2 G − H2 (ρ) . (15.119)
Tr[ρ2 ]
Remarkably, due to Theorem 15.3.3, the inequality above becomes an equality on all pure
states.
Despite the elegant expression above for the G-asymmetry, in general, the G-asymmetry
is not additive under tensor products. In fact, in the following theorem we show that its
regularization is zero!
Theorem 15.3.4. Let G be a finite or compact Lie group and let ρ ∈ D(A). Then,
1
Asy ρ⊗n = 0 .
lim (15.120)
n→∞ n
Remark. The theorem above underscores a notable constraint associated with using the
G-asymmetry as a measure of asymmetry in quantum systems. It signals the necessity
to investigate other measures capable of surmounting this limitation, particularly in the
asymptotic regime where numerous copies of asymmetric states are considered. We will
see that venturing into alternative measures will pave the way for a broader and more
nuanced comprehension of asymmetry’s nature and characteristics when approached from
the perspective of the asymptotic domain.
Proof. According to (15.25) the action of the G-twirling on the state ρ is given by
X
G(ρ) = px Ugx (ρ)Ug∗x , (15.121)
x∈[d]
where d is an integer satisfying d ⩽ m4 (see Exercise 15.2.2). Therefore, from the von-
Neumman property (7.120) we get that
H G(ρ) ⩽ H(ρ) + H(p) ⩽ H(ρ) + log(d) . (15.122)
Thus, combining this with the definition in (15.118) gives Asy(ρ) ⩽ log(d).
Now, fix n ∈ N and consider the action of the G-twirling on ρ⊗n :
Z
⊗n
dg Ug⊗n ρ⊗n Ug⊗n .
Gn ρ := (15.123)
G
Observe that the support of ρ⊗n is a subspace of the symmetric subspace Symn (A). Thus,
we can view ρ⊗n as a positive semidefinite operator acting on Symn (A). Moreover, the map
g 7→ Ug⊗n can also be seen as a projective unitary representation of G on the space Symn (A).
Therefore, if we repeat the same steps that led to the inequality Asy(ρ) ⩽ log(d) but with
ρ⊗n instead of ρ, we obtain:
Asy ρ⊗n ⩽ log(dn ) ,
(15.124)
where dn is an integer no greater than the dimension of Symn (A) to the power four (see (15.26)).
Combining this with the formula (C.166) for the dimension of the symmetric subspace we
arrive at
⊗n
n+m−1
Asy ρ ⩽ 4 log
n (15.125)
(8.87)→ ⩽ 4m log(n + 1) .
Hence,
1 1
Asy ρ⊗n ⩽ 4m lim log(n + 1) = 0 .
lim (15.126)
n→∞ n n→∞ n
where {|x⟩}x∈[m] are the eigenvectors of the number operator N̂ , and each cx ∈ C. We can
express n copies of ψ as
n
X
|ψ ⊗n ⟩ = aj |ϕA
j ⟩ (15.128)
j∈[mn]
n
where aj ∈ C and |ϕAj ⟩ is the eigenvector of the total number operator N̂tot corresponding
to the eigenvalue j, for each j ∈ [mn]. By applying the G-twirling to ψ ⊗n , we obtain
(see (15.22))
n
X
Gn ψ ⊗n = |aj |2 ϕA
j . (15.129)
j∈[mn]
Denoting by p ∈ Prob(mn) with components pj := |aj |2 for each j ∈ [mn], we conclude that
H Gn ψ ⊗n = H(p) ⩽ log(mn) .
(15.130)
Therefore,
1 1
H Gn ψ ⊗n ⩽ lim log(nm) = 0 .
lim (15.131)
n→∞ n n→∞ n
The key observation in this example is that the rank of G (ψ ⊗n ) grows linearly with n.
Proof. We begin by expressing Asyp (ρ) as the mutual information of the state σ XA , defined
as X
σ XA := px |x⟩⟨x|X ⊗ Ugx ρA Ug∗x , (15.133)
x∈[d]
D σ XA σ A ⊗ σ X = H σ X + H σ A − H σ XA ,
(15.134)
where σ A = Gp ρA and H σ X = H(p). From Exercise 7.5.1, we have
X
H σ XA = H(p) + px H Ugx ρA Ug∗x
x (15.135)
−−−−→ = H(p) + H ρA .
H Ugx ρA Ug∗x = H ρA
Asyp ρA = D σ XA σ X ⊗ σ A .
(15.136)
Next, let N ∈ COVG (A → B) and observe that since N A→B ◦ UgA = UgB ◦ N A→B for all
g ∈ G, we have
(15.137)
X
px |x⟩⟨x|X ⊗ UgBx ◦ N A→B ρA .
=
x∈[d]
Therefore,
DPI→ ⩽ D σ XA σ X ⊗ σ A
(15.138)
= Asyp ρA .
Exercise 15.3.6. Show that for every divergence D, all ρ ∈ D(A), and all p ∈ Prob(d) we
have
Ap (ρ) ⩽ Ip (ρ) . (15.142)
Ag (ρ) := D ρ Ug ρUg∗
∀ ρ ∈ D(A) . (15.143)
As we will see shortly that quite often the derivative on the right-hand side yields the constant
zero function. In such cases, DΛ (ρ) is defined in terms of the second derivative as
1 d2 itΛ −itΛ
DΛ (ρ) := D ρ e ρe . (15.145)
2 dt2 t=0
Exercise 15.3.7. Show that the derivative of asymmetry as defined above is a measure
of asymmetry. Hint: Use the fact that for each g ∈ G the function Ag is a measure of
asymmetry.
In order to compute the derivatives above, we will use the expension
1
eitΛ ρe−itΛ = ρ + it[Λ, ρ] − t2 Λ, [Λ, ρ] + O(t3 ) .
(15.146)
2
It will be convenient to use the notations σ := i[Λ, ρ] and η := − 21 Λ, [Λ, ρ] so that
We now use this expansion to compute the derivatives of asymmetry for several examples.
The differential trace distance measures the asymmetry in a state ρ relative to a subgroup
of G associated with a generator Λ. This measure depends on the coherence of ρ over the
eigenspaces of Λ, which is indicated by the non-zero commutator [ρ, Λ]. The question then
arises as to which operator norm should be used to measure the commutator [ρ, Λ] and thus,
the asymmetry of ρ. While the answer to this question may not be immediately apparent,
the above discussion indicated that the trace norm is the most appropriate measure for this
purpose.
Exercise 15.3.8. Let ψ ∈ Pure(A) and Λ ∈ Herm(A). Show that
p
TΛ (ψ) = ⟨ψ|Λ2 |ψ⟩ − ⟨ψ|Λ|ψ⟩2 . (15.149)
That is, on pure states, the differential trace distance of asymmetry reduces to the variance
of the observable Λ.
{|x⟩}x∈[m] is the basis of A consisting of the eigenvectors of ρA ), and make use of the divided
difference approach discussed in Appendix D.1. Particularly, the trace above has the form
given in Corollary D.1.1 with g(t) := tα and f (t) := t1−α . Therefore, the function h(t) as
defined in Corollary D.1.1 is given by
h(t) := g(t)f ′ (t) = 1 − α , (15.151)
so that h(ρ) = (1 − α)I A is a constant function. As such, Tr [h(ρ)σ] = (1 − α)Tr[σ] = 0 and
similarly Tr [h(ρ)η] = (1 − α)Tr[η] = 0. Observe further that since h is a constant function,
Lh (σ) = 0. Substituting all this into Corollary D.1.1 we conclude that
2 1 1 2
log 1 − t Tr [Lf (σ)Lg (σ)] + O(t3 )
Dα ρ ρ + tσ + t η =
α−1 2
2
(15.152)
1 t
= Tr [Lf (σ)Lg (σ)] + O(t3 ) ,
21−α
where the self-adjoint linear maps Lf , Lg ∈ Herm(A → A) are defined by (see Appendix D.1
for more details)
px1−α − py1−α
⟨x|Lf (σ)|y⟩ = ⟨x|σ|y⟩
px − py (15.153)
1−α 1−α
= −i px − py ⟨x|Λ|y⟩ ,
and similarly
⟨y|Lg (σ)|x⟩ = −i pαy − pαx ⟨y|Λ|x⟩ .
(15.154)
Therefore,
X
py1−α − p1−α pαy − pαx |⟨x|Λ|y⟩|2
Tr [Lf (σ)Lg (σ)] = x
x,y∈[m]
X X
=2 px |⟨x|Λ|y⟩|2 − 2 p1−α
x pαy |⟨x|Λ|y⟩|2 (15.155)
x,y∈[m] x,y∈[m]
Hence, for all α ∈ [0, 2], ρ ∈ D(A) and Λ ∈ Herm(A), the differential α-Rényi divergence of
asymmetry is given by
1
Tr ρΛ2 − Tr ρ1−α Λρα Λ .
DΛ,α (ρ) = (15.156)
1−α
The expression in the parenthesis above (i.e. without the factor 1/(1 − α)) is known as the
Wigner-Yanase-Dyson skew information. Note that as the previous example, on pure states
the Wigner-Yanase-Dyson skew information reduces to the variance of Λ.
Exercise 15.3.9. Let ρ = ψ ∈ Pure(A) and Λ ∈ Herm(A). Show that
1
⟨ψ|Λ2 |ψ⟩ − ⟨ψ|Λ|ψ⟩2 .
DΛ,α (ψ) = (15.157)
1−α
Exercise 15.3.10. Show that for α ∈ (0, 1) the function DΛ,α (ρ) is concave in ρ. Hint: Use
Lieb’s Concavity Theorem (see Theorem B.6.1).
Exercise 15.3.11. Show that for α = 1 and ρ ∈ D(A) we have
DΛ (ρ) := lim DΛ,α (ρ) = Tr Λ2 ρ log ρ − Tr [ΛρΛ log ρ] .
(15.158)
α→1
As our third example, we take D = D̃α to be the minimal quantum divergence. In this case,
we use the invariance of every relative entropy under unitary operations to get that
D̃α ρ∥Ug ρUg∗ = D̃α Ug∗ ρUg ∥ρ .
(15.159)
Since Ug∗ ρUg = ρ − tσ + t2 η + O(t3 ) we get that
where
1−α 1−α 1−α 1−α 1−α 1−α
ρ̃ := ρ 2α ρρ 2α = ρ1/α , σ̃ := ρ 2α σρ 2α , and η̃ := ρ 2α ηρ 2α . (15.162)
α
The trace Tr (ρ̃ − tσ̃ + t2 η̃) has the form given in Corollary D.1.1 with g(t) := 1 and
f (t) := tα . Therefore, h(t) := g(t)f ′ (t) = αtα−1 , so that
and similarly Tr [h(ρ̃)η̃] = αTr[η] = 0. Observe further that since g is a constant function,
Lg (σ) = 0. Substituting all this into Corollary D.1.1 we conclude that
α 1
ρ̃ − tσ̃ + t2 η̃ = 1 + t2 Tr [σ̃Lh (σ̃)] + O(t3 ) .
Tr (15.164)
2
It will be convenient to denote s := α1 . Since we assume that α ∈ [1/2, ∞] we have that
s ∈ [0, 2]. Working with the eigenbasis of ρ we get for all x, y ∈ [m]
1−α 1−α
⟨x|σ̃|y⟩ = px2α py2α ⟨x|σ|y⟩
(15.165)
s := 1/α, σ := i[Λ, ρ] −−−−→ = ipx(s−1)/2 p(s−1)/2
y (py − px )⟨x|Λ|y⟩ .
Furthermore, since the eigenvalues of ρ̃ are {psx }x∈[m] we get by definition of Lh that
h psy − h (psx ) 1 p1−s
y − px1−s
⟨y|Lh (σ̃)|x⟩ = ⟨y|σ̃|x⟩ = ⟨y|σ̃|x⟩ , (15.166)
psy − psx s psy − psx
p1−s
y − p1−s
x 1 − s 1−2s
lim = p . (15.167)
py →px s s
py − px s x
Note that since the limit py → px of the components in the sum above is zero, we can restrict
the sum above to all x, y ∈ [m] that satisfies px ̸= py . Hence, we conclude that
1 X ps−1
x − ps−1
y
D̃Λ,s (ρ) := D̃Λ,α (ρ) = (px − py )2 |⟨x|Λ|y⟩|2 . (15.169)
2(s − 1) px − psy
s
x,y∈[m]
px ̸=py
If ρ is given by the pure state ψ = |1⟩⟨1| ∈ Pure(A) then px = δ1x for all x ∈ [m]. In this
case, for all s ∈ [0, 2]
m m
1 X 1 X
D̃Λ,s (ψ) = |⟨ψ|Λ|y⟩|2 + |⟨x|Λ|ψ⟩|2
s − 1 y=2 s − 1 x=2
m
2 X
= |⟨x|Λ|ψ⟩|2
s − 1 x=2 (15.170)
m 2
ψA Λ I A − ψ Λ ψA A
−−−−→ =
X
|x⟩⟨x|A = I A − ψ A
x=2 s−1
2
⟨ψ|Λ2 |ψ⟩ − ⟨ψ|Λ|ψ⟩2 .
=
s−1
The Fisher information is a measure of asymmetry that is obtained by setting s = 2 in
the family of asymmetry monotones given in equation (15.169), which yields:
X (px − py )2
FΛ (ρ) := 4D̃Λ,2 (ρ) = 2 |⟨x|Λ|y⟩|2 . (15.171)
px + py
x,y∈[m]
The Fisher information is a fundamental concept in statistics and information theory with
numerous applications in quantum metrology and quantum information. It plays a crucial
role in studying the ultimate limits of precision in quantum measurements, commonly re-
ferred to as the quantum Cramér-Rao bound. Moreover, the Fisher information is employed
to measure the distinguishability of quantum states, to characterize the entanglement prop-
erties of multipartite systems, and to devise optimal quantum measurement strategies. In the
field of quantum thermodynamics, it has an operational interpretation as the coherence cost
of preparing a system in a particular state without any restrictions on work consumption.
Exercise 15.3.12. Show that for s = α = 1
X
D̃Λ (ρ) := lim D̃Λ,α (ρ) = (log px − log py )(px − py )|⟨x|Λ|y⟩|2 . (15.172)
α→1
x,y∈[m]
2. Two pure states ψ, ϕ ∈ Pure(A) are called unitarily G-equivalent if there exists
a G-invariant unitary matrix V : A → A such that V |ψ⟩ = |ϕ⟩.
We will also refer to the set of all states σ ∈ D(A) that are G-equivalent to ρ as the
G-equivalence class of ρ. In this subsection, our focus is on characterizing the G equivalence
class of a pure state. To achieve this goal, we begin by characterizing unitarily G-equivalent
states. Note that if [V, Ug ] = 0 holds for all g, then [V ∗ , Ug ] = 0 holds for all g as well.
Therefore, we can replace the condition V |ψ⟩ = |ϕ⟩ in the definition of unitarily G-equivalent
states with |ψ⟩ = V |ϕ⟩.
Exercise 15.4.1. Let g 7→ G be a projective unitary representation of G and for every
ρ ∈ D(A) let
SymG (ρ) := {g ∈ G : Ug ρUg∗ = ρ} . (15.175)
1. Show that SymG (ρ) is a subgroup of G.
G−COV
2. Show that if ρ −−−−−→ σ for some ρ, σ ∈ D(A) then SymG (ρ) is a subgroup of SymG (σ).
3. Show that if ρ and σ are G-equivalent then SymG (ρ) = SymG (σ).
Every projective unitary representation, g 7→ UgA , corresponds
L to a decomposition of the
Hilbert space A as given in (C.44). Specifically, A = λ∈Irr(U ) Aλ , where Aλ = Bλ ⊗ Cλ .
Accordingly, every two pure states ψ, ϕ ∈ Pure(A) can be expressed as
X X
|ψ A ⟩ = |ψλBλ Cλ ⟩ and |ϕA ⟩ = |ϕBλ
λ Cλ
⟩, (15.176)
λ∈Irr(U ) λ∈Irr(U )
where |ψλ ⟩, |ϕλ ⟩ ∈ Bλ Cλ are subnormalized states in Aλ . In the following theorem we show
that ψ A and ϕA are unitarily G equivalent if the marginals of ϕBλ
λ Cλ
and ϕB
λ
λ Cλ
on Bλ are the
same. In addition, the theorem characterizes states that are unitarily G-equivalent in terms
of their characteristic functions. In Sec. C.7, we discuss various properties of characteristic
functions, and we encourage readers who are unfamiliar with this material to read Sec.C.7
before proceeding to the theorem below.
Proof. The implication 1 ⇒ 2: Suppose that ψ and ϕ are unitarily G-equivalent. Then there
exists a G-invariant unitary matrix V : A → A such that |ϕ⟩ = V |ψ⟩. Since V is G-invariant,
after multiplying both sides by Ug from the left we get Ug |ϕ⟩ = Ug V |ψ⟩ = V Ug |ψ⟩.
The implication 2 ⇒ 3: For all g ∈ G we have
⟨ψ|Ug |ψ⟩ = ⟨ψ|V ∗ V Ug |ψ⟩
V |ψ⟩ = |ϕ⟩ −−−−→ = ⟨ϕ|V Ug |ψ⟩ (15.177)
V Ug |ψ⟩ = Ug |ϕ⟩ −−−−→ = ⟨ϕ|Ug |ϕ⟩ .
The implication 3 ⇒ 4: Since we assume that χψ (g) = χϕ (g) for all g ∈ G, we get from
Theorem C.7.1 h i Z
Bλ Cλ
TrCλ ψλ = |Bλ | dg χψ (g −1 )Ug(λ)
ZG
χψ (g −1 ) = χϕ (g −1 ) −−−−→ = |Bλ | dg χϕ (g −1 )Ug(λ) (15.178)
G
h i
(C.129)→ = TrCλ ϕB λ
λ Cλ
.
The implication 4 ⇒ 1: Since for each λ ∈ Irr(U ) the states ψλBλ Cλ and ϕBλ
λ Cλ
have the
same marginal on representation space Bλ , there exists a unitary matrix Vλ : Cλ → Cλ such
that
I Bλ ⊗ VλCλ ψλBλ Cλ = ϕB λ
λ Cλ
. (15.179)
Let V : A → A be the unitary matrix
M
V := I Bλ ⊗ VλCλ . (15.180)
λ∈Irr(U )
Then, by definition, |ϕA ⟩ = V |ψ A ⟩, and since for each g ∈ G the unitary matrix Ug has the
form (C.45) we get that [V, Ug ] = 0. Hence, ψ A and ϕA are unitarily G-equivalent. This
completes the proof.
Exercise 15.4.2. Let ψ ∈ Pure(A) and ϕ ∈ Pure(B), where |B| = ̸ |A|, and consider two
projective unitary representations g 7→ UgA and g 7→ UgB in A and B, respectively. Show that
if χψ (g) = χϕ (g) for all g ∈ G then ψ and ϕ are G-equivalent.
The above theorem characterizes unitarily G-equivalent states. However, from a resource
theory perspective, two states belong to the same resource equivalence class if they are G-
equivalent (not necessarily unitarily). Our next theorem characterize this G-equivalence
class, assuming that the states involved are G-regular.
Definition 15.4.2. Let ψ, ϕ ∈ Pure(A) and G be a group. We say that ψ and ϕ are
G-regular with respect to a representation g 7→ Ug if one of the following two
conditions holds:
1. The group G is finite and there is no g ∈ G such that χψ (g) = χϕ (g) = 0; that
is, the functions χψ , χϕ : G → C cannot take the zero value simultaneously.
2. The group G is a compact Lie group and there is no open set (other than the
trivial one) C of G for which χψ (g) = χϕ (g) = 0 for all g ∈ C.
It is worth pointing out that every connected compact Lie group G satisfies the second
condition above. In fact, if G is connected, for any state ψ ∈ Pure(A), there cannot be an
open neighbourhood C of G for which χψ (g) = 0 for all g ∈ C. To see why, by contradiction,
suppose that χψ (g) = 0 for all g ∈ C. Since the function χψ : G → C is analytic, the identity
theorem in complex analysis implies that χψ is the zero function, which contradicts the fact
that χψ (e) = 1 for the identity element e ∈ G.
Exercise 15.4.3. Let G be a compact Lie group and let ψ, ϕ ∈ Pure(A) be such that for all
g ∈ G there exists elements h, h′ ∈ H of a connected subgroup H of G for which |χψ (hgh′ )| +
|χϕ (hgh′ )| =
̸ 0. Show that ψ and ϕ are G-regular.
To clarify the notion of G-regular states, let’s consider the group O(2) of 2 × 2 real
orthogonal matrices. This group is a compact Lie group, but it is not connected because
matrices with determinant one are not continuously connected to matrices with determinant
minus one. Let H := SO(2) be the subgroup of O(2) consisting of all the elements of O(2)
with determinant one. The question we want to answer is: Is there a state ψ ∈ Pure(C2 )
such that χψ (g) = 0 for all g ̸∈ H?
question, wefirst observe that all the matrices g ∈ O(2) with det(g) = −1
To answer this
cos θ sin θ
have the form for some θ ∈ [0, 2π]. Therefore, χψ (g) = 0 for all g ̸∈ H if
sin θ − cos θ
and only if
cos θ sin θ
ψ ψ =0 ∀ θ ∈ [0, 2π] . (15.181)
sin θ − cos θ
The only pure state that satisfies the above equation is |ψ⟩ = √12 (|0⟩ + i|1⟩). Therefore, in
this example, the second condition in Definition 15.4.2 is satisfied except in the case where
|ψ⟩ = |ϕ⟩ = √12 (|0⟩ + i|1⟩).
Remark. We will see in the proof below that for any finite or compact (not necessarily
connected) Lie group G, if (15.182) holds, then ϕ and ψ are G-equivalent. Therefore, we
only need the assumption that ψ and ϕ are G-regular for the converse part. In Sec. D.6
of the appendix we provide additional observations for the case that ψ and ϕ are not G-
regular. Moreover, it is worth noting that semi-simple compact Lie groups, such as SU (2),
do not have any non-trivial 1-dimensional representation. Therefore, it follows from the
theorem above and the preceding theorem that for such groups, the following statements are
all equivalent:
1. ψ and ϕ are G-equivalent.
UgE := φE iθg E
1 + e φ2 , (15.183)
For the converse part of the proof, suppose there exists a G-covariant channel mapping
ψ to ϕ and another G-covariant channel that maps ϕ to ψ. From the covariant version of
Stinespring delation theorem (see Theorem 15.2.3) there exists two isometries V1 : A → AE
and V2 : A → AẼ, each satisfying (15.51) for all g ∈ G, and with the property that
for some φ1 , φ2 ∈ Pure(E). Since, V1 and V2 satisfy (15.51) for all g ∈ G, the two equations
above imply that for all g ∈ G
First, if for all g ∈ G χψ (g) ̸= 0 and/or χϕ (g) ̸= 0 then χφ1 (g)χφ2 (g) = 1. Since the absolute
value of characteristic functions cannot exceed one, it follows that |χφ1 (g)| = |χφ2 (g)| = 1 for
all g ∈ G. Therefore, from Lemma C.7.1 we get that the states φ1 and φ2 are G-invariant
in this case. Second, suppose G is a compact Lie group and suppose by contradiction
that there exists g ∈ G such that χφ1 (g)χφ2 (g) ̸= 1. Then, from the continuity of the
characteristic function, there exists a neighbourhood C ⊂ G of g such that for all g ′ ∈ C
we have χφ1 (g ′ )χφ2 (g ′ ) ̸= 1. From (15.188) it then follows that χψ (g ′ ) = χϕ (g ′ ) = 0 for all
g ′ ∈ C in contradiction with the assumption that ψ and ϕ are G-regular. Therefore, also in
this case χφ1 (g)χφ2 (g) = 1 for all g ∈ G, so that φ1 and φ2 are G-invariant.
To summarize, in both cases we can express the characteristic functions of φ1 and φ2 as
χφ1 (g) = eiθg and χφ2 (g) = e−iθg , where g 7→ eiθg is a 1-dimensional unitary representations
of G (see Exercise C.7.1). Substituting this into (15.187) completes the proof.
where {|n⟩}n∈Z is the eigenbasis of the number operator. The characteristic function of ψ̃ is
given by X
χψ̃ (θ) = ψ̃ eiθN̂ ψ̃ = |λn |2 eiθn . (15.190)
n∈[m]
Since the characteristic function of ψ̃ depends only on the absolute values of the coefficients
{λn }n∈[m] , we get from Theorem 15.4.1 (particularly, the equivalence of the first and third
where pψ , pϕ : Z → [0, 1] are the probability distributions associated with ψ and ϕ, respec-
tively. Using the Fourier transform (see Exercise 15.4.4) we get that the above condition can
be expressed as
pψ (n) = pϕ (n + k) . (15.193)
As a specific example, observe that the states |ψ⟩ = √1 (|0⟩ + |1⟩) and |ϕ⟩ = √1 (|1⟩ + |2⟩) are
2 2
G-equivalent since in this case pψ (n) = pϕ (n − 1).
Exercise 15.4.4. Show that the condition in (15.192) is equivalent to one in (15.193). Hint:
Apply a Fourier transform on both sides of (15.192).
Remark. If χϕ (g) ̸= 0 for all g ∈ G then the theorem above states in this case that ψ can
be converted to ϕ by symmetric operations if and only if χψ (g)/χϕ (g) is a positive definite
function over G.
G−COV
Proof. Suppose first that ψ −−−−−→ ϕ. In the derivation of the relation in (15.187), using
G−COV
the covariant Stinespring dilation theorem we showed that the condition ψ −−−−−→ ϕ implies
that there exists a pure state φ ∈ Pure(A) such that
Hence, taking f (g) := χφ (g) we get χψ (g) = χϕ (g)f (g). Finally, observe that the character-
istic function χφ : GC is a normalized positive definite function over G (see Theorem C.8.1).
Conversely, suppose χψ (g) = χϕ (g)f (g) for some positive definite function f . Since
for g = e we get f (e) = χψ (e)/χϕ (e) = 1 the function f is normalized so that according
to Theorem C.8.1 it corresponds to some characteristic function f (g) = ⟨φ|UgE |φ⟩, where
g 7→ UgE is some unitary representation of G on some Hilbert space E. Moreover, there
exists a G-invariant state |0⟩ ∈ E whose characteristic function is constant and equal to
one for all group elements. Therefore, from the relation χψ (g) = χϕ (g)f (g) we get that
the states |ψ⟩A |0⟩E and |ϕ⟩A |φ⟩E have the same characteristic function. Therefore, there
exists a G-invariant unitary V : AE → AE such that V |ψ⟩ |0⟩ = |ϕ⟩A |φ⟩E . Taking
A E
the trace over E on both sides demonstrates that ψ can be converted to ϕ by a G-covariant
channel.
Exercise 15.4.5. Consider two states ψ, ϕ ∈ Pure(A) and suppose ψ has the property that
G−COV
χψ (g) = 0 for all g ∈ G such that g ̸= e. Show that ψ −−−−−→ ϕ. In other words, ψ with
such a property is a maximal resource state.
Note that the above unitary representation of Zn composed of a direct sum of its irreps, each
occurring with multiplicity one. Pn−1 √
We would like to find the conditions under which the quantum pure state |ψ⟩ = x=0 px |x⟩
Pn−1 √
can be converted to another pure state |ϕ⟩ := x=0 qx |x⟩ by Zn -covariant operations. Ob-
serve that the characteristic function of |ψ⟩ is given for any x ∈ Zn by
X 2πyx
χψ (x) = ⟨ψ|Ux |ψ⟩ = py e i n . (15.196)
y∈Zn
Similarly, χϕ (x) can be expressed as above with qy replacing py . The above equation demon-
strates that the characteristic function is nothing but the discrete Fourier transform of the
sequence {p0 , . . . , pn−1 }.
The theorem above implies that ψ can be converted to ϕ by Zn -covariant operations if
and only if the function x 7→ χψ (x)/χϕ (x) is a positive definite function over Zn . From
Exercise C.8.2 we have that a function f : Zn → C is positive definite if and only if its
Zn −COV
(discrete) Fourier transform is positive. We therefore conclude that ψ −− −−−→ ϕ if and only
if
X χψ (x) 2πxy
ei n ⩾ 0 ∀ y ∈ Zn . (15.197)
x∈Z
χ ϕ (x)
n
To illustrate the condition above, we consider now the case n = 2. For n = 2 the condition
above gives for y ∈ Z2 = {0, 1}
χψ (0) χψ (1) p0 − p1
0⩽ + (−1)y = 1 + (−1)y . (15.198)
χϕ (0) χϕ (1) q0 − q1
|p0 − p1 |
⩽1 (15.199)
|q0 − q1 |
which is equivalent to
max{p0 , p1 } ⩽ max{q0 , q1 } . (15.200)
2 Z −COV
The condition we obtained for the case n = 2 can be expressed also as ψ −− −−−→ ϕ if
and only if q ≻ p where p := (p0 , p1 )T and q := (q0 , q1 )T . More generally, for arbitrary
Z2 −COV
integer n ∈ N we have that if ψ −− −−−→ ϕ then necessarily q ≻ p. To see why, observe that
the relation χψ (x) = χϕ (x)f (x) implies that f (0) = 1 and f (x) itself can be expressed as a
Fourier series
i2πzx
X
f (x) = rz e n (15.201)
z∈Zn
where rz ∈ R. Since f is positive definition over Zn , we must have that rz ⩾ 0 for all z ∈ Zn .
Since f (0) = 1 we conclude that {rz }z∈Zn is a probability distribution. Substituting the
above expression for f (x) into the relation χψ (x) = χϕ (x)f (x) gives
X 2πyx X 2π(z+w)x
py e i n = qw rz ei n . (15.202)
y∈Zn w,z∈Zn
Next, observe that the equation above can be expressed simply as p = Qr, where Q is an
n × n matrix whose (y, z) component is qy−z . Hence, assuming Q is invertible we conclude
Zn −COV
that ψ −− −−−→ ϕ if and only if Q−1 p ⩾ 0, where the inequality is entry-wise. In order to
avoid the computation of Q−1 we can also use the Cramer’s rule as we discuss now.
The matrix Q as defined above is known as a circulant matrix. The eigenvalues of such
matrices are given by the discrete Fourier transforms. Specifically, the x-th eigenvalue of Q
is given by X 2πyx
λx (Q) = χϕ (x) = py e i n ∀ x ∈ Zn . (15.206)
y∈Zn
The matrix Q is also doubly stochastic so its determinant is in the interval [0, 1]. Therefore,
as long as χϕ (x) ̸= 0 for all x ∈ Zn we have det(Q) > 0. Next, for any x ∈ Zn let Qx be the
matrix obtained from Q by replacing the x-th column with the column (p0 , p1 , . . . , pn−1 )T .
Zn −COV
Then, assuming det(Q) > 0 we get from Cramer’s rule that ψ −− −−−→ ϕ if and only if
det(Qx ) ⩾ 0 for all x ∈ Zn .
As a specific example, consider the case n = 3. For this case the matrix Q has the form
q q q
0 2 1
Q = q1 q0 q2 . (15.207)
q2 q1 q0
Observe that det(Q) ⩾ 0 with equality if and only if q0 = q1 = q2 . That is, if |ϕ⟩ ̸=
Z3 −COV
√1 (|0⟩ + |1⟩ + |2⟩) then det(Q) > 0. Hence, ψ −−−−−→ ϕ if and only if the following three
3
conditions hold:
det(Q0 ) = p0 (q02 − q1 q2 ) + p1 (q12 − q0 q2 ) + p2 (q22 − q0 q1 ) ⩾ 0
det(Q1 ) = p0 (q22 − q0 q1 ) + p1 (q02 − q1 q2 ) + p2 (q12 − q0 q2 ) ⩾ 0 (15.208)
det(Q2 ) = p0 (q12 − q0 q2 ) + p1 (q22 − q0 q1 ) + p2 (q02 − q1 q2 ) ⩾ 0 .
n Z −COV
Exercise 15.4.6. Show that for every ψ ∈ Pure(Cn ) we have Φ −− −−−→ ψ, where |Φ⟩ :=
n−1
√1
P
n x=0 |x⟩. In other words, Φ is a state with maximal Zn -asymmetry.
P3 √ √
px |x⟩ and |ϕ⟩ = 3x=0 qx |x⟩.
P
Exercise 15.4.7. Consider the case n = 3, and let |ψ⟩ = x=0
3 Z −COV
1. Show that if q1 = q2 then ψ −−−−−→ ϕ if and only if q ≻ p.
2. Show that for p = (5/12, 7/24, 7/24)T and q = (5/12, 1/3, 1/4)T it is not possible to
convert ψ to ϕ by Z3 -covariant operations even though q ≻ p.
15.4.3 Catalysis
In every resource theory, if the state ψ cannot be deterministically transformed into the
state ϕ using the limited set of operations, the use of a catalyst provides a potential solution.
As we explored in earlier chapters, a catalyst refers to an additional system that is initially
prepared in a state not compatible with the constraints of the resource theory but must be
restored to its original state at the conclusion of the process. An illustrative example can be
found in the resource theory of entanglement, as discussed in Sec. 12.2.3, where we observed
that certain conversions between states are prohibited under LOCC. However, by employing
LOCC alongside a suitable catalyst, such conversions become achievable.
This notion of catalysis vividly demonstrates the significant variations encountered within
the resource theory of asymmetry, contingent upon the choice of groups involved. Specifically,
we will demonstrate that a catalyst holds no utility for a connected compact Lie group,
whereas for a finite group, a catalyst always exists.
G−COV
ψ A ⊗ φC −−−−−→ ϕA ⊗ φC . (15.209)
Proof. Suppose first that G is a finite group, and let g 7→ UgC be the regular representation
of G on the space C := C|G| = span{|g⟩ : g ∈ G} (see Sec. C.6). Fix an element h ∈ G and
let |φC ⟩ := |h⟩C . By the definition of the regular representation, we have that χφ (g) = δe,g , so
that (15.210) holds trivially. Therefore, for any h ∈ G, the state |φC ⟩ = |h⟩ satisfies (15.209).
We next prove that if G is a connected compact Lie group then the relation (15.209)
never holds. Suppose by contradiction that (15.209) does hold. Then, from Theorem 15.4.3
there exists a positive-definite function f : G → C such that χψ⊗φ (g) = χϕ⊗φ (g)f (g) for
As discussed below Definition 15.4.2, since G is a connected compact Lie group, there exists
a neighbourhood, C, around the identity element of the group such that χφ (g) ̸= 0 for all
elements g ∈ C. Combining this with the equation above gives
However, since the functions χψ , χϕ , and f , are all analytic, the identity theorem in complex
analysis implies that the equality above holds for all g ∈ G. Hence, from Theorem 15.4.3
G−COV
we get that ψ −−−−−→ ϕ in contradiction with the asumption of the theorem that ψ cannot
be converted to ϕ by G-covariant operations. Hence, the relation (15.209) cannot hold if G
is a connected compact Lie group.
The existence of a catalyst for finite groups is a consequence of the fact that for finite
groups, it is possible to completely overcome the lack of a shared reference frame by sending
a single resource from Alice to Bob. In the proof presented above, the state |φC ⟩ := |h⟩C
serves as an “ultimate” resource that removes the restriction to G-covariant operations. To
understand why, let’s revisit the guessing probability given in (15.61).
Taking ρ = φC with the regular representation Ug |h⟩⟨h|Ug∗ = |gh⟩⟨gh| yields
1 X
Prguess (ρ, {Λg }g∈G ) = Tr [Λg |gh⟩⟨gh|] . (15.212)
|G| g∈G
Given a pure state ψ ∈ Pure(A) we want to find the conditions under which the conversion
G−COV
ψ A −−−−−→ σ AX is posible.
G−COV
Theorem 15.4.5. Using the same notations as above, ψ A −−−−−→ σ AX if and only if
there exists normalized positive-definite and continuous (in the case of Lie group)
functions fx : G → C such that
X
χψ (g) = px fx (g)χϕx (g) . (15.214)
x∈[n]
Proof. From the covariant version of Stinespring dilation theorem, E ∈ COVG (A → AX)
if and only if there exists a system E, a projective unitary representation g 7→ UgE , and an
intertwiner isometry V : A → AXE such that for all η ∈ L(A) we have E(η) = TrE (V ηV ∗ ).
G−COV
Therefore, ψ A −−−−−→ σ AX if and only if there exists an intertwiner isometry V : A → AXE
such that
σ AX = E A→AX ψ A = TrE V ψ A V ∗ .
(15.215)
We first assume that such a covariant channel E A→AX exists, and prove the relation (15.214).
Indeed, the equation above implies that V |ψ A ⟩ is a purification of σ AX and therefore have
the form X√
V |ψ A ⟩ = px |ϕA X E
x ⟩|x⟩ |φx ⟩ , (15.216)
x∈[m]
for some orthonormal set {|φE x ⟩}x∈[m] in E. Since G acts trivially on system X, and since V
is an intertwiner we get that
V UgA |ψ A ⟩ = UgB ⊗ I X ⊗ UgE V |ψ A ⟩
X√
px UgA |ϕA X E E (15.217)
= x ⟩ ⊗ |x⟩ ⊗ Ug |φx ⟩
x∈[m]
Finally, taking the inner product between the two states in (15.216) and (15.217) gives
X
χψ (g) = px χϕx (g)χφx (g) . (15.218)
x∈[m]
Since fx (g) := χφx (g) is a positive-definitive function (see Theorem C.8.1) we get that (15.214)
holds.
Conversely, suppose (15.214) holds. From Theorem C.8.1 fx can be expressed as the
characteristic function of some state φE x . Without loss of generality we can assume that the
E′
states {|φx ⟩}x∈[m] are orthonormal since otherwise we can replace each |φE
E E
x ⟩ with |φx ⟩|x⟩ ,
where E ′ is another ancillary system upon which the group G acts trivially (so that |φE x⟩
E E′
and |φx ⟩|x⟩ have the same characteristic function). With this in mind, let
X√
|ϕAXE ⟩ := px |ϕA X E
x ⟩|x⟩ |φx ⟩ . (15.219)
x∈[m]
Then, from (15.214) we get that χψ (g) = χϕ (g) for all g ∈ G. Moreover, there exists
a G-invariant state |0⟩ ∈ XE whose characteristic function is constant and equal to one
for all group elements. Therefore, from the relation χψ (g) = χϕ (g) we get that the states
|ψ A ⟩|0⟩XE and |ϕAXE ⟩ have the same characteristic function. Therefore, there exists a G-
invariant unitary V : AXE → AXE such that V |ψ A ⟩|0⟩XE = |ϕAXE ⟩. Taking the trace
over E on both sides demonstrates that ψ A can be converted to σ AX by a G-covariant
channel. This completes the proof.
Exercise 15.4.8. Prove the following corollary to the theorem above: The conversion
G−COV
ψ A −−−−−→ ϕA can be achieved with probability q if and only if there exists a normalized
positive definition function f : G → C such that χψ (g) − qf (g)χϕ (g) is positive definite.
Applying both sides of this equation to the maximally entangled state |ΩAÃ ⟩ gives
JEAB = UgB ◦ E Ã→B ◦ Ug∗Ã ΩAÃ
(2.91)→ = ŪgA ⊗ UgB ◦ E Ã→B ΩAÃ (15.221)
Therefore, the matrix JEAB is a Choi matrix of a G-covariant channel if and only if
∗
ŪgA ⊗ UgB JEAB ŪgA ⊗ UgB = JEAB
∀ g ∈ G. (15.222)
That is, the Choi matrix JEAB is symmetric with respect to the projective unitary repre-
sentation g 7→ ŪgA ⊗ UgB . In this section, we will denote by G ∈ CPTP(AB → AB) the
G-twirling operation with respect to this representation, so that E is G-covariant if and only
AB AB
if G JE = JE .
With this property we can use Theorem 11.1.1 to get necessary and sufficient condi-
tions for a conversion of one mixed state to another by G-covariant operations. To apply
Theorem 11.1.1 for the case that F(A → B) = COVG (A → B), observe that
Tr η B E A→B ρA = Tr JEAB ρT ⊗ η B
sup sup
E∈COVG (A→B) E∈COVG (A→B)
Tr J AB G AB→AB ρT ⊗ η B
= sup
J∈Pos(AB)
(15.223)
J A =I A
↑
−Hmin (B|A)G (ρT ⊗η)
(7.147)→ = 2 .
G−COV
Therefore, Theorem 11.1.1 implies the following characterization of ρA −−−−−→ σ B .
Corollary 15.5.1. Let ρ ∈ D(A) and σ ∈ D(B). The following are equivalent:
G−COV
1. ρA −−−−−→ σ B .
↑ ↑
2. For all η ∈ D(B) we have Hmin (B|A)G(ρT ⊗η) ⩽ Hmin (B|B̃)G(σT ⊗η) .
While the condition outlined in the corollary holds theoretical significance, it falls short of
offering a practical methodology for assessing whether a quantum state ρA can be transformed
into another state σ B through G-covariant operations. To address this gap, a more applicable
criterion is derived from the condition presented in (11.6). For the context at hand, this
criterion is articulated in a specific format, which we encapsulate as a theorem for clarity
and ease of application.
G−COV
Then, ρA −−−−−→ σ B if and only if f (ρ, σ) ⩾ 0.
Remark. The optimization of the function f (ρ, σ) can be solved efficiently and algorithmically
with an SDP program.
The proof of the theorem above is based on the fact that σ = E(ρ) if and only if for all
Λ ∈ Herm(B) we have Tr[Λσ] = Tr[ΛE(ρ)]. This relation can be expressed as
Tr[Λσ] = Tr JEAB ρT ⊗ ΛB .
(15.225)
In the following exercise you use this to complete the proof.
Exercise 15.5.1. Use (11.6) and (15.223) to prove the theorem above.
G−COV
Exercise 15.5.2. Let ρ ∈ D(A) and σ ∈ D(B). Show that ρA −−−−−→ σ B if and only if
there exists F ∈ CPTP(A → B) such that for all g ∈ G
F Ug (ρ) = Ug (σ) . (15.226)
1 B
σ − E A→B (ρA ) 1
= min Tr [Λ] , (15.228)
2 Λ∈Pos(B)
Λ⩾σ B −E A→B (ρA )
to show that the conversion distance can be expressed as the following SDP:
F
T ρ→
− σ = min Tr [Λ] (15.229)
subject to:
h i
B B AÃ T Ã
1. Λ ⩾ σ − TrA J ρ ⊗I .
2. J A = I A .
4. Λ ∈ Pos(B), J ∈ Pos(AÃ).
where {ax }x∈[m] is the set of distinct eigenvalues of H A and each ΠA x is a projection to the
eigenspace of ax . Without loss of generality, we will assume that a1 < a2 < · · · < am (noting
that they are all distinct, allowing us to arrange {ax }x∈[m] in increasing order). With the
above form of H A , the state ρA is time-translation invariant, if and only if it takes the form:
X
ρ= px ρx , (15.232)
x∈[m]
where p ∈ Prob(m), ρx ∈ D(A), and ρx ρy = 0 for every x ̸= y. We will use the notation
INV(A) to denote the set of states in D(A) that are time-translation invariant.
Exercise 15.6.1. Prove the above form of ρ. Hint: supp(ρx ) ⊆ supp(Πx ).
Consider a quantum channel N ∈ CPTP(A → B), where systems A and B have cor-
responding Hamiltonians H A ∈ Pos(A) and H B ∈ Pos(B). The channel N is said to be
time-translation covariant if for all t ∈ R
A A
B B
N A→B e−iH t ρA eiH t = e−iH t N A→B (ρB )eiH t ∀t∈R. (15.233)
We will use the notation COV(A → B) to denote the set of all time-translation covariant
channels in CPTP(A → B).
In the Choi representation, the property given in (15.233) can be expressed as (see the
relation (15.222))
A B
A B
∗
e−iH̄ t ⊗ eiH t JNAB e−iH̄ t ⊗ eiH t = JNAB ∀t∈R. (15.234)
Note that H̄ A has the same eigenvalues as H A . For our purposes, we can replace H̄ A (in the
equation above) with H A , since it will not make any difference in our analysis. Therefore,
N ∈ COV(A → B) if and only if JNAB commutes with the operator
ξ AB := H A ⊗ I B − I A ⊗ H B . (15.235)
Therefore, the degeneracy of the energy levels of the operator ξ AB will play a key role in the
resource theory of time-translation asymmetry.
This pinching channel, also known as the “twirling channel” (as it is the G-twirling map
A
with respect to the group G = {eiH t }t∈R ), has the property that a state ρ ∈ D(A) is
quasi-classical if and only if PH (ρ) = ρ.
Exercise 15.6.2. Show that the condition PH (ρ) = ρ is equivalent to the condition that ρ
has the form given in (15.232).
Exercise 15.6.3. Let P ∈ CPTP(A → A) and P ′ ∈ CPTP(A′ → A′ ) be the pinching chan-
′
nel associated with the Hamiltonians H A and H A , respectively. Further, let N ∈ CPTP(A →
A′ ).
1. Show that P ′ ◦ N ◦ P ∈ COV(A → A′ ).
3. Show that if the Hamiltonian H A is non-degenerate then PHA→A = ∆A→A , where ∆A→A
is the completely dephasing channel as defined in Sec. 3.5.2.
Covariant channels can also be characterized in terms of the pinching channel. Consider
N ∈ CPTP(A → B) and let Pξ ∈ COV(AB → AB) be the pinching channel associated with
the operator ξ AB given in (15.235). Then, the quantum channel N A→B is time-translation
covariant if and only if its Choi matrix satisfies
This follows from Exercise 3.5.20 and our earlier observation that N ∈ COV(A → B) if and
only if its Choi matrix commutes with ξ AB .
The twirling channel can also be used to quantify time-translation asymmetry. For
example, the relative entropy distance of a quantum state ρ ∈ D(A) to its twirled state P(ρ)
is a time-translation asymmetry (sometimes referred to as coherence) measure given by
C(ρ) := D ρ PH (ρ) = H PH (ρ) − H(ρ) , (15.238)
where D(ρ∥σ) := Tr[ρ log ρ] − Tr[ρ log σ] is the Umegaki relative entropy and H(ρ) :=
−Tr[ρ log ρ] is the von-Neumann entropy. The above function is non-increasing under time-
translation covariant operations, and achieves its maximal value of log d (where d := |A|) on
the maximally coherent state |+⟩ := √1d x∈[d] |x⟩, where {|x⟩}x∈[d] is the energy eigenbasis.
P
We next move to characterize the set COV(A → B) in three different cases that depends on
the level of degeneracy of the Hamiltonians involved.
where {ax } and {by } are the energy eigenvalues of H A and H B , respectively.
ax − ax ′ = b y − b y ′ ⇒ x = x′ and y = y ′ . (15.240)
If the condition above does not hold we say that the Hamiltonians are relatively
degenerate.
Note that if H A and H B are relatively non-degenerate, then each of them is also non-
degenerate. For example, suppose H A is degenerate with ax = ax′ for some x ̸= x′ ∈
[m]. Then, for y = y ′ we get ax − ax′ = 0 = by − by′ even though x ̸= x′ . Therefore,
relative non-degeneracy is a stronger notion than non-degeneracy. In fact, relative non-
degeneracy of H A and H B is equivalent to the non-degeneracy of the operator ξ AB as defined
in (15.235). Moreover, in the generic case in which H A and H B are arbitrary (chosen at
random) the Hamiltonians are relatively non-degenerate. For this case, time-translation
covariant channels have a very simple characterization.
where ∆A→A and ∆B→B are the completely dephasing channels of systems A and B,
respectively. In other words, for physical systems with relatively non-degenerate
Hamiltonians only classical channels are time-translation covariant.
Proof. Since we assume that the Hamiltonians H A and H B are relatively non-degenerate
we get that the joint operator, ξ AB , is non-degenerate. Hence, JNAB is diagonal in the same
eigenbasis {|x⟩A |y⟩B }x∈[m],y∈[n] of ξ AB , so that
∆A→A ⊗ ∆B→B JNAB = JNAB .
(15.242)
The above equation describes the same relation as the one given in (15.241). Hence, N A→B
is a classical channel. This completes the proof.
that is, there are no degeneracies in the nonzero differences of the energy levels of
H A.
Exercise 15.6.4. Show that H A has a non-degenerate Bohr spectrum if and only if all the
non-zero eigenvalues of the operator
ξ^{AÃ} := H^A ⊗ I^Ã − I^A ⊗ H^Ã   (15.244)
are distinct. In other words, H A has a non-degenerate Bohr spectrum if and only if the zero
eigenvalue of ξ AÃ is the sole eigenvalue with a multiplicity greater than one.
Remark. We will see below that even if the spectrum of the Hamiltonian H A has degeneracies,
any quantum channel N ∈ CPTP(A → A) whose Choi matrix has the form (15.249) is
necessarily time-translation covariant.
Proof. Following the same lines as in Theorem 15.6.1, by replacing H^B with H^A everywhere, we get that a quantum channel N ∈ CPTP(A → A) is time-translation covariant if and only if its Choi matrix satisfies P_ξ( J_N^{AÃ} ) = J_N^{AÃ}, where P_ξ is the pinching channel associated with the operator ξ^{AÃ} defined in (15.244). Since H^A has a non-degenerate Bohr spectrum, the set {a_x − a_y} with x ≠ y consists of distinct eigenvalues of ξ^{AÃ}. We therefore conclude that the pinching channel P_ξ ∈ CPTP(AÃ → AÃ) associated
with the operator ξ AÃ is given by
P_ξ(·) = Π(·)Π + Σ_{x,y∈[m], x≠y} P_{xy}(·)P_{xy} ,   (15.246)
where
P_{xy} := |xy⟩⟨xy|   and   Π := Σ_{x∈[m]} |xx⟩⟨xx| .   (15.247)
Observe that the Choi matrix J_N^{AÃ} satisfies the condition above if and only if ⟨xx′|J_N^{AÃ}|yy′⟩ = 0
unless x = x′ and y = y ′ , or x = y and x′ = y ′ . This completes the proof.
The condition in the theorem is equivalent to the statement that the Choi matrix has
the form
J_N^{AÃ} = Σ_{x,y∈[m]} [ p_{y|x} |xy⟩⟨xy|^{AÃ} + (1 − δ_{xy}) q_{xy} |xx⟩⟨yy|^{AÃ} ] ,   (15.249)
where q_{xy} := ⟨xx|J_N^{AÃ}|yy⟩ and p_{y|x} := ⟨xy|J_N^{AÃ}|xy⟩. Observe that by definition p_{x|x} = q_{xx}
for all x ∈ [m]. Given that J_N^{AÃ} is the Choi matrix of a quantum channel, it implies certain properties for the coefficients {p_{y|x}}_{x,y∈[m]} and the matrix Q_N, which consists of the components q_{xy}. Specifically, the first term on the right-hand side of the equation above corresponds to ΠJ_N^{AÃ}Π, and the second term is a sum over all P_{xy}J_N^{AÃ}P_{xy}. Therefore, from the condition ΠJ_N^{AÃ}Π ⩾ 0 we get that Q_N ⩾ 0, where Q_N is the matrix whose components are q_{xy}. Similarly, the condition P_{xy}J_N^{AÃ}P_{xy} ⩾ 0 implies that p_{y|x} ⩾ 0. Thus, we conclude that J_N^{AÃ} ⩾ 0 if and only if Q_N ⩾ 0 and each p_{y|x} ⩾ 0. Finally, the remaining condition J_N^A = I^A implies that for all x ∈ [m] we have Σ_{y∈[m]} p_{y|x} = 1. To summarize, the theorem above implies that N ∈ COV(A → A) if its Choi matrix has the form (15.249), with Q_N ⩾ 0 and {p_{y|x}}_{x,y∈[m]} being a conditional probability distribution.
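As a numerical illustration (a sketch in Python, not part of the text; all helper names are ours), one can sample a conditional probability distribution {p_{y|x}} together with a positive semidefinite matrix Q whose diagonal equals {p_{x|x}}, assemble a Choi matrix of the form (15.249), and confirm that the resulting map is completely positive and trace preserving.

import numpy as np

def choi_from_form(P, Q):
    # Assemble J of the form (15.249); P[y, x] = p_{y|x} (columns sum to 1),
    # Q is m x m with Q[x, x] = P[x, x].
    m = P.shape[0]
    J = np.zeros((m * m, m * m))
    for x in range(m):
        for y in range(m):
            ket_xy = np.zeros(m * m); ket_xy[x * m + y] = 1.0
            J += P[y, x] * np.outer(ket_xy, ket_xy)          # diagonal part
            if x != y:
                ket_xx = np.zeros(m * m); ket_xx[x * m + x] = 1.0
                ket_yy = np.zeros(m * m); ket_yy[y * m + y] = 1.0
                J += Q[x, y] * np.outer(ket_xx, ket_yy)      # coherence block
    return J

m = 3
rng = np.random.default_rng(0)
P = rng.random((m, m)); P /= P.sum(axis=0)       # column stochastic {p_{y|x}}
G = rng.normal(size=(m, m)); Q = G @ G.T         # random PSD matrix
D = np.diag(np.sqrt(np.diag(P) / np.diag(Q)))    # rescale so Q[x,x] = p_{x|x}
Q = D @ Q @ D

J = choi_from_form(P, Q)
print(np.linalg.eigvalsh(J).min() >= -1e-10)     # complete positivity: J >= 0
print(np.allclose(J.reshape(m, m, m, m).trace(axis1=1, axis2=3), np.eye(m)))  # trace preservation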
with components {rxx′ } and {sxx′ }, respectively. In the theorem below we assume that
rxx′ ̸= 0 for all x, x′ ∈ [m], and define the m × m matrix Q, with components
q_{xy} := min{ 1 , s_{xx}/r_{xx} }  if x = y ,   and   q_{xy} := s_{xy}/r_{xy}  otherwise .   (15.251)
Theorem 15.6.3. Let ρ, σ ∈ D(A) be as in (15.250) with rxx′ ̸= 0 for all x, x′ ∈ [m],
and suppose the Hamiltonian H A has a non-degenerate Bohr spectrum. Then, the
following statements are equivalent:
Remark. We will see in the proof below that the second statement implies the first statement
even if the Hamiltonian H A has a degenerate Bohr spectrum. Moreover, we will see that if
r_{xy} = 0 for some off-diagonal terms (i.e. x ≠ y) then s_{xy} must also be zero. However, in this case, for any x ≠ y ∈ [m] with r_{xy} = 0, the components q_{xy} can be arbitrary. This means that in this case the condition becomes cumbersome, as we will need to require that there exists Q as defined above but with no restriction on the components q_{xy} for which r_{xy} = 0.
Proof. From Theorem 15.6.2 and the discussion following (15.249), it follows that there exists N ∈ COV(A → A) such that σ = N(ρ) if and only if there exists a conditional probability distribution {p_{y|x}}_{x,y∈[m]}, and an m×m positive semidefinite matrix Q, such that
σ = N(ρ) = Tr_A[ J_N^{AÃ} ( ρ^T ⊗ I^Ã ) ]
  = Σ_{x,y∈[m]} p_{y|x} r_{xx} |y⟩⟨y| + Σ_{x≠y, x,y∈[m]} q_{xy} r_{xy} |x⟩⟨y| .   (15.252)
Hence, for the off-diagonal terms, s_{xy} = 0 whenever r_{xy} = 0. Since we assume that all the off-diagonal terms of ρ are non-zero, i.e. r_{xy} ≠ 0 for x ≠ y, there is no freedom left in the choice of the off-diagonal terms of Q_N and we must have q_{xy} = s_{xy}/r_{xy}. Since Q_N must be positive semidefinite we will maximize its diagonal terms {p_{x|x}}_{x∈[m]} given the constraint that s_{yy} = Σ_{x∈[m]} p_{y|x} r_{xx}. This constraint immediately gives s_{yy} ⩾ p_{y|y} r_{yy} so that we must have p_{y|y} ⩽ s_{yy}/r_{yy}. Clearly, we also have p_{y|y} ⩽ 1 so we conclude that
p_{y|y} ⩽ min{ 1 , s_{yy}/r_{yy} } .   (15.254)
where
μ := Σ_{y∈[m]} ( s_y − r_y )_+ = ½ ‖ s − r ‖₁ ,   (15.256)
Exercise 15.6.7. Show that J^{AÃ} as given in (15.249) is positive semidefinite if and only if both p_{y|x} ⩾ 0 for all x and y, and Q ⩾ 0.
Exercise 15.6.8. Show that the coefficients {p_{y|x}} as defined in (15.255) satisfy
Σ_{y∈[m]} p_{y|x} = 1   and   s_y = Σ_{x∈[m]} p_{y|x} r_x   ∀ y ∈ [m] .   (15.257)
be two qubit states. Without loss of generality suppose that a ⩾ b. In this case the matrix Q can be expressed as
Q = [ b/a   w/z ;  w̄/z̄   1 ] ,   (15.259)
and Q ⩾ 0 if and only if
b/a ⩾ | w/z |² .   (15.260)
Therefore, ρ −−COV−→ σ if and only if ν(ρ) ⩾ ν(σ), where ν : D(A) → R₊ is a measure of qubit time-translation asymmetry defined on every density matrix of the form (15.258) as
ν(ρ) := |z|²/a .   (15.261)
If ρ is a pure state, so that |z| = √(a(1 − a)), then ν(ρ) ⩾ ν(σ) holds if and only if |w|² ⩽ b(1 − a). Note that |w|² ⩽ b(1 − b) since σ ⩾ 0. Therefore, by taking
a ∈ [ b , 1 − |w|²/b ]   (15.262)
we get |w|² ⩽ b(1 − a) and also a ⩾ b. Hence, for any mixed state σ there exists a pure state ψ that can be converted to σ.
On the other hand, if σ is pure (i.e. |w|² = b(1 − b)) and ρ is an arbitrary qubit state, then the condition in (15.260) becomes
|z|² ⩾ a(1 − b) .   (15.263)
Since ρ ⩾ 0 we also have |z|² ⩽ a(1 − a). Combining both inequalities we find that the only way ρ can be converted to a pure qubit state σ is if b = a (since a ⩾ b was the initial assumption) and |z|² = a(1 − a). That is, ρ is a pure state itself, and up to a diagonal unitary it equals σ. Hence, pure coherence cannot be obtained from mixed coherence, and deterministic interconversion among inequivalent pure resources is not possible.
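The qubit criterion is easy to test numerically. The following sketch (our own code, using the parametrization ρ = [[a, z],[z̄, 1−a]], σ = [[b, w],[w̄, 1−b]] with a ⩾ b as above) evaluates ν and checks a few conversions.

import numpy as np

def nu(state):
    # nu(rho) = |z|^2 / a for a qubit state [[a, z], [z*, 1-a]], cf. (15.261)
    a, z = state[0, 0].real, state[0, 1]
    return abs(z) ** 2 / a

# A pure state with a = 0.7 can reach the mixed sigma below (b = 0.6 <= a).
a = 0.7
z = np.sqrt(a * (1 - a))                 # |z|^2 = a(1-a): rho is pure
rho = np.array([[a, z], [z, 1 - a]])
b = 0.6
w = 0.9 * np.sqrt(b * (1 - a))           # |w|^2 <= b(1-a), so conversion is possible
sigma = np.array([[b, w], [w, 1 - b]])
print(nu(rho) >= nu(sigma))              # True: rho -> sigma by COV

# A mixed state with the same diagonal cannot reach the pure state rho:
rho_mixed = np.array([[a, 0.5 * z], [0.5 * z, 1 - a]])
print(nu(rho_mixed) >= nu(rho))          # False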
The example above shows that there is no unique “golden unit” that can be used as the
ultimate resource in two dimensional systems. Instead, any pure resource (i.e. pure state
that is not an energy eigenstate) is maximal in the sense that there is no other resource that
can be converted into it. However, the set of all pure qubit resources is maximal (i.e. any
mixed state can be reached from some pure state by translation covariant operations). We
now show that this latter property holds in general.
Proof. Observe that the diagonal elements of Q are all 1, and the off-diagonal terms are given by
q_{xy} = σ_{xy} / √(p_x p_y)   ∀ x, y ∈ [m] , x ≠ y .   (15.265)
Therefore, we can express Q = D_p^{−1} σ D_p^{−1}, where D_p is the diagonal matrix whose diagonal is (√p₁, ..., √p_m). Since D_p > 0 and σ ⩾ 0 it follows that Q ⩾ 0. This completes the proof.
Exercise 15.6.9. Show that if ρ and σ are two distinct pure states and both have non-zero
off-diagonal terms (with respect to the energy eigenbasis) then the matrix Q is not positive
semidefinite.
CHAPTER 16. THE RESOURCE THEORY OF NONUNIFORMITY
Note that a completely factorizable channel N^{A→A′} has the property that N^{A→A′}(u^A) = u^{A′}. That is, factorizable channels take maximally mixed states to maximally mixed states. In particular, if |A| = |A′| then a completely factorizable channel is unital. However, as we will see shortly, not all unital channels are completely factorizable.
Proof. Since all the {p_x}_{x∈[ℓ]} are rational, there exists a common denominator m ∈ N and ℓ integers {m_x}_{x∈[ℓ]} such that p_x = m_x/m, and in particular Σ_{x∈[ℓ]} m_x = m since Σ_{x∈[ℓ]} p_x = 1. Set n := |A|, and let B be a system with dimension |B| = m. Define a unitary matrix U : AB → AB via its action on a basis element |xy⟩ ∈ AB with x ∈ [n] and y ∈ [m] as
(note that k_y depends on y). That is, U^{AB} is a controlled unitary whose action on A depends on the input of system B. Using the notation U^{AB→AB} := U^{AB}(·)U^{∗AB} we get that for all ω ∈ L(A)
all ω ∈ L(A)
Tr_B[ U^{AB→AB}( ω^A ⊗ u^B ) ] = (1/m) Σ_{y∈[m]} Tr_B[ U^{AB→AB}( ω^A ⊗ |y⟩⟨y|^B ) ]
  = (1/m) Σ_{y∈[m]} U_{k_y}^{A→A}( ω^A ) ,   (16.5)
where we used the definition of U AB above. Now, observe that from the definition of ky , for
any x ∈ [ℓ] there exists mx values of y ∈ [m] for which ky = x. Therefore, continuing from
the last line above we get
Tr_B[ U^{AB→AB}( ω^A ⊗ u^B ) ] = Σ_{x∈[ℓ]} (m_x/m) U_x^{A→A}( ω^A ) = Σ_{x∈[ℓ]} p_x U_x^{A→A}( ω^A ) .   (16.6)
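The construction in the proof is simple to reproduce numerically. The sketch below (our own code, not from the text) builds the controlled unitary U^{AB} for rational probabilities and verifies that tracing out the maximally mixed ancilla reproduces the mixture of unitaries in (16.6).

import numpy as np
from scipy.stats import unitary_group

def basis(d, j):
    e = np.zeros(d); e[j] = 1.0
    return e

def controlled_unitary(unitaries, multiplicities):
    # U^{AB}|x>|y> = (U_{k_y}|x>)|y>, where k_y = x whenever y falls in the
    # x-th block of size m_x (rational probabilities p_x = m_x / m)
    n = unitaries[0].shape[0]
    m = sum(multiplicities)
    U = np.zeros((n * m, n * m), dtype=complex)
    y = 0
    for Ux, mx in zip(unitaries, multiplicities):
        for _ in range(mx):
            U += np.kron(Ux, np.outer(basis(m, y), basis(m, y)))
            y += 1
    return U

n, mults = 2, [1, 2, 3]                       # p = (1/6, 2/6, 3/6)
m = sum(mults)
Us = [unitary_group.rvs(n, random_state=k) for k in range(len(mults))]
U = controlled_unitary(Us, mults)

rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
full = U @ np.kron(rho, np.eye(m) / m) @ U.conj().T
out = full.reshape(n, m, n, m).trace(axis1=1, axis2=3)   # partial trace over B
expected = sum((mx / m) * Ux @ rho @ Ux.conj().T for Ux, mx in zip(Us, mults))
print(np.allclose(out, expected))             # True, cf. (16.6)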
Remark. The limit (16.7) is understood in terms of the Choi matrices. That is, the rela-
tion (16.7) means that
lim_{k→∞} ‖ J_{N_k}^{AA′} − J_N^{AA′} ‖₁ = 0 .   (16.8)
Note that by definition the set of noisy operations is closed. Moreover, the set of noisy op-
erations in CPTP(A → A) forms a subset of unital channels (see Exercise 16.1.1). However,
it can be shown that not every unital channel is a noisy operation, so that noisy operations form a strict subset of unital channels.
Exercise 16.1.2. Use Theorem 16.1.1 and the definition of noisy operations to prove the
theorem above.
σ^B = N^{A→B}( ρ^A )
  = Δ^B ∘ N^{A→B}( ρ^A )             (σ is diagonal)
  = Δ^B ∘ N^{A→B} ∘ Δ^A( ρ^A ) .     (ρ is diagonal)   (16.9)
that is non-increasing under noisy operations and takes the value zero on free states. To see
that both definitions are equivalent, observe first that since we consider only diagonal states
(in the same basis) we can replace D(A) above with the classical set Prob(d), where d := |A|.
Due to Corollary 16.3.1, the monotonicity of g under noisy operation is equivalent to
the Schur concavity of g and to the third condition in Def. 5.1.3. The only additional
assumption that we added in Def. 5.1.3 is that g is continuous. This assumption is crucial
for the bijection between divergences and measures of non-uniformity (see Theorem 5.1.3),
and we will assume it also here.
The bijection given in Theorem 5.1.3 demonstrates that all measures of nonuniformity
can be expressed as
g(p) = D( p ‖ u^{(d)} )   ∀ d ∈ N , ∀ p ∈ Prob(d) ,   (16.11)
where D is a classical divergence. Therefore, all the divergences and relative entropies that were introduced in Chapters 5 and 6 can be used to quantify nonuniformity. A particularly useful one is the nonuniformity measure obtained by taking D to be the KL-divergence. In this case, for all p ∈ Prob(d) we have
g(p) = D( p ‖ u^{(d)} ) = log(d) − H(p) ,
where H is the Shannon entropy. Similarly, for the Rényi divergences we have for all α ∈ [0, ∞]
g_α(p) = D_α( p ‖ u^{(d)} ) = log(d) − H_α(p) .   (16.13)
It is worth mentioning that for pure states, specifically when taking p = (1, 0, . . . , 0)T , we
get that gα (p) = log(d). This implies that the nonuniformity of pure states increases with
the dimension d.
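As a quick numerical companion (a sketch; the function names are ours), the Rényi nonuniformity measures g_α(p) = log d − H_α(p) can be evaluated directly; for a pure distribution every g_α equals log d.

import numpy as np

def renyi_entropy(p, alpha):
    # H_alpha(p) in bits; alpha = 1 gives the Shannon entropy
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return float(-np.sum(p * np.log2(p)))
    return float(np.log2(np.sum(p ** alpha)) / (1 - alpha))

def nonuniformity(p, alpha=1.0):
    # g_alpha(p) = log2(d) - H_alpha(p), cf. (16.13)
    return np.log2(len(p)) - renyi_entropy(p, alpha)

p = [0.5, 0.25, 0.125, 0.125]
print(nonuniformity(p))             # 2 - 1.75 = 0.25
print(nonuniformity([1, 0, 0, 0]))  # 2.0 = log2(4), maximal for d = 4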
Theorem 16.3.1. Let ρ, σ ∈ D(A). Then, ρ −−Noisy−→ σ if and only if ρ ≻ σ.
Proof. Suppose σ = N(ρ) for some noisy operation N ∈ Noisy(A → A). Since a noisy operation N ∈ Noisy(A → A) is also a unital channel, from Section 3.5.9 it follows that ρ ≻ σ. In the same subsection we also proved that ρ ≻ σ if and only if there exists a random unitary channel that takes ρ to σ. From Theorem 16.1.2, this random unitary channel is also a noisy operation, so the proof is concluded.
The theorem above can be slightly modified to accommodate systems of different di-
mensions. In particular, if ρ ∈ D(A) and σ ∈ D(B) then σ B = N A→B (ρA ) for some noisy
operation N ∈ Noisy(A → B) if and only if ρA ⊗ uB ≻ uA ⊗ σ B . This is because appending
a maximally mixed state is a reversible free operation.
From here onward we consider the ‘states’ of the QRT of nonuniformity to be probability
vectors in Prob(d). Therefore, from the theorem and the discussion above it follows that for
two given states p ∈ Prob(d) and q ∈ Prob(d′ ) we have
p −−Noisy−→ q   ⟺   ( p , u^{(d)} ) ≻ ( q , u^{(d′)} ) .   (16.14)
That is, conversion under noisy operations induces a pre-order that can be characterized with relative majorization. Note that if d = d′ this pre-order reduces to the standard definition of majorization, however, for d ≠ d′ it is not equivalent to majorization between p and q.
In particular, embedding a state, say p ∈ Prob(d), in a higher dimensional space Prob(d′ )
with d′ > d (by adding zero components) can increase the resourcefulness of p. Therefore,
such embeddings are not free.
Exercise 16.3.1. Let q = (1/2, 1/2, 0, 0)T be the vector obtained from the uniform state u(2)
by adding two zeros. Show that q can be converted by noisy operations to any state in D(2).
From the properties of the conversion distance (see for example Lemma 11.1.1), it follows that T( p −−Noisy−→ q ) remains invariant under any permutation of the components of p or q. Therefore, in the rest of this chapter we will always assume without loss of generality that p = p↓ and q = q↓.
Remark. The case that p ∈ Prob(d) and q ∈ Prob(d′) with d ≠ d′ can be solved by applying the theorem above to the vectors p ⊗ u^{(d′)} and u^{(d)} ⊗ q. Specifically,
T( p −−Noisy−→ q ) = max_{ℓ∈[dd′]} { ‖ u^{(d)} ⊗ q ‖_(ℓ) − ‖ p ⊗ u^{(d′)} ‖_(ℓ) } .   (16.17)
Proof. Since we consider the case that both p and q are d-dimensional, the conversion distance can be expressed as
T( p −−Noisy−→ q ) = min_{r∈Prob(d)} { ½ ‖ q − r ‖₁ : p ≻ r } .   (16.18)
The above expression for the conversion distance represents the distance of q to the set majo(p) as defined in (4.96) (with p replacing q). Hence,
T( p −−Noisy−→ q ) = T( q , majo(p) )
  = max_{ℓ∈[d]} { ‖ q ‖_(ℓ) − ‖ p ‖_(ℓ) } ,   (16.19)
where the last equality follows from Theorem 4.2.4.
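Equation (16.19) gives a one-line recipe that is worth seeing in code. The sketch below (our own helpers) evaluates T(p → q) under noisy operations for two d-dimensional probability vectors via the Ky Fan norms ‖·‖_(ℓ).

import numpy as np

def ky_fan(p, ell):
    # ||p||_(ell): sum of the ell largest entries of p
    return np.sort(np.asarray(p, dtype=float))[::-1][:ell].sum()

def conversion_distance_noisy(p, q):
    # T(p -> q) under noisy operations for equal dimensions, cf. (16.19)
    d = len(p)
    return max(ky_fan(q, ell) - ky_fan(p, ell) for ell in range(1, d + 1))

p = [0.5, 0.3, 0.2]
q = [0.7, 0.2, 0.1]
print(conversion_distance_noisy(p, q))   # 0.2 > 0: q is more nonuniform than p
print(conversion_distance_noisy(q, p))   # 0.0: q majorizes p, exact conversion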
Theorem 16.3.3. Let ε ∈ (0, 1) and p ∈ Prob↓(d). The ε-nonuniformity cost of p is given by
Cost^ε(p) = ⌈ log( d 2^{−H^ε_min(p)} ) ⌉ .   (16.22)
Proof. We first prove the theorem for the case ε = 0. In this case,
Cost^{ε=0}(p) := min{ log m : e₁^{(m)} −−Noisy−→ p } .   (16.23)
This completes the proof for the case ε = 0. For ε > 0 we use (11.34) to get
Cost^ε(p) = min_{p′∈B_ε(p)} Cost^{ε=0}(p′)
  = min_{p′∈B_ε(p)} ⌈ log( d 2^{−H_min(p′)} ) ⌉   (by (16.24))
  = ⌈ log( d 2^{−H^ε_min(p)} ) ⌉ ,   (16.25)
where the last line follows from the definition of H^ε_min(p). This completes the proof.
Exercise 16.3.2. Show that the condition ( e₁^{(m)} , u^{(m)} ) ≻ ( p , u^{(d)} ) is equivalent to p₁ ⩽ m/d.
Exercise 16.3.3. Let p ∈ Prob↓(d) and m ∈ [d].
1. Show that
T( e₁^{(m)} −−Noisy−→ p ) = f_p( m/d ) ,   (16.26)
where f_p(t) := Σ_{x∈[d]} ( p_x − t )_+ is the function studied at the end of Sec. 4.2.2.
2. Provide a direct proof of the theorem above using the above conversion distance and the explicit expression given in (4.107) for f_p^{−1}.
3. Show that the conversion distance above can also be expressed as
T( e₁^{(m)} −−Noisy−→ p ) = ½ ‖ p − m u^{(d)} ‖₁ − (m − 1)/2 .   (16.27)
Exercise 16.3.4. Show that the single-shot ε-nonuniformity cost of p is bounded by
log( ‖p‖_(k) − ε ) ⩽ Cost^ε(p) − log(d/k) ⩽ log( ‖p‖_(k) − ε + k/d ) ,   (16.28)
where k ∈ [d] is the integer satisfying ε ∈ (r_k , r_{k+1}], where r_k is defined in (4.83).
Unlike the case for resource cost, an analogous formula to (11.34) does not exist for resource distillation. Therefore, the calculation of the single-shot distillable nonuniformity necessitates a direct computation of the conversion distance T( p −−Noisy−→ e₁^{(m)} ). In the following lemma we provide a closed formula for this conversion distance in terms of the coefficient μ_m, which is defined for all m ∈ N as
Proof. The case m > d is left as an exercise, and we assume here that m ⩽ d. From the previous section, the conversion distance can be expressed as
T( p −−Noisy−→ e₁^{(m)} ) = max_{k∈[dm]} Σ_{j∈[k]} [ ( e₁^{(m)} ⊗ u^{(d)} )↓_j − ( u^{(m)} ⊗ p )↓_j ] .   (16.32)
Since the vector e₁^{(m)} ⊗ u^{(d)} has exactly d non-zero components (all equal to 1/d), we get that the optimizer k above must satisfy k ⩽ d. Moreover, the j-th term in the sum above has the form
( e₁^{(m)} ⊗ u^{(d)} )↓_j − ( u^{(m)} ⊗ p )↓_j = 1/d − p_x/m ,   (16.33)
where x = ⌈j/m⌉. Since p = p↓ the terms in the equation above are non-decreasing with j. We therefore conclude that the optimal k in (16.32) must be k = d. Denoting a := ⌊d/m⌋ and b := d − am (hence d = am + b) we get
T( p −−Noisy−→ e₁^{(m)} ) = 1 − Σ_{x∈[a]} p_x − (b/m) p_{a+1}
  = 1 − ‖p‖_(a) − ( d/m − a ) p_{a+1}   (using b = d − am)
  = 1 − μ_m .   (16.34)
Combining the definition of the ε-single-shot distillable nonuniformity with the lemma
above we obtain the following closed form for Distillε (p).
Theorem 16.3.4. Let ε ∈ (0, 1), p ∈ Prob↓ (d), m ∈ [d], and µm as defined
in (16.30). If p1 > 1 − ε then Distillε (p) := ⌊dp1 /(1 − ε)⌋. Otherwise, the
ε-single-shot distillable nonuniformity is given by
Exercise 16.3.5. Use the closed form in (16.31) to prove Theorem 16.3.4.
Exercise 16.3.6. Show that for ε = 0 the single-shot distillable nonuniformity of p ∈
Prob(d) is given by
Distillε=0 (p) = log(d) − Hmax (p) , (16.36)
where Hmax is the max-entropy given by Hmax (p) := log(k), where k is the number of non-zero
components of p.
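A direct numerical evaluation of the single-shot distillable nonuniformity is straightforward: compute μ_m for each m and take the largest m with 1 − μ_m ⩽ ε. The sketch below does exactly this (our own code; the expression for μ_m follows the closed form used in the proof above, with a := ⌊d/m⌋).

import numpy as np

def mu(p, m):
    # mu_m = ||p||_(a) + (d/m - a) * p_{a+1}, with a = floor(d/m) and p sorted
    # in decreasing order (the closed form used in the proof of the lemma)
    p = np.sort(np.asarray(p, dtype=float))[::-1]
    d = len(p)
    a = d // m
    tail = (d / m - a) * p[a] if a < d else 0.0
    return p[:a].sum() + tail

def distill_single_shot(p, eps):
    # log2 of the largest m with T(p -> e_1^{(m)}) = 1 - mu_m <= eps
    d = len(p)
    best = max(m for m in range(1, d + 1) if 1 - mu(p, m) <= eps + 1e-12)
    return np.log2(best)

p = [0.5, 0.25, 0.125, 0.125]
print(distill_single_shot(p, 0.0))   # 0.0: all four entries are nonzero
print(distill_single_shot(p, 0.3))   # 1.0: smoothing allows distilling more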
The formula in Theorem 16.3.4 is somewhat cumbersome. One can get somewhat simpler bounds on the single-shot distillable nonuniformity by removing the floor functions that appear in the definition of μ_m. These simpler bounds can be expressed in terms of the formula
for the smoothed max-entropy given in Lemma 10.4.2. Specifically, from Lemma 10.4.2 it
follows that the smoothed max-entropy can be expressed as the logarithm of an integer k
satisfying
∥p∥(k−1) < 1 − ε ⩽ ∥p∥(k) (16.37)
with the convention ∥p∥(0) := 0.
Corollary 16.3.1. Let ε ∈ (0, 1), p ∈ Prob(d), and set k := 2^{H^ε_max(p)}. Then,
Now, for the lower bound we cannot replace ⌊d/m⌋ with arbitrary integer ℓ ∈ [d] since this
will increase the right-hand side above. Instead, we use the fact that for any s ∈ [ d1 , 1] there
exists a unique m ∈ [d] such that
s − 1/d < m/d ⩽ s .   (16.43)
Observe further that for any such s ∈ [1/d, 1] and m ∈ [d], if in addition ‖p‖_(⌊s^{−1}⌋) ⩾ 1 − ε then also ‖p‖_(⌊d/m⌋) ⩾ 1 − ε since s^{−1} ⩽ d/m. Moreover, since such m and s also satisfy log m ⩾ log(ds − 1) we get that
In the last step, denote ℓ := ⌊s^{−1}⌋ ∈ [d] and use the fact that s^{−1} ⩽ ℓ + 1 to get s ⩾ 1/(ℓ+1). Substituting this into the right-hand side of the equation above gives
Distill^ε(p) ⩾ max_{ℓ∈[d]} { log( d/(1+ℓ) − 1 ) : ‖p‖_(ℓ) ⩾ 1 − ε }
  = log( d/(1+k) − 1 ) .   (16.45)
Remark. Note that the formula for the asymptotic conversion rate demonstrates that the
resource theory of nonuniformity is reversible. Specifically, note that for any p and q as
above, Distill(p → q)Distill(q → p) = 1.
We prove the theorem above by computing separately the nonuniformity cost and the
distillable nonuniformity. Recall from the discussion in Sec. 11.5.1, specifically (11.111), that
the asymptotic cost of a nonuniformity state p ∈ Prob(k) is given by
Cost(p) := lim_{ε→0⁺} lim inf_{n→∞} (1/n) Cost^ε( p^{⊗n} ) .   (16.48)
Therefore, we can use the results from the single-shot case to compute this asymptotic rate.
Lemma 16.4.1. Let p ∈ Prob(d) and ε ∈ (0, 1). Then, the asymptotic
nonuniformity cost of p is given by
Cost(p) = lim_{n→∞} (1/n) Cost^ε( p^{⊗n} ) = log(d) − H(p) .   (16.49)
Proof. From the result in the single-shot case, specifically (16.22), we obtain
lim_{n→∞} (1/n) Cost^ε( p^{⊗n} ) = lim_{n→∞} (1/n) ⌈ log( dⁿ 2^{−H^ε_min(p^{⊗n})} ) ⌉
  = log(d) − H(p) ,   (16.50)
where the last equality follows from the AEP (11.63).
As before, we can use the results from the single-shot regime to compute this expression.
Lemma 16.4.2. Let p ∈ Prob(d) and ε ∈ (0, 1). Then, the asymptotic distillable
nonuniformity is given by
Distill(p) = lim_{n→∞} (1/n) Distill^ε( p^{⊗n} ) = log(d) − H(p) .   (16.52)
lim sup_{n→∞} (1/n) Distill^ε( p^{⊗n} ) ⩽ lim sup_{n→∞} (1/n) log( dⁿ / 2^{H^ε_max(p^{⊗n})} − 1 )
  = log(d) − H(p) ,   (16.53)
where the equality follows from the AEP (10.171). Similarly,
lim inf_{n→∞} (1/n) Distill^ε( p^{⊗n} ) ⩾ lim inf_{n→∞} (1/n) log( dⁿ / ( 1 + 2^{H^ε_max(p^{⊗n})} ) − 1 )
  = log(d) − H(p) .   (16.54)
Comparing the two inequalities in the two equations above we conclude that
lim_{n→∞} (1/n) Distill^ε( p^{⊗n} ) = log(d) − H(p) .   (16.55)
Quantum Thermodynamics
Thermodynamics stands as one of the most influential theories in physics, finding applica-
tions across a wide range of disciplines. Initially focused on steam engines, its relevance has
expanded to encompass fields such as biochemistry, nanotechnology, and black hole physics,
among others [83, 26, 61]. Despite its immense success, the foundational aspects of thermo-
dynamics continue to be a subject of controversy. There persists a pervasive confusion re-
garding the relationship between macroscopic and microscopic laws, particularly concerning
reversibility and time-symmetry. Furthermore, there is a lack of consensus on the optimal
formulation of the second law. As early as 1941, Nobel laureate Percy Bridgman noted,
“there are almost as many formulations of the Second Law as there have been discussions of
it,” and unfortunately, little progress has been made in resolving this situation since then.
In recent years, researchers have taken a fresh perspective on these fundamental issues by
approaching thermodynamics as a resource theory. This viewpoint considers a system that
is not in equilibrium with its environment as a valuable resource known as “athermality.”
Athermality serves as the fuel utilized in work extraction, computational erasure operations,
and other thermodynamic tasks.
The resource-theoretic approach to thermodynamics delves into the quantification of
a state’s deviation from equilibrium and explores its utility in quantum thermodynamics.
It also investigates the necessary and sufficient conditions for transforming one state into
another. Within this framework, different notions of state conversion can be examined,
including exact and approximate conversions, single-copy and multiple-copy scenarios, and
conversions with or without the aid of a catalyst.
These quantum-information techniques have brought forth numerous novel insights, par-
ticularly considering the historical importance of information in foundational topics such as
Maxwell’s demon [153], the thermodynamic reversibility of computation [17, 18], Landauer’s
principle regarding the work cost of erasure [145, 137], and Jaynes’s utilization of maximum
entropy principles in deriving statistical mechanics [138, 139].
Furthermore, the resource-theoretic approach to thermodynamics reveals that the con-
ventional formulation of the second law of thermodynamics, which focuses on entropy non-
decrease, is insufficient as a criterion for determining the feasibility of a given state conversion.
However, we will discover that it is possible to identify a set of measures quantifying the
degree of nonequilibrium (including entropy) such that a state conversion is feasible if and
only if all of these measures do not increase.
is the thermal equilibrium state known as the Gibbs state. The Gibbs state, γ B , is also
referred to as the thermal state of the system B, and the normalization factor
Z^B := Tr[ e^{−βH^B} ]   (17.2)
over all ρ ∈ D(A), where λ is a Lagrange multiplier. Let ρ be the optimal density matrix
that minimizes the Lagrangian above. Then, any other state in D(A) can be written as
ρ + tY for some t ∈ R and Y ∈ Herm(A) is a traceless matrix (we can also assume without
loss of generality that ∥Y ∥∞ ⩽ 1 although we will not need it). Since ρ is optimal we must
have for any such Y
0 = d/dt L( ρ + tY , λ ) |_{t=0} = Tr[HY] + λ Tr[Y log ρ] ,   (17.4)
where the second equality follows from Exercise 17.1.2. Since this holds
for all traceless matrices Y ∈ Herm(A), so that H + λ log ρ is orthogonal (in the Hilbert-
Schmidt inner product) to the subspace of all traceless matrices in Herm(A). Consequently,
H + λ log ρ must be proportional to the identity matrix; i.e., there exists c ∈ R such that
H + λ log ρ = cI. Hence, the optimal ρ has the form
ρ = e^{c/λ} e^{−H/λ} = e^{−βH} / Z ,   (17.6)
where in the last equality we denoted β := 1/λ, and used the fact that Tr[ρ] = 1 so that e^{c/λ} = 1/Tr[e^{−βH}].
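The variational characterization can be checked numerically: among all density matrices, the Gibbs state minimizes the Lagrangian Tr[Hρ] + λ Tr[ρ log ρ] with λ = 1/β. A small sketch (our own helper names, assuming NumPy/SciPy):

import numpy as np
from scipy.linalg import expm
from scipy.stats import unitary_group

def gibbs(H, beta):
    # gamma = exp(-beta H) / Tr[exp(-beta H)], cf. (17.6)
    G = expm(-beta * H)
    return G / np.trace(G)

def lagrangian(rho, H, lam):
    # L(rho, lambda) = Tr[H rho] + lambda * Tr[rho log rho]  (natural log)
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return np.trace(H @ rho).real + lam * np.sum(evals * np.log(evals))

H = np.diag([0.0, 1.0, 2.5])
beta = 0.7
gamma = gibbs(H, beta)

rng = np.random.default_rng(1)
vals = []
for k in range(200):
    U = unitary_group.rvs(3, random_state=k)      # Haar-random eigenbasis
    p = rng.dirichlet(np.ones(3))                 # random spectrum
    rho = U @ np.diag(p) @ U.conj().T
    vals.append(lagrangian(rho, H, 1 / beta))
print(lagrangian(gamma, H, 1 / beta) <= min(vals))   # True: the Gibbs state wins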
Exercise 17.1.2. Use Corollary D.1.1 to prove the expression for the directional derivative
given in (17.4).
Exercise 17.1.3. Let α ∈ [0, ∞]. Find the state ρα ∈ D(A) that minimizes Tr[H A ρA ] while
keeping the α-Rényi entropy fixed.
between the initial and final states. Therefore, we define the maximal extractable work from
a system in a state ρA as
W_max( ρ^A ) = max_U Tr[ H^A ( ρ^A − U ρ^A U^* ) ] ,   (17.7)
where the maximum is over all unitary matrices U ∈ U(A). Interestingly, the above opti-
mization problem can be solved analytically.
Proof. From a variant of the von-Neumann trace inequality, known as the Ruhe’s Trace
Inequality as given in Theorem B.3.3, it follows that
Tr[ H^A U ρ^A U^* ] ⩾ Σ_{x∈[m]} a_x p↓_x = Tr[ H^A σ_ρ^A ] ,   (17.10)
where we used the lower bound in (B.26) with N := H A and M := U ρA U ∗ . The proof is
then concluded with the observation that there exists a unitary matrix U satisfying U ρA U ∗ =
σρA .
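The optimizer in (17.7) is the unitary that reorders the eigenvalues of ρ so that the largest populations sit on the lowest energy levels, producing the passive state σ_ρ. A small numerical sketch (our own helpers, for a diagonal Hamiltonian):

import numpy as np

def max_extractable_work(rho, H):
    # W_max(rho) = Tr[H rho] - Tr[H sigma_rho]: sigma_rho places the eigenvalues
    # of rho in decreasing order on the energy levels in increasing order
    energies = np.diag(H)
    order = np.argsort(energies)
    probs = np.sort(np.linalg.eigvalsh(rho))[::-1]
    sigma_diag = np.zeros_like(energies, dtype=float)
    sigma_diag[order] = probs                 # largest population on smallest energy
    return float(np.trace(H @ rho).real - energies @ sigma_diag)

H = np.diag([0.0, 1.0, 2.0])
rho = np.diag([0.1, 0.2, 0.7])                # population-inverted state
print(max_extractable_work(rho, H))           # 1.6 - 0.4 = 1.2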
Exercise 17.1.4. Let n, m ∈ N and ρ, σ ∈ D(A).
1. Show that
Wmax (ρ ⊗ σ) ⩾ Wmax (ρ) + Wmax (σ) . (17.11)
2. Show that if n ⩾ m then
(1/n) W_max( ρ^{⊗n} ) ⩾ (1/m) W_max( ρ^{⊗m} ) .   (17.12)
In the following lemma, we demonstrate that the maximum extractable work can never
exceed the difference between the energy of the system and the energy of the system at
equilibrium.
W_max( ρ^A ) ⩽ Tr[ H^A ρ^A ] − Tr[ H^A γ^A ] .   (17.13)
Proof. The Gibbs state γ A is the state with the smallest energy that has an entropy H(ρA ) =
H(σρA ). Therefore, the state σρA has higher energy than γ A so that
Tr[ H^A γ^A ] ⩽ Tr[ H^A σ_ρ^A ] .   (17.14)
where we used the relations Tr[ H^{Aⁿ} ρ^{⊗n} ] = n Tr[ H^A ρ^A ] and Tr[ H^{Aⁿ} γ^{Aⁿ} ] = n Tr[ H^A γ^A ].
Let
f(ρ) := min_U Tr[ H^A U ρ^A U^* ] ,   (17.18)
Now, let ε > 0 and recall that the number of strongly ε-typical sequences drawn from an
i.i.d.∼ p source scales predominantly as 2nH(X) . Therefore, since H(X) < H(Y ) we get
that for sufficiently small ε > 0 and sufficiently large n we have |T^{st}_ε(Xⁿ)| < |T^{st}_ε(Yⁿ)|. In particular, there exists a one-to-one function π_n : [m]ⁿ → [m]ⁿ, with the property that for any xⁿ ∈ T^{st}_ε(Xⁿ) we have π_n(xⁿ) ∈ T^{st}_ε(Yⁿ). Define the unitary U_n ∈ L(Aⁿ) by its action on basis elements of Aⁿ as U_n|xⁿ⟩ := |π_n(xⁿ)⟩ for all xⁿ ∈ [m]ⁿ. Since U_n is not necessarily optimal we get
f^{reg}(ρ) ⩽ lim_{n→∞} (1/n) Tr[ H^{Aⁿ} U_n ρ^{⊗n} U_n^* ]
  = lim_{n→∞} (1/n) Σ_{xⁿ∈[m]ⁿ} p_{xⁿ} Tr[ H^{Aⁿ} |π_n(xⁿ)⟩⟨π_n(xⁿ)| ] ,   (17.21)
where t(xn ) ∈ Type(n, m) is the type of the sequence xn and a := (a1 , . . . , am )T . Substituting
this into the previous equation gives
f^{reg}(ρ) ⩽ lim_{n→∞} Σ_{xⁿ∈[m]ⁿ} p_{xⁿ} t( π_n(xⁿ) ) · a
  = lim_{n→∞} Σ_{xⁿ∈T^{st}_ε(Xⁿ)} p_{xⁿ} t( π_n(xⁿ) ) · a ,   (17.23)
where in the second line we restricted xn to the set of strongly ε-typical sequences. The
theorem of strongly typical sequences ensures that the contribution of non-typical sequences
vanishes in the limit n → ∞ (see Exercise 17.1.5). Since the above inequality holds for all
ε ∈ (0, 1), taking the limit ε → 0+ gives
X
f reg (ρ) ⩽ lim+ lim pxn t (πn (xn )) · a
ε→0 n→∞
xn ∈Tst (X n ) (17.24)
′
εA ′A
= g · a = Tr H γ ,
′
→ g′ as n → ∞. Finally,
where we used the fact that πn (xn )∈ Tε,stn (g ) so that t π n (x n
)
since we proved that f reg (ρ) ⩽ Tr H A γ ′A for any Gibbs state with inverse temperature
β ′ > β it follows that the inequality also hold for β ′ = β. This completes the proof.
Exercise 17.1.5. Prove the relation (17.23). Hint: Use Theorem 8.5.1 in conjunction with
the fact that t (πn (xn )) · a is bounded from above; e.g., t (πn (xn )) · a ⩽ am since t (πn (xn )) is
a probability vector.
A state ρ ∈ D(A) characterized by W^{reg}_max(ρ) = 0 is identified as a completely passive state. Such states are inherently unable to facilitate work extraction, irrespective of their quantity. As inferred from Exercise 17.1.4, if W^{reg}_max(ρ) = 0, then it follows that:
W_max( ρ^{⊗n} ) = 0   ∀ n ∈ N .   (17.25)
This insight, derived from Theorem 17.1.1, establishes the Gibbs state as the unique com-
pletely passive state. Consequently, this finding compellingly supports the designation of
the Gibbs state, or thermal state, as the exclusive free state in the domain of quantum
thermodynamics.
Exercise 17.1.6. Give full details why the only state that is completely passive is the Gibbs
state.
Proof. By definition, the Gibbs state γ^A commutes with U^A if and only if e^{−βH^A} commutes with U^A. Therefore, if [U^A, H^A] = 0 then clearly U^A commutes with γ^A. Conversely, suppose U^A commutes with e^{−βH^A} and express H^A = Σ_x λ_x P_x, where {P_x} are orthogonal projections satisfying P_x P_y = δ_{xy} P_x and {λ_x} are distinct eigenvalues of H^A. Then,
Σ_x e^{−βλ_x} U P_x = U e^{−βH^A} = e^{−βH^A} U = Σ_y e^{−βλ_y} P_y U .   (17.26)
Note that in the lemma above, the condition that U A commutes with γ A can be expressed
as
U A→A γ A := U A γ A U ∗A = γ A .
(17.28)
That is, the unitary matrix U A commutes with the Hamiltonian if and only if the unitary
channel U A→A preserves the Gibbs state.
Suppose now that system B is comprised of two subsystems B1 and B2 and that the total
Hamiltonian of system B can be expressed as
H B = H B1 ⊗ I B2 + I B1 ⊗ H B2 . (17.29)
In this case, the Gibbs state γ B of the composite system can be expressed as a tensor product
of the two Gibbs states of the subsystems. Indeed, we have
γ^B = e^{−β( H^{B₁}⊗I^{B₂} + I^{B₁}⊗H^{B₂} )} / Tr[ e^{−β( H^{B₁}⊗I^{B₂} + I^{B₁}⊗H^{B₂} )} ] ,   (17.30)
and since
e^{−β( H^{B₁}⊗I^{B₂} + I^{B₁}⊗H^{B₂} )} = e^{−βH^{B₁}} ⊗ e^{−βH^{B₂}} ,   (17.31)
we conclude that γ^B = γ^{B₁} ⊗ γ^{B₂}, where for each j = 1, 2, γ^{Bⱼ} := e^{−βH^{Bⱼ}} / Tr[ e^{−βH^{Bⱼ}} ] is the Gibbs state of subsystem Bⱼ.
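This factorization is easy to confirm numerically (a short sketch; the helper name is ours):

import numpy as np
from scipy.linalg import expm

def gibbs(H, beta):
    G = expm(-beta * H)
    return G / np.trace(G)

H1 = np.diag([0.0, 1.0])
H2 = np.diag([0.0, 0.5, 1.5])
beta = 1.3

# Total Hamiltonian H^B = H^{B1} (x) I + I (x) H^{B2}
H_total = np.kron(H1, np.eye(3)) + np.kron(np.eye(2), H2)
print(np.allclose(gibbs(H_total, beta),
                  np.kron(gibbs(H1, beta), gibbs(H2, beta))))   # True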
ρA → ρA ⊗ γ B (17.32)
is a free operation, where B is some ancillary system in the Gibbs state γ B and Hamiltonian
H B . The total Hamiltonian of system AB is given by H AB := H A ⊗ I B + I A ⊗ H B . Its
corresponding Gibbs state is given by γ AB := γ A ⊗ γ B . According to the second step above,
any unitary matrix U : AB → AB that commutes with the total Hamiltonian H AB yields a
permissible evolution of the system AB. Combining this with Lemma 17.1.3 we conclude that
a unitary evolution U ∈ CPTP(AB → AB) is free if and only if it preserves the Gibbs state
γ AB . For such a Gibbs preserving unitary channel U AB→AB we get that the transformation
ρA ⊗ γ B → U AB→AB ρA ⊗ γ B
(17.33)
is a thermal operation.
Proof. Let γ^{ABA′B′} := γ^{AB} ⊗ γ^{A′B′} and let V ∈ CPTP(ABA′B′ → ABA′B′) be the unitary channel given by
V := U^{AB→A′B′} ⊗ U^{∗A′B′→AB} .   (17.37)
In the exercise below you show that V preserves the joint Gibbs state γ^{ABA′B′}. Hence, the channel
Tr_{ABB′}[ V( ω^A ⊗ γ^{BA′B′} ) ] = Tr_{ABB′}[ U^{AB→A′B′}( ω^A ⊗ γ^B ) ⊗ U^{∗A′B′→AB}( γ^{A′B′} ) ]
  = Tr_{B′}[ U^{AB→A′B′}( ω^A ⊗ γ^B ) ]   (17.38)
is a thermal operation.
Exercise 17.2.1. Show that the matrix V as defined in the proof above is indeed Gibbs
preserving. Hint: Apply U ∗ to both sides of (17.35) to show that U ∗ is Gibbs preserving.
Exercise 17.2.2. Consider the unitary matrix U : AB → A′B′ associated with the unitary channel U^{AB→A′B′} mentioned in the lemma above (i.e. U^{AB→A′B′}(·) := U(·)U^*). Demonstrate that the condition (17.35) is satisfied if and only if
U H^{AB} = H^{A′B′} U .   (17.39)
Recall that a density matrix ρ ∈ D(A) can be viewed as an athermality state only
when the Hamiltonian or Gibbs state of system A is specified. Similarly, a quantum channel
N ∈ CPTP(A → A′ ) on its own cannot be considered a thermal operation without specifying
the Gibbs state associated with systems A and A′ . We will therefore view a thermal operation
′ ′
as a triple (N A→A , γ A , γ A ), where N ∈ CPTP(A → A′ ), γ A is the input Gibbs state, and
′
γ A is the output Gibbs state. We use this perspective in the following formal definition of
thermal operations.
′
Definition 17.2.1. Let N ∈ CPTP(A → A′ ) and γ A and γ A be two density
′ ′
matrices. The triple (N A→A , γ A , γ A ) is called a thermal operation if there exists a
unitary channel U ∈ CPTP(AB → A′ B ′ ) (with |AB| = |A′ B ′ |), and density matrices
′
γ B and γ B , such that both (17.35) and (17.36) hold.
In the following lemma we show that the three elements of the triple (N^{A→A′}, γ^A, γ^{A′}) in the definition above are not independent.
Lemma 17.2.2. Let (N^{A→A′}, γ^A, γ^{A′}) be a thermal operation. Then, N^{A→A′} is Gibbs preserving; i.e.,
N^{A→A′}( γ^A ) = γ^{A′} .   (17.40)
Proof. Observe that since U^{AB→A′B′} in (17.36) is Gibbs preserving it follows that
N^{A→A′}( γ^A ) = Tr_{B′}[ U^{AB→A′B′}( γ^{AB} ) ] = Tr_{B′}[ γ^{A′B′} ] = γ^{A′} .   (17.41)
Clearly, from their definitions and the lemma above it follows that
Proof. Let {N_x}_{x∈[m]} be a set of m channels in TO(A → A′), and consider a convex combination of these m channels:
N^{A→A′} := Σ_{x∈[m]} p_x N_x^{A→A′} ,   (17.44)
where for each x ∈ [m], B_x and B′_x are auxiliary thermal baths, and U_x is a Gibbs preserving unitary channel. Let
B := ⊕_{x∈[m]} B_x ,   B′ := ⊕_{x∈[m]} B′_x   and   γ^B := ⊕_{x∈[m]} p_x γ^{B_x} .   (17.46)
Therefore, N^{A→A′} is a thermal operation. This completes the proof.
Definition 17.2.2. Let A and A′ be two physical systems. The set of closed thermal
operations, denoted as CTO(A → A′ ), is defined as
Remark. By definition, N ∈ CTO(A → A′) if and only if there exists a sequence of thermal operations {N_n^{A→A′}}_{n∈N} ⊂ TO(A → A′) such that
lim_{n→∞} N_n^{A→A′} = N^{A→A′} .   (17.50)
Proof. The proof that 1 ⇒ 2 is left as an exercise, and we prove that 2 ⇒ 1. Let {ε_k}_{k∈N} be a sequence of positive numbers with zero limit, and for each k ∈ N, let (ρ_k^A, γ^A) be an athermality state that can be converted to a state that is ε_k-close to (σ^{A′}, γ^{A′}). That is, for each k there exists a thermal operation N_k ∈ TO(A → A′) with the property that
N_k^{A→A′}( ρ_k^A ) ≈_{ε_k} σ^{A′} .   (17.53)
Since the set CPTP(A → A′ ) is compact, there exists a converging subsequence of {Nk }k∈N .
For simplicity of the exposition here, we assume without loss of generality that the sequence
{Nk }k∈N itself is converging (otherwise, we have to replace k with a subsequence {nk }k∈N )
and set N := lim_{k→∞} N_k. By definition, N ∈ CTO(A → A′) since each (N_k^{A→A′}, γ^A, γ^{A′}) is a thermal operation. Moreover, observe that
N^{A→A′}( ρ^A ) = lim_{k→∞} N_k^{A→A′}( ρ_k^A ) = σ^{A′} ,   (17.54)
where we used (17.53). Hence, (ρ^A, γ^A) can be converted to (σ^{A′}, γ^{A′}) by CTO. This completes the proof.
Figure 17.2: The conversion of ρ to σ by CTO. For any ε > 0 there exists states ρ̃ and σ̃ that are
ε-close to ρ and σ, respectively, such that ρ̃ can be converted to σ̃ by thermal operation.
Proof. Suppose first that E ∈ TO(A → A′), and that it has the form (17.36) with AB ≅ A′B′. To see that E^{A→A′} is time-translation covariant, observe that
E^{A→A′}( e^{−itH^A} ρ^A e^{itH^A} ) = Tr_{B′}[ U^{AB→A′B′}( e^{−itH^A} ρ^A e^{itH^A} ⊗ γ^B ) ]
  = Tr_{B′}[ U^{AB→A′B′}( e^{−itH^A} ρ^A e^{itH^A} ⊗ e^{−itH^B} γ^B e^{itH^B} ) ]   (using [γ^B, H^B] = 0)
  = Tr_{B′}[ U^{AB→A′B′} ∘ V_t^{AB→AB}( ρ^A ⊗ γ^B ) ] ,   (17.56)
where V_t^{AB} := e^{−itH^A} ⊗ e^{−itH^B} = e^{−itH^{AB}}, H^{AB} := H^A ⊗ I^B + I^A ⊗ H^B is the total Hamiltonian, and V_t^{AB→AB} := V_t^{AB}(·)V_t^{∗AB}. From (17.39) we get that the unitary channel U^{AB→A′B′}(·) := U(·)U^* satisfies
U^{AB→A′B′} ∘ V_t^{AB→AB} = V_t^{A′B′→A′B′} ∘ U^{AB→A′B′} ,   (17.57)
where V_t^{A′B′→A′B′}(·) := e^{−itH^{A′B′}}(·)e^{itH^{A′B′}}. Combining this with (17.56) we get for any ρ ∈ D(A)
E^{A→A′}( e^{−itH^A} ρ^A e^{itH^A} ) = Tr_{B′}[ V_t^{A′B′→A′B′} ∘ U^{AB→A′B′}( ρ^A ⊗ γ^B ) ] .   (17.58)
Finally, observe that V_t^{A′B′→A′B′} = U_t^{A′→A′} ⊗ U_t^{B′→B′}, where U_t^{A′→A′}(·) := e^{−itH^{A′}}(·)e^{itH^{A′}} and U_t^{B′→B′}(·) := e^{−itH^{B′}}(·)e^{itH^{B′}}. Substituting this into the equation above gives
E^{A→A′}( e^{−itH^A} ρ^A e^{itH^A} ) = U_t^{A′→A′}( Tr_{B′}[ U^{AB→A′B′}( ρ^A ⊗ γ^B ) ] )
  = e^{−itH^{A′}} E^{A→A′}( ρ^A ) e^{itH^{A′}} .   (17.59)
This completes the proof for E ∈ TO(A → A′). The case E ∈ CTO(A → A′) follows from the fact that the limit of time-translation covariant channels is itself time-translation covariant (see Exercise 17.2.6).
Exercise 17.2.6. Let G be a group, and let {En }n∈N be a sequence of channels in COVG (A →
A′ ) (with respect to some unitary representations of G on A and A′ ). Show that if the limit
E := limn→∞ En exists then also E ∈ COVG (A → A′ ).
Definition 17.2.3. Let γ^A and γ^{A′} be two Gibbs states. A channel N ∈ GPO(A → A′) is called a Gibbs-preserving covariant operation (in short, GPC operation) if, in addition to being Gibbs preserving, it is also time-translation covariant, satisfying (17.55). We denote by GPC(A → A′) the set of all such GPC channels in GPO(A → A′).
Exercise 17.2.8. Let A, B, A′, and B′ be four physical systems with corresponding Hamiltonians H^A, H^B, H^{A′}, and H^{B′}, and let V^{AB→A′B′}(·) = V(·)V^* be a time-translation covariant isometry channel. Denote by
E^{A→A′}( ω^A ) := Tr_{B′}[ V^{AB→A′B′}( ω^A ⊗ γ^B ) ]   ∀ ω ∈ L(A) ,   (17.62)
and set t := Z^{AB}/Z^{A′B′}. Show that the map
N^{A→A′}( ω^A ) = t E^{A→A′}( ω^A ) + ( γ^{A′} − t E^{A→A′}( γ^A ) ) Tr[ ω^A ]   (17.63)
is a thermal operation (and in particular a quantum channel). Hint: Start with the covariance property e^{−βH^{AB}} = V^* e^{−βH^{A′B′}} V to get
Z^{AB} = Tr[ V V^* e^{−βH^{A′B′}} ] = Z^{A′B′} Tr[ V V^* γ^{A′B′} ] ⩽ Z^{A′B′} ,   (17.64)
with equality if and only if |AB| = |A′B′| (in which case V is a unitary matrix), and conclude that
τ^{A′} := ( γ^{A′} − t E^{A→A′}( γ^A ) ) / ( 1 − t )   (17.65)
is a density matrix.
Note that E corresponds to a Gibbs preserving channel. The relation above corresponds
precisely to the definition of relative majorization (see Section 4.3). Therefore, we conclude
that
( p , g ) −−GPO−→ ( p′ , g′ )   ⟺   ( p , g ) ≻ ( p′ , g′ ) .   (17.67)
Remarkably, the relation above remains unchanged even if we replace the set GPO with
CTO.
Theorem 17.3.1. Let (ρ, γ) and (ρ′ , γ ′ ) be two quasi-classical states of systems A
and A′ , respectively. The following statements are equivalent:
Remark. Note that the theorem above does not state that CTO=GPO, only that they have
the same conversion power. In general, we have CTO⊆GPO since GPO is a closed set of
operations containing thermal operations. Therefore, the implication 1 ⇒ 2 is trivial, and
we only need to prove the direction 2 ⇒ 1.
The proof of the theorem above is technically involved and extensive; it has been deferred
to Appendix D.7. It remains a compelling open challenge to discover a more concise and
straightforward proof for this theorem.
The theorem above, in conjunction with (17.67), implies that interconversions under CTO
can be characterized with relative majorization.
Corollary 17.3.1. Let (p, g) and (p′ , g′ ) be two athermality states (in the
quasi-classical regime) of systems A and A′ , respectively. Then,
( p , g ) −−CTO−→ ( p′ , g′ )   ⟺   ( p , g ) ≻ ( p′ , g′ ) .   (17.68)
We can therefore apply all the machinery of the theory of (relative) majorization to the
theory of athermality. In particular, one of the immediate consequences of the corollary
above is that in the quasi-classical regime, there exists a bijection between the resource
theory of athermality and the resource theory of nonuniformity. This remarkable connection
between the two theories essentially states that in the quasi-classical regime athermality is
nonuniformity. This equivalence follows from Theorem 4.3.2.
Specifically, suppose (p, g) is an athermality state in the quasi-classical regime, and
suppose that g has only rational components. Then, we can write the components of g as
g_x = n_x/n with x ∈ [m], n_x ∈ N, and n := Σ_{x∈[m]} n_x, and we have
( p , g ) ∼ ( r , u^{(n)} )   where   r := ⊕_{x=1}^{m} p_x u^{(n_x)} .   (17.69)
That is, there exists an n-dimensional system R in some state r ∈ Prob(n), with trivial
Hamiltonian (i.e. uniform Gibbs states), such that (p, g) ∼ (r, u(n) ). Combining this with
Theorem 17.3.1 we conclude that
( p , g ) −−CTO−→ ( r , u^{(n)} )   and   ( r , u^{(n)} ) −−CTO−→ ( p , g ) .   (17.70)
In other words, (p, g) and (r, u^{(n)}) correspond to the same resource, so that the athermality of (p, g) can be interpreted as the nonuniformity of (r, u^{(n)}).
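The embedding (17.69) is simple to implement when g has rational components (a sketch; the function name is ours):

import numpy as np
from fractions import Fraction

def embed(p, g_fractions):
    # Map (p, g) with rational g to (r, u^{(n)}): r = (+)_x p_x u^{(n_x)}, cf. (17.69)
    n = int(np.lcm.reduce([f.denominator for f in g_fractions]))
    r = []
    for px, gx in zip(p, g_fractions):
        nx = int(gx * n)                      # g_x = n_x / n
        r.extend([px / nx] * nx)
    return np.array(r), n

p = [0.6, 0.4]
g = [Fraction(1, 3), Fraction(2, 3)]           # a qubit Gibbs vector
r, n = embed(p, g)
print(n, r)   # n = 3, r = [0.6, 0.2, 0.2]: the athermality of (p, g) equals the
              # nonuniformity of (r, u^{(3)})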
Exercise 17.3.1. Let ε > 0 and (p, g) be an athermality state (in the quasi-classical regime). We do not assume that g has rational components. Show that there exists an n-dimensional system R with trivial Hamiltonian, and two states r₁, r₂ ∈ Prob(n) that satisfy ½‖r₁ − r₂‖₁ ⩽ ε and
( r₁ , u^{(n)} ) ≻ ( p , g ) ≻ ( r₂ , u^{(n)} ) .   (17.71)
Hint: Use Sec. 4.3.5.
Exercise 17.3.2. Prove that the relation (17.69) implies that for any k ∈ N we also have
( p^{⊗k} , g^{⊗k} ) ∼ ( r^{⊗k} , (u^{(n)})^{⊗k} ) .   (17.72)
The equivalence between athermality and non-uniformity gives rise to the following property.
1. For every ε > 0 there exists a thermal catalyst κ := (r, g̃) such that
( p_ε , g ) ⊗ κ −−CTO−→ ( p′_ε , g′ ) ⊗ κ ,   (17.73)
Remark. Very recently (see the notes and references at the end of this section) it was shown that the theorem above can be strengthened by replacing p_ε with p, so that (17.73) becomes
( p , g ) ⊗ κ −−CTO−→ ( p′_ε , g′ ) ⊗ κ .   (17.75)
This improvement makes the result somewhat more physical, and furthermore, provides a simple characterization of catalytic majorization (cf. Lemma 4.5.1): (p, g) ≻_c (p′, g′) if and only if for every ε > 0 there exists p′_ε ∈ B_ε(p′) such that (p, g) ≻_* (p′_ε, g′). The proof of this improvement involves techniques not covered in this book; the interested reader can find the relevant references in the last section of this chapter.
Hence, the equivalence of the two conditions in the theorem follows from Theorem 4.5.1.
This completes the proof.
We have chosen the symbol D to denote a measure of athermality, given that every
normalized quantum divergence D also serves as a measure of athermality. Such measures
of athermality behave monotonically under the larger set of Gibbs-preserving operations.
However, it’s worth noting that not all measures of athermality are quantum divergences, as
they only need to exhibit monotonic behavior under CTO.
Exercise 17.4.1. Show that every normalized quantum divergence is a measure of ather-
mality.
In the quasi-classical regime, athermality measures are applied to pairs of probability vec-
tors, with the stipulation that the second vector remains strictly positive due to the Gibbs
states’ inability to contain zero components (assuming finite energies). Furthermore, as pre-
CTO
viously discussed, two athermality states (p, g) and (p′ , g′ ) satisfy (p, g) −−−→ (p′ , g′ ) if and
only if (p, g) ≻ (p′ , g′ ). Thus, within the quasi-classical framework, the earlier definition
of an athermality measure essentially transforms into the definition of a divergence. This
implies that, in the quasi-classical domain, athermality measures are indeed divergences.
Additionally, the direct correlation between classical divergences and non-uniformity mea-
sures extends to form a bijection between nonuniformity measures and athermality measures,
further intertwining these concepts.
2. J A = I A .
The above optimization problem is an SDP, and consequently has a dual given by (see
Exercise 17.4.2)
′ ′
Gη (ρ, γ) = min Tr σ A − η0A + η1A
(17.82)
∞
If the Hamiltonians H^A and H^{A′} are non-degenerate then the operator ξ^{AA′} is non-degenerate so that P_ξ is the completely dephasing channel in the energy eigenbasis. In this case, ω^{AA′} is diagonal and therefore we can assume without loss of generality that also η₀ and η₁ are diagonal. For this case, for every choice of η we have G_η(ρ, γ) = G_η(Δ(ρ), γ), where Δ ∈ CPTP(A → A) is the energy dephasing channel. Therefore, for such a choice of system A′, G_η depends only on the diagonal elements of ρ.
Exercise 17.4.2. Express the optimization problem in (17.80) as a conic linear program-
ming of the form (A.57) (i.e., as a dual problem) and then use the primal problem A.52 to
obtain (17.82).
Exercise 17.4.3. Show that if A = A′ and H A has a non-degenerate Bohr spectrum, then
without loss of generality we can assume that η1 is diagonal in the energy eigenbasis (i.e.,
Gη depends only on the diagonal elements of η1 ).
Exercise 17.4.4. Let η₀, η₁ ∈ Pos(A′), ρ, γ ∈ D(A), and ω̃^{AA′} := γ^A ⊗ ( η₀^{A′} + η₁^{A′} ).
1. Show that
H↑_min(A′|A)_{ω̃} = H_min(A′)_ω = −log ‖ η₀^{A′} + η₁^{A′} ‖_∞ .   (17.86)
is a measure of athermality.
Exercise 17.4.5. Show that for the case F = GPO, the athermality monotones G_η(ρ, γ) are given as in (17.84), but with
ω^{AA′} := ρ^T ⊗ η₀^{A′} + γ^A ⊗ η₁^{A′} .   (17.88)
state. Free energy is denoted by the symbol “F” and for an athermality state (ρ, γ) of system A the free energy is defined as the energy available to do useful work; it is given by
F(ρ) := Tr[ ρ Ĥ ] − T H(ρ) ,   (17.89)
where T is the temperature, and we added here the ‘hat’ symbol to the Hamiltonian Ĥ of the system, in order to distinguish it from the entropy symbol H(ρ), which stands for the von-Neumann entropy of ρ.
Exercise 17.4.6. Show that the free energy of the Gibbs state γ := (1/Z) e^{−βĤ} is given by
F(γ) = −T log Z .   (17.90)
To see the relation of the free energy to the relative entropy, observe that the relative entropy of athermality is given by:
D(ρ‖γ) = −H(ρ) − Tr[ ρ log( (1/Z) e^{−βĤ} ) ]
  = log Z − H(ρ) + β Tr[ ρ Ĥ ]
  = β F(ρ) + log Z
  = β ( F(ρ) − F(γ) ) ,   (17.91)
where the last equality follows from (17.90).
Hence, the free energy is the key factor that directly governs the optimal rate of intercon-
versions of athermality.
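The identity (17.91) can be verified numerically for any full-rank state; a sketch in natural-log units, with our own helper names:

import numpy as np
from scipy.linalg import expm, logm

def relative_entropy(rho, sigma):
    # Umegaki relative entropy D(rho||sigma), in nats
    return float(np.trace(rho @ (logm(rho) - logm(sigma))).real)

def free_energy(rho, H, beta):
    # F(rho) = Tr[rho H] - T H(rho), with T = 1/beta and H(rho) in nats
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    S = -np.sum(evals * np.log(evals))
    return float(np.trace(rho @ H).real - S / beta)

beta = 0.8
H = np.diag([0.0, 1.0, 2.0])
gamma = expm(-beta * H) / np.trace(expm(-beta * H))
rho = np.array([[0.5, 0.2, 0.0], [0.2, 0.3, 0.1], [0.0, 0.1, 0.2]])

lhs = relative_entropy(rho, gamma)
rhs = beta * (free_energy(rho, H, beta) - free_energy(gamma, H, beta))
print(np.isclose(lhs, rhs))   # True, cf. (17.91)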
The Umegaki relative entropy of athermality has another interesting representation. For a quantum athermality state (ρ, γ), with Hamiltonian Ĥ, we can express D(ρ‖γ) as
D(ρ‖γ) = D( P_Ĥ(ρ) ‖ γ ) + C(ρ) ,
where P_Ĥ is the pinching channel associated with the Hamiltonian Ĥ, and C(ρ) is the
coherence measure defined in (15.238) (C(ρ) is also known as the G-asymmetry of the state
ρ as defined in 15.118, where G stands for the group of time-translation symmetry). That
is, the athermality of the state (ρ, γ) can be decomposed into two components:
1. Its nonuniformity that is quantified by D PĤ (ρ) γ .
2. Its asymmetry (or coherence between energy eigenspaces) that is quantified by the
coherence measure C(ρ).
We will see later on that this decomposition has an operational meaning, in which (roughly
speaking) D PĤ (ρ) γ is the cost to prepare the athermality state (PĤ (ρ), γ) and C(ρ) is
the cost to ‘rotate’ P_Ĥ(ρ) to ρ. Moreover, since the regularization of C(ρ) vanishes (see Theorem 15.3.4), we conclude that
lim_{n→∞} (1/n) D( P_n( ρ^{⊗n} ) ‖ γ^{⊗n} ) = D(ρ‖γ) ,   (17.93)
where P_n is the pinching channel associated with the total Hamiltonian of system Aⁿ.
Definition 17.5.1. Let ρ, γ ∈ D(A) and ρ′ , γ ′ ∈ D(A′ ). We say that the pair (ρ, γ)
relatively majorizes the pair (ρ′ , γ ′ ), and write
The two conditions in (17.95) are equivalent to the existence of a Choi matrix J ∈ Pos(AA′) that satisfies
Tr_A[ J^{AA′}( ρ^T ⊗ I^{A′} ) ] = ρ′   and   Tr_A[ J^{AA′}( γ^T ⊗ I^{A′} ) ] = γ′ .   (17.96)
This problem, of determining whether or not such a Choi matrix J^{AA′} exists, is an SDP
feasibility problem that can be solved efficiently and algorithmically using techniques from
semi-definite programming. However, unlike the classical case, where relative majorization
can be characterized with Lorenz curves, it is not known in the fully quantum case whether
a similar geometrical characterization exists.
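In practice, the feasibility problem (17.96) can be handed to any SDP solver. A minimal sketch using the cvxpy package (assumed installed, version ⩾ 1.2 for its partial_trace atom, together with an SDP solver; the function name is ours):

import numpy as np
import cvxpy as cp

def relatively_majorizes(rho, gamma, rho_p, gamma_p):
    # Feasibility test for (rho, gamma) >- (rho', gamma'): search for a Choi
    # matrix J >= 0 on AA' with Tr_{A'}[J] = I_A, Tr_A[J (rho^T (x) I)] = rho'
    # and Tr_A[J (gamma^T (x) I)] = gamma', cf. (17.96).
    dA, dAp = rho.shape[0], rho_p.shape[0]
    J = cp.Variable((dA * dAp, dA * dAp), hermitian=True)
    constraints = [
        J >> 0,
        cp.partial_trace(J, (dA, dAp), axis=1) == np.eye(dA),
        cp.partial_trace(np.kron(rho.T, np.eye(dAp)) @ J, (dA, dAp), axis=0) == rho_p,
        cp.partial_trace(np.kron(gamma.T, np.eye(dAp)) @ J, (dA, dAp), axis=0) == gamma_p,
    ]
    prob = cp.Problem(cp.Minimize(0), constraints)
    prob.solve()
    return prob.status == cp.OPTIMAL

# Degrading any state towards its Gibbs state is always feasible
# (the replacement channel N(X) = Tr[X] * gamma does the job).
gamma = np.diag([0.7, 0.3])
rho = np.array([[0.9, 0.1], [0.1, 0.1]])
print(relatively_majorizes(rho, gamma, gamma, gamma))   # True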
Observe that any quantum divergence behaves monotonically under quantum relative majorization. Specifically, if D is a quantum divergence and (ρ, γ) ≻ (ρ′, γ′), then D(ρ‖γ) ⩾ D(ρ′‖γ′).
The converse to the above property also holds. That is, if for any choice of a quantum divergence D we have D(ρ‖γ) ⩾ D(ρ′‖γ′) then we must have (ρ, γ) ≻ (ρ′, γ′). In fact, we show now that this assertion still holds even if we restrict D to have a very specific form. Recall the complete family of monotones given in (17.84) with ω^{AA′} := ρ^T ⊗ η₀^{A′} + γ^A ⊗ η₁^{A′} given as in Exercise 17.4.5. From the completeness of the family of monotones, it follows that (ρ, γ) ≻ (ρ′, γ′) if and only if G_η(ρ, γ) ⩾ G_η(ρ′, γ′) for all η₀, η₁ ∈ Pos(A′). Similar to (17.87), for every η₀, η₁ ∈ Pos(A′) we define
The above functions form a family of normalized quantum divergences that can be used to characterize quantum relative majorization.
Exercise 17.5.1. Show that for every η0 , η1 ∈ Pos(A′ ), the function Dη as defined above is
a quantum divergence.
Theorem 17.5.1. Let ρ, γ ∈ D(A) and ρ′ , γ ′ ∈ D(A′ ). Then, the following are
equivalent:
1. (ρ, γ) ≻ (ρ′ , γ ′ ).
2. Show that the theorem holds even if we restrict η0 and η1 to satisfy Tr[η0 + η1 ] = 1
′
(hence, we can assume without loss of generality that ω AA is a density matrix).
3. Show that the theorem holds even if we restrict η0 and η1 to satisfy Tr[η0 ] = Tr[η1 ] =
1/2.
fidelity function F(η, ζ) := ‖η^{1/2} ζ^{1/2}‖₁ for all η, ζ ∈ Pos(A), including unnormalized states, to provide a more accessible approach to understanding quantum relative majorization.
3. F( aγ′ − ρ′ , bρ′ − γ′ ) ⩾ F( aγ − ρ , bρ − γ ) .
The above invariant overlap implies that the matrix V : A → A′ Ã′ R defined by
To see that the channel above satisfies the desired properties, first observe that by isolating
ρ and γ from (17.99) we get
Exercise 17.5.3. Prove the assertion in the proof above that N (γ) = γ ′ .
Exercise 17.5.4. Let ρ, σ, γ ∈ D(A) be three qubit states (i.e. |A| = 2). Show that
( ρ , γ ) −−GPO−→ ( σ , γ )   (17.107)
if and only if
D_max( ρ ‖ γ ) ⩾ D_max( σ ‖ γ )   and   D_max( γ ‖ ρ ) ⩾ D_max( γ ‖ σ ) .   (17.108)
That is, the third fidelity condition of the theorem above is unnecessary in this case.
2. Tr_A[ J^{AA′}( γ^T ⊗ I^{A′} ) ] = γ′ .
3. P_ξ( J^{AA′} ) = J^{AA′}, where P_ξ is the pinching channel of ξ^{AA′} := H^A ⊗ I^{A′} − I^A ⊗ H^{A′} .
4. J^A = I^A .
Similar to the GPO case, this problem, of determining whether or not such a Choi matrix
′
J AA exists, is an SDP feasibility problem that can be solved efficiently and algorithmically
using techniques from semi-definite programming. However, for certain choices of Hamilto-
GPC
nians, there exists a much simpler way to characterize the conversion (ρ, γ) −−→ (ρ′ , γ ′ ).
Note that in the general case of relatively non-degenerate Hamiltonians, CTO and GPC
operations can only disrupt the coherence between the energy levels of the input state ρA .
In such scenarios, coherence cannot be manipulated, but only destroyed. Therefore, for the
remainder of this chapter, we will focus on Hamiltonians that exhibit relative degeneracy.
When considering the conversion of one athermality state (ρ, γ) to another athermality state (ρ′, γ′) we will use the properties
( ρ , γ ) ↔_F ( ρ ⊗ γ′ , γ ⊗ γ′ )   and   ( ρ′ , γ′ ) ↔_F ( γ ⊗ ρ′ , γ ⊗ γ′ ) .   (17.109)
The equivalence relations above follow from the fact that appending or removing a Gibbs state is a free operation in the theory of athermality. Therefore, the conversion (ρ, γ) →_F (ρ′, γ′) between a state of system A and a state of system A′ is equivalent to the conversion (ρ ⊗ γ′, γ ⊗ γ′) →_F (γ ⊗ ρ′, γ ⊗ γ′) between two states of system AA′; see Fig. 17.3. In other words, interconversion among states with the same dimensions (i.e. states with |A| = |A′|) is general enough to capture also interconversions with |A′| ≠ |A| (as long as we do not impose non-degeneracy constraints). We will therefore focus here on interconversions among states that are all in D(A).
Figure 17.3: Equivalence of a conversion from A to A′ and a conversion from AA′ to itself.
To establish the full set of necessary and sufficient conditions, let J^{AÃ} be the Choi matrix of a time-translation covariant channel E ∈ COV(A → A) that satisfies E(ρ) = σ and E(γ) = γ. Denoting by {r_{xy}}_{x,y∈[m]} and {s_{xy}}_{x,y∈[m]} the components of ρ and σ, respectively, we get that the Choi matrix of E has the form (cf. (15.249))
J^{AÃ} = Σ_{x,y} p_{y|x} |x⟩⟨x|^A ⊗ |y⟩⟨y|^Ã + Σ_{x≠y} ( s_{xy} / r_{xy} ) |x⟩⟨y|^A ⊗ |x⟩⟨y|^Ã ,   (17.110)
where P = (p_{y|x}) is some column stochastic matrix, and we assumed that the off-diagonal terms of ρ^A are non-zero. Let r and s be the probability vectors consisting of the diagonals of ρ and σ, and identify the diagonal matrix γ with the Gibbs vector g consisting of its diagonal. Then, the Choi matrix above corresponds to such a GPC channel E if and only if it is positive semidefinite and
P r = s   and   P g = g .   (17.111)
The above condition implies that (r, g) ≻ (s, g), however, it is not sufficient since we also
require that J AÃ ⩾ 0. This latter condition is equivalent to the requirement that the matrix
obtained by replacing the diagonal elements of Q (as defined in (15.251)) with {px|x }x∈[m] is
positive semidefinite. We summarize these considerations in the following exercise.
Exercise 17.5.6. Let (ρ, γ) and (σ, γ) be two athermality states of a system A, whose Hamil-
tonian Ĥ has a non-degenerate Bohr spectrum. Suppose also that the off diagonal terms of
ρ are non-zero. Show that
( ρ , γ ) −−GPC−→ ( σ , γ )   (17.112)
if and only if there exists a column stochastic matrix P that satisfies both (17.111) and the matrix inequality
Σ_{x∈[m]} p_{x|x} |x⟩⟨x| + Σ_{x≠y∈[m]} ( s_{xy} / r_{xy} ) |x⟩⟨y| ⩾ 0 .   (17.113)
The exercise above does not offer significant computational simplification compared to
the SDP feasibility problem discussed at the beginning of this section. This is because de-
termining the existence of a column stochastic matrix P itself constitutes an SDP problem.
However, the exercise’s significance lies in its ability to highlight the role of quantum co-
herence in converting athermality, as demonstrated by the following theorem. Furthermore,
we will observe later that in the qubit case, the exercise above provides a straightforward
criterion for exact inter-conversions under GPC.
Theorem 17.5.3. Let (ρ, γ) and (σ, γ) be two quantum athermality states of
dimension m := |A|. For any x, y ∈ [m] let rxy := ⟨x|ρ|y⟩ and sxy := ⟨x|σ|y⟩ be the
xy-component of ρ and σ, respectively. Suppose that rxy ̸= 0 for all x, y ∈ [m] and
that rxx = sxx for all x ∈ [m]. Then,
( ρ , γ ) −−GPC−→ ( σ , γ )   ⟺   Q := I + Σ_{x≠y∈[m]} ( s_{xy} / r_{xy} ) |x⟩⟨y| ⩾ 0 .   (17.114)
Proof. Since the diagonals of ρ and σ are the same, we get that if Q ⩾ 0 then by taking the stochastic matrix P to be the identity matrix, all the conditions in Exercise 17.5.6 are satisfied so that (ρ, γ) −−GPC−→ (σ, γ). Conversely, if (ρ, γ) −−GPC−→ (σ, γ) then by Exercise 17.5.6 there exists a stochastic matrix P with a diagonal {p_{x|x}} that satisfies (17.113). By adding the positive semidefinite matrix Σ_{x∈[m]} (1 − p_{x|x}) |x⟩⟨x| to the matrix in (17.113) we get that also Q ⩾ 0. This completes the proof.
In simple terms, the condition stated in the theorem above, that ρ and σ share the same
diagonals, implies that they have the same non-uniformity and only differ in their coherence
(asymmetry) properties. Interestingly, the condition Q ⩾ 0 turns out to be identical to the
condition given in Theorem 15.6.3 when ρ and σ have the same diagonal elements. Thus,
GPC
in this case, we can state that (ρ, γ) −−→ (σ, γ) if and only if ρ can be transformed into
σ through time-translation covariant operations. It is noteworthy that the Gibbs state, γ,
does not play a role in such conversions because ρ and σ share the same non-uniformity (i.e.,
same diagonal elements).
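A direct numerical check of the criterion (17.114) is immediate (a sketch; the helper name is ours):

import numpy as np

def gpc_convertible_same_diagonal(rho, sigma, tol=1e-10):
    # For rho, sigma with identical diagonals and nonzero off-diagonal r_xy,
    # (rho,gamma) -> (sigma,gamma) by GPC iff Q = I + sum_{x!=y}(s_xy/r_xy)|x><y| >= 0
    m = rho.shape[0]
    Q = np.eye(m, dtype=complex)
    for x in range(m):
        for y in range(m):
            if x != y:
                Q[x, y] = sigma[x, y] / rho[x, y]
    return np.linalg.eigvalsh(Q).min() >= -tol

rho = np.array([[0.5, 0.3], [0.3, 0.5]])
sigma = np.array([[0.5, 0.2], [0.2, 0.5]])       # same diagonal, weaker coherence
print(gpc_convertible_same_diagonal(rho, sigma))   # True
print(gpc_convertible_same_diagonal(sigma, rho))   # False: coherence cannot increase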
can be converted to σ by GPC. That is, (ψ, γ) −−GPC−→ (σ, γ).
Exercise 17.5.7. Prove the corollary above. Hint: See the proof of Corollary 15.6.1.
Exercise 17.5.8. Show that the corollary above still holds even if we replace GPC with
CTO. Hint: Use the fact that ψ can be converted to σ by time-translation covariant channel,
and then use Theorem 15.2.4.
We also denote the diagonals of the matrices above by r := (r, 1−r)^T, s := (s, 1−s)^T and g := (g, 1−g)^T, respectively. We would like to find the conditions under which (ρ, γ) −−GPC−→ (σ, γ).
Recall that if a = 0 then we must have b = 0 since GPC cannot generate coherence between
energy levels. Therefore, the case a = 0 has already been covered by the quasi-classical
regime. We will therefore assume in the rest of this subsection that a ̸= 0.
Theorem 17.5.4. Let ρ, σ, γ ∈ D(A) be three qubit states as above and suppose a ≠ 0 and γ ≠ u. Then, for r ≠ g, (ρ, γ) −−GPC−→ (σ, γ) if and only if (r, g) ≻ (s, g) and
|b|²/|a|² ⩽ (s − g)/(r − g) + ( (r − s)/(r − g) )² g(1 − g) .   (17.117)
For r = g, (ρ, γ) −−GPC−→ (σ, γ) if and only if s = g and |a| ⩾ |b|.
Proof. From Exercise 17.5.6 it follows that (ρ, γ) can be converted to (σ, γ) by GPC if and
only if there exists a 2 × 2 column stochastic matrix P = {py|x }x,y∈{0,1} that satisfies P r = s,
P g = g, and
[ p_{0|0}   b/a ;  b̄/ā   p_{1|1} ] ⩾ 0 .   (17.118)
Note that this last condition is equivalent to
|b|²/|a|² ⩽ p_{0|0} p_{1|1} .   (17.119)
The conditions P r = s and P g = g can be expressed as the following linear systems of equations
[ r  1−r ;  g  1−g ] [ p_{0|0} ;  p_{0|1} ] = [ s ;  g ]   and   [ r  1−r ;  g  1−g ] [ p_{1|0} ;  p_{1|1} ] = [ 1−s ;  1−g ] .   (17.120)
Note that the equations involving p_{1|0} and p_{1|1} follow trivially from the ones involving p_{0|0} and p_{0|1} since P is column stochastic. From Cramer’s rule it then follows that for the case that r ≠ g
p_{0|0} = det[ s  1−r ;  g  1−g ] / det[ r  1−r ;  g  1−g ]   and   p_{1|1} = det[ r  1−s ;  g  1−g ] / det[ r  1−r ;  g  1−g ] .   (17.121)
Finally, substituting the above expressions into (17.119) gives (after some simple algebra) the inequality (17.117).
For the case that r = g we also have s = g (otherwise, (r, g) ̸≻ (s, g)) and the linear
system of equations in (17.120) has a unique solution given by p0|0 = p1|1 = 1. Therefore, in
this case, (17.119) gives |b| ⩽ |a|. This completes the proof.
Exercise 17.5.9. Show that if s = g in (17.116) then (ρ, γ) can be converted to (σ, γ) by
GPC if and only if
|b|²/|a|² ⩽ det(γ) .   (17.122)
From the exercise above it follows that already in the qubit case, conversions under GPC
have a certain type of discontinuity. To see this, consider the case s = g, and observe that
the condition |a|2 det(γ) ⩾ |b|2 is stronger than the condition |a| ⩾ |b| that one obtains
if also r = g. In particular, observe that det(γ) ⩽ 1/4. Hence, there exists an ε > 0 and ρ, σ, γ ∈ D(A) such that for any ρ ∈ B_ε(σ) the state (ρ, γ) cannot be converted by GPC to (σ, γ) unless ρ = σ.
Exercise 17.5.10. Find an explicit example of three qubit states ρ, σ, γ, and ε > 0 such that for any ρ ∈ B_ε(σ), (ρ, γ) cannot be converted to (σ, γ) by GPC unless ρ = σ.
where the minimum is over all Λ ∈ Pos(A′) that satisfy the following conditions:

1. $\Lambda \geqslant \rho' - \mathrm{Tr}_A\!\left[J^{AA'}\left(\rho^{T}\otimes I^{A'}\right)\right]$.

2. $\gamma' = \mathrm{Tr}_A\!\left[J^{AA'}\left(\gamma^{A}\otimes I^{A'}\right)\right]$.

3. $J \in \mathrm{Pos}(AA')$ and $J^{A} = I^{A}$.

For the case that F = GPC the conversion distance is evaluated exactly as above with the additional constraint on $J^{AA'}$ that
$$\mathcal{P}_{\xi}\!\left(J^{AA'}\right) = J^{AA'} , \quad\text{where}\quad \xi^{AA'} := H^{A}\otimes I^{A'} - I^{A}\otimes H^{A'} . \qquad (17.126)$$
Note that this additional condition is still in a form suitable for SDP.
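This SDP can be handed directly to a numerical solver. The sketch below is ours (it assumes that the omitted objective is min Tr[Λ], which is the standard SDP form of the trace distance, and it uses the cvxpy package); adding the pinching constraint (17.126) on $J^{AA'}$ would restrict the optimization from Gibbs-preserving operations to GPC.

```python
# Sketch (not from the book): conversion distance under Gibbs-preserving operations as an SDP.
import numpy as np
import cvxpy as cp

def conversion_distance(rho, gamma, rho_p, gamma_p):
    dA, dAp = rho.shape[0], rho_p.shape[0]
    J = cp.Variable((dA * dAp, dA * dAp), hermitian=True)   # Choi matrix of the channel E
    Lam = cp.Variable((dAp, dAp), hermitian=True)           # slack variable for the trace norm

    def apply_choi(X):
        # E(X) = Tr_A[ J (X^T x I_{A'}) ]
        return cp.partial_trace(J @ np.kron(X.T, np.eye(dAp)), dims=[dA, dAp], axis=0)

    constraints = [
        J >> 0,
        cp.partial_trace(J, dims=[dA, dAp], axis=1) == np.eye(dA),  # J^A = I^A (trace preservation)
        apply_choi(gamma) == gamma_p,                                # Gibbs-preserving condition
        Lam >> 0,
        Lam >> rho_p - apply_choi(rho),                              # condition 1 above
    ]
    prob = cp.Problem(cp.Minimize(cp.real(cp.trace(Lam))), constraints)
    prob.solve()
    return prob.value
```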
The preceding discussion demonstrates that the conversion distance of athermality can be
computed numerically. However, the formulation presented above for the conversion distance
lacks insight and does not offer any practical means to calculate the distillable athermality
or the athermality cost of a state (ρ, γ). Therefore, we now turn our attention to the case
where the target state is quasi-classical, and show that for this case there exists an analytical
formula for the conversion distance.
where P denotes the pinching channel associated with the Hamiltonian of system A.
Remark. Note that on the right-hand side, we have a conversion distance between two quasi-
classical states. In the next subsection, we will demonstrate that for such cases, an analytical
formula exists.
Proof. Let P and P ′ be the pinching channels associated with the Hamiltonians of systems A and A′, respectively, and observe that for any E ∈ COV(A → A′) that satisfies γ′ = E(γ) we have
$$\gamma' = \mathcal{P}'(\gamma') = \mathcal{P}'\circ\mathcal{E}(\gamma) = \mathcal{E}\circ\mathcal{P}(\gamma) , \qquad (17.128)$$
where the last equality follows from Part 2 of Exercise 15.6.3. Therefore,
$$\begin{aligned}
T\!\left((\mathcal{P}(\rho),\gamma)\xrightarrow{\text{GPC}}(\rho',\gamma')\right)
&= \min_{\mathcal{E}\in\mathrm{COV}(A\to A')}\left\{\tfrac12\left\|\rho'-\mathcal{E}\circ\mathcal{P}(\rho)\right\|_1 \,:\, \gamma'=\mathcal{E}(\gamma)\right\} \\
&= \min_{\mathcal{E}\in\mathrm{COV}(A\to A')}\left\{\tfrac12\left\|\rho'-\mathcal{E}\circ\mathcal{P}(\rho)\right\|_1 \,:\, \gamma'=\mathcal{E}\circ\mathcal{P}(\gamma)\right\} && \text{by (17.128)} \\
&\geqslant \min_{\mathcal{N}\in\mathrm{COV}(A\to A')}\left\{\tfrac12\left\|\rho'-\mathcal{N}(\rho)\right\|_1 \,:\, \gamma'=\mathcal{N}(\gamma)\right\} && \mathcal{N}:=\mathcal{E}\circ\mathcal{P} \\
&= T\!\left((\rho,\gamma)\xrightarrow{\text{GPC}}(\rho',\gamma')\right) .
\end{aligned} \qquad (17.129)$$
For the converse inequality, observe that by using Part 2 of Exercise 15.6.3 we get that for every E ∈ COV(A → A′)
$$\left\|\rho'-\mathcal{E}\circ\mathcal{P}(\rho)\right\|_1 = \left\|\rho'-\mathcal{P}'\circ\mathcal{E}(\rho)\right\|_1 = \left\|\mathcal{P}'\!\left(\rho'-\mathcal{E}(\rho)\right)\right\|_1 \leqslant \left\|\rho'-\mathcal{E}(\rho)\right\|_1 , \qquad (17.130)$$
where the second equality uses P′(ρ′) = ρ′ and the inequality follows from the DPI. Combining this inequality with the definition of the conversion distance, specifically with the first equality in (17.129), gives
$$T\!\left((\mathcal{P}(\rho),\gamma)\xrightarrow{\text{GPC}}(\rho',\gamma')\right) \leqslant \min_{\mathcal{E}\in\mathrm{COV}(A\to A')}\left\{\tfrac12\left\|\rho'-\mathcal{E}(\rho)\right\|_1 \,:\, \gamma'=\mathcal{E}(\gamma)\right\} = T\!\left((\rho,\gamma)\xrightarrow{\text{GPC}}(\rho',\gamma')\right) . \qquad (17.131)$$
Combining the two inequalities in (17.129) and (17.131) gives the equality in (17.127). This completes the proof.
Exercise 17.6.1. Use Lemma 11.1.1 to provide a shorter proof of the inequality in (17.129).
Exercise 17.6.2. Let P and P ′ be the pinching channels associated with the Hamiltonians of systems A and A′, respectively, and let N := P ′ ◦ E, where E ∈ CPTP(A → A′). Show that N ∈ COV(A → A′) if and only if
$$\mathcal{N} = \mathcal{N}\circ\mathcal{P} . \qquad (17.132)$$
Recall that from Theorem 4.3.2, there exist r, r′ ∈ Prob(k) such that (p, g) ∼ (r, u^(k)) and (p′, g′) ∼ (r′, u^(k)). Specifically,
$$r := \bigoplus_{x\in[m]} p_x\, u^{(a_x)} \quad\text{and}\quad r' := \bigoplus_{y\in[n]} p'_y\, u^{(b_y)} . \qquad (17.135)$$
With these notations and the assumption that the Gibbs vectors have rational components,
we have the following closed formula for the conversion distance.
Theorem 17.6.2. Let (p, g), (p′, g′), and r, r′ ∈ Prob(k) be as above. Then,
$$T\!\left((p,g)\xrightarrow{\text{CTO}}(p',g')\right) = \max_{\ell\in[k]}\left\{\left\|r'\right\|_{(\ell)} - \left\|r\right\|_{(\ell)}\right\} . \qquad (17.137)$$
Proof. Let E be a k × n column stochastic matrix defined on every q ∈ Prob(n) as (cf. (4.136))
$$Eq := \bigoplus_{y\in[n]} q_y\, u^{(b_y)} . \qquad (17.138)$$
Observe that r′ = Ep′ and that ∥p′ − q∥₁ = ∥r′ − Eq∥₁ (see Exercise 17.6.3). Thus,
$$\begin{aligned}
T\!\left((p,g)\xrightarrow{\text{CTO}}(p',g')\right)
&= \min_{q\in\mathrm{Prob}(n)}\left\{\tfrac12\left\|r'-Eq\right\|_1 \,:\, r \succ Eq\right\} \\
&\geqslant \min_{s\in\mathrm{Prob}(k)}\left\{\tfrac12\left\|r'-s\right\|_1 \,:\, r \succ s\right\} && s := Eq \\
&= T\!\left(r\xrightarrow{\text{Noisy}}r'\right) , && \text{cf. (16.15)}
\end{aligned} \qquad (17.139)$$
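As a quick numerical illustration of Theorem 17.6.2 (ours, not part of the text), the formula (17.137) can be evaluated directly once the embedded vectors r and r′ of (17.135) are in hand; below, ∥·∥_(ℓ) denotes the sum of the ℓ largest entries.

```python
# Sketch (not from the book): conversion distance between quasi-classical athermality states.
import numpy as np

def ky_fan(v, ell):
    """Sum of the ell largest entries of v."""
    return np.sort(v)[::-1][:ell].sum()

def conversion_distance_qc(r, r_prime):
    k = len(r)
    return max(ky_fan(r_prime, ell) - ky_fan(r, ell) for ell in range(1, k + 1))

# Example with k = 3:
print(conversion_distance_qc(np.array([0.5, 0.3, 0.2]), np.array([0.7, 0.2, 0.1])))  # 0.2
```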
Exercise 17.6.4. Show that a vector g ∈ Prob(m) satisfies (17.144) for all g̃ ∈ Prob(m) if
and only if g = u(m) .
In the fully quantum case, under GPC and CTO, coherence among energy levels is a resource that cannot be measured by the golden unit (|0⟩⟨0|^A, u^A). The reason is that this golden unit is quasi-classical, and it cannot be converted by GPC (or CTO) to any athermality state that is not quasi-classical (even if we take m := |A| = ∞). This means that in the QRT of quantum athermality, there exists another type of resource, namely, time-translation asymmetry, that cannot be quantified by the golden unit (|0⟩⟨0|^A, u^A). We conclude that quantum athermality can be viewed as a resource comprising two types:
In contrast to GPC and CTO, GPO has the capability to induce coherence between
energy levels. Consequently, as demonstrated in the subsequent exercise, we can retain the
state (|0⟩⟨0|A , uA ) as the golden unit of the resource theory.
Exercise 17.6.5. Let m := |A| and (ρ′, γ′) be an athermality state of system A′. Show that for sufficiently large m
$$\left(|0\rangle\langle 0|^{A}, u^{A}\right) \xrightarrow{\text{GPO}} (\rho', \gamma') . \qquad (17.146)$$
Exercise 17.6.6. Show that under GPO operations, the resource $\left(|0\rangle\langle 0|^{A}, u^{A}\right)$ is equivalent to the resource $\left(|0\rangle\langle 0|^{X}, u^{X}_{m}\right)$, where X is a two-dimensional classical system, m := |A|, and
$$u^{X}_{m} := \frac{1}{m}|0\rangle\langle 0|^{X} + \frac{m-1}{m}|1\rangle\langle 1|^{X} . \qquad (17.147)$$
The exercise above demonstrates that we can always consider the golden unit to be a qubit. Moreover, note that $u^{X}_{m}$ is well defined even if m is not an integer. This can help simplify certain expressions, and we will therefore also consider the states $\left(|0\rangle\langle 0|^{X}, u^{X}_{m}\right)$ with m ∈ R₊. We will use the notation
$$\Upsilon_m := \left(|0\rangle\langle 0|^{X}, u^{X}_{m}\right) . \qquad (17.148)$$
Exercise 17.6.7. Show that {Υm }m∈N satisfies the conditions of a golden unit outlined
in Definition 11.1.1.
We next consider the conversion distance from the golden unit Υm to an arbitrary state
(ρ, γ) of system A. Here we only consider GPO since GPC cannot generate coherence. By
definition,
$$T\!\left(\Upsilon_m\xrightarrow{\text{GPO}}(\rho,\gamma)\right) = \min_{\mathcal{E}\in\mathrm{CPTP}(X\to A)}\left\{\tfrac12\left\|\rho^{A}-\mathcal{E}\!\left(|0\rangle\langle 0|^{X}\right)\right\|_1 \,:\, \gamma^{A}=\mathcal{E}\!\left(u^{X}_{m}\right)\right\} . \qquad (17.155)$$
Denoting ω := E(|0⟩⟨0|) and τ := E(|1⟩⟨1|), the conversion distance can be simplified as
$$\begin{aligned}
T\!\left(\Upsilon_m\xrightarrow{\text{GPO}}(\rho,\gamma)\right)
&= \min_{\omega,\tau\in\mathrm{D}(A)}\left\{\tfrac12\left\|\rho-\omega\right\|_1 \,:\, \gamma=\tfrac1m\omega+\tfrac{m-1}{m}\tau\right\} \\
&= \min_{\omega\in\mathrm{D}(A)}\left\{\tfrac12\left\|\rho-\omega\right\|_1 \,:\, m\gamma\geqslant\omega\right\} \\
&= \min_{\omega\in\mathrm{D}(A)}\left\{\tfrac12\left\|\rho-\omega\right\|_1 \,:\, D_{\max}(\omega\|\gamma)\leqslant\log m\right\} .
\end{aligned} \qquad (17.156)$$
This expression will be instrumental in our calculations regarding the cost of athermality
under GPO.
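For concreteness, the last line of (17.156) is itself a small SDP; the following sketch (ours, using cvxpy, not part of the text) evaluates it for given ρ, γ, and m.

```python
# Sketch (not from the book): conversion distance from the golden unit Upsilon_m to (rho, gamma).
import cvxpy as cp

def golden_unit_conversion_distance(rho, gamma, m):
    d = rho.shape[0]
    omega = cp.Variable((d, d), hermitian=True)
    objective = 0.5 * cp.normNuc(rho - omega)        # (1/2)||rho - omega||_1
    constraints = [omega >> 0, cp.trace(omega) == 1, m * gamma >> omega]
    prob = cp.Problem(cp.Minimize(objective), constraints)
    prob.solve()
    return prob.value
```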
Exercise 17.6.10. Let (ρ, γ) be an athermality state of system A. Show that under GPC, for any m ∈ N,
$$T\!\left(\Upsilon_m\xrightarrow{\text{GPC}}(\rho,\gamma)\right) \geqslant \min_{\sigma\in\mathrm{D}(A)}\tfrac12\left\|\rho-\Delta(\sigma)\right\|_1 , \qquad (17.157)$$
where ∆ ∈ CPTP(A → A) is the completely dephasing channel (with respect to the basis of the Hamiltonian of system A).
Integrating this with the formulas from the preceding subsection that pertain to the conversion distance, we arrive at the following result. We denote by P ∈ CPTP(A → A) the pinching channel associated with the Hamiltonian of system A, and by $D^{\varepsilon}_{\min}$ the quantum hypothesis testing divergence as defined in (8.185).
Theorem 17.7.1. Let ε ∈ [0, 1]. For any athermality state (ρ, γ) of a quantum
system A, the ε-approximate single-shot distillation of athermality is given by:
This completes the proof of the first part. The second part of the proof follows from the first
part in conjunction with (17.127). This concludes the proof.
Observe that when we take ε = 0 we get that the exact single-shot distillation is given by
$$\mathrm{Distill}_{0}\!\left(\rho^{A},\gamma^{A}\right) = D_{\min}\!\left(\rho^{A}\,\middle\|\,\gamma^{A}\right) . \qquad (17.160)$$
This result gives a physical meaning to the min relative entropy as the exact single-shot distillation rate under GPO.
Theorem 17.7.2. Let ε ∈ [0, 1]. For any athermality state (ρ, γ) of system A, the ε-single-shot athermality cost (under GPO) is given by
Proof. Combining the expression (17.156) for the conversion distance together with the definition (17.161) gives
$$\begin{aligned}
\mathrm{Cost}^{\varepsilon}(\rho,\gamma)
&= \inf_{0<m\in\mathbb{R}}\left\{\log m \,:\, \tfrac12\left\|\rho-\omega\right\|_1\leqslant\varepsilon ,\; D_{\max}(\omega\|\gamma)\leqslant\log m ,\; \omega\in\mathrm{D}(A)\right\} \\
&= \inf\left\{D_{\max}(\omega\|\gamma) \,:\, \tfrac12\left\|\rho-\omega\right\|_1\leqslant\varepsilon ,\; \omega\in\mathrm{D}(A)\right\} \\
&= D^{\varepsilon}_{\max}(\rho\|\gamma) .
\end{aligned} \qquad (17.163)$$
This completes the proof.
This result provides a physical meaning to the max relative entropy as the exact single-shot
cost under GPO.
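The optimization in (17.163) can likewise be solved numerically; the sketch below (ours, using cvxpy, with logarithms taken in base 2) computes the smoothed max relative entropy $D^{\varepsilon}_{\max}(\rho\|\gamma)$, and hence $\mathrm{Cost}^{\varepsilon}(\rho,\gamma)$.

```python
# Sketch (not from the book): smoothed D_max, i.e. the epsilon-single-shot athermality cost under GPO.
import numpy as np
import cvxpy as cp

def smoothed_dmax(rho, gamma, eps):
    d = rho.shape[0]
    omega = cp.Variable((d, d), hermitian=True)
    lam = cp.Variable(nonneg=True)
    constraints = [
        omega >> 0,
        cp.trace(omega) == 1,
        0.5 * cp.normNuc(rho - omega) <= eps,   # omega is eps-close to rho in trace distance
        lam * gamma >> omega,                    # D_max(omega||gamma) <= log2(lam)
    ]
    cp.Problem(cp.Minimize(lam), constraints).solve()
    return np.log2(lam.value)
```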
Exercise 17.7.1. Let γ ∈ D>0(A) be the Gibbs state of system A with eigenvalues g₁, . . . , g_m. Let ψ_γ ∈ Pure(A) be the pure state
$$|\psi_\gamma\rangle := \sum_{x\in[m]}\sqrt{g_x}\,|x\rangle . \qquad (17.165)$$
Show that the exact single-shot athermality cost of (ψ_γ, γ) is equal to log(m).
Recall from Theorem 17.7.1 that in the single-shot regime, for any ε ∈ (0, 1), the distillable
athermality under GPO is given by
Note that in this case we did not need to take the limsup over n since the limit exists.
Therefore, under GPO, the asymptotic distillable athermality is given by the relative entropy
D(ρ∥γ). Remarkably, this is also the distillable rate under GPC and CTO.
Theorem 17.8.1. Let (ρ, γ) be an athermality state of a quantum system A, and let
ε ∈ (0, 1). Then, the distillable athermality under either CTO or GPC is given by
$$\mathrm{Distill}(\rho,\gamma) = \limsup_{n\to\infty}\frac1n\,\mathrm{Distill}^{\varepsilon}\!\left(\rho^{\otimes n},\gamma^{\otimes n}\right) = D(\rho\|\gamma) . \qquad (17.169)$$
Proof. Let ε ∈ (0, 1) and recall from Theorem 17.7.1 that the ε-single-shot distillable athermality under GPC or CTO is given by
$$\mathrm{Distill}^{\varepsilon}(\rho,\gamma) = D^{\varepsilon}_{\min}\!\left(\mathcal{P}(\rho)\,\middle\|\,\gamma\right) , \qquad (17.170)$$
where P is the pinching channel corresponding to the Hamiltonian of system A. Since P(γ) = γ we have
$$\mathrm{Distill}^{\varepsilon}(\rho,\gamma) = D^{\varepsilon}_{\min}\!\left(\mathcal{P}(\rho)\,\middle\|\,\mathcal{P}(\gamma)\right) \leqslant D^{\varepsilon}_{\min}(\rho\|\gamma) , \qquad (17.171)$$
where the inequality follows from the DPI. Thus,
$$\limsup_{n\to\infty}\frac1n\,\mathrm{Distill}^{\varepsilon}\!\left(\rho^{\otimes n},\gamma^{\otimes n}\right) \leqslant \limsup_{n\to\infty}\frac1n\,D^{\varepsilon}_{\min}\!\left(\rho^{\otimes n}\,\middle\|\,\gamma^{\otimes n}\right) = D(\rho\|\gamma) , \qquad (17.172)$$
where the equality follows from the quantum Stein's lemma.
To get the opposite inequality, for every n ∈ N let P_n ∈ CTO(A^n → A^n) denote the pinching channel associated with the Hamiltonian of system A^n. Now, fix k ∈ N and observe that for every ε ∈ (0, 1)
$$\begin{aligned}
\limsup_{n\to\infty}\frac1n\,\mathrm{Distill}^{\varepsilon}\!\left(\rho^{\otimes n},\gamma^{\otimes n}\right)
&= \limsup_{n\to\infty}\frac1n\,D^{\varepsilon}_{\min}\!\left(\mathcal{P}_n(\rho^{\otimes n})\,\middle\|\,\gamma^{\otimes n}\right) \\
&\geqslant \limsup_{n\to\infty}\frac1{nk}\,D^{\varepsilon}_{\min}\!\left(\mathcal{P}_{nk}(\rho^{\otimes nk})\,\middle\|\,\gamma^{\otimes nk}\right) \\
&\geqslant \limsup_{n\to\infty}\frac1{nk}\,D^{\varepsilon}_{\min}\!\left(\mathcal{P}_{k}^{\otimes n}\circ\mathcal{P}_{nk}(\rho^{\otimes nk})\,\middle\|\,\mathcal{P}_{k}^{\otimes n}\!\left(\gamma^{\otimes nk}\right)\right) ,
\end{aligned} \qquad (17.173)$$
where in the last line we used the data processing inequality with the channel $\mathcal{P}_{k}^{\otimes n}$. Now, the Gibbs state is invariant under the pinching channel and in particular $\mathcal{P}_{k}^{\otimes n}\!\left(\gamma^{\otimes nk}\right) = \gamma^{\otimes nk}$. Moreover, from Exercise 15.2.3 it follows that $\mathcal{P}_{k}^{\otimes n}\circ\mathcal{P}_{nk} = \mathcal{P}_{k}^{\otimes n}$. We therefore get that
$$\begin{aligned}
\limsup_{n\to\infty}\frac1n\,\mathrm{Distill}^{\varepsilon}\!\left(\rho^{\otimes n},\gamma^{\otimes n}\right)
&\geqslant \limsup_{n\to\infty}\frac1{nk}\,D^{\varepsilon}_{\min}\!\left(\mathcal{P}_{k}^{\otimes n}(\rho^{\otimes nk})\,\middle\|\,\gamma^{\otimes nk}\right) \\
&= \frac1k\limsup_{n\to\infty}\frac1n\,D^{\varepsilon}_{\min}\!\left(\left(\mathcal{P}_{k}(\rho^{\otimes k})\right)^{\otimes n}\,\middle\|\,\left(\gamma^{\otimes k}\right)^{\otimes n}\right) \\
&= \frac1k\,D\!\left(\mathcal{P}_{k}\!\left(\rho^{\otimes k}\right)\,\middle\|\,\gamma^{\otimes k}\right) ,
\end{aligned} \qquad (17.174)$$
where in the last line we used the quantum Stein's lemma. The above inequality can also be understood physically by observing that the state $\sigma_k := \mathcal{P}_k\!\left(\rho^{\otimes k}\right)$ is quasi-classical, and consequently, it has a distillable athermality rate given by $D(\sigma_k\|\gamma^{\otimes k})$. Now, since the above inequality holds for all k ∈ N we conclude that
$$\limsup_{n\to\infty}\frac1n\,\mathrm{Distill}^{\varepsilon}\!\left(\rho^{\otimes n},\gamma^{\otimes n}\right) \geqslant \limsup_{k\to\infty}\frac1k\,D\!\left(\mathcal{P}_{k}\!\left(\rho^{\otimes k}\right)\,\middle\|\,\gamma^{\otimes k}\right) = D(\rho\|\gamma) , \qquad (17.175)$$
where the equality follows from (17.93). This completes the proof.
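Numerically, the asymptotic rate in Theorem 17.8.1 is simply the Umegaki relative entropy; a short helper (ours, not part of the text, assuming ρ and γ have full support) is given below.

```python
# Sketch (not from the book): the asymptotic distillable athermality D(rho||gamma), in bits.
import numpy as np
from scipy.linalg import logm

def relative_entropy(rho, gamma):
    # Assumes rho and gamma are full-rank density matrices (otherwise restrict to supp(rho)).
    return np.real(np.trace(rho @ (logm(rho) - logm(gamma)))) / np.log(2)
```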
where the sum runs over all sequences x^n ∈ [m]^n of the same type t. With the above notations
$$|\psi\rangle^{\otimes n} = \sum_{t\in\mathrm{Type}(n,m)}\sqrt{q_{t,n}}\,|t\rangle^{A^n} , \qquad (17.180)$$
where
$$q_{t,n} := \binom{n}{nt_1,\ldots,nt_m}\,2^{-n\left(H(t)+D(t\|p)\right)} . \qquad (17.181)$$
Note that the vectors $|t\rangle^{A^n}$ are eigenvectors of the Hamiltonian of system A^n. Specifically,
$$H^{A^n}|t\rangle^{A^n} = n\sum_{x\in[m]}t_x a_x\,|t\rangle^{A^n} , \qquad (17.182)$$
so that the energy in the state $|t\rangle^{A^n}$ is n times the average energy with respect to the type t.
Exercise 17.8.1. Consider the generic case, in which the energy eigenvalues {a1 , . . . , am }
are rationally independent; i.e. for any set of m integers ℓ1 , . . . , ℓm ∈ Z we have
ℓ1 a1 + · · · + ℓm am = 0 ⇐⇒ ℓ1 = ℓ2 = · · · = ℓm = 0 . (17.183)
Show that under this mild assumption (which we will not assume in the text), for every n ∈ N, the number of distinct eigenvalues of $H^{A^n}$ equals |Type(n, m)|. That is, each energy eigenvalue of $H^{A^n}$ corresponds to exactly one type.
Given that each $|t\rangle^{A^n}$ is an energy eigenstate, it naturally follows from (17.180) that we can express |ψ^{⊗n}⟩ as a linear combination of at most |Type(n, m)| ⩽ (n + 1)^m energy eigenstates. In simpler terms, the coherence inherent in |ψ^{⊗n}⟩ can be compactly represented within an (n + 1)^m-dimensional vector (a dimension polynomial in n).
This observation leads to a notable implication. As established in Corollary 17.5.2, for any mixed state in D(A) there exists a pure state in Pure(A) that can be converted into it via GPC. When we couple this insight with the aforementioned observation, a significant deduction emerges: the pure-state coherence cost for preparing ρ^{⊗n} ∈ D(A^n) cannot exceed m log(n + 1). To put it differently, the rate of asymmetry cost – the coherence expense per copy of ρ – cannot exceed m log(n + 1)/n, a ratio that approaches zero in the limit n → ∞. In contrast, the non-uniformity cost does not go to zero in the asymptotic limit since the energy of ρ^{⊗n} grows linearly with n.
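The counting bound above is easy to see numerically (the snippet below is ours, not part of the text): the number of types |Type(n, m)| = C(n + m − 1, m − 1) grows only polynomially in n, so the per-copy coherence rate m log(n + 1)/n vanishes.

```python
# Sketch (not from the book): polynomial growth of the number of types and the vanishing rate.
from math import comb, log2

def num_types(n, m):
    return comb(n + m - 1, m - 1)   # |Type(n, m)|

m = 3
for n in (10, 100, 1000):
    print(n, num_types(n, m), (n + 1) ** m, m * log2(n + 1) / n)
```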
In summary, athermality is made up of two main resources: nonuniformity and time-
translation asymmetry, the latter of which is often referred to as coherence. Because of
this, the costs related to athermality states can be categorized into two parts: the cost
of nonuniformity and the cost of coherence. However, the coherence cost decreases and
approaches zero in the asymptotic limit, necessitating a unique form of rescaling. This
complexity lends a subtle character to the resource theory of quantum athermality, leaving
several critical questions within the theory still unresolved.
Therefore, for any ε > 0 and sufficiently large n, the state |ψ⟩^{⊗n} can be made arbitrarily close to the state
$$|\psi^{n}_{\varepsilon}\rangle := \frac{1}{\sqrt{\nu_\varepsilon}}\sum_{t\in S_{n,\varepsilon}}\sqrt{q_{t,n}}\,|t\rangle^{A^n} \quad\text{where}\quad \nu_\varepsilon := \sum_{t\in S_{n,\varepsilon}}q_{t,n} . \qquad (17.187)$$
From (17.182) the energy of any type t ∈ Type(n, m) is given by $\mu_t := n\sum_{x\in[m]}t_x a_x$. In the sum above, the types t belong to S_{n,ε}, so that ½∥t − p∥₁ ⩽ ε. Consequently, each component x ∈ [m] of the vector t − p satisfies |t_x − p_x| ⩽ 2ε. Using this property, we get that
$$|\mu_t - \mu_p| \leqslant n\sum_{x\in[m]}a_x|t_x - p_x| \leqslant 2n\varepsilon\sum_{x\in[m]}a_x . \qquad (17.188)$$
Therefore, for any two types t, t′ ∈ Type(n, m) that are ε-close to p we have
$$|\mu_t - \mu_{t'}| \leqslant 4n\varepsilon\sum_{x\in[m]}a_x . \qquad (17.189)$$
In other words, the energy spread of $|\psi^{n}_{\varepsilon}\rangle$ is no greater than $4n\varepsilon\sum_{x\in[m]}a_x$.
Note that by taking ε > 0 sufficiently small we can make the energy spread $4n\varepsilon\sum_{x\in[m]}a_x$ much smaller than na_m. However, the energy spread of ψ^n_ε is still linear in n. We show now that by taking ε to depend on n, we can find states in Pure(A^n) that are very close to ψ^{⊗n} but with energy spread that is sublinear in n.
Lemma 17.8.1. Let ψ ∈ Pure(A) and α ∈ (1/2, 1). There exists a sequence of pure states {χ_n}_{n∈N} in Pure(A^n) with the following properties:
1. The limit
$$\lim_{n\to\infty}\left\|\psi^{\otimes n}-\chi_n\right\|_1 = 0 . \qquad (17.190)$$
Proof. Let ε_n = n^{α−1}. Since α ∈ (1/2, 1) we have lim_{n→∞} ε_n = 0 and lim_{n→∞} nε_n² = ∞. The latter implies that if we replace ε in (17.186) with ε_n we still get the zero limit in (17.186). Hence, the pure state χ_n := ψ^n_{ε_n} satisfies (17.190). Since for all ε > 0 the state ψ^n_ε can be expressed as a linear combination of no more than (n + 1)^m energy eigenvectors, it follows that χ_n also has this property. Finally, from (17.189) we get that the energy spread of χ_n cannot exceed
$$4n\varepsilon_n\sum_{x\in[m]}a_x = 4n^{\alpha}\sum_{x\in[m]}a_x . \qquad (17.191)$$
$$\langle\chi_n|H^{\otimes n}|\chi_n\rangle \qquad (17.192)$$
significant coherence among energy levels since the coherence grows logarithmically with n. Indeed, as we will see shortly, such resources make the QRT of athermality reversible.
$$\left\|H^{R_n}\right\|_{\infty} \leqslant c\,n^{\alpha} \qquad \forall\,n\in\mathbb{N} . \qquad (17.193)$$
The key assumption in the given definition is that the energy of the systems R_n grows sublinearly with n. Consequently, as n approaches infinity, the resourcefulness of any states $\left\{(\omega^{R_n},\gamma^{R_n})\right\}_{n\in\mathbb{N}}$ becomes insignificant compared to the resourcefulness of n copies of the golden unit $\Upsilon_2 := \left(|0\rangle\langle 0|^{X}, u^{X}_{2}\right)$. We will soon discover that this small amount of athermality resource is sufficient to restore reversibility.
Exercise 17.8.3. Show that the distillation rate of athermality as given in Theorem 17.8.1
does not change if we replace CTO by CTO+SLAR. In other words, show that SLAR cannot
increase the distillation rate of athermality.
From Corollary 17.5.2 and Exercise 17.5.8 it follows that any mixed state in D(A) can be
obtained by thermal operations from a pure state in Pure(A). Thus, we can restrict the
minimum above over all density matrices ϕ ∈ D(A) to a minimum over all pure states
ϕ ∈ Pure(A).
Exercise 17.8.4. Let ε ∈ (0, 1/2), ρ, σ, γ ∈ D(A), and suppose that ρ ≈ε σ. Show that for
any system R
$$\mathrm{Cost}^{2\varepsilon}_{R}(\rho,\gamma) \leqslant \mathrm{Cost}^{\varepsilon}_{R}(\sigma,\gamma) . \qquad (17.195)$$
With the above definition of the R-assisted single-shot athermality cost, we define the
asymptotic SLAR-assisted athermality cost as
$$\mathrm{Cost}(\rho,\gamma) := \inf_{\{R_n\}}\lim_{\varepsilon\to 0^{+}}\liminf_{n\to\infty}\frac1n\,\mathrm{Cost}^{\varepsilon}_{R_n}\!\left(\rho^{\otimes n},\gamma^{\otimes n}\right) , \qquad (17.196)$$
where the infimum is over all SLARs, {Rn }n∈N . We show now that for pure states the above
cost can be expressed in terms of the relative entropy. The proof of the mixed-state case is
far more complicated (see the discussion in the ‘Notes and References’ section at the end of
this chapter).
Theorem 17.8.2. Let (ψ, γ) be an athermality state with ψ ∈ Pure(A). Then, the
SLAR-assisted athermality cost of (ψ, γ) is given by
Proof. Since the cost of athermality under CTO assisted with SLAR cannot be smaller than the distillation rate under the same operations, we get from Exercise 17.8.3 that
$$\mathrm{Cost}(\psi,\gamma) \geqslant D(\psi\|\gamma) . \qquad (17.198)$$
Our goal is therefore to prove the opposite inequality.
Let ε ∈ (0, 1/2) and let {χ_n}_{n∈N} be the sequence of pure states that satisfies all the properties outlined in Lemma 17.8.1. In particular, each χ_n is very close to ψ^{⊗n} (for n sufficiently large), so that for sufficiently large n we have (see Exercise 17.8.4)
$$\mathrm{Cost}^{2\varepsilon}_{R_n}\!\left(\psi^{\otimes n},\gamma^{\otimes n}\right) \leqslant \mathrm{Cost}^{\varepsilon}_{R_n}\!\left(\chi_n,\gamma^{\otimes n}\right) . \qquad (17.199)$$
Therefore, we focus now on finding an upper bound on $\mathrm{Cost}^{\varepsilon}_{R_n}\!\left(\chi_n,\gamma^{\otimes n}\right)$.
By definition, the energy spread of χ_n is given by $4n^{\alpha}\sum_{x\in[m]}a_x$ for some α ∈ (1/2, 1), and each χ_n has the form (cf. (17.187))
$$|\chi_n\rangle = \sum_{t\in S_n}\sqrt{q_t}\,|t\rangle^{A^n} , \qquad (17.200)$$
where S_n is the set of all types t ∈ Type(n, m) that satisfy ½∥t − p∥₁ ⩽ n^{α−1} (i.e., using the same notations discussed above (17.184), we have S_n := S_{n,ε_n} with ε_n := n^{α−1}), and {q_t}_{t∈S_n} forms a probability distribution over the set of types in S_n. Let k_n be the number of terms in the superposition above (hence k_n ⩽ (n + 1)^m). Furthermore, let the set {µ_j}_{j∈[k_n]} denote the energy eigenvalues of the Hamiltonian $H^{A^n}$ that correspond to the energy eigenvectors $|t\rangle^{A^n}$ appearing in the superposition (17.200). That is, each j ∈ [k_n] corresponds to exactly one type t that appears in the superposition (17.200). Although the energy eigenvalues {µ_j} depend also on n, we do not add a subscript n in order to ease the notation. Without loss of generality we also assume that µ₁ ⩽ · · · ⩽ µ_{k_n}, so that the energy spread of χ_n is $\mu_{k_n}-\mu_1 \leqslant 4n^{\alpha}\sum_{x\in[m]}a_x$ (see Lemma 17.8.1). We will also denote by $s^{(n)} \in S_n$ the type that corresponds to the smallest energy µ₁, and by $z^n \in [m]^n$ a sequence of type $s^{(n)}$, so that $H^{A^n}|z^n\rangle^{A^n} = \mu_1|z^n\rangle^{A^n}$.
With these notations, we are ready to define the SLAR system R_n to be a k_n-dimensional quantum system whose Hamiltonian is given by
$$H^{R_n} = \sum_{j\in[k_n]}(\mu_j-\mu_1)\,|j\rangle\langle j|^{R_n} . \qquad (17.201)$$
Note that the Hamiltonian $H^{R_n}$ has the same eigenvalues as the energies that appear in χ_n, shifted by µ₁. Observe that $|1\rangle\langle 1|^{R_n}$ is a zero-energy state of system R_n, and the maximal energy of $H^{R_n}$ is given by $\mu_{k_n}-\mu_1 \leqslant 4n^{\alpha}\sum_{x\in[m]}a_x$, so that {R_n}_{n∈N} is indeed a SLAR. We take the state of the SLAR system R_n to be
$$|\phi^{R_n}\rangle := \sum_{j\in[k_n]}\sqrt{q_j}\,|j\rangle^{R_n} , \qquad (17.202)$$
where q_j := q_t with t being the type that corresponds to the energy µ_j. By construction, the state
$$\phi^{R_n}\otimes|z^n\rangle\langle z^n|^{A^n} \qquad (17.203)$$
has the exact same energy distribution as the state
$$|1\rangle\langle 1|^{R_n}\otimes\chi^{A^n}_{n} \qquad (17.204)$$
(recall that $|1\rangle^{R_n}$ corresponds to the zero energy of system R_n). Hence, the above two states are equivalent resources and can be converted from one to the other by reversible thermal operations (i.e., an energy-preserving unitary). We now use this resource equivalence to compute the cost of χ_n in terms of the cost of the quasi-classical state $|z^n\rangle\langle z^n|$. We do it in three steps:
1. Replacing $\chi^{A^n}_{n}$ with $|1\rangle\langle 1|^{R_n}\otimes\chi^{A^n}_{n}$: By adding the resource $\left(|1\rangle\langle 1|^{R_n},\gamma^{R_n}\right)$ we can only increase the cost. Therefore,
$$\mathrm{Cost}^{\varepsilon}_{R_n}\!\left(\chi^{A^n}_{n},\gamma^{A^n}\right) \leqslant \mathrm{Cost}^{\varepsilon}_{R_n}\!\left(|1\rangle\langle 1|^{R_n}\otimes\chi^{A^n}_{n},\gamma^{R_nA^n}\right) . \qquad (17.205)$$

2. Replacing $|1\rangle\langle 1|^{R_n}\otimes\chi^{A^n}_{n}$ with $\phi^{R_n}\otimes|z^n\rangle\langle z^n|^{A^n}$: As discussed above, these two states are equivalent resources so that
$$\mathrm{Cost}^{\varepsilon}_{R_n}\!\left(|1\rangle\langle 1|^{R_n}\otimes\chi^{A^n}_{n},\gamma^{R_nA^n}\right) = \mathrm{Cost}^{\varepsilon}_{R_n}\!\left(\phi^{R_n}\otimes|z^n\rangle\langle z^n|^{A^n},\gamma^{R_nA^n}\right) . \qquad (17.206)$$

3. Replacing $\phi^{R_n}\otimes|z^n\rangle\langle z^n|^{A^n}$ with $|z^n\rangle\langle z^n|^{A^n}$: The cost of $|z^n\rangle\langle z^n|$ without the assistance of R_n cannot be smaller than the cost of $\phi^{R_n}\otimes|z^n\rangle\langle z^n|$ with the assistance of R_n, since the latter is defined in terms of a minimum over all states in D(R_n) (see the minimization in (17.194)). Therefore,
$$\mathrm{Cost}^{\varepsilon}_{R_n}\!\left(\phi^{R_n}\otimes|z^n\rangle\langle z^n|^{A^n},\gamma^{R_nA^n}\right) \leqslant \mathrm{Cost}^{\varepsilon}\!\left(|z^n\rangle\langle z^n|^{A^n},\gamma^{A^n}\right) . \qquad (17.207)$$
Combining the three steps above with (17.199), and using the fact that in the quasi-classical regime GPO has the same conversion power as CTO (see Theorem 17.3.1), we get that
$$\begin{aligned}
\mathrm{Cost}^{2\varepsilon}_{R_n}\!\left(\psi^{\otimes n},\gamma^{\otimes n}\right)
&\leqslant \mathrm{Cost}^{\varepsilon}\!\left(|z^n\rangle\langle z^n|,\gamma^{\otimes n}\right) \\
&= D^{\varepsilon}_{\max}\!\left(|z^n\rangle\langle z^n|\,\middle\|\,\gamma^{\otimes n}\right) && \text{by Theorem 17.7.1} \\
&\leqslant D_{\max}\!\left(|z^n\rangle\langle z^n|\,\middle\|\,\gamma^{\otimes n}\right) ,
\end{aligned} \qquad (17.208)$$
where in the last inequality we used the fact that D_max is always no smaller than its smoothed version. Now, observe that
$$D_{\max}\!\left(|z^n\rangle\langle z^n|\,\middle\|\,\gamma^{\otimes n}\right) = -\log\left\langle z^n\middle|\gamma^{\otimes n}\middle|z^n\right\rangle = -n\sum_{x\in[m]}s^{(n)}_{x}\log\langle x|\gamma|x\rangle , \qquad (17.209)$$
where in the last equality we used the fact that the sequence z^n has type s^{(n)}. Hence, the cost per copy of ψ cannot exceed
$$\begin{aligned}
\limsup_{n\to\infty}\frac1n\,\mathrm{Cost}^{2\varepsilon}_{R_n}\!\left(\psi^{\otimes n},\gamma^{\otimes n}\right)
&\leqslant \limsup_{n\to\infty}\frac1n\,D_{\max}\!\left(|z^n\rangle\langle z^n|\,\middle\|\,\gamma^{\otimes n}\right) \\
&= -\lim_{n\to\infty}\sum_{x\in[m]}s^{(n)}_{x}\log\langle x|\gamma|x\rangle \\
&= -\sum_{x\in[m]}p_x\log\langle x|\gamma|x\rangle && \tfrac12\left\|p-s^{(n)}\right\|_1\leqslant n^{\alpha-1} \\
&= D(\psi\|\gamma) .
\end{aligned} \qquad (17.210)$$
This completes the proof.
Exercise 17.8.5. Prove explicitly the second line in (17.209).
operations as given in Lemma 17.2.1 is due to [89]. The statement that in the quasi-classical
regime, CTO and GPO have the same conversion power (see Theorem 17.3.1) was first proved
in [137]. However, for the convertibility among general states (i.e., those not commuting
with the Hamiltonian), in [74] an example was given, demonstrating that GPO are strictly
more powerful than CTO. The set of Gibbs-Preserving Covariant (GPC) operations were
introduced in [151].
The characterization of quantum relative majorization in terms of semi-definite program-
ming can be found in [91]. Moreover, in [40] partial characterization of quantum relative
majorization was given in terms of an extension of Lorenz curves to the quantum domain.
The elegant characterization of quantum relative majorization in the (partially) qubit case
(i.e., Theorem 17.5.2) is due to [118]. Another characterization in which all states are qubits
was given in [4].
Corollaries 17.5.1 and 17.5.2, and Theorems 17.5.3 and 17.5.4, can be found in [89]. More
information on coherences in the theory of athermality, along with another set of constraints
similar to the one given in Theorem 17.5.4 can be found in [135]. More details on the SDP
formulation of exact interconversions in the theory of athermality can be found in [91].
In our proof of Theorem 17.8.2, we primarily drew from the work presented in [89].
Although the proof for the mixed state variant of the theorem was initially introduced
in [31], a more comprehensive and rigorous proof was later provided in the broader context
of [202]. It’s important to highlight that the proof outlined in [202] (specifically, Theorem 1)
stipulates that the
√
ancillary system, referred to (in this book) as the SLAR, should possess
a dimension of 2 n log n . Consequently, a lingering question remains regarding the possibility
of reducing this dimension to Poly(n), as is feasible in the pure-state scenario.
Appendices

APPENDIX A

Elements of Convex Analysis
We describe here a few properties of convex sets in a finite-dimensional (real) Hilbert space (e.g. R^n) that are used quite often in quantum information. A set C ⊂ R^n is said to be convex if for any two elements v, u ∈ C and any t ∈ [0, 1] the vector tv + (1 − t)u ∈ C. Consequently, if v₁, . . . , v_m ∈ C and p₁, . . . , p_m are non-negative with $\sum_{x\in[m]}p_x=1$ then
$$\sum_{x\in[m]}p_x\mathbf{v}_x \in C . \qquad (A.1)$$
Remark. The hyperplane separation theorem has numerous applications in convex analysis
Figure A.1: (a) A separating hyperplane between two polytopes. (b) A separating hyperplane does not exist since one of the sets is not convex.
and beyond. Consequently, it has many variants and also has several proofs. Since this theorem has been used many times in this book, we provide its proof below for the purpose of self-containment. This is by no means intended to replace a more thorough study of the subject. A reader interested in more details can follow standard textbooks on convex analysis.
Proof. We will define the vector n and then show that it has all the desired properties. The key idea is to use the fact (see the proof below) that if C ⊆ R^n is closed and convex then there exists a unique vector in C with minimum (Euclidean) norm. The vector n will then be taken to be the vector with minimal norm in the closure of C₁ − C₂. We now discuss the details.
Let C := $\overline{C_1 - C_2}$ be the closure of the set {r₁ − r₂ : r₁ ∈ C₁, r₂ ∈ C₂}. Since the latter is convex, its closure C is also convex (see Exercise A.1.1). Let d := inf{∥n∥² : n ∈ C}. Geometrically, d is the (squared) distance between the two sets. Note that since C₁ and C₂ are disjoint, the set C₁ − C₂ does not contain the zero vector. However, its closure may contain it. We will first consider the case that d > 0, and later treat the case d = 0.
By the definition of d, there exists a sequence n_j ∈ C such that ∥n_j∥² → d. This sequence is a Cauchy sequence since, by the parallelogram law, ∥n_j − n_k∥² = 2∥n_j∥² + 2∥n_k∥² − ∥n_j + n_k∥², and ∥n_j + n_k∥² = 4∥(n_j + n_k)/2∥² ⩾ 4d since the convex combination (n_j + n_k)/2 ∈ C. Hence,
$$\|\mathbf{n}_j-\mathbf{n}_k\|^2 \leqslant 2\|\mathbf{n}_j\|^2 + 2\|\mathbf{n}_k\|^2 - 4d , \qquad (A.4)$$
which goes to zero as j, k → ∞. We define n ∈ C to be the limit of {nj }j∈N . Next, let
r1 ∈ C1 and r2 ∈ C2 , and observe that since both r1 − r2 and n are elements of C, any convex
combination t(r1 − r2 ) + (1 − t)n with t ∈ (0, 1) is also in C. Therefore, its square norm
cannot be smaller than d. Hence,
$$\begin{aligned}
d &\leqslant \|t(\mathbf{r}_1-\mathbf{r}_2)+(1-t)\mathbf{n}\|^2 \\
&= t^2\|\mathbf{r}_1-\mathbf{r}_2\|^2 + 2t(1-t)(\mathbf{r}_1-\mathbf{r}_2)\cdot\mathbf{n} + (1-t)^2 d .
\end{aligned} \qquad (A.5)$$
Finally, since the above inequality holds for all t ∈ (0, 1) it must also hold for t = 0. That is,
Note that if d > 0 this implies (A.2) (see Exercise A.1.2). It is therefore left to check the
case d = 0.
Suppose first that the interior of C₁ − C₂ is not empty. Then, there exists a sequence K₁ ⊂ K₂ ⊂ · · · of non-empty closed subsets of the interior of C₁ − C₂ such that their union is the interior of C₁ − C₂. Since C₁ − C₂ does not contain the zero vector (recall that C₁ and C₂ are disjoint sets), each K_j ⊆ C₁ − C₂ does not contain the zero vector. Moreover, since K_j is closed, it contains a non-zero vector n_j ∈ K_j with minimal norm.
We now apply the same argument leading to (A.7) with C₁ replaced by K_j and C₂ replaced by the zero set {0} (which is disjoint from K_j). For such choices, d in (A.7) equals ∥n_j∥², so that (A.7) becomes 0 ⩽ ∥n_j∥² ⩽ v · n_j for all v ∈ K_j. We can therefore normalize all the {n_j} and argue that they satisfy v · n_j ⩾ 0 for all v ∈ K_j. Finally, the sequence of normalized vectors {n_j} contains a convergent subsequence (since the sphere in R^n is compact), and therefore its limit n also satisfies v · n ⩾ 0 for all v in the interior of C₁ − C₂. Hence, by continuity, the inequality v · n ⩾ 0 must also hold for all v in C₁ − C₂ itself. This completes the proof for the case that the interior of C₁ − C₂ is not empty.
If the interior of C1 − C2 is empty then its span has a dimension strictly smaller than
the dimension of the whole space. Therefore, it is contained in some hyperplane {v ∈ Rn :
v · n = c} so that v · n ⩾ c for all v in C1 − C2 . As we argued before, this implies (A.2).
The remaining part of the proof for the case that C1 , C2 are closed and compact is left as an
exercise.
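The construction used in this proof is also a practical recipe: the minimum-norm point of C₁ − C₂ can be found numerically, and it yields a separating normal whenever the two sets are disjoint. The sketch below (ours, not part of the text, using cvxpy) does this for two polytopes specified by their vertices.

```python
# Sketch (not from the book): separating normal via the minimum-norm point of C1 - C2.
import numpy as np
import cvxpy as cp

def separating_normal(points1, points2):
    """points1, points2: arrays of shape (k_i, n) whose convex hulls are assumed disjoint."""
    k1, k2 = len(points1), len(points2)
    p = cp.Variable(k1, nonneg=True)
    q = cp.Variable(k2, nonneg=True)
    diff = points1.T @ p - points2.T @ q          # a generic point of C1 - C2
    cp.Problem(cp.Minimize(cp.sum_squares(diff)),
               [cp.sum(p) == 1, cp.sum(q) == 1]).solve()
    return diff.value                              # the normal n; nonzero iff the hulls are disjoint

C1 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
C2 = np.array([[2.0, 2.0], [3.0, 2.0], [2.0, 3.0]])
n = separating_normal(C1, C2)   # every v in C1 - C2 satisfies v . n >= ||n||^2 > 0
```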
Exercise A.1.1. Show that if C1 and C2 are two convex subsets of Rn then C1 − C2 is also
convex.
Exercise A.1.3. Complete the proof above. That is, show that (A.2) holds with strict
inequalities if C1 and C2 are closed and at least one of them is compact.
Exercise A.1.4. Show that if C1 and C2 are two disjoint convex subsets of Rn , and if C1 is
open in Rn , then there exist a nonzero vector n ∈ Rn and a real number c ∈ R such that
Hint: Use the theorem above and the fact that separating hyperplanes cannot intersect the
interiors of convex sets.
Note that by definition, the convex hull of a single vector v ∈ R^n consists of just the vector v. As a simple example, consider the set Prob(m) consisting of all probability vectors in R^m; that is, Prob(m) denotes the set of all m-dimensional vectors with non-negative components that sum to one. It is simple to check that Prob(m) = Conv{e₁, . . . , e_m}, where e_x denotes the x-th standard basis vector of R^m. Hence, the set of all m-dimensional probability vectors is a polytope in R^m.
Face
Definition A.2.1. Consider a convex set C ⊆ R^n. A subset F ⊆ C is called a face of C if for any v ∈ F and any v₁, v₂ ∈ C such that v ∈ (v₁, v₂) we have v₁, v₂ ∈ F. Equivalently, for any v₁, v₂ ∈ C,
$$F\cap(\mathbf{v}_1,\mathbf{v}_2)\neq\varnothing \;\Rightarrow\; \mathbf{v}_1,\mathbf{v}_2\in F . \qquad (A.12)$$
To have a better understanding of this definition, let C := Conv{v₁, . . . , v_m} be the convex hull of m vectors in R^n (i.e. C is a polytope), and let $\sum_{x\in[m]}p_x\mathbf{v}_x$ be a vector that belongs to a face F of the polytope C. Then, for any x ∈ [m] with p_x ∈ (0, 1) we must have v_x ∈ F.
Hence, any face of C must be a convex hull of a subset of {v1 , . . . , vm }. Note, however, that
the converse is not necessarily true. That is, a convex hull of a subset of {v1 , . . . , vm } is not
necessarily a face.
For any x ∈ [m] the set {vx } (consisting of a single vector) is a face of the convex polytope
C ⊂ Rn . It is also called a vertex of the polytope. Any face F of C that can be expressed
as F = Conv{vx , vy }, where x, y ∈ [m] and x ̸= y is called an edge of the polytope C. Note
that we do not claim that Conv{vx , vy } is necessarily a face, only that if it is a face, then
Figure A.2: Faces of a 3D cube. The dashed line is not a face since it contains points in open
intervals (the purple line) with end points that are outside of the dashed line.
it is called an edge. Finally, a facet of C is a face that can be expressed as a convex hull of
n − 1 distinct vectors in {v1 , . . . , vm }. Therefore, faces of convex sets generalize the notion
of vertices, edges and facets of polytopes (see Fig. A.2).
Every vector w ∈ R^n can be used to define a face of a compact convex set C ⊂ R^n given by
$$F_{\mathbf{w}} := \left\{\mathbf{v}\in C \,:\, \mathbf{w}\cdot\mathbf{v} = \max_{\mathbf{u}\in C}\mathbf{w}\cdot\mathbf{u}\right\} . \qquad (A.13)$$
To show that this set is indeed a face of C, observe first that F_w is non-empty since C is a compact set. Now, let v = tv₁ + (1 − t)v₂ where v ∈ F_w, t ∈ (0, 1), and v₁, v₂ ∈ C. Then, by definition
$$\max_{\mathbf{u}\in C}\mathbf{w}\cdot\mathbf{u} = \mathbf{w}\cdot\mathbf{v} = t\,\mathbf{w}\cdot\mathbf{v}_1 + (1-t)\,\mathbf{w}\cdot\mathbf{v}_2 \leqslant t\max_{\mathbf{u}\in C}\mathbf{w}\cdot\mathbf{u} + (1-t)\max_{\mathbf{u}\in C}\mathbf{w}\cdot\mathbf{u} = \max_{\mathbf{u}\in C}\mathbf{w}\cdot\mathbf{u} . \qquad (A.14)$$
Hence, the inequality above must be an equality which can only hold if both w · v1 =
maxu∈C w · u and w · v2 = maxu∈C w · u. That is, v1 , v2 ∈ Fw .
Exercise A.2.1. Show that if v ∈ Fw then any vector v′ ∈ C with the property that
(v − v′ ) · w = 0 (A.15)
is also in Fw .
In other words, an extreme point is a point that cannot be expressed as tv + (1 − t)w, for
some t ∈ (0, 1) and two distinct vectors v, w ∈ C (i.e. v ̸= w). Observe that by definition if
a convex set C = {v} consists of a single vector v ∈ Rn then v is an extreme point of C.
Krein–Milman theorem
Theorem A.3.1. Every compact convex set of Rn equals to the closed convex hull
of its extreme points.
Remark. The theorem above indicates the significance and importance of extreme points
in convex analysis. The theorem implies in particular that the set of extreme points of a
compact convex set in Rn is non-empty. In its proof below we make use of the Zorn’s lemma
from set theory.
Proof. Let C ⊆ Rn be a non-empty compact convex set. We first prove that the set of extreme
points of C is non-empty. If C consists of a single vector then we are done. Otherwise, let
v1 , v2 ∈ C be two distinct vectors (i.e. v1 ̸= v2 ). From the hyperplane separation theorem
(see Theorem A.1.1) there exists a vector w1 ∈ Rn such that w1 · v1 > w1 · v2 . This implies
that the face Fw1 of C does not contain the point v2 (see the definition of Fw in (A.13)).
We next apply the same procedure to Fw1 . Specifically, if this set contains a single point
then that point is an extreme point, and from Exercise A.3.1 it is also an extreme point
of C so that we are done. Otherwise, the face Fw1 contains two vectors (that are not the
same) that can be separated by a hyperplane with a normal vector w2 . Hence, the face
Fw2 := {v ∈ Fw1 : w2 · v = maxu∈Fw1 w2 · u} of Fw1 does not contain one of the two vectors.
Continuing in this way, if the process does not stop at some step j for which Fwj contains a
single point (and therefore it must be an extreme point), then we get an infinite sequence of
faces $\{F_{\mathbf{w}_j}\}_{j=1}^{\infty}$ that are ordered by strict inclusion
Such a sequence of compact closed convex sets has a minimal element (Zorn’s lemma) which
we denote by F. From the Exercise A.3.2 below, it follows that F is itself a face of C.
Therefore, if it contains more than one point then we can continue with the same procedure
Carathéodory’s Theorem
Theorem A.3.2. Let K be a subset of Rn . If v ∈ Conv(K) then v can be written as
a convex combination of at most n + 1 elements of K.
If m ⩽ n + 1 then we are done. Otherwise, m > n + 1, so that the vectors w₂ − w₁, . . . , w_m − w₁ must be linearly dependent (since there are m − 1 > n of them). Let λ₂, . . . , λ_m ∈ R be m − 1 numbers, not all zero, such that
$$\sum_{x=2}^{m}\lambda_x(\mathbf{w}_x-\mathbf{w}_1) = \mathbf{0} . \qquad (A.20)$$
Observe that since $\sum_{x\in[m]}\lambda_x=0$ the set {λ_x}_{x∈[m]} contains at least one strictly positive number (as we assume that not all of them are zero). We can therefore define
$$\mu := \min\left\{\frac{p_x}{\lambda_x} \,:\, \lambda_x>0 ,\; x\in[m]\right\} . \qquad (A.22)$$
By definition, µ has the property that q_x := p_x − µλ_x ⩾ 0 for all x ∈ [m]. Observe also that $\sum_{x\in[m]}q_x=1$ so that q = (q₁, . . . , q_m)^T is a probability vector. In addition, from the definition of µ, there exists at least one y ∈ [m] (the minimizer of (A.22)) such that q_y = p_y − µλ_y = 0. Without loss of generality suppose that y = m. We then get that the convex combination
$$\sum_{x\in[m-1]}q_x\mathbf{w}_x = \sum_{x\in[m]}q_x\mathbf{w}_x = \sum_{x\in[m]}(p_x-\mu\lambda_x)\mathbf{w}_x = \sum_{x\in[m]}p_x\mathbf{w}_x = \mathbf{v} , \qquad (A.23)$$
where the last equality follows from (A.21).
Exercise A.3.4. Let C ⊆ R^n be a compact set (i.e., closed and bounded). Show that its convex hull, Conv(C), is also compact. Hint: Use Carathéodory's theorem.
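The proof of Carathéodory's theorem is constructive, and the reduction step can be implemented directly; the following sketch (ours, not part of the text) removes one redundant point from a convex combination without moving the represented vector v.

```python
# Sketch (not from the book): one Caratheodory reduction step.
import numpy as np

def caratheodory_step(W, p):
    """W: (m, n) array of points; p: probability weights over them, with m > n + 1."""
    m, n = W.shape
    # Find a nonzero lambda with sum_x lambda_x w_x = 0 and sum_x lambda_x = 0:
    A = np.vstack([W.T, np.ones(m)])          # (n + 1, m), has a nontrivial null space
    _, _, Vt = np.linalg.svd(A)
    lam = Vt[-1]
    if lam.max() <= 0:
        lam = -lam                             # ensure some strictly positive entries
    mu = min(p[x] / lam[x] for x in range(m) if lam[x] > 1e-12)   # as in (A.22)
    q = p - mu * lam                           # nonnegative, still sums to one, same point v
    keep = q > 1e-12
    return W[keep], q[keep] / q[keep].sum()
```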
A.4 Polyhedrons

Polyhedron

Definition A.4.1. Let r₁, . . . , r_m ∈ R^n be m vectors, and let c₁, . . . , c_m ∈ R. The set
$$C := \left\{\mathbf{v}\in\mathbb{R}^n \,:\, \mathbf{v}\cdot\mathbf{r}_x\leqslant c_x \;\;\forall\,x\in[m]\right\} \qquad (A.25)$$
is called a polyhedron.
The extreme points of polyhedrons are called vertices and our next goal is to characterize
them. Intuitively, one would expect that an extreme point e of the polyhedron C as defined
above should saturate some of inequalities given in (A.25). That is, we would expect that
e · rx = cx at least for some x ∈ [m]. The following theorem makes this intuition rigorous.
Proof. Let e ∈ C and suppose first that span{K} ̸= Rn . Then, there exists a vector v ∈ Rn
such that v · rx = 0 for all rx ∈ K. Since e ∈ C, for all rx ̸∈ K we must have rx · e < cx .
Thus, for sufficiently small ε > 0 we have for all x ∈ [m]
It is natural to ask what is the relationship between polytopes and polyhedrons. Re-
markably, if a polyhedron is bounded then it is a polytope.
Corollary A.4.1. Convex polyhedrons have a finite number of extreme points and if
they are bounded then they are polytopes (i.e. they are convex hulls of finitely many
vertices).
Proof. Let C be a polyhedron as in (A.25). From Theorem A.4.1 we know that e is an extreme point of C if and only if e is a solution to the linear system of equations r_x · e = c_x, where x runs over all x ∈ [m] such that r_x ∈ K. Since span{K} = R^n, the solution to each such linear system of equations is unique, and moreover, since |K| ⩾ n, there can be no more than $\binom{m}{n}$ extreme points ($\binom{m}{n}$ is the number of choices of n distinct vectors from the set {r₁, . . . , r_m}). Hence, polyhedrons have a finite number of extreme points. Now, if C is also bounded it must be compact since convex polyhedrons are closed (see Exercise A.4.1). Hence, in this case, from the Krein–Milman theorem (i.e. Theorem A.3.1) C is the convex hull of its extreme points. Since we proved that C has a finite number of vertices, C must be a polytope.
From its definition, it is clear that if v ∈ A then A = A. The relevance of affine subspaces
to our study here is that shifting a subspace by a fixed vector does not change any of the key
properties of convex sets. Therefore, many of the theorems already covered in this chapter,
can be generalized in a straightforward manner to incorporate affine subspaces. For example,
in Theorem A.4.1 we assume that the polyhedron C is in R^n. Clearly, since all n-dimensional vector spaces over R are isomorphic to R^n, we can replace R^n with any n-dimensional subspace A of some vector space V; moreover, the theorem still holds if we replace A with an affine subspace A, since shifting a polyhedron by a fixed vector does not change any of its properties.
An affine subspace A has the property that for any v₁, . . . , v_m ∈ A and any m real numbers t₁, . . . , t_m ∈ R that satisfy $\sum_{x\in[m]}t_x=1$ we have
$$\sum_{x\in[m]}t_x\mathbf{v}_x \in A . \qquad (A.30)$$
Note that the coefficients {tx } can be negative (hence, in general, they do not form a prob-
ability vector).
As an example of an affine subspace, consider the subspace A ⊂ R^{n×n} consisting of all the n × n real matrices whose rows and columns sum to zero. That is, N = (ν_{xy}) ∈ A if and only if
$$\sum_{x'\in[n]}\nu_{x'y} = \sum_{y'\in[n]}\nu_{xy'} = 0 \qquad \forall\,x,y\in[n] . \qquad (A.31)$$
1. Show that A above is indeed a subspace, and show that |A| = (n − 1)².
The affine subspace A as defined in (A.32) contains the set of all doubly stochastic
matrices. A doubly stochastic matrix is an n × n matrix whose components are non-negative and has the property that the entries of each row and column sum to one. The set of all
n × n doubly stochastic matrices is a polytope in the real vector space Rn×n , and we will
denote it by Bn (after Birkhoff). Doubly stochastic matrices appear quite often in several
resource theories.
1. Show that Bn is indeed a polytope. Hint: Show first that it is a bounded polyhedron
in A as defined in (A.25) (with the dot product replaced by the Hilbert–Schmidt inner
product) and then use Corollary A.4.1.
2. Show that any permutation matrix is an extreme point of Bn . Recall that the entries
in each row or column of a permutation matrix consist of zeros except for one entry
being equal to 1.
The exercise above states that any permutation matrix is a vertex of Bn . It turns out
that there are no other vertices for Bn .
Proof. We will prove the theorem by induction. The case n = 1 is trivial, so we assume now that n > 1 and that the theorem holds for (n − 1) × (n − 1) doubly stochastic matrices. We will denote by A the affine subspace (A.32). Therefore, M := (µ_{xy}) ∈ A is in B_n if and only if its entries satisfy µ_{xy} ⩾ 0. These n² inequalities define the polyhedron B_n. Now, according to Theorem A.4.1, if M is an extreme point then the total number of equalities µ_{xy} = 0 must be at least |A| = (n − 1)² (see the first part of Exercise A.5.1). Now, since $\sum_{y\in[n]}\mu_{xy}=1$ for all x ∈ [n], M cannot contain a row (or column) with all zeros. On the other hand, suppose each row of M has at least two non-zero components. In this case, the number of zero components of M would not exceed n(n − 2), which is strictly smaller than (n − 1)² (so that this case is not possible). We therefore conclude that at least one of the rows, say the x-th row, has exactly one non-zero component, say the y-th component. This (x, y)-component must be equal to 1 since the row sums to 1. This in turn implies that in the y-th column, except for the x-th component, all the other components are zero. Therefore, crossing out the x-th row and the y-th column results in an (n − 1) × (n − 1) doubly stochastic matrix that is also an extreme point. The proof is then concluded by the induction assumption.
Exercise A.5.3. Show that any n × n doubly stochastic matrix can be expressed as a convex
combination of m ⩽ (n − 1)2 + 1 permutation matrices. Hint: Use the above arguments in
conjunction with Carathéodory’s Theorem.
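The theorem and Exercise A.5.3 together yield an algorithm, often called the Birkhoff–von Neumann decomposition: repeatedly find a permutation supported on the positive entries and subtract as much of it as possible. A short sketch (ours, not part of the text; scipy is used for the bipartite matching) is given below.

```python
# Sketch (not from the book): Birkhoff-von Neumann decomposition of a doubly stochastic matrix.
import numpy as np
from scipy.optimize import linear_sum_assignment

def birkhoff_decomposition(M, tol=1e-9):
    """Return a list of (weight, permutation matrix) pairs whose weighted sum is M."""
    M = M.copy()
    terms = []
    while M.max() > tol:
        support = (M > tol).astype(float)
        rows, cols = linear_sum_assignment(-support)     # a perfect matching inside the support
        assert support[rows, cols].all(), "matrix is (numerically) not doubly stochastic"
        P = np.zeros_like(M)
        P[rows, cols] = 1.0
        w = M[rows, cols].min()                           # largest weight that can be removed
        terms.append((w, P))
        M = M - w * P                                     # zeroes at least one more entry
    return terms
```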
Note that irrespective of the set C, the polar C◦ is always a closed convex set that contains
the zero vector. It is also straightforward to check that the polar of Rn is the zero vector
and the polar of the set consisting only of the zero vector is the whole space Rn .
Note that from the third part of the exercise above we see that the polar of a polytope
is a polyhedron.
Proof. The part C ⊆ (C◦)◦ was given in the exercise above. We therefore prove here that (C◦)◦ ⊆ C. Suppose by contradiction that there exists a vector w ∈ (C◦)◦ that is not in C. Then, since C is a closed convex set, from the hyperplane separation theorem (see Theorem A.2) there exists a vector r ∈ R^n and a constant c ∈ R such that w · r > c > v · r for all v ∈ C. Since the zero vector belongs to C, by taking v = 0 we get c > 0. Therefore, defining s := (1/c) r, we conclude that both w · s > 1 and 1 > v · s for all v ∈ C. The latter implies that s ∈ C◦, but then the former implies that w ̸∈ (C◦)◦, in contradiction with our assumption. This completes the proof.
Exercise A.6.2. Let ε > 0 and Bε (0) := {v ∈ Rn : ∥v∥ ⩽ ε} be a ball of radius ε (in the
Euclidean norm). Show that
Bε (0)◦ = B1/ε (0) . (A.36)
Proof. Without loss of generality we assume that the interior of C is not empty, and furthermore we assume that the zero vector is in the interior of C (otherwise, we can shift C so that the origin of the coordinate system is in its interior). Therefore, there exists ε > 0 such that B_ε(0) ⊂ C. From Exercise A.6.1 this implies that
where the last equality follows from Exercise A.6.2. That is, C◦ is a bounded polyhedron. From Corollary A.4.1 it must be a polytope.
Every hyperplane separates R^n into two half-spaces. A closed half-space of R^n is therefore the set of all vectors v ∈ R^n that satisfy n · v ⩽ c for some fixed c ∈ R and a fixed (normal)
vector n ∈ Rn . Therefore, a convex polyhedron as defined in (A.25) can be viewed as the
intersection of finitely many half-spaces. Similarly, in the following theorem we show that
a convex polytope can also be expressed as the intersection of finitely many half-spaces
(the half-spaces that are defined by its facets); see Fig. A.3. This means in particular that
every polytope is a polyhedron (recall that the converse of this assertion is also true if the
polyhedron is bounded).
Remark. The condition that C contains the zero vector is just for convenience and is in fact unnecessary. Specifically, if 0 ̸∈ C then the theorem still holds if we replace the equations s_x · v ⩽ 1 with s_x · v ⩽ r_x, where the r_x are some real numbers (see Exercise A.6.3).
Proof. From Theorem A.6.2, the polar of C is itself a polytope. Therefore, there exist k ∈ N and s₁, . . . , s_k ∈ R^n such that C◦ = Conv{s₁, . . . , s_k}. From the bipolar theorem (Theorem A.6.1) we get
$$C = (C^{\circ})^{\circ} = \left\{\mathbf{v}\in\mathbb{R}^n \,:\, \mathbf{v}\cdot\mathbf{s}_x\leqslant 1 \;\;\forall\,x\in[k]\right\} , \qquad (A.40)$$
where the second equality follows from (A.35).
Exercise A.6.3. Show that the theorem above still holds even if 0 ̸∈ C, as long as the equations s_x · v ⩽ 1 are replaced with s_x · v ⩽ r_x, where the r_x are some real numbers.
$$\mathbf{v}'\cdot\mathbf{s} \geqslant \mathbf{v}\cdot\mathbf{s} \qquad \forall\,\mathbf{v}'\in C . \qquad (A.41)$$
Exercise A.6.4. Prove the supporting hyperplane theorem above. Hint: Use Theorem A.2.
is a sublinear functional. One of the most useful facts about support functions is the following
theorem.
Proof. The direction that C1 ⊇ C2 implies fC1 ⩾ fC2 follows trivially from the definition. On
the other hand, if C1 ̸⊇ C2 then there exists a vector r ∈ C2 such that r ̸∈ C1 . Hence, the
sets {r} and C1 are two disjoint closed compact convex sets of Rn . From the hyperplane
separation theorem (see Theorem A.2) there exists c ∈ R and a vector n ∈ Rn such that
n · r′ < c < n · r ∀ r′ ∈ C1 (A.45)
Taking the maximum over r′ ∈ C1 gives
fC1 (n) < c < n · r ⩽ fC2 (n) . (A.46)
Hence, fC1 ̸⩾ fC2 . This completes the proof.
Lemma A.7.1. Let C1 and C2 be two compact convex sets of Rn . Then, their
support functions satisfy
It is simple to check (see Exercise A.8.1) that K∗ is both closed and convex.
2. Show that if K1 , K2 ⊆ A are two cones such that K1 ⊆ K2 then K∗2 ⊆ K∗1 .
Example. Let A be a Hilbert space and consider the space Herm(A). Recall that Herm(A)
represents the (real) vector space of all Hermitian matrices acting on a Hilbert space A.
Since Herm(A) ∼ = Rn , with n := |A|2 , the definition of a cone and dual cone can be applied
to the vector space Herm(A). An important example of a cone in this space is the cone of
positive semidefinite matrices, K := Pos(A). This is a cone since if Λ ∈ Herm(A) is positive
semidefinite, i.e. Λ ⩾ 0, then also tΛ ⩾ 0 for all t ⩾ 0. Interestingly, this cone is a self-dual cone in the sense that K∗ = K (see the exercise below).
Exercise A.8.2. Let A be a Hilbert space and consider the space Herm(A).
1. Show that the cone of positive semidefinite matrices is a self-dual cone.
2. Show that the dual cone of the whole space K := Herm(A) is K∗ = {0} where 0 is the
zero matrix in Herm(A).
Theorem A.8.1. Let K ⊆ A be a cone in a Hilbert space A. Then, K∗∗ is the closure of the smallest convex cone containing K. In particular, if K is a closed convex cone then K∗∗ = K.
Proof. Let C be the closure of the smallest convex set containing K. By the definition of a dual cone in (A.49), if w ∈ K then for all v ∈ K∗ we must have w · v ⩾ 0. On the other hand,
$$K^{**} := \left\{\mathbf{u}\in A \,:\, \mathbf{u}\cdot\mathbf{v}\geqslant 0 \text{ for all } \mathbf{v}\in K^{*}\right\} . \qquad (A.50)$$
Therefore, if w ∈ K we must have w ∈ K∗∗, so that K ⊆ K∗∗. Since K∗∗ is a closed convex set it must contain C. Now, suppose by contradiction that the inclusion C ⊆ K∗∗ is strict. That is, there exists v ∈ K∗∗ that is not in C. Then, from the hyperplane separation theorem (see Theorem A.2) there exists a vector w ∈ R^n such that
Since the zero vector belongs to C we have in particular that µ ⩽ 0 and w · v < 0. We argue next that µ must be zero. Otherwise, µ < 0, so that there exists r ∈ C with w · r < 0. But since C is a cone, also tr ∈ C for any t > 0, so we get from the definition of µ that µ ⩽ w · (tr), which goes to −∞ as t → ∞. This is not possible since according to (A.51) µ is bounded from below. We therefore conclude that µ = 0. This in turn implies that w ∈ C∗ ⊆ K∗ (where we used the second part of Exercise A.8.1 in conjunction with the fact that K ⊆ C). However, recall that v ∈ K∗∗, which implies in particular that v · w ⩾ 0, in contradiction with w · v < 0. Therefore, our initial assumption that v ̸∈ C was incorrect. This completes the proof.
Remark. The primal problem above has been expressed with respect to two vector spaces
of Hermitian matrices V1 and V2 since these are what we typically encounter in quantum
physics. However, everything that we will discuss in this section is also applicable for any
finite dimensional abstract Hilbert spaces V1 and V2 by replacing the Hilbert-Schmidt inner
product above with the inner product of the vector space V1 .
as the primal problem if we take K2 in the primal problem to be the cone consisting only of
the zero matrix.
A.9.1 Duality
Every primal CLP optimization problem has a dual problem. The dual problem of the primal
CLP problem given in (A.52) is given as follows.
Any ζ ∈ K∗₂ that satisfies H₁ − N∗(ζ) ∈ K∗₁ is called a dual feasible plane. If there are no dual feasible planes then by convention β := −∞.
Exercise A.9.1. Show that with the notations of (A.56), the dual problem can be expressed as
$$\text{Find}\quad \beta := \sup\,\mathrm{Tr}[\zeta H_2] \qquad \text{Subject to}\quad \tilde{H}_1-\tilde{\mathcal{N}}^{*}(\zeta)\in K^{*} \;\text{ and }\; \zeta\in V_2 .$$
The significance of the dual problem is that quite frequently α = β. We start first by
showing that α ⩾ β.
Weak Duality

Lemma A.9.1. For any primal feasible plane η and dual feasible plane ζ, we have
$$\mathrm{Tr}[H_1\eta] \geqslant \mathrm{Tr}[\zeta H_2] . \qquad (A.58)$$
That is, α ⩾ β.
Proof. Let η and ζ be as in the lemma. Since H₁ − N∗(ζ) ∈ K∗₁ and η ∈ K₁, we have from the definition of a dual cone that the inner product
$$\mathrm{Tr}\!\left[\left(H_1-\mathcal{N}^{*}(\zeta)\right)\eta\right] \geqslant 0 . \qquad (A.59)$$
This inequality can be expressed as
$$\mathrm{Tr}[H_1\eta] \geqslant \mathrm{Tr}[\zeta\,\mathcal{N}(\eta)] . \qquad (A.60)$$
On the other hand, since N(η) − H₂ ∈ K₂ and ζ ∈ K∗₂, we have that the inner product
$$\mathrm{Tr}\!\left[\left(\mathcal{N}(\eta)-H_2\right)\zeta\right] \geqslant 0 . \qquad (A.61)$$
The above inequality can be expressed as
$$\mathrm{Tr}[\zeta\,\mathcal{N}(\eta)] \geqslant \mathrm{Tr}[\zeta H_2] . \qquad (A.62)$$
Combining (A.60) and (A.62) produces (A.58). This completes the proof.
Exercise A.9.2. Let η ∈ K₁ be such that N(η) − H₂ ∈ K₂, and let ζ ∈ K∗₂ be such that H₁ − N∗(ζ) ∈ K∗₁. Show that if in addition
$$\mathrm{Tr}\!\left[\left(H_1-\mathcal{N}^{*}(\zeta)\right)\eta\right] = \mathrm{Tr}\!\left[\left(\mathcal{N}(\eta)-H_2\right)\zeta\right] = 0 \qquad (A.63)$$
then α = β.
Exercise A.9.3. Show that if α = −∞ there are no dual feasible planes, and if β = +∞
there are no primal feasible planes.
The following theorem, known also as the strong duality theorem, is the key result of this
section that we will use quite often in the book. It provides a sufficient condition for α = β
to hold. We will use the notation int(K) to denote the interior of a cone K.
Strong Duality
Theorem A.9.1. We have α = β if one of the following two conditions hold:
1. K1 and K2 are closed convex cones and there exists a primal feasible plane.
It turns out that in all the problems that we will consider in this book these mild condi-
tions (also known as Slater’s conditions) will hold so that α = β.
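As a small numerical illustration (ours, not part of the text), consider the familiar special case in which K₁ is the positive semidefinite cone, K₂ = {0}, and N(η) := (Tr[A_i η])_i, so that N∗(ζ) = Σ_i ζ_i A_i. Solving the primal and dual problems with cvxpy shows α = β up to solver accuracy, as the strong duality theorem guarantees.

```python
# Sketch (not from the book): primal/dual SDP pair with matching optimal values.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, k = 4, 3
A = [(B + B.T) / 2 for B in (rng.standard_normal((n, n)) for _ in range(k))]
W = rng.standard_normal((n, n))
H1 = W @ W.T                                              # PSD, so the primal is bounded below
G = rng.standard_normal((n, n))
eta0 = G @ G.T + np.eye(n)                                 # a strictly feasible primal plane
H2 = np.array([np.trace(Ai @ eta0) for Ai in A])

eta = cp.Variable((n, n), PSD=True)
alpha = cp.Problem(cp.Minimize(cp.trace(H1 @ eta)),
                   [cp.trace(A[i] @ eta) == H2[i] for i in range(k)]).solve()

zeta = cp.Variable(k)
beta = cp.Problem(cp.Maximize(H2 @ zeta),
                  [H1 - sum(zeta[i] * A[i] for i in range(k)) >> 0]).solve()
print(alpha, beta)   # equal up to numerical accuracy
```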
Proof. If α = −∞ then from Exercise A.9.3 there are no dual feasible planes, so that by convention β = −∞. Hence, in this case α = β. We therefore consider now the case α > −∞ (i.e. α is bounded from below) and prove the sufficiency of the first condition. From (A.56) and Exercise A.9.1 it is sufficient to prove the theorem for the case K₂ = {0}. We will therefore denote K := K₁ and assume that it is closed. Consider the convex cone
$$C := \left\{\left(\mathcal{N}(\eta),\,\mathrm{Tr}[\eta H_1]\right) \,:\, \eta\in K\right\} \subset V_2\oplus\mathbb{R} . \qquad (A.64)$$
Since the set K is closed, the set C is also closed in V₂ ⊕ R (recall that we are working in finite dimensions). Note that any η ∈ K that satisfies N(η) = H₂ results in a point (H₂, Tr[ηH₁]) ∈ C. We are therefore interested in the intersection of the cone C with the line
$$L := \left\{(H_2, t) \,:\, t\in\mathbb{R}\right\} . \qquad (A.65)$$
The intersection C ∩ L consists of the points {(H₂, Tr[ηH₁])} over all primal feasible planes η. This intersection is closed (since both L and C are closed), and it is not empty since there is a primal feasible plane. Moreover, since the set of numbers {Tr[ηH₁]} over all primal feasible planes η is bounded from below (recall α > −∞), there exists a feasible optimal plane η₀ such that α = Tr[η₀H₁]. In the rest of the proof, η₀ will denote this feasible optimal plane.
From the Weak Duality Lemma we know that α ⩾ β. To show the converse, we will show that for any ε > 0 we have β ⩾ α − ε, so that we must have α = β. Fix ε > 0 and observe that, by its definition, the point (H₂, α − ε) ̸∈ C. Therefore, from the hyperplane separation theorem (Theorem A.2) there exists a hyperplane (ζ, s) ∈ V₂ ⊕ R and a constant c ∈ R such that
Note that on the left-hand side we have the inner product between (ζ, s) and (N(η), Tr[ηH₁]) ∈ C, and on the right-hand side the inner product between (ζ, s) and (H₂, α − ε). Since we can take η = 0 we must have c > 0. On the other hand, if we take η = η₀ the left-hand side becomes Tr[H₂ζ] + sα, and when comparing it with the right-hand side we conclude that s < 0. Moreover, since the rescaling $\zeta\mapsto\frac{1}{|s|}\zeta$ and $s\mapsto\frac{s}{|s|}$ does not change the inequalities above, we can assume without loss of generality that s = −1. Therefore, since c > 0, the right-hand side of the equation above gives
It is therefore left to show that ζ is a dual feasible plane, so that β, which is defined as the supremum of Tr[ζH₂] over all dual feasible planes, is also greater than α − ε. Indeed, since K is a cone, we must have
Otherwise, if for some η ∈ K the left-hand side above were positive, then the inequality on the left-hand side of (A.66) (with s = −1) would be violated for tη with t a sufficiently large positive real number. The equation above can be expressed as
$$\mathrm{Tr}\!\left[\eta\left(H_1-\mathcal{N}^{*}(\zeta)\right)\right] \geqslant 0 \qquad \forall\,\eta\in K , \qquad (A.69)$$
which is equivalent to H₁ − N∗(ζ) ∈ K∗. Hence, ζ is a dual feasible plane. This completes the proof of the sufficiency of the first condition. For the second condition see Exercise A.9.4.
Exercise A.9.4. Prove the sufficiency of the second condition (Slater's condition) in the theorem above. Hint: Define C as in the proof above but with int(K) replacing K, and use the version in (A.8) of the hyperplane separation theorem.
f (v) = v . (A.70)
In quantum information this theorem is typically used for functions from density matrices to density matrices. One example of such linear functions is the class of quantum channels. However, observe that the theorem above holds for all continuous functions (not only linear ones).
Exercise B.0.1. Let M ∈ Cn×n be a square complex matrix, and let I be an interval in R
containing the eigenvalues of M M ∗ . Show that for any function f : I → R we have
M f (M ∗ M ) = f (M M ∗ )M . (B.1)
1. We say that f is operator monotone if for every Hilbert space A and any
η, ζ ∈ Herm(A) that satisfies η ⩾ ζ we have f (η) ⩾ f (ζ).
Exercise B.1.2. Show that the function f(r) = a + br (defined on any interval) is operator monotone for any a ∈ R and b ⩾ 0. Show that it is operator convex for any a, b ∈ R.
Exercise B.1.3. Let f1 , f2 : I → R be two real functions and define for any r ∈ I, f (r) :=
af1 (r) + bf2 (r) for some fixed non-negative real numbers a, b ∈ R+ .
In this book we will only work with continuous functions. In this case, the condition (B.2) for operator convexity can be replaced with the special case in which we take t = 1/2.
Proof. Clearly, if f satisfies (B.2) then it satisfies (B.5). We therefore show that (B.5) implies (B.2). Let η, ζ ∈ Herm(A) and suppose (B.5) holds. Observe that for t = 1/4 we get
$$\begin{aligned}
f\!\left(\tfrac14\eta+\tfrac34\zeta\right) &= f\!\left(\tfrac12\left(\tfrac12\eta+\tfrac12\zeta\right)+\tfrac12\zeta\right) \\
&\leqslant \tfrac12 f\!\left(\tfrac12\eta+\tfrac12\zeta\right)+\tfrac12 f(\zeta) && \text{by (B.5)} \\
&\leqslant \tfrac14 f(\eta)+\tfrac14 f(\zeta)+\tfrac12 f(\zeta) && \text{by (B.5)} \\
&= \tfrac14 f(\eta)+\tfrac34 f(\zeta) .
\end{aligned} \qquad (B.6)$$
Hence, the condition (B.2) holds for t = 1/4 and t = 3/4. Similarly, by repetition (e.g. taking convex combinations $\tfrac12\eta+\tfrac12\left(\tfrac14\eta+\tfrac34\zeta\right)$, etc.) it follows that (B.2) must hold for all dyadic rationals, i.e. numbers of the form $t = \frac{m}{2^n}$ where n ∈ N is arbitrary and m is any integer in [2^n]. Since the set of such dyadic rationals is dense in [0, 1], it follows from the continuity of f that (B.2) holds for all t ∈ [0, 1]. This completes the proof.
Exercise B.1.4. Use the lemma above to prove that the function f(t) = t² is operator convex on any interval. Hint: Show that the difference between $\frac{f(\eta)+f(\zeta)}{2}$ and $f\!\left(\frac{\eta+\zeta}{2}\right)$ can be expressed as the square of a Hermitian matrix.
The above inequality implies in particular that the maximal eigenvalue of the complex matrix N cannot exceed one. Observe that the matrix N is similar to the matrix $\eta^{-\frac14}N\eta^{\frac14} = \eta^{-\frac14}\zeta^{\frac12}\eta^{-\frac14}$, which is Hermitian. Since similar matrices have the same eigenvalues, we conclude that $I \geqslant \eta^{-\frac14}\zeta^{\frac12}\eta^{-\frac14}$. Conjugating both sides by $\eta^{\frac14}$ we conclude that $\eta^{\frac12}\geqslant\zeta^{\frac12}$. The case that η is not strictly positive (but still positive semidefinite) follows from the fact that η ⩾ ζ implies that η + εI ⩾ ζ for any ε > 0. Hence, since η + εI > 0 we conclude from the argument above that $(\eta+\varepsilon I)^{\frac12}\geqslant\zeta^{\frac12}$. Since this inequality holds for all ε > 0 it must also hold for ε = 0. This completes the proof that the function $f(r)=\sqrt{r}$ is operator monotone in the domain [0, 1].
It is possible to show that for any α ∈ [0, 1] the function f(r) = r^α is operator monotone on the domain [0, ∞). In Table B.1 we summarize everything that is known in the literature about the operator monotonicity and convexity of the function f(r) = r^α. In the section 'History and further readings' we give more information about where the proofs can be found.
Other important examples of functions that appear often in applications are the logarithm f(r) = log(r), defined on the interval (0, ∞), and the function f(r) = −r log r. The former is known to be both operator concave and operator monotone, while the latter is known to be operator concave.
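These properties are easy to probe numerically; the sketch below (ours, not part of the text) samples pairs η ⩾ ζ ⩾ 0 and checks that the square root preserves the ordering, while exhibiting an explicit pair for which squaring does not, consistent with Table B.1.

```python
# Sketch (not from the book): sqrt is operator monotone on PSD matrices, but r^2 is not.
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(1)

def random_psd(n):
    X = rng.standard_normal((n, n))
    return X @ X.T

def min_eig(M):
    return np.linalg.eigvalsh(M).min()

for _ in range(1000):
    zeta = random_psd(3)
    eta = zeta + random_psd(3)                              # eta >= zeta >= 0 by construction
    S = np.real(sqrtm(eta)) - np.real(sqrtm(zeta))
    assert min_eig(S) > -1e-8                               # sqrt(eta) >= sqrt(zeta)

# In contrast, eta^2 - zeta^2 can fail to be positive semidefinite:
zeta = np.array([[1.0, 0.0], [0.0, 0.0]])
eta = zeta + np.array([[1.0, 1.0], [1.0, 1.0]])
print(min_eig(eta @ eta - zeta @ zeta))                     # strictly negative
```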
where f : R → R. Such functions appear in many applications, and we will see later on
that certain key quantities in quantum information, such as entropies and relative entropies,
are defined in terms of trace functions. For our purposes, we will always assume that f is
continuous.
Use the divided difference approach discussed in Appendix D.1 to show that the function g(t)
is continuously differentiable and
2. If the function f (t) is convex in R then the function η 7→ Tr[f (η)] is convex in
η ∈ Herm(A).
Proof. Part 1. Suppose first that f is differentiable so that f ′ (t) ⩾ 0 for all t ∈ R. Under
this assumption we have that f ′ (ξ) ⩾ 0 for any ξ ∈ Herm(A) (note that ξ does not have to
be positive semidefinite). Let η, ζ ∈ Herm(A) be such that η ⩾ ζ. We need to show that
Tr[f (η)] ⩾ Tr[f (ζ)]. Set ρ := η − ζ and observe that ρ ∈ Pos(A). For any t ∈ [0, 1] define
the function
g(t) := Tr [f (ζ + tρ)] , (B.12)
so that g(0) = Tr[f (ζ)] and g(1) = Tr[f (η)]. Therefore,
Z 1
g(1) − g(0) = g ′ (t)dt
Z0 1
Exercise B.3.1→ = Tr [ρf ′ (ζ + tρ)] dt (B.13)
0
Z 1
Tr ρ1/2 f ′ (ζ + tρ) ρ1/2 dt
ρ⩾0 −−−−→ =
0
Finally, since f ′ (ζ + tρ) ⩾ 0, also ρ1/2 f ′ (ζ + tρ) ρ1/2 ⩾ 0 so that the integrand on the right-
hand side of the equation above is non-negative. Hence, g(1) ⩾ g(0). This completes the
proof for the case that f is differentiable. The proof of the case that f is only continuous (but
not necessarily differentiable) follows from continuity by taking a sequence of continuously
differentiable functions whose limit is f (such a sequence always exists).
Part 2. Consider the spectral decomposition of η = Σ_{x∈[m]} λ_x Π_x, where Π_x := |x⟩⟨x|,
{|x⟩}_{x∈[m]} form an orthonormal eigenbasis of A, and each λ_x ∈ R. Let {|ψ_y⟩}_{y∈[m]} be another
orthonormal basis of A. Then,
Tr[f(η)] = Σ_{y∈[m]} ⟨ψ_y| Σ_{x∈[m]} f(λ_x)Π_x |ψ_y⟩ = Σ_{y∈[m]} Σ_{x∈[m]} f(λ_x)⟨ψ_y|Π_x|ψ_y⟩
   (B.14)
f is convex→ ⩾ Σ_{y∈[m]} f( Σ_{x∈[m]} λ_x ⟨ψ_y|Π_x|ψ_y⟩ ) .
Now, let t ∈ [0, 1], η, ζ ∈ Herm(A), and {|ψ_y⟩}_{y∈[m]} be an orthonormal basis of A consisting
of the eigenvectors of tη + (1 − t)ζ. For these choices we get
Tr[f(tη + (1 − t)ζ)] = Σ_{y∈[m]} ⟨ψ_y| f(tη + (1 − t)ζ) |ψ_y⟩
|ψ_y⟩ is an eigenvector of tη+(1−t)ζ → = Σ_{y∈[m]} f( ⟨ψ_y| tη + (1 − t)ζ |ψ_y⟩ )
   = Σ_{y∈[m]} f( t⟨ψ_y|η|ψ_y⟩ + (1 − t)⟨ψ_y|ζ|ψ_y⟩ )   (B.16)
f is convex→ ⩽ t Σ_{y∈[m]} f(⟨ψ_y|η|ψ_y⟩) + (1 − t) Σ_{y∈[m]} f(⟨ψ_y|ζ|ψ_y⟩) .
Combining this with (B.14) (applied to η and to ζ with the same basis {|ψ_y⟩}) gives
Tr[f(tη + (1 − t)ζ)] ⩽ t Tr[f(η)] + (1 − t) Tr[f(ζ)], which completes the proof.
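The following is a hedged numerical sketch (added here; not part of the original proof) of the two statements above, using f(t) = exp(t), which is both monotonically increasing and convex on R.

```python
# Numerical check of monotonicity and convexity of eta -> Tr[f(eta)] (assumes NumPy, SciPy).
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
d = 5
def rand_herm(d):
    x = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (x + x.conj().T) / 2

zeta = rand_herm(d)
rho = rand_herm(d); rho = rho @ rho.conj().T        # rho >= 0, so eta := zeta + rho >= zeta
eta = zeta + rho
# Part 1: monotonicity of Tr[f(.)]
assert np.trace(expm(eta)).real >= np.trace(expm(zeta)).real - 1e-10
# Part 2: convexity of Tr[f(.)]
t = 0.3
lhs = np.trace(expm(t * eta + (1 - t) * zeta)).real
rhs = t * np.trace(expm(eta)).real + (1 - t) * np.trace(expm(zeta)).real
assert lhs <= rhs + 1e-10
```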
Exercise B.3.2. Let K ∈ L(A), α ∈ (0, ∞), and define the function f : Pos(A) → R via
where U := V1 U2 and V := V2 U1 are two unitary matrices. For any k ∈ [n] let
Π_k := Σ_{x∈[k]} |x⟩⟨x| ,   (B.20)
with the convention that µn+1 = νn+1 = 0. Denoting by ak := µk − µk+1 and bk := νk − νk+1 ,
and using the triangle inequality we get that
Tr[D_1 U D_2 V] ⩽ Σ_{k,ℓ∈[n]} a_k b_ℓ Tr[Π_k U Π_ℓ V]   and   Tr[D_1 D_2] = Σ_{k,ℓ∈[n]} a_k b_ℓ Tr[Π_k Π_ℓ] .   (B.22)
Therefore, the proof will be concluded by showing that for each k, ℓ ∈ [n]
Tr[Π_k U Π_ℓ V] = Σ_{x∈[k]} ⟨x|U Π_ℓ V|x⟩ ⩽ Σ_{x∈[k]} |⟨x|U Π_ℓ V|x⟩| ⩽ k ,   (B.24)
where the last inequality follows from the fact that |⟨x|U Π_ℓ V|x⟩| ⩽ 1 (see Exercise B.3.3).
Since Tr[Πk Πℓ ] = k the equation above implies (B.23). This completes the proof.
Proof. Since M is Hermitian we can work in its eigenbasis so that without loss of gener-
ality we will assume that M = D1 := Diag(µ1 , . . . , µn ) is a diagonal matrix. We will also
decompose N = U D2 U ∗ , where D2 := Diag(ν1 , . . . , νn ), and U is unitary. Thus,
Tr[MN] = Tr[D_1 U D_2 U^*] = Σ_{k,ℓ∈[n]} a_k b_ℓ Tr[Π_k U Π_ℓ U^*] ,   (B.27)
where ak , bℓ , and Πk , are the same as in the proof of the von-Neumann trace inequality above.
Note that while the eigenvalues {µk } and {νk } can be negative, the differences ak := µk −µk+1
and bk := νk − νk+1 are non-negative for all k ∈ [n]. Combining this with (B.23) and the
equation above we conclude that
Tr[D_1 U D_2 U^*] ⩽ Σ_{k,ℓ∈[n]} a_k b_ℓ Tr[Π_k Π_ℓ] = Tr[D_1 D_2] = Σ_{x∈[n]} μ_x ν_x .   (B.28)
Finally, observe that Tr[D_1 D_2] equals the left-hand side of (B.26). This completes the
proof.
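A small numerical check (an addition for illustration, not from the text) of the trace inequality just proven: for Hermitian M and N, Tr[MN] is at most the sum of products of their eigenvalues when both spectra are sorted in the same (here decreasing) order.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6
def rand_herm(d):
    x = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (x + x.conj().T) / 2

M, N = rand_herm(d), rand_herm(d)
mu = np.sort(np.linalg.eigvalsh(M))[::-1]   # eigenvalues mu_1 >= ... >= mu_n
nu = np.sort(np.linalg.eigvalsh(N))[::-1]   # eigenvalues nu_1 >= ... >= nu_n
assert np.trace(M @ N).real <= np.sum(mu * nu) + 1e-10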
Remark. Note that for |A| = 1 and isometry V = |ψ⟩ ∈ B one obtains the more familiar
Jensen’s inequality
f ⟨ψ|ρ|ψ⟩ ⩽ ⟨ψ|f (ρ)|ψ⟩ . (B.31)
In this case, it is sufficient to require that f is convex.
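Before the proof, here is a minimal numerical sketch (an addition; not part of the original argument) of the condition in (B.30) for the operator convex function f(t) = t²: for an isometry V and a Hermitian ρ one has f(V^*ρV) ⩽ V^*f(ρ)V.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 6, 3
def rand_herm(d):
    x = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (x + x.conj().T) / 2

# Build an n x m isometry V (V* V = I_m) from a random unitary.
q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
V = q[:, :m]
rho = rand_herm(n)
lhs = (V.conj().T @ rho @ V) @ (V.conj().T @ rho @ V)   # f(V* rho V) with f(t) = t^2
rhs = V.conj().T @ (rho @ rho) @ V                      # V* f(rho) V
assert np.min(np.linalg.eigvalsh(rhs - lhs)) > -1e-10   # rhs - lhs >= 0
```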
Proof. Suppose first that f is operator convex, and let V : A → B be an isometry. For
simplicity denote by m = |A| and n = |B|, and let U be a unitary matrix obtained from the
n × m isometry V by adding n − m columns to V . That is, U can be expressed as
U = ( V   N )   (B.32)
where N is an n × (n − m) matrix.
Every matrix M ∈ L(B) can be expressed in block matrix form as M = ( M_{11}  M_{12} ; M_{21}  M_{22} ),
where the block M_{11} is m × m and the remaining blocks are such that M is n × n. With this
in mind, note that for any ρ ∈ Herm(B)
U^* ρ U = ( V^* ; N^* ) ρ ( V   N ) = ( V^* ρ V   V^* ρ N ; N^* ρ V   N^* ρ N ) .   (B.33)
Finally, we define a linear map E : Herm(B) → Herm(B) via
E(σ) = (1/2) σ + (1/2) ZσZ   ∀ σ ∈ Herm(B) ,   (B.34)
where Z = ( I_m   0 ; 0   −I_{n−m} ). A key property of this map is that it acts as a type of a dephasing
map (in fact, it belongs to a type of quantum channels known as the pinching channels).
Particularly, note that
E(U^* ρ U) = ( V^* ρ V   0 ; 0   N^* ρ N )   ∀ ρ ∈ Herm(B) .   (B.35)
This also implies that for any ρ ∈ Herm(B)
f( E(U^* ρ U) ) = ( f(V^* ρ V)   0 ; 0   f(N^* ρ N) ) .   (B.36)
With these notations we get from (B.36)
f(V^* ρ V) = [ f( E(U^* ρ U) ) ]_{11}
f is operator convex→ ⩽ [ (1/2) f(U^* ρ U) + (1/2) f(Z U^* ρ U Z) ]_{11}
U and UZ are unitaries→ = [ (1/2) U^* f(ρ) U + (1/2) Z U^* f(ρ) U Z ]_{11} = [ E( U^* f(ρ) U ) ]_{11}
(B.35)→ = V^* f(ρ) V .
   (B.37)
Therefore, f satisfies the condition given in (B.30).
We next assume that f satisfies (B.30) and use it to show that f is operator convex. Let
A be a Hilbert space, t ∈ [0, 1], and define V : A → A ⊕ A to be the matrix
V = ( t^{1/2} I^A ; (1 − t)^{1/2} I^A ) ,   (B.38)
Remark. The condition that f (0) ⩽ 0 cannot be removed from the theorem above. This
condition is necessary for this version of Jensen’s inequality, since by taking |A| = 1 and
setting M1 = M2 = 0 in (B.41) we get that f (0) ⩽ 0.
Proof. Suppose first that f is operator convex, and let ρ, σ ∈ Herm(A), and M_1, M_2 ∈
L(A) be such that M_1^* M_1 + M_2^* M_2 ⩽ I. Define M_3 := (I − M_1^* M_1 − M_2^* M_2)^{1/2} so that
Σ_{x∈[3]} M_x^* M_x = I. Finally, denote by
V := ( M_1 ; M_2 ; M_3 )   and   ω := ( ρ  0  0 ; 0  σ  0 ; 0  0  0 ) .   (B.42)
Observe that the matrix V : A → A ⊕ A ⊕ A is an isometry since V^* V = Σ_{x∈[3]} M_x^* M_x = I.
We therefore get
f(M_1^* ρ M_1 + M_2^* σ M_2) = f(V^* ω V)
(B.30)→ ⩽ V^* f(ω) V
(B.42)→ = M_1^* f(ρ) M_1 + M_2^* f(σ) M_2 + M_3^* f(0) M_3   (B.43)
f(0) ⩽ 0 → ⩽ M_1^* f(ρ) M_1 + M_2^* f(σ) M_2 .
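A hedged numerical sketch (an addition) of the inequality just derived, for f(t) = t², which is operator convex with f(0) = 0: if M_1^*M_1 + M_2^*M_2 ⩽ I then f(M_1^*ρM_1 + M_2^*σM_2) ⩽ M_1^*f(ρ)M_1 + M_2^*f(σ)M_2.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 4
def rand_herm(d):
    x = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (x + x.conj().T) / 2

M1 = 0.5 * (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
M2 = 0.5 * (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
# rescale so that M1* M1 + M2* M2 <= I
s = np.linalg.norm(M1.conj().T @ M1 + M2.conj().T @ M2, 2)
M1, M2 = M1 / np.sqrt(2 * s), M2 / np.sqrt(2 * s)
rho, sigma = rand_herm(d), rand_herm(d)
X = M1.conj().T @ rho @ M1 + M2.conj().T @ sigma @ M2
lhs = X @ X                                              # f(M1* rho M1 + M2* sigma M2)
rhs = M1.conj().T @ (rho @ rho) @ M1 + M2.conj().T @ (sigma @ sigma) @ M2
assert np.min(np.linalg.eigvalsh(rhs - lhs)) > -1e-10
```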
This matrix is the unitary extension of the isometry V as defined in (B.38). Particularly,
note that U and the isometry V in (B.38) satisfy the relation
U Π = ( t^{1/2} I^A   0 ; (1 − t)^{1/2} I^A   0 ) = ( V   0 )   where   Π := ( I^A   0^A ; 0^A   0^A ) .   (B.46)
Remark. We exchanged the role of σ and ρ from the original definition, as it will be more
convenient in the context of quantum information to work with this definition.
1. f is operator convex.
2. #f is jointly convex.
Proof. We first prove the direction 1 ⇒ 2. Let ρ = tρ1 + (1 − t)ρ2 and σ = tσ1 + (1 − t)σ2
with t ∈ (0, 1), ρ_1, ρ_2 ∈ Pos(A) and σ_1, σ_2 ∈ Pos_{>0}(A). Define the matrices M_1 := (tσ_1)^{1/2} σ^{−1/2}
and M_2 := ((1 − t)σ_2)^{1/2} σ^{−1/2}. Observe that these matrices form a generalized measurement;
i.e. M_1^* M_1 + M_2^* M_2 = I^A. Moreover, in terms of these matrices we can express the term
σ^{−1/2} ρ σ^{−1/2} in (B.50) as
σ^{−1/2} ρ σ^{−1/2} = M_1^* σ_1^{−1/2} ρ_1 σ_1^{−1/2} M_1 + M_2^* σ_2^{−1/2} ρ_2 σ_2^{−1/2} M_2 .   (B.51)
Now, from Jensen’s operator inequality (B.41) it follows that
f( σ^{−1/2} ρ σ^{−1/2} ) ⩽ M_1^* f( σ_1^{−1/2} ρ_1 σ_1^{−1/2} ) M_1 + M_2^* f( σ_2^{−1/2} ρ_2 σ_2^{−1/2} ) M_2 .   (B.52)
Conjugating both sides by σ^{1/2}(·)σ^{1/2}, and recalling that M_1 σ^{1/2} = (tσ_1)^{1/2} and M_2 σ^{1/2} = ((1 − t)σ_2)^{1/2}, gives
ρ#f σ ⩽ tρ1 #f σ1 + (1 − t)ρ2 #f σ2 . (B.53)
That is, #f is jointly convex. For the direction 2 ⇒ 1 observe that for σ = I A we get
ρ#f σ = f (ρ) so that the convexity of f follows from the joint convexity of #f .
The Kubo–Ando operator mean can also be applied to operators on the vector space of
superoperators consisting of all linear transformations from L(A) to itself. In particular, in
the proof of the theorem below, for any ρ ∈ L(A) we will consider the linear operators
Observe that Lρ , Rρ : L(A) → L(A) are linear operators belonging to the Hilbert space
L(A → A).
Exercise B.5.1. Let ρ, σ ∈ L(A), α ∈ [0, ∞), and consider the left and right operators, L_ρ
and R_σ, as defined above. Show that:
1. Commutativity: L_ρ ∘ R_σ = R_σ ∘ L_ρ .
2. If ρ is invertible then
L_ρ^{−1} = L_{ρ^{−1}}   and   R_ρ^{−1} = R_{ρ^{−1}} .   (B.57)
4. If ρ ⩾ 0 then
L_ρ^α = L_{ρ^α}   and   R_ρ^α = R_{ρ^α} .   (B.58)
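The following is a minimal numerical sketch (an addition, not from the text) of the left and right multiplication operators L_ρ(X) = ρX and R_σ(X) = Xσ viewed as linear operators on L(A). Under column-stacking vectorization, vec(ρX) = (I ⊗ ρ)vec(X) and vec(Xσ) = (σ^T ⊗ I)vec(X), which makes the commutativity of item 1 manifest.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 3
rho   = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
sigma = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
X     = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

vec = lambda M: M.reshape(-1, order="F")            # column-stacking vec
L_rho   = np.kron(np.eye(d), rho)                   # matrix of L_rho acting on vec(X)
R_sigma = np.kron(sigma.T, np.eye(d))               # matrix of R_sigma acting on vec(X)

assert np.allclose(L_rho @ vec(X), vec(rho @ X))
assert np.allclose(R_sigma @ vec(X), vec(X @ sigma))
assert np.allclose(L_rho @ R_sigma, R_sigma @ L_rho)   # item 1: commutativity
```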
f(ρ, σ) := Tr[ K^* ρ^α K σ^{1−α} ]   (B.60)
is jointly concave.
Corollary B.6.1. Let η, ρ ∈ Pos(A) and α ∈ (0, 1). Then, the function
ρ ↦ Tr[ (η ρ^α η)^{1/α} ]   (B.62)
is concave.
Young's inequality (2.75)→ ⩽ (1/p) Tr[M^p] + (1/q) Tr[N^q]   (B.63)
   = α Tr[ (η ρ^α η)^{1/α} ] + (1 − α) Tr[σ] .
Therefore, isolating the term Tr[ (η ρ^α η)^{1/α} ] gives
Tr[ (η ρ^α η)^{1/α} ] ⩾ (1/α) Tr[ η ρ^α η σ^{1−α} ] − ((1 − α)/α) Tr[σ] .   (B.64)
Now, recall that Young's inequality achieves equality for N^q = M^p, which is equivalent
to σ = (η ρ^α η)^{1/α}. Combining this with the inequality above we conclude that
Tr[ (η ρ^α η)^{1/α} ] = max_{σ∈Pos(A)} { (1/α) Tr[ η ρ^α η σ^{1−α} ] − ((1 − α)/α) Tr[σ] } .   (B.65)
Now, from Lieb’s theorem, the first term on the right-hand side is jointly concave in ρ and
σ, whereas the second term is linear in σ and in particular concave. Hence, this immediately
implies that the term on the left-hand side is concave in ρ (see Exercise B.6.1 below for more
details on this last assertion).
Exercise B.6.1. Let f : Pos(A) × Pos(A) → R be a jointly concave function. Show that the
function
g(ρ) := max_{σ∈Pos(A)} f(ρ, σ)   (B.66)
is concave.
Theorem B.7.1. For any two positive semidefinite matrices M, N ⩾ 0 (of the same
finite dimension) and any 0 ⩽ s ⩽ 1 the following inequality holds
(1/2) Tr[ M + N − |M − N| ] ⩽ Tr[ M^{1−s} N^s ] .   (B.68)
Exercise B.7.1. Show that if (B.68) holds for all s ∈ [1/2, 1] then it must also hold for all
s ∈ [0, 1/2].
Proof. Since the term |M − N | can be expressed as |M − N | = 2(M − N )+ − (M − N ), the
inequality (B.68) is equivalent to
Tr(M − N)_+ ⩾ Tr[M] − Tr[ M^{1−s} N^s ] .   (B.69)
The identity M − N = (M − N )+ − (M − N )− gives
M ⩽ M + (M − N )− = N + (M − N )+ . (B.70)
Combining the above inequality with the operator monotonicity of the function f(t) = t^s for
s ∈ [0, 1] gives
M^s ⩽ ( N + (M − N)_+ )^s   ∀ s ∈ [0, 1] .   (B.71)
With this inequality at hand, we get
Tr[M] − Tr[ M^{1−s} N^s ] = Tr[ M^{1−s} (M^s − N^s) ]
(B.71)→ ⩽ Tr[ M^{1−s} ( (N + (M − N)_+)^s − N^s ) ]
(B.71) with 1 − s→ ⩽ Tr[ (N + (M − N)_+)^{1−s} ( (N + (M − N)_+)^s − N^s ) ]   (B.72)
   = Tr[N] + Tr(M − N)_+ − Tr[ (N + (M − N)_+)^{1−s} N^s ]
see (B.73) below→ ⩽ Tr(M − N)_+
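A small numerical check (an addition for illustration) of the inequality (B.68) for random positive semidefinite matrices and a few values of s.

```python
import numpy as np

rng = np.random.default_rng(6)
d = 4
def rand_psd(d):
    x = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return x @ x.conj().T

def mpow(M, p):                      # matrix power of a PSD matrix via eigendecomposition
    w, v = np.linalg.eigh(M)
    return (v * np.clip(w, 0, None) ** p) @ v.conj().T

M, N = rand_psd(d), rand_psd(d)
w = np.linalg.eigvalsh(M - N)
lhs = 0.5 * (np.trace(M + N).real - np.sum(np.abs(w)))   # (1/2) Tr[M + N - |M - N|]
for s in [0.1, 0.5, 0.9]:
    rhs = np.trace(mpow(M, 1 - s) @ mpow(N, s)).real
    assert lhs <= rhs + 1e-10
```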
where ρ ∈ Herm(A), σ ∈ Herm(B) and η ∈ C|A|×|B| . Then, the Schur complement of the
block σ of M is defined as the matrix
M/σ := ρ − ησ −1 η ∗ , (B.75)
where σ^{−1} is taken to be the generalized inverse if the inverse of σ does not exist.¹ Similarly,
the Schur complement of the block ρ of M is defined as the matrix
Theorem B.8.1. Let M be the Hermitian block matrix given in (B.74). Then,
M ⩾ 0 if and only if at least one of the following two conditions holds:
1. ρ ⩾ 0 and M/ρ ⩾ 0.
2. σ ⩾ 0 and M/σ ⩾ 0.
Proof. We will show the equivalence of the second condition with M ⩾ 0. The main idea of
the proof is to define the matrix
L := ( I^A   0 ; σ^{−1} η^*   I^B ) ,   (B.77)
¹The generalized inverse of a complex matrix σ is the matrix σ^{−1} that satisfies σσ^{−1}σ = σ.
Since the matrix above is positive semidefinite, its Schur complement is also positive semidef-
inite (i.e. we are using Theorem B.8.1 once again). Hence, in particular, the Schur comple-
ment
t η_0^* ρ_0^{−1} η_0 + (1 − t) η_1^* ρ_1^{−1} η_1 − ( t η_0 + (1 − t) η_1 )^* ( t ρ_0 + (1 − t) ρ_1 )^{−1} ( t η_0 + (1 − t) η_1 ) ⩾ 0 .   (B.83)
Exercise B.8.2.
1. Show that the function f : Pos(A) → Pos(A) given by f (ρ) := ρ−1 for all ρ ∈ Pos(A)
is convex.
2. Show that the function f : L(A) → Pos(A) given by f (η) := η ∗ η for all η ∈ L(A) is
convex.
In this chapter we provide a relatively short review of group theory and representation
theory. We only review concepts from representation theory that are particularly useful for
applications in quantum information theory, and that we are using in this book. Therefore,
this section does not attempt to provide an extensive review of the exceedingly vast subject
of representation theory. Further, much of the material discussed here can be found in
standard textbooks on representation theory. Yet, a reader not familiar with groups and
their representations will find this section self-contained and sufficient for the understanding
of the material discussed in this book. Particularly, most of the material in this section is
used in the study of the resource theory of asymmetry (see Chapter 15).
C.1 Groups
As a very simple example of a group, consider the set of all integers in Z. This set
together with the ‘addition’ operation forms a group. That is, for any a, b ∈ Z we have
a + b ∈ Z and the + operation satisfies all the axioms in the definition above. In particular,
Note that a group homomorphism maps the identity element e1 ∈ G1 to the identity
element e2 ∈ G2 ; i.e. f (e1 ) = e2 . This in turn implies that a homomorphism satisfies
f(g)^{−1} = f(g^{−1}) for all g ∈ G_1 since
is a subgroup of G1 .
Exercise C.1.1. Prove that Im(f ) and Ker(f ) are indeed subgroups of G2 and G1 , respec-
tively.
In this book we will consider two types of groups, finite groups and Lie groups. Finite
groups are groups with a finite number of elements. For example, the set of all bijections from
a given finite set to itself forms a group known as the permutation group (or symmetric group)
denoted by Sn . It is known (Cayley’s theorem) that every finite group G is isomorphic to a
subgroup of the symmetric group acting on the elements of G. Consequently, the symmetric
group plays an important role in various areas of theoretical physics.
Lie groups, on the other hand, are groups that are also smooth differentiable manifolds.
That is, a Lie group can be parametrized with a chart of local coordinates, and the smooth-
ness of the manifold means that for any g, h ∈ G the inversion map g 7→ g −1 and the
multiplication map (g, h) 7→ gh are smooth maps. Here are several examples of Lie groups
that are most popular in physics:
As a manifold, this group is isomorphic to the circle. Note also that the inversion of
a group element corresponds to θ 7→ 2π − θ which is clearly a smooth (differentiable)
map. Similarly, the composition of two group elements corresponds to the mapping
(θ1 , θ2 ) 7→ θ1 + θ2 mod 2π which is a differentiable map.
The case n = 3 corresponds to the group SO(3) which corresponds to rotations in R3 .
Each rotation in R3 can be described as a rotation by an angle θ ∈ [0, 2π) along some
axis of rotation. Let n ∈ R3 be the unit vector pointing in the direction of the axis of
rotation, and denote by w := cos(θ/2), and (x, y, z)^T := sin(θ/2) n. Then, SO(3) is the
collection of all matrices R_θ^{(n)} that can be expressed as:
R_θ^{(n)} = ( 1 − 2y² − 2z²   2xy − 2zw   2xz + 2yw ;
             2xy + 2zw   1 − 2x² − 2z²   2yz − 2xw ;
             2xz − 2yw   2yz + 2xw   1 − 2x² − 2y² )   (C.7)
It can be shown that if v ∈ R³ then R_θ^{(n)} v is the vector obtained from v after rotating
it by an angle θ about the axis in the direction n. The group SO(3) can also be
parametrized with the three Euler angles, as opposed to the axis parametrization
above.
2. The unitary group of degree n, denoted U (n), is the group of all n×n unitary matrices.
Note that the determinant can be viewed as a group homomorphism det : U(n) → U(1),
since any unitary matrix has determinant equal to e^{iθ} for some θ ∈ [0, 2π). Observe
that the kernel of this homomorphism consists of all n × n unitary matrices with determinant one.
This subgroup of U(n) is denoted by SU(n) and is called the special unitary group. In
quantum mechanics, the case n = 2 corresponds to rotations of spin-1/2 particles and
therefore plays an important role in physics. This group can be expressed as
SU(2) := { ( a   b ; −b̄   ā ) : |a|² + |b|² = 1 , a, b ∈ C }   (C.8)
s0 = cos α
s1 = sin α cos β
s2 = sin α sin β cos γ
s3 = sin α sin β sin γ . (C.9)
U = r_0 I + i (r_1 σ_1 + r_2 σ_2 + r_3 σ_3)   (C.10)
where σ_1, σ_2, and σ_3 are the three Pauli matrices defined in Exercise 2.3.19. Given
that r_0² + r_1² + r_2² + r_3² = 1, it is convenient to denote by cos(θ) := r_0 and by
n := (1/√(1 − r_0²)) (r_1, r_2, r_3)^T so that
3. The general linear group, denoted GL(n, F) (in short GL(n) or GL(A), where A is
a Hilbert space of dimension |A| = n), is defined as the set of all n × n invertible
matrices. This set is a group under matrix multiplication. An important subgroup
of GL(n) that appears for example in multipartite entanglement theory, is the special
linear group SL(n). It consists of all n × n matrices with determinant one.
In the examples above, the groups SO(n), U(n), SU(n) are compact, whereas GL(n) or
the real line R, for example, are not compact. Compact Lie groups are the simplest examples
of continuous groups, and as such, play an important role in numerous applications in
physics.
From the exercise above it follows that all the elements of SU(2) can be expressed as
e^{iθ(n·σ)}. It will be convenient (see the next exercise) to parametrize the elements of SU(2)
with the matrices
T_θ^{(n)} := e^{−i(θ/2)(n·σ)}   (C.15)
where θ ∈ [0, 4π) and n ∈ R³ is a unit vector. Note that we divided θ by −2 to obtain the
following relation between SU(2) and SO(3).
1. Show that f is a group homomorphism. That is, given two unit vectors n1 and n2 , and
two rotation angles θ1 and θ2 ,
f[ e^{−i(θ_1/2)(n_1·σ)} e^{−i(θ_2/2)(n_2·σ)} ] = f[ e^{−i(θ_1/2)(n_1·σ)} ] f[ e^{−i(θ_2/2)(n_2·σ)} ] .   (C.17)
2. Show that f is 2 : 1 (two-to-one) and onto. That is, every element in SO(3) corre-
sponds exactly to two elements in SU(2). Hint: Denote w := cos(θ/2) and (x, y, z)T =
sin(θ/2) n, and use the fact that R_θ^{(n)} can be expressed as in (C.7).
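A hedged numerical sketch (an addition, not part of the exercise) of the SU(2) → SO(3) homomorphism: with w := cos(θ/2) and (x, y, z) := sin(θ/2) n, the matrix (C.7) built from (w, x, y, z) acts on v ∈ R³ exactly as conjugation by e^{−i(θ/2)(n·σ)} acts on v·σ.

```python
import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = np.array([sx, sy, sz])

theta = 1.234
n = np.array([1.0, 2.0, -0.5]); n /= np.linalg.norm(n)
w, (x, y, z) = np.cos(theta / 2), np.sin(theta / 2) * n

R = np.array([[1 - 2*y**2 - 2*z**2, 2*x*y - 2*z*w,       2*x*z + 2*y*w],
              [2*x*y + 2*z*w,       1 - 2*x**2 - 2*z**2, 2*y*z - 2*x*w],
              [2*x*z - 2*y*w,       2*y*z + 2*x*w,       1 - 2*x**2 - 2*y**2]])   # (C.7)

U = expm(-1j * (theta / 2) * (n[0]*sx + n[1]*sy + n[2]*sz))
v = np.array([0.3, -1.0, 2.0])
lhs = U @ np.einsum('i,ijk->jk', v, paulis) @ U.conj().T        # U (v.sigma) U*
rhs = np.einsum('i,ijk->jk', R @ v, paulis)                     # (R v).sigma
assert np.allclose(lhs, rhs)
```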
Remark. The image of π in the definition above is a subset of L(A) consisting of |A| × |A|
invertible matrices. Therefore, one can replace L(A) in the definition above with the general
linear group GL(A). Note that since π is a homomorphism it follows that π(e) = I A . More-
over, note that for any group G and any Hilbert space A, there exists a group representation
π(g) := I A for all g ∈ G. This representation is called the trivial representation.
θ ↦ ( cos θ   −sin θ   0   0 ;
      sin θ    cos θ   0   0 ;
      0   0   cos θ   sin θ ;
      0   0   −sin θ   cos θ )   (C.18)
Clearly, the above representation has two proper subrepresentations of R4 . Each of these
two subrepresentations cannot be reduced further, so they are irreps. However, these two
irreps are equivalent.
Equivalent Representations
Definition C.2.2. Two representations or subrepresentations, π1 : G → L(A) and
π2 : G → L(B), are said to be equivalent if there exists an isomorphism η : A → B
(in particular, |A| = |B| and η is invertible) such that
If there is no such an intertwiner map η we say that the two representations are
inequivalent.
Note that in particular, since η is invertible, for each g ∈ G the matrix π1 (g) in (C.19)
is similar to the matrix π2 (g). In Fig. C.1 we drew a commutativity diagram describing the
equivalence of two representations that are related via (C.19). Note that the action of the
representation π1 on the Hilbert space A is mirrored by η to the Hilbert space B in which
it takes the form of π2 . Note also that the directions of all the arrows in the figure are
reversible.
Figure C.1: A commutativity diagram for two equivalent representations. Each arrow is reversible.
Schur’s Lemma
Theorem C.2.1. Let G be a group, and A1 and A2 be two Hilbert spaces. Also let
π1 : G → L(A1 ) and π2 : G → L(A2 ) be two irreducible representations of G, and
suppose there exists a complex matrix (linear transformation) T : A1 → A2 that is
equivalent under the action of G; that is, T π1 (g) = π2 (g)T for all g ∈ G. Then,
Proof. Part 1. The idea of the proof is to look at the kernel and image of T . Let |ψ⟩ ∈
Ker(T ). Then, from the commutativity property of T , for all g ∈ G
T π1 (g)|ψ⟩ = π2 (g)T |ψ⟩ = π2 (g)0 = 0 . (C.20)
That is, if |ψ⟩ ∈ Ker(T ) then also π1 (g)|ψ⟩ ∈ Ker(T ) for all g ∈ G. In other words, Ker(T )
is a G-invariant subspace of A1 . Now, recall that π1 is an irrep, and therefore since Ker(T )
is a G-invariant subspace of A1 we must have Ker(T ) = {0} or Ker(T ) = A1 .
Next, let |ϕ⟩ ∈ Im(T ). Then, there exists |ψ⟩ ∈ A1 such that T |ψ⟩ = |ϕ⟩. Therefore,
using the commutativity property of T we get that for all g ∈ G
That is, if |ϕ⟩ ∈ Im(T ) then also π2 (g)|ϕ⟩ ∈ Im(T ) for all g ∈ G. Hence, Im(T ) is a
G-invariant subspace of A2 , and since π2 is an irrep we must have either Im(T ) = {0} or
Im(T ) = A2 .
Combining everything, we conclude that there are two options: (1) Ker(T ) = {0} and
Im(T ) = A2 , or (2) Ker(T ) = A1 and Im(T ) = {0}. From Exercise 2.3.4 (1) can only
hold if A1 = A2 and T is invertible. This option is not possible since we assume in Part
1 that π1 and π2 are inequivalent. Option (2) on the other hand implies that T = 0 (see
Exercise 2.3.4). This completes the first part of the proof.
Proof of Part 2. The proof is based on the fundamental theorem of algebra that states that
every non-constant single-variable polynomial with complex coefficients has at least one com-
there exists λ ∈ C that is a root for the characteristic polynomial ofAT;
plex root. Therefore,
A
i.e. det T − λI = 0. This means that there exists a non-zero vector |ψ⟩ ∈ Ker T − λI .
We then get for all g ∈ G
Exercise C.2.1. Show that all the irreps (over a complex field) of an abelian group G are
1-dimensional.
The following theorem demonstrates that all representations of a finite group can be
decomposed into a direct sum of irreps.
Proof. If there are no proper (i.e. non-trivial) subrepresentations of A then π is itself an irrep
and the proof is done. Therefore, suppose A1 is a proper G-invariant subspace corresponding
to the proper subrepresentation π1 : G → L(A1 ) (i.e. π1 is subrepresentation of π). Let
P : A → A be the projection to the subspace A1 , and define the operator T : A → A as
T := (1/|G|) Σ_{g∈G} π(g) P π(g^{−1}) .   (C.23)
T π(h) = (1/|G|) Σ_{g∈G} π(g) P π(g^{−1}) π(h)
   = (1/|G|) Σ_{g∈G} π(g) P π(g^{−1} h)
a := h^{−1}g → = (1/|G|) Σ_{a∈G} π(ha) P π(a^{−1})   (C.24)
   = π(h) (1/|G|) Σ_{a∈G} π(a) P π(a^{−1})
   = π(h) T .
Moreover, we argue now that T is a projection. First, observe that if |ψ⟩ ∈ A1 also π(g)|ψ⟩ ∈
A1 since A1 is a G-invariant subspace. This in particular implies that P π(g)|ψ⟩ = π(g)|ψ⟩
so we conclude that for all |ψ⟩ ∈ A1
T|ψ⟩ = (1/|G|) Σ_{g∈G} π(g^{−1}) P π(g)|ψ⟩
   (C.25)
P π(g)|ψ⟩ = π(g)|ψ⟩ → = (1/|G|) Σ_{g∈G} π(g^{−1}) π(g)|ψ⟩ = (1/|G|) Σ_{g∈G} |ψ⟩ = |ψ⟩ .
Second, observe that for any |ψ⟩ ∈ A (not necessarily in A1 ) we have T |ψ⟩ ∈ A1 . Combining
this with the above equation gives T 2 |ψ⟩ = T |ψ⟩ for all |ψ⟩ ∈ A. Hence, T 2 = T ; i.e.
T : A → A is a projection and an intertwiner. Therefore, both Im(T ) = A1 and A0 := Ker(T )
are G-invariant subspaces and we have A = A1 ⊕ A0 (as representations). Repeating the
process we can continue in this way to decompose A0 and A1 into direct sum of G-invariant
subspaces until we decompose A into a direct sum of irreducible G-invariant subspaces.
where ω(g, h) ∈ C with |ω(g, h)| = 1. The phase factor ω(g, h) is also called a cocycle.
Note that both S and T are unitary matrices. We define a projective unitary representation
W : G → L(Cn ) via
(p, q) 7→ Wp,q := S p T q ∀ (p, q) ∈ G . (C.35)
Since p and q are integers we have that Wp,q is a unitary matrix. Observe that
ST|x⟩ = e^{i2πx/n} S|x⟩ = e^{i2πx/n} |x + 1 (mod n)⟩   (C.36)
whereas
TS|x⟩ = T|x + 1 (mod n)⟩ = e^{i2π(x+1)/n} |x + 1 (mod n)⟩ .   (C.37)
Therefore, we conclude that
TS = e^{i2π/n} ST .   (C.38)
In the exercise below you will show that {W_{p,q}} is a projective unitary representation of G. The
operators W_{p,q} are known as the Heisenberg–Weyl operators.
Exercise C.3.2. Use the relation (C.38) to show that the mapping (p, q) 7→ Wp,q forms a
projective unitary representation of Zn × Zn . Find its cocycle.
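A minimal numerical sketch (an addition, not from the text) of the Heisenberg–Weyl operators for n = 5: S shifts the computational basis, T applies the phases e^{2πix/n}, the commutation relation (C.36)–(C.38) holds, and W_{p,q} := S^p T^q composes up to the cocycle phase.

```python
import numpy as np

n = 5
omega = np.exp(2j * np.pi / n)
S = np.roll(np.eye(n), 1, axis=0)                   # S|x> = |x+1 mod n>
T = np.diag(omega ** np.arange(n))                  # T|x> = omega^x |x>

assert np.allclose(T @ S, omega * (S @ T))          # TS = e^{2 pi i / n} ST

W = lambda p, q: np.linalg.matrix_power(S, p) @ np.linalg.matrix_power(T, q)
p1, q1, p2, q2 = 2, 3, 4, 1
# W_{p1,q1} W_{p2,q2} equals W_{p1+p2, q1+q2} up to the cocycle phase omega^{q1*p2}
lhs = W(p1, q1) @ W(p2, q2)
rhs = (omega ** (q1 * p2)) * W((p1 + p2) % n, (q1 + q2) % n)
assert np.allclose(lhs, rhs)
```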
where for each x ∈ [m_λ] the map U^{(λ,x)} : g ↦ U_g^{(λ,x)} is an irrep belonging to the λ-equivalence
class.
For example, consider the unitary representation θ ↦ U_θ of SO(2) in R⁴, where U_θ is the
4 × 4 matrix given in (C.18). Clearly, we can express U_θ = U_θ^{(1)} ⊕ U_θ^{(2)}, where
U_θ^{(1)} := ( cos θ   −sin θ ; sin θ   cos θ )   and   U_θ^{(2)} := ( cos θ   sin θ ; −sin θ   cos θ )   (C.42)
This is the direct sum decomposition into irreps of θ ↦ U_θ. Note that in this case we
have only one equivalence class, without loss of generality we can name it λ = 1, and this
equivalence class contains two irreps given by U_θ^{(1)} and U_θ^{(2)}, so that the multiplicity of this
irrep is m_1 = 2 (i.e. m_{λ=1} = 2).
As another example, consider the group U(1) and its representation θ ↦ U_θ, where
U_θ = Σ_{k∈[n]} e^{iθk} |k⟩⟨k| .   (C.43)
Clearly, this representation is already written as a direct sum of irreps. Observe that for each k,
the map θ ↦ e^{iθk} |k⟩⟨k| defines a 1-dimensional irrep of U(1) (recall that for abelian groups all
irreps are 1-dimensional; see Exercise C.2.1). In this case the equivalence classes of irreps are
labeled by λ = k and the multiplicity m_k = 1 for all k ∈ [n].
The following theorem slightly simplifies the decomposition (C.41).
U_g ≅ ⊕_{λ∈Irr(U)} U_g^{(λ)} ⊗ I^{C_λ} ,   (C.45)
where U_g^{(λ)} acts irreducibly on B_λ, and I^{C_λ} is the identity matrix on C_λ.
Remark. The subspace Bλ is called the representation space, and the subspace Cλ is called
the multiplicity space. They are mathematical objects and we will think about them later
on as virtual subsystems. Moreover, the above decomposition of A means that there exists
an orthonormal basis {|λ, m, x⟩Aλ }λ,m,x whose elements are
where {|x⟩^{C_λ}}_{x∈[m_λ]} is an orthonormal basis of the multiplicity space C_λ, and {|λ, m⟩^{B_λ}}_{m=1}^{d_λ}
Proof. We first argue that without loss of generality the intertwiner map between two irreps
U_g^{(λ,x)} and U_g^{(λ,x′)} in the decomposition (C.41) can be taken to be unitary. Indeed, by definition
of equivalent representations, if T is the intertwiner between U_g^{(λ,x)} and U_g^{(λ,x′)} then U_g^{(λ,x)} T =
T U_g^{(λ,x′)}. Since both U_g^{(λ,x)} and U_g^{(λ,x′)} are unitary matrices we must have
T^* T = T^* U_g^{*(λ,x)} U_g^{(λ,x)} T
   = U_g^{*(λ,x′)} T^* T U_g^{(λ,x′)}   ∀ g ∈ G .   (C.47)
Therefore, since U_g^{(λ,x′)} is an irrep of G, from the second part of Schur's Lemma (see The-
orem C.2.1) it follows that T^* T = λI for some λ ∈ C. Since T^* T > 0 (recall that T is
invertible) we can redefine T ↦ (1/√λ) T so that the new T is unitary.
Now, denote by U_g^{(λ)} := U_g^{(λ,1)} and by T_x^{(λ)} the unitary intertwiner satisfying
Taking the direct sum over x ∈ [mλ ] on both sides of the equation above gives
⊕_{x∈[m_λ]} U_g^{(λ,x)} = T_λ ( U_g^{(λ)} ⊗ I_{m_λ} ) T_λ^*   (C.49)
where I_{m_λ} is the m_λ × m_λ identity matrix, T_λ := ⊕_{x∈[m_λ]} T_x^{(λ)} is a unitary matrix, and U_g^{(λ)} ⊗ I_{m_λ}
is viewed as
U_g^{(λ)} ⊗ I_{m_λ} = U_g^{(λ)} ⊕ ··· ⊕ U_g^{(λ)}   (m_λ times) .   (C.50)
Finally, taking the direct sum over all λ ∈ Irr(U ) on both sides of (C.49), we get that the
unitary matrix T := ⊕λ T λ satisfies
⊕_{λ∈Irr(U)} ⊕_{x∈[m_λ]} U_g^{(λ,x)} = T ( ⊕_{λ∈Irr(U)} U_g^{(λ)} ⊗ I_{m_λ} ) T^* .   (C.51)
The proof is concluded with the identification of the subspaces Bλ and Cλ as the subspaces
on which Ugλ and Imλ act upon (with Imλ = I Cλ ).
Invariant states, often called symmetric states, play an important role in physics, par-
ticularly in the resource theory of asymmetry. The following theorem provides a simple
characterization of such states with respect to the decomposition (C.44) of the underlying
Hilbert space.
where u^{B_λ} = (1/|B_λ|) I^{B_λ} is the maximally mixed (uniform) state on system B_λ, and
ρ_λ^{C_λ} := Tr_{B_λ}[ Π^{A_λ} ρ^A Π^{A_λ} ] ,   (C.53)
Proof. We are working in a basis in which Ug has the form (C.45). Therefore, if ρA has
the form (C.52) then it clearly commutes with Ug for all g ∈ G so that ρA is G-invariant.
Conversely, suppose ρA is G-invariant, and denote
ρ^A = Σ_{λ,λ′∈Irr(U)} ρ_{λλ′}   where   ρ_{λλ′} := Π^{A_{λ′}} ρ^A Π^{A_λ} .   (C.54)
Note that ρλλ′ is a linear map from Aλ → Aλ′ . Since ρ commutes with Ug for all g ∈ G it
follows immediately from the form (C.45) of Ug that
0 = [ρ^A, U_g] = Σ_{λ,λ′} [ ρ_{λλ′} , U_g ]
   = Σ_{λ,λ′} ( ρ_{λλ′} ( U_g^{(λ)} ⊗ I^{C_λ} ) − ( U_g^{(λ′)} ⊗ I^{C_{λ′}} ) ρ_{λλ′} ) .   (C.55)
Multiplying both sides of the equation above by ΠAλ′ from the right, and ΠAλ from the left,
we get for all λ and λ′
ρ_{λλ′} ( U_g^{(λ)} ⊗ I^{C_λ} ) = ( U_g^{(λ′)} ⊗ I^{C_{λ′}} ) ρ_{λλ′} .   (C.56)
Now, by multiplying from the left both sides with I Bλ ⊗ T Cλ′ →Cλ , for some mλ × mλ′ matrix
T ∈ L(Cλ′ , Cλ ) and taking the partial trace over Cλ gives
ω_{λλ′} U_g^{(λ)} = U_g^{(λ′)} ω_{λλ′}   where   ω_{λλ′} := Tr_{C_λ}[ ( I^{B_λ} ⊗ T ) ρ_{λλ′} ] .   (C.57)
Finally, from the first part of Schur's lemma it follows that unless λ = λ′ we get ω_{λλ′} =
0. Since this holds for all T ∈ L(C_{λ′}, C_λ) we conclude from Exercise 2.3.31 that also
ρ_{λλ′} = 0 for λ ≠ λ′. Moreover, from the second part of Schur's lemma we get that
ω_{λλ} = Tr_{C_λ}[ ( I^{B_λ} ⊗ T ) ρ_{λλ} ] is proportional to the identity matrix for all T ∈ L(C_λ). Hence,
from Exercise 2.3.30 we conclude that ρ_{λλ} = u^{B_λ} ⊗ ρ_λ^{C_λ}. This completes the proof.
The theorem above applies to any operator ρ ∈ L(A). In this book we will only consider
G-invariant quantum states; i.e. G-invariant operators in D(A). For the case that ρ is a
pure quantum state we have the following corollary.
Proof. Taking ρ = ψ in (C.52), it follows that the right-hand side of (C.52) is a rank-one
matrix if and only if the direct sum consists of a single non-zero term, denoted by λ,
for which |B_λ| = 1. This completes the proof.
The definition above is consistent with what one would expect from a function that
quantify the volume or size of a region on a manifold. However, since we are interested here
in measures on Lie groups, we would like the measure also to be invariant under the action
of the group.
By definition of Lie groups, for a fixed group element h ∈ G, the map g ↦ hg is an
isomorphism between smooth manifolds (also known as a diffeomorphism). Such a
map transforms any region S ⊆ G to the region hS := {hg : g ∈ S}. We then say that μ
is left-invariant if μ(hS) = μ(S) for all S ⊆ G and all h ∈ G. Similarly, we say that μ is
right-invariant if μ(Sh) = μ(S) for all S ⊆ G and all h ∈ G.
All groups have a left-invariant and a right-invariant measure. This result is known as
Haar's Theorem (the proof of Haar's theorem goes beyond the scope of this book). For
compact groups these Haar measures are finite and unique up to a multiplicative constant.
If the two invariant measures of a Lie group equal each other up to a multiplicative constant
then the group is said to be unimodular. All compact Lie groups are unimodular, and also
many non-compact groups that appear in applications in physics are unimodular. In this
book we will only consider unimodular Lie groups. Moreover, when the group is compact,
so that µ(G) < ∞, we will always implicitly assume that the Haar measure is normalized;
i.e. µ(G) = 1.
Examples:
1. Consider the group U(1) := {e^{iθ} : θ ∈ [0, 2π)}. This group is clearly isomorphic to
the group with elements in [0, 2π) under the group operation of addition modulo 2π. For
any set S ⊆ [0, 2π) the Haar measure of U(1) is given by
μ(S) = (1/2π) ∫_S dθ ,   (C.59)
or equivalently, dμ(g) = dθ/2π. Since U(1) ≅ SO(2) this is also the Haar measure of
SO(2).
2. The Haar measure of SU(2). Recall from (C.9) that the group elements of SU(2) can
be characterized in terms of the hyperspherical coordinates (α, β, γ). It turns out that
the Haar measure of a region R ⊆ SU(2) is given by
μ(R) = ∫_R sin(2α) dα dβ dγ .   (C.60)
The Haar measure can be used to define various averages over a group. For example,
consider a function f : G → C. One can define the average of the function f over the
compact group G as
∫_G dg f(g) ,   (C.61)
where we use the short notation dg for the Haar measure dµ(g). Given a projective unitary
representation g 7→ Ug one can also define averages over elements of L(A) as
G(ρ) := ∫_G dg U_g ρ U_g^*   ∀ ρ ∈ L(A) .   (C.62)
The map G : L(A) → L(A) is linear and is known as the G-twirling map.
Remark. If the group G is finite we can always replace the averages above with summations.
In particular, for a finite group the integral ∫_G dg can simply be replaced with the sum (1/|G|) Σ_{g∈G},
and under this replacement, all the theorems and statements below that apply for compact
Lie groups also apply for finite groups.
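A hedged numerical sketch (an addition) of the G-twirling map (C.62) for the finite group Z_n represented by U_g = Σ_k e^{2πigk/n}|k⟩⟨k|: averaging U_g ρ U_g^* over the group dephases ρ in the {|k⟩} basis.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5
rho = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
rho = rho @ rho.conj().T
rho /= np.trace(rho).real                     # a random density matrix

def U(g):
    return np.diag(np.exp(2j * np.pi * g * np.arange(n) / n))

twirled = sum(U(g) @ rho @ U(g).conj().T for g in range(n)) / n
assert np.allclose(twirled, np.diag(np.diag(rho)))    # only the diagonal of rho survives
```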
1. Use the invariance property of the Haar measure to show that for any ρ ∈ L(A)
In the next theorem we show that the average of {Ug } over the group (w.r.t. the Haar
measure) is an orthogonal projection.
where in the last equality we used the fact that the Haar measure dh is invariant under the
group action. Hence Ug Π = Π for all g ∈ G and we get that
Π^* Π = ∫_G dg U_{g^{−1}} Π = ∫_G dg Π = Π .   (C.70)
Therefore, from the first part of Exercise 2.3.7 it follows that Π is an orthogonal projection.
Moreover, since Ug Π = Π we get that Ug Π|ψ⟩ = Π|ψ⟩ for all g ∈ G. Hence, Π|ψ⟩ ∈ AG for
all |ψ⟩ ∈ A. Finally, to show that Π is not a projection to a proper subspace of AG , observe
that for every |ψ⟩ ∈ AG we have
Π|ψ⟩ = ∫_G dg U_g|ψ⟩ = ∫_G dg |ψ⟩ = |ψ⟩ .   (C.71)
The following theorem states additional orthogonality conditions satisfied by the matrix ele-
ments
u^λ_{mm′}(g) := ⟨λ, m, x|U_g|λ, m′, x⟩ = ⟨λ, m|U_g^{(λ)}|λ, m′⟩ .   (C.73)
In the equation above, the set {|λ, m, x⟩}_{m,x} forms a basis of A_λ, whereas {|λ, m⟩}_m forms
a basis of B_λ. In particular, on the left-hand side of the equation above there is no index x,
since from (C.45) the components u^λ_{mm′}(g) do not depend on x.
Proof. Take
ρ = |λ, m′ , x⟩⟨λ′ , k ′ , x| = |λ, m′ ⟩⟨λ′ , k ′ | ⊗ |x⟩⟨x| (C.75)
Now, denote by σ := G(ρ) and for any irrep µ denote by B_µ the representation space, and
by C_µ the multiplicity space. Then,
σ_µ^{C_µ} := Tr_{B_µ}[ Π^{A_µ} σ ] = ∫_G dg Tr_{B_µ}[ Π^{A_µ} U_g ( |λ, m′, x⟩⟨λ′, k′, x| ) U_g^* ]
(C.45)→ = ∫_G dg Tr_{B_µ}[ Π^{A_µ} ( U_g^{(λ)} |λ, m′⟩⟨λ′, k′| U_g^{*(λ′)} ) ⊗ |x⟩⟨x|^{C_µ} ]
   = δ_{µλ} δ_{µλ′} ∫_G dg Tr[ U_g^{(λ)} |µ, m′⟩⟨µ, k′| U_g^{*(λ′)} ] |x⟩⟨x|^{C_µ}   (C.77)
   = δ_{µλ} δ_{µλ′} δ_{m′k′} |x⟩⟨x|^{C_µ} .
Since σ = G(ρ) is G-invariant (see Theorem C.4.1) we get from (C.52) (when applied to σ)
G(ρ) = ⊕_{µ∈Irr(U)} u^{B_µ} ⊗ σ_µ^{C_µ}
(C.77)→ = δ_{λλ′} δ_{m′k′} u^{B_λ} ⊗ |x⟩⟨x|^{C_λ}   (C.78)
so that
⟨λ, m, x|G(ρ)|λ′, k, x⟩ = δ_{λλ′} δ_{mk} δ_{m′k′} / |B_λ| .   (C.79)
Comparing this with (C.76) concludes the proof.
Note that the orthogonality relations in the theorem above can be used to obtain other
types of relations. For example, the relations (C.74) implies that (see Exercise C.5.1)
∫_G dg ū^λ_{mm′}(g) U_g^{(λ′)} = (δ_{λλ′}/|B_λ|) |λ, m⟩⟨λ, m′|^{B_λ}   (C.80)
Moreover, this relation can be extended to U_g = ⊕_{λ∈Irr(U)} ( U_g^{(λ)} ⊗ I^{C_λ} ) (see (C.45)) via
∫_G dg ū^λ_{mm′}(g) U_g = (δ_{λ,Irr(U)}/|B_λ|) |λ, m⟩⟨λ, m′|^{B_λ} ⊗ I^{C_λ} .   (C.81)
where δ_{λ,Irr(U)} := 1 if λ ∈ Irr(U) and 0 otherwise. Taking m′ = m and summing over m results in the
relation
∫_G dg χ̄_λ(g) U_g = (δ_{λ,Irr(U)}/|B_λ|) I^{B_λ} ⊗ I^{C_λ} ,   (C.82)
Exercise C.5.1. Prove the relation (C.80) and the equality ūλmm′ (g) = uλmm′ (g −1 ). Hint:
(λ) P λ ′
For the former, express Ug = k,k′ ukk′ (g)|λ, k⟩⟨λ, k | and use the orthogonality rela-
tions (C.74).
Use the above orthogonality relation to show that the character χ(g) := Tr[U_g] satisfies
∫_G dg χ̄_λ(g) χ(g) = m_λ if λ ∈ Irr(U), and 0 otherwise .   (C.84)
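A small numerical check (an addition) of the finite-group version of this relation for G = Z_3: the representation below contains the irrep k = 0 twice and the irrep k = 1 once, and the character inner product (1/|G|) Σ_g χ̄_k(g)χ(g) recovers exactly these multiplicities.

```python
import numpy as np

n = 3
omega = np.exp(2j * np.pi / n)
U = lambda g: np.diag([1.0, 1.0, omega ** g])      # irrep k=0 with multiplicity 2, k=1 once
chi = np.array([np.trace(U(g)) for g in range(n)])
for k, expected in [(0, 2), (1, 1), (2, 0)]:
    chi_k = omega ** (k * np.arange(n))            # character of the 1-dim irrep k
    m_k = np.sum(np.conj(chi_k) * chi) / n
    assert np.isclose(m_k, expected)
```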
Definition C.6.1. Let G be a finite group, and let {ω(g, h)}g,h∈G be a cocycle of G
satisfying (C.28) and (C.30). The regular representation g 7→ Ugreg is a unitary
projective representation of G on the Hilbert space C|G| = span{|g⟩ : g ∈ G}
defined by the relation
Note that for any fixed g ∈ G, U_g^{reg} maps the basis {|h⟩}_{h∈G} to itself (up to phases)
and therefore U_g^{reg} must be a unitary matrix. Furthermore, for any g_1, g_2, h ∈ G we have by
definition
U_{g_1}^{reg} U_{g_2}^{reg} |h⟩ = ω(g_2, h) U_{g_1}^{reg} |g_2 h⟩
   = ω(g_2, h) ω(g_1, g_2 h) |g_1 g_2 h⟩
(C.30)→ = ω(g_1, g_2) ω(g_1 g_2, h) |g_1 g_2 h⟩   (C.86)
   = ω(g_1, g_2) U_{g_1 g_2}^{reg} |h⟩ ,
and since the equation above holds for all h we get that
U_{g_1}^{reg} U_{g_2}^{reg} = ω(g_1, g_2) U_{g_1 g_2}^{reg} .   (C.87)
That is, g ↦ U_g^{reg} is indeed a unitary projective representation of G with cocycle {ω(g, h)}_{g,h∈G}.
Moreover, note that U_g^{reg} can be expressed as
U_g^{reg} = Σ_{h∈G} ω(g, h) |gh⟩⟨h| ,   (C.88)
The regular representation U^{reg} depends only on the group G and the cocycle ω. There-
fore, we will denote the set of equivalence classes of irreps of U reg by Irr(G, ω). From (C.84)
it follows that the dimension of the multiplicity space of any irrep λ ∈ Irr(G, ω) is given by
m_λ = (1/|G|) Σ_{g∈G} χ̄_λ(g) χ^{reg}(g)
(C.89)→ = χ̄_λ(e)   (C.90)
   = Tr[I^{B_λ}] = |B_λ| .
That is, the multiplicity space has the same dimension as the representation space. This
equality has the following remarkable application. Recall that according to (C.44), the
Hilbert space C|G| can be decomposed with respect to the irreps of U reg such that
C^{|G|} = ⊕_{λ∈Irr(G,ω)} B_λ ⊗ C_λ .   (C.91)
The above relation implies that the vectors {v_g}_{g∈G} defined by v_g := { (1/√d_λ) u^λ_{kk′}(g) }_{k,k′∈[d_λ]}^{λ∈Irr(G,ω)}
belong to C^{|G|} (since they have exactly |G| components). Moreover, using this in conjunction
with the orthogonality relations (C.74) we conclude that {v_g}_{g∈G} is an orthonormal basis of
C^{|G|}.
As the notation for the inner product above suggests, we will use the Dirac notation to
denotes the elements of L2 (G). This will make the analogy with the case of finite groups much
more apparent. Hence, the vector |f ⟩ for example corresponds to the function f (g) ∈ L2 (G).
We also denote by δ(g) the Dirac-delta on the group, defined by the relation
⟨δ|f⟩ = ∫_G dg δ(g) f(g) = f(e)   ∀ f ∈ L²(G) .   (C.95)
We will therefore denote |e⟩ := |δ⟩, so that f (e) = ⟨e|f ⟩. We point out that while δ(g) ̸∈
L2 (G) there is a way to make the concepts we discuss below mathematically rigorous via
the introduction of a rigged Hilbert space. However, this topic goes beyond the scope of this
book, and since we only use the Dirac delta function in this subsection we will not elaborate
on it here. For more information on this subject, we refer the reader to the section “Notes
and References” at the end of this chapter.
Continuing, for any h ∈ G we denote by |h⟩ the function δ(h−1 g) so that
⟨h|f⟩ = ∫_G dg δ(h^{−1}g) f(g) = f(h)   ∀ f ∈ L²(G) .   (C.96)
With these notations, given a cocycle ω, we define the regular representation g 7→ Ugreg of a
compact Lie group (in analogy with its definition on finite groups) as
To illustrate the above definitions, consider the group U (1) and for simplicity consider
the trivial cocycle ω(θ, θ′ ) = 1 for all θ, θ′ ∈ U (1) ∼
= [0, 2π). The Hilbert space L2 U (1) is
Note that the inner product has the factor of 1/2π since the Haar measure in this case is dθ/2π.
Hence, the functions in (C.101) are normalized with respect to this inner product. In this
example, the regular representation (C.99) takes the form
U_θ^{reg} = (1/2π) ∫_0^{2π} dθ′ |θ + θ′⟩⟨θ′| ,   (C.103)
where the summation θ + θ′ is mod 2π. The matrix components of U_θ^{reg} in the f_n-basis {|n⟩}
are given by
⟨n|U_θ^{reg}|n′⟩ = (1/2π) ∫_0^{2π} dθ′ ⟨n|θ + θ′⟩⟨θ′|n′⟩
   = (1/2π) ∫_0^{2π} dθ′ e^{in(θ+θ′)} e^{−in′θ′}   (C.104)
   = e^{iθn} δ_{nn′} .
Hence, with respect to the basis {|n⟩} we can express the regular representation as
U_θ^{reg} = Σ_{n∈Z} e^{inθ} |n⟩⟨n| .   (C.105)
That is, the regular representation is a direct sum of all the irreps of U (1) (cf. (C.43)).
U_g^{reg} ≅ ⊕_{λ∈Irr(G,ω)} U_g^{(λ)} ⊗ I^{C_λ} ,   (C.106)
where for each λ the dimension dλ := |Bλ | < ∞. From (C.84) it follows that for any
λ ∈ Irr(G, ω) the dimension of the multiplicity space Cλ is given by
m_λ = ∫_G dg χ̄_λ(g) χ^{reg}(g)
(C.100)→ = ∫_G dg χ̄_λ(g) δ(g)   (C.107)
   = χ̄_λ(e) = Tr[I^{B_λ}] = d_λ .
This remarkable result also implies that the Hilbert space L2 (G) can be decomposed as
L²(G) ≅ ⊕_{λ∈Irr(G,ω)} B_λ ⊗ C_λ   with   B_λ ≅ C_λ ≅ C^{d_λ} .   (C.108)
Now, define for any λ ∈ Irr(G, ω) and any k, k ′ ∈ [dλ ] the functions
f^λ_{kk′}(g) := (1/√d_λ) u^λ_{kk′}(g) ,   (C.109)
where u^λ_{kk′}(g) are the matrix elements of U_g^{(λ)} as they appear in (C.106). From the orthogonality
relations (C.74) we have that {f^λ_{kk′}(g)} is an orthonormal set of functions in L²(G), and
from (C.108) we conclude that {f^λ_{kk′}(g)} is an orthonormal basis of L²(G). We therefore
Fourier Expansion
Theorem C.6.1. Let G be a compact Lie group and let ω be a cocycle. Any
function f (g) ∈ L2 (G) can be expanded as
f(g) = Σ_{λ∈Irr(G,ω)} Σ_{k,k′=1}^{d_λ} c^λ_{kk′} ū^λ_{kk′}(g) ,   (C.110)
Remark. The relation above is the generalization of Fourier series. To see this, consider the
group U (1) whose elements are parametrized by θ ∈ [0, 2π). In this case we denote the
irreps by integers λ = n, and we know that they are all one dimensional. Therefore, uλkk′ (g)
becomes uλ (g) (since dλ = 1 so that k = k ′ = 1) and recall that λ = n. In other words,
uλkk′ (g) can be replaced with fn (θ) := eiθn (see (C.105)), and cλkk′ are replaced with cn . Hence,
for G = U (1) the two equations in the theorem above simplify to
f(θ) = Σ_{n∈Z} c_n e^{inθ}   and   c_n = (1/2π) ∫_0^{2π} dθ e^{inθ} f(θ)   (C.112)
where we replaced g by θ and the Haar measure dg by dθ/2π. This is precisely the Fourier
expansion of a periodic function (with period 2π). The theorem above demonstrates that the
Fourier expansion is not a special feature of the group U(1) but exists for any compact
Lie group.
Exercise C.6.1. Prove Theorem C.6.1 in full details. Hint: Use the arguments above it.
Class Functions
In the next theorem we show that the orthogonality between irreps implies that all class
functions are linear combinations of the characters.
That is, [M_λ, U_h^{(λ)}] = 0 for all h ∈ G. Since h ↦ U_h^{(λ)} is an irrep, we get from Schur's lemma
that M_λ = b_λ I^{B_λ} for some b_λ ∈ C. In terms of the components, this relation can be expressed
as
∫_G dg f(g) u^λ_{kk′}(g) = b_λ δ_{kk′}   ∀ k, k′ ∈ [d_λ] .   (C.118)
In other words, the coefficients c^λ_{kk′} given in (C.111) satisfy c^λ_{kk′} = (b_λ/d_λ) δ_{kk′}. Denoting by
a_λ := b_λ/d_λ we get from (C.110) that
f(g) = Σ_{λ∈Irr(G,ω)} Σ_{k,k′=1}^{d_λ} c^λ_{kk′} ū^λ_{kk′}(g) = Σ_{λ∈Irr(G,ω)} a_λ Σ_{k=1}^{d_λ} ū^λ_{kk}(g) = Σ_{λ∈Irr(G,ω)} a_λ χ̄_λ(g) .   (C.119)
Here we assumed that the characteristic function on AB is defined with respect to the
representation g 7→ UgA ⊗ UgB .
In the next lemma we show that characteristic functions can be used to characterize
G-invariant states.
Lemma C.7.1. Let ψ ∈ Pure(A). Then, the following statements are equivalent.
2. ψ is G-invariant.
Proof. If ψ is G-invariant then by definition Ug ψUg∗ = ψ so that Ug |ψ⟩ = eiθg |ψ⟩ for some
θg ∈ [0, 2π), so that |χψ (g)| = 1. Conversely, suppose that |χψ (g)| = 1 for all g ∈ G. This
means that there exist phases θg ∈ [0, 2π) such that
2. Let κ_L^{(n)} denote the n-th order cumulant defined as
κ_L^{(n)} := i^{−n} (∂^n/∂θ^n) log χ_ρ(e^{iθL}) |_{θ=0} .   (C.127)
Show that the first and second order cumulants are the mean and the variance of the
observable (i.e. Hermitian matrix) L.
Let ρ ∈ L(A) and g 7→ Ug be a projective unitary representation of a group G in L(A).
The reduction of ρ onto the λ-irrep is the matrix
ρ_λ^{B_λ} := Tr_{C_λ}[ Π^{A_λ} ρ^A Π^{A_λ} ] .   (C.128)
Note that ρ_λ^{B_λ} above is the marginal of Π^{A_λ} ρ^A Π^{A_λ} in the representation space B_λ, whereas ρ_λ^{C_λ},
as defined in (C.53) for G-invariant matrices, is the marginal of Π^{A_λ} ρ^A Π^{A_λ} in the multiplicity
space C_λ.
χ_ρ(g) = Σ_λ Tr[ ρ_λ^{B_λ} U_g^{(λ)} ]
ρ_λ^{B_λ} = |B_λ| ∫_G dg χ_ρ(g^{−1}) U_g^{(λ)}   (C.129)
Remark. The relationship between the characteristic function of ρ and its reduction onto the
λ-irrep is known as the Fourier transform over the group.
For the second equality we will use the relation (C.81). Multiplying both sides of (C.81) by
ρ ∈ L(A) and taking the trace gives
∫_G dg ū^λ_{mm′}(g) χ_ρ(g) = (1/|B_λ|) ⟨λ, m′| ρ_λ^{B_λ} |λ, m⟩ .   (C.131)
Since the above equation holds for all m, m′ ∈ [|Bλ |] we conclude that
ρ_λ^{B_λ} = |B_λ| ∫_G dg χ_ρ(g) U_{g^{−1}}^{(λ)} ,   (C.132)
where we used the fact that ūλmm′ (g) = uλmm′ (g −1 ) (see Exercise C.5.1).
Exercise C.7.3. Show that if ρ ∈ L(A) is G-invariant then its reduction onto the λ-irrep is given
by
ρ_λ^{B_λ} = Tr[ ρ^A Π^{A_λ} ] u^{B_λ} .   (C.133)
Remark. For the case that G is a compact Lie group and f is continuous, the definition
above is equivalent to the statement that
∫_G dg ∫_G dh c̄(g) f(g^{−1}h) c(h) ⩾ 0 ,   (C.137)
where c ∈ L²(G).
Exercise C.8.1. Show that a complex function f : G → C is positive definite if for all
choices of n ∈ N, g1 , . . . , gn ∈ G, and c1 , . . . , cn ∈ C
Σ_{x∈[n]} Σ_{y∈[n]} c̄_x c_y f(g_y g_x^{−1}) ⩾ 0 .   (C.138)
Observe that the condition above also implies that the left hand side is real.
If a complex function f : G → C is a characteristic function, i.e. f (g) = Tr[ρUg ] for some
ρ ∈ D(A) and some (non-projective) unitary representation g 7→ Ug acting on A, then for
any c1 , . . . , cn ∈ C and g1 , . . . , gn ∈ G
Σ_{x∈[n]} Σ_{y∈[n]} c̄_x c_y f(g_x^{−1} g_y) = Σ_{x∈[n]} Σ_{y∈[n]} c̄_x c_y Tr[ ρ U_{g_x}^* U_{g_y} ]
In other words, all characteristic functions are positive definite functions over the group.
Conversely, we will see below that every normalized positive definite function f over a group
is a characteristic function.
Exercise C.8.3. Let G be a compact Lie group and f : G → C be a positive definite function
on G. For any two functions f1 , f2 ∈ L2 (G) define
⟨f_1|f_2⟩_f := ∫_G dg ∫_G dh f̄_1(g) f_2(h) f(g^{−1}h) .   (C.141)
where we chose the trivial cocycle ω(g, h) = 1 and denoted Irr(G) := Irr(G, ω = 1). In
the following theorem we use it to show that normalized positive definite functions are
characteristic functions.
Theorem C.8.1. Let G be a finite or compact Lie group and f (g) ∈ L2 (G). The
following are equivalent:
Proof of Theorem C.8.1. We already saw that all characteristic functions are positive
definite, so it is left to show that 2 ⇒ 1. Suppose f is a normalized positive definite
function on G. Recall that if f is also a characteristic function of some state ρ then f
and ρ satisfy (C.129) with f replacing χ_ρ. However, since we need to prove that f is a
characteristic function we use this relationship as a definition. That is, for any λ ∈ Irr(G)
we define the operator
ρ_λ^{B_λ} := d_λ ∫_G dg f(g^{−1}) U_g^{(λ)} ,   (C.144)
where g ↦ U_g^{(λ)} is the λ-irrep of the regular representation of G. We first show that the
operator above is positive semidefinite. Let η ∈ L(B_λ), multiply both sides of the equation
above by ηη^*, and take the trace to get
Tr[ ηη^* ρ_λ^{B_λ} ] = d_λ ∫_G dg f(g^{−1}) χ_{ηη^*}(g) .   (C.145)
We next decompose χηη∗ into two characteristic functions. To do that, first observe that
since η, η ∗ ∈ L(Bλ ) we have
χη (g) = Tr [ηUg ] = Tr ηUg(λ) and χη∗ (g) = Tr [η ∗ Ug ] = Tr η ∗ Ug(λ) .
(C.146)
Next, consider the second relation in (C.129) with η replacing ρ and h replacing g; that is,
η = d_λ ∫_G dh χ_η(h^{−1}) U_h^{(λ)} .   (C.147)
Multiplying both of its sides by η^* U_g^{(λ)}, with some g ∈ G, and taking the trace on both sides
gives
χ_{ηη^*}(g) = d_λ ∫_G dh χ_η(h^{−1}) Tr[ U_g^{(λ)} U_h^{(λ)} η^* ]
   = d_λ ∫_G dh χ_η(h^{−1}) χ_{η^*}(gh) .   (C.148)
Substituting this into (C.145) and using the fact that χ_{η^*}(gh) = χ̄_η(h^{−1}g^{−1}) gives
Tr[ ηη^* ρ_λ^{B_λ} ] = d_λ² ∫_G dh ∫_G dg f(g^{−1}) χ_η(h^{−1}) χ̄_η(h^{−1}g^{−1}) .   (C.149)
Changing variables to k_1 := h^{−1} and k_2 := h^{−1}g^{−1}, and using the invariance of the Haar measure, we get
Tr[ ηη^* ρ_λ^{B_λ} ] = d_λ² ∫_G dk_1 ∫_G dk_2 χ_η(k_1) f(k_1^{−1}k_2) χ̄_η(k_2)   (C.150)
Since f is positive definite→ ⩾ 0 .
From the analysis above this operator is positive semidefinite. We show next that its trace is
one (i.e. it is a density matrix) and that f can be expressed as the characteristic function of
ρA . To see this, recall that since f (g) ∈ L2 (G), it can be expressed as a linear combination
of the basis elements {u^µ_{k′k}(g)} as
f(g) = Σ_{µ∈Irr(G)} Σ_{k,k′=1}^{d_µ} a^µ_{kk′} u^µ_{k′k}(g) ,   (C.152)
where each aµkk′ ∈ C (for convenience we used uµk′ k (g) instead of ūµkk′ (g), so the coefficients
aµkk′ are different than the coefficients cµkk′ of (C.142)). Substituting this into (C.144) gives
ρ_λ^{B_λ} = d_λ Σ_{µ∈Irr(G)} Σ_{k,k′=1}^{d_µ} a^µ_{kk′} ∫_G dg ū^µ_{kk′}(g) U_g^{(λ)} .   (C.153)
Finally, combining this with the expression (C.130) for the characteristic function, we get
χ_ρ(g) = Σ_λ Tr[ ρ_λ^{B_λ} U_g^{(λ)} ]
(C.154)→ = Σ_λ Σ_{k,k′=1}^{d_λ} a^λ_{kk′} u^λ_{k′k}(g)   (C.155)
(C.152)→ = f(g) .
Ṽ Ug = Ug Ṽ ∀g∈G. (C.157)
Note that in this definition, the G-invariance property is defined with respect to a single
representation g 7→ UgA , and there is no need to consider another representation on system
′
A′ (i.e. g 7→ UgA ).
Proof. Since Ṽ := V P commutes with U_g it follows that also Ṽ^* commutes with U_g. There-
fore, P = Ṽ^* Ṽ also commutes with U_g. Now, consider the irrep decomposition of the Hilbert
space A = ⊕_λ B_λ ⊗ C_λ. From Theorem C.3.3 it follows that
Ṽ = ⊕_λ I^{B_λ} ⊗ Ṽ_λ^{C_λ}   and   P = ⊕_λ I^{B_λ} ⊗ Π_λ^{C_λ} ,   (C.158)
where Ṽ_λ^{C_λ} := (1/|B_λ|) Tr_{B_λ}[ Π^{B_λ} Ṽ ] and Π_λ^{C_λ} := (1/|B_λ|) Tr_{B_λ}[ Π^{B_λ} P ], with Π^{B_λ} being the projection
λ λ
onto the space Bλ . Now, observe that since P is a projection the condition P P = P gives
ΠCλ Πλ
λ Cλ
= ΠC Cλ
λ so that each Πλ is itself a projection in the space Cλ . Moreover, since
λ
P = Ṽ ∗ Ṽ we conclude that Πλ λ = Ṽλ∗Cλ ṼλCλ . Therefore, from Exercise 2.3.8 it follows that
C
for each λ, ṼλCλ can be completed to a unitary Wλ : Cλ → Cλ . That is, there exists a unitary
matrix Wλ ∈ L(Cλ ) satisfying Wλ ΠλCλ = ṼλCλ ΠC λ . Define the matrix W ∈ L(A) by
λ
M
W := I Bλ ⊗ WλCλ . (C.159)
λ
n
Exercise C.10.1. Show that {PπA }π∈Sn is indeed a unitary representation of Sn .
According to Theorem C.4.2 in the appendix, the orthogonal projection to the symmetric
n
subspace Symn (A), denote by ΠA
Sym , is given by
n 1 X An
ΠA
Sym = P . (C.162)
n! π∈S π
n
In order to calculate the dimension of Symn (An ), observe that the action of the projection
above on any element |xn ⟩ := |x1 · · · xn ⟩ ∈ An of the standard basis of An gives the symmetric
vector
n 1 X
ΠASym |x n
⟩ = |xπ(1) · · · xπ(n) ⟩ . (C.163)
n! π∈S
n
Since the type of each sequence (x_{π(1)} ⋯ x_{π(n)}) equals the type of x^n, the state above is
uniquely determined by the type of x^n. Recall that t(x^n) denotes the type of x^n, and X^n(t)
denotes the set of all sequences x^n ∈ [m]^n whose type is t. Keeping this in mind, we define
for any type t ∈ Type(n, m) the unit vector
|φ_t⟩ := (1/√k_t) Σ_{x^n∈X^n(t)} |x^n⟩ ,   (C.164)
where
k_t := |X^n(t)| = \binom{n}{nt_1, …, nt_m} .   (C.165)
Observe that the state in (C.163) is proportional to |φ_t⟩. Therefore, since the image of Π_{Sym}^{A^n}
is Sym^n(A), the set of vectors {|φ_t⟩}_{t∈Type(n,m)} is an orthonormal basis of Sym^n(A). This
implies that the dimension of the symmetric subspace is given by
Proof. Let B := span{|ψ⟩^{⊗n} : |ψ⟩ ∈ A} be the vector space on the right-hand side
of (C.167), and observe that B ⊆ Sym^n(A). We therefore need to show that Sym^n(A) ⊆ B.
Let |ψ⟩ = Σ_{x∈[m]} v_x|x⟩, and for every x^n ∈ [m]^n let v_{x^n} := v_{x_1} ⋯ v_{x_n}. Then,
|ψ⟩^{⊗n} = Σ_{x^n∈[m]^n} v_{x^n} |x^n⟩ = Σ_{t∈Type(n,m)} v_1^{nt_1} ⋯ v_m^{nt_m} |χ_t⟩ ,   (C.168)
where |χ_t⟩ = Σ_{x^n∈X^n(t)} |x^n⟩ is an unnormalized version of the normalized state defined
in (C.164).
Next, we define the polynomial f : C^m → B as
f(v) := Σ_{t∈Type(n,m)} v_1^{k_1} ⋯ v_m^{k_m} |χ_t⟩   ∀ v ∈ C^m ,   (C.169)
where k_j := nt_j ∈ N for all j ∈ [m]. Observe that the integers {k_j}_{j∈[m]} depend on t (for
simplicity of the exposition we did not add a subscript to indicate that). From (C.168)
we have f(v) ∈ B for all v ∈ C^m. We now argue that this implies that |χ_t⟩ ∈ B for all
t ∈ Type(n, m), so that Sym^n(A) ⊆ B. Indeed, observe that for any t and corresponding
integers k_1, …, k_m we have
|χ_t⟩ ∝ ∂^n f(v_1, …, v_m) / ( ∂v_1^{k_1} ⋯ ∂v_m^{k_m} ) |_{v=0} .   (C.170)
Now, since f(v) ∈ B for all v ∈ C^m, and since all partial derivatives are limits of linear
combinations of f(v) at different points (v_1, …, v_m), we conclude that |χ_t⟩ ∈ B. This
completes the proof.
Proof. Let |ψ_1⟩, |ψ_2⟩ ∈ Sym^n(A) be two non-zero vectors in the symmetric subspace. To
show that Sym^n(A) does not have a proper invariant subspace, it will be enough to show that
there exists U ∈ U(A) such that
⟨ψ_1| U^{⊗n} |ψ_2⟩ ≠ 0 .   (C.171)
From Lemma C.10.1 it follows that both |ψ_1⟩ and |ψ_2⟩ can be expressed as linear combinations
of states of the form |φ⟩^{⊗n}. Hence, there exist |φ_1⟩, |φ_2⟩ ∈ A such that ⟨ψ_1|φ_1^{⊗n}⟩ ≠ 0 and
⟨ψ_2|φ_2^{⊗n}⟩ ≠ 0. For j = 1, 2 denote
G_j := { U ∈ U(A) : U|φ_j⟩ = |φ_j⟩ }   (C.172)
and observe that both G1 and G2 are subgroups of U(A). By definition, |φj ⟩ is an eigenvector
corresponding to the eigenvalue one for any U ∈ Gj . Therefore, denoting m := |A|, from
the spectral decomposition of such U , we conclude that every U ∈ Gj has the form U =
Ũ ⊕ |φj ⟩⟨φj |, where Ũ is a unitary matrix acting on the (m − 1)-dimensional subspace
orthogonal to |φj ⟩. In other words, for every j ∈ {1, 2}
n o
Gj = Ũ ⊕ |φj ⟩⟨φj | : Ũ ∈ U(m − 1) . (C.173)
Now, from Theorem C.4.2 we get that for j = 1, 2 (with dV_j being the Haar measure on G_j)
Π_j := ∫_{G_j} dV_j V_j^{⊗n} ,   (C.174)
We next show that the dimension of (A^n)^{G_j} is one. First, observe that |φ_j⟩^{⊗n} ∈ (A^n)^{G_j} so the
dimension of (A^n)^{G_j} is at least one. Let |ψ⟩ ∈ (A^n)^{G_j} so that V_j^{⊗n}|ψ⟩ = |ψ⟩ for all V_j ∈ G_j.
Since each such V_j has the form V_j = Ṽ_j ⊕ |φ_j⟩⟨φ_j| we can take in particular Ṽ_j = e^{iθ} P_j,
where P_j is the projection to the subspace orthogonal to |φ_j⟩, and θ is any phase in
[0, 2π). For this choice we get (e^{iθ} P_j ⊕ |φ_j⟩⟨φ_j|)^{⊗n} |ψ⟩ = |ψ⟩ for all θ ∈ [0, 2π). But this is
only possible for a state |ψ⟩ that is proportional to |φ_j⟩^{⊗n}. That is, up to a proportionality
coefficient, the only element of (A^n)^{G_j} is |φ_j⟩^{⊗n}. Hence, the projection to (A^n)^{G_j} is
Finally, let W be any unitary matrix in U(m) that satisfies W|φ_2⟩ = |φ_1⟩. Then, we get
that
∫_{G_1} dV_1 ∫_{G_2} dV_2 ⟨ψ_1| (V_1 W V_2)^{⊗n} |ψ_2⟩
   = ⟨ψ_1| ( ∫_{G_1} dV_1 V_1^{⊗n} ) W^{⊗n} ( ∫_{G_2} dV_2 V_2^{⊗n} ) |ψ_2⟩   (C.177)
(C.176)→ = ⟨ψ_1|φ_1^{⊗n}⟩ (⟨φ_1|W|φ_2⟩)^n ⟨φ_2^{⊗n}|ψ_2⟩
W|φ_2⟩ = |φ_1⟩→ = ⟨ψ_1|φ_1^{⊗n}⟩ ⟨φ_2^{⊗n}|ψ_2⟩ ≠ 0 .
Therefore, there must exist at least one unitary matrix V_1 and one unitary matrix V_2
such that U = V_1 W V_2 satisfies (C.171). This completes the proof that Sym^n(A) is an
irreducible subspace.
The space A^n has another useful subspace called the antisymmetric subspace, which we
denote by Asy(A^n). It is defined by
Asy(A^n) := { |ψ⟩ ∈ A^n : (−1)^{sign(π)} P_π^{A^n} |ψ⟩ = |ψ⟩   ∀ π ∈ S_n } .   (C.178)
Exercise C.10.2. Show that {(−1)^{sign(π)} P_π^{A^n}}_{π∈S_n} is a unitary representation of S_n.
From Theorem C.4.2 it follows that the projection to the antisymmetric subspace is given
by
Π_{Asy}^{A^n} = (1/n!) Σ_{π∈S_n} (−1)^{sign(π)} P_π^{A^n} .   (C.179)
This vector is zero unless n ⩽ m. Otherwise, if n > m any sequence xn must contain at least
two components that are equal to each other. Without loss of generality suppose x1 = x2 .
Then, for any permutation π ∈ S_n the permutation π′ defined by
π′(j) := π(2) if j = 1,   π(1) if j = 2,   π(j) if j > 2 ,   (C.181)
We already saw that Sym(A2 ) is an irreducible subspace of A2 , and we will see shortly that the
above decomposition is a decomposition into the two irreps of the “natural” representation
of the group U (m) (or SU (m)) on the space A2 .
In the case that n = 2 the set of all permutations on two elements, S2 , consists of only
two permutations. Therefore, from (C.160) we get that the projection to the symmetric
subspace takes the simple form
Π_{Sym}^{A²} = (1/2)( I^{A²} + F ) ,   (C.186)
where F : A² → A² is known as the swap operator, given by
F := Σ_{x,y∈[m]} |x⟩⟨y| ⊗ |y⟩⟨x| .   (C.187)
Π_{Asy}^{A²} = (1/2)( I^{A²} − F ) .   (C.188)
Exercise C.10.3. Show that for any unitary matrix U ∈ L(A) the swap operator commutes
with U ⊗ U .
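A minimal numerical sketch (an addition) of Exercise C.10.3: the swap operator F of (C.187) commutes with U ⊗ U for any unitary U, and (I ± F)/2 are indeed the projections (C.186) and (C.188).

```python
import numpy as np

rng = np.random.default_rng(8)
m = 3
F = np.zeros((m * m, m * m))
for x in range(m):
    for y in range(m):
        F[y * m + x, x * m + y] = 1.0              # F|x,y> = |y,x>

q, _ = np.linalg.qr(rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m)))
UU = np.kron(q, q)
assert np.allclose(F @ UU, UU @ F)                 # F commutes with U (x) U

P_sym, P_asy = (np.eye(m * m) + F) / 2, (np.eye(m * m) - F) / 2
assert np.allclose(P_sym @ P_sym, P_sym) and np.allclose(P_asy @ P_asy, P_asy)
assert np.isclose(np.trace(P_sym), m * (m + 1) / 2)   # dim Sym^2(A) = m(m+1)/2
```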
The exercise above implies that
F = (U ⊗ U) F (U ⊗ U)^* = Σ_{x,y∈[m]} U|x⟩⟨y|U^* ⊗ U|y⟩⟨x|U^* .   (C.189)
|ψ_{xy}^−⟩ := (1/√2)( |xy⟩ − |yx⟩ )   ∀ 1 ⩽ x < y ⩽ m .   (C.190)
Similarly, the symmetric subspace has an orthonormal basis {|ψ_{xy}^+⟩}_{x⩽y∈[m]} given by
|ψ_{xy}^+⟩ := (1/√2)( |xy⟩ + |yx⟩ ) if 1 ⩽ x < y ⩽ m,   and   |ψ_{xx}^+⟩ := |xx⟩ if x = y ∈ [m] .   (C.191)
Denote
|ψ_1⟩ = Σ_{x<y} a_{xy} |ψ_{xy}^−⟩   and   |ψ_2⟩ = Σ_{x<y} b_{xy} |ψ_{xy}^−⟩ .   (C.193)
Now, take U = DP_π, where D = Σ_{x∈[m]} e^{iθ_x} |x⟩⟨x| is a diagonal unitary with θ_x ∈ [0, 2π]. Then,
(U ⊗ U)|ψ_2⟩ = Σ_{x<y} b_{xy} (U ⊗ U)|ψ_{xy}^−⟩ = Σ_{x<y} b_{xy} e^{i(θ_{π^{−1}(x)} + θ_{π^{−1}(y)})} |ψ_{π^{−1}(x)π^{−1}(y)}^−⟩ .   (C.194)
Therefore, from (C.192) we get that for all permutations π ∈ S_m and all phases {θ_x}_{x∈[m]} we
have
0 = ⟨ψ_1| U ⊗ U |ψ_2⟩ = Σ_{x∈[m]} e^{iθ_x} Σ_{y=x+1}^{m} e^{iθ_y} a*_{xy} b_{π(x)π(y)} .   (C.195)
Since the equation above holds for all θ_y we must have a*_{1y} b_{π(1)π(y)} = 0 for all y = 2, …, m
and all permutations π ∈ S_m. Since |ψ_2⟩ ≠ 0, for each y ∈ {2, …, m} there exists π ∈ S_m
such that b_{π(1)π(y)} ≠ 0. Hence, we must have a_{1y} = 0 for all y = 2, …, m. Next, observe that
the relation (C.192) becomes
Σ_{x=2}^{m} e^{iθ_x} Σ_{y=x+1}^{m} e^{iθ_y} a*_{xy} b_{π(x)π(y)} = 0 .   (C.197)
Therefore, taking the derivative with respect to θ2 and repeating similar lines as above we
conclude that a2y = 0 for all y = 3, . . . , m. Continuing in this way we get that axy = 0 for
all 1 ⩽ x < y ⩽ m in contradiction with the assumption that |ψ1 ⟩ ̸= 0. This concludes the
proof.
Exercise C.10.4. Extend the proof above for the case that n > 2. That is, prove that
Asy(An ) is an irreducible subspace of An under the natural representation of U (m) in An .
Miscellany
f[x, x] = f′(x)
f[x, x, y] = f′(x)/(x − y) − (f(x) − f(y))/(x − y)²   (D.4)
f[x, x, x] = (1/2) f″(x) .   (D.5)
Note that (D.5) can be obtained from (D.4) by setting h := y − x → 0 and expanding
f (y) = f (x + h) = f (x) + hf ′ (x) + 21 h2 f ′′ (x) + O(h3 ).
Theorem D.1.1. Let A = Diag(α_1, …, α_n) ∈ C^{n×n} be a diagonal square matrix, and
B = [b_{ij}] ∈ C^{n×n} be a complex square matrix. Assume that f(x) : C → C satisfies one of the
following conditions:
In particular
Tr( L_A^{(1)}(B) ) = Σ_{j∈[n]} f′(α_j) b_{jj}   (D.10)
Tr( L_A^{(2)}(B) ) = (1/2) Σ_{i,j=1}^{n} f′[α_i, α_j] b_{ij} b_{ji} = Σ_{i,j=1}^{n} ( f′(α_i) − f′(α_j) ) / ( 2(α_i − α_j) ) b_{ij} b_{ji} .   (D.11)
Remark. The expansion above can be naturally generalized to higher than the second order,
but for our purposes we will only need to expand f(A + tB) up to the second
order in t. Moreover, for our purposes we will only need to assume that the α_i are real and
that condition 2 on f holds. We kept condition 1 on f in the theorem just to be a bit more
general.
Note that in all the expressions above, one must identify αi = αj with the limit αj → αi .
For example, the term
( f′(α_i) − f′(α_j) ) / ( 2(α_i − α_j) ) = (1/2) f″(α_i)   for α_i = α_j .   (D.13)
In particular, note that if B is diagonal, Eq. (D.11) gives the known second order term of
the Taylor expansion.
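The following is a hedged numerical sketch (an addition, not from the text) of the second-order expansion behind (D.14) and (D.11) for f(x) = x⁴: the divided-difference formula reproduces the t² coefficient of Tr[f(A + tB)] for a diagonal A and a random Hermitian B.

```python
import numpy as np

rng = np.random.default_rng(9)
n, m = 4, 4
alpha = rng.normal(size=n)
A = np.diag(alpha)
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
B = (B + B.conj().T) / 2

fp = lambda x: m * x ** (m - 1)                    # f'(x) for f(x) = x^m
# second-order trace coefficient from (D.11)
coef = 0.0
for i in range(n):
    for j in range(n):
        dd = 0.5 * m * (m - 1) * alpha[i] ** (m - 2) if i == j else \
             (fp(alpha[i]) - fp(alpha[j])) / (2 * (alpha[i] - alpha[j]))
        coef += dd * B[i, j] * B[j, i]
# compare with a finite-difference estimate of (1/2) d^2/dt^2 Tr[(A + tB)^m] at t = 0
t = 1e-3
trace_f = lambda t: np.trace(np.linalg.matrix_power(A + t * B, m)).real
numeric = (trace_f(t) - 2 * trace_f(0) + trace_f(-t)) / (2 * t ** 2)
assert np.isclose(coef.real, numeric, rtol=1e-3)
```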
Proof. From the conditions on f , it is enough to prove the theorem assuming f is a poly-
nomial. By linearity, it is enough to prove all the claims for f (x) = xm . Clearly, in the
expansion
(A + tB)^m = A^m + t L_A(B) + t² Q_A(B) + O(t³)   (D.14)
we must have
L_A(B) = Σ_{0⩽p,q, p+q=m−1} A^p B A^q ,   (D.15)
Q_A(B) = Σ_{0⩽p,q,r, p+q+r=m−2} A^p B A^q B A^r ,   (D.16)
where we expanded (A + tB)m up to first and second order in t. All that is left to show is
that these matrices coincide with the ones defined in Eqs. (D.7,D.8).
Indeed, since A is diagonal, the matrix elements of L_A(B) in Eq. (D.15) are given by
[L_A(B)]_{ij} = Σ_{0⩽p,q, p+q=m−1} α_i^p α_j^q b_{ij} = ( α_i^m − α_j^m ) / ( α_i − α_j ) b_{ij} ,   (D.17)
Thus, the expressions in Eq. (D.8) and Eq. (D.16) for QA (B) are the same.
We now prove Eq. (D.11). Observe first that Eq. (D.8) yields
\[
\mathrm{Tr}\bigl(Q_A(B)\bigr) = \sum_{i,j=1}^{n} f[\alpha_i,\alpha_i,\alpha_j]\,b_{ij}b_{ji}, \tag{D.20}
\]
where we have used the symmetry f[αi, αj, αi] = f[αi, αi, αj]. Now, since bij bji is symmetric under an exchange between i and j, we can replace f[αi, αi, αj] in Eq. (D.20) with
\[
\frac12\bigl(f[\alpha_i,\alpha_i,\alpha_j]+f[\alpha_j,\alpha_j,\alpha_i]\bigr) = \frac12 f'[\alpha_i,\alpha_j], \tag{D.21}
\]
where for the last equality we used Eq. (D.4).
Since the map $L^{(1)}_\rho$ is self-adjoint we get
\[
\mathrm{Tr}\bigl[g(\rho)L^{(1)}_\rho(\sigma)\bigr] = \mathrm{Tr}\bigl[L^{(1)}_\rho\bigl(g(\rho)\bigr)\sigma\bigr] = \mathrm{Tr}\bigl[g(\rho)f'(\rho)\sigma\bigr] = \mathrm{Tr}\bigl[h(\rho)\sigma\bigr], \tag{D.24}
\]
where we used the fact that $L^{(1)}_\rho\bigl(g(\rho)\bigr)=g(\rho)f'(\rho)$ (we leave it as an exercise). Similarly,
\[
\mathrm{Tr}\bigl[g(\rho)L^{(1)}_\rho(\eta)\bigr] = \mathrm{Tr}\bigl[h(\rho)\eta\bigr]. \tag{D.25}
\]
Finally,
\begin{align*}
\mathrm{Tr}\bigl[g(\rho)L^{(2)}_\rho(\sigma)\bigr] &= \sum_{x\in[m]} g(\alpha_x)\langle x|L^{(2)}_\rho(\sigma)|x\rangle\\
&= \sum_{x,y\in[m]} g(\alpha_x)\,f[\alpha_x,\alpha_y,\alpha_x]\,|\langle x|\sigma|y\rangle|^2\\
\text{(D.4)}\rightarrow\quad &= \sum_{x,y\in[m]} g(\alpha_x)\left(\frac{f'(\alpha_x)}{\alpha_x-\alpha_y}-\frac{f(\alpha_x)-f(\alpha_y)}{(\alpha_x-\alpha_y)^2}\right)|\langle x|\sigma|y\rangle|^2\\
\sigma=\sigma^*\rightarrow\quad &= \frac12\sum_{x,y\in[m]}\left(\frac{h(\alpha_x)-h(\alpha_y)}{\alpha_x-\alpha_y}-\frac{(g(\alpha_x)-g(\alpha_y))(f(\alpha_x)-f(\alpha_y))}{(\alpha_x-\alpha_y)^2}\right)|\langle x|\sigma|y\rangle|^2\\
&= \frac12\sum_{x,y\in[m]}\Bigl(\langle x|\sigma|y\rangle\,[L_h(\sigma)]_{yx}-[L_g(\sigma)]_{xy}[L_f(\sigma)]_{yx}\Bigr)\\
&= \frac12\mathrm{Tr}\bigl[\sigma L_h(\sigma)\bigr]-\frac12\mathrm{Tr}\bigl[L_f(\sigma)L_g(\sigma)\bigr]. \tag{D.26}
\end{align*}
This completes the proof.
where σ11 > 0 and 0 denotes a zero matrix. Note that we can always find a basis in which ρ and σ have the above form. Moreover, unless specified otherwise, all inverses of matrices will be understood as generalized inverses. For example, the inverse of σ is understood as
\[
\sigma^{-1} := \begin{pmatrix}\sigma_{11}^{-1} & 0_{12}\\ 0_{21} & 0_{22}\end{pmatrix}.
\]
Recall from (B.75) that the Schur complement of the block ρ22 of ρ is $\rho/\rho_{22}=\rho_{11}-\zeta\rho_{22}^{-1}\zeta^*$.
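The following short numpy sketch (ours) illustrates these conventions: the generalized inverse is realized by the Moore–Penrose pseudo-inverse, and the Schur complement of a positive semidefinite matrix is again positive semidefinite.

```python
# Sketch (ours) of the conventions above: generalized inverses via pinv and the Schur
# complement rho/rho22 = rho11 - zeta rho22^{-1} zeta^* for a block-structured PSD rho.
import numpy as np

rng = np.random.default_rng(2)
d1, d2 = 2, 2
X = rng.normal(size=(d1 + d2, d1 + d2))
rho = X @ X.T
rho /= np.trace(rho)                           # a density matrix, for illustration

rho11 = rho[:d1, :d1]
zeta  = rho[:d1, d1:]
rho22 = rho[d1:, d1:]

rho22_inv = np.linalg.pinv(rho22)              # generalized inverse
schur = rho11 - zeta @ rho22_inv @ zeta.T      # rho/rho22

assert np.all(np.linalg.eigvalsh(schur) >= -1e-12)   # Schur complement of a PSD matrix is PSD
print("Schur complement eigenvalues:", np.round(np.linalg.eigvalsh(schur), 4))
```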
where the infimum is over all 1 < n ∈ ℕ, all p ∈ Prob(n), and all POVMs {Λx}x∈[n−1] acting on the support of σ that satisfy the following constraints: for all x ∈ [n − 1], qx := Tr[σ̃Λx] > 0 and
\[
\tilde\sigma^{-\frac12}\tilde\rho\,\tilde\sigma^{-\frac12} \geqslant \sum_{x\in[n-1]}\frac{p_x}{q_x}\,\Lambda_x. \tag{D.29}
\]
Proof. From Lemma 5.3.1 and the optimization in (5.102) and (5.103), it follows that Df(ρ∥σ) can be expressed as in (D.28), where the infimum is over all 1 < n ∈ ℕ, p ∈ Prob(n), and 0 < q ∈ Prob(n − 1), such that there exist n − 1 density matrices {ωx}x∈[n−1] ⊂ D(A) satisfying
\[
\rho \geqslant \sum_{x\in[n-1]} p_x\omega_x \qquad\text{and}\qquad \sigma = \sum_{x\in[n-1]} q_x\omega_x, \tag{D.30}
\]
where we used the fact that the first relation above holds if and only if $\rho=\sum_{x\in[n-1]}p_x\omega_x+p_n\omega_n$ for some density matrix ωn. Note that since σ has the form given in (D.27), it follows from the second relation above and the fact that q > 0 that also the density matrices {ωx} are supported on the support of σ.
Since L is invertible, we can conjugate the above relations by L⁻¹(·)(L*)⁻¹ to get back (D.30). Therefore the above relations are equivalent to (D.30). Moreover, since ρ22 ⩾ 0, it follows that the relation above holds if and only if
\[
\tilde\rho \geqslant \sum_{x\in[n-1]} p_x\tilde\omega_x \qquad\text{and}\qquad \tilde\sigma = \sum_{x\in[n-1]} q_x\tilde\omega_x, \tag{D.34}
\]
where ω̃x := (ωx)11 and σ̃ := σ11. Finally, denoting $\Lambda_x := q_x\,\tilde\sigma^{-1/2}\tilde\omega_x\tilde\sigma^{-1/2}$, and applying the conjugation $\tilde\sigma^{-1/2}(\cdot)\tilde\sigma^{-1/2}$ to both sides of (D.34), gives the relations
\[
\tilde\sigma^{-\frac12}\tilde\rho\,\tilde\sigma^{-\frac12} \geqslant \sum_{x\in[n-1]}\frac{p_x}{q_x}\,\Lambda_x \qquad\text{and}\qquad \sum_{x\in[n-1]}\Lambda_x = I^A. \tag{D.35}
\]
where f˜ is defined in (5.13), and the infimum above is subject to the same conditions given
in Lemma D.2.1. Observe that if f˜(0) = ∞ then pn in (D.36) can be taken to be zero
(otherwise, Df (ρ∥σ) = ∞). This means that the first relation in (D.30) must also hold with
equality (recall the original condition (5.103)). But from the second relation in (D.30), and
the fact that qx > 0 for all x ∈ [n − 1], this is possible only if supp(ρ) ⊆ supp(σ). That is, in
the case f˜(0) = ∞ we have Df (ρ∥σ) = ∞ for supp(ρ) ̸⊆ supp(σ) and for supp(ρ) ⊆ supp(σ)
we can take pn = 0 in the optimization above and also replace the inequality sign of (D.29)
with an equality.
Going back to the general case, one natural choice/guess for the optimal n, p and {Λx}x∈[n−1] is to choose them such that we have equality in (D.29). This is possible, for example, by taking n = r + 1, where r is the dimension of the support of σ, and for any x ∈ [r] taking Λx = |ψx⟩⟨ψx| with |ψx⟩ being the x-eigenvector of $\tilde\sigma^{-1/2}\tilde\rho\,\tilde\sigma^{-1/2}$ corresponding to the eigenvalue px/qx (i.e. p is chosen such that px/Tr[σ̃Λx] is the x-eigenvalue of $\tilde\sigma^{-1/2}\tilde\rho\,\tilde\sigma^{-1/2}$). For this choice we have
\[
\tilde\sigma^{-\frac12}\tilde\rho\,\tilde\sigma^{-\frac12} = \sum_{x\in[r]}\frac{p_x}{q_x}\,|\psi_x\rangle\langle\psi_x|, \tag{D.37}
\]
which forces pn to be
\[
p_n = 1-\sum_{x\in[r]} p_x = 1-\sum_{x\in[r]}\frac{p_x}{q_x}\langle\psi_x|\tilde\sigma|\psi_x\rangle = 1-\mathrm{Tr}[\tilde\rho], \tag{D.38}
\]
where the last equality follows by multiplying both sides of (D.37) by σ̃ and taking the trace.
Moreover, for these choices of n, p and {Λx}, we have
\begin{align*}
\sum_{x\in[r]} q_x f\!\Bigl(\frac{p_x}{q_x}\Bigr) &= \sum_{x\in[r]}\mathrm{Tr}\bigl[\tilde\sigma|\psi_x\rangle\langle\psi_x|\bigr]\,f\!\Bigl(\frac{p_x}{q_x}\Bigr)\\
\forall\,t\geqslant0,\ f\bigl(t|\psi_x\rangle\langle\psi_x|\bigr)=f(t)|\psi_x\rangle\langle\psi_x|\rightarrow\quad &= \sum_{x\in[r]}\mathrm{Tr}\Bigl[\tilde\sigma\,f\!\Bigl(\frac{p_x}{q_x}|\psi_x\rangle\langle\psi_x|\Bigr)\Bigr]\\
\{|\psi_x\rangle\}\text{ is orthonormal}\rightarrow\quad &= \mathrm{Tr}\Bigl[\tilde\sigma\,f\!\Bigl(\sum_{x\in[r]}\frac{p_x}{q_x}|\psi_x\rangle\langle\psi_x|\Bigr)\Bigr]\\
&= \mathrm{Tr}\Bigl[\tilde\sigma\,f\bigl(\tilde\sigma^{-\frac12}\tilde\rho\,\tilde\sigma^{-\frac12}\bigr)\Bigr]. \tag{D.39}
\end{align*}
Note that we obtained the formula above for a particular choice of n, p and {Λx}. Therefore, since this is not necessarily the optimal choice (recall Df is defined in terms of an infimum), we must have
\[
D_f(\rho\|\sigma) \leqslant \mathrm{Tr}\Bigl[\tilde\sigma\,f\bigl(\tilde\sigma^{-\frac12}\tilde\rho\,\tilde\sigma^{-\frac12}\bigr)\Bigr]+\bigl(1-\mathrm{Tr}[\tilde\rho]\bigr)\tilde f(0). \tag{D.40}
\]
Interestingly, to get this upper bound we did not even assume that f is convex, but if f is
operator convex we get an equality.
Before we state the theorem below, we point out a remarkable result from matrix analysis
that we will use below. Suppose f : [0, ∞) → R is operator convex. In Sec. B we saw that
Remark. We use the convention 0 · ∞ = 0 in the case that both Tr[ρ̃] = 1 and f̃(0) = ∞. The inverse of ρ22 in the theorem above is a generalized inverse (in case ρ22 is not invertible). This in particular implies that if supp(ρ) ⊆ supp(σ) then ρ22⁻¹ = 0 and ρ̃ = ρ = ρ11, so that Tr[ρ̃] = 1 and the second term on the right-hand side of (D.42) vanishes. Moreover, the requirement that f(1) = 0 is not necessary, but if f(1) ≠ 0 then the resulting divergence Df will not be normalized (i.e. we will get Df(1∥1) ≠ 0). Finally, observe that (D.42) can be expressed in terms of the Kubo-Ando operator mean #f (see Definition B.5.1) as
Proof. We have already shown in (D.40) that Df(ρ∥σ) cannot be greater than the right-hand side of (D.42). Therefore, it is left to show the opposite inequality. Let 1 < n ∈ ℕ, let {Λx}x∈[n−1] be a POVM acting on the support of σ, and let p ∈ Prob(n). Suppose the conditions in Lemma D.2.1 hold with these elements. From Naimark's theorem (see Theorem 3.3.2) there exist a tuple of mutually orthogonal projectors {Px}x∈[n−1] ⊂ Pos(B), where B is the extended Hilbert space, and an isometry V : A → B such that Λx = V*PxV for all x ∈ [n − 1]. We use this to compute
\begin{align*}
\sum_{x\in[n-1]} q_x f\!\Bigl(\frac{p_x}{q_x}\Bigr) &= \sum_{x\in[n-1]}\mathrm{Tr}[\Lambda_x\tilde\sigma]\,f\!\Bigl(\frac{p_x}{q_x}\Bigr)\\
&= \mathrm{Tr}\Bigl[\sum_{x\in[n-1]} f\!\Bigl(\frac{p_x}{q_x}\Bigr)\Lambda_x\,\tilde\sigma\Bigr]\\
&= \mathrm{Tr}\Bigl[\sum_{x\in[n-1]} f\!\Bigl(\frac{p_x}{q_x}\Bigr)P_x\,V\tilde\sigma V^*\Bigr],
\end{align*}
where we put the sum inside the trace. Now, since {Px} are orthogonal projectors we have $\sum_{x\in[n-1]} f\bigl(\frac{p_x}{q_x}\bigr)P_x = f\bigl(\sum_{x\in[n-1]}\frac{p_x}{q_x}P_x\bigr)$. Combining this with the cyclic property of the trace we get
\begin{align*}
\sum_{x\in[n-1]} q_x f\!\Bigl(\frac{p_x}{q_x}\Bigr) &= \mathrm{Tr}\Bigl[V^*\,f\Bigl(\sum_{x\in[n-1]}\frac{p_x}{q_x}P_x\Bigr)V\,\tilde\sigma\Bigr]\\
\text{Jensen's inequality (B.30)}\rightarrow\quad &\geqslant \mathrm{Tr}\Bigl[f\Bigl(\sum_{x\in[n-1]}\frac{p_x}{q_x}V^*P_xV\Bigr)\tilde\sigma\Bigr] \tag{D.44}\\
&= \mathrm{Tr}\Bigl[f\Bigl(\sum_{x\in[n-1]}\frac{p_x}{q_x}\Lambda_x\Bigr)\tilde\sigma\Bigr].
\end{align*}
To continue, we first consider the case that f̃(0) = ∞. From the remark below (D.36) we know that Df(ρ∥σ) = ∞ unless supp(ρ) ⊆ supp(σ). Furthermore, if supp(ρ) ⊆ supp(σ) then we can replace the inequality sign of (D.29) with an equality, so that the above equation gives the desired inequality
\[
\sum_{x\in[n-1]} q_x f\!\Bigl(\frac{p_x}{q_x}\Bigr) \geqslant \mathrm{Tr}\Bigl[\sigma\,f\bigl(\sigma^{-\frac12}\rho\,\sigma^{-\frac12}\bigr)\Bigr] \tag{D.45}
\]
(recall that σ̃ = σ and ρ̃ = ρ if σ > 0; in the present context the condition σ > 0 is effectively the same as supp(ρ) ⊆ supp(σ), since we can restrict all computations to the support of σ). Note that we did not include the term pn f̃(0) on the left-hand side of the equation above, since in this case (i.e. the case f̃(0) = ∞) we must have pn = 0, so the term pn f̃(0) = 0 · ∞ = 0 by convention.
Next, we consider the case f̃(0) < ∞. In this case we know that f has the form (D.41). Hence, continuing from (D.44) we get
\begin{align*}
\sum_{x\in[n-1]} q_x f\!\Bigl(\frac{p_x}{q_x}\Bigr) &\geqslant f(0)+\tilde f(0)\,\mathrm{Tr}\Bigl[\sum_{x\in[n-1]}\frac{p_x}{q_x}\Lambda_x\,\tilde\sigma\Bigr]+\mathrm{Tr}\Bigl[g\Bigl(\sum_{x\in[n-1]}\frac{p_x}{q_x}\Lambda_x\Bigr)\tilde\sigma\Bigr]\\
\text{from (D.29) and } g \text{ operator monotone decreasing}\rightarrow\quad &\geqslant f(0)+\tilde f(0)\sum_{x\in[n-1]} p_x+\mathrm{Tr}\Bigl[g\bigl(\tilde\sigma^{-\frac12}\tilde\rho\,\tilde\sigma^{-\frac12}\bigr)\tilde\sigma\Bigr]\\
\text{using again }g(r)=f(r)-f(0)-\tilde f(0)r\rightarrow\quad &= \tilde f(0)\Bigl(\sum_{x\in[n-1]} p_x-\mathrm{Tr}[\tilde\rho]\Bigr)+\mathrm{Tr}\Bigl[f\bigl(\tilde\sigma^{-\frac12}\tilde\rho\,\tilde\sigma^{-\frac12}\bigr)\tilde\sigma\Bigr]. \tag{D.46}
\end{align*}
Hence,
\[
\sum_{x\in[n-1]} q_x f\!\Bigl(\frac{p_x}{q_x}\Bigr)+\tilde f(0)\,p_n \geqslant \tilde f(0)\bigl(1-\mathrm{Tr}[\tilde\rho]\bigr)+\mathrm{Tr}\Bigl[f\bigl(\tilde\sigma^{-\frac12}\tilde\rho\,\tilde\sigma^{-\frac12}\bigr)\tilde\sigma\Bigr]. \tag{D.47}
\]
Since the above inequality holds for any choice of 1 < n ∈ ℕ, POVM {Λx}x∈[n−1], and p ∈ Prob(n) that satisfy the conditions in Lemma D.2.1, we conclude that Df(ρ∥σ) is no smaller than the right-hand side of the equation above. This concludes the proof.
As an example, consider the function fα(r) = (r^α − r)/(α(α − 1)), which is known to be operator convex for α ∈ (0, 2]. For this function we have
\[
\tilde f_\alpha(0) = \lim_{\varepsilon\to 0^+}\varepsilon\, f_\alpha\!\Bigl(\frac1\varepsilon\Bigr) = \lim_{\varepsilon\to 0^+}\frac{\varepsilon^{1-\alpha}-1}{\alpha(\alpha-1)} = \begin{cases}\dfrac{1}{\alpha(1-\alpha)} & \text{if } 0<\alpha<1\\[2mm] \infty & \text{if } 1\leqslant\alpha\leqslant 2.\end{cases} \tag{D.48}
\]
For α ∈ [1, 2], unless supp(ρ) ⊆ supp(σ) we have Dfα(ρ∥σ) = ∞. For the case supp(ρ) ⊆ supp(σ) we have
\begin{align*}
D_{f_\alpha}(\rho\|\sigma) &= \frac{1}{\alpha(\alpha-1)}\mathrm{Tr}\Bigl[\sigma\Bigl(\bigl(\sigma^{-\frac12}\rho\,\sigma^{-\frac12}\bigr)^{\alpha}-\sigma^{-\frac12}\rho\,\sigma^{-\frac12}\Bigr)\Bigr]\\
&= \frac{1}{\alpha(\alpha-1)}\Bigl(\mathrm{Tr}\Bigl[\sigma\bigl(\sigma^{-\frac12}\rho\,\sigma^{-\frac12}\bigr)^{\alpha}\Bigr]-1\Bigr). \tag{D.49}
\end{align*}
On the other hand, for the case α ∈ (0, 1), Eq. (D.42) gives
\begin{align*}
D_{f_\alpha}(\rho\|\sigma) &= \frac{1}{\alpha(\alpha-1)}\Bigl(\mathrm{Tr}\Bigl[\tilde\sigma\bigl(\tilde\sigma^{-\frac12}\tilde\rho\,\tilde\sigma^{-\frac12}\bigr)^{\alpha}\Bigr]-\mathrm{Tr}[\tilde\rho]\Bigr)+\frac{1-\mathrm{Tr}[\tilde\rho]}{\alpha(1-\alpha)}\\
&= \frac{1}{\alpha(\alpha-1)}\Bigl(\mathrm{Tr}\Bigl[\tilde\sigma\bigl(\tilde\sigma^{-\frac12}\tilde\rho\,\tilde\sigma^{-\frac12}\bigr)^{\alpha}\Bigr]-1\Bigr). \tag{D.50}
\end{align*}
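As a sanity check (ours), in the commuting case Eq. (D.49) reduces to the classical f-divergence Σx qx fα(px/qx); the following sketch verifies this numerically for diagonal full-rank states (the parameters below are illustrative choices of ours):

```python
# Numerical sketch (ours) of (D.49) in the commuting case: for diagonal full-rank rho, sigma
# the quantum expression equals sum_x q_x f_alpha(p_x/q_x) with f_alpha(r) = (r^a - r)/(a(a-1)).
import numpy as np

rng = np.random.default_rng(3)
alpha, d = 1.5, 4
p = rng.random(d); p /= p.sum()
q = rng.random(d); q /= q.sum()
rho, sigma = np.diag(p), np.diag(q)

# Quantum side: (Tr[sigma (sigma^{-1/2} rho sigma^{-1/2})^alpha] - 1)/(alpha(alpha-1))
s_inv_half = np.diag(q ** -0.5)
X = s_inv_half @ rho @ s_inv_half
evals, evecs = np.linalg.eigh(X)
X_alpha = evecs @ np.diag(evals ** alpha) @ evecs.T
quantum = (np.trace(sigma @ X_alpha) - 1) / (alpha * (alpha - 1))

# Classical f-divergence with the same f_alpha
f_alpha = lambda r: (r ** alpha - r) / (alpha * (alpha - 1))
classical = np.sum(q * f_alpha(p / q))

assert np.isclose(quantum, classical)
print("D_{f_alpha} agrees with the classical formula in the commuting case:", quantum)
```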
Exercise D.2.2. Show that for α ∈ (0, 1) we have for any ρ, σ ∈ D(A)
Moreover, define ρ′ := GρG*/Tr[G*Gρ] and note that the above inequality implies that
\begin{align*}
\rho' &\leqslant \frac{G\rho G^*}{1-\delta}\\
&= \frac{\eta^{\frac12}(\eta+\delta\omega)^{-\frac12}\,\rho\,(\eta+\delta\omega)^{-\frac12}\eta^{\frac12}}{1-\delta} \tag{D.57}\\
\rho\leqslant\eta+\delta\omega\rightarrow\quad &\leqslant \frac{\eta}{1-\delta}.
\end{align*}
It is left to show that ρ′ ∈ Bε(ρ). Using similar arguments as in (10.129) and (??) we get that G*G ⩽ I^A and P := ½(G + G*) ⩽ I^A. Moreover, following similar arguments as in (10.134) we get that F(ρ, ρ′) ⩾ Tr[ρP]. Hence,
\begin{align*}
F(\rho,\rho') &\geqslant 1-\mathrm{Tr}[\rho(I-P)]\\
\rho\leqslant\eta+\delta\omega\ \text{and}\ I-P\geqslant0\rightarrow\quad &\geqslant 1-\mathrm{Tr}[(\eta+\delta\omega)(I-P)]\\
&= 1-\delta-\mathrm{Tr}[\eta]+\mathrm{Tr}[(\eta+\delta\omega)P] \tag{D.58}\\
\text{by definition of }P\rightarrow\quad &= 1-\delta-\mathrm{Tr}[\eta]+\mathrm{Tr}\bigl[\eta^{\frac12}(\eta+\delta\omega)^{\frac12}\bigr]\\
(\eta+\delta\omega)^{\frac12}\geqslant\eta^{\frac12}\rightarrow\quad &\geqslant 1-\delta.
\end{align*}
The above lemma can be used to bound the smoothed max relative entropy in terms of the following variant of it, defined via
\[
D_{\max}^{(\varepsilon)}(\rho\|\sigma) := \min_{\sigma'\in\mathfrak B_\varepsilon(\sigma)} D_{\max}(\rho\|\sigma') \qquad\forall\ \rho,\sigma\in\mathfrak D(A). \tag{D.60}
\]
That is, we use the brackets on ε to indicate that the smoothing is done with respect to the second argument of Dmax.
Lemma D.3.2. Let ρ, σ ∈ D(A) be such that $r := 2^{D_{\max}(\rho\|\sigma)} < \infty$. Then, for any 0 < ε < 1/r,
\[
D_{\max}^{\sqrt{2\varepsilon}}(\rho\|\sigma) \leqslant D_{\max}^{(\varepsilon)}(\rho\|\sigma)-\log(1-\varepsilon r). \tag{D.61}
\]
(ε′ )
Proof. Let σ ′ be such that Dmax (ρ∥σ) = Dmax (ρ∥σ ′ ). Since σ ′ ∈ Bε (σ) there exists 0 ⩽ δ ′ ⩽
ε′ and ω ′ , ω ∈ D(A) such that (cf. (5.170))
′ (ε′ )
Denote by t := 2Dmax (ρ∥σ ) = 2Dmax (ρ∥σ) . Then from its definition we have
ρ ⩽ tσ ′
⩽ t(σ ′ + δ ′ ω ′ ) (D.63)
= tσ + tδ ′ ω .
Hence, from Lemma D.3.1 there exists ρ′ ∈ Bε(ρ), with ε := √(2δ′) ⩽ √(2ε′), such that
\[
\rho' \leqslant \frac{t}{1-t\delta'}\,\sigma \leqslant \frac{t}{1-t\varepsilon'}\,\sigma. \tag{D.64}
\]
We therefore conclude that
\begin{align*}
D_{\max}^{\varepsilon}(\rho\|\sigma) &\leqslant D_{\max}(\rho'\|\sigma)\\
&\leqslant \log t-\log(1-t\varepsilon')\\
&= D_{\max}^{(\varepsilon')}(\rho\|\sigma)-\log\Bigl(1-\varepsilon'\,2^{D_{\max}^{(\varepsilon')}(\rho\|\sigma)}\Bigr) \tag{D.65}\\
&\leqslant D_{\max}^{(\varepsilon')}(\rho\|\sigma)-\log\Bigl(1-\varepsilon'\,2^{D_{\max}(\rho\|\sigma)}\Bigr).
\end{align*}
Proof of Theorem 8.6.1. Suppose first that all the components of q are rational numbers. That is, there exist k1, . . . , km ∈ ℕ such that q = (k1/k, . . . , km/k)^T, where k := k1 + · · · + km. From Theorem 4.3.2 the vector
\[
\mathbf r := \bigoplus_{x\in[m]} p_x\mathbf u^{(k_x)} = \Bigl(\underbrace{\tfrac{p_1}{k_1},\ldots,\tfrac{p_1}{k_1}}_{k_1\text{-times}},\ \underbrace{\tfrac{p_2}{k_2},\ldots,\tfrac{p_2}{k_2}}_{k_2\text{-times}},\ \ldots,\ \underbrace{\tfrac{p_m}{k_m},\ldots,\tfrac{p_m}{k_m}}_{k_m\text{-times}}\Bigr) \tag{D.66}
\]
satisfies (p, q) ∼ (r, u^(k)). Without loss of generality we assume that the components of p and q are ordered as in (4.116). Note that this is equivalent to r = r↓. Moreover, observe that the relation (p, q) ∼ (r, u^(k)) also implies that for any n ∈ ℕ we have (p⊗n, q⊗n) ∼ (r⊗n, u^(kⁿ)).
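The following tiny sketch (ours) makes the embedding (D.66) concrete for an illustrative rational q:

```python
# Sketch (ours) of the vector r in (D.66): for rational q = (k_x/k)_x, r lists p_x/k_x
# repeated k_x times, so that (p, q) ~ (r, u^(k)) with u^(k) the uniform distribution on [k].
import numpy as np

p = np.array([0.5, 0.3, 0.2])
ks = np.array([2, 3, 5])                       # q = (2/10, 3/10, 5/10)
k = ks.sum()
q = ks / k

r = np.concatenate([np.full(kx, px / kx) for px, kx in zip(p, ks)])
u = np.full(k, 1 / k)

assert np.isclose(r.sum(), 1.0)                # r is a probability vector
# Splitting outcome x into k_x equally likely sub-outcomes produces (r, u^(k));
# merging them back recovers (p, q), so the two pairs are interconvertible.
print("r =", r, "\nu =", u)
```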
We therefore get that for all n ∈ ℕ
\[
\frac1n D_{\min}^{\varepsilon}\bigl(\mathbf p^{\otimes n}\big\|\mathbf q^{\otimes n}\bigr) = \frac1n D_{\min}^{\varepsilon}\bigl(\mathbf r^{\otimes n}\big\|\mathbf u^{(k^n)}\bigr). \tag{D.67}
\]
Combining this with the upper bound in (8.148) we get that
\[
\frac1n D_{\min}^{\varepsilon}\bigl(\mathbf p^{\otimes n}\big\|\mathbf q^{\otimes n}\bigr) \leqslant -\frac1n\log b_{\ell_n} = -\frac1n\log\frac{\ell_n}{k^n} = \log(k)-\frac1n\log(\ell_n), \tag{D.68}
\]
where $b_{\ell_n} := \bigl\|\mathbf u^{(k^n)}\bigr\|_{(\ell_n)} = \ell_n/k^n$, and ℓn ∈ {0, 1, . . . , kⁿ − 1} is the integer satisfying
\[
\bigl\|\mathbf r^{\otimes n}\bigr\|_{(\ell_n)} < 1-\varepsilon \leqslant \bigl\|\mathbf r^{\otimes n}\bigr\|_{(\ell_n+1)}. \tag{D.69}
\]
where we will see shortly that the limits above exist. It is therefore left to estimate ℓn. For this purpose, we will estimate the sums in (D.69) using the notion of (weak) typicality. Observe first that each index j in (D.69) can be expressed as a sequence xn = (x1, . . . , xn) ∈ [m]^n (so that the components of r⊗n can be expressed as r_{xn} := r_{x1} · · · r_{xn}). Denote by Sℓn the set of the ℓn sequences that correspond to the largest probabilities r_{xn}. With these notations we have
\[
\sum_{j\in[\ell_n]}\bigl(\mathbf r^{\otimes n}\bigr)^{\downarrow}_j = \sum_{x^n\in S_{\ell_n}} r_{x^n}. \tag{D.72}
\]
Let δ > 0 be an arbitrarily small number. From the definition of Sℓn it follows that if there exists xn ∈ Sℓn such that r_{xn} < 2^{−n(H(r)+δ)}, then the set Sℓn contains the set of δ-typical sequences, Tn,δ(X). Therefore, in this case the sum above is greater than Pr(Tn,δ(X)). However, for sufficiently large n this probability exceeds 1 − ε, in contradiction with (D.69). Therefore, without loss of generality we can assume that for sufficiently large n all the sequences xn ∈ Sℓn satisfy r_{xn} ⩾ 2^{−n(H(r)+δ)}. Combining this with the first bound in (D.69) we have
\[
1-\varepsilon > \sum_{x^n\in S_{\ell_n}} r_{x^n} \geqslant \ell_n\,2^{-n(H(\mathbf r)+\delta)}. \tag{D.73}
\]
Hence, ℓn ⩽ (1 − ε)2^{n(H(r)+δ)}, which gives
\[
\limsup_{n\to\infty}\frac1n\log(\ell_n) \leqslant H(\mathbf r)+\delta. \tag{D.74}
\]
Next, from the second bound in (D.69) we have
\begin{align*}
1-\varepsilon &\leqslant \sum_{x^n\in S_{\ell_n+1}} r_{x^n}\\
&\leqslant \sum_{x^n\in S_{\ell_n+1}\cap T_{n,\delta}(X)} r_{x^n}+\sum_{x^n\notin T_{n,\delta}(X)} r_{x^n}. \tag{D.75}
\end{align*}
This completes the proof for the case that q has rational components. For the general case, let {sk}, {rk} ∈ Prob(m) ∩ ℚ^m be two sequences of probability vectors with rational components such that both sk → q and rk → q, and in addition
\[
(\mathbf p,\mathbf s_k) \succ (\mathbf p,\mathbf q) \succ (\mathbf p,\mathbf r_k). \tag{D.79}
\]
The existence of such sequences follows from Exercise 4.3.24. Therefore, since $D^{\varepsilon}_{\min}$ is a divergence we have
\[
\frac1n D^{\varepsilon}_{\min}\bigl(\mathbf p^{\otimes n}\big\|\mathbf s_k^{\otimes n}\bigr) \geqslant \frac1n D^{\varepsilon}_{\min}\bigl(\mathbf p^{\otimes n}\big\|\mathbf q^{\otimes n}\bigr) \geqslant \frac1n D^{\varepsilon}_{\min}\bigl(\mathbf p^{\otimes n}\big\|\mathbf r_k^{\otimes n}\bigr). \tag{D.80}
\]
Taking on all sides the limits n → ∞ followed by k → ∞ completes the proof.
Exercise D.4.1. Give more details for the last argument involving (D.80) and the limits
n → ∞ and k → ∞. For the limit n → ∞, consider two cases involving lim inf and lim sup
and then conclude at the end that the limit exists.
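The following sketch (ours) illustrates the typicality estimate behind (D.69)–(D.74) numerically: for a small alphabet we compute ℓn by brute force and compare (1/n)log ℓn with H(r); the distribution and ε below are illustrative choices.

```python
# Numerical sketch (ours): compute the integer ell_n of (D.69) directly and compare
# (1/n) log2(ell_n) with the Shannon entropy H(r), in line with (D.74).
import numpy as np

r = np.array([0.6, 0.3, 0.1])
H = -np.sum(r * np.log2(r))
eps = 0.1

for n in (4, 8, 12):
    probs = r.copy()
    for _ in range(n - 1):
        probs = np.outer(probs, r).ravel()     # components of r^{tensor n}
    cum = np.cumsum(np.sort(probs)[::-1])      # cum[j] = sum of the j+1 largest components
    ell = int(np.searchsorted(cum, 1 - eps))   # largest ell with top-ell mass < 1 - eps
    print(f"n={n:2d}:  (1/n) log2(ell_n) = {np.log2(max(ell, 1)) / n:.3f}   H(r) = {H:.3f}")
```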
Next, we provide an alternative, more traditional, proof. This proof is based on the AEP property.

Alternative proof of Theorem 8.6.1. Let ε > 0 and let t be a probabilistic hypothesis test on [m]^n satisfying αn(t) ⩽ ε. Observe first that any probabilistic hypothesis test satisfies
\begin{align*}
\beta_n(\mathbf t) &= \sum_{x^n\in[m]^n} t_{x^n}\,q_{x^n}\\
&\geqslant \sum_{x^n\in\mathcal R^{\varepsilon}_n} t_{x^n}\,q_{x^n} \tag{D.81}\\
(8.64)\rightarrow\quad &\geqslant 2^{-n(D(\mathbf p\|\mathbf q)+\varepsilon)}\sum_{x^n\in\mathcal R^{\varepsilon}_n} t_{x^n}\,p_{x^n}.
\end{align*}
Dividing by n and taking the limit n → ∞ we get that for all δ > 0
\[
\limsup_{n\to\infty}\frac1n\,\mathrm{Cost}^{\varepsilon}\bigl(\psi^{\otimes n}\bigr) \leqslant E\bigl(\psi^{AB}\bigr)+\delta. \tag{D.88}
\]
Since the above inequality holds for all δ > 0 we must have for all ε ∈ (0, 1)
\[
\limsup_{n\to\infty}\frac1n\,\mathrm{Cost}^{\varepsilon}\bigl(\psi^{\otimes n}\bigr) \leqslant E\bigl(\psi^{AB}\bigr). \tag{D.89}
\]
Conversely, fix ε ∈ (0, 1) and for each n ∈ ℕ let mn ∈ [dⁿ] be the smallest integer satisfying ∥p⊗n∥(mn) ⩾ 1 − ε. Denote by Sn ⊂ [d]^n the set of mn sequences xn with the highest probabilities p_{xn}. By definition, we have Costε(ψ⊗n) = log(mn), ∥p⊗n∥(mn) = Pr(Sn), and |Sn| = mn. Suppose now, by contradiction, that there exists r < E(ψ^{AB}) such that
\[
\liminf_{n\to\infty}\frac1n\,\mathrm{Cost}^{\varepsilon}\bigl(\psi^{\otimes n}\bigr) \leqslant r. \tag{D.90}
\]
This means that for any k ∈ ℕ there exists n ⩾ k such that |Sn| = mn ⩽ 2^{rn}. However, from the third part of Theorem 8.1.3 (particularly Exercise 8.1.6) it follows that for any
Hence,
\begin{align*}
\limsup_{n\to\infty}\frac1n\,\mathrm{Distill}^{\varepsilon}\bigl(\psi^{\otimes n}\bigr) &\leqslant \limsup_{n\to\infty}\frac1n\log\left\lfloor\frac{k_n}{\|\mathbf p^{\otimes n}\|_{(k_n)}-\varepsilon}\right\rfloor\\
\|\mathbf p^{\otimes n}\|_{(k_n)}\xrightarrow{\ n\to\infty\ }1\rightarrow\quad &= \limsup_{n\to\infty}\frac1n\log\frac{k_n}{1-\varepsilon} \tag{D.96}\\
&= E\bigl(\psi^{AB}\bigr)+\delta.
\end{align*}
Since the above inequality holds for all δ ∈ (0, 1) we must have
\[
\limsup_{n\to\infty}\frac1n\,\mathrm{Distill}^{\varepsilon}\bigl(\psi^{\otimes n}\bigr) \leqslant E\bigl(\psi^{AB}\bigr). \tag{D.97}
\]
\[
\liminf_{n\to\infty}\frac1n\,\mathrm{Distill}^{\varepsilon}\bigl(\psi^{\otimes n}\bigr) \leqslant r. \tag{D.98}
\]
For each n ∈ ℕ let kn be such that
\[
\mathrm{Distill}^{\varepsilon}\bigl(\psi^{\otimes n}\bigr) = \log\left\lfloor\frac{k_n}{\|\mathbf p^{\otimes n}\|_{(k_n)}-\varepsilon}\right\rfloor. \tag{D.99}
\]
Then, from the two equations above we get that for any a ∈ ℕ there exists n ⩾ a such that
\[
\left\lfloor\frac{k_n}{\|\mathbf p^{\otimes n}\|_{(k_n)}-\varepsilon}\right\rfloor \leqslant 2^{rn}, \tag{D.100}
\]
so that
\[
k_n \leqslant 2^{n\bigl(r+\frac1n\log(1-\varepsilon)\bigr)}+1-\varepsilon. \tag{D.102}
\]
Hence, for sufficiently large n we have kn ⩽ 2^{nr′} for some r′ < Distill(ψ^{AB}). For each n ∈ ℕ denote by Sn ⊂ [d]^n the set of kn sequences xn with the highest probabilities p_{xn}. To summarize, we get that for any a ∈ ℕ there exists n ⩾ a such that ∥p⊗n∥(kn) = Pr(Sn) and |Sn| = kn ⩽ 2^{nr′}. However, from the third part of Theorem 8.1.3 (particularly Exercise 8.1.6) it follows that there exists n sufficiently large such that Pr(Sn) ⩽ ε, in contradiction with the assumption that ∥p⊗n∥(kn) > ε. Hence, we must have
\[
\liminf_{n\to\infty}\frac1n\,\mathrm{Distill}^{\varepsilon}\bigl(\psi^{\otimes n}\bigr) \geqslant \mathrm{Distill}\bigl(\psi^{AB}\bigr). \tag{D.103}
\]
The proof is concluded by comparing the above inequality with (D.97).
Note that the identity element belongs to S, and since χψ(g⁻¹) = χψ(g) we get that if g ∈ S then also g⁻¹ ∈ S. Let H be the group generated by S; that is,
\[
H := \langle S\rangle := \bigl\{g_1\cdots g_n \,:\, g_1,\ldots,g_n\in S,\ n\in\mathbb N\bigr\}. \tag{D.105}
\]
With these definitions we get that if ψ and ϕ are G-equivalent, then (15.182) still holds if we replace G by H.

Remark. Note that if it is possible to extend the one-dimensional representation {e^{iθh}}h∈H from H to G, then in such cases (D.106) holds for all h ∈ G (recall that ⟨ψ|Ug|ψ⟩ = ⟨ϕ|Ug|ϕ⟩ = 0 for g ∉ H). Therefore, in such cases we get again the equivalence of Theorem 15.4.2. However, such extensions of {e^{iθh}}h∈H do not always exist (see Exercise D.6.1).
Proof. Following the same steps in the proof of Theorem 15.4.2 that led to (15.188), it follows
that |χφ1 (h)| = |χφ2 (h)| = 1 for all h ∈ S and in particular
for some phases θh ∈ [0, 2π). Note that the equation above implies that for all h ∈ S
and we get
θgh = θg + θh mod 2π . (D.110)
Therefore, the set {eiθh : h ∈ S} can be completed to a 1-dimensional unitary representation
of H := ⟨S⟩. Indeed, for any element h ∈ H that is not in S, there exists n ∈ N and
g1 , . . . , gn ∈ S such that h = g1 · · · gn . For such h we define
Note that θh above is well defined, since if we also have h = k1 · · · km for some k1, . . . , km ∈ S then
\[
U_h|\varphi_1\rangle = U_{g_1}\cdots U_{g_n}|\varphi_1\rangle = e^{i\sum_{x\in[n]}\theta_{g_x}}|\varphi_1\rangle \qquad\text{and} \tag{D.112}
\]
\[
U_h|\varphi_1\rangle = U_{k_1}\cdots U_{k_m}|\varphi_1\rangle = e^{i\sum_{y\in[m]}\theta_{k_y}}|\varphi_1\rangle, \tag{D.113}
\]
so that we must have $\sum_{y\in[m]}\theta_{k_y} = \sum_{x\in[n]}\theta_{g_x}$ mod 2π. We therefore conclude that h ↦ e^{iθh}
is a 1-dimensional representation of the subgroup H of G. The proof is concluded with
the observation that for any g ∈ H that is not in S we have by the definition of S that
χψ (g) = χϕ (g) = 0. Therefore, in this case (D.106) holds trivially.
Exercise D.6.1. Consider the group SU(2) and its 2-element subgroup H = {I2, −I2}. Show that the 1-dimensional unitary representation of H that takes I2 to 1 and −I2 to −1 cannot be extended to a 1-dimensional representation of SU(2). In other words, show that there is no 1-dimensional unitary representation of SU(2) that maps I2 to 1 and −I2 to −1.
Theorem. Let (ρ, γ) and (σ, γ̃) be two quasi-classical states of systems A and A′ ,
respectively. The following statements are equivalent:
Proof. Let p, q, g, g̃ be the probability vectors whose components are the diagonals of ρ, σ, γ, γ̃, respectively. From the second statement of the theorem and (17.67) we have that (p, g) ≻ (q, g̃). To show that $(\mathbf p,\mathbf g)\xrightarrow{\ \text{CTO}\ }(\mathbf q,\tilde{\mathbf g})$ we will construct a sequence of thermal operations such that their limit maps (p, g) to (q, g̃). For convenience, we will think of X := A and Y := A′ as two classical systems, and consider the Gibbs state of system XⁿYⁿ. This Gibbs state can be written as
\[
\gamma^{\otimes n}\otimes\tilde\gamma^{\otimes n} = \sum_{\substack{x^n\in[m]^n\\ y^n\in[k]^n}} g_{x^n}\,\tilde g_{y^n}\,|x^ny^n\rangle\langle x^ny^n|. \tag{D.114}
\]
Our goal is to construct an energy preserving unitary (in fact a permutation) U such that the state
\[
\omega_n := \mathrm{Tr}_{X^nY^{n-1}}\Bigl[U\bigl(\rho\otimes\gamma^{\otimes(n-1)}\otimes\tilde\gamma^{\otimes(n-1)}\otimes\tilde\gamma\bigr)\Bigr] \tag{D.116}
\]
goes to σ as n goes to infinity; see Fig. D.1. We take three steps towards that goal:
We project the state in (D.115) to the strongly typical subspace. Specifically, let Tn(X) and Tn(Y) be the sets of all ε-strongly-typical sequences with ε = 1/n^{1/3}; i.e.,
Then, the projection of the initial state in (D.115) to the corresponding typical subspace is given by the sub-normalized state
\[
\eta^{X^nY^n} := \sum_{\substack{x^n\in T_n(X)\\ y^n\in T_n(Y)}}\frac{p_{x_1}}{g_{x_1}}\,g_{x^n}\,\tilde g_{y^n}\,|x^ny^n\rangle\langle x^ny^n|. \tag{D.118}
\]
From the theorem of strongly typical sequences, the state above has the following property.

Lemma D.7.1.
\[
\frac12\Bigl\|\eta^{X^nY^n}-\rho\otimes\gamma^{\otimes(n-1)}\otimes\tilde\gamma^{\otimes n}\Bigr\|_1 \xrightarrow{\ n\to\infty\ } 0. \tag{D.119}
\]
Proof. First, observe that $\frac{1}{g_{x_1}}g_{x^n} = g_{x^{n-1}}$ with x^{n−1} := (x2, . . . , xn), so that
\begin{align*}
\mathrm{Tr}\bigl[\eta^{X^nY^n}\bigr] &= \sum_{x^n\in T_n(X)}\ \sum_{y^n\in T_n(Y)} p_{x_1}\,g_{x^{n-1}}\,\tilde g_{y^n}\\
\text{Hoeffding's inequality (??)}\rightarrow\quad &\geqslant \bigl(1-e^{-2n^{1/3}}\bigr)\sum_{x^n\in T_n(X)} p_{x_1}\,g_{x^{n-1}}, \tag{D.120}
\end{align*}
where the limit follows from the fact that for very large n we have (n − 1)ε′ₙ² ≈ n^{1/3}.
For any s ∈ Type(n, m) and t ∈ Type(n, k) we denote by ⟨x, s, t, ·⟩ the set of all sequences (x, x^{n−1}, y^n) with the same x, the same type s of x^n, and the same type t of y^n. Similarly, ⟨·, s, t, y⟩ denotes the set of all sequences (x^n, y^{n−1}, y) with the same y, the same type s of x^n, and the same type t of y^n. Fix s ∈ Type(n, m) and t ∈ Type(n, k), and denote the cardinalities of these sets by ax := |⟨x, s, t, ·⟩| and by := |⟨·, s, t, y⟩|. Observe that (see Exercise D.7.1)
\[
a_x = s_x\,|x^n(\mathbf s)|\,|y^n(\mathbf t)| \qquad\text{and}\qquad b_y = t_y\,|x^n(\mathbf s)|\,|y^n(\mathbf t)|. \tag{D.124}
\]
Now, let R = (r_{y|x}) be the column stochastic matrix satisfying Rp = q and Rg = g̃. For any x ∈ [m] and y ∈ [k] we define, by induction on x,
\[
\ell_{yx} := \min\Bigl\{\,b_y-\sum_{x'<x}\ell_{yx'}\,,\ \bigl\lfloor r_{y|x}\,a_x\bigr\rfloor\Bigr\}, \tag{D.125}
\]
where for x = 1 we use the convention that $\sum_{x'<1}\ell_{yx'} := 0$. Observe that each integer ℓ_{yx} ⩾ 0 (see Exercise D.7.1) and the sum
\[
\sum_{y\in[k]}\ell_{yx} \leqslant \sum_{y\in[k]}\bigl\lfloor r_{y|x}\,a_x\bigr\rfloor \leqslant \sum_{y\in[k]} r_{y|x}\,a_x = a_x. \tag{D.126}
\]
Therefore, for each x ∈ [m] there exist k disjoint sets {L_{yx}}_{y∈[k]} with each L_{yx} ⊂ ⟨x, s, t, ·⟩ and with |L_{yx}| = ℓ_{yx}. Now, observe also that from their definition, the integers {ℓ_{yx}} satisfy
\[
\sum_{x\in[m]}\ell_{yx} \leqslant b_y. \tag{D.127}
\]
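The inductive definition (D.125) and the two properties (D.126)–(D.127) can be checked directly; the following sketch (ours, with illustrative values of ax and by consistent with (D.124)) does so:

```python
# Sketch (ours) of the inductive definition (D.125) of ell_{yx}, checking (D.126)-(D.127).
# R is column stochastic (columns indexed by x, rows by y); a_x, b_y are illustrative counts.
import numpy as np

rng = np.random.default_rng(5)
m, k = 3, 4
R = rng.random((k, m)); R /= R.sum(axis=0)     # column stochastic: sum_y r_{y|x} = 1

N = 10_000                                     # plays the role of |x^n(s)||y^n(t)| in (D.124)
s = np.array([0.5, 0.3, 0.2])
t = R @ s
a = np.rint(s * N).astype(int)                 # a_x
b = np.rint(t * N).astype(int)                 # b_y

ell = np.zeros((k, m), dtype=int)
for x in range(m):
    for y in range(k):
        ell[y, x] = min(b[y] - ell[y, :x].sum(), int(np.floor(R[y, x] * a[x])))   # (D.125)

assert np.all(ell >= 0)
assert np.all(ell.sum(axis=0) <= a)            # (D.126): sum_y ell_{yx} <= a_x
assert np.all(ell.sum(axis=1) <= b)            # (D.127): sum_x ell_{yx} <= b_y
print("ell_{yx}/a_x ≈ r_{y|x}:\n", np.round(ell / a, 3), "\nR =\n", np.round(R, 3))
```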
\[
f : \langle\cdot,\mathbf s,\mathbf t,\cdot\rangle \to \langle\cdot,\mathbf s,\mathbf t,\cdot\rangle \tag{D.130}
\]
Observe that the bijection f = f_{s,t} as defined above can be defined for any s ∈ Type(n, m) and t ∈ Type(n, k). Therefore, the set of bijections {f_{s,t}}_{s,t} can be used to define the bijection
\[
\pi_n(x^ny^n) := f_{\mathbf s,\mathbf t}(x^ny^n), \tag{D.131}
\]
where s is the type of x^n and t is the type of y^n. Observe that πn is a thermal operation since it does not change s and t (hence it preserves the energy). We define the unitary (permutation) channel U ∈ CPTP(XⁿYⁿ → XⁿYⁿ) as
\[
\frac{p_x}{g_x}\,g_{x^n}\,\tilde g_{y^n} = \frac{p_x}{g_x}\,2^{-n\bigl(H(\mathbf s)+H(\mathbf t)+D(\mathbf s\|\mathbf g)+D(\mathbf t\|\tilde{\mathbf g})\bigr)}, \tag{D.133}
\]
where x1 := x and we used (8.85) twice, with s and t being the types of x^n and y^n, respectively. Denoting $c_{\mathbf s,\mathbf t} := 2^{-n(H(\mathbf s)+H(\mathbf t)+D(\mathbf s\|\mathbf g)+D(\mathbf t\|\tilde{\mathbf g}))}$, we get from the definition of η^{XⁿYⁿ} in (D.118) that ω′n can be expressed as
\[
\omega'_n = \sum_{(\mathbf s,\mathbf t)\in\mathcal C_n} c_{\mathbf s,\mathbf t}\sum_{x\in[m]}\frac{p_x}{g_x}\sum_{(x^n,y^n)\in\langle x,\mathbf s,\mathbf t,\cdot\rangle}\mathrm{Tr}_{X^nY^{n-1}}\bigl[|\pi_n(x^ny^n)\rangle\langle\pi_n(x^ny^n)|\bigr]. \tag{D.134}
\]
Next, instead of summing over all the elements of ⟨x, s, t, ·⟩ (in the third sum above), we will restrict the summation only to sequences (x^n, y^n) that belong to the subset ⋃_{y∈[k]} L_{yx} ⊂ ⟨x, s, t, ·⟩, and we will show that the remaining terms are negligible (i.e. they go to zero as n goes to infinity). That is, we define
\[
\omega''_n = \sum_{(\mathbf s,\mathbf t)\in\mathcal C_n} c_{\mathbf s,\mathbf t}\sum_{x\in[m]}\sum_{y\in[k]}\sum_{(x^n,y^n)\in L^{\mathbf s,\mathbf t}_{yx}}\frac{p_x}{g_x}\,\mathrm{Tr}_{X^nY^{n-1}}\bigl[|\pi_n(x^ny^n)\rangle\langle\pi_n(x^ny^n)|\bigr], \tag{D.136}
\]
where we added the s, t superscript to L_{yx} since it depends on the types s and t. We show now that ω″n → σ as n → ∞, and since Tr[σ] = 1 we must have ∥ω′n − ω″n∥1 → 0 as n → ∞.
Now, observe that for (x^n, y^n) ∈ L^{s,t}_{yx} we have that πn(x^ny^n) ∈ ⟨·, s, t, y⟩, so that the last component of the sequence πn(x^ny^n) is y. Therefore, we get that
\begin{align*}
\omega''_n &= \sum_{(\mathbf s,\mathbf t)\in\mathcal C_n} c_{\mathbf s,\mathbf t}\sum_{x\in[m]}\sum_{y\in[k]}\sum_{(x^n,y^n)\in L^{\mathbf s,\mathbf t}_{yx}}\frac{p_x}{g_x}\,|y\rangle\langle y|^{Y}\\
&= \sum_{(\mathbf s,\mathbf t)\in\mathcal C_n} c_{\mathbf s,\mathbf t}\sum_{x\in[m]}\sum_{y\in[k]}\frac{p_x}{g_x}\,\ell^{\mathbf s,\mathbf t}_{yx}\,|y\rangle\langle y|^{Y}. \tag{D.137}
\end{align*}
Lemma D.7.2. Let {(sn, tn)}n∈ℕ be a sequence of pairs of types such that (sn, tn) ∈ Cn. Denote by ℓ^{(n)}_{yx} the coefficients (D.125) that correspond to the pair of types (sn, tn). Then,
\[
r_{y|x} = \lim_{n\to\infty}\frac{\ell^{(n)}_{yx}}{a^{(n)}_x}. \tag{D.138}
\]
where the last equality follows from the fact that g̃ = Rg, so that
\[
\tilde g_y = \sum_{x\in[m]} r_{y|x}\,g_x \geqslant r_{y|1}\,g_1. \tag{D.140}
\]
Now, fix x ∈ [m] and suppose that the limit (D.138) (with x being replaced by x′) holds for all x′ < x. We need to show that it also holds for x. Indeed,
\begin{align*}
\lim_{n\to\infty}\frac{\ell^{(n)}_{yx}}{a^{(n)}_x} &= \lim_{n\to\infty}\min\left\{\frac{t^{(n)}_y}{s^{(n)}_x}-\frac{\sum_{x'<x}\ell^{(n)}_{yx'}}{a^{(n)}_x}\,,\ \frac{\bigl\lfloor r_{y|x}\,a^{(n)}_x\bigr\rfloor}{a^{(n)}_x}\right\} \tag{D.141}\\
&= \min\left\{\frac{\tilde g_y}{g_x}-\sum_{x'<x}\lim_{n\to\infty}\frac{\ell^{(n)}_{yx'}}{a^{(n)}_x}\,,\ r_{y|x}\right\}.
\end{align*}
Since we assume by induction that (D.138) holds if we replace x with x′ < x, we get
\begin{align*}
\lim_{n\to\infty}\frac{\ell^{(n)}_{yx'}}{a^{(n)}_x} &= \lim_{n\to\infty}\frac{a^{(n)}_{x'}}{a^{(n)}_x}\,\frac{\ell^{(n)}_{yx'}}{a^{(n)}_{x'}}\\
&= \lim_{n\to\infty}\frac{s^{(n)}_{x'}}{s^{(n)}_x}\,\frac{\ell^{(n)}_{yx'}}{a^{(n)}_{x'}} \tag{D.142}\\
&= \frac{g_{x'}}{g_x}\lim_{n\to\infty}\frac{\ell^{(n)}_{yx'}}{a^{(n)}_{x'}}\\
\text{By induction}\rightarrow\quad &= \frac{g_{x'}}{g_x}\,r_{y|x'}.
\end{align*}
Substituting this into (D.141) we get that
\[
\lim_{n\to\infty}\frac{\ell^{(n)}_{yx}}{a^{(n)}_x} = \min\Bigl\{\frac{\tilde g_y}{g_x}-\frac{1}{g_x}\sum_{x'<x} r_{y|x'}\,g_{x'}\,,\ r_{y|x}\Bigr\} = r_{y|x}. \tag{D.143}
\]
D.8 Continuity
In Section 17.6.1, we calculated the conversion distance between two athermality states within the quasi-classical regime. Notably, in Theorem 17.6.2, we postulated that the Gibbs states g and g′ possess positive rational components. In this section, we demonstrate that the conversion distance is continuous. Consequently, one can utilize Theorem 17.6.2 to approximate the conversion distance with arbitrary precision, even when g and g′ have irrational components.
For this purpose, we fix two probability vectors p^A ∈ Prob(m) and q^B ∈ Prob(n) and define for all g^A ∈ Prob(m) and g^B ∈ Prob(n) the function
\[
f\bigl(\mathbf g^A,\mathbf g^B\bigr) := T\Bigl((\mathbf p^A,\mathbf g^A)\xrightarrow{\ F\ }(\mathbf q^B,\mathbf g^B)\Bigr). \tag{D.146}
\]
Moreover, fix g^A, g′^A ∈ Prob(m) and g^B, g′^B ∈ Prob(n), and denote δ := ½∥g^A − g′^A∥₁ and ε := ½∥g^B − g′^B∥₁. Furthermore, let g^B_min and g′^B_min be the smallest components of g^B and g′^B, respectively. With these notations we prove the following continuity lemma.
\[
\Bigl|f\bigl(\mathbf g'^A,\mathbf g^B\bigr)-f\bigl(\mathbf g^A,\mathbf g^B\bigr)\Bigr| \leqslant \frac{2\delta}{g^B_{\min}}. \tag{D.148}
\]
\begin{align*}
f\bigl(\mathbf g^A,\mathbf g'^B\bigr) &\leqslant \frac12\bigl\|\mathbf q^B-\mathbf q''^B\bigr\|_1\\
\text{Triangle inequality}\rightarrow\quad &\leqslant \frac12\bigl\|\mathbf q^B-\mathbf q'^B\bigr\|_1+\frac12\bigl\|\mathbf q'^B-\mathbf q''^B\bigr\|_1 \tag{D.151}\\
&\leqslant f\bigl(\mathbf g^A,\mathbf g^B\bigr)+\frac{\varepsilon}{g^B_{\min}}.
\end{align*}
For the converse inequality, observe that by repeating the exact same lines as above, exchanging everywhere g^B with g′^B, we get
\[
f\bigl(\mathbf g^A,\mathbf g^B\bigr) \leqslant f\bigl(\mathbf g^A,\mathbf g'^B\bigr)+\frac{\varepsilon}{g'^B_{\min}}. \tag{D.152}
\]
This completes the proof of the inequality (D.147).
For the proof of (D.148), as before, let q′^B be optimal such that (D.149) holds. We would like to find a vector q″^B that is close to q′^B and that satisfies (p^A, g′^A) ≻ (q″^B, g^B). Since (p^A, g^A) ≻ (q′^B, g^B) there exists a column stochastic matrix E such that Ep^A = q′^B and Eg^A = g^B. Denote g′^B := Eg′^A, and observe that since g^A is δ-close to g′^A, also g^B is δ-close to g′^B (DPI under E). Moreover, by definition we have (p^A, g′^A) ≻ (q′^B, g′^B). Now, from Lemma 4.3.4 there exists q″^B such that
\[
\bigl(\mathbf q'^B,\mathbf g'^B\bigr) \succ \bigl(\mathbf q''^B,\mathbf g^B\bigr) \qquad\text{and}\qquad \frac12\bigl\|\mathbf q'^B-\mathbf q''^B\bigr\|_1 \leqslant \frac{\delta}{g'^B_{\min}}. \tag{D.153}
\]
Since (p^A, g′^A) ≻ (q′^B, g′^B) we also have (p^A, g′^A) ≻ (q″^B, g^B). Hence,
\begin{align*}
f\bigl(\mathbf g'^A,\mathbf g^B\bigr) &\leqslant \frac12\bigl\|\mathbf q^B-\mathbf q''^B\bigr\|_1\\
\text{Triangle inequality}\rightarrow\quad &\leqslant \frac12\bigl\|\mathbf q^B-\mathbf q'^B\bigr\|_1+\frac12\bigl\|\mathbf q'^B-\mathbf q''^B\bigr\|_1\\
&\leqslant f\bigl(\mathbf g^A,\mathbf g^B\bigr)+\frac{\delta}{g'^B_{\min}} \tag{D.154}\\
\mathbf g'^B\approx_\delta\mathbf g^B\rightarrow\quad &\leqslant f\bigl(\mathbf g^A,\mathbf g^B\bigr)+\frac{\delta}{g^B_{\min}-\delta}\\
\delta\leqslant\tfrac12 g^B_{\min}\rightarrow\quad &\leqslant f\bigl(\mathbf g^A,\mathbf g^B\bigr)+\frac{2\delta}{g^B_{\min}}.
\end{align*}
The opposite inequality can be obtained using the exact same lines as above by exchanging the roles of g^A and g′^A. This completes the proof of (D.148).
Observe that the set M(p, q, k) is convex and consists of all two-column matrices [r s] with probability vectors r, s ∈ Prob(k) for which (p, q) ≻ (r, s).
1. (p, q) ≻ (p′ , q′ )
2. For all k ∈ N
M(p, q, k) ⊇ M(p′ , q′ , k) . (D.156)
Proof. Suppose first that (p, q) ≻ (p′, q′) and fix k ∈ ℕ. Then, if the two-column matrix [r′ s′] ∈ M(p′, q′, k), we get by definition that (p′, q′) ≻ (r′, s′), so that from the transitivity of relative majorization we get that also (p, q) ≻ (r′, s′). That is, the two-column matrix [r′ s′] ∈ M(p, q, k). Hence, the inclusion in (D.156) must hold.

Conversely, suppose (D.156) holds for all k ∈ ℕ. In particular, it holds for k = m. In this case we have
\[
[\mathbf p'\ \ \mathbf q'] \in \mathfrak M(\mathbf p',\mathbf q',m) \subseteq \mathfrak M(\mathbf p,\mathbf q,m), \tag{D.157}
\]
so that there exists E ∈ STOCH(m, n) such that [p′ q′] = [Ep Eq]; i.e. (p, q) ≻ (p′, q′).
We therefore proved the equivalence between relative majorization and the inclusion
relations between the sets in (D.156). The significance of this observation is that now we can
make use of the fact that inclusion relation between compact sets is related to inequalities
between their support functions. Specifically, from Theorem A.7.1 we know that two compact
sets C1 and C2 satisfy C1 ⊆ C2 if and only if their support functions fC1 and fC2 satisfy
fC1 ⩽ fC2 everywhere in their domain. The support function of M(p, q, k), denoted by
fM(p,q,k) : Rk×2 → R, is given for any two-column matrix S = [s1 s2 ] ∈ Rk×2 by
Therefore, the lemma above in conjunction with Theorem A.7.1 implies that (p, q) ≻ (p′ , q′ )
if and only if for all k ∈ N and all S ∈ Rk×2
Our next goal is therefore to compute the optimization problem given in (D.158) for the
support function.
1. (p, q) ≻ (p′ , q′ )
Remark. The expressions appearing on the right-hand side and left-hand side of (D.160)
are precisely the support functions given in (D.159). They are known as sublinear function-
als. This lemma provides a characterization for relative majorization in terms of sublinear
functionals (see subsection A.7).
Proof. Denoting by {e_{z|x}}_{x∈[n], z∈[k]} the components (conditional probabilities) of E, and by {s_{z1}}_{z∈[k]} and {s_{z2}}_{z∈[k]} the components of s1 and s2, we continue from (D.158):
\begin{align*}
f_{\mathfrak M(\mathbf p,\mathbf q,k)}(S) &= \max_{E\in\mathrm{STOCH}(k,n)}\sum_{z\in[k]}\sum_{x\in[n]}\bigl(s_{z1}e_{z|x}p_x+s_{z2}e_{z|x}q_x\bigr)\\
&= \max_{E\in\mathrm{STOCH}(k,n)}\sum_{x\in[n]}\sum_{z\in[k]} e_{z|x}\bigl(s_{z1}p_x+s_{z2}q_x\bigr) \tag{D.161}\\
\text{Exercise D.9.1}\rightarrow\quad &= \sum_{x\in[n]}\max_{z\in[k]}\bigl\{s_{z1}p_x+s_{z2}q_x\bigr\}.
\end{align*}
Combining this with (D.159) completes the proof of the equivalence between (p, q) ≻ (p′, q′) and (D.160) with arbitrary vectors v1, . . . , vk ∈ ℝ². It is left to show that we can assume that v1, . . . , vk ∈ ℝ²₊. Indeed, for each j ∈ [k] let v′j := vj + (r, r)^T ⩾ 0 for some sufficiently large r > 0. In Exercise D.9.2 below you show that the inequality in (D.160) holds with v1, . . . , vk if and only if it holds with v′1, . . . , v′k. Hence, without loss of generality we can assume that all the vectors v1, . . . , vk have non-negative components.
Exercise D.9.1. Explain in more detail the derivation in the last line of (D.161).
Exercise D.9.2. Show that for sufficiently large r > 0, the inequality in (D.160) holds with
v1 , . . . , vk if and only if it holds with v1′ , . . . , vk′ , where for each j ∈ [k], vj′ := vj +(r, r)T ⩾ 0.
The next Lemma is a crucial simplification of the previous lemma. Specifically, we show
that it is sufficient to take k = 2 in Lemma D.9.2.
Lemma D.9.3. Using the same notations as in Lemma D.9.2, the following
statements are equivalent:
1. (p, q) ≻ (p′ , q′ ).
Proof. We start by proving the equivalence of statements 1 and 2. From Lemma D.9.2 it is sufficient to show that if (D.163) holds then (D.160) holds for all k ⩾ 3 (the case k = 1 is trivial). Let v1, . . . , vk ∈ ℝ² with k ⩾ 3. In order to prove the inequality in (D.160), we first observe that the term max_{z∈[k]}{rx · vz} is the support function of the polytope C = Conv(v1, . . . , vk). We order the set of vertices {v1, . . . , vk} such that for any x ∈ {2, . . . , k} the vector vx − vx−1 is on the boundary of C (see Fig. D.2a). Specifically, recall that the support function of C is given by fC(s) = max_{x∈[k]} s · vx for all s ∈ ℝ². Therefore, the left-hand side of (D.160) can be expressed as
\[
\sum_{x\in[n]}\max_{z\in[k]}\{\mathbf r_x\cdot\mathbf v_z\} = \sum_{x\in[n]} f_{\mathfrak C}(\mathbf r_x). \tag{D.164}
\]
The key idea of the proof is to use the property of support functions under addition of sets (see Theorem A.7.1). For this purpose, it would have been useful if it were possible to write C as a sum of convex sets with each set in the sum being the convex hull of only two vectors (so that (D.163) can be applied). While it is not possible to decompose C in this way, we now define a set with the same support function as C, but for which such a decomposition is possible.

We define the desired set in two steps. First, we define the set
\[
\mathfrak C' := \mathfrak C-\mathbb R^2_+ := \bigl\{\mathbf v-\mathbf p \,:\, \mathbf v\in\mathfrak C,\ \mathbf p\in\mathbb R^2_+\bigr\}. \tag{D.165}
\]
That is, the set C′ is an unbounded polyhedron consisting of all the vectors r ∈ ℝ² for which there exists v ∈ C with the property that r ⩽ v. By definition, the support function of C′ equals that of C (see Exercise D.9.3). We will see now that the set C′ is a bit simpler to work with, as it contains only a few of the vertices of C (the ones that will be relevant for the computation of the support function of C).

In Fig. D.2b we depict the set C′ and its relation to C. Observe that the set C′ is bounded by (1) the vertical line that passes through the vertex with the highest x-coordinate, (2) the horizontal line that passes through the vertex with the highest y-coordinate, and (3) the portion of the boundary of C that connects the vertex with the highest y-coordinate with the one with the highest x-coordinate.

Figure D.2: (a) The polytope C. The red arrow represents the vector vx − vx−1. (b) The polyhedron C′ := C − ℝ²₊ is depicted by the blue area.

In particular, observe that the set of all vertices of C′ is a subset of {v1, . . . , vk}. For simplicity of notation, we take {v1, . . . , vk′}, with k′ ⩽ k, to be the set of vertices of C′. Moreover, observe that we can always arrange the set {v1, . . . , vk′} such that for each x ∈ {2, 3, . . . , k′ − 1} the vector vx is a "neighbour" of vx−1 and vx+1. That is, each vector vx − vx−1 is on the boundary of C′ and its angle with the x-axis is in the interval [−π/2, 0] (see Fig. D.2b). Further, the angle of vx − vx−1 with the x-axis is non-increasing in x ∈ {2, . . . , k′} (i.e., the angle becomes closer to −π/2 as x increases).
With the above ordering of the vertices {v1 , . . . , vk′ } we are now ready to construct the
second convex set. Denote by v0 = 0 the zero vector in R2 and for each y ∈ [k ′ ] let Ky be
the convex hull of the zero vector 0 ∈ R2 and the vector vy − vy−1 . That is, for each y ∈ [k ′ ]
we can express Ky = {t(vy − vy−1 ) : t ∈ [0, 1]}. We then define
\[
\mathfrak K := \mathfrak K_1+\cdots+\mathfrak K_{k'} = \Bigl\{\sum_{x\in[k']} t_x(\mathbf v_x-\mathbf v_{x-1}) \,:\, \mathbf t=(t_1,\ldots,t_{k'})^T\in[0,1]^{k'}\Bigr\}. \tag{D.166}
\]
Clearly, the set K is convex and its support function is given for any s ∈ ℝ² by
\[
f_{\mathfrak K}(\mathbf s) = \max_{\mathbf t\in[0,1]^{k'}}\sum_{x\in[k']} t_x(\mathbf v_x-\mathbf v_{x-1})\cdot\mathbf s. \tag{D.167}
\]
Now, recall that we are only interested in s ⩾ 0, so that the angle of s with the x-axis is
in the interval [0, π/2]. Therefore, the angle between s and the vector vx − vx−1 is non-
decreasing in x, and this angle is in the interval [0, π]. Therefore, there exists ℓ ∈ [k ′ ] such
that the dot product (vx − vx−1 ) · s is non-negative for all x ∈ [ℓ] and it is negative for all
x ∈ {ℓ + 1, . . . , k ′ }. Therefore, the maximum in (D.167) is obtained by taking tx = 1 for
x ∈ [ℓ] and tx = 0 for x ∉ [ℓ]. That is, we get that for s ⩾ 0
\[
f_{\mathfrak K}(\mathbf s) = \sum_{x\in[\ell]}(\mathbf v_x-\mathbf v_{x-1})\cdot\mathbf s = \mathbf v_\ell\cdot\mathbf s. \tag{D.168}
\]
On the other hand, observe that for all ℓ′ ∈ [k′], by taking tx = 1 for x ∈ [ℓ′] and tx = 0 for x ∉ [ℓ′], Eq. (D.167) gives
\[
f_{\mathfrak K}(\mathbf s) \geqslant \sum_{x\in[\ell']}(\mathbf v_x-\mathbf v_{x-1})\cdot\mathbf s = \mathbf v_{\ell'}\cdot\mathbf s. \tag{D.169}
\]
From the two equations above we therefore conclude that for all s ⩾ 0 we have
We therefore found a convex set K that has the same support function as C on vectors in ℝ²₊, and that can be expressed as K = K1 + · · · + Kk′. From the property of support functions under addition of sets (see Theorem A.7.1), we therefore get that for all s ∈ ℝ²₊
\[
f_{\mathfrak C}(\mathbf s) = f_{\mathfrak K}(\mathbf s) = \sum_{y\in[k']} f_{\mathfrak K_y}(\mathbf s). \tag{D.171}
\]
since we assume that (D.160) holds for k = 2 (and for each fixed y we can write the term max{0, rx · (vy − vy−1)} in the form max_{j∈[2]} rx · uj, where u1 := 0 and u2 := vy − vy−1). Combining this with the previous equation we conclude that
\begin{align*}
\sum_{x\in[n]}\max_{z\in[k]}\{\mathbf r_x\cdot\mathbf v_z\} &\geqslant \sum_{x\in[n]}\sum_{y\in[k']} f_{\mathfrak K_y}(\mathbf r'_x) = \sum_{x\in[n]} f_{\mathfrak K}(\mathbf r'_x)\\
&= \sum_{x\in[n]} f_{\mathfrak C}(\mathbf r'_x) = \sum_{x\in[n]}\max_{z\in[k]}\{\mathbf r'_x\cdot\mathbf v_z\}. \tag{D.174}
\end{align*}
Hence, (D.160) holds for all k ∈ N, so that from Lemma D.9.2 we get that (p, q) ≻ (p′ , q′ ).
To prove the equivalence of the second and third statements of the lemma, recall that due to (D.162) the condition (D.163) is equivalent to
\[
f_{\mathfrak M(\mathbf p,\mathbf q,2)}(S) \geqslant f_{\mathfrak M(\mathbf p',\mathbf q',2)}(S) \qquad\forall\ S\in\mathbb R^{2\times 2}_{+}. \tag{D.175}
\]
As argued below (D.162), the condition (D.175) holds if and only if it holds for all S in ℝ^{2×2} (not necessarily ℝ^{2×2}₊). We can therefore conclude from Theorem A.7.1 that the condition above is equivalent to M(p, q, 2) ⊇ M(p′, q′, 2), so that the second and third statements of the lemma are equivalent. This completes the proof.
Exercise D.9.3. Show that the support function of C equals the support function of C′ .
In the following exercise you simplify even further the expression given in (D.163).
Exercise D.9.4. Using the same notations as in Lemma D.9.2, show that
\[
(\mathbf p,\mathbf q)\succ(\mathbf p',\mathbf q') \iff \sum_{x\in[n]}\max\{0,\mathbf r_x\cdot\mathbf v\} \geqslant \sum_{y\in[m]}\max\{0,\mathbf r'_y\cdot\mathbf v\} \qquad\forall\ \mathbf v\in\mathbb R^2. \tag{D.176}
\]
Hint: Use the formula $\max\{a,b\}=\frac{a+b}{2}+\frac{|a-b|}{2}$ in (D.163), and recall that $\sum_x\mathbf r_x=(1,1)^T$.
Alternative Proof of Theorem 4.3.4. To see that 1 and 2 are equivalent, observe that
\[
\mathfrak M(\mathbf p,\mathbf q,2) := \bigl\{[E\mathbf p\ \ E\mathbf q] : E\in\mathrm{STOCH}(2,n)\bigr\} = \left\{\begin{bmatrix}\mathbf t\cdot\mathbf p & \mathbf t\cdot\mathbf q\\ 1-\mathbf t\cdot\mathbf p & 1-\mathbf t\cdot\mathbf q\end{bmatrix} : \mathbf t\in[0,1]^n\right\}, \tag{D.177}
\]
where t ∈ [0, 1]^n is the first row of E, and 1n − t is its second row (recall that E is a 2 × n column stochastic matrix). Note that (t · p, t · q) are precisely the elements of T(p, q). Therefore, the inclusion T(p, q) ⊇ T(p′, q′) is equivalent to the inclusion M(p, q, 2) ⊇ M(p′, q′, 2). We already saw in Lemma D.9.3 that this latter inclusion is equivalent to (p, q) ≻ (p′, q′). Hence, we proved the equivalence of the first and second statements of the theorem.
Finally, it is left to show the equivalence between the first and third statements of the theorem. From Exercise D.9.4 it follows that (p, q) ≻ (p′, q′) is equivalent to the condition that for any a, b ∈ ℝ we have
\[
\sum_{x\in[n]}\max\{0, ap_x+bq_x\} \geqslant \sum_{y\in[m]}\max\{0, ap'_y+bq'_y\}, \tag{D.178}
\]
where we took v = (a, b)^T in (D.176). Using on both sides of the equation above the fact that for any r ∈ ℝ, max{0, r} = ½(r + |r|), we conclude that the equation above is equivalent to the statement that
\[
\sum_{x\in[n]}\bigl|ap_x+bq_x\bigr| \geqslant \sum_{y\in[m]}\bigl|ap'_y+bq'_y\bigr|, \tag{D.179}
\]
Finally, dividing both sides of (D.179) by |a| (we can assume without loss of generality that a ≠ 0), and denoting t := −b/a, gives
\[
\sum_{x\in[n]}\bigl|p_x-tq_x\bigr| \geqslant \sum_{y\in[m]}\bigl|p'_y-tq'_y\bigr|. \tag{D.181}
\]
This completes the proof of the equivalence between the first and third statements of the theorem.
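A quick numerical illustration (ours) of the criterion (D.179): whenever (p′, q′) = (Ep, Eq) for a column-stochastic E, the inequality holds for every direction (a, b); below we test it on randomly sampled directions.

```python
# Numerical sketch (ours) of (D.179): if (p', q') = (E p, E q) with E column stochastic,
# then sum_x |a p_x + b q_x| >= sum_y |a p'_y + b q'_y| for all (a, b).
import numpy as np

rng = np.random.default_rng(7)
n, m_out = 5, 3
p = rng.random(n); p /= p.sum()
q = rng.random(n); q /= q.sum()

E = rng.random((m_out, n)); E /= E.sum(axis=0)   # column stochastic
p2, q2 = E @ p, E @ q                            # (p', q') obtained from (p, q)

for _ in range(1000):
    a, b = rng.normal(size=2)
    lhs = np.abs(a * p + b * q).sum()
    rhs = np.abs(a * p2 + b * q2).sum()
    assert lhs >= rhs - 1e-12
print("(D.179) holds on all sampled directions")
```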
\[
\mathcal G_n\bigl(\omega^{A^n}\bigr) := \frac{1}{n!}\sum_{\pi\in S_n}\bigl(P_\pi^{A^n}\bigr)^{-1}\,\omega^{A^n}\,P_\pi^{A^n} \qquad\forall\ \omega^{A^n}\in\mathfrak L(A^n). \tag{D.182}
\]
\[
F\bigl(\rho^{\otimes n},\sigma_n\bigr) = \bigl|\langle\psi^{\otimes n}|\phi_n\rangle\bigr|, \tag{D.183}
\]
We then define
\[
|\phi_n\rangle := \bigl(I^{A^n}\otimes\sqrt{\sigma_n}\,U\bigr)|\Omega^{A^n\tilde A^n}\rangle. \tag{D.186}
\]
It is straightforward to check that this ϕn satisfies (D.183) (see Exercise D.10.1). Moreover, note that ϕn is indeed a purification of σn. It is therefore left to show that ϕn is symmetric. For this purpose, we first show that U can be taken to be symmetric. Since both matrices $\sqrt{\sigma_n}\sqrt{\rho^{\otimes n}}$ and $\bigl|\sqrt{\sigma_n}\sqrt{\rho^{\otimes n}}\bigr|$ are symmetric, from Theorem C.3.3 it follows that they can be expressed as
\[
\sqrt{\sigma_n}\sqrt{\rho^{\otimes n}} = \bigoplus_\lambda I^{B_\lambda}\otimes\eta_\lambda^{C_\lambda} \qquad\text{and}\qquad \Bigl|\sqrt{\sigma_n}\sqrt{\rho^{\otimes n}}\Bigr| = \bigoplus_\lambda I^{B_\lambda}\otimes\zeta_\lambda^{C_\lambda}, \tag{D.187}
\]
where η_λ^{C_λ} and ζ_λ^{C_λ} are operators on the multiplicity space of the irrep λ. Define
\[
V := \sqrt{\sigma_n}\sqrt{\rho^{\otimes n}}\,\Bigl|\sqrt{\sigma_n}\sqrt{\rho^{\otimes n}}\Bigr|^{-1} = \bigoplus_\lambda I^{B_\lambda}\otimes\eta_\lambda^{C_\lambda}\bigl(\zeta_\lambda^{C_\lambda}\bigr)^{-1}, \tag{D.188}
\]
where all inverses are generalized inverses. Since V is a partial isometry (see Exercise D.10.2), it follows that each $\eta_\lambda^{C_\lambda}\bigl(\zeta_\lambda^{C_\lambda}\bigr)^{-1}$ is a partial isometry. Therefore, we can complete each $\eta_\lambda^{C_\lambda}\bigl(\zeta_\lambda^{C_\lambda}\bigr)^{-1}$ to a unitary matrix U_λ ∈ L(C_λ). Defining
\[
U := \bigoplus_\lambda I^{B_\lambda}\otimes U_\lambda^{C_\lambda}, \tag{D.189}
\]
we get that U is symmetric and satisfies (D.185). Finally, since U is symmetric we get that for all π ∈ Sn
\begin{align*}
P_\pi^{A^n}\otimes P_\pi^{\tilde A^n}\,|\phi^{A^n\tilde A^n}\rangle &= \bigl(P_\pi^{A^n}\otimes P_\pi^{\tilde A^n}\sqrt{\sigma_n}\,U\bigr)|\Omega^{A^n\tilde A^n}\rangle\\
\sqrt{\sigma_n}\text{ is symmetric}\rightarrow\quad &= \bigl(P_\pi^{A^n}\otimes\sqrt{\sigma_n}\,P_\pi^{\tilde A^n}U\bigr)|\Omega^{A^n\tilde A^n}\rangle\\
U\text{ is symmetric}\rightarrow\quad &= \bigl(P_\pi^{A^n}\otimes\sqrt{\sigma_n}\,U P_\pi^{\tilde A^n}\bigr)|\Omega^{A^n\tilde A^n}\rangle \tag{D.190}\\
|\Omega^{A^n\tilde A^n}\rangle\text{ is symmetric}\rightarrow\quad &= \bigl(I^{A^n}\otimes\sqrt{\sigma_n}\,U\bigr)|\Omega^{A^n\tilde A^n}\rangle\\
&= |\phi_n^{A^n\tilde A^n}\rangle.
\end{align*}
Hence, ϕn is symmetric.
Exercise D.10.1. Show explicitly that ϕn, as defined in (D.186), satisfies (D.183).
Exercise D.10.2. Let Λ ∈ L(A) and define V = Λ|Λ|−1 where the inverse is a generalized
inverse. Show that V is a partial isometry.
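The following sketch (ours) illustrates the construction numerically for n = 1 and without the symmetry considerations: with U the unitary factor of the polar decomposition of √σ√ρ, the vector (I ⊗ √σ U)|Ω⟩ purifies σ and its overlap with (I ⊗ √ρ)|Ω⟩ equals the root fidelity Tr|√σ√ρ|, as in (D.183). All names and random states below are our own choices.

```python
# Numerical sketch (ours): purification and fidelity for n = 1, ignoring the symmetry issue.
import numpy as np

rng = np.random.default_rng(11)
d = 3

def rand_state(d):
    X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    M = X @ X.conj().T
    return M / np.trace(M)

def sqrtm_psd(M):
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

rho, sigma = rand_state(d), rand_state(d)
Omega = np.eye(d).reshape(d * d)                 # unnormalized |Omega> = sum_i |ii>

Lam = sqrtm_psd(sigma) @ sqrtm_psd(rho)
Usvd, S, Vh = np.linalg.svd(Lam)
W = Usvd @ Vh                                    # unitary part of the polar decomposition Lam = W |Lam|

psi = np.kron(np.eye(d), sqrtm_psd(rho)) @ Omega
phi = np.kron(np.eye(d), sqrtm_psd(sigma) @ W) @ Omega

phi_mat = phi.reshape(d, d)                      # phi = sum_{ij} phi_mat[i,j] |i>|j>
reduced = phi_mat.T @ phi_mat.conj()             # marginal of |phi><phi| on the second system

assert np.allclose(reduced, sigma)               # phi purifies sigma
assert np.isclose(np.abs(np.vdot(psi, phi)), S.sum())   # overlap = Tr|sqrt(sigma) sqrt(rho)|
print("purification and fidelity check passed:", S.sum())
```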
937
938
Bibliography
[3] J. Aczél, B. Forte, and C. T. Ng. Why the shannon and hartley entropies are ‘natural’.
Advances in Applied Probability, 6(1):131–146, 1974.
[4] P.M. Alberti and A. Uhlmann. A problem relating to positive linear maps on matrix
algebras. Reports on Mathematical Physics, 18(2):163 – 176, 1980.
[5] S. M. Ali and S. D. Silvey. A general class of coefficients of divergence of one distribution
from another. Royal Statistical Society, Wiley, 28, 1966.
[6] Jr. Arthur F. Veinott. Least d-majorized network flows with inventory and statistical
applications. Management Science, 17(9):547–567, 1971.
[11] David Avis, Hiroshi Imai, Tsuyoshi Ito, and Yuuya Sasaki. Deriving tight bell in-
equalities for 2 parties with many 2-valued observables from facets of cut polytopes.
arXiv:0404014v3, 2004.
939
[12] Stephen D. Bartlett, Terry Rudolph, and Robert W. Spekkens. Reference frames,
superselection rules, and quantum information. Rev. Mod. Phys., 79:555–609, Apr
2007.
[14] David Beckman, Daniel Gottesman, M. A. Nielsen, and John Preskill. Causal and
localizable quantum operations. Phys. Rev. A, 64:052309, Oct 2001.
[15] Salman Beigi. Sandwiched rényi divergence satisfies data processing inequality. Journal
of Mathematical Physics, 54(12):122202, 2013.
[16] Ingemar Bengtsson and Karol Zyczkowski. Geometry of Quantum States: An Intro-
duction to Quantum Entanglement. Cambridge University Press, 2006.
[20] Charles H. Bennett, Gilles Brassard, Claude Crépeau, Richard Jozsa, Asher Peres, and
William K. Wootters. Teleporting an unknown quantum state via dual classical and
Einstein-Podolsky-Rosen channels. Phys. Rev. Lett., 70(13):1895–1899, Mar 1993.
[21] Charles H. Bennett, Gilles Brassard, Sandu Popescu, Benjamin Schumacher, John A.
Smolin, and William K. Wootters. Purification of noisy entanglement and faithful
teleportation via noisy channels. Phys. Rev. Lett., 76:722–725, Jan 1996.
[22] Charles H. Bennett, David P. DiVincenzo, Tal Mor, Peter W. Shor, John A. Smolin,
and Barbara M. Terhal. Unextendible product bases and bound entanglement. Phys.
Rev. Lett., 82:5385–5388, Jun 1999.
[23] Charles H. Bennett and Stephen J. Wiesner. Communication via one- and two-particle
operators on einstein-podolsky-rosen states. Phys. Rev. Lett., 69:2881–2884, Nov 1992.
[24] Mario Berta, Fernando G. S. L. Brandão, Gilad Gour, Ludovico Lami, Martin B. Ple-
nio, Bartosz Regula, and Marco Tomamichel. On a gap in the proof of the generalised
quantum stein’s lemma and its consequences for the reversibility of quantum resources.
2022.
940
[26] Felix Binder, Luis A. Correa, Christian Gogolin, Janet Anders, and Gerardo Adesso.
Thermodynamics in the Quantum Regime. 0168-1222. Springer Nature Switzerland
AG 2018, 2018.
[29] Fernando Brandão, Michal Horodecki, Nelly Ng, Jonathan Oppenheim, and Stephanie
Wehner. The second laws of quantum thermodynamics. Proceedings of the National
Academy of Sciences, 112(11):3275–3279, 2015.
[30] Fernando G. S. L. Brandão and Gilad Gour. Reversible framework for quantum re-
source theories. Phys. Rev. Lett., 115:070503, Aug 2015.
[33] Fernando G.S.L. Brandão, Matthias Christandl, and Jon Yard. A quasipolynomial-
time algorithm for the quantum separability problem. In Proceedings of the Forty-third
Annual ACM Symposium on Theory of Computing, STOC ’11, pages 343–352, New
York, NY, USA, 2011. ACM.
[34] Fernando G. S. L. Brandão and Nilanjana Datta. One-shot rates for entanglement
manipulation under non-entangling maps. IEEE Transactions on Information Theory,
57(3):1754–1760, March 2011.
[35] Fernando G. S. L. Brandão and Martin B. Plenio. Entanglement theory and the second
law of thermodynamics. Nature Physics, 4:873, 2008.
[36] Sarah Brandsen, Isabelle Jianing Geng, and Gilad Gour. What is entropy? a new
perspective from games of chance. 2021.
[37] Thomas R Bromley, Marco Cianciaruso, Sofoklis Vourekas, Bartosz Regula, and Ger-
ardo Adesso. Accessible bounds for general quantum resources. Journal of Physics A:
Mathematical and Theoretical, 51(32):325303, 2018.
[38] Nicolas Brunner, Daniel Cavalcanti, Stefano Pironio, Valerio Scarani, and Stephanie
Wehner. Bell nonlocality. Rev. Mod. Phys., 86:419–478, Apr 2014.
941
[39] Francesco Buscemi and Nilanjana Datta. Distilling entanglement from arbitrary re-
sources. Journal of Mathematical Physics, 51(10):102201, 2010.
[40] Francesco Buscemi and Gilad Gour. Quantum relative lorenz curves. Phys. Rev. A,
95:012110, Jan 2017.
[41] P. Busch. Quantum states and generalized observables: A simple proof of gleason’s
theorem. Phys. Rev. Lett., 91:120403, Sep 2003.
[42] Paul Busch. Informationally complete sets of physical quantities. International Journal
of Theoretical Physics, 30:1217–1227, September 1991.
[43] Eric A. Carlen. Trace inequalities and quantum entropy: An introductory course.
Contemporary Mathematics, 529:73–140, 2010.
[44] Kai Chen and Ling-An Wu. A matrix realignment method for recognizing entangle-
ment. Quantum Inf. Comput., 3(3):193–202, 2003.
[45] G. Chiribella, G. M. D’Ariano, and M. F. Sacchi. Optimal estimation of group trans-
formations using entanglement. Phys. Rev. A, 72:042338, Oct 2005.
[46] Giulio Chiribella. Optimal estimation of quantum signals in the presence of symmetry.
PhD thesis, University of Pavia, 2006.
[47] Eric Chitambar, Julio I. de Vicente, Mark W. Girard, and Gilad Gour. Entangle-
ment manipulation beyond local operations and classical communication. Journal of
Mathematical Physics, 61(4):042201, 2020.
[48] Eric Chitambar and Gilad Gour. Quantum resource theories. Rev. Mod. Phys.,
91:025001, Apr 2019.
[49] Eric Chitambar, Debbie Leung, Laura Mancinska, Maris Ozols, and Andreas Winter.
Everything you always wanted to know about locc (but were afraid to ask). Commu-
nications in Mathematical Physics, 328(1):303–326, May 2014.
[50] Matthias Christandl and Andreas Winter. “squashed entanglement”: An additive
entanglement measure. Journal of Mathematical Physics, 45(3):829–840, 2004.
[51] Dariusz Chruściński and Gniewomir Sarbicki. Entanglement witnesses: construc-
tion, analysis and classification. Journal of Physics A: Mathematical and Theoretical,
47(48):483001, 2014.
[52] Bob Coecke, Tobias Fritz, and Robert W. Spekkens. A mathematical theory of re-
sources. Information and Computation, 250:59 – 86, 2016. Quantum Physics and
Logic.
[53] Valerie Coffman, Joydip Kundu, and William K. Wootters. Distributed entanglement.
Phys. Rev. A, 61:052306, Apr 2000.
942
[54] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory (Wiley Series
in Telecommunications and Signal Processing). Wiley-Interscience, 2006.
[55] I. Csiszár. Eine informationstheoretische ungleichung und ihre anwendung auf den
beweis der ergodizitat von markoffschen ketten. Magyar. Tud. Akad. Mat. Kutato Int.
Kozl., 8:85–108, 1963.
[57] Geir Dahl. Matrix majorization. Linear Algebra and its Applications, 288:53 – 73,
1999.
[58] N. Datta. Min- and max-relative entropies and a new entanglement monotone. IEEE
Transactions on Information Theory, 55(6):2816–2826, June 2009.
[61] Sebastian Deffner and Steve Campbell. Quantum Thermodynamics. 2053-2571. Morgan
Claypool Publishers, 2019.
[63] I. Devetak and A. Winter. Distillation of secret key and entanglement from quantum
states. Proc. R. Soc. A., 461:207–235, 2005.
[65] David P. DiVincenzo, Christopher A. Fuchs, Hideo Mabuchi, John A. Smolin, Ashish
Thapliyal, and Armin Uhlmann. ”Entanglement of Assistance” in Quantum Com-
puting and Quantum Communications: First NASA International Conference, QCQC
’98, Palm Springs, California, USA, February 17-20, 1998, Selected Papers. Lecture
Notes in Computer Science. Springer, 1999.
[66] Andrew C. Doherty, Pablo A. Parrilo, and Federico M. Spedalieri. Complete family of
separability criteria. Phys. Rev. A, 69:022308, Feb 2004.
943
[68] Frédéric Dupuis, Mario Berta, Jürg Wullschleger, and Renato Renner. One-shot de-
coupling. Communications in Mathematical Physics, 328:251–284, May 2014.
[69] W. Dür, G. Vidal, and J. I. Cirac. Three qubits can be entangled in two inequivalent
ways. Phys. Rev. A, 62:062314, Nov 2000.
[70] Ali Ebadian, Ismail Nikoufar, and Madjid Eshaghi Gordji. Perspectives of matrix
convex functions. Proceedings of the National Academy of Sciences, 108(18):7313–
7314, 2011.
[71] Bruce Ebanks, Prasanna Sahoo, and Wolfgang Sander. Characterizations of Informa-
tion Measures. World Scientific, apr 1998.
[73] D.K. Faddeev. On the concept of entropy of a finite probability scheme (in Russian).
Uspekhi Matematicheskikh Nauk, 11:227–231, 1956.
[74] Philippe Faist, Jonathan Oppenheim, and Renato Renner. Gibbs-preserving maps
outperform thermal operations in the quantum regime. New Journal of Physics,
17(4):043003, 2015.
[75] Kun Fang, Gilad Gour, and Xin Wang. Towards the ultimate limits of quantum channel
discrimination. 2022.
[76] Kun Fang, Xin Wang, Marco Tomamichel, and Runyao Duan. Non-asymptotic en-
tanglement distillation. IEEE Transactions on Information Theory, 65(10):6454–6465,
2019.
[77] Hamza Fawzi and Omar Fawzi. Defining quantum divergences via convex optimization.
Quantum, 5:387, January 2021.
[78] Arthur Fine. Hidden variables, joint probability, and the bell inequalities. Phys. Rev.
Lett., 48:291–295, Feb 1982.
[79] Shmuel Friedland and Gilad Gour. An explicit expression for the relative entropy of
entanglement in all dimensions. Journal of Mathematical Physics, 52(5):052201, 2011.
[80] Tobias Fritz. Resource convertibility and ordered commutative monoids. Mathematical
Structures in Computer Science, 27:850–938, 2017.
[82] W. Fulton and J. Harris. Representation Theory: A First Course. Graduate Texts in
Mathematics. Springer New York, 1991.
944
[83] Jochen Gemmer, Mathias Michel, and Gunter Mahler. Quantum Thermodynamics.
1616-6361. Springer, Berlin, Heidelberg, 2004.
[84] Mark W Girard, Gilad Gour, and Shmuel Friedland. On convex optimization problems
in quantum information theory. Journal of Physics A: Mathematical and Theoretical,
47(50):505302, 2014.
[85] Andrew Gleason. Measures on the closed subspaces of a hilbert space. Indiana Univ.
Math. J., 6:885–893, 1957.
[86] Gilad Gour. Family of concurrence monotones and its applications. Phys. Rev. A,
71:012318, Jan 2005.
[87] Gilad Gour. Entanglement of collaboration. Phys. Rev. A, 74:052307, Nov 2006.
[88] Gilad Gour. Quantum resource theories in the single-shot regime. Phys. Rev. A,
95:062314, Jun 2017.
[89] Gilad Gour. Role of quantum coherence in thermodynamics. PRX Quantum, 3:040323,
Nov 2022.
[90] Gilad Gour, Andrzej Grudka, Michal Horodecki, Waldemar Klobus, Justyna Lodyga,
and Varun Narasimhachar. Conditional uncertainty principle. Phys. Rev. A, 97:042130,
Apr 2018.
[91] Gilad Gour, David Jennings, Francesco Buscemi, Runyao Duan, and Iman Marvian.
Quantum majorization and a complete set of entropic conditions for quantum thermo-
dynamics. Nature Communications, 9(5352), 2018.
[92] Gilad Gour, Barbara Kraus, and Nolan R. Wallach. Almost all multipartite qubit
quantum states have trivial stabilizer. Journal of Mathematical Physics, 58(9), 09
2017. 092204.
[93] Gilad Gour, Iman Marvian, and Robert W. Spekkens. Measuring the quality of a
quantum reference frame: The relative entropy of frameness. Phys. Rev. A, 80:012307,
Jul 2009.
[94] Gilad Gour, Markus P. Muller, Varun Narasimhachar, Robert W. Spekkens, and
Nicole Yunger Halpern. The resource theory of informational nonequilibrium in ther-
modynamics. Physics Reports, 583:1 – 58, 2015.
[95] Gilad Gour and Carlo Maria Scandolo. Entanglement of a bipartite channel. Phys.
Rev. A, 103:062422, Jun 2021.
[96] Gilad Gour and Robert W. Spekkens. Entanglement of assistance is not a bipartite
measure nor a tripartite monotone. Phys. Rev. A, 73:062331, Jun 2006.
945
[97] Gilad Gour and Robert W Spekkens. The resource theory of quantum reference frames:
manipulations and monotones. New Journal of Physics, 10(3):033023, 2008.
[98] Gilad Gour and Marco Tomamichel. Optimal extensions of resource measures and
their applications. Phys. Rev. A, 102:062401, Dec 2020.
[99] Gilad Gour and Marco Tomamichel. Entropy and relative entropy from information-
theoretic principles. IEEE Transactions on Information Theory, 2021.
[100] Gilad Gour and Nolan R. Wallach. All maximally entangled four-qubit states. Journal
of Mathematical Physics, 51(11), 11 2010. 112201.
[101] Gilad Gour and Nolan R Wallach. Necessary and sufficient conditions for local manip-
ulation of multipartite pure quantum states. New Journal of Physics, 13(7):073013,
jul 2011.
[102] Gilad Gour and Nolan R. Wallach. Classification of multipartite entanglement of all
finite dimensionality. Phys. Rev. Lett., 111:060502, Aug 2013.
[103] Gilad Gour, Mark M. Wilde, S. Brandsen, and Isabelle J. Geng. Inevitability of
knowing less than nothing. 2022.
[104] Gilad Gour and Guo Yu. Monogamy of entanglement without inequalities. Quantum,
2:81, August 2018.
[105] Otfried Gühne and Géza Tóth. Entanglement detection. Physics Reports, 474(1):1 –
75, 2009.
[106] Yu Guo and Gilad Gour. Monogamy of the entanglement of formation. Phys. Rev. A,
99:042305, Apr 2019.
[107] Yelena Guryanova, Sandu Popescu, Anthony J. Short, Ralph Silva1, and Paul
Skrzypczyk. Thermodynamics of quantum systems with multiple conserved quanti-
ties. Nature Communications, 7:12049, 2016.
[108] Uffe Haagerup and Magdalena Musat. Factorization and dilation problems for com-
pletely positive maps on von neumann algebras. Communications in Mathematical
Physics, 303(2):555–594, 2011.
[109] Nicole Yunger Halpern, Philippe Faist, Jonathan Oppenheim, and Andreas Winter.
Microcanonical and resource-theoretic derivations of the thermal state of a quantum
system with noncommuting charges. Nature Communications, 7:12051, 2016.
[110] G.H. Hardy, J.E. Littlewood, and G Pólya. Some simple inequalities satisfied by convex
functions. Messenger of Mathematics, 58:145–152, 1929.
[111] Lucien Hardy. Quantum mechanics, local realistic theories, and lorentz-invariant real-
istic theories. Phys. Rev. Lett., 68:2981–2984, May 1992.
946
[112] Aram Harrow. Coherent communication of classical messages. Phys. Rev. Lett.,
92:097902, Mar 2004.
[113] Aram W. Harrow. The Church of the Symmetric Subspace. arXiv e-prints, page
arXiv:1308.6595, August 2013.
[114] Aram W. Harrow and Michael A. Nielsen. Robustness of quantum gates in the presence
of noise. Phys. Rev. A, 68:012308, Jul 2003.
[116] Patrick M. Hayden, Michal Horodecki, and Barbara M. Terhal. The asymptotic en-
tanglement cost of preparing a quantum state. J. Phys. A: Math. Gen., 34(35):6891,
2001.
[117] Martin Hebenstreit, Matthias Englbrecht, Cornelia Spee, Julio I. de Vicente, and Bar-
bara Kraus. Measurement outcomes that do not occur and their role in entanglement
transformations. New Journal of Physics, 23(3):033046, mar 2021.
[118] Teiko Heinosaari, Maria A. Jivulescu, David Reeb, and Michael M. Wolf. Extending
quantum operations. Journal of Mathematical Physics, 53(10):102208, 10 2012.
[119] Fumio Hiai and Milán Mosonyi. Different quantum f-divergences and the reversibility
of quantum operations. Reviews in Mathematical Physics, 29(07):1750023, 2017.
[120] FUMIO HIAI, MILAN MOSONYI, DÉNES PETZ, and CÉDRIC BÉNY. Quantum
f-divergences and error correction. Reviews in Mathematical Physics, 23(07):691–747,
2011.
[121] Fumio Hiai and Dénes Petz. The proper formula for relative entropy and its asymptotics
in quantum probability. Communications in Mathematical Physics, 143(1):99–114, Dec
1991.
[122] R.A. Horn and C.R. Johnson. Topics in Matrix Analysis. Cambridge University Press,
1999.
[123] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press,
USA, 2nd edition, 2012.
[124] Karol Horodecki, Michal Horodecki, Pawel Horodecki, and Jonathan Oppenheim. Se-
cure key from bound entanglement. Phys. Rev. Lett., 94:160502, Apr 2005.
[125] Michal Horodecki, Karol Horodecki, Pawel Horodecki, Ryszard Horodecki, Jonathan
Oppenheim, Aditi Sen(De), and Ujjwal Sen. Local information as a resource in dis-
tributed quantum systems. Phys. Rev. Lett., 90:100402, Mar 2003.
947
[126] Michal Horodecki and Pawel Horodecki. Reduction criterion of separability and limits
for a class of distillation protocols. Phys. Rev. A, 59:4206–4216, Jun 1999.
[127] Michal Horodecki, Pawel Horodecki, and Ryszard Horodecki. Separability of mixed
states: necessary and sufficient conditions. Phys. Lett. A, 223(1–2):1–8, 1996.
[128] Michal Horodecki, Pawel Horodecki, and Ryszard Horodecki. Mixed-state entangle-
ment and distillation: is there a “bound” entanglement in nature? Phys. Rev. Lett.,
80(24):5239–5242, Jun 1998.
[129] Michal Horodecki, Pawel Horodecki, Ryszard Horodecki, Jonathan Oppenheim, Aditi
Sen(De), Ujjwal Sen, and Barbara Synak-Radtke. Local versus nonlocal information
in quantum-information theory: Formalism and phenomena. Phys. Rev. A, 71:062307,
Jun 2005.
[130] Michal Horodecki, Pawel Horodecki, and Jonathan Oppenheim. Reversible transfor-
mations from pure to mixed states and the unique measure of information. Phys. Rev.
A, 67:062104, Jun 2003.
[131] Michal Horodecki and Jonathan Oppenheim. Fundamental limitations for quantum
and nanoscale thermodynamics. Nature Communications, 4:2059, 2013.
[132] Michal Horodecki, Jonathan Oppenheim, and Carlo Sparaciari. Extremal distributions
under approximate majorization. Journal of Physics A: Mathematical and Theoretical,
51(30):305301, Jun 2018.
[134] Everett Howe. A New Proof of Erdős's Theorem on Monotone Multiplicative Functions. The American Mathematical Monthly, 93(8):593–595, Oct 1986.
[135] Piotr Ćwikliński, Michal Studziński, Michal Horodecki, and Jonathan Oppenheim.
Limitations on the evolution of quantum coherences: Towards fully quantum second
laws of thermodynamics. Phys. Rev. Lett., 115:210403, Nov 2015.
[137] D. Janzing, P. Wocjan, R. Zeier, R. Geiss, and Th. Beth. Thermodynamic cost of
reliability and low temperatures: Tightening Landauer's principle and the second law.
International Journal of Theoretical Physics, 39(12):2717–2753, Dec 2000.
[138] E. T. Jaynes. Information theory and statistical mechanics. Phys. Rev., 106:620–630,
May 1957.
[139] E. T. Jaynes. Information theory and statistical mechanics. II. Phys. Rev., 108:171–190,
Oct 1957.
[140] Harry Joe. Majorization and divergence. Journal of Mathematical Analysis and Ap-
plications, 148(2):287–305, 1990.
[142] Daniel Jonathan and Martin B. Plenio. Minimal conditions for local pure-state entan-
glement manipulation. Phys. Rev. Lett., 83:1455–1458, Aug 1999.
[143] Matthew Klimesh. Inequalities that collectively completely characterize the catalytic
majorization relation. 2007.
[144] Ludovico Lami and Bartosz Regula. No second law of entanglement manipulation after
all. Nature Physics, 19(2):184–189, 2023.
[145] R. Landauer. Irreversibility and heat generation in the computing process. IBM
Journal of Research and Development, 5(3):183–191, July 1961.
[146] A. Lenard. Thermodynamical proof of the Gibbs formula for elementary quantum systems. Journal of Statistical Physics, 19(6):575–586, Dec 1978.
[147] Nicky Kai Hong Li, Cornelia Spee, Martin Hebenstreit, Julio I. de Vicente, and Barbara
Kraus. Identifying families of multipartite states with non-trivial local entanglement
transformations, 2023.
[148] Zi-Wen Liu, Xueyuan Hu, and Seth Lloyd. Resource destroying maps. Phys. Rev.
Lett., 118:060502, Feb 2017.
[150] Matteo Lostaglio, David Jennings, and Terry Rudolph. Thermodynamic resource the-
ories, non-commutativity and maximum entropy principles. New Journal of Physics,
19(4):043008, 2017.
[151] Matteo Lostaglio, Kamil Korzekwa, David Jennings, and Terry Rudolph. Quantum
coherence, time-translation symmetry, and thermodynamics. Phys. Rev. X, 5:021001,
Apr 2015.
[152] Albert W. Marshall, Ingram Olkin, and Barry Arnold. Inequalities: Theory of Ma-
jorization and Its Applications. Springer, 2011.
[153] Koji Maruyama, Franco Nori, and Vlatko Vedral. Colloquium: The physics of Maxwell's demon and information. Rev. Mod. Phys., 81:1–23, Jan 2009.
[154] Iman Marvian. Symmetry, Asymmetry and Quantum Information. PhD thesis, Uni-
versity of Waterloo, 2012.
[156] Iman Marvian and Robert W. Spekkens. The theory of manipulations of pure state asymmetry: I. Basic tools, equivalence classes and single copy transformations. New
Journal of Physics, 15(3):033001, 2013.
[157] Iman Marvian and Robert W. Spekkens. Extending Noether's theorem by quantifying the asymmetry of quantum states. Nature Communications, 5:3821, May 2014.
[158] Iman Marvian and Robert W. Spekkens. Modes of asymmetry: The application of
harmonic analysis to symmetric quantum dynamics and quantum reference frames.
Phys. Rev. A, 90:062110, Dec 2014.
[159] Keiji Matsumoto. Reverse test and characterization of quantum relative entropy. 2010.
[162] Adam Miranowicz and Satoshi Ishizaka. Closed formula for the relative entropy of
entanglement. Phys. Rev. A, 78:032310, Sep 2008.
[164] Tetsuzo Morimoto. Markov processes and the H-theorem. Journal of the Physical Society of Japan, 18(3):328–331, 1963.
[165] Xiaosheng Mu, Luciano Pomatto, Philipp Strack, and Omer Tamuz. From Blackwell dominance in large samples to Rényi divergences and back again. Econometrica, 89(1):475–506, Jan 2021.
[167] Martin Müller-Lennert, Frédéric Dupuis, Oleg Szehr, Serge Fehr, and Marco
Tomamichel. On quantum Rényi entropies: A new generalization and some proper-
ties. Journal of Mathematical Physics, 54(12):122203, 2013.
[168] Varun Narasimhachar and Gilad Gour. Low-temperature thermodynamics with quan-
tum coherence. Nature Communications, 6(1):7689, 2015.
[169] M. A. Nielsen. Conditions for a class of entanglement transformations. Phys. Rev.
Lett., 83:436–439, Jul 1999.
[170] Michael A. Nielsen and Isaac L. Chuang. Quantum Computation and Quantum Infor-
mation. Cambridge University Press, 2000.
[171] Ismail Nikoufar, Ali Ebadian, and Madjid Eshaghi Gordji. The simplest proof of Lieb concavity theorem. Advances in Mathematics, 248:531–533, 2013.
[172] Michael Nussbaum and Arleta Szkoła. The Chernoff lower bound for symmetric quantum hypothesis testing. The Annals of Statistics, 37(2):1040–1057, 2009.
[173] T. Ogawa and H. Nagaoka. Strong converse and Stein's lemma in quantum hypothesis testing. IEEE Transactions on Information Theory, 46(7):2428–2433, 2000.
[174] Jonathan Oppenheim, Michal Horodecki, Pawel Horodecki, and Ryszard Horodecki.
Thermodynamical approach to quantifying quantum correlations. Phys. Rev. Lett.,
89:180402, Oct 2002.
[175] A. Ostrowski. Sur quelques applications des fonctions convexes et concaves au sens de I. Schur. J. Math. Pures Appl., 31:253–292, 1952.
[176] Vern Paulsen. Completely Bounded Maps and Operator Algebras. Cambridge Studies
in Advanced Mathematics. Cambridge University Press, 2003.
[177] Asher Peres. Separability criterion for density matrices. Phys. Rev. Lett., 77:1413–1415,
Aug 1996.
[178] Dénes Petz. Quasi-entropies for states of a von Neumann algebra. Publications of the Research Institute for Mathematical Sciences, 21(4):787–800, 1985.
[182] John Preskill. Lecture Notes for Physics 229: Quantum Information and Computation. CreateSpace Independent Publishing Platform, 2015.
[184] W. Pusz and S. L. Woronowicz. Passive states and KMS states for general quantum systems. Communications in Mathematical Physics, 58(3):273–290, Oct 1978.
[185] E. M. Rains. Bound on distillable entanglement. Phys. Rev. A, 60:179–184, Jul 1999.
[186] Alexey E. Rastegin. Notes on general SIC-POVMs. Physica Scripta, 89(8):085101, Jun 2014.
[188] Bartosz Regula, Kun Fang, Xin Wang, and Mile Gu. One-shot entanglement distillation beyond local operations and classical communication. New Journal of Physics, 21(10):103017, Oct 2019.
[189] Joseph M. Renes. Relative submajorization and its use in quantum resource theories.
Journal of Mathematical Physics, 57(12):122202, 2016.
[190] Alfréd Rényi. On measures of entropy and information. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1960, pages 547–561, 1961.
[191] Arnau Riera, Christian Gogolin, and Jens Eisert. Thermalization in nature and on a
quantum computer. Phys. Rev. Lett., 108:080402, Feb 2012.
[192] Roberto Rubboli and Marco Tomamichel. New additivity properties of the relative
entropy of entanglement and its generalizations. 2022.
[193] Ernst Ruch, Rudolf Schranner, and Thomas H. Seligman. The mixing distance. The
Journal of Chemical Physics, 69(1):386–392, 1978.
[194] Oliver Rudolph. Some properties of the computable cross-norm criterion for separa-
bility. Phys. Rev. A, 67:032312, Mar 2003.
[195] Jun John Sakurai. Modern Quantum Mechanics, revised edition. Addison-Wesley, Reading, MA, 1994.
[196] David Sauerwein, Nolan R. Wallach, Gilad Gour, and Barbara Kraus. Transformations
among pure multipartite entangled states via local operations are almost never possible.
Phys. Rev. X, 8:031020, Jul 2018.
[197] Valerio Scarani, Sofyan Iblisdir, Nicolas Gisin, and Antonio Acín. Quantum cloning.
Rev. Mod. Phys., 77:1225–1256, Nov 2005.
[199] I. Schur. Über eine Klasse von Mittelbildungen mit Anwendungen auf die Determinantentheorie. Sitzungsberichte der Berliner Mathematischen Gesellschaft, 22:9–20, 1923.
[201] John A. Smolin, Frank Verstraete, and Andreas Winter. Entanglement of assistance
and multipartite state distillation. Phys. Rev. A, 72:052317, Nov 2005.
[202] Carlo Sparaciari, Jonathan Oppenheim, and Tobias Fritz. Resource theory for work
and heat. Phys. Rev. A, 96:052112, Nov 2017.
[203] C. Spee, J. I. de Vicente, and B. Kraus. The maximally entangled set of 4-qubit states.
Journal of Mathematical Physics, 57(5):052201, May 2016.
[204] C. Spee, J. I. de Vicente, D. Sauerwein, and B. Kraus. Entangled pure state transforma-
tions via local operations assisted by finitely many rounds of classical communication.
Phys. Rev. Lett., 118:040503, Jan 2017.
[205] Erling Størmer. Positive linear maps of operator algebras. Acta Mathematica, 110:233–278, 1963.
[206] Sumeet Khatri and Mark M. Wilde. Principles of quantum communication theory: A
modern approach. 2021.
[207] Barbara M. Terhal and Pawel Horodecki. Schmidt number for density matrices. Phys.
Rev. A, 61:040301, Mar 2000.
[208] M. Tomamichel. Quantum Information Processing with Finite Resources: Mathemati-
cal Foundations. SpringerBriefs in Mathematical Physics. Springer International Pub-
lishing, 2015.
[209] Marco Tomamichel, Mario Berta, and Masahito Hayashi. Relating different quantum
generalizations of the conditional Rényi entropy. Journal of Mathematical Physics,
55(8):082206, 2014.
[210] Marco Tomamichel, Roger Colbeck, and Renato Renner. A fully quantum asymptotic
equipartition property. IEEE Transactions on Information Theory, 55(12):5840–5847,
2009.
[211] Robert R. Tucci. Relaxation method for calculating quantum entanglement. 2001.
[212] S. Turgut. Catalytic transformations for bipartite pure states. Journal of Physics A:
Mathematical and Theoretical, 40(40):12185, 2007.
[213] J. A. Vaccaro, F. Anselmi, H. M. Wiseman, and K. Jacobs. Tradeoff between ex-
tractable mechanical work, accessible entanglement, and ability to act as a reference
system, under arbitrary superselection rules. Phys. Rev. A, 77:032114, Mar 2008.
[214] Wim van Dam and Patrick Hayden. Universal entanglement transformations without
communication. Phys. Rev. A, 67:060302, Jun 2003.
[215] Tim van Erven and Peter Harremoës. Rényi divergence and Kullback-Leibler divergence. IEEE Transactions on Information Theory, 60(7):3797–3820, 2014.
[216] V. Vedral. The role of relative entropy in quantum information theory. Rev. Mod.
Phys., 74:197–234, Mar 2002.
[217] V. Vedral and M. B. Plenio. Entanglement measures and purification procedures. Phys.
Rev. A, 57(3):1619–1633, Mar 1998.
[218] F. Verstraete, J. Dehaene, B. De Moor, and H. Verschelde. Four qubits can be entangled
in nine different ways. Phys. Rev. A, 65:052112, Apr 2002.
[220] Frank Verstraete, Jeroen Dehaene, and Bart De Moor. Normal forms and entanglement
measures for multipartite quantum states. Phys. Rev. A, 68:012103, Jul 2003.
[221] G. Vidal, W. Dür, and J. I. Cirac. Entanglement cost of bipartite mixed states. Phys.
Rev. Lett., 89:027901, Jun 2002.
[223] Guifré Vidal. Entanglement of pure states for a single copy. Phys. Rev. Lett., 83:1046–
1049, Aug 1999.
[225] Guifré Vidal and Rolf Tarrach. Robustness of entanglement. Phys. Rev. A, 59:141–155,
Jan 1999.
[226] Nolan R. Wallach. Lectures on quantum computing, Venice C.I.M.E., June 2004 (unpublished), 2004.
[227] Nolan R. Wallach. Geometric Invariant Theory. Springer Cham, Sep 2017.
[228] Ligong Wang and Renato Renner. One-shot classical-quantum capacity and hypothesis
testing. Phys. Rev. Lett., 108:200501, May 2012.
[229] Xin Wang and Mark M. Wilde. Cost of quantum entanglement simplified. Phys. Rev.
Lett., 125:040502, Jul 2020.
[230] John Watrous. The Theory of Quantum Information. Cambridge University Press,
2018.
[232] Mark M. Wilde. Quantum Information Theory. Cambridge University Press, second
edition, 2017.
[233] Mark M. Wilde, Andreas Winter, and Dong Yang. Strong converse for the classical
capacity of entanglement-breaking and Hadamard channels via a sandwiched Rényi
relative entropy. Communications in Mathematical Physics, 331(2):593–622, Oct 2014.
[234] Andreas Winter. Tight uniform continuity bounds for quantum entropies: Condi-
tional entropy, relative entropy distance and energy constraints. Communications in
Mathematical Physics, 347(1):291–313, Oct 2016.
[236] S.L. Woronowicz. Positive maps of low dimensional matrix algebras. Reports on
Mathematical Physics, 10(2):165–183, 1976.
[237] Nicole Yunger Halpern. Beyond heat baths II: Framework for generalized thermodynamic resource theories. Journal of Physics A: Mathematical and Theoretical, 51(9):094001, 2018.
[238] Nicole Yunger Halpern and Joseph M. Renes. Beyond heat baths: Generalized resource
theories for small-scale thermodynamics. Phys. Rev. E, 93:022126, Feb 2016.
[239] Elia Zanoni, Thomas Theurer, and Gilad Gour. Complete characterization of entan-
glement embezzlement. 2023.
[240] Li-Jun Zhao and Lin Chen. Additivity of entanglement of formation via an
entanglement-breaking space. Phys. Rev. A, 99:032310, Mar 2019.