Lecture 9

This document provides lecture notes on characterization, certification, and validation of quantum systems. It discusses how as quantum technologies advance, methods are needed to verify that quantum devices and simulations are working properly. The document outlines topics like quantum state tomography to determine a quantum state, certification methods to verify the correctness of a state, and process tomography to characterize quantum dynamics. It also discusses randomized benchmarking techniques to validate gate operations and estimates of noise. The goal is to introduce methods for characterizing, validating, and verifying quantum systems as their development progresses.


Lecture notes:

Characterization, certification, and


validation of quantum systems
Martin Kliesch
TeX document compiled on April 30, 2020

Quantum simulations and quantum computing are among the most exciting
applications of quantum mechanics. More generally, the quantum technology
research field aims to develop new devices using quantum superposition and
entanglement. In popular wording, these anticipated developments will lead to
the second quantum revolution.
A main milestone is the use of quantum capabilities to solve a (computational)
problem that cannot practically be solved otherwise. Theoretical proposals
include integer factoring (Shor’s algorithm), speed-ups for optimization and
machine learning algorithms, the simulation of complex quantum systems, and
certain sampling experiments specifically tailored to that milestone.
But if one cannot obtain the output of a quantum simulation or computation
by conventional means, how can one make sure that the outcome is correct?
The output of integer factorization can be checked efficiently but, for instance,
for the estimation of energies in quantum many-body systems, or outcomes of
dynamical simulations, the situation is much less clear. Hence, for the
development of trusted quantum technologies, special characterization and
verification techniques are urgently required.
This course gives an introduction to the research field, to the problems of
characterization, validation, and verification, and to first ways of solving
them. More specifically, quantum state tomography, quantum state certification,
quantum process tomography, and randomized benchmarking will be covered. In
particular, the course provides an overview of the latest developments in this
still young and very active research field. The approaches of the course are
mainly of a conceptual and mathematical nature.

[Figure: overview diagram relating the fields of quantum information, quantum many-body physics, applied math, and signal processing to the tasks of characterization, verification, and validation.]
Contents

1. Introduction
   1.1. Motivation

2. Preliminaries
   2.1. Notation and math
   2.2. Representation theory
   2.3. Random variables and tail bounds
   2.4. Monte Carlo integration
        2.4.1. Importance sampling
   2.5. Basic convex optimization problems
        2.5.1. Linear programs (LPs)
        2.5.2. Semidefinite programs (SDPs)
   2.6. Quantum mechanics

I. Quantum states

3. Quantum state tomography (QST)
   3.1. Informational completeness of measurements
   3.2. Least squares fitting and linear inversion
   3.3. Frame theory
   3.4. Complex spherical/projective k-designs
        3.4.1. Examples for 2-designs
   3.5. Symmetric measurements and the depolarizing channel
        3.5.1. 2-design based POVMs
        3.5.2. Pauli measurements
   3.6. Compressed sensing
        3.6.1. Application to quantum state tomography
   3.7. Projected least squares estimation
        3.7.1. Proof of Theorem 3.21 for 2-design based POVMs
   3.8. Lower bounds
   3.9. Maximum likelihood estimation
   3.10. Confidence regions (additional information)
   3.11. Other methods (additional information)

4. Quantum state certification
   4.1. Direct fidelity estimation
   4.2. Direct quantum state certification
   4.3. Other works (additional information)

II. Quantum dynamics

5. Preliminaries II
   5.1. Quantum processes
   5.2. Tensor networks
   5.3. The Choi–Jamiołkowski isomorphism
   5.4. Inner products of superoperators and the χ process matrix
   5.5. The diamond norm
   5.6. Unitary k-designs

6. Randomized benchmarking
   6.1. The average gate fidelity
   6.2. The standard RB protocol
   6.3. Interleaved randomized benchmarking

7. Process tomography
   7.1. Randomized benchmarking tomography

8. Gate set tomography (additional information)

Bibliography

1. Introduction
Everything not explicitly covered in the lecture is written in gray and sometimes
indicated as “auxiliary information”.

1.1. Motivation
A central ultimate goal in quantum science is the development of a scalable universal
quantum computer that allows one to run quantum algorithms. However, real quantum
systems are difficult to precisely control and subject to unavoidable noise. The noise
prevents the direct implementation of such algorithms. Fortunately, if the noise is
below a certain threshold then quantum error correction can be used in order to still
be able to successfully run quantum algorithms. This desired property is called fault
tolerance.
But still, for universal quantum computation, one needs to reduce the noise below
the fault tolerance threshold and, simultaneously, implement thousands of qubits.
These two seemingly conflicting requirements are expected to prevent the development
of universal quantum computers in the near future.
So for the time being, many researchers are following more modest goals. As a
proof of principle, one would like to demonstrate the following¹.
Milestone (quantum supremacy [2])

By using quantum resources, solve some (computational) problem that cannot
practically be solved otherwise.

Here, “practically” means that a corresponding classical computation would have
an infeasibly long runtime. There are several proposals [3–7] to solve a presumably
useless sampling problem in order to demonstrate quantum supremacy.
There is already a caveat included in the definition of quantum supremacy.
The certification problem

If the solved problem cannot be solved otherwise, how can one make sure that
the outcome is actually correct? And if one has such a device that can —in

¹ After some criticism, some researchers have stopped using the term “quantum supremacy”. But
due to its well-established technical meaning we stick to it.

[Figure: a spiral through the steps Prioritize, Measure, Fix, and Verify, converging towards fault tolerance (FT).]

Figure 1.1.: The Smolin Spiral: In order to achieve fault tolerance (FT) “we must iteratively
improve devices by estimating sources of smaller and smaller errors, prioritize them, measure
them accurately, fix them, and verify they’ve been fixed” [1].

[Figure: a spectrum of methods ordered from high complexity / more information to low complexity / less information: gate set tomography, low-rank matrix reconstruction, ansatz state tomography (MPS, RBMs), fidelity estimation, and randomized benchmarking; the diagram also groups methods under full tomography, certification, and classical simulation.]

Figure 1.2.: There is a range of characterization and validation techniques. The more complex
they are, in terms of measurement and computational effort, the more information they tend to
provide. The ones with and without full stable performance guarantees are marked in green and
red, respectively.

principle— demonstrate quantum supremacy but one cannot fully check that it
functions correctly, can one then rightfully claim the achievement of quantum
supremacy?

We will introduce tools that can potentially resolve this problem. In particular, for
the case of trusted measurements quantum state certification can be used.
There are also practically relevant problems where it is expected that full universal
quantum computation is not required in order to achieve an advantage over classical
computing. In particular, when a quantum circuit is shallow enough so that it can
be completed before the build-up of noise reaches a critical level, then it can be
implemented without error correction. There are already many examples suggesting that
the era of such Noisy Intermediate-Scale Quantum (NISQ) technology is starting now
[8]. Moreover, companies including D-Wave, Google, IBM, and Intel have already
heavily invested into building such devices.
In order to be able to improve such devices (see Figure 1.1) and to be able to fairly
compare different implementations, it is crucial (i) to fully characterize their
components and (ii) to find heuristic noise estimation schemes that can be implemented
with as few requirements as possible. This course includes methods specifically
targeted at these goals: (i) quantum state and process tomography and (ii) randomized
benchmarking for estimating the noise in quantum gate implementations. As more
fundamental motivation we ask:
• What are the ultimate limits of measuring quantum systems?

• Where do Hamiltonians come from?

Many of us might have wondered about the point on Hamiltonians already in their
first quantum mechanics course, where specific Hamiltonians are mostly only
postulated. Quantum field theory introduces several quantization methods that give
quantum Hamiltonians from a classical understanding of physics. However, quantization
is typically not mathematically well-defined. Usually, uncontrolled approximations
are used to derive the effective Hamiltonians that most physicists work with. This
type of non-rigorous top-down approach might be unsatisfying in some aspects.
Therefore, it is very desirable to have a complementary bottom-up approach:
Operational approach

Find a description that explains all possible observations.

In quantum tomography one does exactly that: one learns the full quantum state or
process from measured data.
We outlined certification and tomography to give two different examples. In fact,
there is a whole range of characterization and verification methods, see Figure 1.2.

The method of choice depends, e.g., on the amount and type of information one aims
to obtain, the system size, the type of physical correlation that one should expect,
the measurement resources, the computational resources, and the type and strength
of the noise.

2. Preliminaries
In this chapter we introduce some notation and preliminary results that will be used
later. The reader may skip this chapter at first and come back to it later when
needed.
Mathematical background on quantum information can be found in full detail,
e.g., in Watrous’ lecture notes [9]. All technical terms that are neither
explained nor referenced can be found on Wikipedia.

2.1. Notation and math


We will use the following notation and preliminaries:
• [n] := {1, 2, . . . , n} for n ∈ N.
• Landau symbols (a.k.a. Bachmann-Landau notation or big-O-notation) O and
Ω are used for asymptotic upper and lower bounds, respectively. These bounds
can be relaxed to only hold up to logarithmic factors, which is denoted by Õ and
Ω̃, respectively. Θ and Θ̃ are used to denote that the corresponding upper and
lower bound hold simultaneously. For instance, for f defined by f (x) = x ln(x)2
the following holds: f (x) ∈ O(x2 ), f (x) ∈ Õ(x), f (x) ∈ Ω(x), and hence
f (x) ∈ Θ̃(x).
• Throughout this course we assume all vector spaces to be finite dimensional
except when explicitly stated otherwise.
• We use bra-ket notation. In particular, we denote the canonical basis of Cd by
{ |i i}i∈[d] and tensor products of vectors by |ψ i |φ i := |ψ i ⊗ |φ i.
• We set |1_n⟩ := (1, 1, . . . , 1)^⊤ ∈ R^n to be the vector of ones and often just write
|1⟩ instead of |1_n⟩.
• We denote the Euclidean unit sphere in Cd by Sd−1 := { |ψ i ∈ Cd : hψ|ψi = 1}
(we index manifolds by their own dimension and not by the dimension of an
ambient space).
• We denote the Pauli matrices by

σ_x := σ_1 := [0 1; 1 0] ,   σ_y := σ_2 := [0 −i; i 0] ,   σ_z := σ_3 := [1 0; 0 −1] ,   (2.1)

and σ_0 := 1_{2×2}.
• Tensor products of the Pauli matrices σ_{s_1} ⊗ · · · ⊗ σ_{s_n} with s ∈ {0, 1, 2, 3}^n are
called Pauli strings. They form an operator basis that is orthogonal in the Hilbert-
Schmidt inner product defined below.
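As a quick numerical illustration (a NumPy sketch added to these notes, not part of the original), one can verify the Hilbert-Schmidt orthogonality of the Pauli strings for n = 2, where ⟨P, Q⟩ = Tr[P†Q] = 2^n δ_{P,Q}:

```python
import itertools
import numpy as np

# The Pauli matrices sigma_0, ..., sigma_3 from Eq. (2.1)
paulis = [
    np.eye(2, dtype=complex),                    # sigma_0
    np.array([[0, 1], [1, 0]], dtype=complex),   # sigma_x
    np.array([[0, -1j], [1j, 0]]),               # sigma_y
    np.array([[1, 0], [0, -1]], dtype=complex),  # sigma_z
]

def pauli_string(s):
    """Tensor product sigma_{s_1} (x) ... (x) sigma_{s_n} for s in {0,1,2,3}^n."""
    out = np.ones((1, 1), dtype=complex)
    for k in s:
        out = np.kron(out, paulis[k])
    return out

n = 2
strings = [pauli_string(s) for s in itertools.product(range(4), repeat=n)]

# Gram matrix of Hilbert-Schmidt inner products <P, Q> = Tr[P^dagger Q]
gram = np.array([[np.trace(P.conj().T @ Q) for Q in strings] for P in strings])

# Pauli strings are pairwise orthogonal with <P, P> = 2^n
orthogonal = bool(np.allclose(gram, 2**n * np.eye(4**n)))
print(orthogonal)  # True
```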
• L(V, W ) denotes the vector space of linear (bounded) operators from a vector
space V to a vector space W . We set L(V ) := L(V, V ).
• Herm(H) ⊂ L(H) denotes the subspace of self-adjoint operators on a Hilbert space
H.

• Pos(H) ⊂ Herm(H) denotes the convex cone of positive semidefinite operators.
For X, Y ∈ Herm(H) we write X ⪰ Y if X − Y ∈ Pos(H).
• Let P, Q ∈ Pos(Cd ). Then
Tr[P Q] ≥ 0 , (2.2)
which can be seen by writing the trace as a sum over an eigenbasis of one of the
operators.
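The eigenbasis argument for (2.2) can be checked numerically (an added NumPy sketch, not from the original notes): writing Tr[PQ] = Σ_i λ_i ⟨v_i|Q|v_i⟩ over an eigenbasis of P, every term is non-negative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

def random_psd(d, rng):
    """A random positive semidefinite matrix of the form A^dagger A."""
    A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return A.conj().T @ A

P, Q = random_psd(d, rng), random_psd(d, rng)

# Eigenbasis argument: Tr[PQ] = sum_i lambda_i <v_i|Q|v_i> with lambda_i >= 0
lam, V = np.linalg.eigh(P)
terms = np.array([lam[i] * (V[:, i].conj() @ Q @ V[:, i]).real for i in range(d)])

tr = np.trace(P @ Q).real
nonneg = bool(tr >= -1e-10 and np.isclose(terms.sum(), tr) and (terms >= -1e-10).all())
print(nonneg)  # True
```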
• A probability vector is a vector p ∈ [0, 1]^d that is normalized, i.e., Σ_i p_i = 1.
• S(H) := {ρ ∈ Pos(H) : Tr[ρ] = 1} is the convex set of density operators. It
coincides with the set of convex combinations of the form Σ_i p_i |ψ_i⟩⟨ψ_i|, where
p is a probability vector.

• An operator X ∈ L(Cd ) can be diagonalized with unitaries if and only if X is


normal, i.e., [X, X † ] = 0.
• For a normal operator X, which is diagonalized as X = U DU † , and function
f : C → C we define
f (X) := U f (D)U † , (2.3)
where f (D) is the diagonal matrix obtained from D by applying f to all its
diagonal elements.

• The Hilbert-Schmidt inner product on operators L(H1 , H2 ) between Hilbert


spaces H1 and H2 is defined by hX, Y i := Tr[X † Y ].
• For a subspace S ⊂ L(H) we denote the orthogonal complement w.r.t. the
Hilbert-Schmidt inner product by S ⊥ .

• (Column) vectorization is the map |·⟩ : C^{n₁×n₂} → C^{n₁n₂} that stacks the columns
of a matrix A ∈ C^{n₁×n₂} on top of each other. For all matrices A, B, C of
fitting dimensions it holds that

|ABC⟩ = (C^⊤ ⊗ A) |B⟩ ,   (2.4)

where X ⊗ Y ≅ (X_{i,j} Y)_{i,j} (defined as a block matrix) denotes the Kronecker
product. This formula is of great use numerically, e.g., to avoid explicit basis
expansions. There is a similar notion of row vectorization with a similar formula.
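The vectorization identity (2.4) is easy to check numerically (an added NumPy sketch; real matrices are used to avoid any ambiguity between transpose and adjoint):

```python
import numpy as np

rng = np.random.default_rng(0)

def vec(X):
    """Column vectorization: stack the columns of X on top of each other."""
    return X.reshape(-1, order="F")  # Fortran order flattens column by column

A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((5, 2))

lhs = vec(A @ B @ C)
rhs = np.kron(C.T, A) @ vec(B)   # Eq. (2.4): |ABC> = (C^T (x) A)|B>

identity_holds = bool(np.allclose(lhs, rhs))
print(identity_holds)  # True
```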

• The unitary group of a Hilbert space H is denoted by U(H) ⊂ L(H) and we set
U(d) := U(Cd ).
• The singular value decomposition (SVD) of a matrix X ∈ C^{n₁×n₂} is given by
X = U ΣV†, where U ∈ U(n₁) and V ∈ U(n₂) are unitary and the only non-zero
entries of Σ are the positive singular values Σi,i for i ∈ [rank(X)]. The unitaries
U and V can be obtained by diagonalizing XX † and X † X, respectively.
Sometimes one uses a different form of the SVD X = Ũ Σ̃Ṽ † where Σ̃ is the
positive (rank(X) × rank(X))-matrix with the non-zero singular values of X
on the diagonal and Ũ and Ṽ are obtained from U and V by removing the
appropriate columns.
• For p ∈ R₊ the ℓ_p-norm of a vector x ∈ C^d is ‖x‖_{ℓ_p} := (Σ_i |x_i|^p)^{1/p}.
For p = ∞ it is ‖x‖_{ℓ_∞} := max_i |x_i|. These norms satisfy Hölder’s inequality,

|hx, yi| ≤ kxk`p kyk`q ∀x, y ∈ Cd (2.5)

whenever 1/p + 1/q = 1 (with 1/∞ := 0). Moreover, the norm inequalities

kxk`q ≤ kxk`p ≤ s1/p−1/q kxk`q (2.6)

are satisfied for any 0 < p < q ≤ ∞ and x ∈ Cd with s non-zero elements; see
e.g. [10, Appendix A.1].
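Both Hölder's inequality (2.5) and the norm inequalities (2.6) can be tested on random vectors (an added NumPy sketch, not part of the original notes; the exponents p = 3, q = 3/2 and p = 2 < q = 4 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(10) + 1j * rng.standard_normal(10)
y = rng.standard_normal(10) + 1j * rng.standard_normal(10)

def lp_norm(v, p):
    """The l_p-norm (sum_i |v_i|^p)^(1/p)."""
    return float((np.abs(v) ** p).sum() ** (1.0 / p))

# Hoelder's inequality (2.5) with 1/p + 1/q = 1, here p = 3, q = 3/2
holder = bool(abs(np.vdot(x, y)) <= lp_norm(x, 3.0) * lp_norm(y, 1.5))

# Norm inequalities (2.6) with p = 2 < q = 4 and s = number of nonzero entries
s = np.count_nonzero(x)
chain = bool(lp_norm(x, 4.0)
             <= lp_norm(x, 2.0)
             <= s ** (1 / 2 - 1 / 4) * lp_norm(x, 4.0))

print(holder and chain)  # True
```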
• For p ∈ (0, ∞] the Schatten p-norm ‖X‖_p of an operator X is given by the ℓ_p-
norm of the vector of its singular values. A similar Hölder inequality and
similar norm inequalities as for the ℓ_p-norms hold for the Schatten p-norms
(with s replaced by the rank). In particular, for any operator X,

‖X‖_∞ ≤ ‖X‖_2 ≤ ‖X‖_1 ≤ √(rank(X)) ‖X‖_2 ≤ rank(X) ‖X‖_∞   (2.7)

and any operator Y from the same space as X,

| Tr[X † Y ]| ≤ kXk1 kY k∞ . (2.8)

• Of particular importance are


– the Schatten ∞-norm (coincides with the spectral norm k · kop a.k.a. oper-
ator norm),
– the Schatten 2-norm (coincides with the Frobenius norm k · kF , which is
induced by the Hilbert-Schmidt inner product and which is invariant under
vectorizations maps);
– and the Schatten 1-norm (coincides with the trace norm k · k1 a.k.a. nuclear
norm).
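Since all Schatten norms are ℓ_p-norms of the singular value vector, the chain (2.7) is straightforward to verify numerically (an added NumPy sketch, not from the original notes):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

s = np.linalg.svd(X, compute_uv=False)   # singular values of X
r = np.linalg.matrix_rank(X)             # here r = 3 (generically full rank)

schatten_inf = s.max()                   # spectral / operator norm
schatten_2 = np.sqrt((s ** 2).sum())     # Frobenius norm
schatten_1 = s.sum()                     # trace / nuclear norm

# Chain of norm inequalities (2.7)
chain = bool(schatten_inf <= schatten_2 <= schatten_1
             <= np.sqrt(r) * schatten_2 <= r * schatten_inf)
print(chain)  # True
```

The Schatten 2-norm indeed coincides with NumPy's default (Frobenius) matrix norm.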
• The pseudo-inverse or Moore–Penrose inverse is a generalization of the matrix
inverse to all matrices. It can be conveniently calculated as follows. Let us write
a singular value decomposition of a matrix X ∈ C^{n₁×n₂} as

X = U Σ V† .   (2.9)

Then the pseudo-inverse Σ⁺ ∈ C^{n₂×n₁} of Σ is obtained component-wise by setting

Σ⁺_{i,j} := 1/Σ_{j,i} if Σ_{j,i} ≠ 0 and Σ⁺_{i,j} := 0 otherwise   (2.10)

for all i, j (i.e., by transposing Σ and inverting its non-zero entries). Then the
pseudo-inverse of X is

X⁺ := V Σ⁺ U† .   (2.11)
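As a sanity check (an added NumPy sketch, not from the original notes), the SVD-based construction X⁺ = V Σ⁺ U† can be compared against NumPy's built-in pseudo-inverse:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 6))

U, svals, Vh = np.linalg.svd(X)   # X = U @ Sigma @ Vh with Sigma of shape (4, 6)

# Sigma^+ has the transposed shape and the inverted non-zero singular values
Sigma_plus = np.zeros((6, 4))
for i, s in enumerate(svals):
    if s > 1e-12:
        Sigma_plus[i, i] = 1.0 / s

X_plus = Vh.conj().T @ Sigma_plus @ U.conj().T   # X^+ = V Sigma^+ U^dagger

matches_numpy = bool(np.allclose(X_plus, np.linalg.pinv(X)))
print(matches_numpy)  # True
```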

• For subsets S, T ⊂ V of a vector space V we denote by S + T := {s + t : s ∈
S, t ∈ T } the Minkowski sum of S and T . The sets −T and S − T are defined
similarly. For v ∈ V we set v + S := {v + s : s ∈ S}. By cone(S) := {λs : λ ∈
R₊, s ∈ S} we denote the cone generated by S. The linear span span_K(S) ⊂ V
is the linear subspace of all linear combinations of vectors in S with coefficients
in K. The subscript K is usually omitted when K is the underlying field of V .
The dimension of S is dim(S) := dim(span(S)).
• A subset C ⊂ Rn is called convex if (1 − λ)x + λy ∈ C for all x, y ∈ C and
λ ∈ [0, 1]. Similarly, a subset V ⊂ Rn is called affine set or affine subspace if
(1 − λ)x + λy ∈ V for all x, y ∈ V and λ ∈ R. The set V0 := V − V ⊂ Rn is the
unique linear subspace parallel to V and for any x0 ∈ V one has V = x0 + V0 ;
see Rockafellar’s book on convex analysis [11, Section 1] for more details.
• For a sequence of events A₁, A₂, . . . the union bound (a.k.a. Boole’s inequality)
holds:

P[A₁ OR A₂ OR . . . ] ≤ Σ_i P[A_i] .   (2.12)

• A (standard) Gaussian random variable (RV) is a normally distributed RV with
zero mean and unit variance. A complex (standard) Gaussian RV is a RV of
which the real and imaginary parts are each iid. standard Gaussian RVs. A
complex (standard) Gaussian (random) vector is a vector in C^d the components of
which are iid. complex standard Gaussian RVs. A Hermitian (standard) Gaussian
(random) matrix is a matrix in Herm(C^d), of which the diagonal components
are iid. Gaussian RVs and the other components iid. complex Gaussian RVs.
• There is a unitarily invariant probability measure on the unitary group U(H)
of a Hilbert space H called the Haar measure. For instance, unitary matrices
diagonalizing Hermitian Gaussian matrices are distributed Haar randomly. This
fact can be used to numerically sample unitaries from the Haar measure.
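The sampling recipe from the last bullet can be sketched in a few lines of NumPy (an added illustration, not part of the original notes; note that the eigensolver fixes the phase of each eigenvector by its own convention, so one randomizes the column phases):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4

# Hermitian (standard) Gaussian random matrix H = (G + G^dagger)/2
G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
H = (G + G.conj().T) / 2

# Its eigenvector matrix is Haar distributed up to the phase of each column;
# multiplying each column by a uniformly random phase removes that convention.
_, U = np.linalg.eigh(H)
U = U * np.exp(2j * np.pi * rng.random(d))

is_unitary = bool(np.allclose(U.conj().T @ U, np.eye(d)))
print(is_unitary)  # True
```

A common alternative is the QR decomposition of a complex Gaussian matrix with a phase correction on the diagonal of R.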

2.2. Representation theory


Let us start with the most basic definitions. For a proper introduction we refer to
Simon’s book [12] and to Goodman and Wallach’s book [13] for the representation
theory of the standard matrix groups.
Definition 2.1 (Basic definitions from representation theory):

Let G and H be groups.

• f : G → H is a (group) homomorphism if f (g1 g2 ) = f (g1 )f (g2 ) for


all g1 ,g2 ∈ G. Note that this condition implies that f (eG ) = eH and
f g −1 = f (g)−1 for all g ∈ G.
• Let H ∼ = Cν be a Hilbert space with the unitary group U(H) ⊂ L(H).
A unitary representation of G on H is a homomorphism R : G → U(H).
Such representations are instances of linear group representations, which
are defined similarly with U(H) replaced by the group of invertible oper-
ators on H. We will only be concerned with unitary representations and,
hence, often omit the word “unitary”.
• A subspace V ⊂ H is said to be invariant if R(g)V ⊆ V for all g ∈ G.

• R is called irreducible if the only invariant subspaces are {0} and H itself.
Irreducible representations are also called irreps.
• Two representations R : G → U(H) and R̃ : G → U(H̃) are said to be
unitarily equivalent if there is a unitary operator W : H → H̃ such that
R̃(g) = W R(g)W † for all g ∈ G.

If R_i : G → U(H_i) for i = 1, 2 are two representations of G then (R_1 ⊕ R_2)(g) := R_1(g) ⊕
R_2(g) defines another representation R_1 ⊕ R_2 : G → U(H_1 ⊕ H_2). This representation has
H_1 and H_2 as invariant subspaces. Conversely, if a representation R has a non-trivial
invariant subspace V then it can be decomposed as R = R|_V ⊕ R|_{V⊥}. By iterating
this insight, we have the following statement (see e.g. [12, Theorem II.2.3]).
Proposition 2.2 (Decomposition into irreps):

Let R : G → L(H) be a unitary representation of a group G on a finite-


dimensional Hilbert space H. Then (R, H) can be decomposed into a direct
sum of irreps (R_i, H_i) of G as

H = ⊕_i H_i   and   R(g) = ⊕_i R_i(g) .   (2.13)

Several irreps R_{i_1}, . . . , R_{i_m} might be unitarily equivalent to each other. The
maximum such m is called the multiplicity of that irrep. The space C^m in the resulting
identification

⊕_{j=1}^m R_{i_j}(g) ≅ R_{i_1}(g) ⊗ 1_m ∈ L(H_{i_1} ⊗ C^m)   (2.14)

is called the multiplicity space of R_{i_1}. The decomposition (2.13) is called multiplicity-
free if all irreps R_i are inequivalent, i.e., not isomorphic.
Theorem 2.3 (Schur’s lemma):

Let R : G → U(H) be an irrep of G on H. If A ∈ L(H) satisfies

AR(g) = R(g)A ∀g ∈ G (2.15)

then A = c 1 for some c ∈ C.

Proof. The condition (2.15) implies that R(h)A† = A†R(h) for all h = g⁻¹ ∈ G.
Hence, this condition also holds for Re(A) := ½(A + A†) and Im(A) := 1/(2i) (A − A†),
and A is a multiple of the identity if they both are. Hence, it is sufficient to prove
the theorem for A ∈ Herm(H).
Let |ψ⟩ be an eigenvector with A|ψ⟩ = λ|ψ⟩ and Eig_λ(A) := {|ψ⟩ : A|ψ⟩ = λ|ψ⟩}
the full eigenspace. Then R(g)|ψ⟩ ∈ Eig_λ(A) for all g ∈ G because AR(g)|ψ⟩ =
R(g)A|ψ⟩ = λR(g)|ψ⟩. So, Eig_λ(A) is an invariant subspace. Since Eig_λ(A) ≠ {0}
and R is an irrep, Eig_λ(A) = H follows, i.e., A = λ1.

Corollary 2.4 (Irreps of abelian groups):

If G is abelian then every irrep has dimension 1.

Proof. Let R be an irrep of G on H. Theorem 2.3 implies that each g ∈ G has


representation R(g) = cg 1 for some constant cg . Hence, every subspace of H is
invariant under R. Since R is an irrep this is only possible if dim(H) = 1.
There is also a slightly more general version of Schur’s lemma:
Theorem 2.5 (Schur’s lemma II):

Let R : G → U(H) and R̃ : G → U(H̃) be two irreps of G on finite dimensional


Hilbert spaces H and H̃. If A ∈ L(H, H̃) satisfies

AR(g) = R̃(g)A ∀g ∈ G (2.16)

then either A = 0 or R and R̃ are unitarily equivalent (with A proportional to an intertwining unitary).

Proof. Condition (2.16) implies that for all g ∈ G

R(g)A† = A† R̃(g) (2.17)

and, hence,

R(g)A†A = A†AR(g) ,   (2.18)
R̃(g)AA† = AA†R̃(g) .   (2.19)

Schur’s lemma (Theorem 2.3) implies that A†A = c1 and AA† = c̃1 for constants
c, c̃. Obviously, c = c̃, as can be seen from the eigenvalues. Either c = 0, so that
A = 0, or W = A/√c is a unitary. In the latter case

W R(g) = R̃(g)W (2.20)

for all g ∈ G, i.e., R and R̃ are unitarily equivalent.
A unitary W relating two representations R and R̃ as in (2.20) is called an inter-
twining unitary of R and R̃.
Particularly interesting in the context of k-designs are two group representations on
H = (Cd )⊗k . Remember that the symmetric group Symk is the group of permutations
on k elements.
Two group representations on (Cd )⊗k

The symmetric group Symk and the unitary group U(d) both have a canonical
representation on (Cd )⊗k :

π_k : Sym_k → U((C^d)^⊗k) ,   (2.21)
∆_d : U(d) → U((C^d)^⊗k) .   (2.22)

For σ ∈ Sym_k and U ∈ U(d) they are given by

π_k(σ)(|ψ_1⟩ ⊗ · · · ⊗ |ψ_k⟩) := |ψ_{σ⁻¹(1)}⟩ ⊗ · · · ⊗ |ψ_{σ⁻¹(k)}⟩ ,   (2.23)
∆_d(U)(|ψ_1⟩ ⊗ · · · ⊗ |ψ_k⟩) := (U|ψ_1⟩) ⊗ · · · ⊗ (U|ψ_k⟩) .   (2.24)

We note that πk (σ) and ∆d (U ) commute for any σ ∈ Symk and U ∈ U(d).

The symmetric and anti-symmetric subspaces of (C^d)^⊗k are defined to be

H_{sym^k} := {|Ψ⟩ ∈ (C^d)^⊗k : π_k(σ)|Ψ⟩ = |Ψ⟩ ∀σ ∈ Sym_k} ,   (2.25)
H_{∧^k} := {|Ψ⟩ ∈ (C^d)^⊗k : π_k(σ)|Ψ⟩ = sign(σ)|Ψ⟩ ∀σ ∈ Sym_k} .   (2.26)

By Psymk and P∧k we denote the orthogonal projectors onto these two subspaces,
respectively.
Let us briefly consider the case k = 2. It is easy to see that any matrix can be
decomposed into a symmetric and an anti-symmetric part, which are orthogonal to
each other. This implies that

(Cd )⊗2 = Hsym2 ⊕ H∧2 . (2.27)

Note that due to Corollary 2.4, both the symmetric and the antisymmetric subspace
are isomorphic to some Cm , where msym2 and m∧2 are the multiplicities of the two
different one-dimensional irreps of Sym2 .
For large k there is a similar decomposition with more summands, called the Schur-Weyl
decomposition. In order to state it we write λ ⊢ k for an integer partition of the
form

k = Σ_{i=1}^{l(λ)} λ_i   (2.28)

of k into integers λ_i ≥ 1.
Theorem 2.6 (Schur-Weyl duality):

The action of Symk × U(d) on (Cd )⊗k given by the commuting representations
(2.23) and (2.24) is multiplicity-free and (Cd )⊗k decomposes into irreducible
components as

(C^d)^⊗k ≅ ⊕_{λ⊢k, l(λ)≤d} W_λ ⊗ S_λ .   (2.29)

For any k ≥ 2, both H_{sym^k} and H_{∧^k} (the latter provided k ≤ d) occur as
components in the direct sum (2.29).

The spaces W_λ are called Weyl modules and the S_λ Specht modules. Schur-Weyl duality
implies that the Weyl modules are the multiplicity spaces of the irreps of Sym_k and,
similarly, the Specht modules are the multiplicity spaces of the irreps of U(d).
The last ingredient we need in order to prove Lemma 3.12 is the dimension of the
symmetric subspace.
Exercise 2.1 (Symmetric subspace):

• Calculate Psymk |ψ i for a product state |ψ i = |ψ1 i ⊗ · · · ⊗ |ψk i.


• Show that the dimension of the symmetric subspace P_{sym^k}(C^d)^⊗k is

Tr[P_{sym^k}] = (d + k − 1 choose k) .   (2.30)
Hint: Argue first that this is the number of ways to distribute k indistinguishable
particles (bosons) into d boxes (modes).

Often it is helpful to write Psym2 in terms of the flip operator (a.k.a. swap operator)
F ∈ L(H⊗2 ), which is defined by

F |ψ i |φ i := |φ i |ψ i (2.31)

and can hence be written as


F = Σ_{i,j=1}^d |i, j⟩⟨j, i| .   (2.32)

Then

P_{sym²} = ½ (1 + F) .   (2.33)

From (2.32) one can see that Tr[F] = d, so that indeed Tr[P_{sym²}] = ½ d(d + 1).
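A short NumPy check of (2.32), (2.33), and the dimension count ½d(d+1), i.e., the formula (2.30) for k = 2 (an added illustration, not part of the original notes):

```python
import numpy as np

d = 3

# Flip (swap) operator F|i,j> = |j,i> from (2.31)/(2.32); basis index (i,j) -> i*d + j
F = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        F[j * d + i, i * d + j] = 1.0

P_sym = (np.eye(d * d) + F) / 2          # Eq. (2.33)

is_projector = bool(np.allclose(P_sym @ P_sym, P_sym))
dim_ok = bool(np.isclose(np.trace(P_sym), d * (d + 1) / 2))  # (2.30) for k = 2
print(is_projector and dim_ok)  # True
```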
Proposition 2.7 (Invariant operators for k = 2):

Let E ∈ L(Cd ⊗ Cd ) be an operator such that

(U ⊗ U )E = E(U ⊗ U ) (2.34)

for all U ∈ U(d). Then


E = c1 Psym2 + c2 P∧2 (2.35)
for some constants c1 , c2 ∈ C depending on E.

Proof. Let us denote the representation of U(d) by ∆ : U(d) → L(C^d ⊗ C^d). Schur-
Weyl duality (Theorem 2.6) for k = 2 tells us that ∆ decomposes into irreps as

(C^d)^⊗2 ≅ W_{sym²} ⊗ S_{sym²} ⊕ W_{∧²} ⊗ S_{∧²} ,   (2.36)

where S_{sym²} and S_{∧²} carry the multiplicities of the irreps of ∆. But Sym_2 is abelian,
so according to Corollary 2.4, dim(S_{sym²}) = dim(S_{∧²}) = 1, i.e., the irreps W_{sym²} and
W_{∧²} of ∆ are multiplicity-free.
Now we can write E as a block matrix corresponding to the decomposition (C^d)^⊗2 ≅
W_{sym²} ⊕ W_{∧²},

E =: [E_{1,1} E_{1,2}; E_{2,1} E_{2,2}] .   (2.37)

A similar decomposition for ∆ is ∆ = ∆_{sym} ⊕ ∆_{∧} (the off-diagonal blocks are zero).

The invariance ∆(U )E = E∆(U ) implies

∆sym (U )E1,1 = E1,1 ∆sym (U ) (2.38)


∆∧ (U )E2,2 = E2,2 ∆∧ (U ) (2.39)

for all U ∈ U(d). Hence, thanks to Schur’s lemma (Theorem 2.3), Ei,i = ci 1 for
i = 1, 2 and some constants c1 and c2 . Similarly, we obtain

∆sym (U )E1,2 = E1,2 ∆∧ (U ) (2.40)


∆∧ (U )E2,1 = E2,1 ∆sym (U ) (2.41)

for all U ∈ U(d). According to Schur’s lemma (the second version, Theorem 2.5), E_{1,2}
and E_{2,1} are each either zero or proportional to an intertwining unitary for ∆_{sym}
and ∆_{∧}, just as W in (2.20). Since ∆_{sym} and ∆_{∧} are inequivalent irreps (they act
on spaces of different dimensions), no such intertwining unitary exists, and E_{1,2} and
E_{2,1} must hence be zero. Together,

E = [c_1 1 0; 0 c_2 1] = c_1 P_{sym²} + c_2 P_{∧²} .   (2.42)
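Proposition 2.7 can also be probed numerically (an added NumPy sketch, not part of the original notes): using the vectorization formula (2.4), the equation (U⊗U)E = E(U⊗U) becomes a linear system in |E⟩, and its solution space for a few generic Haar unitaries should be exactly two-dimensional, spanned by P_{sym²} and P_{∧²}.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 3
D = d * d

def haar_unitary(d, rng):
    """Haar-random unitary via QR of a complex Gaussian matrix (phase-corrected)."""
    Z = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    Q, R = np.linalg.qr(Z)
    return Q * (np.diagonal(R) / np.abs(np.diagonal(R)))

# Represent E -> (U x U)E - E(U x U) as a matrix acting on |E>, via Eq. (2.4):
# |XE - EX> = (1 (x) X - X^T (x) 1)|E>. Stack several random U to cut the kernel.
blocks = []
for _ in range(4):
    U = haar_unitary(d, rng)
    X = np.kron(U, U)
    blocks.append(np.kron(np.eye(D), X) - np.kron(X.T, np.eye(D)))
M = np.vstack(blocks)

# Proposition 2.7 predicts a two-dimensional common kernel: span{P_sym2, P_anti2}
svals = np.linalg.svd(M, compute_uv=False)
commutant_dim = int((svals < 1e-8).sum())
print(commutant_dim)
```

For generic (here: seeded random) unitaries the printed dimension is 2, in agreement with the proposition.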

2.3. Random variables and tail bounds


Tail bounds for random variables are bounds on the probability that a random variable
assumes a value that deviates from the expectation value, as visualized by the marked
area in Figure 2.1. Indeed, for any random variable X it is unlikely to assume values
that are much larger than the expectation value E[|X|]:
Theorem 2.8 (Markov’s inequality):

Let X be a non-negative random variable and t > 0. Then

P[X ≥ t] ≤ E[X]/t .   (2.43)

Proof. The indicator function 1_A of a set A is defined by

1_A(x) := 1 if x ∈ A and 1_A(x) := 0 otherwise.   (2.44)

To prove Markov’s inequality we choose A to be the set {X ≥ t} := {ω : X(ω) ≥ t},


observe that
t 1{X≥t} ≤ X (2.45)
and take the expectation value of both sides of this inequality.
The variance can give tighter bounds on the tail than just the expectation values
E[|X|]:
Theorem 2.9 (Chebyshev’s inequality):

Let X be a mean zero random variable with finite variance σ² := E[X²]. Then

P[|X| ≥ t] ≤ σ²/t²   (2.46)

for all t ≥ 0.


Figure 2.1.: The (upper) tail of a random variable X is the probability of X being greater than
some threshold t. This probability is given by the corresponding area under the graph of the
probability density function (PDF) of X.

Proof. Apply Markov’s inequality to the random variable X 2 .

Note that in the case of a random variable Y that does not necessarily have a zero
mean Chebyshev’s inequality yields a tail bound by applying it to X := Y − E[Y ]; see
also Figure 2.1. The same argument can be made for the tail bounds that follow.
When a random variable is bounded then its empirical mean concentrates much
more than a naive application of Markov’s inequality suggests. More generally, the
following holds (see, e.g., [10, Theorem 7.20]):
Theorem 2.10 (Hoeffding’s inequality):

Let X₁, . . . , X_n be independent random variables with a_i ≤ X_i ≤ b_i almost
surely for all i ∈ [n] and S_n := Σ_{i=1}^n X_i. Then

P[S_n − E[S_n] ≥ t] ≤ exp(−2t² / Σ_{i=1}^n (b_i − a_i)²) ,   (2.47)
P[|S_n − E[S_n]| ≥ t] ≤ 2 exp(−2t² / Σ_{i=1}^n (b_i − a_i)²)   (2.48)

for all t ≥ 0.

Proof. The second statement directly follows from the first one. In order to prove the
first one, let s > 0, apply Markov’s inequality to

P[S_n − E[S_n] ≥ t] = P[e^{s(S_n − E[S_n])} ≥ e^{st}] ,   (2.49)

use the independence of the X_i to factorize the exponential, use the bounds on X_i,
and optimize over s.
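To see the bound in action, here is a small seeded NumPy experiment (an added illustration, not part of the original notes) comparing the empirical tail of a sum of uniform random variables with the right-hand side of (2.48):

```python
import numpy as np

rng = np.random.default_rng(5)
n, t, trials = 200, 10.0, 20_000

# X_i uniform on [0, 1], so a_i = 0, b_i = 1 and sum_i (b_i - a_i)^2 = n
X = rng.random((trials, n))
S = X.sum(axis=1)

empirical_tail = np.mean(np.abs(S - n / 2) >= t)  # E[S_n] = n/2 for uniforms
hoeffding_bound = 2 * np.exp(-2 * t**2 / n)       # right-hand side of (2.48)

bound_holds = bool(empirical_tail <= hoeffding_bound)
print(bound_holds)  # True
```

The empirical tail is far below the bound here; Hoeffding's inequality is not tight for this distribution, but it holds uniformly over all bounded variables.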
When one can control the variance of the random variables then the following tail
bound can give a better concentration, especially for small values of t.
Theorem 2.11 (Bernstein inequality [10, Corollary 7.31]):

    Let X1 , . . . , Xn be independent zero mean random variables with |Xi | ≤ a
    almost surely and E[|Xi |²] ≤ σi² for some a > 0, σi > 0, and i ∈ [n]. Set
    Sn := Σ_{i=1}^n Xi . Then, for all t > 0,

        P[ |Sn | ≥ t ] ≤ 2 exp( − (t²/2) / (σ² + at/3) )               (2.50)

    with σ² := Σ_{i=1}^n σi² .
Another related tail bound is Azuma’s inequality, which allows for a relaxation of
the independence assumption (to super-martingales with bounded differences).

2.4. Monte Carlo integration


Traditionally, Monte Carlo methods are used in numerical integration, optimization,
and sampling from a distribution. Here, we will introduce importance sampling, a
method to estimate a sum or integral.
Specifically, we aim to compute an integral F that is written as an expectation
value

    F := E_{X∼p}[f(X)] = ∫ f(x) p(x) dx                                (2.51)

of some function f . In this setting, we can sample X^(1) , . . . , X^(m) ∼ p iid. and take
the empirical average

    F̂ := (1/m) Σ_{i=1}^m f(X^(i))                                     (2.52)

as estimator of F . It is not difficult to see that F̂ is unbiased, i.e., that E[F̂ ] = F .


If Var[f (X)] < ∞ then F̂ can be proven to be consistent, i.e., F̂ converges to F for
m → ∞ in an appropriate sense. Moreover,

    Var[F̂ ] = Var[f (X)] / m ,                                        (2.53)

i.e., the empirical variance also gives an estimate of the estimation error. Thanks to
the central limit theorem F̂ converges to a normal random variable with mean F and
variance Var[f (X)]/m, which allows for the simple estimation of confidence intervals.
However, everything relies on the ability to sample from p. Two popular methods to
make such sampling efficient are importance sampling and Markov chain Monte Carlo
sampling.
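A minimal sketch of the plain Monte Carlo estimator (2.52) and the variance estimate (2.53), here for F = E[X²] = 1 with X ∼ N(0, 1); the choice of f and p is purely illustrative (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)

# Estimate F = E_{X~N(0,1)}[X^2], whose exact value is 1.
m = 100_000
x = rng.standard_normal(m)
fx = x**2

F_hat = fx.mean()             # empirical average (2.52)
var_hat = fx.var(ddof=1) / m  # estimate of Var[F_hat], cf. (2.53)

print(F_hat, var_hat)
```

Here Var[f(X)] = E[X⁴] − 1 = 2, so the standard deviation of F̂ is about √(2/m) ≈ 0.0045.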

2.4.1. Importance sampling


The idea is to rewrite the integrand f p in (2.51) as

    f p = (f p / q) · q                                                (2.54)

for some probability distribution q. Then we can apply the Monte Carlo sampling
idea (2.52) w.r.t. q, i.e., we draw X^(1) , . . . , X^(m) ∼ q to obtain the estimator

    F̂_q := (1/m) Σ_{i=1}^m f(X^(i)) p(X^(i)) / q(X^(i)) .             (2.55)

It holds that E_q [F̂_q ] = F and

    Var_q [F̂_q ] = (1/m) Var_q [f p/q] = (1/m) ( ∫ f²p²/q dx − F² ) . (2.56)

It can be shown that choosing q as

    q∗ := p |f | / Z                                                   (2.57)

with normalization factor Z yields minimum variance. In particular, for f ≥ 0 we
even have E_{q∗}[(f p/q∗)²] = E_p [f ]² = E_{q∗}[f p/q∗]², i.e., Var_{q∗}[F̂_{q∗}] = 0. So, if f does
not change its sign then a single sample is sufficient for the estimation! But in order
to obtain Z one needs to solve the integration problem first. However, finding non-
optimal but good choices for q can already speed up the integration.
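The following sketch illustrates importance sampling for a toy integrand whose mass lies in the tail of p; the densities and the proposal q are chosen for illustration only (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(2)

# Estimate F = E_{X~p}[f(X)] with p = N(0,1) and f(x) = exp(-(x-3)^2),
# whose mass sits far in the tail of p. Exact value: exp(-3)/sqrt(3).
def f(x):
    return np.exp(-(x - 3.0) ** 2)

def p(x):  # standard normal density
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def q(x):  # proposal N(3,1), roughly proportional to |f| p near the peak
    return np.exp(-(x - 3.0) ** 2 / 2) / np.sqrt(2 * np.pi)

m = 50_000
x_q = rng.standard_normal(m) + 3.0          # samples from q
est_is = np.mean(f(x_q) * p(x_q) / q(x_q))  # importance sampling estimator (2.55)

x_p = rng.standard_normal(m)                # plain Monte Carlo for comparison
est_mc = np.mean(f(x_p))

print(est_is, est_mc)
```

Both estimators are unbiased, but the importance sampling estimator has a markedly smaller variance since its samples land where the integrand is large.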

2.5. Basic convex optimization problems


Convex optimization problems are minimization problems where a convex objective
function is optimized over a convex set, the feasible region. They have a rich duality
theory [11] that leads to polynomial-time algorithms for many important subclasses
of convex optimization problems [14]. Two important subclasses are linear programs
and semidefinite programs. They can be solved efficiently with standard software,
e.g., by using CVX(PY) [15, 16].

2.5.1. Linear programs (LPs)


LPs are convex optimization problems with a linear objective function, linear equality
constraints, and linear inequality constraints. These are exactly those problems where
a linear function is optimized over a convex polytope. A convex polytope is defined as
an intersection of finitely many half spaces.
As every convex optimization problem, an LP comes along with a primal and a dual
formulation. Let us denote entrywise non-negativity of a vector x ∈ Rn by x ≥ 0.
Definition 2.12 (LPs):

    A linear program (LP) is given by a triple (A, c, b) with c ∈ Rn , b ∈ Rm , and
    A ∈ Rm×n and comes along with the following pair of optimization problems:

        Primal:                      Dual:
        maximize   cᵀ x              minimize   bᵀ y
        subject to Ax = b ,          subject to Aᵀ y ≥ c ,
                   x ≥ 0.                       y ≥ 0.

    The variables x and y are called primal and dual variable, respectively. A
    primal feasible point is a point x that satisfies the constraints of the primal LP.
    A primal optimal point is a primal feasible point x♯ so that cᵀ x♯ is the outcome
    of the maximization in the primal LP. Dual feasible points and dual optimal
    points y♯ are defined similarly via the dual LP.

Weak duality states that for a primal feasible point x and a dual feasible point y
we have cᵀ x ≤ bᵀ y, which directly follows from the definition of the primal and dual
problem. The strong duality theorem for LPs states that cᵀ x♯ = bᵀ y♯ for all optimal
primal and dual points x♯ and y♯ .

An LP for the ℓ1 -norm

The ℓ1 -norm of a vector z ∈ Rn can be written as

    ‖z‖_{ℓ1} = min  1ᵀ y
               subject to  yi ≥ zi and yi ≥ −zi ∀i ,                   (2.58)
                           y ≥ 0 ,

where 1 is the vector with all components being 1. We note that the ℓ1 -norm of
complex vectors is not given by an LP but by a so-called second-order cone program,
which is a special kind of SDP.
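The LP (2.58) can be handed to any LP solver; a small sketch using `scipy.optimize.linprog` (the choice of solver is an assumption, any LP solver works) recovers ‖z‖_{ℓ1} for a random vector:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
z = rng.standard_normal(5)

# LP (2.58): minimize 1^T y subject to y_i >= z_i, y_i >= -z_i, y >= 0.
# linprog expects "minimize c^T y subject to A_ub @ y <= b_ub".
n = len(z)
c = np.ones(n)
A_ub = np.vstack([-np.eye(n), -np.eye(n)])  # -y <= -z  and  -y <= z
b_ub = np.concatenate([-z, z])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
print(res.fun, np.abs(z).sum())  # both equal the l1-norm of z
```

At the optimum each yᵢ = |zᵢ|, so the objective value is exactly ‖z‖_{ℓ1}.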

2.5.2. Semidefinite programs (SDPs)


SDPs are generalizations of LPs. Here, vectors are replaced by matrices and the
entrywise conic order is replaced by the semidefinite conic order on matrices. To
be more explicit, we follow [17, Chapter 1.2.3] and define SDPs for either real or
complex inner product spaces V and W . A linear operator Θ : L(V ) → L(W ) is said
to be Hermiticity-preserving if Θ(X) ∈ Herm(W ) for all X ∈ Herm(V ).
Definition 2.13 (SDPs):

    A semidefinite program is specified by a triple (Ξ, C, B), where C ∈ Herm(V )
    and B ∈ Herm(W ) are self-adjoint operators and Ξ : L(V ) → L(W ) is a
    Hermiticity-preserving linear map. With such a triple, we associate the follow-
    ing pair of optimization problems:

        Primal:                      Dual:
        maximize   Tr(CX)            minimize   Tr(BY )
        subject to Ξ(X) = B ,        subject to Ξ†(Y ) ⪰ C ,
                   X ⪰ 0.                       Y ∈ Herm(W ) .

    Primal and dual feasible and optimal points are defined similarly as for LPs.

SDPs that are characterized as in this definition are said to be in standard form. For
specific SDPs, equivalent formulations are often more handy.
Weak duality refers to the fact that the value of the primal SDP cannot be larger
than the value of the dual SDP, i.e., that Tr(CX) ≤ Tr(BY ) for any primal feasible
point X and dual feasible point Y . This fact follows directly from the definition of
the primal and dual problem:

    Tr[BY ] = Tr[Ξ(X)Y ] = Tr[Ξ†(Y )X] ≥ Tr[CX] .                      (2.59)

An SDP is said to satisfy strong duality if the optimal values coincide, i.e., if for some
optimal primal feasible and dual feasible points X♯ and Y♯ it holds that Tr(CX♯) =
Tr(BY♯). In fact, strong duality follows from a weak condition, called Slater’s
condition. Slater’s condition is that at least one of the two following conditions is
fulfilled:
(i) The primal problem is bounded above and there is a strictly feasible point of
the dual problem, i.e., there is Y ∈ Herm(W ) with Ξ†(Y ) ≻ C.
(ii) The dual problem is bounded below and there is a strictly feasible point of the
primal problem, i.e., there is X ≻ 0 with Ξ(X) = B.
There are efficient solvers for SDPs such as CVX(PY) [15, 16]. The underlying
numerical routines (interior point methods) come along with convergence proofs. This
means that one can view SDPs as functions in a similar sense as the sine function
is a function: both have a rigorous analytic characterization and polynomial time
algorithms for their evaluation with convergence guarantees.

An SDP for the spectral norm

For any matrix Z and y ≥ 0 it holds that

    ‖Z‖op ≤ y  ⇔  y² 𝟙 − Z†Z ⪰ 0  ⇔  [ y𝟙   Z  ]
                                      [ Z†   y𝟙 ]  ⪰ 0 ,              (2.60)

where the last equivalence can be checked, e.g., by using an SVD of Z and reducing
it to a 2 × 2 matrix problem. It follows that

    ‖Z‖op = min { y ∈ R :  [ y𝟙   Z  ]
                           [ Z†   y𝟙 ]  ⪰ 0 }                          (2.61)
and dualization yields

    ‖Z‖op = max  Tr(Z X₂) + Tr(Z† X₂†)
            subject to  Tr(X₁) + Tr(X₃) = 1 ,
                        [ X₁   X₂ ]
                        [ X₂†  X₃ ]  ⪰ 0 .                             (2.62)

An SDP for the trace norm

An SDP that yields the trace norm of a matrix Z is given by the dual formulation

    ‖Z‖₁ = min_Y  Tr(Y₁)/2 + Tr(Y₂)/2
           subject to  [ Y₁   −Z ]
                       [ −Z†  Y₂ ]  ⪰ 0 .                              (2.63)

The corresponding primal formulation is

    ‖Z‖₁ = max_X  Tr(Z X)/2 + Tr(Z† X†)/2
           subject to  [ 𝟙   X ]
                       [ X†  𝟙 ]  ⪰ 0 .                                (2.64)

Since the trace norm is dual to the spectral norm, the equivalence (2.60) implies
that the optimal value of the primal problem for the trace norm (2.64) is indeed ‖Z‖₁.
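That the value ‖Z‖₁ is attained in (2.63) can be illustrated with a feasible point built from an SVD Z = UΣV†: taking Y₁ = UΣU† and Y₂ = VΣV† makes the block matrix PSD with objective Tr Σ. A NumPy sketch under these assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
Z = rng.standard_normal((3, 3))

U, s, Vh = np.linalg.svd(Z)
V = Vh.conj().T

# Feasible point for (2.63): Y1 = U diag(s) U^dag, Y2 = V diag(s) V^dag.
Y1 = U @ np.diag(s) @ U.conj().T
Y2 = V @ np.diag(s) @ V.conj().T

# The block matrix equals [U; -V] diag(s) [U; -V]^dag, hence it is PSD.
M = np.block([[Y1, -Z], [-Z.conj().T, Y2]])
min_eig = np.linalg.eigvalsh(M).min()

objective = (np.trace(Y1) + np.trace(Y2)).real / 2
trace_norm = s.sum()  # trace norm = sum of singular values

print(min_eig >= -1e-10, objective, trace_norm)
```

By weak duality this feasible point certifies that the SDP value is at most ‖Z‖₁; the primal formulation (2.64) gives the matching lower bound.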

2.6. Quantum mechanics


Quantum theory is build on basic postulates that formalize the mathematical model
of nature. A general version of the static postulates (dynamics will be covered in
Chapter II), so the ones concerning states and measurements, can be stated as follows.
Postulate (quantum states and measurements, general form):

    • Every quantum system is associated with a separable complex Hilbert
      space H.
    • A measurement is given by a positive operator valued measure (POVM),
      which in turn is given by a set of positive semidefinite operators (effects)
      {Ei} (discrete version only) on H such that Σi Ei = 𝟙. (Hence ‖Ei‖op ≤ 1
      for all i.)
    • A quantum state is described by a density operator, i.e., a positiveᵃ
      normalized element of the dual space of (L(H), ‖·‖op).

    ᵃ A functional ρ on L(H) is called positive if ρ(A†A) ≥ 0 for all A ∈ L(H).

The outcomes of a POVM measurement are given by the labels i of the effects Ei .
Exercise 2.2 (Measurement postulate):

1. Show that for finite dimensional Hilbert spaces the dual space of
(L(H), k · kop ) is indeed given by (L(H), k · k1 ).
2. Show that ρ ∈ Herm(Cd ) is positive semidefinite if and only if Tr[ρA† A] ≥
0 for all A ∈ L(Cd ). Also note that Tr[ρ] = kρk1 for all ρ ∈ Pos(H).
3. Explain how the measurement of an observable, i.e., von Neumann mea-
surements (a.k.a. projective measurements) are related to POVM mea-
surements.
4. That quantum states are dual to observables already tells us that the
trace norm is the canonical norm for quantum states. Find an operational
interpretation of the trace norm related to measurements.

A density matrix ρ ∈ S(Cd ) is called pure if there is a state vector |ψ i ∈ Cd such
that ρ = |ψ ihψ |.
Exercise 2.3 (Pure state condition):

Show that a state ρ ∈ S(Cd ) is pure if and only if Tr[ρ2 ] = 1.

Given two single quantum systems, their joint system should also be a quantum
system. This expectation is captured by the following.
Postulate (composite quantum systems):

The Hilbert space of two quantum systems with Hilbert spaces H1 and H2 ,
respectively, is the tensor product H1 ⊗ H2 .

This construction induces an embedding from L(H1 ) into L(H1 ⊗ H2 ) by

A 7→ A ⊗ 1 . (2.65)

Dually to that, for any state ρ ∈ S(H1 ⊗ H2 ),

    Tr[ρ (A ⊗ 𝟙)] = Tr[ρ₁ A] ,                                         (2.66)

where ρ₁ is the state ρ reduced to system 1. The reduced state captures all information
of ρ that can be obtained from measuring system 1 alone and can be obtained explicitly
as ρ₁ := Tr₂[ρ] via the partial trace over the second subsystem, i.e., the linear map

    Tr₂ : L(H1 ⊗ H2 ) → L(H1 )
          X ⊗ Y ↦ Tr₂[X ⊗ Y ] := X Tr[Y ] .                            (2.67)
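The partial trace (2.67) is straightforward to implement by reshaping into tensor factors; a minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def partial_trace_2(rho, d1, d2):
    """Trace out the second subsystem of a (d1*d2) x (d1*d2) matrix,
    implementing the linear extension of X (x) Y -> X Tr[Y], cf. (2.67)."""
    return rho.reshape(d1, d2, d1, d2).trace(axis1=1, axis2=3)

# Example: a product state rho = X (x) Y with Tr[Y] = 1 reduces to X.
X = np.array([[0.7, 0.2], [0.2, 0.3]])
Y = np.array([[0.5, 0.0], [0.0, 0.5]])
rho = np.kron(X, Y)

print(np.allclose(partial_trace_2(rho, 2, 2), X))
```

The reshape exposes the two tensor indices of each subsystem, and tracing over the second pair of indices sums the diagonal of the Y factor.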
Exercise 2.4 (The swap-trick):

Let F ∈ L(H ⊗ H) be the flip operator (or swap operator), i.e., the linear
extension of the map |ψ i |φ i 7→ |φ i |ψ i. Show that

Tr1 [(X ⊗ 1)F] = X (2.68)

for any X ∈ L(H).

Part I

Quantum states
Quantum state tomography is the task of (re)constructing a full description of a given
quantum state from measurement data. Due to the nature of quantum physics one
cannot determine a quantum state from just one single copy; many copies of the
same state are required for this task. As quantum measurements are of a probabilistic
nature, one can also only hope to be able to reconstruct the state up to
some finite precision. Of course, the precision depends on the precise reconstruction
method, the number of measurements, and the type of measurements at hand. The
computational efficiency of the actual reconstruction algorithm is also crucial. For in-
stance, in one of the largest state tomography implementations [18] an 8-qubit state
was reconstructed, which required two weeks of post-processing.
This illustrates the need for tomographic schemes that are user friendly and at the
same time work with an optimal amount of resources.

3. Quantum state tomography (QST)


Let us start with a common method for QST, which is often used in theory when
the details concerning the number of measurements and accuracy do not matter.

A naive QST method

    Let us assume that we are given nρ copies of an a priori unknown state ρ ∈
    S(Cd ). Then we can fix a Hilbert-Schmidt orthonormal basis of observables
    (Mi )i=0,...,d²−1 with M0 = 𝟙/√d. Then the other observables are traceless and
    we can expand ρ in that basis as

        ρ = 𝟙/d + Σ_{i=1}^{d²−1} Tr[ρMi ] Mi .                         (3.1)

    The expectation values Tr[ρMi ] can, at least in principle, be approximated
    from experimental data.
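For a single qubit the expansion (3.1) can be written out with the normalized Pauli matrices σᵢ/√2 as the traceless basis elements. The following sketch uses exact expectation values (NumPy assumed; the example state is arbitrary):

```python
import numpy as np

# Hilbert-Schmidt orthonormal traceless basis for one qubit: sigma_i / sqrt(2).
I = np.eye(2)
paulis = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]
basis = [sig / np.sqrt(2) for sig in paulis]

rho = np.array([[0.6, 0.2 - 0.1j], [0.2 + 0.1j, 0.4]])  # some qubit state

# Expansion (3.1) with d = 2, using exact expectation values Tr[rho M_i].
rho_rec = I / 2 + sum(np.trace(rho @ M) * M for M in basis)

print(np.allclose(rho_rec, rho))
```

In an experiment the traces Tr[ρMᵢ] would be replaced by empirical averages of measured outcomes, which introduces the statistical errors discussed below.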

This method has several drawbacks:

• d2 − 1 many measurement settings are required, which is much more than nec-
essary and infeasible for intermediate-sized quantum systems.
• The errors on the estimates of Tr[ρMi ] tend to add up in an unfavorable way.
• Moreover, they will typically result in an estimate of ρ that is not a positive
semi-definite operator.

In the following sections, we will address the drawbacks of the naive QST scheme
and introduce methods that (partially) resolve them. In order to be able to fairly
compare different strategies we will use the following tomography framework.

Sequential QST for iid. state preparations

    In sequential QST a number nρ of copies of a state ρ ∈ P are given, with P ⊆
    S(Cd ) capturing prior knowledge on the state. The i-th copy of ρ is measured
    with a POVM M(i) with mi measurement outcomes. Based on the measurement
    values and a description of (M(i) )i∈[nρ ] an estimate (reconstruction) ρ̂ of ρ is
    generated by a reconstruction algorithm. The error term is ρ̂ − ρ and ‖ρ̂ − ρ‖p
    is the reconstruction error (in p-norm). The combination of measurement data
    acquisition and reconstruction algorithm is also called a tomography scheme. Any
    tomography scheme is probabilistic, resulting in some failure probability, i.e., the
    probability of the reconstruction not being successful, where success can be
    captured by ‖ρ̂ − ρ‖p ≤ ε for some predetermined ε > 0.

Usually one does not use a new POVM for each copy of ρ so that many of the
POVMs M(i) are the same.
The randomness in any tomographic scheme comes at least from the measurements
being probabilistic. However, in general, also the measurement setup and potentially
even the reconstruction algorithm can contain probabilistic elements, each of which
can result in some failure probability of the measurement scheme.
In the end, there will always be a trade-off between resources:
• the number of samples nρ ,
• the effort of implementing the measurement setup (M(i) )i∈nρ ,
• prior knowledge P, and
• the computational cost of the reconstruction algorithm
and performance:
• kρ̂ − ρkp and
• the failure probability.
There are two paths along which our tomography setting could be generalized. The
generalization of sequential measurements is given by parallel measurements, where ρ^⊗nρ is
measured with one large joint POVM. This concept implicitly includes the possibility
of processing ρ^⊗nρ in a quantum computer with unbounded circuit complexity (see
[19, 20] for QST including quantum computations). However, in order to keep things
practical and NISQ-era oriented we focus on sequential measurements where the
(i + 1)-th copy of ρ is measured after the i-th copy.
Another path of generalization could be to relax the iid. assumption, i.e., the assump-
tion that the measured total state is of the form ρ^⊗nρ . But to keep our considerations
simple we will keep the very convenient iid. assumption. When the total state is ρtot
instead of ρ^⊗nρ , the iid. mismatch error ‖ρtot − ρ^⊗nρ‖₁ would add to the reconstruc-
tion error in the worst case.

3.1. Informational completeness of measurements


In this section we put aside issues concerning sampling errors, noise and accuracy. In
such a simplified setting we address the following questions.
• How many expectation values are needed for QST?

• How many measurement settings (in terms of POVM measurements) are re-
quired?
• What is the relation between the number of outcomes in POVM measurements
and expectation values of observables?

• How many measurements are needed for pure states?


The real dimension of the space spanned by S(Cd ) is dimR (S(Cd )) = d2 − 1. Hence,
one would expect that d2 − 1 many operators are necessary and sufficient to measure
a state ρ ∈ S(Cd ). Largely following Heinosaari, Mazzarella, and Wolf [21], we will

now justify this intuition and extend it to the situation where the measured state is
known to be pure.
For a set of operators M = {Mi }i∈[m] ⊂ Herm(H) we define the measurement map
M : L(H) → Rm component-wise by

    M(ρ)i := Tr[ρMi ]                                                  (3.2)

for i ∈ [m]. Notice that if M is a POVM with m outcomes then it is already
determined by m − 1 operators; the last operator Mm is given by
Mm = 𝟙 − Σ_{i=1}^{m−1} Mi . In the following we will stick to the notation of denoting
by M a set of operators Mi and by M the associated measurement map with com-
ponents (3.2). Generally, one says that measurements are informationally complete
(a.k.a. tomographically complete) if they can uniquely identify any state in an idealized
setting. More precisely:
Definition 3.1 (P-informational completeness):

    Let P ⊆ S(H) be a subset of states. A collection of POVMs {M(i) } is called in-
    formationally complete w.r.t. P if the concatenated measurement map ⊕i M(i)
    is injective on P, i.e.,

        M(i) (ρ) = M(i) (σ) ∀i  ⇒  ρ = σ                               (3.3)

    for all ρ, σ ∈ P. For the case P = S(H) we just call {M(i) } informationally
    complete.

In short, informational completeness means that the “exact outcomes” uniquely


determine the state.
As we will see in the next proposition, informational completeness of a set of POVMs
can be related to informational completeness of just one POVM. The quantity that
matters here is

    S(M(1) , . . . , M(nM ) ) := spanC ∪_{i=1}^{nM} M(i) ,             (3.4)

which is the set of all possible complex linear combinations of all operators occurring in
the POVMs (M(1) , . . . , M(nM ) ). A set of operators S is called an operator system if
𝟙 ∈ S and if it is closed under taking adjoints, i.e., S = S† . Clearly, S(M(1) , . . . , M(nM ) )
is an operator system (assuming nM ≥ 1).
Proposition 3.2 (POVMs and operator systems [21, Proposition 1]):

Let S ⊆ L(H) be an operator system. Then there exists a POVM M such that
S = S(M) and M has dim(S) outcomes. Any POVM B satisfying S = S(B)
has at least dim(S) outcomes.

Proof. Exercise.
It seems that operator systems capture the information contained in the POVMs they
are generated from. Indeed, this is formalized by [21, Proposition 2]:
Proposition 3.3 (Operator systems and informational completeness):

Let S ⊆ L(H) be an operator system and let P ⊆ S(H) be a set of states.


Then a POVM M satisfying S(M) = S is informationally complete w.r.t. P iff

(P − P) ∩ S⊥ = {0} . (3.5)

Proof. For all states ρ, σ ∈ P, we have:

M(ρ) = M(σ) ⇔ Tr[ρA] = Tr[σA] ∀A ∈ S


⇔ Tr[(ρ − σ)A] = 0 ∀A ∈ S
⇔ ρ − σ ∈ S⊥ .

One may also consider only measurement data that comes in the form of expectation
values of observables rather than full POVM statistics. We adjust the definition
of informational completeness to that case as well.
Definition 3.4 (P-informational completeness for observables):

Let P ⊆ S(H) be a subset of states. A collection M = {Mi }i∈[m] ⊂ Herm(H)


of self-adjoint operators is called informationally complete w.r.t. P if its mea-
surement map (3.2) is injective on P, i.e.,

M(ρ) = M(σ) ⇒ ρ = σ (3.6)

for all ρ, σ ∈ P. For the case P = S(H) we just call M informationally complete.

Now let us compare the informational completeness of POVM measurements with


the one of expectation values of observables. First, the normalization constraint of
POVM measurements gives rise to the identity operator in the generated operator sys-
tem, which does not yield any information on a quantum state (Tr[ρ𝟙] = 1 ∀ρ ∈ S(H)).
Second, the operators of a POVM are positive semidefinite, which observables do not
need to be. This positivity constraint, however, does not affect the informational
completeness:
Proposition 3.5 (POVMs vs. observables [21, Proposition 3]):

Let P ⊆ S(H). The following are equivalent.


(i) There exists a POVM M with m outcomes that is informationally com-
plete w.r.t. P.
(ii) There exists a set {A1 , A2 , . . . , Am−1 } ⊂ Herm(H) of m − 1 self-adjoint
operators that are informationally complete w.r.t. P.

Proof. Exercise.

Hence, the discussion of informational completeness can be reduced to observables.


New concepts and notation (short outline): Minkowski sum, cones, affine
(sub)spaces, convex sets (see Section 2) and Herm(Cd ), Pos(Cd ), unit trace matri-
ces and density matrices as examples.
The measurement map can be analyzed in a topological setting. We adjust [21,
Proposition 6] from POVMs to observables:
Proposition 3.6 (Inf. compl. and topological embeddings):

Let P ⊆ S(H) be a closed subset. A set of self-adjoint operators


{M1 , . . . , Mm } ⊂ Herm(H) is informationally complete w.r.t. P iff the mea-
surement map is a topological embeddinga of P into Rm .
aA topological embedding is an injective continuous map with continuous inverse on its
domain. It is a good exercise to prove that the inverse of a continuous bijection is
automatically continuous if the domain is compact.

Proof. The following proof of [21, Proposition 6] also holds here. Also note that con-
tinuous embeddings with compact domains automatically have a continuous inverse
(see a post on StackExchange), i.e., are topological embeddings.

Pure states
Let us consider the special but relevant case of P the set of pure states

Pd := { |ψ ihψ | : |ψ i ∈ Cd , k |ψ ik`2 = 1} , (3.7)

i.e. rank-1 density matrices. Before making statements on general dimension d it is


instructive to have a look at qubits, i.e., d = 2. Here, the following parametrization
is particularly useful.
Bloch sphere

    A qubit state ρ ∈ S(C²) can be uniquely written as

        ρ = (1/2) ( 𝟙 + Σ_{i=1}^3 ri σi ) ,                            (3.8)

    where r with ‖r‖_{ℓ2} ≤ 1 is the Bloch vector and (σi )i∈[3] are the Pauli
    matrices (2.1). A state ρ is pure, i.e., rank(ρ) = 1, iff it is of the form
    ρ = |ψ⟩⟨ψ|, or equivalently, iff ‖r‖_{ℓ2} = 1.
Exercise 3.1:

Prove the claims made on the uniqueness and purity.

The Bloch representation tells us that the set of pure one-qubit states is isomorphic
to the 2-dimensional unit sphere in R³. However, there is no topological embedding
of the 2D sphere into R². Hence, Proposition 3.6 tells us that two expectation values
are not enough for the recovery of states in P2 . On the other hand, dimR (S(C²)) = 3:
with the isomorphism Herm(C²) ≅ R⁴ the normalization condition Tr[ρ] = 1 ∀ρ ∈ S(C²)
implies that 3 expectation values are sufficient to recover any state in P2 . Together,
this yields the first entry in the list (3.11).
In general, the minimum number of observables that are informationally complete
w.r.t. P can be bounded as follows.
Theorem 3.7 (Informational completeness for pure states [21]):

    Let α denote the number of 1’s in the binary expansion of d − 1. Then

    (i) there exists a collection of m self-adjoint operators which is information-
        ally complete w.r.t. pure states in S(Cd ), if

            m = 4d − 3 − α   for odd d ,
            m = 4d − 4 − α   for even d ≥ 4 ;                          (3.9)

    (ii) there exists no such collection if

            m ≤ 4d − 2α − 4   for all d > 1 ,
            m ≤ 4d − 2α − 2   for d odd and α = 3 mod 4 ,              (3.10)
            m ≤ 4d − 2α − 3   for d odd and α = 2 mod 4 .

This theorem implies that for d = 4, . . . , 10, the minimum number mmin of observ-
ables that are informationally complete is known to be as follows — the cases d = 2, 3
are covered by [21, Theorem 2]:

    d     | 2 | 3 | 4       | 5  | 6        | 7        | 8     | 9  | 10
    mmin  | 3 | 7 | 9 or 10 | 15 | 17 or 18 | 22 or 23 | 23–25 | 31 | 33 or 34   (3.11)

Proof idea of Theorem 3.7. For the proof of the upper bound (3.9) an information-
ally complete set of observables is constructed based on Milgram’s work [22]. The
subset of pure states P ⊂ S(C^{d+1}) is diffeomorphic1 to the complex projective space
CPd . Milgram [22] constructed immersions2 of the complex projective space CPd
into R^{4d−α−c}, where c = 1 for even d and c = 0 otherwise. Then an identification
of the real vector space of observables as Herm(Cd ) ≅ R^{d²} can be used to obtain an
informationally complete set.


For the proof of the lower bound (3.10) an extension of the argument leading to
Proposition 3.3 is derived that takes the non-trivial topology and curvature of the set
of pure states P ≅ CPd into account. First it is observed that M is informationally
complete w.r.t. P iff M is a topological embedding of P into Rm . Next, measurement
maps are considered as smooth topological embeddings from P ∼ = CPd into Rm with
a derivative that is injective everywhere. In particular, it is shown that if
(a) the measurement map M : P → Rm is injective and
(b) for all P ∈ P the inclusion TP (P) ⊆ cone(P − P) of the tangent space TP (P) at
P into the cone generated by differences of pure density matrices holds
then the measurement map M is a smooth embedding of P into Rm . Condition (b)
is proven to always hold. Then a non-embedding result by Mayer [23] is used, which
states that for the values of m in the lower bound (3.10) no such embedding and
hence no injective measurement map can exist.
In fact, Heinosaari et al. [21] also provide a similar upper bound for generic (ran-
dom) measurement maps. For all m > 2(2d − 2) it essentially states that observables
M1 , . . . , Mm ∈ Herm(Cd ) drawn at random from a continuous non-vanishing distri-
bution are informationally complete with probability one. We will make use of similar
randomized strategies in Section 3.6 on compressed sensing.
How many different projective measurements, i.e., measurements of observables, are
required for quantum state tomography? Theorem 3.7 implies that at least 4
observables are necessary for most dimensions. But in order to make the POVM
measurements projective, some overhead might be required. Goyeneche et al. [24]
provide a construction of 5 observables that allow for quantum state tomography,
which also puts a close-by upper bound on that number.

3.2. Least squares fitting and linear inversion


In the last section we have discussed the number of measurement outcomes or expec-
tation values that are necessary to reconstruct a quantum state. Moreover, we have
discussed how this number changes when one knows that the state is pure. However,
so far we have only discussed injectivity of the measurement map. In this section we
discuss how the measurement map can practically be inverted (without using prior
knowledge on the state).
Let us fix the Hilbert space to be H = Cd . Moreover, let us assume that we are
given a measurement map M with measurement operators M = {M1 , . . . , Mm } (see
(3.2)) that is tomographically complete. Now let us assume that we have estimates
yi of Tr[ρMi ] from measurement data, i.e.,

    y = M(ρ) + ε ,                                                     (3.12)

1A diffeomorphism is a smooth bijection with smooth inverse.


2 An immersion is a differentiable function between differentiable manifolds whose derivative is
everywhere injective. Due to the inverse function theorem, immersions are local embeddings, i.e.,
locally injective differentiable functions.

where ε ∈ Rm is additive noise (e.g., including the unavoidable statistical estimation
error). Now we wish to compute a reasonable first estimate ρ̂ of ρ from y and M.
Without loss of generality we can assume that M : Herm(Cd ) → Rm is injective,
since, if it is not, we can modify it by extending M by 1 and setting the corresponding
estimate yj := 1. Note that M is not surjective in general. In particular, whenever
m > d² the image M(Herm(Cd )) ⊂ Rm is a zero-set in Rm . Hence, for a continu-
ously distributed sampling error ε (e.g., with components from a normal distribution
N (0, σ 2 )) the probability that there exists some X ∈ Herm(Cd ) such that M(X) = y
is also zero. In such a situation it is a common and practical strategy to use the least
squares estimate:
Least squares estimator

    The least squares estimator for M : Herm(Cd ) → Rm and data vector y ∈ Rm
    is
        ρ̂LS := arg min_{X∈Herm(Cd )} ‖M(X) − y‖²_{ℓ2} .               (3.13)

    Note that the minimum is unique whenever M is injective.


Later, in Section 3.7, we will apply a certain post-processing to this estimate to
obtain an estimate that is close to an optimal estimate in many situations, also w.r.t.
trace norm errors. But first, we calculate useful closed form expressions of the least
squares estimate (3.13). The basic one is the following, which is called linear inversion.
Proposition 3.8 (Linear inversion):

    Let M : Herm(Cd ) → Rm be an injective linear map. Then the least squares
    estimate (3.13) is
        ρ̂LS = (M†M)⁻¹ M†(y) .                                         (3.14)

Proof. Exercise.

The linear map (M†M)⁻¹M† is the pseudo-inverse M⁺ or Moore–Penrose inverse
of M whenever M is injective (or equivalently, a matrix representation of M
has linearly independent columns, which is equivalent to M†M being invertible). It
can be calculated via an SVD, see Eq. (2.11).

Linear inversion on POVM data

    Let M := {M1 , . . . , Mm } be an informationally complete POVM and suppose
    that copies of a state ρ are measured. Let us assume that there are no noise
    sources, so that the probability of observing outcome i ∈ [m] is pi := Tr[ρMi ].
    By ni we denote the number of times outcome i is observed out of nρ = Σ_{i=1}^m ni
    measurements in total. We estimate the probabilities pi by the frequencies
    p̂i := ni /nρ , which are also called the empirical estimates of pi . Thanks to
    informational completeness, the estimate ρ̂LS := M⁺(p̂) converges to ρ for
    nρ → ∞, which is also implied by the following.

Exercise 3.2 (Sampling error):

    Prove that
        E ‖ρ̂LS − ρ‖₂ ≤ 1 / (λmin (M) √nρ ) ,                          (3.15)
    where λmin (M) is the minimum singular value of M.
The linear inversion estimator (3.14) can be calculated more explicitly for many
relevant measurement settings that have additional structure. As we will see, this can
also allow to control λmin (M). Two important mathematical tools to capture such
additional structure are frame theory and complex projective k-designs.
In order for a POVM to be informationally complete its elements need to span the
full vector space Herm(Cd ). However, the POVM elements cannot constitute a basis,
see Exercise 3.3. Such generating sets of vectors are generally refereed to as frames.
There are several important properties that frames can have and that can, e.g., indeed
be used to further simplify the linear inversion estimator (3.14).
Exercise 3.3 (No PSD basis):

Prove that there is no basis of positive-semidefinite operators.

3.3. Frame theory


Frame theory is the theory of expanding vectors into a generating system that does
not need to be a basis. The development of this theory was very much motivated by
applications from signal processing and image processing in particular. For instance,
an important class of examples is given by the wavelet transform, which is the math-
ematical tool JPEG compression is based on. Frame theory is naturally also used in
quantum state tomography; see, e.g., [25, 26] for references in quantum information
theory and [27] for the mathematical duality theory.
Let us consider an inner product space (V, ⟨·|·⟩). A set of vectors {|vi ⟩} ⊂ V is
called a frame for V if there are constants 0 < A ≤ B < ∞ such that

    A ‖|x⟩‖² ≤ Σi |⟨vi |x⟩|² ≤ B ‖|x⟩‖²   ∀ |x⟩ ∈ V .                  (3.16)

The constants A and B are called the lower and upper frame bounds, respectively.
They are of practical importance in numerical applications. The self-adjoint operator

    S := Σi |vi ⟩⟨vi |                                                 (3.17)

is called the frame operator. For any |x⟩ ∈ V we have

    ⟨x|S|x⟩ = Σi |⟨vi |x⟩|² .                                          (3.18)

Hence, the bounds (3.16) imply that the eigenvalues of S are contained in the interval
[A, B]. In particular, S is a positive operator. In fact, the best frame bounds are

A = min(spec(S)) , B = max(spec(S)) , (3.19)

where spec(S) ⊂ R is the set of eigenvalues of S.


The vectors {|ṽi ⟩} ⊂ V given as

    |ṽi ⟩ := S⁻¹ |vi ⟩                                                 (3.20)

are called the (canonical) dual frame. It is easy to see that the following frame
expansions hold for any |x⟩ ∈ V :

    |x⟩ = Σi ⟨vi |x⟩ |ṽi ⟩   and   |x⟩ = Σi ⟨ṽi |x⟩ |vi ⟩ .            (3.21)

Note that the frame operator of the dual frame is
    S̃ = ∑ᵢ |ṽᵢ⟩⟨ṽᵢ| = ∑ᵢ S⁻¹ |vᵢ⟩⟨vᵢ| (S⁻¹)† = S⁻¹ S S⁻¹ = S⁻¹ .   (3.22)

In particular, the dual of the dual frame is the frame itself. Moreover, since the
eigenvalues of S −1 are contained in [1/B, 1/A] it follows that the upper and lower
frame constants of the dual frame are 1/A and 1/B, respectively.
The frame { |vᵢ⟩ } ⊂ V is called a tight frame for V if the upper and lower frame
bounds coincide, i.e., if A = B. A tight frame is called a Parseval frame if it satisfies
Parseval's identity ∑ᵢ |⟨vᵢ|x⟩|² = ‖x‖² for all x, i.e., if it is a tight frame with unit
frame constant. Note that the dual frame of a tight (Parseval) frame is also a tight
(Parseval) frame with the inverse frame constant. Let us summarize these insights:
Proposition 3.9 (Dual frames):

Let us consider a frame with frame operator S and frame constants 0 < A ≤ B.
Then the frame operator of the dual frame is S −1 and has frame constants
0 < 1/B ≤ 1/A. In particular, the dual frame of a tight frame is again a tight
frame with inverse frame constant.
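These relations are easy to explore numerically. The following sketch (plain NumPy; the random frame and the dimensions are purely illustrative) builds a frame of five vectors for C³, computes the frame operator (3.17) and the canonical dual frame (3.20), and checks the frame expansion (3.21) as well as the identity (3.22):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random frame of n = 5 vectors for C^3 (n > d, so it is not a basis).
d, n = 3, 5
V = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))   # row i = |v_i>

# Frame operator S = sum_i |v_i><v_i|, Eq. (3.17).
S = np.einsum('ia,ib->ab', V, V.conj())
A_bound, B_bound = np.linalg.eigvalsh(S)[[0, -1]]   # best frame bounds (3.19)
assert A_bound > 0   # S positive definite <=> the vectors span C^d

# Canonical dual frame |v~_i> = S^{-1}|v_i>, Eq. (3.20).
V_dual = (np.linalg.inv(S) @ V.T).T

# Frame expansion (3.21): |x> = sum_i <v_i|x> |v~_i>.
x = rng.normal(size=d) + 1j * rng.normal(size=d)
x_rec = sum((V[i].conj() @ x) * V_dual[i] for i in range(n))
assert np.allclose(x_rec, x)

# The frame operator of the dual frame is S^{-1}, Eq. (3.22).
S_dual = np.einsum('ia,ib->ab', V_dual, V_dual.conj())
assert np.allclose(S_dual, np.linalg.inv(S))
```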

The frame potential of { |v₁⟩ , . . . , |vₙ⟩ } is Tr[S²]. Normalized tight frames are the
ones that minimize the frame potential:
Theorem 3.10 (See [28, Section I.D] for a discussion and references):

Let { |v₁⟩ , . . . , |vₙ⟩ } ⊂ S^{d−1} be a frame for C^d of unit norm vectors. Then

    Tr[S²] ≥ n²/d .   (3.23)

Moreover, the bound is saturated iff { |v₁⟩ , . . . , |vₙ⟩ } is a tight frame.

Proof. We remember that S is a positive operator. Let {λᵢ}_{i∈[d]} be the eigenvalues
of S. Since S is positive and the vectors are normalized we have

    Tr[S] = n = ∑_{j=1}^d λⱼ = ‖S‖₁   and   Tr[S²] = ∑_{j=1}^d λⱼ² = ‖S‖₂²   (3.24)

and the Cauchy–Schwarz inequality ‖S‖₁ ≤ √d ‖S‖₂ establishes the bound. Moreover, this inequality
is only saturated when all eigenvalues of S are the same.

3.4. Complex spherical/projective k-designs


Finite k-designs on a sphere are sets of evenly distributed points. More precisely,
a (finite) spherical k-design is a set of normalized vectors { |ψᵢ⟩ }_{i∈[n]} ⊂ S^{d−1} such
that the average value of certain polynomials g(|ψᵢ⟩) (of degree k in both the coefficients
⟨i|ψ⟩ and their complex conjugates) over the set { |ψᵢ⟩ } coincides with the average of
g(|ψ⟩) over all normalized vectors in S^{d−1}. Switching to the density matrix representation we make the
following definition, which can be shown to be equivalent to the polynomial one above
[25, 29].

Definition 3.11 (k-design):

We call a probability measure µ on the complex unit sphere S^{d−1} ⊂ C^d a
(complex spherical) k-design if

    ∫_{S^{d−1}} (|ψ⟩⟨ψ|)^⊗k dµ(ψ) = ∫_{S^{d−1}} (|ψ⟩⟨ψ|)^⊗k dψ =: K_k ,   (3.25)

where dψ denotes the uniform U(d)-invariant probability measure on S^{d−1}.
The corresponding distribution of |ψ⟩⟨ψ| is called a (complex projective) k-design.
A finite set of states is called a k-design if the uniform distribution
over { |ψᵢ⟩⟨ψᵢ| } is a k-design.
See also Refs. [29, 30] for related definitions.

Exercise 3.4 (k-designs):

Prove that a k-design is also a (k − 1)-design for k ≥ 2.

In order to more conveniently characterize k-designs we calculate K_k more explicitly.
This average is also called the Haar average because it can be obtained from the Haar
measure, either on CP^{d−1} or on U(d). We summarize statements made by Renes et
al. [25] with the following lemma. For this lemma it is helpful to remember that the
symmetric subspace of (C^d)^⊗k is the subspace spanned by vectors of the form |φ⟩^⊗k,
where |φ⟩ ∈ C^d.
Lemma 3.12 (k-th moment):

The operator K_k from Definition 3.11 is

    K_k = (k!(d − 1)!/(k + d − 1)!) P_sym^k ,   (3.26)

where P_sym^k is the projector onto the symmetric subspace (2.25) of (C^d)^⊗k.

The required material on representation theory (Section 2.2) was also covered in
Lecture 5. In order to prove Lemma 3.12 we will need a bit of representation
theory, see Section 2.2 for the required preliminaries.
Proof of Lemma 3.12. Note that K_k is only supported on the symmetric subspace
and, since K_k = K_k†, that also its range is contained in the symmetric subspace, i.e.
ker(K_k)^⊥ = ran(K_k) ⊆ H_sym^k.
Next we observe that for any U ∈ U(d) we have ∆_d(U)† K_k ∆_d(U) = K_k and,
hence, K_k ∆_d(U) = ∆_d(U) K_k. Schur's lemma (Theorem 2.3) implies that the restriction of K_k
to H_sym^k is c·1. Together with ker(K_k)^⊥ = ran(K_k) ⊆ H_sym^k this identity implies that
K_k = c P_sym^k.
In order to obtain c, note that Tr[K_k] = 1 and that Tr[P_sym^k] is given by Eq. (2.30).
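As a quick sanity check of Lemma 3.12, one can estimate K₂ by Monte Carlo sampling of Haar-random states and compare with the formula (3.26); the following NumPy sketch does this for d = 2, k = 2 (sample size and tolerance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
d, N = 2, 200_000

# Haar-random states: normalized complex Gaussian vectors.
psis = rng.normal(size=(N, d)) + 1j * rng.normal(size=(N, d))
psis /= np.linalg.norm(psis, axis=1, keepdims=True)

# Monte Carlo estimate of K_2 = E[(|psi><psi|)^{tensor 2}], Eq. (3.25).
K2 = np.einsum('na,nc,nb,nd->acbd', psis, psis,
               psis.conj(), psis.conj()) / N
K2 = K2.reshape(d * d, d * d)

# Lemma 3.12 prediction: K_2 = 2!(d-1)!/(2+d-1)! P_sym = (2/(d(d+1))) P_sym.
F = np.zeros((d * d, d * d))          # flip (swap) operator
for i in range(d):
    for j in range(d):
        F[i * d + j, j * d + i] = 1
K2_exact = 2 / (d * (d + 1)) * (np.eye(d * d) + F) / 2
assert np.allclose(K2, K2_exact, atol=1e-2)   # within statistical tolerance
```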
Getting back to tight frames, we can make the following statement.
Proposition 3.13:

Complex spherical 1-designs are tight frames.

Proof. Let { |ψᵢ⟩ }_{i∈[n]} be a 1-design. We note that the frame operator S of
{ |ψᵢ⟩ }_{i∈[n]} is

    S = n K₁ = (n/d) 1 .   (3.27)

Moreover,

    Tr[S²] = (n²/d²) Tr[1] = n²/d .   (3.28)

Hence, by Theorem 3.10, { |ψᵢ⟩ }_{i∈[n]} is a tight frame.

More generally, one can lift Theorem 3.10 on frame potentials to k-designs. For
this purpose, we define the frame operator of order k of { |ψᵢ⟩ }_{i∈[n]} to be

    S_k := ∑_{i=1}^n (|ψᵢ⟩⟨ψᵢ|)^⊗k .   (3.29)

Theorem 3.14 (k-designs and tight frames [25]):

A set of states { |ψᵢ⟩ }_{i∈[n]} with n ≥ binom(k+d−1, d−1) is a k-design iff
{ |ψᵢ⟩^⊗k }_{i∈[n]} is a tight frame of the symmetric subspace, i.e., iff

    Tr[S_k²] = n² k!(d − 1)!/(k + d − 1)! .   (3.30)

This is the global minimum of { |ψᵢ⟩ }_{i∈[n]} ↦ Tr[S_k²].

Proof. By the definition of K_k and S_k we have that { |ψᵢ⟩ }_{i∈[n]} is a k-design iff

    K_k = (1/n) S_k .   (3.31)

With |Ψᵢᵏ⟩ := |ψᵢ⟩^⊗k we can write S_k as

    S_k = ∑_{i=1}^n |Ψᵢᵏ⟩⟨Ψᵢᵏ| .   (3.32)

Since Tr[K_k] = 1, Theorem 3.10 together with Lemma 3.12 finishes the proof.

3.4.1. Examples for 2-designs

Stabilizer states (STABs)

STABs are states that are ubiquitous in quantum information and are defined
as follows. An n-qubit Pauli string is σ_{s₁} ⊗ · · · ⊗ σ_{sₙ}, where s ∈ {0, 1, 2, 3}ⁿ and
{σᵢ} are the Pauli matrices (2.1). Then the Pauli group Pₙ ⊂ U(2ⁿ) is the group
generated by all n-qubit Pauli strings and i·1. An n-qubit state |ψ⟩ is a stabilizer
state if there is an abelian subgroup S ⊂ Pₙ, called the stabilizer (subgroup), that
stabilizes |ψ⟩ and only |ψ⟩, i.e., |ψ⟩ is the unique joint eigenvalue-1 eigenstate
of all elements of that subgroup. Such subgroups turn out to be generated by
n elements. Note that they cannot contain the element −1.
An example of such a subgroup is the one of all Pauli strings made of 1's and
σ_z's.
The set of all stabilizer states is known to be a 2-design [31, 32], actually even
a 3-design but not a 4-design [28, 33, 34].

Exercise 3.5 (Stabilizer states):

Let ρ be an n-qubit stabilizer state with stabilizer S. Show that

    ρ = (1/2ⁿ) ∑_{S∈S} S .   (3.33)

Mutually unbiased bases (MUBs)

MUBs are sets of bases with minimal overlaps. More explicitly, two orthonormal
bases { |ψᵢ⟩ }_{i∈[d]} ⊂ C^d and { |φᵢ⟩ }_{i∈[d]} ⊂ C^d are said to be mutually
unbiased if |⟨ψᵢ|φⱼ⟩|² = 1/d for all i, j ∈ [d]. For instance, if U ∈ U(d) is the
discrete Fourier transform then { |i⟩ }_{i∈[d]} ⊂ C^d and { U|i⟩ }_{i∈[d]} ⊂ C^d are mutually
unbiased. The number of MUBs in C^d is upper bounded by d + 1, and in
prime power dimensions (e.g., for qubits) there are exactly d + 1 MUBs [35].
However, it is a well-known open problem to determine this maximal number for
general d. Klappenecker and Roetteler [36] showed that maximal sets of MUBs are
2-designs.

SIC POVMs

A symmetric, informationally complete (SIC) POVM is given by (see
Eq. (3.41)) a set of d² normalized vectors { |ψⱼ⟩ }_{j∈[d²]} ⊂ S^{d−1} ⊂ C^d satisfying

    |⟨ψᵢ|ψⱼ⟩|² = 1/(d + 1)   ∀ i ≠ j .   (3.34)

"Symmetric" refers to the inner products being all equal. Renes et al. [25] have
shown that SIC POVMs are indeed 2-designs and have explicitly constructed
them for small dimensions.
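The tightness criterion (3.30) gives a simple numerical test for the 2-design property. As an illustration (a NumPy sketch, not taken from the references), one can verify it for the six single-qubit stabilizer states discussed above, for which n = 6, d = 2, k = 2 and the right-hand side of (3.30) evaluates to 12:

```python
import numpy as np
from math import factorial

# The six single-qubit stabilizer states (eigenstates of X, Y, and Z),
# which also form a maximal set of MUBs for d = 2.
s = 1 / np.sqrt(2)
states = [np.array(v, dtype=complex) for v in
          [[1, 0], [0, 1], [s, s], [s, -s], [s, 1j * s], [s, -1j * s]]]
d, n, k = 2, len(states), 2

# Order-k frame operator S_k = sum_i (|psi_i><psi_i|)^{tensor k}, Eq. (3.29).
Sk = sum(np.kron(rho, rho) for rho in
         (np.outer(p, p.conj()) for p in states))

# Tightness criterion (3.30): Tr[S_k^2] = n^2 k!(d-1)!/(k+d-1)! = 12 here.
bound = n**2 * factorial(k) * factorial(d - 1) / factorial(k + d - 1)
assert np.isclose(np.trace(Sk @ Sk).real, bound)   # -> the set is a 2-design
```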

3.5. Symmetric measurements and the depolarizing channel

The depolarizing (quantum) channel models isotropic noise in quantum processes.
However, it also appears in the frame operator of several important measurement
settings. In this section and the next ones we follow Guta et al. [37, Appendix,
Section VI.A]. We will explicitly provide the pseudo-inverse of the measurement map
for several types of measurements.
Definition 3.15 (Quantum depolarizing channel):

The (quantum) depolarizing channel D_p : L(C^d) → L(C^d) with parameter
p ∈ [0, 1] is the linear map defined by

    D_p(X) := p X + (1 − p) Tr[X] 1/d .   (3.35)

Proposition 3.16 (The inverse of the depolarizing channel):

For any p > 0 the depolarizing channel is invertible (as a map) and the inverse
is given by

    D_p⁻¹(X) = (1/p) X − ((1 − p)/p) Tr[X] 1/d .   (3.36)
Proof. Exercise.
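Both maps are straightforward to implement, and the inverse relation is easy to check numerically; the following NumPy sketch (dimension and parameter chosen arbitrarily) verifies Proposition 3.16:

```python
import numpy as np

def depolarize(X, p, d):
    """D_p(X) = p X + (1 - p) Tr[X] 1/d, Definition 3.15."""
    return p * X + (1 - p) * np.trace(X) * np.eye(d) / d

def depolarize_inv(X, p, d):
    """D_p^{-1}(X) = X/p - ((1 - p)/p) Tr[X] 1/d, Proposition 3.16."""
    return X / p - (1 - p) / p * np.trace(X) * np.eye(d) / d

rng = np.random.default_rng(2)
d, p = 4, 0.3
X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
assert np.allclose(depolarize_inv(depolarize(X, p, d), p, d), X)
assert np.allclose(depolarize(depolarize_inv(X, p, d), p, d), X)
```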

Proposition 3.17 (Symmetric measurements):

Let M : S(C^d) → R^m be given by measurement operators M =
{M₁, . . . , M_m} ⊂ Herm(C^d). Suppose that the frame operator, which is given
as S_M = M†M, is S_M = c D_p for constants c > 0 and p > 0 (possibly depending
on d and m). Then the dual frame is

    M̃ᵢ = (1/(cp)) Mᵢ − ((1 − p)/(cpd)) Tr[Mᵢ] 1   (3.37)

and the pseudo-inverse of M is given by

    M⁺(y) = (1/(cp)) ( M†(y) − (1 − p) Tr[M†(y)] 1/d ) .   (3.38)

Proof. We remember the adjoint measurement map (3.40) and the definition (3.17)
of the frame operator to note that indeed

    M†M(X) = ∑_{i=1}^m Tr[X Mᵢ] Mᵢ = S_M(X) .   (3.39)

Since D_p is invertible for p ≠ 0, the measurement operators M are a frame for
Herm(C^d).
The inverse frame operator is obtained as S_M⁻¹ = (1/c) D_p⁻¹ with the inverse depolarizing
channel (3.36). The dual frame is obtained from M̃ᵢ = S_M⁻¹(Mᵢ) and noting that the
Hilbert–Schmidt adjoint of M, with the component-wise definition (3.2), is given by

    M†(y) = ∑_{i=1}^m yᵢ Mᵢ .   (3.40)

In order to obtain the pseudo-inverse of M we observe that M†M = S_M and that
this operator is invertible. Hence, the pseudo-inverse can be obtained with the linear
inversion formula (3.14).

The pseudo-inverse (3.38) is given by scaling and shifting M†, so up to this scaling
and shifting it acts like a unitary. This justifies calling such measurements symmetric.
This is a weaker form of symmetry than the one required for SIC POVMs [25], which
are an instance of POVMs based on 2-designs.

3.5.1. 2-design based POVMs

There are two different approaches to connect a POVM M = {M₁, . . . , M_m} to frame
theory: (i) to consider M as a frame for Herm(C^d) [38] and (ii) to construct M from a
frame for the Hilbert space C^d. While the idea (i) seems to be more straightforward,
(ii) provides a lot of structure that can be exploited in various ways.
Throughout this section we consider a POVM M = {M₁, . . . , M_m} that is proportional
to a complex projective 2-design { |ψᵢ⟩⟨ψᵢ| }_{i∈[m]} ⊂ C^d,

    Mᵢ = (d/m) |ψᵢ⟩⟨ψᵢ|   (3.41)

for i ∈ [m].

Proposition 3.18:

M with measurement operators (3.41) is indeed a POVM.

Proof. Obviously, Mᵢ ⪰ 0. Taking the partial trace of K₂, using Lemma 3.12 and
remembering the basis expansion of the flip operator (2.32) yields

    Tr₂[K₂] = (2/(d(d+1))) Tr₂[P_sym²] = (1/(d(d+1))) Tr₂[1 + F]
            = (1/(d(d+1))) (d·1 + 1) = (1/d) 1 .   (3.42)

Noting that ∑_{i=1}^m Mᵢ = d Tr₂[K₂] shows that M is indeed a POVM.
The POVM M is a frame with a frame operator given in terms of the measurement
map (3.2) and dual frame as follows.
Lemma 3.19 (Frame from 2-design POVMs):

Any POVM M given by a 2-design as in (3.41) is a frame for Herm(C^d) with
frame operator

    S_M = (d/m) D_p   (3.43)

and parameter p = 1/(d + 1).
Proof. By the definition of the measurement operators (3.41) the frame operator can
be written in terms of the 2-design as

    S_M(X) = (d²/m²) ∑_{i=1}^m Tr[X |ψᵢ⟩⟨ψᵢ|] |ψᵢ⟩⟨ψᵢ|   (3.44)
           = (d²/m) Tr₁[(X ⊗ 1) (1/m) ∑_{i=1}^m |ψᵢ⟩⟨ψᵢ| ⊗ |ψᵢ⟩⟨ψᵢ|]   (3.45)
           = (d²/m) Tr₁[(X ⊗ 1) K₂] .   (3.46)

Using again Lemma 3.12 and the flip operator (2.33),

    S_M(X) = (d²/m) (2/(d(d+1))) Tr₁[(X ⊗ 1) P_sym²]
           = (d/(m(d+1))) (Tr[X] 1 + Tr₁[(X ⊗ 1) F]) .   (3.47)

Now we can use the swap-trick (2.68) to obtain

    S_M(X) = (d/(m(d+1))) (X + Tr[X] 1)
           = (d/m) D_{1/(d+1)}(X) .   (3.48)

This identity implies that S_M is invertible. Hence, M is indeed a frame.

This lemma allows for a full frame characterization of the POVMs that are given
by spherical 2-designs.
This lemma allows for a full frame characterization of the POVMs that are given
by spherical 2-designs.

Corollary 3.20 (Frame characterization and linear inversion for
POVMs from 2-designs):

M given by (3.41) is a frame for Herm(C^d) with frame operator given by

    S_M(X) = (d/(m(d+1))) (X + Tr[X] 1) .   (3.49)

The inverse frame operator is given by

    S_M⁻¹(X) = (m/d) ((d + 1) X − Tr[X] 1)   (3.50)

and the dual frame M̃ := {M̃₁, . . . , M̃_m} by

    M̃ᵢ = (m(d + 1)/d) Mᵢ − 1 .   (3.51)

The pseudo-inverse of the measurement map M is given by

    M⁺(y) = (d + 1) ∑_{i=1}^m yᵢ |ψᵢ⟩⟨ψᵢ| − ∑_{i=1}^m yᵢ 1
          = (m(d + 1)/d) M†(y) − ⟨1, y⟩ 1 ,   (3.52)

where 1 ∈ R^m is the vector containing only ones.

Proof. Having characterized the frame operator in Lemma 3.19, this corollary directly
follows with Proposition 3.17 and Proposition 3.16.
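Corollary 3.20 can be tried out numerically. The following NumPy sketch uses the six single-qubit stabilizer states (a 2-design, see Section 3.4.1) as the POVM (3.41) and checks that the pseudo-inverse (3.52) recovers a random state exactly from noiseless outcome probabilities:

```python
import numpy as np

rng = np.random.default_rng(3)

# The six single-qubit stabilizer states form a 2-design for d = 2.
s = 1 / np.sqrt(2)
psis = [np.array(v, dtype=complex) for v in
        [[1, 0], [0, 1], [s, s], [s, -s], [s, 1j * s], [s, -1j * s]]]
d, m = 2, len(psis)
M = [d / m * np.outer(p, p.conj()) for p in psis]   # POVM elements (3.41)
assert np.allclose(sum(M), np.eye(d))               # resolution of the identity

# Random density matrix and its ideal (noiseless) outcome probabilities.
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = G @ G.conj().T / np.trace(G @ G.conj().T)
y = np.array([np.trace(Mi @ rho).real for Mi in M])

# Pseudo-inverse (3.52): M^+(y) = (d+1) sum_i y_i |psi_i><psi_i| - <1, y> 1.
rho_hat = (d + 1) * sum(yi * np.outer(p, p.conj())
                        for yi, p in zip(y, psis)) - y.sum() * np.eye(d)
assert np.allclose(rho_hat, rho)   # exact recovery from exact data
```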

3.5.2. Pauli measurements

Let us denote the n-qubit Pauli strings by W₀, . . . , W_{d²−1} with W₀ = 1, where d =
2ⁿ. Note that the spectrum of each non-identity Pauli string is {−1, 1}, each with
degeneracy d/2. So, each Pauli string is Wᵢ = Pᵢ⁺ − Pᵢ⁻, where

    Pᵢ± = (1 ± Wᵢ)/2   (3.53)

is the projector onto the eigenvalue ±1 eigenspace (note that P₀⁺ = 1 and P₀⁻ = 0).
Each Pauli string (observable) Wᵢ is associated with a two-outcome POVM given by
Mᵢ := {Pᵢ⁺, Pᵢ⁻}. Now we consider a measurement setting given by the union of all
these POVMs plus the trivial measurement given by W₀ = 1, i.e., by

    M := ⋃_{i=0}^{d²−1} {Pᵢ⁺, Pᵢ⁻} .   (3.54)
The frame operator is given by

    S_M(X) = M†M(X) = ∑_{i=0}^{d²−1} (Tr[Pᵢ⁺X] Pᵢ⁺ + Tr[Pᵢ⁻X] Pᵢ⁻)
           = ∑_{i=0}^{d²−1} (1/2)(Tr[WᵢX] Wᵢ + Tr[X] 1)   (3.55)
           = (d/2) (X + d Tr[X] 1) ,

where we have used that {Wᵢ/√d}_{i=0}^{d²−1} is an orthonormal basis of Herm(C^d) w.r.t. the
Hilbert–Schmidt inner product. In terms of the depolarizing channel (Definition 3.15)
this result reads as

    S_M = (d(d² + 1)/2) D_{1/(d²+1)} .   (3.56)

Each POVM Mᵢ = {Pᵢ⁺, Pᵢ⁻} is associated to measurement outcomes ±. For a state
ρ ∈ S(C^d) the ideal measurement outcomes are Tr[ρPᵢ±] and a measurement vector
yᵢ± ∈ R² is the corresponding actually measured quantity. Together, this yields a
measurement vector y = (yᵢᵒ)_{i∈[d²], o∈{±}} ∈ R^m with m = 2d².
Now we use Proposition 3.17 with c = d(d² + 1)/2 and p = 1/(d² + 1) to calculate the
pseudo-inverse of the measurement map; note that 1 − p = d²/(d² + 1) and cp = d/2. Using
(3.53), yᵢ⁺ + yᵢ⁻ = 1, and this proposition,
the linear inversion estimate (3.14) simplifies as

    M⁺(y) = (2/d) [ ∑_{i=0}^{d²−1} ∑_{o=±} yᵢᵒ Pᵢᵒ − (d²/(d²+1)) ∑_{i=0}^{d²−1} ∑_{o=±} yᵢᵒ Tr[Pᵢᵒ] (1/d) 1 ]
          = (2/d) [ (1/2) ∑_{i=0}^{d²−1} (yᵢ⁺ − yᵢ⁻) Wᵢ + (d²/2) 1 − (d²/2) 1 ]   (3.57)
          = (1/d) ∑_{i=0}^{d²−1} (yᵢ⁺ − yᵢ⁻) Wᵢ ,

where in the second step we used ∑_{i,o} yᵢᵒ Pᵢᵒ = (1/2) ∑ᵢ (yᵢ⁺ − yᵢ⁻) Wᵢ + (d²/2) 1 and
∑_{i,o} yᵢᵒ Tr[Pᵢᵒ] = d(d² + 1)/2, which follows from yᵢ⁺ + yᵢ⁻ = 1 and y₀⁺ = 1.
This formula has a nice interpretation: the difference yᵢ⁺ − yᵢ⁻ is the empirical
expectation of Wᵢ and the pseudo-inverse is the corresponding empirical estimate of the
state ρ (the 1/d factor comes from the normalization of the Wᵢ).
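This interpretation can be checked directly: given the exact Pauli expectation values yᵢ⁺ − yᵢ⁻ = Tr[Wᵢρ], the linear inversion (3.57) reproduces ρ. A NumPy sketch for n = 2 qubits (the state is random and purely illustrative):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)

# Single-qubit Pauli matrices and all two-qubit Pauli strings W_0, ..., W_15.
paulis = [np.eye(2), np.array([[0, 1], [1, 0]]),
          np.array([[0, -1j], [1j, 0]]), np.array([[1, 0], [0, -1]])]
strings = [np.kron(a, b) for a, b in product(paulis, repeat=2)]
d = 4

# Random two-qubit density matrix.
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = G @ G.conj().T / np.trace(G @ G.conj().T)

# Noiseless data: y_i^+ - y_i^- = Tr[W_i rho] is the expectation of W_i.
expvals = [np.trace(W @ rho).real for W in strings]

# Linear inversion (3.57): rho_hat = (1/d) sum_i (y_i^+ - y_i^-) W_i.
rho_hat = sum(e * W for e, W in zip(expvals, strings)) / d
assert np.allclose(rho_hat, rho)
```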

3.5.2.1. Full Pauli basis measurements

Often the measurement of a Pauli string Wᵢ = σ_{s₁} ⊗ · · · ⊗ σ_{sₙ} yields one binary
measurement outcome per qubit, even though the spectrum of Wᵢ is degenerate. The
corresponding post-measurement states are the tensor products of the eigenstates of
σ_{s₁}, . . . , σ_{sₙ}, denoted by |b_s^o⟩ := |b_{s₁}^{o₁}⟩ ⊗ · · · ⊗ |b_{sₙ}^{oₙ}⟩. The set of all tensor products
of all single qubit Pauli basis states

    M := { |b_s^o⟩ }_{o∈{±}ⁿ, s∈{x,y,z}ⁿ}   (3.58)

defines the full set of Pauli basis measurements. For n ≥ 2 qubits this M is not a
2-design, as can be checked via Eq. (3.30). However, the pseudo-inverse of the
measurement map can still be computed in a similar fashion as before.
Exercise 3.6 (Pauli basis measurements):

Find the frame operator, the dual frame, and the pseudo-inverse of the measurement
map for Pauli basis measurements.

This exercise yields the linear inversion estimator, i.e., the pseudo-inverse of the
measurement map, as

    M⁺(y) = (1/3ⁿ) ∑_{s∈{x,y,z}ⁿ} ∑_{o∈{±}ⁿ} y_s^o ⊗_{i=1}^n (3 |b_{sᵢ}^{oᵢ}⟩⟨b_{sᵢ}^{oᵢ}| − 1) .   (3.59)
3.6. Compressed sensing

The new material on linear programming (Section 2.5.1) was also covered in Lecture 13.
Compressed sensing (a.k.a. compressive sensing) is still quite a
young research field concerned with the reconstruction of signals from few measurements,
where a signal is anything one might want to recover from measurements.
The field was mainly initialized by works of Candès, Romberg, and Tao [39, 40] and
Donoho [41]. It has an extremely wide range of applications. In particular, it is of
great use in many technological applications including imaging, acoustics, radar, and
mobile network communication.
The problems considered in compressed sensing are typically of the following form.
There is a signal x ∈ V ≅ Rⁿ that is known to be compressible (e.g., a sparse vector
or a low-rank matrix). Then one is given access to linear measurements

    yᵢ = ⟨aᵢ, x⟩ + εᵢ   for i ∈ [m]   (3.60)

that are given by measurement vectors aᵢ ∈ V and additive noise εᵢ ∈ R that might
arise in the measurement process. Now the task is to reconstruct x from y and
(aᵢ)_{i∈[m]}. We wish the reconstruction to be efficiently implementable and guaranteed
to work for m as small as possible.
In matrix notation we can rewrite (3.60) as

    y = A(x) + ε ,   (3.61)

where the i-th component of the measurement map A is A(x)ᵢ := ⟨aᵢ, x⟩. For a
simplified setting with m ≥ n and without noise (ε = 0) we know from linear algebra
that the inverse problem (3.61) has a unique solution for all signals x iff A is injective. For
m < n, however, A is given by a short fat matrix, which cannot be injective.
Compressed sensing exploits a known compressibility of x in order to still practically
solve the inverse problem (3.61) for several simple forms of compressibility,
the simplest one being sparsity. Indeed, signals that are sparse in a known basis (or
frame) have received the most attention [10]. Let us denote by kxk`0 := |{i : xi 6= 0}|
the sparsity or `0 -norm (which is not a norm) of a signal x ∈ Rn . Moreover, let us
consider again the noiseless case, i.e.  = 0. If the inverse problem given by (3.61)
has a unique solution then the solution must be the minimizer of the optimization
problem
minimize kzk`0 subject to A(z) = y . (3.62)
While this reconstruction is clearly optimal in terms of the required number of mea-
surements m, it cannot be implemented efficiently. Indeed, even for any fixed η ≥ 0,
the more general problem

minimize kzk`0 subject to kA(z) − yk`2 ≤ η (3.63)

is NP-hard, as can be proven by a reduction from the exact cover by 3-sets problem [42]
(see also [10, Theorem 2.17]). In fact, the NP-hardness still holds when the ℓ₀-norm
is replaced by an ℓ_q-norm for any q ∈ [0, 1) [43] (which are quasi-norms).
We note that these hardness results extend to matrices, where the sparsity is re-
placed by the matrix rank, i.e., where the recovery problem is of the form

minimize rank(Z) subject to kM(Z) − yk`2 ≤ η (3.64)

for some fixed η ≥ 0, where M(Z)i = Tr[Mi† Z] for measurement matrices Mi .


At first sight, these results seem to be bad news for the compressed sensing idea.
However, these results just mean that the problems (3.63) are hard to solve for general
matrices A and vectors y, i.e., that there cannot be an efficient algorithm that solves
all instances of the problem. In practical applications, however, one is typically given
very specific measurement setups that determine A and y and has also some control
allowing to tune these quantities.

Compressed sensing makes use of this observation. In typical settings, the measurements
A are drawn at random, for which hard instances practically do not occur. The
function that measures the compressibility of the signal is replaced by a tractable
convex function. In particular, for the mentioned examples, the sparsity is replaced
by the `1 -norm and the matrix rank by the trace norm, i.e., the problems (3.63) and
(3.64) are replaced by

minimize kzk`1 subject to kA(z) − yk`2 ≤ η , (3.65)


minimize kZk1 subject to kM(Z) − yk`2 ≤ η , (3.66)

respectively. Such an idea of replacing a non-convex function by a convex one in an


optimization problem is called convex relaxation and can be done systematically for
a given set of the “most simple” signals [44] by using convex analysis [11]. Then, for
the easiest-to-analyze case of fully Gaussian measurement vectors, it turns out that
the number of measurements required for recovery (with high probability over the
random measurements) is

    m* ≈ 2s ln(n/s) + 2s   (3.67)

and

    m* ≈ 3r(n₁ + n₂ − r)   (3.68)

for s-sparse signals in Rⁿ and real n₁ × n₂ matrices of rank r, respectively [44–46].


Note that these numbers are much smaller than the dimensions of the ambient space,
m∗  n and m∗  n1 n2 , respectively. In particular, the number of measurements for
case of d × d rank-1 matrices is 6d − 1. This number is roughly a factor of 1.5 larger
than needed to guarantee injectivity of the measurement map M, which is roughly 4d
according to Theorem 3.7. However, this comparison is not completely fair as (3.68)
holds for real and non-Hermitian matrices but the complex Hermitian behaves very
similarly.
It should be emphasized that these results come along with rigorous recovery guar-
antees that can be extended to many different types of measurements (not just Gaus-
sian ones). However, for each new type of measurements the proofs need to be ad-
justed.
The new material on linear programming (Section 2.5.2) was also covered in Lecture 14.
Let us from now on only focus on low-rank matrix reconstruction,
as this is the relevant problem for quantum state tomography. We emphasize that
the convex optimization problem (3.65) (and also (3.66)) can be solved efficiently by
rephrasing it as an SDP (see Section 2.5). Once we can write a constraint of the form
‖w‖_{ℓ₂} ≤ η as a PSD constraint, we can include ‖M(Z) − y‖_{ℓ₂} ≤ η as an additional
constraint in the dual SDP formulation of the trace norm (2.63) and obtain an SDP
formulation of the constrained trace norm minimization (3.66). But ‖w‖_{ℓ₂} ≤ η can be
written as

    ∑ᵢ λᵢ ≤ η²   and   |wᵢ|² ≤ λᵢ   (3.69)

and the latter constraints can be rewritten as PSD constraints using (2.60) as

    |wᵢ|² ≤ λᵢ   ⇔   [ λᵢ , wᵢ ; wᵢ* , 1 ] ⪰ 0 .   (3.70)

There are several variations of the reconstruction (3.66). For instance, there is the
matrix Dantzig selector, where the measured matrix is estimated by the minimizer of

    minimize ‖Z‖₁ subject to ‖M†(M(Z) − y)‖_op ≤ λ ,   (3.71)

where λ depends on the noise strength as it needs to satisfy λ ≥ ‖M†(ε)‖_op. Moreover,
there is the matrix Lasso (least absolute shrinkage and selection operator) estimate,
obtained as the optimal point of

    minimize µ ‖Z‖₁ + (1/2) ‖M(Z) − y‖²_{ℓ₂} ,   (3.72)

where µ ≥ 2 ‖M†(ε)‖_op. These optimization problems can all be written as SDPs and
yield very similar solutions.
There are also computationally more efficient algorithms to solve these convex
optimization problems. For instance, one can use singular value thresholding, an
iterative algorithm where one (i) updates the matrix by a gradient step η M†(y − M(Z))
for some step size η and (ii) shrinks the singular values to keep the iterate approximately low
rank. Such algorithms are closely related to the alternating direction method of
multipliers (ADMM), see e.g. [47], with which one can solve a class of convex optimization
problems including (3.66) iteratively. These algorithms are relatively fast and also
allow for rigorous performance guarantees. Another direction is to use non-convex
optimization methods to directly and approximately solve the rank minimization problem
(3.64). These methods are even faster and often need even fewer measurements, but
rigorous guarantees are very rare.
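The thresholding iteration described above can be sketched in a few lines. The following NumPy example is a minimal proximal-gradient/singular value thresholding sketch for the matrix Lasso (3.72); the problem sizes, step size, regularization parameter, and iteration count are arbitrary illustrative choices, not a tuned implementation:

```python
import numpy as np

rng = np.random.default_rng(5)
n, r, m = 8, 1, 56                        # 8x8 rank-1 signal, m < n^2 = 64

# Ground truth and a Gaussian measurement map A: R^{n x n} -> R^m.
u = rng.normal(size=(n, r))
X_true = u @ u.T
A = rng.normal(size=(m, n * n)) / np.sqrt(m)
y = A @ X_true.ravel()

def svt(Z, tau):
    """Soft-threshold the singular values of Z by tau."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - tau, 0)) @ Vt

# Proximal gradient iteration for the matrix Lasso (3.72):
#   Z <- svt(Z + eta * A^T(y - A(Z)), eta * mu)
eta = 1.0 / np.linalg.norm(A, 2) ** 2     # step size 1 / Lipschitz constant
mu = 1e-3                                 # small regularization (noiseless data)
Z = np.zeros((n, n))
for _ in range(1000):
    grad = (A.T @ (y - A @ Z.ravel())).reshape(n, n)
    Z = svt(Z + eta * grad, eta * mu)

# The iterate should approach the rank-1 ground truth.
assert np.linalg.norm(Z - X_true) < 0.5 * np.linalg.norm(X_true)
```

Note that m = 56 is below the ambient dimension 64 but above the threshold (3.68), so the low-rank structure is what makes the recovery possible.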
Finally, let us summarize what a recovery guarantee for low-rank matrix reconstruction
exactly is. Typical cases are covered by the following definition.
What is a recovery guarantee?

Let X ∈ Herm(C^d) be a signal of matrix rank r and let Mᵢ ∈ Herm(C^d) for
i ∈ [m] be measurement matrices drawn iid. from some measure µ on Herm(C^d).
Moreover, let

    yᵢ = Tr[MᵢX] + εᵢ ,   i ∈ [m] ,   (3.73)

define a measurement vector, possibly corrupted by additive noise ε bounded
as b(ε) ≤ η (typically b = ‖·‖_{ℓ₂}).
Then a recovery guarantee provides a threshold m₀ depending on r and d so
that for all m ≥ m₀

    ‖X̂ − X‖_p ≤ h(η)   (3.74)

(typically p = 1) with probability at least 1 − δ_m.
Typically, δ_m ≤ C e^{−cm} for some constants C, c > 0. The function h captures
the stability of the reconstruction against additive noise. Similarly, one can quantify
the robustness of the reconstruction against small violations of the low-rank
assumption. Typically, the robustness is quantified in terms of ‖X − X_r‖₁,
where X_r is the best rank-r approximation to X in one (and then all) Schatten
p-norms.
A recovery guarantee is called uniform if w.h.p. a drawn measurement map
recovers all matrices satisfying the rank constraint.

It is crucial that a “good” distribution µ of measurements is chosen. For instance,
if µ is a projective 4-design one can prove recovery guarantees with an essentially
optimal scaling [48], but if µ is only a projective 2-design –such as a maximal set of
MUBs– then the measurement is not injective with high probability [49, Theorem 18].
This shows that the measurement matrices need to be able to cover the matrix space
densely enough in order for such randomized strategies to work.

3.6.1. Application to quantum state tomography

In quantum state tomography one often aims at the reconstruction of an (approximately)
pure state. Hence, compressed sensing can be naturally applied here. In fact,
exactly this application has led to a great breakthrough in low-rank matrix reconstruction
[50, 51] by using matrix concentration inequalities that arose in quantum
information theory [52].
In the quantum state tomography setting there is a simple way to improve the recon-
struction via the trace norm minimization (3.66). We can simply add the constraint
that Z is a density matrix, which is indeed a PSD constraint. But then kZk1 = 1, so
the minimization is superfluous and we are left with a feasibility problem, i.e., with
the task to find a density matrix Z such that kM(Z) − yk`2 ≤ η. But then we can

µ, Reference                                   | m₀             | h(η)                            | Remarks/properties        | Proof technique
Random Paulis [50]                             | O(r d ln(d)²)  | O(η r d)                        |                           | Golfing scheme [51]
Random Paulis [55]                             | O(r d ln(d)⁶)  | O(r η), where η ≥ ‖M†(ε)‖_F     | uniform, robust           | RIP [56]
rank(Mᵢ) = 1, ‖Mᵢ‖_op ≲ d,                     | O(r d ln(d))   | O(η/√m), p = 2                  | uniform                   | Bowling scheme [46]
(approx.) proj. 4-designs [48]                 |                |                                 |                           |
rank(Mᵢ) = 1, ‖Mᵢ‖_op ≲ d,                     | O(r d ln(d))   | O(‖ε‖_{ℓ₂} r^{1/p−1/2}/√m)      | PSD-fit, uniform, robust  | NSP [54]
(approx.) proj. 4-designs [54]                 |                |                                 |                           |
Random Clifford orbits [57]                    | O(r³ d ln(d))  | O(r² d ‖ε‖_{ℓ_q}/m^{1/q})       | PSD-fit, uniform, robust  | NSP [54], Clifford irreps [58]

Table 3.1.: List of recovery guarantees for compressed sensing based quantum state tomography.
“Random Paulis” refers to measurements of Pauli string observables. A random Clifford orbit is
a set of states obtained by applying all Clifford group operations to a certain fixed state.
There are several proof techniques for low-rank matrix reconstruction: The golfing scheme relies
on dual certificates, the restricted isometry property (RIP) quantifies the “distortion” of the relevant
signals under the measurement map, the Bowling scheme is a combination of geometric proof
techniques [44] and Mendelson's small ball method [59], and the (stable and robust rank) null
space property (NSP) is a property of the measurement map that allows to quantify the robustness
of a reconstruction against violations of the low-rank assumption.

minimize that quantity instead. This yields the so-called PSD-fit

    minimize ‖M(Z) − y‖_{ℓ₂} subject to Z ∈ S(C^d) ,   (3.75)

which was suggested by Baldwin et al. [53] and made rigorous by Kabanava et al. [54].
Besides having a slightly improved performance, the main advantage of this reconstruction
method is that it does not require an estimate of the noise strength.
For an overview of recovery guarantees for convex compressed sensing methods in
state tomography see Table 3.1.

3.7. Projected least squares estimation

Extended norm-minimization estimation [60], now called projected least squares
(PLS) estimation [37], complements linear inversion in order to improve the reconstruction
and to analytically give confidence regions (in trace norm) for the reconstructed
state. Similarly as in compressed sensing, the PLS estimator can take advantage of
low-rankness of the measured state [37].
First confidence guarantees in terms of a conditioning of the measurement map
have been provided by Sugiyama et al. [60]. These guarantees have recently been
improved and calculated explicitly for the following types of measurements by Guta
et al. [37]:
• 2-design based measurements (Section 3.5.1, Eq. (3.41)),
• Pauli observable measurements (Section 3.5.2, Eq. (3.54)),
• Pauli basis measurements (Section 3.5.2.1, Eq. (3.58)).
PLS estimation works in two steps: (i) a linear inversion (3.14) of the measurement
data is performed and (ii) the linear inversion estimate is projected onto the set of
density matrices. Remember that the least squares estimator can be calculated as
ρ̂_LS = M⁺(y) (Proposition 3.8).
We again denote the measurement matrices by M = {M₁, . . . , M_m} ⊂ Herm(C^d)
and the corresponding measurement map by M : Herm(C^d) → R^m. Then the PLS
estimator of a quantum state ρ ∈ S(C^d) from measurement data y = M(ρ) + ε is
explicitly given as [60]

    ρ̂_PLS := argmin_{σ∈S(C^d)} ‖σ − ρ̂_LS‖₂ ,   (3.76)
where ρ̂LS = M+ (y) and M+ denotes the pseudo-inverse of M.
Theorem 3.21 (Error bounds for PLS [37]):

Let ρ ∈ S(C^d) be a state, fix the total number of measurements of ρ to be n_ρ and
let ε ∈ [0, 1]. Consider the following three measurement setups with associated
function g:
(i) g(d) := 2d for 2-design based POVMs (3.41),
(ii) g(d) := d² for Pauli observable measurements (3.54), and
(iii) g(d) := d^{1.6} for Pauli basis measurements (3.58).
Then the PLS estimator (3.76) satisfies

    P[‖ρ̂_PLS − ρ‖₁ ≥ ε] ≤ d exp( − n_ρ ε² / (43 g(d) r²) ) ,   (3.77)

where r = min{rank(ρ), rank(ρ̂_PLS)}.

We will outline the proof only for the case of 2-design based POVMs in Section 3.7.1.
The theorem implies that with probability at least 1 − δ the PLS estimation
yields an estimate of ρ within trace norm error ε whenever the number of measurements
is

    n_ρ ≥ 43 g(d) (rank(ρ)²/ε²) ln(d/δ) .   (3.78)
The PLS estimation (3.76) can be written as an SDP, which can be seen with
the machinery of Section 2.5.2. However, the PLS estimator allows for an analytic
solution (up to one parameter), which allows to compute it much faster than in the SDP
runtime.
Proposition 3.22 (State space projection [61], version of [37]):

Let P_S : Herm(C^d) → Herm(C^d) be the Euclidean projection onto the set of
density matrices, i.e.,

    P_S(X) := argmin_{ρ∈S(C^d)} ‖X − ρ‖_F   (3.79)

for X ∈ Herm(C^d). Then for X ∈ Herm(C^d) with Tr[X] = 1 the projection
ρ = P_S(X) can be calculated as follows from an eigenvalue decomposition
X = U diag(λ) U†, with λ ∈ R^d and U ∈ U(d). Set

    ρ := U diag([λ − x₀ 1]₊) U† ,   (3.80)

where [y]₊,ᵢ := max{yᵢ, 0}, 1 ∈ R^d is the vector with only 1-entries, and x₀ is
chosen such that Tr[ρ] = 1. Moreover, x₀ is the root of the function f defined
byᵃ

    f(x) := ∑_{i=1}^d |λᵢ − x| − d·x − Tr[X] .   (3.81)

ᵃ We obtained a slightly different function than in [37, ArXiv version v1].

Proof. Exercise.
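Proposition 3.22 translates into a short routine: diagonalize, shift the eigenvalues by x₀, and clip at zero. In the sketch below, x₀ is found by the standard sorting-based search used for projections onto the probability simplex, which is an equivalent way of locating the root of f (the test matrix is chosen arbitrarily):

```python
import numpy as np

def project_to_states(X):
    """Euclidean projection of a Hermitian matrix onto the density matrices
    (Proposition 3.22): clip the shifted eigenvalues [lambda - x0]_+,
    where x0 is chosen such that the result has unit trace."""
    lam, U = np.linalg.eigh(X)            # eigenvalues in ascending order
    lam_desc = lam[::-1]
    csum = np.cumsum(lam_desc)
    # Largest j with lambda_j - (sum of the j largest eigenvalues - 1)/j > 0
    # (the standard sorting-based simplex projection).
    idx = np.arange(1, len(lam) + 1)
    j = np.nonzero(lam_desc - (csum - 1) / idx > 0)[0].max()
    x0 = (csum[j] - 1) / (j + 1)
    mu = np.maximum(lam - x0, 0)          # clipped spectrum, ascending order
    return (U * mu) @ U.conj().T

rng = np.random.default_rng(6)
d = 4
H = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
H = (H + H.conj().T) / 2                  # Hermitian, but not PSD in general
rho = project_to_states(H)

assert np.isclose(np.trace(rho).real, 1)          # unit trace
assert np.linalg.eigvalsh(rho).min() > -1e-12     # positive semidefinite
assert np.allclose(project_to_states(rho), rho)   # states are fixed points
```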

3.7.1. Proof of Theorem 3.21 for 2-design based POVMs

In order to fully explain the general approach of PLS estimation we review the proof of
Theorem 3.21 for the 2-design based measurements in more detail. Let us remember
that the linear inversion on the measurement data is given by M⁺ with the closed
form expression (3.52).
We note that M is a single POVM and we take

    yᵢ = nᵢ/n_ρ ,   (3.82)

where nᵢ is the number of times (out of n_ρ single measurements) the outcome i
corresponding to the POVM element

    Mᵢ = (d/m) |ψᵢ⟩⟨ψᵢ|   (3.83)

(cp. (3.41)) was measured.
First, we quantify the error that arises in the linear inversion.
Lemma 3.23 ([37, Appendix, Section VII.A]):

For measurements (3.82) set ρ̂_LS := M⁺(y) with the pseudo-inverse M⁺ from
(3.52). Then

    P[‖ρ̂_LS − ρ‖_op ≥ τ] ≤ d exp( − 3τ² n_ρ / (16d) )   (3.84)

for all τ ∈ [0, 2].

For the proof we use a matrix version of the Bernstein inequality from Theorem 2.11.
Theorem 3.24 (Matrix Bernstein inequality [62, Theorem 1.4]):

Let X1, . . . , Xℓ ∈ Herm(Cd) be independent Hermitian random matrices with

    E[Xi] = 0  and  ‖Xi‖op ≤ a almost surely    (3.85)

for all i ∈ [ℓ] and some a > 0. Set

    σ² := ‖ Σ_{i=1}^ℓ E[Xi²] ‖op .    (3.86)

Then, for all τ > 0,

    P[ ‖ Σ_{i=1}^ℓ Xi ‖op ≥ τ ] ≤ d exp( − (τ²/2) / (σ² + aτ/3) ) .    (3.87)

Proof of Lemma 3.23. From Tr[ρ] = 1 and M being a single POVM it follows that
Σ_{i=1}^m yi = 1. Hence, the pseudo-inverse of the measurement operator (3.52) becomes

    M⁺(y) = (d + 1) Σ_{i=1}^m yi |ψi⟩⟨ψi| − Σ_{i=1}^m yi 1
          = Σ_{i=1}^m (ni/nρ) ( (d + 1) |ψi⟩⟨ψi| − 1 )    (3.88)
          = (1/nρ) Σ_{k=1}^{nρ} Yk ,

where {Yk}_{k∈[nρ]} are iid. copies of a random matrix Y corresponding to the measurement
outcome of the POVM M: Y is (d + 1)|ψi⟩⟨ψi| − 1 with probability
P[i] = Tr[Mi ρ] = (d/m) ⟨ψi|ρ|ψi⟩. By construction, it holds that E[Y] = E[M⁺(y)].
Since M is informationally complete we have

    E[M⁺(y)] = M⁺(E[y]) = M⁺M(ρ) = ρ ,    (3.89)

i.e., the linear inversion estimator ρ̂LS = M⁺(y) is unbiased, which implies that
E[Y] = ρ. Together, we have an error term

    ρ̂LS − ρ = (1/nρ) Σ_{k=1}^{nρ} (Yk − E[Yk]) .    (3.90)

Now we wish to apply the matrix Bernstein inequality from Theorem 3.24 with
Xk = (1/nρ)(Yk − E[Yk]). So we calculate

    (1/nρ) ‖Y − E[Y]‖op ≤ (1/nρ) max_{i∈[m]} ‖ (d + 1)|ψi⟩⟨ψi| − 1 − ρ ‖op
                        ≤ (1/nρ) max_{i∈[m]} ‖ d|ψi⟩⟨ψi| − ( (1 − |ψi⟩⟨ψi|) + ρ ) ‖op    (3.91)
                        ≤ d/nρ =: a ,

where the first operator has spectrum {d, 0, . . . , 0} and the second one has spectrum
contained in [0, 2],
since we have implicitly assumed d ≥ 2. Moreover,


    E[Y²] − E[Y]² = Σ_{i=1}^m (d/m) ⟨ψi|ρ|ψi⟩ ( (d + 1)|ψi⟩⟨ψi| − 1 )² − ρ² .    (3.92)

We note that (d + 1)² − 2(d + 1) = d² − 1, remember the frame operator (3.44), and
use Lemma 3.19 to obtain
    E[(Y − E[Y])²] = E[Y²] − E[Y]²
                   = Σ_{i=1}^m P[i] ( (d + 1)|ψi⟩⟨ψi| − 1 )² − ρ²
                   = (d² − 1) Σ_{i=1}^m (d/m) ⟨ψi|ρ|ψi⟩ |ψi⟩⟨ψi| + 1 − ρ²
                   = (d² − 1) (m/d) SM(ρ) + 1 − ρ²
                   = (d² − 1) ( (1/(d + 1)) ρ + (d/(d + 1)) (1/d) 1 ) + 1 − ρ²
                   = (d − 1) ρ + d 1 − ρ² .    (3.93)
This leads to

    σ² := ‖ Σ_{k=1}^{nρ} E[ ( (1/nρ)(Yk − E[Yk]) )² ] ‖op
        = (1/nρ²) ‖ Σ_{k=1}^{nρ} ( E[Yk²] − E[Yk]² ) ‖op    (3.94)
        = (1/nρ) ‖ (d − 1) ρ + d 1 − ρ² ‖op
        ≤ (2d − 1)/nρ .
We note that for τ ∈ [0, 2] we have

    (τ²/2) / (σ² + aτ/3) ≥ (nρ τ²/2) / (2d − 1 + 2d/3) ≥ 3τ²nρ / (16d)    (3.95)

and an application of Theorem 3.24 with Xk = (1/nρ)(Yk − E[Yk]) yields (3.84).
Lemma 3.23 tells us that a number of copies of

    nρ ≥ (16/3) d ln(d/δ) / τ²    (3.96)
is sufficient to reconstruct any quantum state with error bounded by τ in spectral
norm with probability at least 1 − δ. However, the distinguishability of quantum
states is given by the trace norm, see Proposition 4.8.
The bound (2.7) implies that the reconstruction error is bounded in trace norm
as

    ‖ρ̂LS − ρ‖₁ ≤ rank(ρ̂LS − ρ) τ =: ε .    (3.97)
The estimate ρ̂LS carries a reconstruction error due to the statistical estimation error.
It can be expected that the reconstruction error is quite isotropically distributed in
Herm(Cd ). Hence, one would expect that rank(ρ̂LS − ρ) = d with high probability,
since low rank matrices are a zero set in Herm(Cd ). This only leads to a sample
complexity of nρ ∈ Õ(d3 /ε2 ).
Now let us assume that ρ is of low rank r. If the reconstruction ρ̂LS is close to ρ
then ρ̂LS is approximately of low rank. For any operator X ∈ L(Cd ) we denote by Xr
the best rank-r approximation of X in trace norm (and in fact in all Schatten p-norms for
p ∈ [1, ∞)). The error of this approximation is

    σr(X) := min_{rank(Z)≤r} ‖X − Z‖₁    (3.98)

and the minimizer is the approximation Xr. Note that σr(X) and Xr can be calculated
using a singular value decomposition.
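For illustration (our own sketch, not part of the notes), both quantities can be read off the singular values:

```python
import numpy as np

def best_rank_r(X, r):
    """Best rank-r approximation X_r and the residual sigma_r(X),
    i.e., the sum of the discarded singular values (trace-norm error)."""
    U, s, Vh = np.linalg.svd(X)
    Xr = (U[:, :r] * s[:r]) @ Vh[:r, :]   # keep the r largest singular values
    sigma_r = s[r:].sum()                 # trace-norm error of the truncation
    return Xr, sigma_r

Xr, err = best_rank_r(np.diag([3.0, 2.0, 1.0]), r=2)
```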
The low rank of ρ can be exploited as follows.
Lemma 3.25 (Approximate rank [37, Appendix, Section VIII]):

Let ρ ∈ S(Cd) and ρ̂ ∈ Herm(Cd) with Tr[ρ̂] = 1 such that ‖ρ̂ − ρ‖op ≤ τ for
some τ ≥ 0. Then the projection of ρ̂ onto the density matrices

    ρ̂P := arg min_{σ∈S(Cd)} ‖σ − ρ̂‖₂    (3.99)

satisfies

    ‖ρ̂P − ρ‖₁ ≤ 4rτ + 2 min{σr(ρ), σr(ρ̂)}    (3.100)

for all r ∈ Z+.

Proof sketch. The proof works in two steps: (i) The threshold value x0 in
the density matrix projection (3.80) satisfies x0 ∈ [0, τ]. (ii) For quantum states
ρ1, ρ2 ∈ S(Cd),

    ‖ρ1 − ρ2‖₁ ≤ 2r ‖ρ1 − ρ2‖op + 2 min{σr(ρ1), σr(ρ2)}    (3.101)

holds for all r ∈ Z+. These two statements are relatively straightforward to prove,
see [37, Appendix, Section VIII].

This leads to the following refined version of the PLS guarantee (Theorem 3.21) for
2-design based POVMs.

Theorem 3.26 (PLS for 2-design based POVMs [37]):

Let ρ ∈ S(Cd) be a state and fix the total number nρ of 2-design based POVM
measurements (3.41) of ρ.
Then, for any r ∈ Z+ and ε ∈ [0, 1], the PLS estimator (3.76) satisfies

    P[ ‖ρ̂PLS − ρ‖₁ ≥ ε + 2σr(ρ) ] ≤ d exp( − 3 nρ ε² / (256 d r²) ) .    (3.102)

Proof. We apply Lemma 3.23 with τ = ε/(4r) to obtain

    ‖ρ̂LS − ρ‖op ≤ ε/(4r)    (3.103)

with probability at least 1 − d exp( −3ε²nρ/(256dr²) ). For that “success case” Lemma 3.25
yields

    ‖ρ̂PLS − ρ‖₁ ≤ ε + 2 min{σr(ρ̂PLS), σr(ρ)} .    (3.104)

We note that Theorem 3.26 exploits an approximate low rank of the measured state.
This means that for an increasing number of measurements nρ the PLS estimate of
the measured state ρ estimates an increasing number of eigenvectors of ρ correctly.

3.8. Lower bounds


In the previous section we have discussed one concrete method for quantum state
tomography. These results provide upper bounds on the sample complexity of quantum
state tomography. In order to investigate the optimality of tomographic methods,
lower bounds are also required.
Naturally, the derivation of lower bounds is much less straightforward.
The standard idea to obtain these bounds [63, Supplemental Material/Appendix] is
the following. First, one constructs a so-called ε-packing net for the subset of states
P ⊂ S(Cd) one is interested in (e.g., states with rank at most r). (Such packing nets are
constructed using Levy’s lemma.) An ε-packing net for P is a set of points {ρi}i∈[s] ⊂ P
such that (i) ‖ρi − ρj‖₁ > ε for all i ≠ j and (ii) s is maximal. Generically, s scales
exponentially in the manifold dimension of P.
One can encode log₂(s) many bits with an integer i ∈ [s]. This can be used to send
a message of log₂(s) many bits by sending the receiving party the corresponding quantum
state ρi. If one would instead send the measurement outputs of quantum
state tomography of ρi with trace norm error ε to the receiving party, then one still
needs at least log₂(s) many bits, i.e., the measurement output must contain at least
this many bits. One can make this consideration rigorous by using information theory
[64] (such as Holevo’s bound, Fano’s inequality, and the data processing inequality)
to put a lower bound on the number of required measurements.
One way to capture that a number ℓ of single iid. measurements is insufficient for
quantum tomography is via the minimax risk

    R*(ε, ℓ) := inf_{ρ̂ℓ, {M(i)}_{i∈[ℓ]}} sup_{ρ∈P} P[ ‖ρ̂ℓ(y) − ρ‖₁ > ε ] ,    (3.105)

where the infimum is taken over ℓ admissible measurements M(i) (the POVM might be
different in every trial) and all estimators ρ̂ℓ taking the measurement outputs y ∈ Rℓ,
where each yi is an iid. sample drawn from the measurement outcome probabilities
(Tr[M(i)_j ρ])_j [55]. This minimax risk is the failure probability of the best tomographic
strategy for the worst possible state.
Theorem 3.27 (Pauli observable measurements [55, Theorem 6]):

Let the set of feasible measurements be the observable measurements with
POVM elements (3.53) in d = 2^n dimensions and let the set of possible states
P be the subset of states of S(Cd) with rank at most r. Moreover, fix ε ∈
(0, 1 − r/d) and δ ∈ [0, 1). Then R*(ε, ℓ) < δ implies

    ℓ ∈ Ω( r²d² / ln(d) ) ,    (3.106)

where the implicit constants depend on δ and ε.

The theorem tells us that the scaling in r and d of the number of measurements in
Theorem 3.21(ii) is optimal up to ln(d)-factors.
With similar ideas one can derive lower bounds for the case of a single large POVM
acting on ℓ copies of the state, i.e., on ρ⊗ℓ. Note that this setting includes parallel and
interactive measurements. One can prove a lower bound on the number of measurements
ℓ required for tomography scaling as [19]

    ℓ ∈ Ω( d r (1 − ε)² / (ε² ln[d/(rε)]) ) ,    (3.107)

where the implicit constant depends on a confidence parameter δ. This bound implies
that also the scaling of the number of measurements in Theorem 3.21(i) is optimal up to
ln(d)-factors. In particular, the global measurements on ρ⊗ℓ do not yield an improved
scaling compared to sequential measurements.

3.9. Maximum likelihood estimation


In the maximum likelihood estimation (MLE) approach to quantum state tomography,
the final state estimate ρ̂ is the state which is most likely to reproduce the
measurement statistics [65, 66].
Let N be the total number of measurements of a POVM {Mi}i∈[m] and let ni be the
number of occurrences of outcome i. We further define the measurement frequencies
fi = ni/N and the probabilities pi(ρ) = Tr[ρMi]. The state estimate ρ̂ is then the
state that maximizes the log-likelihood function given by

    L(ρ) := Σ_{i=1}^m fi ln pi(ρ) ,    (3.108)

subject to the constraints ρ ⪰ 0 and Tr[ρ] = 1.


We incorporate the constraint ρ ⪰ 0 by parameterizing the state as ρ = A†A. The
constraint Tr[ρ] = 1 is effectively incorporated by introducing the Lagrange multiplier
λ to regularize L towards a small trace Tr[ρ]. The new objective function then reads
as

    F(A) := Σ_{i=1}^m fi ln(Tr[A†A Mi]) − λ Tr[A†A] .    (3.109)

Exercise 3.7 (MLE):

1. A necessary condition for A being an extremum is that F(A + δA) =
   F(A) + O(‖δA‖op²). Show that this condition implies

       Σ_{i=1}^m (fi / pi(ρ)) Mi A† = λ A† .    (3.110)

   Hint: Use ln(1 + x) = x + O(x²) for small x.

2. We define the operator Rρ := Σ_{i=1}^m (fi / pi(ρ)) Mi. Use Eq. (3.110)
   to show that λ = 1 and hence Rρ ρ = ρ, as well as Rρ ρ Rρ = ρ.

The last result Rρ ρ Rρ = ρ can be used to iteratively find the maximum via the
update rule

    ρ_{k+1} = (1/N) Rρ_k ρ_k Rρ_k .    (3.111)
3. Implement MLE numerically using the above iteration rule for the target
   state |ψ⟩ and the Pauli basis measurements from Section 3.5.2.1. Plot
   the reconstruction error ‖ρ̂ − |ψ⟩⟨ψ|‖₁ and the log-likelihood function for
   m ∈ [100].
4. Repeat MLE for measurements with added Gaussian noise and plot the
   reconstruction error over the noise strength η ∈ [0, 0.1] for m = 100.
5. Repeat this exercise for linear inversion (Section 3.2), compressed sensing
   based reconstructions (Section 3.6), and projected least squares (Section 3.7).
   Compare the results.
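The iteration from item 2 can be sketched as follows (our own minimal single-qubit illustration with a six-outcome Pauli POVM; the normalization 1/N in (3.111) is implemented by dividing by the trace, and exact outcome frequencies stand in for measured data):

```python
import numpy as np

# Six-outcome single-qubit POVM from the three Pauli bases; the elements
# (1 +- P)/6 are positive semidefinite and sum to the identity.
I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
povm = [(I2 + s * P) / 6 for P in (X, Y, Z) for s in (+1, -1)]

def mle_rrr(freqs, povm, n_iter=500):
    """R rho R iteration for maximum likelihood estimation, cf. (3.111)."""
    d = povm[0].shape[0]
    rho = np.eye(d, dtype=complex) / d        # start from the maximally mixed state
    for _ in range(n_iter):
        probs = np.array([np.trace(M @ rho).real for M in povm])
        probs = np.clip(probs, 1e-12, None)   # guard against division by zero
        R = sum(f / p * M for f, p, M in zip(freqs, probs, povm))
        rho = R @ rho @ R
        rho /= np.trace(rho).real             # the normalization 1/N in (3.111)
    return rho

# Exact (noiseless) frequencies for the target state |0><0|.
target = np.diag([1.0, 0.0]).astype(complex)
freqs = [np.trace(M @ target).real for M in povm]
rho_hat = mle_rrr(freqs, povm)
```

With exact frequencies the iteration converges to the target state; with finite-sample frequencies it converges to the maximum likelihood estimate instead.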

This exercise shows that the numerical implementation of MLE can have a relatively
slow convergence. However, there are much faster implementations [67] based on more
elaborate gradient ascent methods.
For an experimental comparison of MLE, least squares estimation (a different ver-
sion than considered here), and linear inversion see the work by Schwemmer et al.
[68]. They also show the following negative result.
Proposition 3.28 (Biases in QST [68]):

A reconstruction scheme for quantum state tomography that always outputs a
density operator is biased.

Hence, quantum state tomography methods using the density matrix structure
allow for trace norm error guarantees with better scalings in the dimension (see Sec-
tion 3.7) but come at the cost of being biased. Projected least squares estimation [37]
provides a way to obtain an unbiased estimate of a quantum state so that its nearest
density operator has a close to optimal trace norm error bound.

3.10. Confidence regions (additional information)

Confidence polytopes [69], Bayesian region estimates [70], particle filtering from
QInfer [71]

3.11. Other methods (additional information)


Adaptive quantum tomography [72]

4. Quantum state certification


The new material on frame theory (Section 3.3, in particular Proposition 3.9) and
Monte Carlo integration (Section 2.4) was also covered in Lecture 8. Also Chebyshev’s
and Höffding’s inequalities from Section 2.3 will be used in Lecture 9.

State certification is the task of making sure that a quantum state σ prepared in an
experiment is a sufficiently good approximation of a target state ρ. More precisely,
we make the following definitions.
Definition 4.1 (Quantum state certification):

Let ρ ∈ S(Cd) be a quantum state (target). A (quantum state) validation
test Tnσ of ρ consists of a quantum measurement on σ ∈ S((Cd)⊗nσ) followed
by classical post-processing outputting either “accept” or “reject” and
satisfying the completeness condition

    σ = ρ⊗nσ ⇒ P[“accept”] ≥ 2/3 .    (4.1)

Let ε > 0 (tolerated error) and dist be a distance measure on S(Cd). We call a
validation test Tnσ an ε-certification test of ρ w.r.t. dist from nσ independent
samples if the following soundness condition is satisfied: for any sequence of
prepared states σ1, . . . , σnσ ∈ S(Cd),

    dist(ρ, σi) > ε ∀i ∈ [nσ] ⇒ P[“reject”] ≥ 2/3 .    (4.2)

An ε-certification test of ρ w.r.t. dist from nσ iid. samples is defined similarly
with the σi all being the same state.
The sample complexity of a family of such tests {Tnσ} is (the scaling of)
nσ with d and ε.

In hypothesis testing one has a null hypothesis H0 (usually the one one hopes to
disprove) and an alternative hypothesis H1, and one needs to figure out which is true
based on statistical data. In this setting, there are two types of error,

    P[accept H1 | H0]   (type I error)    (4.3)
    P[accept H0 | H1]   (type II error) .    (4.4)

In state certification we choose the null hypothesis H0 to be ‖σ − ρ‖₁ > ε and σ = ρ
to be the alternative hypothesis H1. Then P[“reject” | σ = ρ] is the type II error
and P[“accept” | ‖σ − ρ‖₁ > ε] the type I error. So, (4.2) is the requirement that the
type I error is bounded by 1/3 and (4.1) corresponds to the type II error also being
bounded by 1/3.
A validation test just checks for some errors, which we do not want to specify in
the general definition above. However, any meaningful validation test needs to accept
the target state with some confidence. As an example for validation, it is common to
check for particle loss in experiments with photons.
Often, especially in the computer science community, certification is also called
verification. However, in particular from an epistemological point of view, a physical
model or hypothesis can never be fully verified. Therefore, we will mainly stick to the
term certification but might still use verification interchangeably.
Quantum state certification just outputs the minimal information of whether a
certain target state has been prepared. It is not difficult to see that the confidence
value of 2/3 can be amplified:
Exercise 4.1 (Confidence amplification):

Let Tnρ be an ε-certification test of a quantum state ρ from nρ iid. samples
with maximum failure probability δ = 1/3. We repeat the certification test N
times and obtain a new certification test by performing a majority vote on the
outcomes. Show that the new test satisfies the completeness and soundness
conditions

    σ = ρ ⇒ P[“accept”] ≥ 1 − δ ,    (4.5)
    dist(ρ, σ) > ε ⇒ P[“reject”] ≥ 1 − δ ,    (4.6)

for all σ ∈ S(Cd), where δ = e^(−cN) and c > 0 is an absolute constant. The
parameter 1 − δ is also called the confidence of the test.
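The amplification can be checked numerically (our own sketch; the single-test failure probability 1/3 is simulated by a biased coin flip):

```python
import numpy as np

rng = np.random.default_rng(2)

def amplified_failure(N, trials=4000):
    """Empirical failure probability of the majority vote over N
    repetitions of a test that errs with probability 1/3."""
    errs = rng.random((trials, N)) < 1 / 3    # each repetition fails w.p. 1/3
    return np.mean(errs.sum(axis=1) > N / 2)  # majority of repetitions wrong

# The failure probability of the majority vote decays exponentially in N.
rates = [amplified_failure(N) for N in (1, 11, 31)]
```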

A certification test is only required to accept the target state. However, in practice,
such a test will accept states from some region around the target state with large probability.
Such a property of a certification test is called robustness (against deviations
from the target state). One way such robustness can be guaranteed is by
estimating the distance between the target state ρ and the prepared state σ, as we will
see in Section 4.1 on fidelity estimation, which yields a bound on the distance. In this way,
one obtains more information (a distance) than from mere certification (just “accept” or
“reject”).
Clearly, one can also certify through full quantum state tomography. However, the
number of single sequential measurements in general required for tomography of a
state σ ∈ S(Cd) scales as Ω(d rank(ρ)) and as Ω(d² rank(ρ)²) in the case of two-outcome
Pauli string measurements [55]. So, for the relevant case of pure n-qubit states this
number scales at least as 2^n. This measurement effort becomes unfeasible already for
relatively moderate n.
We will see that fidelity estimation can work with dramatically fewer measurements
than full tomography, when the target state has additional structure. In many situations,
certification can work with even fewer measurements than fidelity estimation
due to an improved ε-dependence in the sample complexity.

4.1. Direct fidelity estimation

The fidelity is a measure of closeness for two quantum states. In order to define it,
remember that the square root √ρ of a positive semidefinite operator ρ ∈ Pos(Cd)
is defined by (√ρ)² = ρ and √ρ ⪰ 0 and can be obtained through an eigenvalue
decomposition. The fidelity of two quantum states ρ, σ ∈ S(Cd) is defined as¹

    F(ρ, σ) := ‖√ρ √σ‖₁² .    (4.7)

Note that

    ‖√ρ √σ‖₁ = Tr[ √( √ρ σ √ρ ) ] .    (4.8)

The closeness measured by the fidelity is equivalent to the trace norm distance (see
Exercise 2.2.4) as captured by the Fuchs-van-de-Graaf inequalities [73, Theorem 1],

    1 − √F(ρ, σ) ≤ (1/2) ‖ρ − σ‖₁ ≤ √(1 − F(ρ, σ)) .    (4.9)
Moreover, the fidelity is symmetric, i.e., F(ρ, σ) = F(σ, ρ) for all ρ, σ.
When at least one of the states ρ and σ is pure, say ρ = |ψ⟩⟨ψ|, then

    F(ρ, σ) = ⟨ψ|σ|ψ⟩ = ‖ρσ‖₁ ,    (4.10)

which can easily be proven using (4.8). We will mostly encounter that case, i.e., where
one of the states is pure.
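These formulas are straightforward to evaluate numerically; the following sketch (our own, not from the notes, using an eigendecomposition for the matrix square root) computes F(ρ, σ) and checks the Fuchs-van-de-Graaf inequalities:

```python
import numpy as np

def psd_sqrt(A):
    """Square root of a positive semidefinite matrix via eigendecomposition."""
    lam, U = np.linalg.eigh(A)
    return (U * np.sqrt(np.clip(lam, 0, None))) @ U.conj().T

def fidelity(rho, sigma):
    """F(rho, sigma) = ||sqrt(rho) sqrt(sigma)||_1^2, cf. (4.7)."""
    s = np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma), compute_uv=False)
    return s.sum() ** 2

rho = np.diag([1.0, 0.0])                  # pure state |0><0|
sigma = np.diag([0.75, 0.25])
F = fidelity(rho, sigma)                   # here equal to <0|sigma|0> = 0.75
td = 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()   # trace distance
# Fuchs-van-de-Graaf (4.9): 1 - sqrt(F) <= td <= sqrt(1 - F)
```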
¹ Some authors define the fidelity just as ‖√ρ √σ‖₁ (without the square).

Indeed, in direct fidelity estimation (DFE) [63, 74] one has a target state ρ ∈ S(Cd)
and assumes to be given iid. state preparations of some state σ ∈ S(Cd). The goal
is to estimate the fidelity Tr[σρ] for the case where ρ is a pure state, i.e. of the
form ρ = |ψ⟩⟨ψ|. This estimation is then solved using Monte Carlo methods; see
Section 2.4 for the relevant tools.
For general direct fidelity estimation we fix a finite tight frame {Mλ}λ∈Λ ⊂ Herm(Cd)
with frame constant A; see Section 3.3 for an introduction to frame theory. By
{M̃λ}λ∈Λ ⊂ Herm(Cd) we denote its dual frame, which has frame constant 1/A
(Proposition 3.9). We define the maximum norm of the frame as C := max_{λ∈Λ} ‖Mλ‖op
and observe that, due to Hölder’s inequality (2.8),

    |Wσ(λi)| ≤ C .    (4.11)

Traditionally [63, 74], the frame (and also the dual frame) is taken to be the
normalized n-qubit Pauli basis {2^(−n/2) σs1 ⊗ · · · ⊗ σsn}_{s∈{0,1,2,3}^n}, which is an
orthonormal basis. But it has proven useful to consider more general frames
for quantum information tasks [75] and we will follow this trend. In general, Λ can
be a continuous set, but we assume it to be finite here.
Given any operator σ ∈ Herm(Cd) we define its W-function (sometimes called
discrete Wigner function or quasi-probability distribution) and W̃-function Wσ, W̃σ :
Λ → R by

    Wσ(λ) := Tr[Mλ σ] ,  W̃σ(λ) := Tr[M̃λ σ] .    (4.12)

This allows us to write (see the frame expansion (3.21))

    Tr[ρσ] = Tr[ ρ Σ_{λ∈Λ} Wσ(λ) M̃λ ] = Σ_{λ∈Λ} Tr[ρ M̃λ] Wσ(λ)
           = Σ_{λ∈Λ} W̃ρ(λ) Wσ(λ)    (4.13)

for ρ, σ ∈ Herm(Cd).
Now, we will use importance sampling from Monte Carlo integration (see Section 2.4)
to estimate the sum (4.13) for the case where the target state ρ ∈ S(Cd) is
a pure state and the prepared states σ ∈ S(Cd) are arbitrary. For this purpose we
rewrite the overlap (4.13) as

    Tr[ρσ] = Σ_{λ∈Λ} A W̃ρ(λ)² · Wσ(λ) / (A W̃ρ(λ))    (4.14)

and define

    qλ := A W̃ρ(λ)²    (4.15)

as importance sampling distribution on the sampling space Λ, where 1/A is the frame
constant of {M̃λ}, which we now argue to be the right normalization constant. The
tight frame condition for ρ can be written as

    Σ_{λ∈Λ} W̃ρ(λ)² = Σ_{λ∈Λ} |⟨M̃λ, ρ⟩|² = ⟨ρ, ρ⟩ / A .    (4.16)

For ρ being a pure state, i.e., ⟨ρ, ρ⟩ = Tr[ρ²] = 1 (see Exercise 2.3), we indeed obtain

    Σ_{λ∈Λ} qλ = 1 .    (4.17)

Next, we consider the estimator based on samples λ ∼ q given by

    Xλ := Wσ(λ) / (A W̃ρ(λ)) .    (4.18)

We will exploit that Xλ with λ ∼ q is an unbiased estimator of the fidelity:

    E_{λ∼q}[Xλ] = Σ_{λ∈Λ} qλ Wσ(λ) / (A W̃ρ(λ)) = Σ_{λ∈Λ} W̃ρ(λ) Wσ(λ) = Tr[ρσ] ,    (4.19)

where the last identity is again Eq. (4.13). Next, we take the empirical estimate of X
from ℓ samples,

    Y := (1/ℓ) Σ_{i=1}^ℓ Xλi ,    (4.20)

where the Xλi are iid. drawn as in (4.18). This is also an unbiased estimator of Tr[ρσ],
the precision of which we can control by increasing ℓ. In order to bound the confidence
that we have an estimation error |Y − Tr[ρσ]| ≤ ε for some desired ε > 0 we need to find
a maximum failure probability δ so that the tail bound

    P[ |Y − Tr[ρσ]| ≥ ε ] ≤ δ    (4.21)

is satisfied for some ε, δ > 0 controlled by ℓ; see Figure 2.1 for the idea of tail bounds.
Then we have an ε-good estimation of Tr[ρσ] with confidence 1 − δ.
However, we also need to take into account the error from estimating Xλ from single
measurements. For this purpose we will use an estimator Ŷ of the estimator Y,
which uses finitely many measurements, and derive a tail bound of the form

    P[ |Ŷ − Y| ≥ ε ] ≤ δ    (4.22)

for ε, δ > 0. More precisely, we consider the following protocol:


Protocol 4.2 (Extension of DFE [63] to frames):

Let {Mλ}λ∈Λ be a finite tight frame for Herm(Cd) satisfying ‖Mλ‖op ≤ C for
all λ ∈ Λ and some constant C. Denote by A the frame constant of {Mλ} and
by {M̃λ}λ∈Λ the canonical dual frame.
Let ρ ∈ S(Cd) be a target state with respect to which we wish to estimate the
fidelity from measurements of the observables {Mλ} and let ε > 0 and δ > 0
be the parameters for the desired estimation accuracy and maximum failure
probability.
The protocol consists of the following steps applied to nσ iid. copies of a prepared
state σ ∈ S(Cd):

(i) Take iid. samples λ1, . . . , λℓ ∼ q from the importance sampling distribution
    (4.15), where ℓ := ⌈1/(ε²δ)⌉ (or as (4.44) for well-conditioned states).

(ii) Measure each Mλi a number of mi times for i ∈ [ℓ] with mi chosen as

        mi := ⌈ 2C² ln(2/δ) / (ℓ A² W̃ρ(λi)² ε²) ⌉    (4.23)

    (or as mi = 1 for well-conditioned states).

(iii) Take the empirical estimates Ŵσ(λi) of Wσ(λi) = Tr[Mλi σ] from these
    measurements for i ∈ [ℓ].

(iv) Estimate Xλi = Wσ(λi)/(A W̃ρ(λi)) by X̂λi := Ŵσ(λi)/(A W̃ρ(λi)).

(v) Estimate Y = (1/ℓ) Σ_{i=1}^ℓ Xλi by Ŷ := (1/ℓ) Σ_{i=1}^ℓ X̂λi.

(vi) Output Ŷ as the estimate of Tr[ρσ].

Theorem 4.3 (Guarantee for DFE, frame version of [63]):

The fidelity estimate Ŷ from Protocol 4.2 satisfies

    P[ |Ŷ − F(ρ, σ)| ≤ 2ε ] ≥ 1 − 2δ .    (4.24)

The expected required number of copies of σ is

    E[nσ] = E[ Σ_{i=1}^ℓ mi ] ≤ 1 + 1/(ε²δ) + (2C²|Λ| / (A ε²)) ln(2/δ) .    (4.25)

We note that a rescaling Mλ ↦ νMλ of the frame changes the constants as A ↦ ν²A
and C ↦ νC. So, the sample complexity is indeed invariant under such a rescaling.
The constant C²/A can be seen as an incoherence parameter [56] of the frame. Indeed,
if the frame is an orthonormal basis then A/C² measures a maximum effective rank
of its elements, see the norm inequalities (2.7).
Proof of Theorem 4.3. We start by bounding the estimation error arising from taking the
empirical average in Step (v). We note that Xλ defined in (4.18) is in general an unbounded
random variable, as W̃ρ(λ) can be arbitrarily small. Hence, we will use
Chebyshev’s inequality (2.46) to make the tail bound (4.21) explicit. Using the definitions
(4.15) and (4.18) of q and X and that X is the unbiased estimator (4.19), the
variance of X becomes

    Var_{λ∼q}[Xλ] = E_{λ∼q}[Xλ²] − Tr[ρσ]²
                  = Σ_{λ∈Λ} ( Wσ(λ)² / (A² W̃ρ(λ)²) ) A W̃ρ(λ)² − Tr[ρσ]²
                  = (1/A) Σ_{λ∈Λ} Wσ(λ)² − Tr[ρσ]²    (4.26)
                  = Tr[σ²] − Tr[ρσ]² ,

where we have used the frame condition (A = B in (3.16)) in the last step. Hence,

Varλ∼q [Xλ ] ≤ Tr[σ 2 ] ≤ 1 . (4.27)

Using the basic insight of Monte Carlo estimation (2.53), we obtain

Var_q[Y] = E_q[(Y − Tr[ρσ])²] ≤ 1/ℓ .    (4.28)

As Y is an unbiased estimator of Tr[ρσ], i.e., E_q[Y − Tr[ρσ]] = 0, we can directly
apply Chebyshev’s inequality (2.46) to arrive at

    P[ |Y − Tr[ρσ]| ≥ ε ] ≤ 1/(ε²ℓ)    (4.29)

for any ε > 0. Hence, for any δ > 0 and

    ℓ ≥ 1/(ε²δ)    (4.30)

this failure probability is bounded by δ, i.e., the tail bound (4.21) is satisfied.
Now we bound the statistical error that arises from the estimation of Xλi from
measurement setup i ∈ [ℓ] in Step (iii). For this purpose we write for
each λ the eigendecomposition of Mλ as
    Mλ = Σ_α aλ,α Pλ,α    (4.31)

 
with {Pλ,α} being the projectors onto the eigenspaces and {aλ,α} ⊂ [−‖Mλ‖op, ‖Mλ‖op]
the eigenvalues of Mλ. We note that the expected measurement outcome is

E[aλ,α ] = Tr[Mλ σ] = Wσ (λ) . (4.32)

Denote by aλi,αj the measurement outcome for measurement j ∈ [mi] and consider
the following corresponding empirical estimate of Xλi (see (4.18)):

    X̂λi := (1 / (mi A W̃ρ(λi))) Σ_{j=1}^{mi} aλi,αj .    (4.33)

Then the estimation error of the empirical estimator Ŷ of Y from (4.20) becomes

    Ŷ − Y = (1/ℓ) Σ_{i=1}^ℓ (X̂λi − Xλi)
          = (1/ℓ) Σ_{i=1}^ℓ Σ_{j=1}^{mi} (1 / (mi A W̃ρ(λi))) ( aλi,αj − Wσ(λi) ) .    (4.34)

Using the bound (4.11) and Höffding’s inequality (2.47) with t = ℓε and

    bi,λj = −ai,λj = C / (mi A W̃ρ(λi))    (4.35)

we find that (w.l.o.g. we assume that there are no i with W̃ρ(λi) = 0)

    P[ |Ŷ − Y| ≥ ε ] ≤ 2 exp( − ε² / [ (1/ℓ) Σ_{i=1}^ℓ 2C² / (ℓ mi A² W̃ρ(λi)²) ] ) .    (4.36)

We wish that the tail bound (4.22) holds. Therefore, we impose the RHS of (4.36) to
be bounded by δ, which is equivalent to

    ln(2/δ) ≤ ε² / [ (1/ℓ) Σ_{i=1}^ℓ 2C² / (ℓ mi A² W̃ρ(λi)²) ] .    (4.37)

The choice of mi as in (4.23) guarantees that this bound is always satisfied, i.e., that
the desired tail bound (4.22) holds. Then the combination of the tail bounds (4.21) and
(4.22) with the union bound (2.12) proves the confidence statement (4.24).
In order to calculate the final sample complexity (4.25) note that mi is a random
variable itself, since W̃ρ(λi) was randomly chosen. By the definition of the sampling
distribution (4.15), we have

    E[mi] = Σ_{λi∈Λ} mi qλi ≤ 1 + (2C²|Λ| / (ℓ A ε²)) ln(2/δ) ,    (4.38)

where the +1 comes from the ceiling in (4.23). Using the bound (4.30) on ℓ, the
expected total number of measurements, i.e. the expected sample complexity, is

    E[ Σ_{i=1}^ℓ mi ] ≤ 1 + 1/(ε²δ) + (2C²|Λ| / (A ε²)) ln(2/δ) .    (4.39)

Example 4.4 (Pauli measurements):

Let the frame for n qubits be given as

    {Ms} := {σs1 ⊗ · · · ⊗ σsn}_{s∈{0,1,2,3}^n}    (4.40)

and denote the corresponding frame operator by S. Then

    d = 2^n ,  |Λ| = d² ,  C = 1 ,  M̃s = (1/d) Ms ,  A = d ,    (4.41)

where the last two identities can be obtained from the requirements S(M̃s) = Ms
and Σ_s |Tr[Ms X]|² = d ‖X‖₂² for any operator X.
The sample complexity (4.25) hence becomes

    E[nσ] ≤ 1 + 1/(ε²δ) + (2d/ε²) ln(2/δ) ,    (4.42)

which is consistent with its original version [63].
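As a toy illustration of Protocol 4.2 with this frame (our own sketch, not from [63]): for a single qubit one can importance-sample Pauli labels with probability qs = d W̃ρ(s)² and estimate each Wσ(s) from ±1 outcomes. Here the prepared state is a depolarized version of the target |+⟩:

```python
import numpy as np

rng = np.random.default_rng(0)

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
paulis, d = [I2, X, Y, Z], 2

plus = np.array([1, 1]) / np.sqrt(2)
rho = np.outer(plus, plus.conj())                 # target |+><+|
sigma = 0.9 * rho + 0.1 * I2 / 2                  # prepared: slightly depolarized

w_rho = np.array([np.trace(M @ rho).real / d for M in paulis])  # W~_rho(s)
q = d * w_rho**2                                  # importance distribution (4.15)

ell, m = 200, 50                                  # settings and shots per setting
est = 0.0
for s in rng.choice(4, size=ell, p=q):
    w_sigma = np.trace(paulis[s] @ sigma).real    # expectation of a +-1 outcome
    shots = rng.choice([1, -1], size=m,
                       p=[(1 + w_sigma) / 2, (1 - w_sigma) / 2])
    est += shots.mean() / (d * w_rho[s]) / ell    # average of the X-hat estimators
# est should be close to F(rho, sigma) = Tr[rho sigma] = 0.95
```

Labels with W̃ρ(s) = 0 are never sampled, so no division by zero occurs; this is the same mechanism that makes the estimator well defined in the protocol.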

Note that the sample complexity scales linearly in the Hilbert space dimension
for the case of Pauli measurements. In contrast, the number of Pauli measurements
required for state tomography scales as Ω̃(d² rank(σ)²) [55].
The main contribution to the number of measurements in the derivation of the sample
complexity above can be traced back to the application of Chebyshev’s inequality
in (4.29). This step can, however, be improved for the following class of states.
Definition 4.5 (Well-conditioned states):

Let {M̃λ}λ∈Λ ⊂ Herm(Cd) be a frame. Then we call an operator ρ ∈ Herm(Cd)
well-conditioned with parameter α̃ > 0 if for each λ ∈ Λ either Tr[M̃λ ρ] ≥ α̃ or
Tr[M̃λ ρ] = 0.

For example, if the frame {M̃λ} is the dual frame of the Pauli strings, {M̃s} =
{2^(−n) σs1 ⊗ · · · ⊗ σsn}, then each stabilizer state ρ (3.33) with stabilizer S on n qubits
is well-conditioned with parameter α̃ = 2^(−n) ≡ 1/d, as

    Tr[M̃s ρ] = 2^(−n) Σ_{S∈S} Tr[M̃s S] = 2^(−n) δ_{2^n M̃s ∈ S} Tr[M̃s Ms] = 2^(−n) δ_{Ms ∈ S} .    (4.43)

Theorem 4.6 (DFE for well-conditioned states):

Let ρ ∈ Herm(Cd) with ‖ρ‖₂ = 1 be a target “state” that is well-conditioned
with parameter α̃ > 0 w.r.t. the tight frame in Protocol 4.2 for fidelity estimation.
Moreover, we consider the protocol modified by setting mi = 1 for all
i ∈ [ℓ] in Step (ii) and

    ℓ := ⌈ (2C² / (A² α̃² ε²)) ln(2/δ) ⌉    (4.44)

in Step (i). Then the fidelity estimate Ŷ from nσ = ℓ iid. measurements satisfies

    P[ |Ŷ − F(ρ, σ)| ≤ ε ] ≥ 1 − δ .    (4.45)

Proof. With probability one we have W̃ρ(λi) ≥ α̃ for all i ∈ [ℓ]. Moreover, |Ŵσ(λi)| ≤
C. The estimator from Step (iv) of Protocol 4.2 is hence bounded as

    |X̂λi| ≤ C / (A α̃)    (4.46)

with probability 1.
Hence, the estimator Ŷ is also bounded as |Ŷ| ≤ C/(Aα̃) almost surely. Höffding’s
inequality (2.47) with t = ℓε yields

    P[ |Ŷ − Tr[ρσ]| ≥ ε ] ≤ 2 exp( − ℓ A² α̃² ε² / (2C²) ) .    (4.47)

Imposing

    2 exp( − ℓ A² α̃² ε² / (2C²) ) ≤ δ    (4.48)

and solving for ℓ yields (4.44).
This theorem tells us that direct fidelity estimation has a smaller sampling complexity
for well-conditioned states. For instance, well-conditioning in the Pauli basis leads to
a constant sample complexity:
Example 4.7 (Pauli measurements and well-conditioned states):

Consider the Pauli observable measurements from Example 4.4 and a pure
state ρ that is well-conditioned with parameter α̃ = α/d, where α > 0 is some
constant. This well-conditioning is equivalent to

    Tr[σs1 ⊗ · · · ⊗ σsn ρ] ≥ α  or  = 0 ,  ∀ s ∈ {0, 1, 2, 3}^n .    (4.49)

Then the sample complexity (4.44) becomes

    nσ ≤ 1 + 2 ln(2/δ) / (α² ε²) .    (4.50)

Examples of well-conditioned states are stabilizer states, which are easily seen
to be well-conditioned with α = 1 using (3.33).

Exercise 4.2 (Certification w.r.t. the trace distance via DFE):

Fix parameters ε̃, ε, δ > 0 with ε̃ ≤ ε²/2. Let Ŷ be the direct fidelity estimator of
the fidelity F(ρ, σ) so that |Ŷ − F(ρ, σ)| ≤ ε̃ with confidence 1 − δ. We consider
the protocol that accepts if Ŷ ≥ 1 − ε̃ and rejects otherwise. As distance we
choose the trace distance distTr defined by distTr(ρ, σ) := (1/2) ‖ρ − σ‖₁.

• Show that this protocol is an ε-certification test w.r.t. the trace distance
  in the sense of Exercise 4.1, i.e., that the completeness and soundness
  conditions are satisfied with confidence 1 − δ.
• What is the resulting sampling complexity of DFE in the well-conditioned
  setting from Example 4.7?
• Let ε′ < ε. Turn this protocol into a robust (ε, ε′)-certification test, i.e.,
  into an ε-certification test that is guaranteed to accept all states within
  an ε′-trace norm ball around ρ with confidence 1 − δ.

4.2. Direct quantum state certification


We start by considering the case of a single measurement. For this purpose, we
remember the connection of the trace distance to the distinguishability of states by
POVMs.

Proposition 4.8 (Operational interpretation of the trace distance):

Let ρ, σ ∈ S(Cd) be states. Then

    (1/2) ‖ρ − σ‖₁ = sup_{0 ≤ P ≤ 1} Tr[P(ρ − σ)]    (4.51)

and the supremum is attained for the projector P⁺ := 1_{R+}(ρ − σ) onto the
positive part of ρ − σ.

Proof. First we show that the supremum is attained for P⁺. The self-adjoint operator
difference

    ρ − σ = X⁺ − X⁻    (4.52)

has a positive part X⁺ ∈ Pos(Cd) and a negative part X⁻ ∈ Pos(Cd). We note that
‖X±‖op ≤ 1 and, since Tr[X⁺ − X⁻] = Tr[ρ − σ] = 0, we have Tr[X⁺] = Tr[X⁻].
Moreover, ‖ρ − σ‖₁ = Tr[X⁺] + Tr[X⁻]. The last two statements together yield that
the trace distance between the two states is

    (1/2) ‖ρ − σ‖₁ = Tr[X⁺] = Tr[P⁺(ρ − σ)] ,    (4.53)

where P⁺ is the orthogonal projector onto the support of X⁺ and can be obtained
as P⁺ = 1_{R+}(ρ − σ) using spectral calculus (2.3), where 1_{R+} denotes the indicator
function of R+, see (2.44).
In order to show that the supremum cannot become larger than the trace distance,
we consider some operator P with 0 ≤ P ≤ 1. Then, indeed,

    Tr[P(ρ − σ)] = Tr[P X⁺] − Tr[P X⁻] ≤ Tr[P X⁺]
                 ≤ ‖X⁺‖₁ = (1/2) ‖ρ − σ‖₁ ,    (4.54)

where we have used Hölder’s inequality (2.8) and (4.53) in the last two steps.
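Numerically (our own check, with hypothetical helper names), the optimal projector and the trace distance follow directly from the eigendecomposition of ρ − σ:

```python
import numpy as np

def trace_distance_and_projector(rho, sigma):
    """Return 0.5 * ||rho - sigma||_1 and the projector P+ onto the
    positive part of rho - sigma, cf. Proposition 4.8."""
    lam, U = np.linalg.eigh(rho - sigma)
    dist = 0.5 * np.abs(lam).sum()
    V = U[:, lam > 0]                      # eigenvectors with positive eigenvalue
    P = V @ V.conj().T                     # spectral projector 1_{R+}(rho - sigma)
    return dist, P

rho = np.diag([1.0, 0.0])
sigma = np.diag([0.5, 0.5])
dist, P = trace_distance_and_projector(rho, sigma)
attained = np.trace(P @ (rho - sigma)).real   # the supremum in (4.51) is attained
```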

This proposition means that the trace distance of two states is given by the maximum
distinguishability by binary POVM measurements {P, 1 − P}. This distinguishability
can be amplified by measuring iid. copies of a quantum state σ with
{P, 1 − P}. Next, we turn this basic insight into an ε-certification test of a pure state

    ρ = |ψ⟩⟨ψ| ,    (4.55)

where |ψ⟩ ∈ Cd is a state vector. However, for practical reasons we consider the
infidelity 1 − F(ρ, σ) of the states ρ and σ as distance measure instead of the trace distance.
We remember that the Fuchs-van-de-Graaf inequalities (4.9) relate these two distance
measures.
We consider the POVM given by P = |ψ ihψ |. Then, for any σ we have

Tr[P σ] = F(ρ, σ) , (4.56)

i.e., the probability to obtain the POVM measurement outcome corresponding to ρ


is the fidelity of the two states. Next, we will consider several measurements with
the same POVM in order to boost the probability to detect deviations of the form F(ρ, σ) < 1 − ε with some targeted confidence 1 − δ.
In order to capture a large class of measurement settings we consider POVM measurements {Ω, 1 − Ω} with Tr[Ωρ] = 1. Moreover, we consider the case
of several independently prepared states and sequential measurements. We mostly
follow a work by Pallister et al. [76].

Protocol 4.9 (Naive direct quantum state certification):

Let ρ ∈ S(Cd ) be a pure target state and Ω ∈ Pos(Cd ) with kΩkop ≤ 1. Denote
by {Ω, 1 − Ω} the binary POVM given by Ω, call the outcome corresponding
to Ω “pass” and the one of 1 − Ω “fail”.
For state preparations σ1 , . . . , σnσ ∈ S(Cd ) the protocol consists of the follow-
ing steps.
1: for i = 1, . . . , nσ do
2: measure σi with {Ω, 1 − Ω}
3: if the outcome is “fail” then:
4: output “reject”
5: end protocol (break)
6: end if
7: end for
8: output “accept”

Proposition 4.10 (Performance guarantee I):

Let ρ ∈ S(Cd) be a pure target state, ε, δ > 0, and consider the distance measure on quantum states given by the infidelity 1 − F(ρ, σ). The test from Protocol 4.9 with Ω = ρ is an ε-certification test w.r.t. the infidelity from nσ independent samples for

nσ ≥ ln(1/δ)/ε   (4.57)

with confidence at least 1 − δ according to Definition 4.1. Moreover, the protocol accepts the target state ρ with probability one.

Proof. The probability of measurement outcome “pass” in step i ∈ [nσ ] is

P[“pass”|σi ] = Tr[Ωσi ] = Tr[ρσi ] = F(ρ, σi ) . (4.58)

Hence, the final probability that the protocol accepts is



P[“accept”] = ∏_{i=1}^{nσ} F(ρ, σi) .   (4.59)

Clearly, if σi = ρ for all i ∈ [nσ ] then the protocol accepts almost surely. Now let
us consider the case that the fidelity is small, i.e.,

F(ρ, σi) = ⟨ψ| σi |ψ⟩ ≤ 1 − ε   ∀i ∈ [nσ ] .   (4.60)

Then the probability that the protocol wrongfully accepts is

P[“accept”] ≤ (1 − ε)^{nσ} .   (4.61)

We wish this probability (the type II error) to be bounded by δ > 0, i.e.,

(1 − ε)^{nσ} ≤ δ .   (4.62)

This bound is satisfied for

nσ ≥ ln(1/δ) / ln(1/(1 − ε)) .   (4.63)

We note that for ε ∈ [0, a] ⊂ [0, 1) the following bounds hold:

ε ≤ ln(1/(1 − ε)) ≤ (ε/a) ln(1/(1 − a)) ,   (4.64)

which can be seen by using the fact that ε ↦ ln(1/(1 − ε)) is smooth, has value 0 at 0, its first derivative is lower bounded by 1, and its second derivative is positive. So, the minimum nσ is

nσ ≈ ln(1/δ)/ε   (4.65)

for small ε > 0. Moreover, for any nσ ≥ ln(1/δ)/ε the required bound (4.62) is satisfied.

Perhaps surprisingly, the sample complexity of this protocol does not depend on
the physical system size at all. It has a zero type I error and one can control the
type II error via the parameter δ.
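The bound (4.57) can be sketched in code. The following snippet (numpy assumed; the parameter values ε = 0.05, δ = 0.01 are made up for illustration) computes the required number of samples and checks the type II error of Protocol 4.9 both analytically and by Monte Carlo.

```python
import numpy as np

eps, delta = 0.05, 0.01                           # made-up illustration values
n_sigma = int(np.ceil(np.log(1 / delta) / eps))   # bound (4.57)

# worst admissible fidelity F(rho, sigma_i) = 1 - eps for all i: the type II
# error (1 - eps)^n is then guaranteed to be below delta
type_II = (1 - eps) ** n_sigma
assert type_II <= delta

# Monte Carlo: each round passes independently with probability 1 - eps
rng = np.random.default_rng(1)
trials = 20000
accept_rate = (rng.random((trials, n_sigma)) < 1 - eps).all(axis=1).mean()
assert accept_rate <= delta + 0.01
```

Note that the bound holds for every ε since (1 − ε)^{nσ} ≤ e^{−nσ ε} ≤ δ for nσ ≥ ln(1/δ)/ε.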
However, it is generically not practical to implement the POVM {ρ, 1 − ρ}. So, we follow
Pallister et al. [76] and allow for more complicated strategies. Say, we have access to
a set of POVM elements

M ⊂ {M ∈ Pos(Cd ) : kM kop ≤ 1} , (4.66)

As one can only make finitely many measurements, we assume that |M| < ∞ in order to avoid technicalities. Then we pick
POVM elements Pj ∈ M with some probability and consider the corresponding binary
POVMs Mj := {Pj , 1 − Pj }, where all Pj have output “pass” and 1 − Pj have output
“fail”. Now we modify Protocol 4.9 by including this probabilistic measurement
strategy.
Protocol 4.11 (Direct quantum state certification):

Let ρ ∈ S(Cd ) be a pure target state and (µj , Pj ) a measurement strategy, i.e.,
µ a probability vector and 0 ≤ Pj ≤ 1. For each POVM {Pj , 1 − Pj }, call the
outcome corresponding to Pj “pass” and the one of 1 − Pj “fail”.
For state preparations σ1 , . . . , σnσ ∈ S(Cd ) the protocol consists of the follow-
ing steps.
1: for i = 1, . . . , nσ do
2: Draw j ∼ µ and measure σi with {Pj , 1 − Pj }
3: if the outcome is “fail” then:
4: output “reject”
5: end protocol (break)
6: end if
7: end for
8: output “accept”

The measurement description (µj , Pj )j is also called a (measurement) strategy. The


resulting probability of measuring “pass” for a state σ is

P[“pass”] = Σ_j µj Tr[Pj σ] = Tr[Ωσ]   (4.67)

with

Ω := Σ_j µj Pj   (4.68)

being the effective measurement operator.
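The averaging behind (4.67) and (4.68) can be illustrated numerically. In the following sketch (numpy assumed), the single-qubit strategy with Z- and X-basis projectors and the test state σ are made-up examples; the simulated acceptance rate of Protocol 4.11 matches Tr[Ωσ]^{nσ}.

```python
import numpy as np

rng = np.random.default_rng(2)

# binary measurement operators of a made-up single-qubit strategy
P = [np.array([[1, 0], [0, 0]], dtype=complex),          # |0><0|
     np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)]  # |+><+|
mu = np.array([0.5, 0.5])                                # sampling distribution

Omega = sum(m * p for m, p in zip(mu, P))                # effective operator (4.68)

sigma = np.array([[0.9, 0.1], [0.1, 0.1]], dtype=complex)  # made-up test state

# (4.67): the averaged pass probability equals Tr[Omega sigma]
probs = np.array([np.trace(p @ sigma).real for p in P])
assert np.isclose(mu @ probs, np.trace(Omega @ sigma).real)

# Monte Carlo run of Protocol 4.11: accept iff all n rounds pass
n, trials = 20, 5000
js = rng.choice(len(P), size=(trials, n), p=mu)          # drawn settings
passes = rng.random((trials, n)) < probs[js]
accept_rate = passes.all(axis=1).mean()
assert abs(accept_rate - np.trace(Omega @ sigma).real ** n) < 0.02
```

Since a fresh setting j is drawn in every round, the rounds are iid. and the accept probability factorizes into Tr[Ωσ] per round.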


Now we make the constraint that

Tr[Ωρ] = 1 , (4.69)

i.e., that the target state ρ is accepted with probability one (there is no false rejection). In particular, this means that Tr[Pj ρ] = 1 for each measurement setting j. This constraint
still allows for optimal measurement strategies:
Proposition 4.12 ([76, Proposition 8]):

Let ρ = |ψ⟩⟨ψ| be a target state. Let 0 ≤ Ω′ ≤ 1 be an effective measurement operator with Tr[Ω′ρ] < 1 so that Protocol 4.11 is an ε-certification test w.r.t. infidelity from n′σ iid. samples. Then there exists an effective measurement operator 0 ≤ Ω ≤ 1 with Tr[Ωρ] = 1 so that Protocol 4.11 is an ε-certification test w.r.t. infidelity from nσ iid. samples with nσ ≤ n′σ for sufficiently small ε.

The proof of this statement is a consequence of the Chernoff-Stein lemma, which


quantifies the asymptotic distinguishability of two distributions in terms of their rel-
ative entropy.
With the constraint (4.69) the only remaining hypothesis testing error is a false
acceptance, which is the event where a state σ with F(ρ, σ) < 1 − ε is accepted. This event has the worst-case probability over all states σ given as

P[“pass”] = max_{σ: Tr[ρσ] ≤ 1−ε} Tr[Ωσ] .   (4.70)

The maximum is given as follows.


Lemma 4.13 ([76], [77, Suppl. material, Section I]):

Let ρ ∈ S(Cd) be a pure state, 0 ≤ Ω ≤ 1, Tr[ρΩ] = 1, and ε > 0. Then

max_{σ: Tr[ρσ] ≤ 1−ε} Tr[Ωσ] = 1 − ε ν(Ω) ,   (4.71)

where ν(Ω) := 1 − λ2 (Ω) is the spectral gap between the maximum eigenvalue
1 (corresponding to ρ) and the second largest eigenvalue λ2 (Ω) (among all d
eigenvalues).

Proof. We note that Tr[ρΩ] = 1 means that a state vector |ψ i with ρ = |ψ ihψ | is an
eigenvalue-1 eigenvector of Ω. Moreover, let us write Ω in spectral decomposition,
Ω = Σ_{j=1}^d λj Pj   (4.72)

with 1 = λ1 ≥ λ2 ≥ · · · ≥ λd and P1 = ρ. For the case λ2 = 1 the choice σ = P2


yields a maximum of 1 in the maximization. Let us now consider the case λ2 < 1.
Then for

σ = (1 − ε)ρ + ε P2   (4.73)

we have

Tr[Ωσ] = (1 − ε) Tr[Ωρ] + ε Tr[ΩP2] = 1 − ε + ε λ2 = 1 − ε(1 − λ2) ,   (4.74)
i.e., the claimed maximum in (4.71) is attained for some feasible σ.
To show that the claimed maximum is actually the maximum we consider some state
σ ∈ S(Cd) with Tr[ρσ] ≤ 1 − ε. We write σ as a convex combination σ = (1 − ε′)ρ + ε′ρ⊥ with some state ρ⊥ supported on the orthogonal complement of ρ and observe that ε′ ≥ ε. Then

Tr[Ωσ] = Tr[ρσ] + Σ_{j=2}^d λj Tr[Pj σ]
       ≤ Tr[ρσ] + λ2 Σ_{j=2}^d Tr[Pj σ]
       = 1 − ε′ + λ2 ε′ Tr[(Σ_{j=2}^d Pj) ρ⊥]   (4.75)
       = 1 − ε′ + λ2 ε′ Tr[ρ⊥]
       = 1 − ε′ + λ2 ε′ = 1 − (1 − λ2)ε′
       ≤ 1 − (1 − λ2)ε .
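A quick numerical check of the lemma (the diagonal Ω and its spectrum are made-up examples, numpy assumed): the state σ = (1 − ε)ρ + εP2 attains the value 1 − ε(1 − λ2).

```python
import numpy as np

d, eps = 4, 0.1
psi = np.zeros(d); psi[0] = 1.0
rho = np.outer(psi, psi)                  # target state, eigenvalue-1 vector of Omega

lambdas = np.array([1.0, 0.6, 0.3, 0.1])  # made-up spectrum, lambda_2 = 0.6
Omega = np.diag(lambdas)                  # eigenbasis = computational basis

P2 = np.diag([0.0, 1.0, 0.0, 0.0])        # eigenprojector of lambda_2
sigma = (1 - eps) * rho + eps * P2

assert np.isclose(np.trace(rho @ sigma), 1 - eps)       # fidelity constraint
value = np.trace(Omega @ sigma)
assert np.isclose(value, 1 - eps * (1 - lambdas[1]))    # 1 - eps nu(Omega)
```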

Given a measurement strategy with effective measurement operator Ω this lemma


provides a closed form formula for the false acceptance probability (4.70). This allows
us to state the following guarantee for Protocol 4.11.
Proposition 4.14 (Performance guarantee II [76]):

Let ρ ∈ S(Cd) be a pure target state and ε, δ > 0. We consider an effective measurement operator 0 ≤ Ω ≤ 1 with Tr[Ωρ] = 1, which has a second largest eigenvalue λ2(Ω) < 1 (among the d eigenvalues).
Then the certification test from Protocol 4.11 is an ε-certification test w.r.t. the infidelity from nσ independent samples for

nσ ≥ ln(1/δ) / (ε (1 − λ2(Ω)))   (4.76)

with confidence at least 1 − δ. Moreover, the protocol accepts the target state
ρ with probability one.

Proof. The proof is mostly analogous to the one of Proposition 4.10.


Thanks to Lemma 4.13, the probability of wrongfully accepting a state σi ∈ S(Cd) with F(ρ, σi) ≤ 1 − ε is bounded as

P[“pass”|σi ] ≤ 1 − ε(1 − λ2(Ω)) .   (4.77)

Hence, the probability that Protocol 4.11 accepts is bounded as

P[“accept”] ≤ (1 − ε(1 − λ2(Ω)))^{nσ} .   (4.78)

Imposing (1 − ε(1 − λ2(Ω)))^{nσ} ≤ δ and solving for nσ yields

nσ ≥ ln(1/δ) / ln(1/(1 − ε(1 − λ2(Ω))))   (4.79)

and the bound (4.64) finishes the proof.

This proposition tells us that as long as Ω has a constant gap between its largest
and second largest eigenvalue, the sample complexity of the certification protocol has the same scaling as the one where Ω is the target state itself. Which measurement strategies Ω are feasible depends on the physical situation. Given a set M of
feasible measurements we can single out an optimal strategy as follows.

Definition 4.15 (Minimax optimization):

Let ρ be a pure state and ε > 0. Moreover, let us assume that we have access to a compact set of binary measurements given by the operators M ⊂ {P : 0 ≤ P ≤ 1 , Tr[P ρ] = 1}.
Then the best strategy Ω for the worst-case state preparation σ is

min_{Ω∈conv(M)} max_{σ: Tr[ρσ] ≤ 1−ε} Tr[Ωσ] .   (4.80)

This quantity is called minimax value and a strategy Ω where the minimum is
attained is called minimax optimal.

Such minimax optimizations are common in game theory and risk analysis.
If there are no restrictions on the measurements of a pure target state ρ, i.e.,
M = {P : 0 ≤ P ≤ 1 , Tr[P ρ] = 1}, then Ω = ρ is minimax optimal.
For a number of settings with physically motivated measurement restrictions the
minimax strategy, or at least one that is close to it, has been obtained. Popular
instances include the following settings:
• Stabilizer states and two qubit states with single qubit measurements [76]
• Ground states of locally interacting Hamiltonians [78]

• Bipartite states [79, 80], qubit case in an LOCC setting [81]


• Hypergraph states [78] with improvements in efficiency by Zhu and Hayashi [77]
• Stabilizer states [76, 78]

Here, we only outline the example of stabilizer states in more detail. Remember the
definition of stabilizer states from the box on STABs in Section 3.4.1.
Theorem 4.16 (Minimax optimal Pauli measurements for STABs [76]):

Let |ψ⟩ be an n-qubit stabilizer state with stabilizer group S ⊂ Pn with elements S = {1 = S0 , S1 , . . . , S_{2^n−1} }. For i ∈ [2^n − 1] denote by Pi := (1/2)(1 + Si) the projector onto the positive eigenspace of Si.
Then the minimax optimal measurement strategy with Pauli observables Pn as accessible measurements (see Definition 4.15) is given by measuring Si with probability 1/(2^n − 1). The resulting effective measurement operator Ω = (1/(2^n − 1)) Σ_{i=1}^{2^n−1} Pi satisfies Ω |ψ⟩ = |ψ⟩ and has the second largest eigenvalue

λ2(Ω) = (2^{n−1} − 1)/(2^n − 1) .   (4.81)

Proof. By Lemma 4.13, the minimax optimum is

min_{Ω∈X} max_{σ: Tr[ρσ] ≤ 1−ε} Tr[Ωσ] = min_{Ω∈X} (1 − ε ν(Ω)) = 1 − ε max_{Ω∈X} ν(Ω) ,   (4.82)

where

X := {Ω ∈ conv(Pn) : Ω |ψ⟩ = |ψ⟩} = conv(S) .   (4.83)
We argue that the optimization over conv(S) can be replaced by an optimization over conv(S′) with S′ := S \ {1}. To see this, observe that if Ω = (1 − α)Ω′ + α1 for α ∈ [0, 1] then ν(Ω) ≤ ν(Ω′). Hence, minimax optimal measurement strategies are of the form

Ω = Σ_{i=1}^{2^n−1} µi Pi   (4.84)

for a probability vector µ. We note that

Tr[Ω] = 2^{n−1}   (4.85)

since Tr[Pi] = 2^{n−1} for each i.


Next, since |ψ⟩ is an eigenvalue-1 eigenvector of Ω, we have

Ω = 1 ⊕ Ω̃   (4.86)

and, hence,

λ2(Ω) = kΩ̃kop .   (4.87)

Moreover, Tr[Ω̃] = 2^{n−1} − 1. The operator Ω̃ with the minimal norm kΩ̃kop under this trace constraint is of the form Ω̃ = a1 for a > 0. Taking the trace of that equality, solving for a, and denoting the projector onto the orthogonal complement of |ψ⟩⟨ψ| by |ψ⟩⟨ψ|^⊥ := 1 − |ψ⟩⟨ψ| yields

Ω = |ψ⟩⟨ψ| + ((2^{n−1} − 1)/(2^n − 1)) |ψ⟩⟨ψ|^⊥   (4.88)

with

λ2(Ω) = (2^{n−1} − 1)/(2^n − 1) .   (4.89)
In order to finish the proof we show that Ω ∈ conv(S), i.e., that this choice of Ω is
indeed compatible with (4.84).
We write the stabilizer state |ψ⟩⟨ψ| as a combination of the stabilizers (see (3.33)) and use that Sj = 2Pj − 1:

|ψ⟩⟨ψ| = (1/2^n) (1 + Σ_{j=1}^{2^n−1} Sj)
       = (1/2^n) (1 + 2 Σ_{j=1}^{2^n−1} Pj − (2^n − 1) 1)   (4.90)
       = (1/2^{n−1} − 1) 1 + (1/2^{n−1}) Σ_{j=1}^{2^n−1} Pj .

With 1 = |ψ⟩⟨ψ| + |ψ⟩⟨ψ|^⊥ this implies

Σ_{j=1}^{2^n−1} Pj = (2^n − 1) |ψ⟩⟨ψ| + (2^{n−1} − 1) |ψ⟩⟨ψ|^⊥   (4.91)

and, hence,

(1/(2^n − 1)) Σ_{j=1}^{2^n−1} Pj = |ψ⟩⟨ψ| + ((2^{n−1} − 1)/(2^n − 1)) |ψ⟩⟨ψ|^⊥ ,   (4.92)

which is the Ω from (4.88) and also the measurement strategy from the theorem
statement.
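The theorem can be checked numerically for n = 2 (numpy assumed; this small example is not from the notes). For the Bell state with stabilizer group {11, XX, −YY, ZZ}, the uniform mixture of the three nontrivial stabilizer projectors has second largest eigenvalue (2^{n−1} − 1)/(2^n − 1) = 1/3.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)

# nontrivial stabilizers of the Bell state (|00> + |11>)/sqrt(2)
stabilizers = [np.kron(X, X), -np.kron(Y, Y), np.kron(Z, Z)]
projectors = [(np.eye(4) + S) / 2 for S in stabilizers]   # P_i = (1 + S_i)/2

Omega = sum(projectors) / 3                   # uniform strategy over 2^n - 1 = 3

psi = np.array([1, 0, 0, 1]) / np.sqrt(2)
assert np.allclose(Omega @ psi, psi)          # Omega |psi> = |psi>

eigs = np.sort(np.linalg.eigvalsh(Omega))[::-1]
assert np.isclose(eigs[0], 1.0)
assert np.isclose(eigs[1], 1 / 3)             # lambda_2 = (2^{n-1}-1)/(2^n-1)
```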

Corollary 4.17 (Sampling complexity [76]):

Let us call the outcome corresponding to Pi “pass” and the one corresponding
to 1 − Pi “fail”. Then Protocol 4.11 is an ε-certification test of ρ w.r.t. infidelity from nσ independent samples for

nσ ≥ 2 ln(1/δ)/ε   (4.93)
with confidence 1 − δ. Moreover, ρ is accepted with probability 1.

Proof. According to Proposition 4.14 a number of measurements

nσ ≥ ln(1/δ)/(ε ν(Ω))   (4.94)

is sufficient, where

ν(Ω) = 1 − λ2(Ω) = 1 − (2^{n−1} − 1)/(2^n − 1) = 2^{n−1}/(2^n − 1) .   (4.95)

This results in

nσ ≥ ((2^n − 1)/2^{n−1}) · ln(1/δ)/ε ,   (4.96)

which is bounded by 2 ln(1/δ)/ε.

So, restricting from all measurements to Pauli measurements results in at most a


constant overhead of 2, cp. Proposition 4.10. We note that only very few of the 2^n − 1
non-trivial stabilizers of ρ are measured. More precisely, the measurements are the
ones of randomly subsampled stabilizer observables.

4.3. Other works (additional information)


A first version for the certification of ground states of locally interacting Hamiltonians was given by Cramer et al. [82]; an extension by Hangleiter et al. [83] also discusses ground states enabling universal quantum computation. In this line of research, fidelity witnesses [83–85] can be used to measure and estimate lower bounds on the fidelity.
In fact, the work [82] solves the certification problem by an instance of ansatz state tomography. Here, the measured state is assumed to be of matrix product form and this form is efficiently reconstructed from measurement data. Similar ideas work for permutationally invariant states [86–88].
Kalev et al. [89] have extended arguments from direct fidelity estimation [63] on stabilizer states using Bernstein’s inequality to give a quadratically improved ε scaling for small ε.
Global von Neumann measurements on multiple iid. copies of the prepared quantum state have also been considered [90] (even with mixed target states), which leads to a sample complexity scaling as nσ ∈ O(d/ε) for a version of ε-certification of quantum states in S(Cd).
There is a very helpful survey on quantum property testing [91], where several methods and notions of certification are reviewed.

Part II

Quantum dynamics
Quantum states can be fully reconstructed tomographically with an essentially optimal number of measurements [37, 55], see Chapter 3. In particular, the reconstruction
error can be bounded in the operationally relevant norm, the trace norm. Similar re-
sults hold for quantum state certification, where one has implemented a targeted state
and is tasked to certify the correctness of the implementation up to some trace norm
error, see Chapter 4.
In Part II of these notes, we aim to find similar results for quantum processes. In
principle, one can use the Choi-Jamiołkowski isomorphism (see Section 5.3) to map a
quantum process to a quantum state and apply the characterization and verification
methods from Part I. However, this approach has drawbacks: (i) it might result in
measurements that are practically infeasible and typically require maximally entangled states as an available resource, and (ii) the error is controlled in the “wrong” norm.
In a similar way as the trace norm is an operationally motivated norm for quantum
states (Proposition 4.8) the so-called diamond norm is an operationally motivated
norm for quantum processes, see Section 5.5. There is even a third potential problem:
To (partially) characterize a quantum process one needs to prepare quantum states,
to evolve them under the process, and to measure the final states. In this task so-
called state preparation and measurement (SPAM) errors can be a serious obstacle
for reliable characterization. Therefore, the development of quantum characterization
and verification methods for quantum processes requires a significant amount of extra
work.
As we will see in Chapter 6, one is able to estimate a weaker error measure than
the diamond norm efficiently using randomized benchmarking. This is typically done
in a way that is robust against SPAM errors. In Chapter 7 we will discuss state-of-
the-art methods for quantum process tomography. In particular, we will see that one
can reconstruct the most relevant part of a quantum process (the unital part) using
similar measurements as in randomized benchmarking. However, the error measure
is again a weaker one than the diamond norm. In Chapter 8 we will discuss gate
set tomography. Here, one reconstructs a full gate set from measurement data and is
able to estimate gate errors in the diamond norm. A disadvantage of this method is
that it comes at the expense of a large overhead in the measurement effort and the
amount of classical post-processing. Further improving those methods and finding
lower bounds on the required measurement and computational effort is still subject
to ongoing research, which makes this field particularly exciting.

5. Preliminaries II
In this chapter we introduce some more preliminaries required to discuss the character-
ization and validation of quantum processes. Let us start with a proper introduction
to quantum processes.

5.1. Quantum processes
A quantum process is given by a linear map taking density operators to density operators and satisfying certain properties. Therefore, we start by introducing some
notation related to linear maps between operator spaces.
Let H, K be Hilbert spaces.
• The vector space of linear maps from L(H) to L(K) is denoted by L(H, K) :=
L(L(H), L(K)). We set L(H) := L(H, H) and denote the identity by idH :=
1L(H) ∈ L(H). Often we just write id when it is clear from the context what H
is.
• A map Φ ∈ L(H, K) is called Hermiticity-preserving if

Φ(Herm(H)) ⊂ Herm(K) , (5.1)

positive if
Φ(Pos(H)) ⊂ Pos(K) , (5.2)
and trace-preserving if
Tr[Φ(X)] = Tr[X] (5.3)
for all X ∈ L(H). Note that positive maps are also Hermiticity-preserving.
The map Φ is called completely positive (CP) if Φ ⊗ 1L(H0 ) is positive for all
Hilbert spaces H0 with identity map 1L(H0 ) ∈ L(H0 ). The set of CP maps is
denoted by CP(H, K) ⊂ L(H, K) and forms a convex cone. We set CP(H) :=
CP(H, H).
• A completely positive and trace preserving (CPT) map is also called a quantum
channel or just channel. The subset of CPT maps is denoted by CPT(H, K) ⊂
CP(H, K) and forms a convex set. Again, we set CPT(H) := CPT(H, H).
• A map Φ ∈ L(H, K) is called unital if Φ(1H ) = 1K . Note that Φ is trace-
preserving iff Φ† is unital.
So, essentially, quantum channels are maps that take density matrices to density
matrices even when applied to a part of a larger system. Usual unitary dynamics is
of this form:
Example 5.1 (Unitary channels):

We use calligraphic letters to denote the adjoint representation U ∈ L(H) of a


unitary U ∈ U(H) given by

U(X) := U XU † . (5.4)

These maps are quantum channels and are called unitary (quantum) channels.

Unitary channels are invertible and the inverses are again unitary channels.

5.2. Tensor networks


When dealing with higher order tensors, such as linear maps on operators, it is very
useful to use tensor network diagrams. A tensor network is a set of tensors together
with a contraction corresponding to pairs of indices where pairs of contracted indices
need to have the same dimension. Tensor networks have diagrammatic representa-
tions, which allow to visually track index contractions rather than spelling out these
contractions explicitly. Instances of tensor networks are, e.g., the workhorse of power-
ful simulation techniques for strongly correlated quantum systems [92]. In this course,
however, we will only use comparably small tensor networks, i.e., with just a few tensors. But even in this case, tensor networks will help us to dramatically simplify a

[Diagrams for: vectors |ψ⟩ and ⟨ψ|, tensor products of vectors, an operator A and its vectorization, the flip operator, a superoperator X applied to a vectorized operator, and a tensor product X ⊗ Y of superoperators.]
Figure 5.1.: Basic examples for tensor network diagrams: A vector |ψ i in some vector space V ,
vectorization of an operator A, a map X ∈ L(V ) applied to that vectorization, the non-vectorized
version of the flip operator F, and a tensor product of two maps on operators X and Y.

number of calculations. For an introduction to tensor networks from a category the-


ory point of view see the work by Biamonte et al. [93] and a follow-up work on open
quantum systems [94].
In the diagrammatic representation tensors are denoted by boxes with one line for
each index; Figure 5.1 for examples. A contraction of two indices is represented by
connecting their representing lines. One can indicate whether an index corresponds to
a vector space or a dual vector space (functionals) by arrows (outgoing or incoming)
or the direction of the index (e.g. left/right). Tensor products of smaller tensors are
represented simply by drawing the smaller tensors into the same diagram.
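In code, such index contractions can be expressed with np.einsum, which plays the role of the diagrammatic bookkeeping (numpy assumed; this small example mirrors the swap trick of Exercise 2.4): for the flip operator F one has Tr[F (A ⊗ B)] = Tr[AB].

```python
import numpy as np

rng = np.random.default_rng(3)
d = 3
A = rng.normal(size=(d, d))
B = rng.normal(size=(d, d))

# flip operator F|i,j> = |j,i> as an order-4 tensor: F[i,j,k,l] = delta_il delta_jk
F = np.einsum('il,jk->ijkl', np.eye(d), np.eye(d))

# Tr[F (A (x) B)]: contract the output indices of A (x) B with F's input indices
lhs = np.einsum('ijkl,ki,lj->', F, A, B)
rhs = np.trace(A @ B)
assert np.isclose(lhs, rhs)
```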
Exercise 5.1 (The swap-trick revisited):

Solve Exercise 2.4 using tensor network diagrams.

5.3. The Choi-Jamiołkowski isomorphism


The Choi-Jamiołkowski isomorphism [95, 96] provides a duality between CP maps and
bipartite positive semidefinite operators and allows to identify channels with certain
states. It has many applications in quantum information theory and related fields. In
particular, it allows to practically check whether a given map is a quantum channel.
Throughout the whole section, let H and K be Hilbert spaces. For any vector space
V , we have the natural isomorphism

L(V ) = V ⊗ V ∗ , (5.5)

where V ∗ := L(V, C) is the dual space of V .


The Choi-Jamiołkowski isomorphism

C : L(H, K) → L(K ⊗ H) (5.6)

is an isomorphism of vector spaces. Let ( |i i)i∈[dim(H)] be a basis of H and let us


denote complex conjugation w.r.t. that basis by cc. Then the Choi-Jamiołkowski


Figure 5.2.: The Choi-Jamiołkowski isomorphism and partial trace in terms of tensor network
diagrams (explained in Figure 5.1).
Left: Order-4 tensor X ∈ L(H, K) as a map from L(H) ∼ = H ⊗ H∗ to L(K) =∼ K ⊗ K∗ .
Middle: Its Choi-matrix C(X ) as an operator on K ⊗ H.
Right: Partial trace Tr1 [C(X )] of the Choi matrix C(X ). This operator corresponds to the
functional ρ 7→ Tr[X (ρ)].

isomorphism is given by the following identification:

L(H, K) = L(K) ⊗ L(H)∗ = K ⊗ K∗ ⊗ H∗ ⊗ H
        ≅ K ⊗ H∗ ⊗ K∗ ⊗ H = L(K ⊗ H∗)   (5.7)
        ≅cc L(K ⊗ H) ,

where the natural isomorphism (5.5) is denoted by “=”, the isomorphism of changing
the order of the vector spaces by “≅”, and the last one refers to the conjugate linear Hilbert space isomorphism H ≅ H∗ composed with the complex conjugation isomorphism;
see Figure 5.2 for a tensor network representation of C. The Choi-Jamiołkowski iso-
morphism can be written explicitly. In terms of the unnormalized maximally entan-
gled state
|1⟩ = Σ_{i=1}^{dim(H)} |i, i⟩ ∈ H ⊗ H   (5.8)

the Choi matrix of X ∈ L(H, K) is

C(X ) = X ⊗ id( |1 ih1 |) . (5.9)

We note that (U ⊗ Ū) |1⟩ = |1⟩ for all U ∈ U(H), where Ū denotes the entry-wise complex conjugate. Hence, the Choi-Jamiołkowski


isomorphism is invariant under orthogonal basis changes U ∈ O(H), as the dual basis
of H∗ changes as hi | 7→ hi | U † .
Exercise 5.2 (Choi-Jamiołkowski isomorphism):

Show that the characterizations of Choi-Jamiołkowski isomorphism from (5.9),


(5.7), and Figure 5.2 coincide. Moreover, show that

Tr[B X(A)] = Tr[(B ⊗ A^⊺) C(X)]   (5.10)

for all X ∈ L(H, K), A ∈ L(H) and B ∈ L(K).

Now we can connect the Choi-Jamiołkowski isomorphism to the properties of quan-


tum channels.
Theorem 5.2 (CPT conditions):

For any map X ∈ L(H, K) the following equivalences hold:

(i) X is trace-preserving iff TrK [C(X )] = 1.


(ii) X is Hermiticity-preserving iff C(X ) is Hermitian.
(iii) X is completely positive iff X ⊗ idH ( |1 ih1 |) is positive semidefinite.

(iv) X is completely positive iff C(X ) is positive semidefinite.

(v) X is a CP map iff there are operators K1 , . . . , Kr ∈ L(H, K), where r = rank(C(X)), so that

X(A) = Σ_{i=1}^r Ki A Ki†   (5.11)

for all A ∈ L(H). Moreover, X is a CPT map iff (5.11) holds with Σ_{i=1}^r Ki† Ki = 1.

(vi) X is a CPT map iff it has a Stinespring dilation, i.e., there is a unitary
U ∈ U((K ⊗ H)⊗2) so that X(ρ) = Tr2,3 [U (ρ ⊗ |00⟩⟨00|) U†], where Tr2,3 is the partial trace over the tensor factors on which the |00⟩ state is prepared.

Proof. As an exercise or see, e.g., [17, Chapter 2.2].


The last statement means that CPT maps are exactly the reductions of unitary
channels on a larger system.
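These conditions are straightforward to verify numerically. The following sketch (numpy assumed; the qubit amplitude-damping channel with made-up damping parameter g = 0.3 serves as the example) builds the Choi matrix C(X), checks conditions (i), (ii), and (iv) of Theorem 5.2, and extracts a Kraus decomposition as in (v) from the eigendecomposition of C(X).

```python
import numpy as np

g = 0.3                                               # made-up damping parameter
K0 = np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex)
K1 = np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)
kraus = [K0, K1]
d = 2

def channel(rho):
    return sum(K @ rho @ K.conj().T for K in kraus)

# Choi matrix C(X) = (X (x) id)(|1><1|) = sum_ij X(|i><j|) (x) |i><j|
C = np.zeros((d * d, d * d), dtype=complex)
for i in range(d):
    for j in range(d):
        Eij = np.zeros((d, d), dtype=complex)
        Eij[i, j] = 1
        C += np.kron(channel(Eij), Eij)

Ct = C.reshape(d, d, d, d)                            # indices (K, H, K, H)
assert np.allclose(np.einsum('abae->be', Ct), np.eye(d))   # (i) Tr_K C(X) = 1
assert np.allclose(C, C.conj().T)                          # (ii) Hermitian
evals, evecs = np.linalg.eigh(C)
assert evals.min() > -1e-12                                # (iv) C(X) >= 0

# (v) Kraus operators: unvectorize the sqrt(eigenvalue)-weighted eigenvectors
recovered = [np.sqrt(max(l, 0)) * v.reshape(d, d)
             for l, v in zip(evals, evecs.T) if l > 1e-12]

rho_test = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
assert np.allclose(channel(rho_test),
                   sum(K @ rho_test @ K.conj().T for K in recovered))
assert np.allclose(sum(K.conj().T @ K for K in recovered), np.eye(d))
```

The recovered Kraus operators agree with the original ones only up to a unitary mixing, but they implement the same channel.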
There are two different normalization conventions for the Choi-Jamiołkowski iso-
morphism. For X ∈ L(H, K) we set
1
J(X ) := C(X ) . (5.12)
dim(H)

The theorem tells us that X is a quantum channel iff J(X ) is a density matrix with
the reduction to H (obtained by tracing over K) being a maximally mixed state. The
so-called Choi state of a channel X is

J(X ) = X ⊗ idH (φ+ ) ∈ S(K ⊗ H) , (5.13)

where
φ+ := (1/dim(H)) |1⟩⟨1| ∈ S(H ⊗ H)   (5.14)
is a maximally entangled state, i.e., has the strongest bipartite quantum correlations
possible in a precise sense. In particular, the Choi state can be prepared by applying
the channel to this state.
Also note that not every bipartite state corresponds to a channel. Indeed, the
Choi-Jamiołkowski isomorphism is an isomorphism of convex cones, C : CP(H, K) →
Pos(K ⊗ H) but CPT(H, K) is mapped to a proper subset of S(K ⊗ H). The reason
is that the trace-preservation constraint of channels corresponds to dim(H)² many equalities whereas the trace constraint of states is just one equality.
Exercise 5.3 (Depolarizing channel):

Remember the depolarizing channel Dp ∈ L(Cd ) from Definition 3.15. Show


that Dp ∈ CPT(Cd ) iff
−1/(d + 1) ≤ p ≤ 1 .   (5.15)
For which of those values of p is Dp also invertible and when is the inverse also
a channel?

5.4. Inner products of superoperators and the χ process matrix

Let H ≅ Cd and K ≅ Cd′ be Hilbert spaces and E0 , E1 , . . . , E_{dd′−1} ⊂ L(H, K) be a Hilbert-Schmidt orthonormal basis for the linear operators from H to K. We note

that the vector space of linear maps L(H, K) is also equipped with a canonical inner product (the Hilbert-Schmidt inner product for superoperators) given by

⟨X , Y⟩ = Tr[X † Y]   (5.16)

for any X , Y ∈ L(H, K), where the trace can be calculated as

Tr[X ] = Σ_{i=0}^{dd′−1} ⟨Ei , X(Ei)⟩ = Σ_{i=0}^{dd′−1} Tr[Ei† X(Ei)] .   (5.17)

We also note that for any X , Y ∈ L(H, K)

hX , Yi = hC(X ), C(Y)i , (5.18)

where C denotes the Choi-Jamiołkowski isomorphism (5.9). Moreover, a map X ∈ L(H) can be expanded into the induced product basis {Ei,j } ⊂ L(H), which is given by

Ei,j (A) = Ei A Ej .   (5.19)

The expansion is

X = Σ_{i,j=0}^{dd′−1} xi,j Ei,j   (5.20)

with

xi,j = ⟨Ei , X(Ej)⟩ = Tr[Ei† X(Ej)] .   (5.21)
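A small numerical check of (5.16)–(5.18) (numpy assumed; the rotation U and phase gate V are made-up examples): for a unitary channel the superoperator trace (5.17) evaluates to |Tr U|², and the inner product is preserved by the Choi-Jamiołkowski isomorphism.

```python
import numpy as np

d, theta = 2, 0.7
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]], dtype=complex)  # rotation
V = np.diag([1.0, 1j])                                          # phase gate

# Hilbert-Schmidt orthonormal basis of matrix units |a><b|
basis = []
for a in range(d):
    for b in range(d):
        E = np.zeros((d, d), dtype=complex)
        E[a, b] = 1
        basis.append(E)

def chan_U(A):
    return U @ A @ U.conj().T

def chan_V(A):
    return V @ A @ V.conj().T

# (5.17): the superoperator trace of the unitary channel U(.)U^dag is |Tr U|^2
tr_super = sum(np.trace(E.conj().T @ chan_U(E)) for E in basis)
assert np.isclose(tr_super, abs(np.trace(U)) ** 2)

def choi(ch):
    """Choi matrix C(X) = sum_ab X(|a><b|) (x) |a><b|."""
    return sum(np.kron(ch(E), E) for E in basis)

# (5.18): the inner product is preserved by the Choi-Jamiolkowski isomorphism
hs_super = sum(np.trace(chan_U(E).conj().T @ chan_V(E)) for E in basis)
hs_choi = np.trace(choi(chan_U).conj().T @ choi(chan_V))
assert np.isclose(hs_super, hs_choi)
```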
For H = K one typically uses bases with E0 ∝ 1. Moreover, it is common to use a different normalization convention. For qubits, i.e. d = 2^n, it is common to use Pauli strings 1 = P0 , P1 , . . . , P_{d²−1} with kPi kop = 1 for all i = 0, . . . , d² − 1. Then X ∈ L(Cd) can be written as

X(ρ) = Σ_{i,j=0}^{d²−1} χX_{i,j} Pi ρ Pj   (5.22)

in terms of an argument ρ ∈ L(Cd ). This representation is called the χ process matrix


representation or just χ matrix of X . Taking the normalization into account, the χ
matrix elements are given as

χX_{i,j} = (1/d²) ⟨Pi| C(X) |Pj⟩ ,   (5.23)

where |Pi⟩ ∈ C^{d²} denotes the vectorization of Pi.

Exercise 5.4 (CPT conditions and χ-matrix):

Complement the list in Theorem 5.2 by expressing the CPT conditions in terms
of the χ process matrix (5.23).

5.5. The diamond norm


The diamond norm is a norm on maps that quantifies distances of quantum channels
in an operationally meaningful way.
We start with defining the (1 → 1)-norm on L(H, K) to be the operator norm
induced by the trace norm, i.e., by

kX k1→1 := sup_{kAk1 ≤ 1} kX(A)k1 .   (5.24)

Note that if X is Hermiticity-preserving then the supremum is attained for a Hermitian

operator, since X(½(A + A†)) = ½(X(A) + X(A)†) for such maps. Moreover, due to convexity, we have for any X ∈ L(H, K)

kX k1→1 = sup{ kX(|ψ⟩⟨φ|)k1 : k|ψ⟩kℓ2 = k|φ⟩kℓ2 = 1 } ,   (5.25)

where one can take |ψ i = |φ i if X is Hermiticity-preserving. This means that the


supremum is attained for rank-1 operators. As density operators are normalized in
trace norm this implies that channels are normalized in (1 → 1)-norm, i.e.,

kX k1→1 = 1 ∀X ∈ CPT(H, K) . (5.26)

In order to distinguish quantum channels one can use ancillary systems. This
motivates the definition of the diamond norm as a so-called CB-completion of the
(1 → 1)-norm, which is justified by Theorem 5.3 below. To begin with, we define
diamond norm by
kX k⋄ := kX ⊗ idH k1→1 .   (5.27)

Note that the diamond norm inherits the above-mentioned properties from the (1 → 1)-norm.
Theorem 5.3 (Complete boundedness and (sub)multiplicativity):

For any X ∈ L(H, K)

kX k⋄ = sup_{H′} kX ⊗ idH′ k1→1 ,   (5.28)

where the supremum is taken over all finite dimensional Hilbert spaces H′. Moreover,

kX ⊗ Yk⋄ = kX k⋄ kYk⋄   (5.29)
kX Zk⋄ ≤ kX k⋄ kZk⋄   (5.30)

for all X ∈ L(H, K), Y ∈ L(H′, K′) and Z ∈ L(H′, H).

Proof. For the proof we refer e.g. to [17, Chapter 3.3] or recommend to prove it as an
exercise.
The theorem tells us that the diamond norm is the maximum distinguishability of
quantum channels in the following sense. Let Φ = X − Y with X , Y ∈ CPT(H, K)
be the difference of two quantum channels. One can prepare copies of a state ρ ∈ S(H ⊗ H′) and apply either X or Y to the parts on H to obtain states on K ⊗ H′. Then Proposition 4.8 tells us that ½ kΦ ⊗ idH′ (ρ)k1 is the distinguishability of the output states. Taking the supremum over all (pure) states ρ yields the distinguishability of X and Y, which is given by the diamond norm distance ½ kX − Yk⋄. In particular, the theorem tells us that optimal distinguishability can be obtained by choosing H′ = H, in a similar sense as it can be detected that a map is not CP just using H′ = H.
Another way to distinguish quantum channels is to prepare their Choi states and distinguish them, as characterized by Proposition 4.8 via the trace norm. The following
statements provides a relation of the two notions of distinguishability of quantum
channels.
Proposition 5.4 (Diamond norm and trace norm):

Let X ∈ L(H, K) and d := dim(H). Then

kJ(X)k1 ≤ kX k⋄ ≤ d kJ(X)k1 ,   (5.31)

where J denotes the Choi-Jamiołkowski isomorphism (5.13).

Remark: The upper bound can be improved. For a Hermiticity-preserving map X ∈ L(H, K) the improved bound implies [97, Corollary 2]

kX k⋄ ≤ dim(H) kTr2 [ |J(X)| ]k∞ .   (5.32)

Proof of Proposition 5.4. We prove the proposition in terms of C(X ) = dim(H) J(X ).
It holds that

kX k⋄ = sup{ k(1 ⊗ A) C(X)(1 ⊗ B)k1 : kAkF = kBkF = 1 } ,   (5.33)

as can be seen from (5.25) and using tensor network diagrams. Choosing A = B = (1/√dim(H)) 1 (corresponding to the maximally entangled state (5.14)) establishes the lower bound. The upper bound follows using Hölder’s inequality,

k(1 ⊗ A) C(X)(1 ⊗ B)k1 ≤ k1 ⊗ Akop kC(X)k1 k1 ⊗ Bkop = kAkop kC(X)k1 kBkop ≤ kAkF kBkF kC(X)k1 .   (5.34)

Exercise 5.5 (The diamond norm/trace norm inequalities are tight):

Show that the bounds in Proposition 5.4 are tight, i.e., that there are X , Y ∈
L(H, K) so that kJ(X)k1 = kX k⋄ and kYk⋄ = dim(H) kJ(Y)k1 .

These results tell us that distinguishing quantum channels via their Choi states is in general not optimal.
It is non-obvious how the diamond norm can actually be computed in practice. Watrous has shown that the diamond norm can be computed efficiently via an SDP
[98]. However, for the relevant case where the map is a difference of two unitary
channels the computation is much simpler:
Proposition 5.5 (Diamond norm distance of unitary channels):

For any U, V ∈ U(d) the diamond norm distance of the corresponding unitary channels is
$$\frac{1}{2}\,\|\mathcal{U} - \mathcal{V}\|_\diamond = \sqrt{1 - \operatorname{dist}\bigl(0, \operatorname{conv}\{\lambda_i\}_{i \in [d]}\bigr)^2} \,, \tag{5.35}$$
where λ_i are the eigenvalues of U†V and dist(·,·) denotes the Euclidean distance and conv(·) the convex hull, both in the complex plane.

This proposition reflects that the diamond norm distance is a worst-case quantity, where here the worst-case optimization is done over the spectrum of the unitary "difference" U†V.

Proof of Proposition 5.5. Starting with (5.25) and, e.g., by using tensor network diagrams or using the Choi-Jamiołkowski isomorphism, (5.10), and the vectorization rules for matrix products (2.4), we can write the diamond norm of the channel difference as
$$\begin{aligned}
\|\mathcal{U} - \mathcal{V}\|_\diamond
&= \max\bigl\{ \|(\mathbb{1} \otimes A)\bigl(|U\rangle\langle U| - |V\rangle\langle V|\bigr)(\mathbb{1} \otimes A)^\dagger\|_1 : \|A\|_2 = 1 \bigr\} \\
&= \max\bigl\{ \bigl\|\, |AU\rangle\langle AU| - |AV\rangle\langle AV| \,\bigr\|_1 : \|A\|_2 = 1 \bigr\} \\
&= \max\bigl\{ \bigl\|\, |A\rangle\langle A| - |A\, U^\dagger V\rangle\langle A\, U^\dagger V| \,\bigr\|_1 : \|A\|_2 = 1 \bigr\} \,.
\end{aligned} \tag{5.36}$$
According to Watrous' lecture notes [9, Example 2.3], normalized vectors $|\psi\rangle, |\phi\rangle \in S^{d-1} \subset \mathbb{C}^d$ satisfy
$$\bigl\|\, |\psi\rangle\langle\psi| - |\phi\rangle\langle\phi| \,\bigr\|_p = 2^{1/p}\, \sqrt{1 - |\langle\psi|\phi\rangle|^2} \,. \tag{5.37}$$
This yields
$$\begin{aligned}
\frac{1}{2}\,\|\mathcal{U} - \mathcal{V}\|_\diamond
&= \max\Bigl\{ \sqrt{1 - |\langle A | A\, U^\dagger V\rangle|^2} : \|A\|_2 = 1 \Bigr\} \\
&= \max\Bigl\{ \sqrt{1 - \bigl|\operatorname{Tr}[A^\dagger A\, U^\dagger V]\bigr|^2} : \|A\|_2 = 1 \Bigr\} \\
&= \max\Bigl\{ \sqrt{1 - \bigl|\operatorname{Tr}[\rho\, U^\dagger V]\bigr|^2} : \rho \in S(\mathbb{C}^d) \Bigr\} \\
&= \sqrt{1 - \min_{\rho \in S(\mathbb{C}^d)} \bigl|\operatorname{Tr}[\rho\, U^\dagger V]\bigr|^2} \\
&= \sqrt{1 - \min\Bigl\{ \bigl|\textstyle\sum_i p_i \lambda_i\bigr|^2 : p \in [0,1]^d,\ \textstyle\sum_i p_i = 1 \Bigr\}} \\
&= \sqrt{1 - \operatorname{dist}\bigl(0, \operatorname{conv}\{\lambda_i\}\bigr)^2} \,.
\end{aligned} \tag{5.38}$$
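Proposition 5.5 turns the diamond norm distance of two unitary channels into a small planar geometry problem: the distance from the origin to the convex hull of the eigenvalues of U†V, which all lie on the unit circle. A minimal sketch (assuming NumPy; the function names and the segment-distance helper are our own choices):

```python
import numpy as np

def _seg_dist(a, b):
    """Distance from 0 to the segment [a, b] in the complex plane (helper)."""
    if abs(b - a) < 1e-15:
        return abs(a)
    t = np.clip(-np.real(a.conjugate() * (b - a)) / abs(b - a) ** 2, 0.0, 1.0)
    return abs(a + t * (b - a))

def half_diamond_dist(U, V):
    """(1/2)||U - V||_diamond for unitary channels, via (5.35)."""
    lam = np.linalg.eigvals(U.conj().T @ V)   # eigenvalues of U†V, on the circle
    ang = np.sort(np.angle(lam))
    gaps = np.diff(np.append(ang, ang[0] + 2 * np.pi))
    if gaps.max() <= np.pi + 1e-12:
        dist = 0.0                            # 0 lies inside conv{λ_i}
    else:
        # hull edges connect angularly consecutive eigenvalues
        pts = np.exp(1j * ang)
        dist = min(_seg_dist(pts[k], pts[(k + 1) % len(pts)])
                   for k in range(len(pts)))
    return np.sqrt(max(0.0, 1.0 - dist ** 2))
```

For example, for U = 1 and V = diag(1, e^{iπ/2}) this gives sin(π/4) ≈ 0.707, while for V = diag(1, −1) the hull of the eigenvalues contains the origin and the distance attains its maximum, ½‖·‖◇ = 1.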

[The required further material on representation theory (Section 2.2) was also covered in Lecture 16, including a characterization of irreps of abelian groups (Corollary 2.4), an important extension of Schur's lemma (Theorem 2.5), some more details on Schur-Weyl duality (Theorem 2.6), and a characterization of certain invariant operators (Proposition 2.7).]

5.6. Unitary k-designs

Unitary k-designs are distributions of unitaries that match the first k moments of the Haar measure on the unitary group. In this sense, they are defined similarly to the projective k-designs from Section 3.4, and similar results also hold here.
Definition 5.6 (Unitary k-design):

The k-th moment operator $M_\mu^{(k)} \in \mathrm{L}(\mathbb{C}^d)^{\otimes k}$ of a distribution µ on U(d) is defined by
$$M_\mu^{(k)}(X) := \int_{\mathrm{U}(d)} U^{\otimes k}\, X\, U^{\otimes k\,\dagger}\, \mathrm{d}\mu(U) \tag{5.39}$$
and we write $M^{(k)}$ for the k-th moment operator of the Haar measure. Then a distribution µ on the unitary group is a unitary k-design if $M_\mu^{(k)} = M^{(k)}$. A subset $\{U_1, \dots, U_{n_G}\} \subset \mathrm{U}(d)$ is called a unitary k-design if its uniform distribution is one.

Example 5.7 (Clifford group):

The n-qubit Clifford group $\mathrm{Cl}_n \subset \mathrm{U}(2^n)$ is the normalizer of the Pauli group $P_n$ (see Section 3.4.1),
$$\mathrm{Cl}_n := \{ U \in \mathrm{U}(2^n) : U P_n U^\dagger \subset P_n \} \,. \tag{5.40}$$
The Clifford group is a unitary 3-design but not a unitary 4-design [33, 34, 58].

The k-th moment operator of the Haar measure can be calculated using representation theory. The following identity can be seen as a generalization of Proposition 2.7, since $M^{(k)}(A)$ commutes with the representations (2.24) of U(d) and (2.23) of $\operatorname{Sym}_k$ (see e.g. [99, integration formula section]):
$$M^{(k)}(A) = \frac{1}{k!} \sum_{\sigma \in \operatorname{Sym}_k} \sum_{\lambda \vdash k} \frac{d_\lambda}{D_\lambda} \operatorname{Tr}[A\sigma]\, \sigma^{-1} P_\lambda \,, \tag{5.41}$$
where $\operatorname{Sym}_k$ is the symmetric group on k elements, $d_\lambda = \dim(S_\lambda)$ and $D_\lambda = \dim(W_\lambda)$ are the dimensions of the Specht modules $S_\lambda$ and the Weyl modules $W_\lambda$ (in the Schur-Weyl decomposition (2.29)), respectively, σ acts on $(\mathbb{C}^d)^{\otimes k}$ by permuting the tensor factors (we have omitted the $\pi_k$ of the representation (2.23)), and λ ⊢ k denotes the integer partitions of k.
The (k-th) frame potential of µ on U(d) is defined to be
$$F_\mu^{(k)} := \mathbb{E}_{U,V \sim \mu}\, \bigl|\operatorname{Tr}[U^\dagger V]\bigr|^{2k} \tag{5.42}$$
and we again drop the subscript µ if µ is the Haar measure. For k ≤ d one can show that
$$F^{(k)} = k! \,. \tag{5.43}$$
Then it holds that [100, Theorem 5.4] $F_\mu^{(k)} \ge F^{(k)}$ for any finite distribution µ on U(d) and [101, Eq. (47)]
$$\bigl\| M_\mu^{(k)} - M^{(k)} \bigr\|_F^2 = F_\mu^{(k)} - F^{(k)} \,. \tag{5.44}$$
As an important example, for k = 2 the only irreps are given by λ = (2) and λ = (1,1). It holds that $d_{(2)} = 1 = d_{(1,1)}$. Thanks to $P_{(2)} = \frac{1}{2}(\mathbb{1} + F)$ and $P_{(1,1)} = \frac{1}{2}(\mathbb{1} - F)$ it turns out that $D_{(2)} = \frac{d(d+1)}{2}$ and $D_{(1,1)} = \frac{d(d-1)}{2}$, where F denotes again the flip operator.
A straightforward simplification of (5.41) yields
$$M^{(2)}(A) = \frac{\operatorname{Tr}[A]}{(d-1)(d+1)}\, \mathbb{1} - \frac{\operatorname{Tr}[A]}{(d-1)d(d+1)}\, F + \frac{\operatorname{Tr}[AF]}{(d-1)(d+1)}\, F - \frac{\operatorname{Tr}[AF]}{(d-1)d(d+1)}\, \mathbb{1} \tag{5.45}$$
for d ≥ 2. Note that this formula for the second moment operator is consistent with Proposition 2.7. Indeed, $M^{(2)}(A)$ satisfies the invariance condition of this statement and can be written as a linear combination of $P_{\mathrm{sym}^2} = P_{(2)}$ and $P_{\wedge^2} = P_{(1,1)}$.
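Formula (5.45) can be checked against the two properties that determine it: the Haar average preserves the overlaps with 1 and F, and its output lies in the commutant of U ⊗ U. A minimal numerical check (assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3

# flip (swap) operator F on C^d ⊗ C^d
F = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        F[i * d + j, j * d + i] = 1.0

def moment2(A):
    """Second moment operator M^(2)(A) via the explicit formula (5.45)."""
    trA, trAF = np.trace(A), np.trace(A @ F)
    c = (d - 1) * (d + 1)
    I = np.eye(d * d)
    return (trA / c) * I - (trA / (c * d)) * F \
         + (trAF / c) * F - (trAF / (c * d)) * I

A = rng.standard_normal((d * d, d * d)) + 1j * rng.standard_normal((d * d, d * d))
M = moment2(A)

# the Haar average preserves Tr[.] and Tr[. F] ...
assert np.isclose(np.trace(M), np.trace(A))
assert np.isclose(np.trace(M @ F), np.trace(A @ F))

# ... and lies in the commutant of U ⊗ U (probed with a random unitary via QR)
U, _ = np.linalg.qr(rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d)))
UU = np.kron(U, U)
assert np.allclose(UU @ M, M @ UU)
```

The two trace conditions together with the commutant property single out (5.45), since the commutant of U ⊗ U is spanned by 1 and F.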

6. Randomized benchmarking

Randomized benchmarking (RB) can be used to measure in practice the average error rate of targeted quantum channels. It does not quantify the operationally best motivated error measure, the diamond norm distance, but it can be measured in a comparatively cheap way that is robust against state preparation and measurement (SPAM) errors. The original version of RB [32, 102–105] quantifies the average error of Clifford gates. With interleaved randomized benchmarking [106, 107] one can measure the average gate fidelity of a single Clifford gate in a similar way. There are several extensions [108–116] of these basic setups.

6.1. The average gate fidelity

The average error quantified in RB is given via the average fidelity of the output of a channel. We define the average gate fidelity (AGF) between maps X, Y ∈ L(C^d) to be
$$F_{\mathrm{avg}}(X, Y) := \int \mathrm{d}\psi\, \bigl\langle X(|\psi\rangle\langle\psi|),\, Y(|\psi\rangle\langle\psi|) \bigr\rangle \,, \tag{6.1}$$
where the integral is taken according to the uniform, Haar-invariant probability measure on state vectors. So, the average gate fidelity $F_{\mathrm{avg}}(X, Y)$ is a measure of closeness of X and Y.
Let us list some properties of the average gate fidelity.

• For any X, Y
$$F_{\mathrm{avg}}(X, Y) = F_{\mathrm{avg}}(Y^\dagger X, \mathrm{id}) \,. \tag{6.2}$$
This motivates the definition
$$F_{\mathrm{avg}}(X) := F_{\mathrm{avg}}(X, \mathrm{id}) \,. \tag{6.3}$$

• The average gate fidelity is conjugate symmetric, $F_{\mathrm{avg}}(X, Y) = F_{\mathrm{avg}}(Y, X)^*$.

• When X and Y are Hermiticity-preserving then their average gate fidelity is real, $F_{\mathrm{avg}}(X, Y) \in \mathbb{R}$, and hence symmetric,
$$F_{\mathrm{avg}}(X, Y) = F_{\mathrm{avg}}(Y, X) \,. \tag{6.4}$$

• The distance measure corresponding to the AGF,
$$r(X, Y) := 1 - F_{\mathrm{avg}}(X, Y) \,, \tag{6.5}$$
is called the average error rate and inherits these symmetry properties. We set $r(X) := 1 - F_{\mathrm{avg}}(X)$ for X ∈ L(H).
The average gate fidelity is related to the diamond norm as follows.

Proposition 6.1 (AGF and diamond norm [109, Proposition 9]):

For any X ∈ CPT(C^d)
$$\frac{d+1}{d}\,\bigl(1 - F_{\mathrm{avg}}(X)\bigr) \le \frac{1}{2}\,\|\mathrm{id} - X\|_\diamond \le \sqrt{d(d+1)}\, \sqrt{1 - F_{\mathrm{avg}}(X)} \,. \tag{6.6}$$

Proof. The proof follows from Proposition 5.4 and the identity [117, 118]
$$(d+1)\, F_{\mathrm{avg}}(X) = d\, F(\phi^+, J(X)) + 1 \tag{6.7}$$
with $\phi^+$ being the maximally entangled state (5.14).

A refinement of these bounds taking the so-called unitarity into account has been derived by Kueng et al. [119].
With the next theorem we will derive identities that are crucial for RB.

Theorem 6.2 (Twirling of channels [102, 118]):

Let X ∈ L(C^d) be trace-preserving and µ be a unitary 2-design. Then
$$\mathbb{E}_{U \sim \mu}\bigl[\, U\, X(U^\dagger \rho\, U)\, U^\dagger \,\bigr] = D_p(\rho) \,, \tag{6.8}$$
where $D_p$ denotes the depolarizing channel $D_p(\rho) = p\,\rho + (1-p)\operatorname{Tr}[\rho]\,\mathbb{1}/d$ and
$$p = \frac{d\, F_{\mathrm{avg}}(X) - 1}{d - 1} \tag{6.9}$$
$$\hphantom{p} = \frac{\operatorname{Tr}[X] - 1}{d^2 - 1} \,. \tag{6.10}$$

We note that the (0,0) component of the χ process matrix (5.23) of X is
$$\chi_{0,0} = \frac{1}{d^2}\operatorname{Tr}[X] \tag{6.11}$$
and that Tr[X] is real if X is Hermiticity-preserving. Often one considers only qubits, and then (6.10) is sometimes stated in terms of $\chi_{0,0}$.
Proof. The map tw : L(C^d) → L(C^d) given by
$$\mathrm{tw}(X) := \mathbb{E}_{U \sim \mu}\bigl[\, U\, X(U^\dagger (\,\cdot\,)\, U)\, U^\dagger \,\bigr] \tag{6.12}$$
is isomorphic to the second moment operator (5.39),
$$M_\mu^{(2)}(A) = \mathbb{E}_{U \sim \mu}\bigl[ (U \otimes U)\, A\, (U \otimes U)^\dagger \bigr] \,, \tag{6.13}$$
since in both cases an order-4 tensor is multiplied by $U \otimes U \otimes U^\dagger \otimes U^\dagger$ with properly matched indices. Hence, it is sufficient to prove the statement for the case where µ is the Haar measure.
It is not difficult to see that $(U \otimes U)\, M^{(2)}(A) = M^{(2)}(A)\, (U \otimes U)$ for all A ∈ L(C^d ⊗ C^d) and U ∈ U(d). Hence, Proposition 2.7 implies that the range of $M^{(2)}$ is two-dimensional and, hence,
$$\dim(\operatorname{ran}(\mathrm{tw})) = 2 \,. \tag{6.14}$$
We note that $\mathrm{tw}(\mathrm{id}) = \mathrm{id}$ and $\mathrm{tw}\bigl(\frac{\mathbb{1}}{d}\operatorname{Tr}[\,\cdot\,]\bigr) = \frac{\mathbb{1}}{d}\operatorname{Tr}[\,\cdot\,]$. Hence, id and $\frac{\mathbb{1}}{d}\operatorname{Tr}[\,\cdot\,]$ are in the range of tw. Also note that tw(X) is trace-preserving and, hence, tw(X) must be an affine combination of these two maps. This proves that (6.8) holds for some p ∈ C.
In order to derive (6.9) we observe that
$$F_{\mathrm{avg}}(X) = F_{\mathrm{avg}}(\mathrm{tw}(X)) \,. \tag{6.15}$$
Hence, we only need to calculate the AGF of $D_p$,
$$F_{\mathrm{avg}}(X) = F_{\mathrm{avg}}(D_p) = p + (1 - p)\frac{1}{d} \,, \tag{6.16}$$
which implies (6.9).
A similar argument yields (6.10): we take the trace of (6.8) to obtain
$$\operatorname{Tr}[X] = \operatorname{Tr}[\mathrm{tw}(X)] = \operatorname{Tr}[D_p] = p\, d^2 + (1 - p) \,, \tag{6.17}$$
which is equivalent to (6.10).

We note that (6.10) is equivalent to
$$F_{\mathrm{avg}}(X) = \frac{\operatorname{Tr}[X] - 1}{d(d+1)} + \frac{1}{d} \tag{6.18}$$
and
$$\operatorname{Tr}[X] = d(d+1)\, F_{\mathrm{avg}}(X) - d \,, \tag{6.19}$$
where X ∈ L(C^d) was assumed to be trace-preserving in Theorem 6.2.
This implies that the average gate fidelity can be connected to the canonical inner product on L(C^d) as [117, 118] (see also [119])
$$\langle Y, X \rangle = \operatorname{Tr}[Y^\dagger X] = d(d+1)\, F_{\mathrm{avg}}(X, Y) - \langle X(\mathbb{1}), Y(\mathbb{1}) \rangle \tag{6.20}$$
for any X, Y ∈ L(C^d). Indeed, if $Y^\dagger X$ is trace-preserving then (6.20) simplifies to (6.19). Note that this identity also connects the average gate fidelity to the Frobenius norm, meaning that the Frobenius norm is an average-case error measure as well.
Also note that for a unitary channel $\mathcal{U} \in \mathrm{CPT}(\mathbb{C}^d)$ with U ∈ U(d)
$$F_{\mathrm{avg}}(\mathcal{U}) = \frac{\operatorname{Tr}[\mathcal{U}] - 1}{d(d+1)} + \frac{1}{d} = \frac{|\operatorname{Tr}[U]|^2 - 1}{d(d+1)} + \frac{1}{d} \,. \tag{6.21}$$
This equality reflects that the average gate fidelity measures how close U is to 1 on average, where the average is taken over its spectrum.
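Identity (6.21) is easy to check numerically: Haar random pure states can be sampled as normalized complex Gaussian vectors, and the average of |⟨ψ|U|ψ⟩|² must reproduce the closed form. A sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3

# a Haar random unitary via QR of a complex Gaussian matrix
U, _ = np.linalg.qr(rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d)))

# closed form (6.21)
exact = (abs(np.trace(U)) ** 2 - 1) / (d * (d + 1)) + 1 / d

# Monte Carlo: F_avg(U) = E_psi |<psi|U|psi>|^2 over Haar random pure states
N = 200_000
psi = rng.standard_normal((N, d)) + 1j * rng.standard_normal((N, d))
psi /= np.linalg.norm(psi, axis=1, keepdims=True)
estimate = np.mean(np.abs(np.einsum('ni,ij,nj->n', psi.conj(), U, psi)) ** 2)

assert abs(estimate - exact) < 0.02  # Monte Carlo error ~ N^{-1/2}
```

The sample size is chosen so that the statistical error stays well below the asserted tolerance.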

6.2. The standard RB protocol

Standard RB [105, 120] aims to provide an estimate of the AGF averaged over a gate set $\mathrm{G} = \{G_1, \dots, G_{n_G}\} \subset \mathrm{U}(d)$ that is a subgroup and a unitary 2-design. Usually, G is taken to be the Clifford group.
The quantity we would like to estimate in standard RB is
$$\bar F := \frac{1}{n_G} \sum_{G \in \mathrm{G}} F_{\mathrm{avg}}(\tilde{\mathcal{G}}, \mathcal{G}) \,, \tag{6.22}$$
where $\tilde{\mathcal{G}}$ denotes the implementation of the gate G. In the analysis of the following protocol we will see that RB indeed provides a consistent estimate of this quantity in the case of gate-independent noise. For an extension to gate-dependent noise, see the initial work by Magesan et al. [105] for a first perturbative analysis and the work by Wallman [121] for a more recent and more rigorous analysis.
Protocol 6.3 (Standard RB):

Let $\mathrm{G} = \{G_1, \dots, G_{n_G}\} \subset \mathrm{U}(d)$ be a subgroup. Moreover, let ρ ∈ S(C^d) be the initial state and M = {M, 1 − M} ⊂ Pos(C^d) be the measurement (usually, ρ = |0⟩⟨0| = M). For any sequence $s \in [n_G]^m$ set $\mathcal{G}(s) := \mathcal{G}_{s_m} \cdots \mathcal{G}_{s_1}$ to be the gate sequence and
$$F_{m,s} := \operatorname{Tr}\bigl[ M\, \mathcal{G}(s)^{-1} \mathcal{G}(s)(\rho) \bigr] \tag{6.23}$$
to be the sequence fidelity.
Then the standard RB protocol consists of the following steps.

• Draw a sequence $s \in [n_G]^m$ uniformly at random, which we denote by $s \sim [n_G]^m$.

• Implement the gate sequence $\mathcal{S}(s) := \mathcal{G}_{s_{m+1}} \mathcal{G}_{s_m} \cdots \mathcal{G}_{s_1}$, where the last gate $\mathcal{G}_{s_{m+1}} := \mathcal{G}(s)^{-1}$ is the inverse of the gate $\mathcal{G}(s) \in \mathrm{G}$.

• Obtain an estimate $\hat F_{m,s}$ of $F_{m,s}$ by measuring $\mathcal{S}(s)(\rho)$ with the measurement M a number $n_s$ of times.

• For each m = 1, ..., m_max repeat this estimation for sequences $s^{(1)}, \dots, s^{(n_m)} \in [n_G]^m$ and set $\hat F_m$ to be the corresponding empirical estimate of the average sequence fidelity
$$\bar F_m := \mathbb{E}_{s \sim [n_G]^m}\bigl[ F_{m,s} \bigr] \,. \tag{6.24}$$

• Fit parameters A, B, p ∈ R to the model
$$\bar F_m = A\, p^m + B \tag{6.25}$$
to obtain estimates $\hat p$, $\hat A$, $\hat B$.

• Obtain an estimate $\hat{\bar F}$ of the AGF (6.22) via
$$p = \frac{d\, \bar F - 1}{d - 1} \tag{6.26}$$
(remember the relation (6.9) between p and $F_{\mathrm{avg}}$).

Analysis for gate-independent noise. We denote the noisy implementations of the initial state ρ, the gates $G_i$, and the measurement M by $\tilde\rho$, $\tilde{\mathcal{G}}_i$, and $\tilde M$, respectively. We restrict our analysis to gate-independent noise, i.e.,
$$\tilde{\mathcal{G}}_i = \Lambda\, \mathcal{G}_i \tag{6.27}$$
for some channel Λ ∈ CPT(C^d). Setting $\mathcal{C}_{s_j} := \mathcal{G}_{s_j} \cdots \mathcal{G}_{s_1}$ for j ∈ [m], this assumption allows us to rewrite the implemented gate sequence as
$$\begin{aligned}
\tilde{\mathcal{S}}(s) &= \Lambda\, \mathcal{G}_{s_{m+1}}\, \Lambda\, \mathcal{G}_{s_m}\, \Lambda\, \mathcal{G}_{s_{m-1}} \cdots \Lambda\, \mathcal{G}_{s_1} \\
&= \Lambda\, \bigl( \mathcal{C}_{s_m}^\dagger \Lambda\, \mathcal{C}_{s_m} \bigr) \bigl( \mathcal{C}_{s_{m-1}}^\dagger \Lambda\, \mathcal{C}_{s_{m-1}} \bigr) \cdots \bigl( \mathcal{C}_{s_1}^\dagger \Lambda\, \mathcal{C}_{s_1} \bigr) \,,
\end{aligned} \tag{6.28}$$
where we have used that $\mathcal{G}_{s_{m+1}} = \mathcal{C}_{s_m}^\dagger$. Since the gates $\mathcal{G}_{s_i}$ are drawn i.i.d. from a unitary 2-design, Theorem 6.2 implies that
$$\mathbb{E}_{s \sim [n_G]^m}\bigl[ \tilde{\mathcal{S}}(s) \bigr] = \Lambda\, \bigl( \mathbb{E}_{C \sim \mathrm{G}}\bigl[ \mathcal{C}^\dagger \Lambda\, \mathcal{C} \bigr] \bigr)^m = \Lambda\, D_p^m \tag{6.29}$$
with p given by (6.9), which matches the estimation (6.26). Moreover, the average sequence fidelity is
$$\begin{aligned}
\bar F_m &= \operatorname{Tr}\bigl[ \tilde M\, \Lambda D_p^m(\tilde\rho) \bigr] \\
&= p^m \operatorname{Tr}[\tilde M \Lambda(\tilde\rho)] + (1 - p^m) \operatorname{Tr}[\tilde M \Lambda(\mathbb{1}/d)] \\
&= p^m \operatorname{Tr}[\tilde M \Lambda(\tilde\rho - \mathbb{1}/d)] + \operatorname{Tr}[\tilde M \Lambda(\mathbb{1}/d)] \,.
\end{aligned} \tag{6.30}$$
This expression matches the fitting model (6.25) with
$$A := \operatorname{Tr}[\tilde M \Lambda(\tilde\rho - \mathbb{1}/d)] \quad\text{and}\quad B := \operatorname{Tr}[\tilde M \Lambda(\mathbb{1}/d)] \tag{6.31}$$
being the so-called SPAM constants.

Note that the resulting estimate of (6.22) is robust against state preparation and measurement (SPAM) errors, which are absorbed in the SPAM constants A and B.
In order to make RB scalable it is important to have an efficiently tractable group structure so that the inverse of the gate sequence can be computed. For the important example of the Clifford group, the Gottesman-Knill theorem allows one to compute the inverse of $\mathcal{G}(s)$ in polynomial time in the number of qubits. Using probabilistic tail bounds (see Section 2.3), one can prove that the estimation of the involved quantities can also be done efficiently, even when using just two different sequence lengths m [122].
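The fitting step of Protocol 6.3 can be simulated end to end. In the sketch below (assuming NumPy), the decay parameter and SPAM constants are made-up illustration values; since the model (6.25) is linear in (A, B) for fixed p, the fit scans p and solves the remaining linear least-squares problem:

```python
import numpy as np

rng = np.random.default_rng(2)

# made-up ground truth for illustration: decay parameter and SPAM constants
p_true, A_true, B_true = 0.97, 0.45, 0.5
ms = np.arange(1, 101)                           # sequence lengths m
F_bar = A_true * p_true ** ms + B_true           # model (6.25)
F_hat = F_bar + rng.normal(0.0, 0.003, ms.size)  # finite-sampling noise

def fit_rb(ms, F):
    """Fit A p^m + B: scan p, solve the linear subproblem in (A, B)."""
    best = (np.inf, None, None, None)
    for p in np.linspace(0.5, 0.999, 2000):
        X = np.column_stack([p ** ms, np.ones(ms.size)])
        coef, *_ = np.linalg.lstsq(X, F, rcond=None)
        r = np.sum((X @ coef - F) ** 2)
        if r < best[0]:
            best = (r, p, coef[0], coef[1])
    return best[1:]

p_fit, A_fit, B_fit = fit_rb(ms, F_hat)
d = 2
F_avg_hat = ((d - 1) * p_fit + 1) / d  # invert (6.26) to estimate the AGF
```

Because the SPAM constants are fitted separately from p, errors in state preparation and measurement do not bias the recovered decay parameter, which is the robustness property discussed above.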

6.3. Interleaved randomized benchmarking


It is also an important task to estimate the quality of the implementation of one
specific gate Gt . Interleaved RB [106] solves this task by providing an estimate of the
average gate fidelity between the targeted gate Gt and its implementation G̃t . This is
achieved in a way that is robust against SPAM errors as well as against errors in the
RB gate sequences used to extract the average gate fidelity.

Let us consider again gate-independent noise Λ ∈ CPT(H) that acts on every gate G ∈ G, i.e., the implemented gates are
$$\tilde{\mathcal{G}} = \Lambda\, \mathcal{G} \,, \tag{6.32}$$
and a noisy target gate $G_t \in \mathrm{G}$ with possibly different noise $\Lambda_t \in \mathrm{CPT}(\mathcal{H})$,
$$\tilde{\mathcal{G}}_t = \Lambda_t\, \mathcal{G}_t \,. \tag{6.33}$$
The idea of interleaved RB is to insert $\tilde{\mathcal{G}}_t$ after every gate in the gate sequence $\mathcal{G}(s)$ in standard RB, so as to interleave the sequence with multiple applications of $\tilde{\mathcal{G}}_t$. However, the gate-independent noise Λ also needs to be estimated.
In more detail, in interleaved RB one estimates $F_{\mathrm{avg}}(\tilde{\mathcal{G}}_t, \mathcal{G}_t)$ by applying standard RB twice, (i) to obtain an estimate of $F_{\mathrm{avg}}(\Lambda)$ and (ii) to obtain an estimate of $F_{\mathrm{avg}}(\tilde{\mathcal{G}}_t \Lambda, \mathcal{G}_t) = F_{\mathrm{avg}}(\mathcal{G}_t^\dagger \tilde{\mathcal{G}}_t \Lambda)$; see Protocol 6.4 for an RB method to achieve (ii).
The idea is that one can extract $F_{\mathrm{avg}}(\tilde{\mathcal{G}}_t, \mathcal{G}_t)$ from $F_{\mathrm{avg}}(\tilde{\mathcal{G}}_t \Lambda, \mathcal{G}_t)$ once the noise strength given by $F_{\mathrm{avg}}(\Lambda)$ is known. In order to extract an estimate of $F_{\mathrm{avg}}(\tilde{\mathcal{G}}_t, \mathcal{G}_t)$ from these two quantities, an approximation of the form
$$F_{\mathrm{avg}}(X\, Y) \approx F_{\mathrm{avg}}(X)\, F_{\mathrm{avg}}(Y) \tag{6.34}$$
is used. Then one obtains the desired average gate fidelity by taking estimates corresponding to
$$F_{\mathrm{avg}}(\tilde{\mathcal{G}}_t, \mathcal{G}_t) = F_{\mathrm{avg}}(\tilde{\mathcal{G}}_t \mathcal{G}_t^\dagger) = F_{\mathrm{avg}}(\mathcal{G}_t^\dagger \tilde{\mathcal{G}}_t) \approx \frac{F_{\mathrm{avg}}(\mathcal{G}_t^\dagger \tilde{\mathcal{G}}_t \Lambda)}{F_{\mathrm{avg}}(\Lambda)} \,. \tag{6.35}$$
Interleaved RB has been improved and simplified by Kimmel et al. [110, Section 6A]. They have found a bound on the approximation error that is tighter than the previous bound [106]. In terms of $\chi_{0,0}$ from (6.11) it reads
$$\bigl| \chi_{0,0}^{XY} - \chi_{0,0}^{X}\, \chi_{0,0}^{Y} \bigr| \le 2 \sqrt{\bigl(1 - \chi_{0,0}^{X}\bigr)\, \chi_{0,0}^{X}\, \bigl(1 - \chi_{0,0}^{Y}\bigr)\, \chi_{0,0}^{Y}} + \bigl(1 - \chi_{0,0}^{X}\bigr)\bigl(1 - \chi_{0,0}^{Y}\bigr) \tag{6.36}$$
for any X, Y ∈ CPT((C²)^⊗n). This bound yields bounds on the error in (6.34) via (6.10).
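The bound (6.36) can be probed with random channels. The sketch below (assuming NumPy) represents channels as superoperators in the column-stacking vectorization, so that χ₀,₀ = Tr[X]/d² per (6.11) and channel composition becomes matrix multiplication:

```python
import numpy as np

rng = np.random.default_rng(5)
d = 2  # one qubit

def random_channel(k):
    """Superoperator of a random channel from a Haar random isometry."""
    A = rng.standard_normal((d * k, d)) + 1j * rng.standard_normal((d * k, d))
    V, _ = np.linalg.qr(A)
    kraus = [V[i * d:(i + 1) * d, :] for i in range(k)]
    # column-stacking convention: vec(K rho K^†) = (conj(K) ⊗ K) vec(rho)
    return sum(np.kron(K.conj(), K) for K in kraus)

X, Y = random_channel(2), random_channel(2)
chi = lambda M: (np.trace(M) / d ** 2).real  # chi_{0,0}, cf. (6.11)

cx, cy, cxy = chi(X), chi(Y), chi(X @ Y)
rhs = 2 * np.sqrt((1 - cx) * cx * (1 - cy) * cy) + (1 - cx) * (1 - cy)
assert abs(cxy - cx * cy) <= rhs + 1e-12
```

Running this for many random channel pairs gives a feel for how tight (6.36) is in typical (as opposed to worst-case) instances.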
Protocol 6.4 (Modified RB):

For a target gate $G_t \in \mathrm{G}$ this protocol is obtained from Protocol 6.3 by replacing the gate sequence $\mathcal{G}(s)$ by
$$\mathcal{G}_{G_t}(s) := \mathcal{G}_t\, \mathcal{G}_{s_m}\, \mathcal{G}_t\, \mathcal{G}_{s_{m-1}} \cdots \mathcal{G}_t\, \mathcal{G}_{s_1} \,. \tag{6.37}$$
Everything is now done w.r.t. this modified gate sequence. For instance, the last gate is $\mathcal{G}_{s_{m+1}} := \mathcal{G}_{G_t}(s)^{-1}$.

Analysis. We assume the noise model given by (6.32) and (6.33) and that G is a unitary 2-design.
It is not difficult to see that the implemented gate sequence can be written as
$$\tilde{\mathcal{S}}(s) = \Lambda\, \bigl( \mathcal{C}_{s_m}^\dagger \Phi\, \mathcal{C}_{s_m} \bigr) \bigl( \mathcal{C}_{s_{m-1}}^\dagger \Phi\, \mathcal{C}_{s_{m-1}} \bigr) \cdots \bigl( \mathcal{C}_{s_1}^\dagger \Phi\, \mathcal{C}_{s_1} \bigr) \tag{6.38}$$
with $\Phi = \mathcal{G}_t^\dagger \tilde{\mathcal{G}}_t \Lambda$ and $\mathcal{C}_{s_i} \sim \mathrm{G}$ i.i.d. Hence, applying the same arguments as in the analysis of the standard RB protocol yields
$$\mathbb{E}_{s \sim [n_G]^m}\bigl[ \tilde{\mathcal{S}}(s) \bigr] = \Lambda\, D_p^m \tag{6.39}$$
with the estimated average gate fidelity $\hat{\bar F}_{\mathrm{G}}$ being an estimate of
$$F_{\mathrm{avg}}(\Phi) = F_{\mathrm{avg}}(\mathcal{G}_t^\dagger \tilde{\mathcal{G}}_t \Lambda) \tag{6.40}$$
as desired.

7. Process tomography

A quantum state can be reconstructed from measurement data using quantum state tomography (Section 3). In a similar way, quantum process tomography can be used to fully reconstruct quantum channels from measurement data. However, more complicated measurement setups are required for process tomography.
In the simplest version, one can use linear inversion [123] to obtain the channel's χ process matrix (5.23). But this can be challenging to implement and leads to a non-optimal sampling complexity (number of invocations of the channel). If the χ matrix is sparse, then compressed sensing 1.0 can be used [124] to dramatically reduce the measurement effort, cf. the reconstruction program (3.65). However, in most situations the χ matrix cannot be expected to be sparse.
Another approach to process tomography is to reduce it to state tomography by preparing the channel's Choi state [125]. But this method requires maximally entangled states, see (5.13), which are often difficult to prepare with high fidelity.
Flammia et al. [55] present a process tomography protocol that is based on low-rank matrix reconstruction with random Pauli measurements. These low-rank recovery guarantees can be applied to the channel's Choi state, since the rank of this matrix representation equals the Kraus rank of the original channel. At first sight, such an approach requires the use of an ancilla in order to implement the Choi state physically in a concrete application, as in the work by Altepeter et al. [125]. However, Ref. [55] also provides a more direct implementation of their protocol that does not require any ancillas. Valid for multi-qubit processes, this trick exploits the tensor-product structure of (multi-qubit) Pauli operators. The drawback of this approach is that the number of individual channel measurements required to evaluate a single Pauli expectation value scales with the dimension of the underlying Hilbert space.
The channel version of the PSD-fit (3.75), later called the CPT-fit [126], has been suggested by Baldwin et al. [53] and numerically compared to full tomography and compressed sensing by Rodionov et al. [127]. First recovery guarantees for the CPT-fit and other compressed sensing recovery guarantees for quantum process tomography were proven in Ref. [126]. Here, the channel's input states are sampled from an (approximate) projective 4-design and the output states are measured in the eigenbases of random observables drawn from an (approximate) unitary 4-design.
Minimizing the diamond norm as a regularizer in compressed sensing has been investigated in Refs. [126, 128]. It is argued that for certain signals, including quantum channels of low Kraus rank [128, 129], this recovery performs at least as well in terms of measurement effort as the conventional trace-norm minimization.
Another approach to quantum process tomography is the use of RB methods, pioneered by Kimmel et al. [110] and explained in more detail in the following sections. The advantage of this approach is some (not yet rigorously quantified) robustness against SPAM errors, which is inherited from randomized benchmarking.

7.1. Randomized benchmarking tomography

What kind of information on a channel X can be extracted from average gate fidelities? One can use interleaved RB to learn about non-Clifford gates. One can see that the deviation of a channel X from being unital is not "seen" by average gate fidelities. But everything else can be reconstructed. More precisely, one can learn the unital part of X, which is given by
$$X_u(\rho) := X\Bigl( \rho - \frac{\mathbb{1}}{d}\operatorname{Tr}[\rho] \Bigr) \,, \tag{7.1}$$
from average gate fidelities [110]. The following result extends and simplifies results by Scott [100] and Kimmel et al. [110].
Theorem 7.1 ([99, Theorem 38]):

Let G ⊂ U(d) be a finite unitary 2-design. The orthogonal projection $P_V : \mathrm{L}(\mathbb{C}^d) \to \mathrm{L}(\mathbb{C}^d)$ onto the linear hull $V \subset \mathrm{L}(\mathbb{C}^d)$ of the unital and trace- and Hermiticity-preserving maps is given by
$$P_V(X) = \frac{1}{|\mathrm{G}|} \sum_{U \in \mathrm{G}} c_U(X)\, \mathcal{U} \tag{7.2}$$
with coefficients
$$c_U(X) = \alpha\, F_{\mathrm{avg}}(\mathcal{U}, X) - \beta \operatorname{Tr}[X(\mathbb{1})] \,, \tag{7.3}$$
where $\alpha = d(d+1)(d^2-1)$ and $\beta = \frac{1}{d}\bigl(\frac{\alpha}{d} - 1\bigr)$.

The average gate fidelities $F_{\mathrm{avg}}(\mathcal{U}, X)$ can be estimated using interleaved RB [110]. Hence, RB methods can be used to tomographically reconstruct the unital part of any quantum channel.
In the case where G is the Clifford group and X is (close to) a unitary channel one can use compressed sensing for the reconstruction [99]. It is sufficient to subsample the U's from the Clifford group to recover the unitary (and hence unital) X from a number of average gate fidelities scaling as Õ(d²). This scaling is clearly optimal. However, it is still non-obvious what the sample complexity in terms of the number of invocations of the channel X is when RB is used [99].

8. Gate set tomography (additional information)

In gate set tomography (GST) [130, 131] one considers an initial state ρ ∈ S(C^d), a finite gate set $\mathrm{G} = \{G_1, \dots, G_{n_G}\} \subset \mathrm{U}(d)$, and a finite POVM $M = \{M_1, \dots, M_m\} \subset \mathrm{Pos}(\mathbb{C}^d)$. Then one reconstructs a description of the triple (M, G, ρ), up to some gauge invariance, from measuring $\mathcal{G}(s)(\rho)$ with M for different gate sequences
$$\mathcal{G}(s) := \mathcal{G}_{s_\ell} \cdots \mathcal{G}_{s_1} \tag{8.1}$$
with $s \in [n_G]^\ell$.
The gauge freedom is given by the linear invertible transformations on (M, G, ρ) that preserve the output distributions
$$p(j|s) := \operatorname{Tr}\bigl[ M_j\, \mathcal{G}(s)(\rho) \bigr] \,. \tag{8.2}$$

The actual reconstruction of a representation of (M, G, ρ) from measurement data is based on several estimation steps applied to the measurement data [132]. Since no prior knowledge on (M, G, ρ) is assumed, the measurement effort is much larger than, e.g., in randomized benchmarking tomography (Section 7.1). Moreover, the involved computations are challenging and do not (yet) come with any theoretical guarantees.
On the upside, however, one can estimate diamond norm errors (Section 5.5) of the implementation of G by optimizing over the gauge freedom. So far, this seems to be the only quantum characterization and verification method that has been used to estimate errors in diamond norm.
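The gauge freedom can be made concrete in a toy linear-algebra model: writing states as vectors, gates as matrices, and POVM effects as row vectors, every invertible T maps (M, G, ρ) to (M T⁻¹, T G T⁻¹, Tρ) without changing any outcome probability (8.2). A sketch (assuming NumPy, with made-up random data):

```python
import numpy as np

rng = np.random.default_rng(3)
D = 4  # superoperator dimension, e.g. d^2 for one qubit

# toy gate set in a vectorized (superoperator) picture:
# state vector |rho>>, gates as D x D matrices, effects as row vectors <<M_j|
rho = rng.standard_normal(D)
gates = [rng.standard_normal((D, D)) for _ in range(3)]
effects = [rng.standard_normal(D) for _ in range(2)]

def prob(j, seq, effects, gates, rho):
    """p(j|s) = <<M_j| G_{s_l} ... G_{s_1} |rho>>, cf. (8.2)."""
    v = rho
    for k in seq:
        v = gates[k] @ v
    return effects[j] @ v

# gauge transformation: rho -> T rho, G -> T G T^{-1}, M_j -> M_j T^{-1}
T = rng.standard_normal((D, D)) + 5 * np.eye(D)  # well-conditioned, invertible
Tinv = np.linalg.inv(T)
rho_g = T @ rho
gates_g = [T @ G @ Tinv for G in gates]
effects_g = [m @ Tinv for m in effects]

seq = [0, 2, 1, 1]
p1 = prob(0, seq, effects, gates, rho)
p2 = prob(0, seq, effects_g, gates_g, rho_g)
assert np.isclose(p1, p2)  # all output distributions are gauge invariant
```

Since the T's cancel telescopically in every sequence, no amount of data of the form (8.2) can distinguish gauge-equivalent gate sets, which is why GST only reconstructs (M, G, ρ) up to this freedom.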

Bibliography
[1] S. Flammia, Characterization of quantum devices, QIP tutorial 2017, Seattle
(2017).
[2] J. Preskill, Quantum supremacy now? (2012).
[3] D. Shepherd and M. J. Bremner, Temporally unstructured quantum computa-
tion, Proc. Roy. Soc. A 465, 1413 (2009), arXiv:0809.0847.

[4] M. J. Bremner, A. Montanaro, and D. J. Shepherd, Average-case complexity


versus approximate simulation of commuting quantum computations, Phys. Rev.
Lett. 117, 080501 (2016), arXiv:1504.07999 [quant-ph].
[5] S. Aaronson and A. Arkhipov, The computational complexity of linear optics,
in STOC’11: Proc. 43rd Ann. ACM Symp. Theor. Comput. (ACM, 2011) pp.
333–342, arXiv:1011.3245 [quant-ph].
[6] S. Boixo, S. V. Isakov, V. N. Smelyanskiy, R. Babbush, N. Ding, Z. Jiang, M. J.
Bremner, J. M. Martinis, and H. Neven, Characterizing quantum supremacy in
near-term devices, Nature Physics 14, 595 (2018), arXiv:1608.00263 [quant-ph].

[7] J. Bermejo-Vega, D. Hangleiter, M. Schwarz, R. Raussendorf, and J. Eisert,


Architectures for quantum simulation showing a quantum speedup, Phys. Rev.
X 8, 021010 (2018), arXiv:1703.00466 [quant-ph].
[8] J. Preskill, Quantum computing in the NISQ era and beyond, Quantum 2 (2018),
10.22331/q-2018-08-06-79, arXiv:1801.00862 [quant-ph].

[9] J. Watrous, Lecture notes, https://cs.uwaterloo.ca/~watrous/LectureNotes.html, [accessed 2019-March-31].
[10] S. Foucart and H. Rauhut, A mathematical introduction to compressive sensing
(Springer, 2013).

[11] R. T. Rockafellar, Convex analysis, 2nd ed. (Princeton university press, 1970).
[12] B. Simon, Representations of finite and compact groups, 10 (Am. Math. Soc.,
1996).
[13] R. Goodman and N. R. Wallach, Representations and invariants of the classical
groups, Vol. 68 (Cambridge University Press, 2000).

[14] M. Grant and S. Boyd, in Recent advances in learning and control, Lecture
Notes in Control and Information Sciences, edited by V. Blondel, S. Boyd, and
H. Kimura (Springer-Verlag Limited, 2008) pp. 95–110, http://stanford.edu/~boyd/graph_dcp.html.

[15] M. Grant and S. Boyd, CVX: Matlab software for disciplined convex programming, version 2.1, http://cvxr.com/cvx (2014).

[16] Python package cvxpy, https://www.cvxpy.org/install/index.html, accessed: 2019-06-10.
[17] J. Watrous, The Theory of Quantum Information (Cambridge University Press,
2018).
[18] H. Häffner, W. Hänsel, C. F. Roos, J. Benhelm, D. Chek-Al-Kar, M. Chwalla,
T. Körber, U. D. Rapol, M. Riebe, P. O. Schmidt, C. Becher, O. Gühne, W. Dür,
and R. Blatt, Scalable multiparticle entanglement of trapped ions, Nature 438,
643 (2005), arXiv:quant-ph/0603217 [quant-ph].

[19] J. Haah, A. W. Harrow, Z. Ji, X. Wu, and N. Yu, Sample-optimal tomography


of quantum states, IEEE Trans. Inf. Theory 63, 5628 (2017), arXiv:1508.01797
[quant-ph].
[20] R. O’Donnell and J. Wright, Efficient quantum tomography, arXiv:1508.01907
[quant-ph].
[21] T. Heinosaari, L. Mazzarella, and M. M. Wolf, Quantum tomography under
prior information, Commun. Math. Phys. 318, 355 (2013), arXiv:1109.5478
[quant-ph].
[22] R. J. Milgram, Immersing projective spaces, Ann. Math. 85, 473 (1967).

[23] K. H. Mayer, Elliptische differentialoperatoren und ganzzahligkeitssätze für


charakteristische zahlen, Topology 4, 295 (1965).
[24] D. Goyeneche, G. Cañas, S. Etcheverry, E. S. Gómez, G. B. Xavier, G. Lima,
and A. Delgado, Five measurement bases determine pure quantum states on any
dimension, Phys. Rev. Lett. 115, 090401 (2015), arXiv:1411.2789 [quant-ph].
[25] J. M. Renes, R. Blume-Kohout, A. J. Scott, and C. M. Caves, Symmetric infor-
mationally complete quantum measurements, J. Math. Phys. 45, 2171 (2004),
quant-ph/0310075.
[26] A. J. Scott, Tight informationally complete quantum measurements, J. Phys. A
Math. Gen. 39, 13507 (2006), quant-ph/0604049.
[27] Z. Fan, A. Heinecke, and Z. Shen, Duality for frames, J. Fourier Anal. Appl.
22, 71 (2016).
[28] R. Kueng and D. Gross, Qubit stabilizer states are complex projective 3-designs,
arXiv:1510.02767 [quant-ph].
[29] A. Ambainis and J. Emerson, Quantum t-designs: t-wise independence in the
quantum world, in Computational Complexity, 2007. CCC ’07. Twenty-Second
Annual IEEE Conference on (2007) pp. 129–140, quant-ph/0701126.
[30] A. Roy and A. J. Scott, Weighted complex projective 2-designs from bases: Opti-
mal state determination by orthogonal measurements, J. Math. Phys. 48, 072110
(2007), arXiv:quant-ph/0703025 [quant-ph].
[31] D. Gross, K. M. R. Audenaert, and J. Eisert, Evenly distributed unitaries:
on the structure of unitary designs, J. Math. Phys. 48, 052104 (2007), quant-
ph/0611002.

[32] C. Dankert, R. Cleve, J. Emerson, and E. Livine, Exact and approximate uni-
tary 2-designs and their application to fidelity estimation, Phys. Rev. A 80,
012304 (2009), arXiv:quant-ph/0606161 [quant-ph].
[33] H. Zhu, Multiqubit clifford groups are unitary 3-designs, Phys. Rev. A 96, 062336
(2017), arXiv:1510.02619 [quant-ph].

[34] Z. Webb, The clifford group forms a unitary 3-design, Quantum Info. Comput.
16, 1379 (2016), arXiv:1510.02769 [quant-ph].
[35] J. Benhelm, G. Kirchmair, U. Rapol, T. Körber, C. F. Roos, and R. Blatt,
Generation of hyperentangled photon pairs, Phys. Rev. A 75, 032506 (2007).
[36] A. Klappenecker and M. Roetteler, Mutually unbiased bases are complex projec-
tive 2-designs, in Proc. IEEE International Symposium on Information Theory,
ISIT, 2005 (IEEE, 2005) pp. 1740–1744, arXiv:quant-ph/0502031 [quant-ph].

[37] M. Guta, J. Kahn, R. Kueng, and J. A. Tropp, Fast state tomography with
optimal error bounds, arXiv:1809.11162 [quant-ph].
[38] G. M. D’Ariano and P. Perinotti, Optimal data processing for quantum measure-
ments, Phys. Rev. Lett. 98, 020403 (2007), arXiv:quant-ph/0610058 [quant-ph].
[39] E. J. Candes and T. Tao, Near-optimal signal recovery from random projec-
tions: Universal encoding strategies? IEEE T Inform Theory 52, 5406 (2006),
arXiv:math/0410542 [math.CA].
[40] E. J. Candes, J. Romberg, and T. Tao, Robust uncertainty principles: exact
signal reconstruction from highly incomplete frequency information, IEEE Trans.
Inform. Theor. 52, 489 (2006), arXiv:math/0409186 [math.NA].

[41] D. L. Donoho, Compressed sensing, IEEE Trans. Inf. Th. 52, 1289 (2006).
[42] B. K. Natarajan, Sparse approximate solutions to linear systems, SIAM J.
Comp. 24, 227 (1995).
[43] D. Ge, X. Jiang, and Y. Ye, A note on the complexity of ℓp minimization, Mathematical Programming 129, 285 (2011).
[44] V. Chandrasekaran, B. Recht, P. Parrilo, and A. Willsky, The convex ge-
ometry of linear inverse problems, Found. Comput. Math. 12, 805 (2012),
arXiv:1012.0621 [math.OC].

[45] D. Amelunxen, M. Lotz, M. B. McCoy, and J. A. Tropp, Living on the edge:


Phase transitions in convex programs with random data, Information and Infer-
ence: A Journal of the IMA 3, 224 (2014), arXiv:1303.6672 [cs.IT].
[46] J. A. Tropp, Convex recovery of a structured signal from independent random
linear measurements, in Sampling Theory, a Renaissance, edited by E. G. Pfan-
der (Springer, 2015) pp. 67–101, arXiv:1405.1102.

[47] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed optimiza-


tion and statistical learning via the alternating direction method of multipliers,
Found. Trends Mach. Learn. 3, 1 (2011).
[48] R. Kueng, H. Rauhut, and U. Terstiege, Low rank matrix recovery from rank
one measurements, Appl. Comp. Harm. Anal. (2015), arXiv:1410.6913 [cs.IT].
[49] R. Kueng, D. Gross, and F. Krahmer, Spherical designs as a tool for deran-
domization: The case of phaselift, in 2015 International Conference on Sampling
Theory and Applications (SampTA) (IEEE, 2015) pp. 192–196.
[50] D. Gross, Y.-K. Liu, S. T. Flammia, S. Becker, and J. Eisert, Quantum
state tomography via compressed sensing, Phys. Rev. Lett. 105, 150401 (2010),
arXiv:0909.3304 [quant-ph].
[51] D. Gross, Recovering low-rank matrices from few coefficients in any basis, IEEE
Trans. Inf. Th. 57, 1548 (2011), arXiv:0910.1879 [cs.IT].

[52] R. Ahlswede and A. Winter, Strong converse for identification via quantum
channels, IEEE Trans. Inform. Theory 48, 569 (2002), arXiv:quant-ph/0012127
[quant-ph].

[53] C. H. Baldwin, A. Kalev, and I. H. Deutsch, Quantum process tomogra-


phy of unitary and near-unitary maps, Phys. Rev. A 90, 012110 (2014),
arXiv:1404.2877.
[54] M. Kabanava, R. Kueng, H. Rauhut, and U. Terstiege, Stable low-rank matrix
recovery via null space properties, Information and Inference: A Journal of the
IMA 5, 405 (2016), arXiv:1507.07184 [cs.IT].
[55] S. T. Flammia, D. Gross, Y.-K. Liu, and J. Eisert, Quantum tomography via
compressed sensing: error bounds, sample complexity and efficient estimators,
New J. Phys. 14, 095022 (2012), arXiv:1205.2300 [quant-ph].

[56] Y.-K. Liu, Universal low-rank matrix recovery from Pauli measurements, Adv.
Neural Inf. Process. Syst. 24, 1638 (2011), arXiv:1103.2816.
[57] R. Kueng, H. Zhu, and D. Gross, Low rank matrix recovery from Clifford orbits, arXiv:1610.08070 [cs.IT].
[58] H. Zhu, R. Kueng, M. Grassl, and D. Gross, The Clifford group fails gracefully
to be a unitary 4-design, arXiv:1609.08172 [quant-ph].
[59] S. Mendelson, Learning without concentration, J. ACM 62, 21:1 (2015),
arXiv:1401.0304 [cs.LG].
[60] T. Sugiyama, P. S. Turner, and M. Murao, Precision-guaranteed quantum to-
mography, Phys. Rev. Lett. 111, 160406 (2013), arXiv:1306.4191 [quant-ph].
[61] J. A. Smolin, J. M. Gambetta, and G. Smith, Efficient method for computing the
maximum-likelihood quantum state from measurements with additive gaussian
noise, Phys. Rev. Lett. 108, 070502 (2012), arXiv:1106.5458 [quant-ph].
[62] J. A. Tropp, User-friendly tail bounds for sums of random matrices, Found.
Comput. Math. 12, 389 (2012), arXiv:1004.4389 [math.PR].
[63] S. T. Flammia and Y.-K. Liu, Direct fidelity estimation from few Pauli mea-
surements, Phys. Rev. Lett. 106, 230501 (2011), arXiv:1104.4695 [quant-ph].
[64] T. M. Cover and J. A. Thomas, Elements of information theory (John Wiley
and Sons, New York, 2012).
[65] Z. Hradil, J. Řeháček, J. Fiurášek, and M. Ježek, Maximum-likelihood methods in quantum mechanics, in Quantum State Estimation, Lecture Notes in Physics No. 649, edited by M. Paris and J. Řeháček (Springer Berlin Heidelberg, 2004) pp. 59–112.
[66] J. Řeháček, Z. Hradil, E. Knill, and A. I. Lvovsky, Diluted maximum-likelihood algorithm for quantum tomography, Phys. Rev. A 75, 042108 (2007), arXiv:quant-ph/0611244 [quant-ph].
[67] J. Shang, Z. Zhang, and H. K. Ng, Superfast maximum-likelihood reconstruction
for quantum tomography, Phys. Rev. A 95, 062336 (2017), arXiv:1609.07881
[quant-ph].
[68] C. Schwemmer, L. Knips, D. Richart, H. Weinfurter, T. Moroder, M. Kleinmann, and O. Gühne, Systematic errors in current quantum state tomography tools, Phys. Rev. Lett. 114, 080403 (2015), arXiv:1310.8465 [quant-ph].
[69] J. Wang, V. B. Scholz, and R. Renner, Confidence polytopes in quantum state tomography, Phys. Rev. Lett. 122, 190401 (2019), arXiv:1808.09988 [quant-ph].
[70] P. Faist and R. Renner, Practical and reliable error bars in quantum tomography,
Phys. Rev. Lett. 117, 010404 (2016), arXiv:1509.06763 [quant-ph].
[71] C. Granade, C. Ferrie, I. Hincks, S. Casagrande, T. Alexander, J. Gross,
M. Kononenko, and Y. Sanders, QInfer: Statistical inference software for quan-
tum applications, Quantum 1, 5 (2017), arXiv:1610.00336 [quant-ph].
[72] C. Granade, C. Ferrie, and S. T. Flammia, Practical adaptive quantum tomog-
raphy, New J. Phys. 19, 113017 (2017), arXiv:1605.05039 [quant-ph].
[73] C. A. Fuchs and J. van de Graaf, Cryptographic distinguishability measures for
quantum mechanical states, IEEE Trans. Inf. Th. 45, 1216 (1999), arXiv:quant-
ph/9712042 [quant-ph].
[74] M. P. da Silva, O. Landon-Cardinal, and D. Poulin, Practical characterization
of quantum devices without tomography, Phys. Rev. Lett. 107, 210404 (2011),
arXiv:1104.3835 [quant-ph].
[75] H. Pashayan, J. J. Wallman, and S. D. Bartlett, Estimating outcome probabilities of quantum circuits using quasiprobabilities, Phys. Rev. Lett. 115, 070501 (2015), arXiv:1503.07525 [quant-ph].
[76] S. Pallister, N. Linden, and A. Montanaro, Optimal verification of entangled states with local measurements, Phys. Rev. Lett. 120, 170502 (2018), arXiv:1709.03353 [quant-ph].
[77] H. Zhu and M. Hayashi, Efficient verification of pure quantum states with ap-
plications to hypergraph states, arXiv:1806.05565 [quant-ph].
[78] Y. Takeuchi and T. Morimae, Verification of many-qubit states, Phys. Rev. X
8, 021060 (2018), arXiv:1709.07575 [quant-ph].
[79] Z. Li, Y.-G. Han, and H. Zhu, Efficient verification of bipartite pure states,
arXiv:1901.09783 [quant-ph].
[80] X.-D. Yu, J. Shang, and O. Gühne, Optimal verification of general bipartite
pure states, arXiv:1901.09856 [quant-ph].
[81] K. Wang and M. Hayashi, Optimal verification of two-qubit pure states,
arXiv:1901.09467 [quant-ph].
[82] M. Cramer, M. B. Plenio, S. T. Flammia, R. Somma, D. Gross, S. D. Bartlett,
O. Landon-Cardinal, D. Poulin, and Y.-K. Liu, Efficient quantum state tomog-
raphy, Nat. Commun. 1, 149 (2010).
[83] D. Hangleiter, M. Kliesch, M. Schwarz, and J. Eisert, Direct certification
of a class of quantum simulations, Quantum Sci. Technol. 2, 015004 (2017),
arXiv:1602.00703 [quant-ph].
[84] L. Aolita, C. Gogolin, M. Kliesch, and J. Eisert, Reliable quantum certification
of photonic state preparations, Nat. Commun. 6, 8498 (2015), arXiv:1407.4817
[quant-ph].
[85] M. Gluza, M. Kliesch, J. Eisert, and L. Aolita, Fidelity witnesses for fermionic
quantum simulations, Phys. Rev. Lett. 120, 190501 (2018), arXiv:1703.03152
[quant-ph].
[86] G. Tóth, W. Wieczorek, D. Gross, R. Krischek, C. Schwemmer, and H. Wein-
furter, Permutationally invariant quantum tomography, Phys. Rev. Lett. 105,
250403 (2010), arXiv:1005.3313 [quant-ph].
[87] T. Moroder, P. Hyllus, G. Tóth, C. Schwemmer, A. Niggebaum, S. Gaile, O. Gühne, and H. Weinfurter, Permutationally invariant state reconstruction, New J. Phys. 14, 105001 (2012), arXiv:1205.4941 [quant-ph].
[88] C. Schwemmer, G. Tóth, A. Niggebaum, T. Moroder, D. Gross, O. Gühne, and H. Weinfurter, Experimental comparison of efficient tomography schemes for a six-qubit state, Phys. Rev. Lett. 113, 040503 (2014), arXiv:1401.7526 [quant-ph].
[89] A. Kalev, A. Kyrillidis, and N. M. Linke, Validating and certifying stabilizer
states, Phys. Rev. A 99, 042337 (2019), arXiv:1808.10786 [quant-ph].
[90] C. Bădescu, R. O'Donnell, and J. Wright, Quantum state certification, arXiv:1708.06002 [quant-ph].
[91] A. Montanaro and R. de Wolf, A survey of quantum property testing, Theory of Computing Graduate Surveys 7, 1 (2016), arXiv:1310.2035 [quant-ph].
[92] U. Schollwöck, The density-matrix renormalization group in the age of matrix product states, Ann. Phys. 326, 96 (2011), arXiv:1008.3477 [cond-mat.str-el].
[93] J. D. Biamonte, S. R. Clark, and D. Jaksch, Categorical tensor network states, AIP Advances 1, 042172 (2011), arXiv:1012.0531 [quant-ph].
[94] C. J. Wood, J. D. Biamonte, and D. G. Cory, Tensor networks and graphical calculus for open quantum systems, Quant. Inf. Comp. 15, 0579 (2015), arXiv:1111.6950 [quant-ph].
[95] A. Jamiołkowski, Linear transformations which preserve trace and positive semidefiniteness of operators, Rep. Math. Phys. 3, 275 (1972).
[96] M.-D. Choi, Completely positive linear maps on complex matrices, Lin. Alg. App. 10, 285 (1975).
[97] I. Nechita, Z. Puchała, L. Pawela, and K. Życzkowski, Almost all quantum channels are equidistant, J. Math. Phys. 59, 052201 (2018), arXiv:1612.00401 [quant-ph].
[98] J. Watrous, Simpler semidefinite programs for completely bounded norms, Chicago J. Theo. Comp. Sci. 2013, 1 (2013), arXiv:1207.5726.
[99] I. Roth, R. Kueng, S. Kimmel, Y.-K. Liu, D. Gross, J. Eisert, and M. Kliesch, Recovering quantum gates from few average gate fidelities, Phys. Rev. Lett. 121, 170502 (2018), arXiv:1803.00572 [quant-ph].
[100] A. J. Scott, Optimizing quantum process tomography with unitary 2-designs, J.
Phys. A 41, 055308 (2008), arXiv:0711.1017 [quant-ph].
[101] N. Hunter-Jones, Unitary designs from statistical mechanics in random quantum
circuits, arXiv:1905.12053 [quant-ph].
[102] J. Emerson, R. Alicki, and K. Życzkowski, Scalable noise estimation with random unitary operators, J. Opt. B 7, S347 (2005), arXiv:quant-ph/0503243.
[103] B. Lévi, C. C. López, J. Emerson, and D. G. Cory, Efficient error characterization in quantum information processing, Phys. Rev. A 75, 022314 (2007), arXiv:quant-ph/0608246 [quant-ph].
[104] E. Knill, D. Leibfried, R. Reichle, J. Britton, R. B. Blakestad, J. D. Jost,
C. Langer, R. Ozeri, S. Seidelin, and D. J. Wineland, Randomized benchmarking
of quantum gates, Phys. Rev. A 77, 012307 (2008), arXiv:0707.0963 [quant-ph].
[105] E. Magesan, J. M. Gambetta, and J. Emerson, Scalable and robust randomized benchmarking of quantum processes, Phys. Rev. Lett. 106, 180504 (2011), arXiv:1009.3639 [quant-ph].
[106] E. Magesan, J. M. Gambetta, B. R. Johnson, C. A. Ryan, J. M. Chow, S. T.
Merkel, M. P. da Silva, G. A. Keefe, M. B. Rothwell, T. A. Ohki, M. B. Ketchen,
and M. Steffen, Efficient measurement of quantum gate error by interleaved ran-
domized benchmarking, Phys. Rev. Lett. 109, 080505 (2012), arXiv:1203.4550
[quant-ph].
[107] J. P. Gaebler, A. M. Meier, T. R. Tan, R. Bowler, Y. Lin, D. Hanneke, J. D. Jost,
J. P. Home, E. Knill, and D. Leibfried, Randomized benchmarking of multiqubit
gates, Phys. Rev. Lett. 108, 260503 (2012), arXiv:1203.3733 [quant-ph].
[108] J. Emerson, M. Silva, O. Moussa, C. Ryan, M. Laforest, J. Baugh, D. G. Cory,
and R. Laflamme, Symmetrized characterization of noisy quantum processes,
Science 317, 1893 (2007).
[109] J. J. Wallman and S. T. Flammia, Randomized benchmarking with confidence,
New J. Phys. 16, 103032 (2014), arXiv:1404.6025 [quant-ph].
[110] S. Kimmel, M. P. da Silva, C. A. Ryan, B. R. Johnson, and T. Ohki, Robust
extraction of tomographic information via randomized benchmarking, Phys. Rev.
X 4, 011050 (2014), arXiv:1306.2348 [quant-ph].
[111] J. Wallman, C. Granade, R. Harper, and S. T. Flammia, Estimating the coher-
ence of noise, New J. Phys. 17, 113020 (2015), arXiv:1503.07865 [quant-ph].
[112] J. J. Wallman, M. Barnhill, and J. Emerson, Robust characterization of loss
rates, Phys. Rev. Lett. 115, 060501 (2015), arXiv:1412.4126.
[113] J. J. Wallman, M. Barnhill, and J. Emerson, Robust characterization of leakage
errors, New J. Phys. 18, 043021 (2016), arXiv:1412.4126 [quant-ph].
[114] A. W. Cross, E. Magesan, L. S. Bishop, J. A. Smolin, and J. M. Gambetta,
Scalable randomised benchmarking of non-clifford gates, npj Quant. Inf. 2, 16012
(2016), arXiv:1510.02720 [quant-ph].
[115] A. Carignan-Dugas, J. J. Wallman, and J. Emerson, Characterizing uni-
versal gate sets via dihedral benchmarking, Phys. Rev. A 92, 060302 (2015),
arXiv:1508.06312 [quant-ph].
[116] E. Onorati, A. H. Werner, and J. Eisert, Randomized benchmarking for indi-
vidual quantum gates, arXiv:1811.11775 [quant-ph].
[117] M. Horodecki, P. Horodecki, and R. Horodecki, General teleportation channel,
singlet fraction, and quasidistillation, Phys. Rev. A 60, 1888 (1999).
[118] M. A. Nielsen, A simple formula for the average gate fidelity of a quantum dynamical operation, Phys. Lett. A 303, 249 (2002), arXiv:quant-ph/0205035 [quant-ph].
[119] R. Kueng, D. M. Long, A. C. Doherty, and S. T. Flammia, Comparing ex-
periments to the fault-tolerance threshold, Phys. Rev. Lett. 117, 170502 (2016),
arXiv:1510.05653 [quant-ph].
[120] E. Magesan, J. M. Gambetta, and J. Emerson, Characterizing quantum gates
via randomized benchmarking, Phys. Rev. A 85, 042311 (2012), arXiv:1109.6887.
[121] J. J. Wallman, Randomized benchmarking with gate-dependent noise, Quantum
2, 47 (2018), arXiv:1703.09835 [quant-ph].
[122] R. Harper, I. Hincks, C. Ferrie, S. T. Flammia, and J. J. Wallman, Statis-
tical analysis of randomized benchmarking, Phys. Rev. A 99, 052350 (2019),
arXiv:1901.00535 [quant-ph].
[123] I. L. Chuang and M. A. Nielsen, Prescription for experimental determination of the dynamics of a quantum black box, J. Mod. Opt. 44, 2455 (1997), arXiv:quant-ph/9610001 [quant-ph].
[124] A. Shabani, R. L. Kosut, M. Mohseni, H. Rabitz, M. A. Broome, M. P. Almeida,
A. Fedrizzi, and A. G. White, Efficient measurement of quantum dynamics
via compressive sensing, Phys. Rev. Lett. 106, 100401 (2011), arXiv:0910.5498
[quant-ph].
[125] J. B. Altepeter, D. Branning, E. Jeffrey, T. C. Wei, P. G. Kwiat, R. T. Thew, J. L. O'Brien, M. A. Nielsen, and A. G. White, Ancilla-assisted quantum process tomography, Phys. Rev. Lett. 90, 193601 (2003), arXiv:quant-ph/0303038 [quant-ph].
[126] M. Kliesch, R. Kueng, J. Eisert, and D. Gross, Guaranteed recovery of quantum processes from few measurements, Quantum 3, 171 (2019), arXiv:1701.03135 [quant-ph].
[127] A. V. Rodionov, A. Veitia, R. Barends, J. Kelly, D. Sank, J. Wenner, J. M. Martinis, R. L. Kosut, and A. N. Korotkov, Compressed sensing quantum process tomography for superconducting quantum gates, Phys. Rev. B 90, 144504 (2014), arXiv:1407.0761 [quant-ph].
[128] M. Kliesch, R. Kueng, J. Eisert, and D. Gross, Improving compressed sensing with the diamond norm, IEEE Trans. Inf. Th. 62, 7445 (2016), arXiv:1511.01513 [cs.IT].
[129] U. Michel, M. Kliesch, R. Kueng, and D. Gross, Note on the saturation of the norm inequalities between diamond and nuclear norm, IEEE Trans. Inf. Th. 64, 7443 (2018), arXiv:1612.07931 [cs.IT].
[130] S. T. Merkel, J. M. Gambetta, J. A. Smolin, S. Poletto, A. D. Córcoles, B. R. Johnson, C. A. Ryan, and M. Steffen, Self-consistent quantum process tomography, Phys. Rev. A 87, 062119 (2013), arXiv:1211.0322 [quant-ph].
[131] R. Blume-Kohout, J. King Gamble, E. Nielsen, J. Mizrahi, J. D. Sterk, and P. Maunz, Robust, self-consistent, closed-form tomography of quantum logic gates on a trapped ion qubit, arXiv:1310.4492 [quant-ph].
[132] R. Blume-Kohout, J. K. Gamble, E. Nielsen, K. Rudinger, J. Mizrahi, K. Fortier,
and P. Maunz, Demonstration of qubit operations below a rigorous fault tol-
erance threshold with gate set tomography, Nat. Comm. 8, 14485 (2017),
arXiv:1605.07674 [quant-ph].
