Lecture 9
Quantum simulations and quantum computing are among the most exciting applications of quantum mechanics. More generally, the field of quantum technology research aims to develop new devices that exploit quantum superposition and entanglement. In popular wording, these anticipated developments will lead to the second quantum revolution.
A main milestone is the use of quantum capabilities to solve a (computational) problem that cannot practically be solved otherwise. Theoretical proposals include integer factoring (Shor's algorithm), speed-ups for optimization and machine learning algorithms, the simulation of complex quantum systems, and certain sampling experiments specifically tailored to that milestone.
But if one cannot obtain the output of a quantum simulation or computation by conventional means, how can one make sure that the outcome is correct? The output of integer factorization can be checked efficiently but, for instance, for the estimation of energies in quantum many-body systems, or for outcomes of dynamical simulations, the situation is much less clear. Hence, for the development of trusted quantum technologies, special characterization and verification techniques are urgently required.
This course gives an introduction to the research field, to the problems of characterization, validation, and verification, and to first ways of solving them. More specifically, quantum state tomography, quantum state certification, quantum process tomography, and randomized benchmarking will be covered. In particular, the course provides an overview of the latest developments in this still young and very active research field. The approaches of the course are mainly of a conceptual and mathematical nature.
Contents
Contents 2
1. Introduction 4
1.1. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2. Preliminaries 6
2.1. Notation and math . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2. Representation theory . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3. Random variables and tail bounds . . . . . . . . . . . . . . . . . . . . 13
2.4. Monte Carlo integration . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4.1. Importance sampling . . . . . . . . . . . . . . . . . . . . . . . . 15
2.5. Basic convex optimization problems . . . . . . . . . . . . . . . . . . . 16
2.5.1. Linear programs (LPs) . . . . . . . . . . . . . . . . . . . . . . . 16
2.5.2. Semidefinite programs (SDPs) . . . . . . . . . . . . . . . . . . 16
2.6. Quantum mechanics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
I. Quantum states 20
3. Quantum state tomography (QST) 20
3.1. Informational completeness of measurements . . . . . . . . . . . . . . 21
3.2. Least squares fitting and linear inversion . . . . . . . . . . . . . . . . . 25
3.3. Frame theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.4. Complex spherical/projective k-designs . . . . . . . . . . . . . . . . . . 28
3.4.1. Examples for 2-designs . . . . . . . . . . . . . . . . . . . . . . . 30
3.5. Symmetric measurements and the depolarizing channel . . . . . . . . . 31
3.5.1. 2-design based POVMs . . . . . . . . . . . . . . . . . . . . . . 32
3.5.2. Pauli measurements . . . . . . . . . . . . . . . . . . . . . . . . 34
3.6. Compressed sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.6.1. Application to quantum state tomography . . . . . . . . . . . . 38
3.7. Projected least squares estimation . . . . . . . . . . . . . . . . . . . . 39
3.7.1. Proof of Theorem 3.21 for 2-design based POVMs . . . . . . . 40
3.8. Lower bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.9. Maximum likelihood estimation . . . . . . . . . . . . . . . . . . . . . . 45
3.10. Confidence regions (additional information) . . . . . . . . . . . . . . . 46
3.11. Other methods (additional information) . . . . . . . . . . . . . . . . . 46
5.6. Unitary k-designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
6. Randomized benchmarking 72
6.1. The average gate fidelity . . . . . . . . . . . . . . . . . . . . . . . . . . 72
6.2. The standard RB protocol . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.3. Interleaved randomized benchmarking . . . . . . . . . . . . . . . . . . 76
7. Process tomography 78
7.1. Randomized benchmarking tomography . . . . . . . . . . . . . . . . . 78
Bibliography 80
1. Introduction
Everything not explicitly covered in the lecture is written in gray and sometimes
indicated as “auxiliary information”.
1.1. Motivation
A central ultimate goal in quantum science is the development of a scalable universal
quantum computer allowing to run quantum algorithms. However, real quantum
systems are difficult to precisely control and subject to unavoidable noise. The noise
prevents the direct implementation of such algorithms. Fortunately, if the noise is
below a certain threshold then quantum error correction can be used in order to still
be able to successfully run quantum algorithms. This desired property is called fault
tolerance.
But still, for universal quantum computation, one needs to reduce the noise below
the fault tolerance threshold and, simultaneously, implement thousands of qubits.
These two seemingly conflicting requirements are expected to prevent the development
of universal quantum computers in the near future.
So for the time being, many researchers are pursuing more modest goals. As a proof of principle, one would like to demonstrate the following.¹
Milestone (quantum supremacy [2])

If the solved problem cannot be solved otherwise, how can one make sure that the outcome is actually correct? And if one has such a device that can, in principle, demonstrate quantum supremacy but one cannot fully check that it functions correctly, can one then rightfully claim the achievement of quantum supremacy?

¹ After some criticism, some researchers have stopped using the term "quantum supremacy". But due to its well-established technical meaning we stick to it.

Figure 1.1.: The Smolin Spiral (steps: Prioritize, Measure, Fix, Verify; FT = fault tolerance): In order to achieve fault tolerance "we must iteratively improve devices by estimating sources of smaller and smaller errors, prioritize them, measure them accurately, fix them, and verify they've been fixed" [1].

Figure 1.2.: There is a range of characterization and validation techniques, ordered from high complexity / more information to low complexity / less information (down to classical simulation). The more complex they are, in terms of measurement and computational effort, the more information they tend to provide. The ones with and without full stable performance guarantees are marked in green and red, respectively.
We will introduce tools that can potentially resolve this problem. In particular, for the case of trusted measurements, quantum state certification can be used.
There are also practically relevant problems where it is expected that full universal quantum computation is not required in order to achieve an advantage over classical computing. In particular, when a quantum circuit is shallow enough that it can be completed before the build-up of noise reaches a critical level, it can be implemented without error correction. There are already many examples suggesting that the era of such Noisy Intermediate-Scale Quantum (NISQ) technology is starting now [8]. Moreover, companies including D-Wave, Google, IBM, and Intel have already invested heavily in building such devices.
In order to be able to improve such devices (see Figure 1.1) and to fairly compare different implementations, it is crucial (i) to fully characterize their components and (ii) to find heuristic noise estimation schemes that can be implemented with as few requirements as possible. This course includes methods specifically targeted at these goals: (i) quantum state and process tomography and (ii) randomized benchmarking for estimating the noise in quantum gate implementations. As a more fundamental motivation we ask:
• What are the ultimate limits of measuring quantum systems?
Many of us might have wondered about this point about Hamiltonians already in our first quantum mechanics course, where specific Hamiltonians are mostly just postulated. Quantum field theory introduces several quantization methods that give quantum Hamiltonians from a classical understanding of physics. However, quantization is typically not mathematically well-defined. Usually, uncontrolled approximations are used to derive the effective Hamiltonians most physicists work with. This type of non-rigorous top-down approach might be unsatisfying in some respects. Therefore, it is very desirable to have a complementary bottom-up approach:
Operational approach
In quantum tomography one does exactly that: one learns the full quantum state or process from measured data.
We outlined certification and tomography to give two different examples. In fact,
there is a whole range of characterization and verification methods, see Figure 1.2.
5
The method of choice depends, e.g., on the amount and type of information one aims to obtain, the system size, the type of physical correlations one should expect, the measurement resources, the computational resources, and the type and strength of the noise.
2. Preliminaries
In this chapter we introduce some notation and preliminary results that will be used later. The reader may skip this chapter at first and come back to it later when needed.
Background on the mathematics of quantum information can be found in full detail, e.g., in Watrous' lecture notes [9]. All technical terms that are neither explained nor referenced can be found on Wikipedia.
and σ₀ := 1_{2×2}.
• Tensor products of the Pauli matrices σ_{s_1} ⊗ · · · ⊗ σ_{s_n} with s ∈ {0, 1, 2, 3}^n are called Pauli strings. They form an operator basis that is orthogonal with respect to the Hilbert-Schmidt inner product defined below.
• L(V, W ) denotes the vector space of linear (bounded) operators from a vector
space V to a vector space W . We set L(V ) := L(V, V ).
• Herm(H) ⊂ L(H) denotes the subspace of self-adjoint operators on a Hilbert space H.
• Pos(H) ⊂ Herm(H) denotes the convex cone of positive semidefinite operators. For X, Y ∈ Herm(H) we write X ⪰ Y if X − Y ∈ Pos(H).
• Let P, Q ∈ Pos(Cd ). Then
Tr[P Q] ≥ 0 , (2.2)
which can be seen by writing the trace as a sum over an eigenbasis of one of the
operators.
• A probability vector is a vector p ∈ [0, 1]^d that is normalized, i.e., Σ_i p_i = 1.
• S(H) := {ρ ∈ Pos(H) : Tr[ρ] = 1} is the convex set of density operators. It coincides with the set of convex combinations of the form Σ_i p_i |ψ_i⟩⟨ψ_i|, where p is a probability vector.
• (Column) vectorization is the map |·⟩ : C^{n₁×n₂} → C^{n₁n₂} that stacks the columns of a matrix A ∈ C^{n₁×n₂} on top of each other. For all matrices A, B, C with fitting dimensions it holds that

  |ABC⟩ = (C^⊤ ⊗ A) |B⟩ .   (2.4)
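The vectorization identity (2.4) is easy to check numerically; a minimal sketch in numpy (the `vec` helper is ours, implementing column stacking via Fortran order):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 5))

def vec(M):
    # column vectorization: stack the columns of M on top of each other
    return M.reshape(-1, order="F")

# |AXC> = (C^T ⊗ A)|X>, i.e. Eq. (2.4) with B renamed to X
lhs = vec(A @ X @ C)
rhs = np.kron(C.T, A) @ vec(X)
assert np.allclose(lhs, rhs)
```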
• The unitary group of a Hilbert space H is denoted by U(H) ⊂ L(H) and we set
U(d) := U(Cd ).
• The singular value decomposition (SVD) of a matrix X ∈ C^{n₁×n₂} is given by
X = U ΣV † , where U ∈ U(n1 ) and V ∈ U(n2 ) are unitary and the only non-zero
entries of Σ are the positive singular values Σi,i for i ∈ [rank(X)]. The unitaries
U and V can be obtained by diagonalizing XX † and X † X, respectively.
Sometimes one uses a different form of the SVD X = Ũ Σ̃Ṽ † where Σ̃ is the
positive (rank(X) × rank(X))-matrix with the non-zero singular values of X
on the diagonal and Ũ and Ṽ are obtained from U and V by removing the
appropriate columns.
• For p ∈ R₊ the ℓ_p-norm of a vector x ∈ C^d is ‖x‖_{ℓ_p} := (Σ_i |x_i|^p)^{1/p}.
whenever 1/p + 1/q = 1 (with 1/∞ := 0). Moreover, the norm inequalities

‖x‖_{ℓ_q} ≤ ‖x‖_{ℓ_p} ≤ s^{1/p − 1/q} ‖x‖_{ℓ_q}

are satisfied for any 0 < p < q ≤ ∞ and x ∈ C^d with s non-zero elements; see e.g. [10, Appendix A.1].
• For p ∈ (0, ∞] the Schatten p-norm ‖X‖_p of an operator X is given by the ℓ_p-norm of the vector of its singular values. A similar Hölder inequality and similar norm inequalities as for the ℓ_p-norms hold for the Schatten p-norms (with s replaced by the rank). In particular, for any operator X,

  ‖X‖_∞ ≤ ‖X‖_2 ≤ ‖X‖_1 ≤ √(rank(X)) ‖X‖_2 ≤ rank(X) ‖X‖_∞ .   (2.7)
Given the SVD

  X = U ΣV† ,   (2.9)

the (Moore–Penrose) pseudo-inverse of X is obtained by inverting the non-zero singular values,

  X⁺ := V Σ⁺ U† ,   (2.11)

where Σ⁺ is given by transposing Σ and inverting its non-zero diagonal entries.
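A numerical illustration of (2.9) and (2.11); a sketch assuming numpy's conventions for the compact SVD:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))  # a 4×3 matrix of rank 2

# Compact SVD, Eq. (2.9): keep only the non-zero singular values
U, s, Vh = np.linalg.svd(X, full_matrices=False)
r = int(np.sum(s > 1e-10))

# Pseudo-inverse, Eq. (2.11): invert the non-zero singular values
X_pinv = Vh[:r].conj().T @ np.diag(1 / s[:r]) @ U[:, :r].conj().T
assert np.allclose(X_pinv, np.linalg.pinv(X))
```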
which the real and imaginary part are each iid. standard Gaussian RVs. A complex (standard) Gaussian (random) vector is a vector in C^d the components of which are iid. complex standard Gaussian RVs. A Hermitian (standard) Gaussian (random) matrix is a matrix in Herm(C^d), of which the diagonal components are iid. Gaussian RVs and the other components iid. complex Gaussian RVs.
• There is a unitarily invariant probability measure on the unitary group U(H)
of a Hilbert space H called the Haar measure. For instance, unitary matrices
diagonalizing Hermitian Gaussian matrices are distributed Haar randomly. This
fact can be used to numerically sample unitaries from the Haar measure.
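The sampling recipe just described can be sketched as follows. The random column phases are an implementation detail we add: numerical routines fix eigenvector phases by a deterministic convention, which would otherwise break Haar randomness.

```python
import numpy as np

def haar_unitary(d, rng):
    # Hermitian (standard) Gaussian matrix
    A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    H = (A + A.conj().T) / 2
    # its eigenvector matrix is Haar-distributed up to column phases
    _, U = np.linalg.eigh(H)
    # re-randomize the phases that the numerical routine fixed by convention
    return U * np.exp(2j * np.pi * rng.random(d))

rng = np.random.default_rng(3)
U = haar_unitary(4, rng)
assert np.allclose(U.conj().T @ U, np.eye(4))  # U is unitary
```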
• R is called irreducible if the only invariant subspaces are {0} and H itself.
Irreducible representations are also called irreps.
• Two representations R : G → U(H) and R̃ : G → U(H̃) are said to be
unitarily equivalent if there is a unitary operator W : H → H̃ such that
R̃(g) = W R(g)W † for all g ∈ G.
Several irreps R_{i_1}, . . . , R_{i_m} might be unitarily equivalent to each other. The maximum such m is called the multiplicity of that irrep. The space C^m in the resulting identification

⊕_{j=1}^{m} R_{i_j}(g) ≅ R_{i_1}(g) ⊗ 1_m ∈ L(H₁ ⊗ C^m)   (2.14)

is called the multiplicity space of R_{i_1}. The decomposition (2.13) is called multiplicity-free if all irreps R_i are inequivalent, i.e., not isomorphic.
Theorem 2.3 (Schur’s lemma):
Proof. The condition (2.15) implies that R(h)A† = A†R(h) for all h = g⁻¹ ∈ G. Hence, this condition also holds for Re(A) := (1/2)(A + A†) and Im(A) := (1/(2i))(A − A†), and A is a constant if they both are. Hence, it is sufficient to prove the theorem for A ∈ Herm(H).
Let |ψ⟩ be an eigenvector with A|ψ⟩ = λ|ψ⟩ and Eig_λ(A) := {|ψ⟩ : A|ψ⟩ = λ|ψ⟩} the full eigenspace. Then R(g)|ψ⟩ ∈ Eig_λ(A) for all g ∈ G because AR(g)|ψ⟩ = R(g)A|ψ⟩ = λR(g)|ψ⟩. So, Eig_λ(A) is an invariant subspace. Since Eig_λ(A) ≠ {0} and R is an irrep, Eig_λ(A) = H follows.
and, hence,
Schur's lemma (Theorem 2.3) implies that A†A = c1 and AA† = c̃1 for constants c, c̃. Obviously, c = c̃, as can be seen from the eigenvalues. Either c = 0, so that A = 0, or W := A/√c is a unitary. In the latter case
for all g ∈ G, i.e., R and R̃ are unitarily equivalent.
A unitary W relating two representations R and R̃ as in (2.20) is called an inter-
twining unitary of R and R̃.
Particularly interesting in the context of k-designs are two group representations on
H = (Cd )⊗k . Remember that the symmetric group Symk is the group of permutations
on k elements.
Two group representations on (C^d)^{⊗k}

The symmetric group Sym_k and the unitary group U(d) both have a canonical representation on (C^d)^{⊗k}:

π_k : Sym_k → U((C^d)^{⊗k}) ,   (2.21)
∆_d : U(d) → U((C^d)^{⊗k}) .   (2.22)

We note that π_k(σ) and ∆_d(U) commute for any σ ∈ Sym_k and U ∈ U(d).
By Psymk and P∧k we denote the orthogonal projectors onto these two subspaces,
respectively.
Let us briefly consider the case k = 2. It is easy to see that any matrix can be
decomposed into a symmetric and an anti-symmetric part, which are orthogonal to
each other. This implies that
Note that due to Corollary 2.4, both the symmetric and the antisymmetric subspace
are isomorphic to some Cm , where msym2 and m∧2 are the multiplicities of the two
different one-dimensional irreps of Sym2 .
For large k there is a similar decomposition with more summands, called the Schur-Weyl decomposition. In order to state it we write λ ⊢ k for an integer partition of the form

k = Σ_{i=1}^{l(λ)} λ_i   (2.28)

of k into integers λ_i ≥ 1.
Theorem 2.6 (Schur-Weyl duality):
The action of Symk × U(d) on (Cd )⊗k given by the commuting representations
(2.23) and (2.24) is multiplicity-free and (Cd )⊗k decomposes into irreducible
components as

(C^d)^{⊗k} ≅ ⊕_{λ ⊢ k, l(λ) ≤ d} W_λ ⊗ S_λ .   (2.29)
For any k ≥ 2, both H_{sym^k} and H_{∧^k} occur as components in the direct sum (2.29).
The spaces W_λ are called Weyl modules and the S_λ Specht modules. Schur-Weyl duality implies that the Weyl modules are the multiplicity spaces of the irreps of Sym_k and, similarly, the Specht modules are the multiplicity spaces of the irreps of U(d).
The last ingredient we need in order to prove Lemma 3.12 is the dimension of the
symmetric subspace.
Exercise 2.1 (Symmetric subspace):
Often it is helpful to write P_{sym²} in terms of the flip operator (a.k.a. swap operator) F ∈ L(H^{⊗2}), which is defined by

F |ψ⟩|φ⟩ := |φ⟩|ψ⟩ .   (2.31)

Then

P_{sym²} = (1/2)(1 + F) .   (2.33)

From (2.32) one can see that Tr[F] = d, so that indeed Tr[P_{sym²}] = (1/2) d(d + 1).
Proposition 2.7 (Invariant operators for k = 2):
(U ⊗ U )E = E(U ⊗ U ) (2.34)
(C^d)^{⊗2} ≅ W_{sym²} ⊗ S_{sym²} ⊕ W_{∧²} ⊗ S_{∧²} ,   (2.36)
where Ssym2 and S∧2 carry the multiplicities of the irreps of ∆. But Sym2 is abelian,
so according to Corollary 2.4 dim(Ssym2 ) = dim(S∧2 ) = 1, i.e., the irreps Wsym2 and
W∧2 of ∆ are multiplicity-free.
Now we can write E as a block matrix corresponding to the decomposition (C^d)^{⊗2} ≅ W_{sym²} ⊕ W_{∧²},

E =: ( E_{1,1}  E_{1,2}
       E_{2,1}  E_{2,2} ) .   (2.37)

A similar decomposition for ∆ is ∆ = ∆_sym ⊕ ∆_∧ (the off-diagonal blocks are zero).
The invariance ∆(U )E = E∆(U ) implies
for all U ∈ U(d). Hence, thanks to Schur’s lemma (Theorem 2.3), Ei,i = ci 1 for
i = 1, 2 and some constants c1 and c2 . Similarly, we obtain
for all U ∈ U(d). According to Schur's lemma (the second version, Theorem 2.5), E_{1,2} and E_{2,1} are each either zero or an intertwining unitary for ∆_sym and ∆_∧, just as W in (2.20). Since ∆_sym and ∆_∧ are inequivalent irreps, E_{1,2} and E_{2,1} cannot be intertwining unitaries and must hence be zero. Together,
E = ( c₁1   0
      0   c₂1 ) = c₁ P_{sym²} + c₂ P_{∧²} .   (2.42)
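One direction of Proposition 2.7 is easy to check numerically: any operator of the form (2.42) commutes with U ⊗ U. A sketch; the index-gymnastics construction of F is our own.

```python
import numpy as np

d = 3
# Flip operator F with F|i>|j> = |j>|i>, i.e. F[(i,j),(k,l)] = δ_jk δ_il
F = np.eye(d * d).reshape(d, d, d, d).transpose(1, 0, 2, 3).reshape(d * d, d * d)
P_sym = (np.eye(d * d) + F) / 2      # Eq. (2.33)
P_wedge = (np.eye(d * d) - F) / 2

E = 0.7 * P_sym - 1.3 * P_wedge      # an operator of the form (2.42)

rng = np.random.default_rng(4)
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
U, _ = np.linalg.qr(A)               # some unitary
UU = np.kron(U, U)
assert np.allclose(UU @ E, E @ UU)   # (U ⊗ U)E = E(U ⊗ U), Eq. (2.34)
```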
P[X ≥ t] ≤ E[X] / t .   (2.43)
Let X be a mean zero random variable with finite variance σ² := E[X²]. Then

P[|X| ≥ t] ≤ σ² / t²   (2.46)

for all t > 0.
Figure 2.1.: The (upper) tail of a random variable X is the probability of X being greater than some threshold t. This probability is given by the corresponding area under the graph of the probability density function (PDF) of X.
Note that in the case of a random variable Y that does not necessarily have zero mean, Chebyshev's inequality yields a tail bound by applying it to X := Y − E[Y]; see also Figure 2.1. The same argument can be made for the tail bounds that follow.
When a random variable is bounded, then its empirical mean concentrates much more than a naive application of Markov's inequality suggests. More generally, the following holds (see, e.g., [10, Theorem 7.20]):
Theorem 2.10 (Hoeffding's inequality):

Let X₁, . . . , X_n be independent random variables with a_i ≤ X_i ≤ b_i and set S_n := Σ_i X_i. Then

P[S_n − E[S_n] ≥ t] ≤ exp( −2t² / Σ_{i=1}^{n} (b_i − a_i)² ) ,   (2.47)

P[|S_n − E[S_n]| ≥ t] ≤ 2 exp( −2t² / Σ_{i=1}^{n} (b_i − a_i)² )   (2.48)

for all t ≥ 0.
Proof. The second statement directly follows from the first one. In order to prove the first one, let s > 0, apply Markov's inequality to exp(s(S_n − E[S_n])), use the independence of the X_i to factorize the exponential, use the bounds on the X_i, and optimize over s.
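A quick simulation comparing the empirical two-sided tail with the bound (2.48); a sketch with our choice of uniform X_i, so a_i = 0 and b_i = 1:

```python
import numpy as np

rng = np.random.default_rng(5)
n, t, trials = 100, 10.0, 20_000
S = rng.random((trials, n)).sum(axis=1)      # S_n for X_i ~ Uniform[0, 1]
empirical = np.mean(np.abs(S - n / 2) >= t)  # empirical two-sided tail
bound = 2 * np.exp(-2 * t**2 / n)            # Eq. (2.48) with b_i - a_i = 1
assert empirical <= bound
```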
When one can control the variance of the random variables then the following tail
bound can give a better concentration, especially for small values of t.
Theorem 2.11 (Bernstein inequality [10, Corollary 7.31]):

Let X₁, . . . , X_n be independent random variables with |X_i − E[X_i]| ≤ a and variances σ_i². Then

P[ |S_n − E[S_n]| ≥ t ] ≤ 2 exp( − (t²/2) / (σ² + at/3) )   (2.50)

for all t ≥ 0, with σ² := Σ_{i=1}^{n} σ_i².
Another related tail bound is Azuma’s inequality, which allows for a relaxation on
the independence assumption (super-martingales with bounded differences).
of some function f. In this setting, we can iid. sample X^(1), . . . , X^(m) ∼ p and take the empirical average

F̂ := (1/m) Σ_{i=1}^{m} f(X^(i)) .   (2.52)

Its variance is

Var[F̂] = Var[f(X)] / m ,   (2.53)

i.e., the empirical variance also gives an estimate of the estimation error. Thanks to the central limit theorem, F̂ converges to a normal random variable with mean F and variance Var[f(X)]/m, which allows for the simple estimation of confidence intervals.
However, everything relies on the ability to sample from p. Two popular methods to
make such sampling efficient are importance sampling and Markov chain Monte Carlo
sampling.
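A minimal Monte Carlo integration sketch for (2.52) and (2.53); the choice f(x) = x² with X uniform on [0, 1] (exact value E[f(X)] = 1/3) is ours, just for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
m = 200_000
x = rng.random(m)                    # X^(1), ..., X^(m) ~ p = Uniform[0, 1]
fx = x**2
F_hat = fx.mean()                    # empirical average, Eq. (2.52)
err = fx.std(ddof=1) / np.sqrt(m)    # estimated sqrt of Var[f(X)]/m, Eq. (2.53)
assert abs(F_hat - 1 / 3) < 5 * err  # F_hat is within 5 standard errors
```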
Suppose we can write

f p = (f p / q) · q   (2.54)

for some probability distribution q. Then we can apply the Monte Carlo sampling idea (2.52) w.r.t. q, i.e., we draw X^(1), . . . , X^(m) ∼ q to obtain the estimator

F̂_q := (1/m) Σ_{i=1}^{m} (p(X^(i)) / q(X^(i))) f(X^(i)) .   (2.55)
One can show that the choice

q* := p |f| / Z   (2.57)

with normalization factor Z yields minimum variance. In particular, for f ≥ 0 we even have E_{q*}[(f p/q*)²] = E_p[f]² = E_{q*}[f p/q*]², i.e., Var_{q*}[F̂_{q*}] = 0. So, if f does not change its sign then a single sample is sufficient for the estimation! But in order
to obtain Z one needs to solve the integration problem first. However, finding non-
optimal but good choices for q can already speed up the integration.
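The zero-variance phenomenon can be seen directly in a toy example; our choices f(x) = x⁹ and p = Uniform[0, 1] (so F = 1/10) and the inverse-CDF sampling of q* are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
m = 10_000
# Target: F = E_p[f] with p = Uniform[0, 1] and f(x) = x^9, so F = 1/10.
# Optimal proposal, Eq. (2.57): q* ∝ p|f|, i.e. q*(x) = 10 x^9 (here Z = 1/10).
x = rng.random(m) ** (1 / 10)    # inverse-CDF sampling from q*
weights = 1.0 / (10 * x**9)      # p(x)/q*(x)
F_hat = np.mean(weights * x**9)  # Eq. (2.55); each term equals 1/10 exactly
assert np.isclose(F_hat, 0.1)    # zero variance: one sample already suffices
```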
The variables x and y are called the primal and dual variable, respectively. A primal feasible point is a point x that satisfies the constraints of the primal LP. A primal optimal point is a primal feasible point x♯ such that c⊤x♯ is the outcome of the maximization in the primal LP. Dual feasible points and dual optimal points y♯ are defined similarly via the dual LP.
Weak duality states that for a primal feasible point x and a dual feasible point y we have c⊤x ≤ b⊤y, which directly follows from the definition of the primal and dual problem. The strong duality theorem for LPs states that c⊤x♯ = b⊤y♯ for all optimal primal and dual points x♯ and y♯.
‖z‖_{ℓ₁} = min 1⊤y
           subject to y_i ≥ z_i and y_i ≥ −z_i ∀i,   (2.58)
           y ≥ 0,

where 1 is the vector with all components being 1. We note that computing the ℓ₁-norm of complex vectors is not an LP but a so-called second-order cone program, which is a special kind of SDP.
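The LP (2.58) can be handed to an off-the-shelf solver; a sketch using scipy.optimize.linprog, where the constraint rows encode y_i ≥ ±z_i in the solver's A_ub y ≤ b_ub form:

```python
import numpy as np
from scipy.optimize import linprog

z = np.array([2.0, -3.0, 0.5])
n = len(z)
# minimize 1^T y  subject to  y_i >= z_i  and  y_i >= -z_i   (Eq. (2.58))
A_ub = np.vstack([-np.eye(n), -np.eye(n)])   # rows:  z - y <= 0  and  -z - y <= 0
b_ub = np.concatenate([-z, z])
res = linprog(c=np.ones(n), A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * n)
assert res.success and np.isclose(res.fun, np.abs(z).sum())  # value = ‖z‖_ℓ1 = 5.5
```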
To be more explicit, we follow [17, Chapter 1.2.3] and define SDPs for either real or complex inner product spaces V and W. A linear operator Θ : L(V) → L(W) is said to be Hermiticity-preserving if Θ(X) ∈ Herm(W) for all X ∈ Herm(V).
Definition 2.13 (SDPs):
Primal and dual feasible and optimal points are defined similarly as for LPs.
SDPs characterized as in this definition are said to be in standard form. For specific SDPs, equivalent formulations are often more convenient.
Weak duality refers to the fact that the value of the primal SDP cannot be larger
than the value of the dual SDP, i.e., that Tr(CX) ≤ Tr(DY ) for any primal feasible
point X and dual feasible point Y . This fact follows directly from the definition of
the primal and dual problem:
An SDP is said to satisfy strong duality if the optimal values coincide, i.e., if for optimal primal and dual feasible points X♯ and Y♯ it holds that Tr(CX♯) = Tr(DY♯). In fact, strong duality already follows from a weak condition called Slater's condition. Slater's condition is that either one of the two following conditions is fulfilled:
(i) The primal problem is bounded above and there is a strictly feasible point of the dual problem, i.e., there is Y ∈ Herm(W) with Ξ†(Y) ≻ C.
(ii) The dual problem is bounded below and there is a strictly feasible point of the primal problem, i.e., there is X ≻ 0 with Ξ(X) = B.
There are efficient solvers for SDPs such as CVX(PY) [15, 16]. The underlying
numerical routines (interior point methods) come along with convergence proofs. This
means that one can view SDPs as functions in a similar sense as the sine function
is a function: both have a rigorous analytic characterization and polynomial time
algorithms for their evaluation with convergence guarantees.
where the last equivalence can be checked, e.g., by using an SVD of Z and reducing it to a 2 × 2 matrix problem. It follows that

‖Z‖_op = min{ y ∈ R : ( y1  Z
                        Z†  y1 ) ⪰ 0 }   (2.61)
and dualization yields
Since the trace norm is dual to the spectral norm, the equivalence (2.60) implies that the optimal value of the primal problem for the trace norm (2.64) is indeed ‖Z‖₁.
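The characterization (2.61) can be checked numerically: the block matrix is positive semidefinite exactly when y ≥ ‖Z‖_op. A sketch; the tolerances are ours:

```python
import numpy as np

rng = np.random.default_rng(9)
Z = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
op_norm = np.linalg.norm(Z, 2)  # largest singular value, ‖Z‖_op

def block_psd(y):
    # the block matrix from Eq. (2.61)
    M = np.block([[y * np.eye(3), Z], [Z.conj().T, y * np.eye(4)]])
    return np.linalg.eigvalsh(M).min() >= -1e-9

assert block_psd(op_norm + 1e-6)      # feasible just above ‖Z‖_op
assert not block_psd(op_norm - 1e-3)  # infeasible just below
```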
The outcomes of a POVM measurement are given by the labels i of the effects Ei .
Exercise 2.2 (Measurement postulate):
1. Show that for finite-dimensional Hilbert spaces the dual space of (L(H), ‖·‖_op) is indeed given by (L(H), ‖·‖₁).
2. Show that ρ ∈ Herm(C^d) is positive semidefinite if and only if Tr[ρA†A] ≥ 0 for all A ∈ L(C^d). Also note that Tr[ρ] = ‖ρ‖₁ for all ρ ∈ Pos(H).
3. Explain how the measurement of an observable, i.e., a von Neumann measurement (a.k.a. projective measurement), is related to POVM measurements.
4. That quantum states are dual to observables already tells us that the
trace norm is the canonical norm for quantum states. Find an operational
interpretation of the trace norm related to measurements.
A density matrix ρ ∈ S(C^d) is called pure if there is a state vector |ψ⟩ ∈ C^d such that ρ = |ψ⟩⟨ψ|.
Exercise 2.3 (Pure state condition):
Given two single quantum systems, their joint system should also be a quantum
system. This expectation is captured by the following.
Postulate (composite quantum systems):
The Hilbert space of two quantum systems with Hilbert spaces H1 and H2 ,
respectively, is the tensor product H1 ⊗ H2 .
A ↦ A ⊗ 1 .   (2.65)

where ρ₁ is the state ρ reduced to system 1. The reduced state captures all information about ρ that can be obtained from measuring system 1 alone and can be obtained explicitly via the partial trace over the second subsystem as ρ₁ := Tr₂[ρ].
Exercise 2.4 (The swap-trick):
Let F ∈ L(H ⊗ H) be the flip operator (or swap operator), i.e., the linear extension of the map |ψ⟩|φ⟩ ↦ |φ⟩|ψ⟩. Show that
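The identity to show is presumably the standard swap trick Tr[F(A ⊗ B)] = Tr[AB] (an assumption on our part for the elided statement); a quick numerical check:

```python
import numpy as np

d = 3
# Flip operator: F[(i,j),(k,l)] = δ_jk δ_il
F = np.eye(d * d).reshape(d, d, d, d).transpose(1, 0, 2, 3).reshape(d * d, d * d)

rng = np.random.default_rng(10)
A = rng.standard_normal((d, d))
B = rng.standard_normal((d, d))
# swap trick: Tr[F (A ⊗ B)] = Tr[A B]
assert np.isclose(np.trace(F @ np.kron(A, B)), np.trace(A @ B))
```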
Part I
Quantum states
Quantum state tomography is the task of (re)constructing a full description of a given quantum state from measurement data. Due to the nature of quantum physics, one cannot measure a quantum state from just one single copy; many copies of the same state are required for this task. As quantum measurements are of a probabilistic nature, one can also only hope to reconstruct the state up to some finite precision. Of course, the precision depends on the precise reconstruction method, the number of measurements, and the type of measurements at hand. Also the computational efficiency of the actual reconstruction algorithm is crucial. For instance, in one of the largest state tomography implementations [18], an 8-qubit state was reconstructed, which required two weeks of post-processing.
This illustrates the need for tomographic schemes that are user friendly and at the same time work with an optimal amount of resources.
• d² − 1 measurement settings are required, which is much more than necessary and infeasible for intermediate-sized quantum systems.
• The errors on the estimates of Tr[ρMi ] tend to add up in an unfavorable way.
• Moreover, they will typically result in an estimate of ρ that is not a positive
semi-definite operator.
In the following sections, we will address the drawbacks of the naive QST scheme
and introduce methods that (partially) resolve them. In order to be able to fairly
compare different strategies we will use the following tomography framework.
20
Sequential QST for iid. state preparations
Usually one does not use a new POVM for each copy of ρ so that many of the
POVMs M(i) are the same.
The randomness in any tomographic scheme comes at least from the measurements
being probabilistic. However, in general, also the measurement setup and potentially
even the reconstruction algorithm can contain probabilistic elements, each of which
can result in some failure probability of the measurement scheme.
In the end, there will always be a trade-off between resources:
• the number of samples nρ ,
• the effort of implementing the measurement setup (M^(i))_{i∈[n_ρ]},
• prior knowledge P, and
• the computational cost of the reconstruction algorithm
and performance:
• kρ̂ − ρkp and
• the failure probability.
There are two paths of how our tomography setting could be generalized. The
generalization of sequential measurements are parallel measurements where ρ⊗nρ is
measured with one large joint POVM. This concept implicitly includes the possibility
of processing ρ⊗nρ in a quantum computer with unbounded circuit complexity (see
[19, 20] for QST including quantum computations). However, in order to keep things practical and NISQ-era oriented, we focus on sequential measurements where the (i + 1)-th copy of ρ is measured after the i-th copy.
Another path of generalization could be to relax the iid. assumption, i.e., the assumption that the measured total state is of the form ρ^{⊗n_ρ}. But to keep our considerations simple we will keep the very convenient iid. assumption. When the total state is ρ_tot instead of ρ^{⊗n_ρ}, the iid. mismatch error ‖ρ_tot − ρ^{⊗n_ρ}‖₁ would add to the reconstruction error in the worst case.
• How many measurement settings (in terms of POVM measurements) are re-
quired?
• What is the relation between the number of outcomes in POVM measurements
and expectation values of observables?
now justify this intuition and extend it to the situation where the measured state is
known to be pure.
For a set of operators M = {M_i}_{i∈[m]} ⊂ Herm(H) we define the measurement map M : L(H) → R^m by its components M(X)_i := Tr[M_i X].
for all ρ, σ ∈ P. For the case P = S(H) we just call {M(i) } informationally
complete.
which is the set of all possible complex linear combinations of all operators occurring in the POVMs (M^(1), . . . , M^(n_M)). A set of operators S is called an operator system if 1 ∈ S and if it is closed under taking adjoints, i.e., S = S†. Clearly, S(M^(1), . . . , M^(n_M)) is an operator system (assuming n_M ≥ 1).
Proposition 3.2 (POVMs and operator systems [21, Proposition 1]):
Let S ⊆ L(H) be an operator system. Then there exists a POVM M such that
S = S(M) and M has dim(S) outcomes. Any POVM B satisfying S = S(B)
has at least dim(S) outcomes.
Proof. Exercise.
Operator systems capture the information contained in the POVMs they are generated by. Indeed, this is formalized by [21, Proposition 2]:
Proposition 3.3 (Operator systems and informational completeness):
(P − P) ∩ S⊥ = {0} . (3.5)
Proof. For all states ρ, σ ∈ P, we have:
One may consider only measurement data that comes in the form of expectation values of observables rather than full POVM statistics. We adjust the definition of informational completeness to this case as well.
Definition 3.4 (P-informational completeness for observables):
for all ρ, σ ∈ P. For the case P = S(H) we just call M informationally complete.
Proof. Exercise.
Proof. The proof of [21, Proposition 6] also applies here. Also note that continuous embeddings with compact domains automatically have a continuous inverse (see a post on StackExchange), i.e., are topological embeddings.
Pure states
Let us consider the special but relevant case where P is the set of pure states. Any one-qubit state can be written in the Bloch representation ρ = (1/2)(1 + Σ_{i∈[3]} r_i σ_i),
where ‖r‖_{ℓ₂} ≤ 1 is the Bloch vector and (σ_i)_{i∈[3]} are the Pauli matrices (2.1). A state ρ is pure, i.e., rank(ρ) = 1, iff it is of the form ρ = |ψ⟩⟨ψ|, or, equivalently, iff ‖r‖_{ℓ₂} = 1.
Exercise 3.1:
The Bloch representation tells us that the set of pure one-qubit states is isomorphic to the 2-dimensional unit sphere in R³. However, there is no topological embedding of the 2D sphere into R². Hence, Proposition 3.6 tells us that two expectation values are not enough for the recovery of states in P₂. On the other hand, dim_R(S(C²)) = 3: with the isomorphism Herm(C^d) ≅ R^{d²}, the normalization condition Tr[ρ] = 1 ∀ρ ∈ S implies that 3 expectation values are sufficient to recover any state in P₂. Together, this yields the first entry in the list (3.11).
In general, the minimum number of observables that are informationally complete
w.r.t. P can be bounded as follows.
Theorem 3.7 (Informational completeness for pure states [21]):
This theorem implies that for d = 4, . . . , 10, the minimum number m_min of observables that are informationally complete is known to be as follows (the cases d = 2, 3 are covered by [21, Theorem 2]):

d       2   3   4        5    6         7         8      9    10
m_min   3   7   9 or 10  15   17 or 18  22 or 23  23-25  31   33 or 34      (3.11)
Proof idea of Theorem 3.7. For the proof of the upper bound (3.9) an information-
ally complete set of observables is constructed based on Milgram's work [22]. The
subset of pure states P ⊂ S(C^{d+1}) is diffeomorphic¹ to the complex projective space
CP^d. Milgram [22] constructed immersions² of the complex projective space CP^d
into R^{4d−α−c}, where c = 1 for even d and c = 0 otherwise. Then an identification
of the real vector space of observables as Herm(C^d) ≅ R^{d²} can be used to obtain an
informationally complete set of observables.
y = M(ρ) + ε ,     (3.12)
where ε ∈ R^m is additive noise (e.g., including the unavoidable statistical estimation
error). Now we wish to compute a reasonable first estimate ρ̂ of ρ from y and M.
Without loss of generality we can assume that M : Herm(C^d) → R^m is injective,
since, if it is not, we can modify it by extending M by the observable 1 and setting the
corresponding data entry y_j := 1. Note that M is not surjective in general. In particular, whenever
m > d2 the image M(Herm(Cd )) ⊂ Rm is a zero-set in Rm . Hence, for a continu-
ously distributed sampling error (e.g. with components from a normal distribution
N (0, σ 2 )) the probability that there exists some X ∈ Herm(Cd ) such that M(X) = y
is also zero. In such a situation it is a common and practical strategy to use the least
squares estimate:
Least squares estimator
ρ̂_LS := argmin_{X∈Herm(C^d)} ‖M(X) − y‖_{ℓ2} .     (3.13)
If M is injective, the least squares estimator has the closed form
ρ̂_LS = (M†M)^{−1} M†(y) .     (3.14)
Proof. Exercise.
The linear map (M†M)^{−1}M† is the pseudo-inverse M⁺ or Moore–Penrose in-
verse of M whenever M is injective (or equivalently, a matrix representation of M
has linearly independent columns, which is equivalent to M†M being invertible). It
can be calculated via an SVD, see Eq. (2.11).
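As a quick illustration of the pseudo-inverse (the single-qubit Pauli observables below are just an illustrative choice of an injective measurement map, not the POVMs considered later):

```python
import numpy as np

# A toy measurement map M(rho)_i = Tr[B_i rho] for one qubit, with the
# Pauli observables as an (illustrative) informationally complete choice.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
obs = [I2, X, Y, Z]

# Matrix representation: y = A vec(rho) with y_i = Tr[B_i rho]
A = np.array([B.conj().reshape(-1) for B in obs])

rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])  # some valid state
y = (A @ rho.reshape(-1)).real                # noiseless expectation values

A_pinv = np.linalg.pinv(A)                    # Moore-Penrose inverse via SVD
rho_ls = (A_pinv @ y).reshape(2, 2)           # least squares / linear inversion
```

With noiseless data and an injective map the least squares estimate recovers ρ exactly.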
Prove that
E ‖ρ̂_LS − ρ‖_{ℓ2} ≤ 1/(λ_min(M) √n_ρ) ,     (3.15)
where λ_min(M) is the minimum singular value of M.
The linear inversion estimator (3.14) can be calculated more explicitly for many
relevant measurement settings that have additional structure. As we will see, this can
also allow one to control λ_min(M). Two important mathematical tools to capture such
additional structure are frame theory and complex projective k-designs.
In order for a POVM to be informationally complete its elements need to span the
full vector space Herm(C^d). However, the POVM elements cannot constitute a basis,
see Exercise 3.3. Such generating sets of vectors are generally referred to as frames.
There are several important properties that frames can have and that can, e.g., indeed
be used to further simplify the linear inversion estimator (3.14).
Exercise 3.3 (No PSD basis):
The constants A and B are called the lower and upper frame bounds, respectively.
They are of practical importance in numerical applications. The self-adjoint operator
S := Σ_i |v_i⟩⟨v_i|     (3.17)
is called the frame operator.
Hence, the bounds (3.16) imply that the eigenvalues of S are contained in the interval
[A, B]. In particular, S is a positive operator. In fact, the best frame bounds are
A = λ_min(S) and B = λ_max(S). The vectors |ṽ_i⟩ := S^{−1}|v_i⟩
are called the (canonical) dual frame. It is easy to see that the following frame expansions
hold for any |x⟩ ∈ V,
|x⟩ = Σ_i ⟨v_i|x⟩ |ṽ_i⟩  and  |x⟩ = Σ_i ⟨ṽ_i|x⟩ |v_i⟩ .     (3.21)
Note that the frame operator of the dual frame is
S̃ = Σ_i |ṽ_i⟩⟨ṽ_i| = S^{−1} (Σ_i |v_i⟩⟨v_i|) (S^{−1})† = S^{−1} S S^{−1} = S^{−1} .     (3.22)
In particular, the dual of the dual frame is the frame itself. Moreover, since the
eigenvalues of S^{−1} are contained in [1/B, 1/A] it follows that the upper and lower
frame constants of the dual frame are 1/A and 1/B, respectively.
The frame {|v_i⟩} ⊂ V is called a tight frame for V if upper and lower frame
bounds coincide, i.e., if A = B. A tight frame is called a Parseval frame if it satisfies
Parseval's identity Σ_i |⟨v_i|x⟩|² = ‖x‖² for all x, i.e., if it is a tight frame with unit
frame constant. Note that the dual frame of a tight (Parseval) frame is also a tight
(Parseval) frame with the inverse frame constant. Let us summarize these insights:
Proposition 3.9 (Dual frames):
Let us consider a frame with frame operator S and frame constants 0 < A ≤ B.
Then the frame operator of the dual frame is S −1 and has frame constants
0 < 1/B ≤ 1/A. In particular, the dual frame of a tight frame is again a tight
frame with inverse frame constant.
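These frame-theoretic identities are easy to check numerically; the random frame below is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 7
# Rows of V are the frame vectors |v_i>; a generic random set spans C^d.
V = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))

S = sum(np.outer(v, v.conj()) for v in V)            # frame operator (3.17)
A, B = np.linalg.eigvalsh(S)[[0, -1]]                # best frame bounds
V_dual = V @ np.linalg.inv(S).T                      # dual frame |v~_i> = S^{-1}|v_i>

x = rng.normal(size=d) + 1j * rng.normal(size=d)
x_rec = V_dual.T @ (V.conj() @ x)                    # frame expansion (3.21)
S_dual = sum(np.outer(w, w.conj()) for w in V_dual)  # dual frame operator (3.22)
```

The reconstruction `x_rec` reproduces `x`, and the dual frame operator equals S^{−1}, as in Proposition 3.9.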
The frame potential of { |v1 i , . . . , |vn i} is Tr[S 2 ]. Normalized tight frames are the
ones that minimize the frame potential:
Theorem 3.10 (See [28, Section I.D] for a discussion and references):
Let {|v_1⟩, . . . , |v_n⟩} ⊂ S^{d−1} be a frame for C^d of unit norm vectors. Then
Tr[S²] ≥ n²/d .     (3.23)
Moreover, the bound is saturated iff {|v_1⟩, . . . , |v_n⟩} is a tight frame.
Proof. We remember that S is a positive operator. Let {λ_i}_{i∈[d]} be the eigenvalues
of S. Since S is positive we have
Tr[S] = n = Σ_{j=1}^d λ_j = ‖S‖₁  and  Tr[S²] = Σ_{j=1}^d λ_j² = ‖S‖₂²     (3.24)
and the inequality ‖S‖₁ ≤ √d ‖S‖₂ establishes the bound. Moreover, this inequality
is only saturated when all eigenvalues of S are the same, i.e., when S = (n/d)1, which
is the tight frame condition.
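Theorem 3.10 can be checked numerically; the tight frame below (the six normalized Pauli eigenstates for d = 2) saturates the bound, while a random unit-norm frame typically does not:

```python
import numpy as np

def frame_potential(V):
    """Tr[S^2] for the frame whose vectors are the rows of V."""
    S = sum(np.outer(v, v.conj()) for v in V)
    return np.trace(S @ S).real

rng = np.random.default_rng(1)
d, n = 2, 6
V_rand = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))
V_rand /= np.linalg.norm(V_rand, axis=1, keepdims=True)   # unit-norm frame

# The six Pauli eigenstates form a tight frame: S = (n/d) 1 = 3 * I
s = 1 / np.sqrt(2)
V_tight = np.array([[1, 0], [0, 1], [s, s], [s, -s], [s, 1j * s], [s, -1j * s]])
```

For `V_tight` the frame potential equals n²/d = 18 exactly; for a generic random frame it is strictly larger.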
Definition 3.11 (k-design):
A set {|ψ_1⟩, . . . , |ψ_n⟩} ⊂ S^{d−1} of unit vectors is called a (complex projective)
k-design if
(1/n) Σ_{i=1}^n (|ψ_i⟩⟨ψ_i|)^{⊗k} = K_k := E[(|φ⟩⟨φ|)^{⊗k}] ,     (3.25)
where |φ⟩ ∈ C^d is drawn from the Haar measure.
Lemma 3.12 (k-th moment):
K_k = (k!(d − 1)!/(k + d − 1)!) P_sym^k ,     (3.26)
where P_sym^k is the projector onto the symmetric subspace (2.25) of (C^d)^{⊗k}.
The required material on representation theory (Section 2.2) was covered in
Lecture 5. In order to prove Lemma 3.12 we will need a bit of representation
theory; see Section 2.2 for the required preliminaries.
Proof of Lemma 3.12. Note that K_k is only supported on the symmetric subspace
and, since K_k = K_k†, that also its range is contained in the symmetric subspace, i.e.,
ker(K_k)^⊥ = ran(K_k) ⊆ H_sym^k.
Next we observe that for any U ∈ U(d) we have Δ_d(U)† K_k Δ_d(U) = K_k and,
hence, K_k Δ_d(U) = Δ_d(U) K_k. Schur's lemma (Theorem 2.3) implies that K_k|_{H_sym^k} =
c 1|_{H_sym^k}. Together with ker(K_k)^⊥ = ran(K_k) ⊆ H_sym^k this identity implies that K_k =
c P_sym^k.
In order to obtain c, note that Tr[K_k] = 1 and Tr[P_sym^k] is given by Eq. (2.30).
Getting back to tight frames, we can make the following statement.
Proposition 3.13:
Every 1-design is a tight frame.
Proof. Let {|ψ_i⟩}_{i∈[n]} be a 1-design. We note that the frame operator S of {|ψ_i⟩}_{i∈[n]}
is
S = nK₁ = (n/d) 1 .     (3.27)
Moreover,
Tr[S²] = (n²/d²) Tr[1] = n²/d .     (3.28)
Hence, by Theorem 3.10, {|ψ_i⟩}_{i∈[n]} is a tight frame.
More generally, one can lift Theorem 3.10 on frame potentials to k-designs. For
this purpose, we define the frame operator of order k of {|ψ_i⟩}_{i∈[n]} to be
S_k := Σ_{i=1}^n (|ψ_i⟩⟨ψ_i|)^{⊗k} .     (3.29)
Then
Tr[S_k²] ≥ n² k!(d − 1)!/(k + d − 1)! ,     (3.30)
with equality iff {|ψ_i⟩}_{i∈[n]} is a k-design.
Proof. By the definition of K_k and S_k we have that {|ψ_i⟩}_{i∈[n]} is a k-design iff
K_k = (1/n) S_k .     (3.31)
With |Ψ_i^k⟩ := |ψ_i⟩^{⊗k} we can write S_k as
S_k = Σ_{i=1}^n |Ψ_i^k⟩⟨Ψ_i^k| ,     (3.32)
i.e., S_k is the frame operator of the unit vectors {|Ψ_i^k⟩}_{i∈[n]} ⊂ (C^d)^{⊗k}.
Since Tr[K_k] = 1, Theorem 3.10 together with Lemma 3.12 finishes the proof.
STABs (stabilizer states) are states that are ubiquitous in quantum information and are defined
as follows. An n-qubit Pauli string is σ_{s₁} ⊗ · · · ⊗ σ_{sₙ}, where s ∈ {0, 1, 2, 3}ⁿ and
{σ_i} are the Pauli matrices (2.1). Then the Pauli group P_n ⊂ U(2ⁿ) is the group
generated by all n-qubit Pauli strings and i·1. An n-qubit state |ψ⟩ is a stabilizer
state if there is an abelian subgroup S ⊂ P_n, called stabilizer (subgroup), that
stabilizes |ψ⟩ and only |ψ⟩, i.e., |ψ⟩ is the unique joint eigenvalue-1 eigenstate
of all elements in that subgroup. Such subgroups turn out to be generated by
n elements. Note that they cannot contain the element −1.
An example of such a subgroup is the one of all Pauli strings made of 1’s and
σz ’s.
The set of all stabilizer states is known to be a 2-design [31, 32], actually even
a 3-design but not a 4-design [28, 33, 34].
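For one qubit the stabilizer states are the six Pauli eigenstates, and the 2-design property, i.e., (1/n)Σ_i (|ψ_i⟩⟨ψ_i|)^{⊗2} = K₂ = (1/3)P_sym² from Lemma 3.12, can be verified directly:

```python
import numpy as np

s = 1 / np.sqrt(2)
# The six single-qubit stabilizer states: eigenbases of Z, X, and Y
stab = np.array([[1, 0], [0, 1], [s, s], [s, -s], [s, 1j * s], [s, -1j * s]])

# Empirical second moment (1/n) sum_i (|psi_i><psi_i|)^{(x)2}
K2_emp = sum(np.kron(np.outer(p, p.conj()), np.outer(p, p.conj()))
             for p in stab) / len(stab)

SWAP = np.eye(4)[[0, 2, 1, 3]]          # flip operator F on C^2 (x) C^2
P_sym = (np.eye(4) + SWAP) / 2          # projector onto the symmetric subspace
K2 = P_sym / 3                          # Lemma 3.12 with k = 2, d = 2
```

The empirical moment matches K₂ exactly, confirming the 2-design property for this smallest case.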
Exercise 3.5 (Stabilizer states):
MUBs are sets of bases with minimal overlaps. More explicitly, two orthonor-
mal bases {|ψ_i⟩}_{i∈[d]} ⊂ C^d and {|φ_i⟩}_{i∈[d]} ⊂ C^d are said to be mutually
unbiased if |⟨ψ_i|φ_j⟩|² = 1/d for all i, j ∈ [d]. For instance, if U ∈ U(d) is the
discrete Fourier transform then {|i⟩}_{i∈[d]} ⊂ C^d and {U|i⟩}_{i∈[d]} ⊂ C^d are mutu-
ally unbiased. The number of MUBs in C^d is upper bounded by d + 1 and in
prime power dimensions (e.g., for qubits) there are exactly d + 1 MUBs [35].
However, it is a well-known open problem to determine this number exactly for
all d. Klappenecker and Roettler [36] showed that maximal sets of MUBs are
2-designs.
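The Fourier-transform example can be checked in a few lines (the dimension d = 5 is an arbitrary illustrative choice):

```python
import numpy as np

d = 5
# Discrete Fourier transform matrix: U|j> has components e^{2 pi i jk/d}/sqrt(d)
F = np.array([[np.exp(2j * np.pi * j * k / d) for k in range(d)]
              for j in range(d)]) / np.sqrt(d)

# Overlaps |<i|U|j>|^2 between the computational basis and its Fourier image
overlaps = np.abs(F) ** 2
```

All overlaps equal 1/d, so the two bases are mutually unbiased.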
SIC POVMs
A set of d² unit vectors {|ψ_i⟩} ⊂ C^d is called symmetric, informationally
complete (SIC) if
|⟨ψ_i|ψ_j⟩|² = 1/(d + 1)  ∀ i ≠ j .     (3.34)
“Symmetric” refers to the inner products being all equal. Renes et al. [25] have
shown that SIC POVMs are indeed 2-designs and have explicitly constructed
them for small dimensions.
For any p > 0 the depolarizing channel is invertible (as a map) and the inverse
is given by
D_p^{−1}(X) = (1/p) X − ((1 − p)/p) Tr[X] (1/d) 1 .     (3.36)
Proof. Exercise.
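This inverse is easy to verify numerically, assuming the standard definition D_p(X) = pX + (1 − p)Tr[X]1/d of the depolarizing channel (Definition 3.15):

```python
import numpy as np

def depolarize(X, p):
    """Depolarizing channel D_p(X) = p X + (1-p) Tr[X] 1/d."""
    d = X.shape[0]
    return p * X + (1 - p) * np.trace(X) * np.eye(d) / d

def depolarize_inv(X, p):
    """Inverse (3.36): D_p^{-1}(X) = X/p - ((1-p)/p) Tr[X] 1/d."""
    d = X.shape[0]
    return X / p - (1 - p) / p * np.trace(X) * np.eye(d) / d

rng = np.random.default_rng(2)
G = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
X = (G + G.conj().T) / 2       # arbitrary Hermitian test operator
p = 0.3
```

Composing the two maps returns the input, and both maps preserve the trace.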
Proof. We remember the adjoint measurement map (3.40) and the definition (3.17)
of the frame operator to note that indeed
M†M(X) = Σ_{i=1}^m Tr[X M_i] M_i = S_M(X) .     (3.39)
Given a 2-design {|ψ_i⟩}_{i∈[m]}, we consider the POVM M with measurement operators
M_i = (d/m) |ψ_i⟩⟨ψ_i|     (3.41)
for i ∈ [m].
Proposition 3.18:
Proof. Obviously, M_i ⪰ 0. Taking the partial trace of K₂, using Lemma 3.12 and
remembering the basis expansion of the flip operator (2.32) yields
Tr₂[K₂] = (2/(d(d + 1))) Tr₂[P_sym²] = (1/(d(d + 1))) Tr₂[1 + F]
        = (1/(d(d + 1))) (d·1 + 1) = (1/d) 1 .     (3.42)
Noting that Σ_{i=1}^m M_i = d Tr₂[K₂] = 1 shows that M is indeed a POVM.
The POVM M is a frame with a frame operator given in terms of the measurement
map (3.2) and dual frame as follows.
Lemma 3.19 (Frame from 2-design POVMs):
Proof. By the definition of the measurement operators (3.41) the frame operator can
be written in terms of the 2-design as
S_M(X) = (d²/m²) Σ_{i=1}^m Tr[X |ψ_i⟩⟨ψ_i|] |ψ_i⟩⟨ψ_i|     (3.44)
       = (d²/m) Tr₁[(X ⊗ 1) (1/m) Σ_{i=1}^m |ψ_i⟩⟨ψ_i| ⊗ |ψ_i⟩⟨ψ_i|]     (3.45)
       = (d²/m) Tr₁[(X ⊗ 1) K₂] .     (3.46)
Using again Lemma 3.12 and the flip operator (2.33),
S_M(X) = (d²/m) (2/(d(d + 1))) Tr₁[(X ⊗ 1) P_sym²]
       = (d/(m(d + 1))) (Tr[X] 1 + Tr₁[(X ⊗ 1) F]) .     (3.47)
With Tr₁[(X ⊗ 1) F] = X this yields
S_M(X) = (d/(m(d + 1))) (X + Tr[X] 1)
       = (d/m) D_{1/(d+1)}(X) .     (3.48)
This identity implies that SM is invertible. Hence, M is indeed a frame.
This lemma allows for a full frame characterization of the POVMs that are given
by spherical 2-designs.
Corollary 3.20 (Frame characterization and linear inversion for
POVMs from 2-designs):
S_M(X) = (d/(m(d + 1))) (X + Tr[X] 1) .     (3.49)
The canonical dual frame is given by
M̃_i = (m(d + 1)/d) M_i − 1 .     (3.51)
The pseudo-inverse of the measurement map M is given by
M⁺(y) = (d + 1) Σ_{i=1}^m y_i |ψ_i⟩⟨ψ_i| − (Σ_{i=1}^m y_i) 1
      = (m(d + 1)/d) M†(y) − ⟨1, y⟩ 1 ,     (3.52)
where 1 ∈ R^m is the vector containing only ones.
Proof. Having characterized the frame operator in Lemma 3.19 this corollary directly
follows with Proposition 3.17 and Proposition 3.16.
Let {W_i}_{i=0}^{d²−1} denote the n-qubit Pauli strings (d = 2ⁿ) with W₀ = 1. Then
P_i^± = (1 ± W_i)/2     (3.53)
is the projector onto the eigenvalue ±1 eigenspace (note that P₀⁺ = 1 and P₀⁻ = 0).
Each Pauli string (observable) W_i is associated with a two-outcome POVM given by
M_i := {P_i⁺, P_i⁻}. Now we consider a measurement setting given by the union of all
these POVMs plus the trivial measurement given by W₀ = 1, i.e., by
M := ∪_{i=0}^{d²−1} {P_i⁺, P_i⁻} .     (3.54)
A direct computation gives
S_M(X) = (d/2) X + (d²/2) Tr[X] 1 ,     (3.55)
where we have used that {W_i/√d}_{i=0}^{d²−1} is an orthonormal basis of Herm(C^d) w.r.t. the
Hilbert–Schmidt inner product. In terms of the depolarizing channel (Definition 3.15)
this result reads as
S_M = (d(d² + 1)/2) D_{1/(d²+1)} .     (3.56)
Using the projectors (3.53), y_i⁺ + y_i⁻ = 1, and Proposition 3.17 with c = d(d² + 1)/2 and p = 1/(d² + 1),
the linear inversion estimate (3.14) simplifies as
M⁺(y) = (2/d) Σ_{i=0}^{d²−1} Σ_{o=±} y_i^o P_i^o − (2d/(d² + 1)) Σ_{i=0}^{d²−1} Σ_{o=±} y_i^o Tr[P_i^o] (1/d) 1
      = (1/d) Σ_{i=0}^{d²−1} (y_i⁺ − y_i⁻) W_i + d·1 − d·1     (3.57)
      = (1/d) Σ_{i=0}^{d²−1} (y_i⁺ − y_i⁻) W_i .
This formula has a nice interpretation: The difference y_i⁺ − y_i⁻ is the empirical
expectation of W_i and the pseudo-inverse is the corresponding empirical estimate of the
state ρ (the 1/d factor comes from the normalization of W_i).
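For a single qubit this linear inversion formula can be checked directly (the state below is an arbitrary illustrative choice; with exact outcome probabilities in place of empirical frequencies the formula recovers ρ exactly):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
W = [I2, X, Y, Z]                       # Pauli strings for n = 1 qubit, d = 2
d = 2

rho = np.array([[0.6, 0.1 + 0.2j], [0.1 - 0.2j, 0.4]])

# Ideal outcome data y_i^o = Tr[P_i^o rho] with P_i^+/- = (1 +/- W_i)/2
y_plus = [np.trace((I2 + Wi) / 2 @ rho).real for Wi in W]
y_minus = [np.trace((I2 - Wi) / 2 @ rho).real for Wi in W]

# Linear inversion (3.57): rho_hat = (1/d) sum_i (y_i^+ - y_i^-) W_i
rho_hat = sum((yp - ym) * Wi for yp, ym, Wi in zip(y_plus, y_minus, W)) / d
```

Since y_i⁺ − y_i⁻ = Tr[W_i ρ], this is just the Pauli expansion of ρ.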
defines the full set of Pauli basis measurements. For n ≥ 2 qubits M_PB is not a
2-design, as can be checked via Eq. (3.30). However, the measurement map can
still be pseudo-inverted in a similar fashion as before.
Exercise 3.6 (Pauli basis measurements):
Find the frame operator, the dual frame, and the pseudo-inverse of the mea-
surement of Pauli-basis measurements.
This exercise yields the linear inversion estimator, i.e., the pseudo-inverse of the
measurement map, as
M⁺(y) = (1/3ⁿ) Σ_{s∈{x,y,z}ⁿ} Σ_{o∈{±}ⁿ} y_s^o ⊗_{i=1}^n (3 |b_{s_i}^{o_i}⟩⟨b_{s_i}^{o_i}| − 1) .     (3.59)
3.6. Compressed sensing
The new material on linear programming (Section 2.5.1) was covered in Lec-
ture 13. Compressed sensing (a.k.a. compressive sensing) is still quite a
young research field concerned with the reconstruction of signals from few measure-
ments, where a signal is anything one might want to recover from measurements.
The field was mainly initialized by works of Candès, Romberg, and Tao [39, 40] and
Donoho [41]. It has an extremely wide range of applications. In particular, it is of
great use in many technological applications including imaging, acoustics, radar, and
mobile network communication.
The problems considered in compressed sensing are typically of the following form.
There is a signal x ∈ V ≅ Rⁿ that is known to be compressible (e.g., a sparse vector
or low-rank matrix). Then one is given access to linear measurements
y_i = ⟨a_i, x⟩ + ε_i ,  i ∈ [m] ,     (3.60)
that are given by measurement vectors a_i ∈ V and additive noise ε_i ∈ R that might
arise in the measurement process. Now the task is to reconstruct x from y and
(a_i)_{i∈[m]}. We wish the reconstruction to be efficiently implementable and guaranteed
to work for m being as small as possible.
In matrix notation we can rewrite (3.60) as
y = A(x) + ε ,     (3.61)
where the i-th component of the measurement map A is A(x)_i := ⟨a_i, x⟩. For a
simplified setting with m ≥ n and without noise (ε = 0) we know from linear algebra
that the inverse problem (3.61) has a unique solution for all signals x iff A is injective. For
m < n, however, A is given by a short and fat matrix, which cannot be injective.
Compressed sensing exploits a known compressibility of x in order to still prac-
tically solve the inverse problem (3.61) for several simple forms of compressibility,
the simplest one being sparsity. Indeed, signals that are sparse in a known basis (or
frame) have received the most attention [10]. Let us denote by ‖x‖_{ℓ0} := |{i : x_i ≠ 0}|
the sparsity or ℓ0-norm (which is not a norm) of a signal x ∈ Rⁿ. Moreover, let us
consider again the noiseless case, i.e., ε = 0. If the inverse problem given by (3.61)
has a unique solution then the solution must be the minimizer of the optimization
problem
minimize kzk`0 subject to A(z) = y . (3.62)
While this reconstruction is clearly optimal in terms of the required number of mea-
surements m, it cannot be implemented efficiently. Indeed, even for any fixed η ≥ 0,
the more general problem
minimize ‖z‖_{ℓ0}  subject to  ‖A(z) − y‖_{ℓ2} ≤ η     (3.63)
is NP-hard, as can be proven by a reduction from the exact cover by 3-sets problem [42]
(see also [10, Theorem 2.17]). In fact, the NP-hardness still holds when the ℓ0-norm
is replaced by an ℓq-norm for any q ∈ [0, 1) [43] (which are quasi-norms).
We note that these hardness results extend to matrices, where the sparsity is re-
placed by the matrix rank, i.e., where the recovery problem is of the form
minimize rank(Z)  subject to  ‖A(Z) − y‖_{ℓ2} ≤ η .     (3.64)
Compressed sensing makes use of this observation. In typical settings, the measure-
ments A are drawn at random, so that hard instances practically do not occur. The
function that measures the compressibility of the signal is replaced by a tractable
convex function. In particular, for the mentioned examples, the sparsity is replaced
by the ℓ1-norm and the matrix rank by the trace norm, i.e., the problems (3.63) and
(3.64) are replaced by
minimize ‖z‖_{ℓ1}  subject to  ‖A(z) − y‖_{ℓ2} ≤ η     (3.65)
and
minimize ‖Z‖₁  subject to  ‖M(Z) − y‖_{ℓ2} ≤ η ,     (3.66)
respectively,
and the latter constraints can be rewritten as a PSD constraint using (2.60) as
|w_i| ≤ λ_i  ⇔  ( λ_i  w_i ; w_i*  λ_i ) ⪰ 0 .     (3.70)
There are several variations of the reconstruction (3.66). For instance, there is the
matrix Dantzig selector where the measured matrix is estimated by the minimizer of
minimize ‖Z‖₁  subject to  ‖M†(M(Z) − y)‖_op ≤ λ ,     (3.71)
where λ depends on the noise strength as it needs to satisfy λ ≥ ‖M†(ε)‖_op. The ma-
trix Lasso (least absolute shrinkage and selection operator) estimate is obtained as the
optimal point of
minimize μ ‖Z‖₁ + ½ ‖M(Z) − y‖²_{ℓ2} ,     (3.72)
where μ ≥ 2 ‖M†(ε)‖_op. These optimization problems can all be written as SDPs and
yield very similar solutions.
There are also computationally more efficient algorithms to solve these convex op-
timization problems. For instance, one can use singular value thresholding, which is
an iterative algorithm where one iteratively (i) updates the matrix by adding a gradient
step η M†(y − M(Z)) for some step size η and (ii) shrinks the singular values to keep it approximately low
rank. In fact, such algorithms are an instance of the alternating direction method of
multipliers (ADMM), see e.g. [47], where one solves a class of convex optimization
problems including (3.66) iteratively. These algorithms are relatively fast and also
allow for rigorous performance guarantees. Another direction is to use non-convex
optimization methods to directly approximately solve the rank minimization problem
(3.64). These methods are even faster and often need even fewer measurements, but
rigorous guarantees are very rare.
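The singular value thresholding idea can be sketched in a few lines of numpy. This is a minimal proximal-gradient version of the matrix Lasso with a random Gaussian measurement map; the dimensions, step size, regularization strength, and iteration count are illustrative choices, not taken from the references:

```python
import numpy as np

def shrink(M, tau):
    """Soft-threshold the singular values of M (the prox of tau * trace norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(3)
n, m = 6, 30
u = rng.normal(size=n)
X_true = np.outer(u, u) / (u @ u)              # rank-1 target matrix

A = rng.normal(size=(m, n * n))                # random Gaussian measurement map
y = A @ X_true.reshape(-1)                     # noiseless data, m < n^2

eta = 1.0 / np.linalg.norm(A, 2) ** 2          # step size 1/L with L = ||A||_op^2
mu = 1e-3                                      # weak trace norm regularization
Z = np.zeros((n, n))
for _ in range(3000):
    grad = (A.T @ (A @ Z.reshape(-1) - y)).reshape(n, n)
    Z = shrink(Z - eta * grad, eta * mu)       # proximal gradient step
residual = np.linalg.norm(A @ Z.reshape(-1) - y)
```

The shrinkage step keeps the iterates approximately low rank while the gradient step drives the data residual down.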
Finally, let us summarize what a recovery guarantee for low rank matrix recon-
struction exactly is. Typical cases are covered by the following definition.
What is a recovery guarantee?
M, Reference | m₀ | h(η) | Remarks/properties | Proof technique
Random Paulis, [50] | O(r d ln(d)²) | O(η √(r d)) | — | Golfing scheme [51]
Random Paulis, [55] | O(r d ln(d)⁶) | O(r η), where η ≥ ‖M†(ε)‖_F | Uniform, robust | RIP [56]
rank(M_i) = 1, ‖M_i‖_op ≲ d, (approx.) proj. 4-designs, [48] | O(r d ln(d)) | O(η/√m), p = 2 | Uniform | Bowling scheme [46]
rank(M_i) = 1, ‖M_i‖_op ≲ d, (approx.) proj. 4-designs, [54] | O(r d ln(d)) | O(r^{1/p−1/2} ‖ε‖_{ℓ2}/√m) | PSD-fit, uniform, robust | NSP [54]
Random Clifford orbits, [57] | O(r³ d ln(d)) | O(r² d ‖ε‖_{ℓq}/m^{1/q}) | PSD-fit, uniform, robust | NSP [54], Clifford irreps [58]
Table 3.1.: List of recovery guarantees for compressed sensing based quantum state tomography.
“Random Paulis” refers to measurements of Pauli string observables. A random Clifford orbit is
a set of states obtained by applying all Clifford group operations to a certain fixed state.
There are several proof techniques for low-rank matrix reconstruction: The Golfing scheme relies
on dual certificates, the restricted isometry property quantifies the “distortion” of the relevant
signals under the measurement map, the Bowling scheme is a combination of geometric proof
techniques [44] and Mendelson’s small ball method [59], and the (stable and robust rank) null
space property (NSP) is a property of the measurement map that allows to quantify the robustness
of a reconstruction against violations of the low-rank assumption.
which was suggested by Baldwin et al. [53] and made rigorous by Kabanava et al. [54].
Besides having a slightly improved performance the main advantage of this reconstruc-
tion method is that it does not require an estimate on the noise strength.
For an overview of recovery guarantees for convex compressed sensing methods in
state tomography see Table 3.1.
where ρ̂LS = M+ (y) and M+ denotes the pseudo-inverse of M.
Theorem 3.21 (Error bounds for PLS [37]):
We will outline the proof only for the case of 2-design based POVMs in Section 3.7.1.
The theorem implies that with probability at least 1 − δ the PLS estimation
yields an estimate of ρ within trace norm error ε whenever the number of measure-
ments is
n_ρ ≥ 43 g(d) rank(ρ)² ln(d/δ) / ε² .     (3.78)
The PLS estimation (3.76) can be written as an SDP, which can be seen with
the machinery of Section 2.5.2. However, the PLS estimator allows for an analytic
solution (up to one parameter), which allows to compute it much faster than the SDP
runtime.
Proposition 3.22 (State space projection [61], version of [37]):
chosen such that Tr[ρ] = 1. Moreover, x₀ is the root of the function f defined
byᵃ
f(x) := Σ_{i=1}^d |λ_i − x| − d · x − Tr[X] .     (3.81)
i=1
a We obtained a slightly different function than in [37, ArXiv version v1].
Proof. Exercise.
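A minimal numpy sketch of such a state-space projection: here we project in Frobenius norm by projecting the eigenvalue vector onto the probability simplex, which has the same eigenvalue-thresholding structure as (3.80); the threshold search below is the standard simplex-projection routine, not the exact function f from [37]:

```python
import numpy as np

def project_to_states(X):
    """Project a Hermitian matrix onto the density matrices (Frobenius norm):
    keep the eigenbasis and threshold the eigenvalues at a level x0 chosen
    such that the projected eigenvalues are nonnegative and sum to one."""
    lam, U = np.linalg.eigh(X)
    lam, U = lam[::-1], U[:, ::-1]              # sort eigenvalues descending
    csum = np.cumsum(lam)
    ks = np.arange(1, len(lam) + 1)
    k = np.max(ks[lam > (csum - 1) / ks])       # largest feasible support size
    x0 = (csum[k - 1] - 1) / k                  # threshold value
    mu = np.maximum(lam - x0, 0.0)              # thresholded eigenvalues
    return U @ np.diag(mu) @ U.conj().T

G = np.array([[1.2, 0.3], [0.3, -0.4]])         # Hermitian, but not a state
rho_p = project_to_states(G)
```

The output is a valid density matrix, and matrices that are already states are left unchanged.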
that the linear inversion on the measurement data is given by M+ with the closed
form expression (3.52).
We note that M is a single POVM and we take the empirical frequencies
y_i = n_i/n_ρ ,     (3.82)
where n_i is the number of times outcome i was observed among the n_ρ measurements.
For measurements (3.82) set ρ̂LS := M+ (y) with the pseudo-inverse M+ from
(3.52). Then
P[‖ρ̂_LS − ρ‖_op ≥ τ] ≤ d exp(−3τ² n_ρ/(16d))     (3.84)
for all τ ∈ [0, 2].
For the proof we use a matrix version of the Bernstein inequality from Theorem 2.11.
Theorem 3.24 (Matrix Bernstein inequality [62, Theorem 1.4]):
Let X₁, . . . , X_ℓ ∈ Herm(C^d) be independent random matrices with E[X_i] = 0
and ‖X_i‖_op ≤ a almost surely. Set
σ² := ‖Σ_{i=1}^ℓ E[X_i²]‖_op .     (3.86)
Then, for all τ ≥ 0,
P[‖Σ_{i=1}^ℓ X_i‖_op ≥ τ] ≤ d exp(−(τ²/2)/(σ² + aτ/3)) .     (3.87)
Proof of Lemma 3.23. From Tr[ρ] = 1 and M being a single POVM it follows that
Σ_{i=1}^m y_i = 1. Hence, the pseudo-inverse of the measurement operator (3.52) becomes
M⁺(y) = (d + 1) Σ_{i=1}^m y_i |ψ_i⟩⟨ψ_i| − (Σ_{i=1}^m y_i) 1
      = Σ_{i=1}^m (n_i/n_ρ) ((d + 1) |ψ_i⟩⟨ψ_i| − 1)     (3.88)
      = (1/n_ρ) Σ_{k=1}^{n_ρ} Y_k ,
where {Y_k}_{k∈[n_ρ]} are iid. copies of a random matrix Y corresponding to the mea-
surement outcome of the POVM M: Y is (d + 1)|ψ_i⟩⟨ψ_i| − 1 with probability
P[i] = Tr[M_i ρ] = (d/m) ⟨ψ_i|ρ|ψ_i⟩. By construction, it holds that E[Y] = E[M⁺(y)].
Since M is informationally complete we have
E[M⁺(y)] = M⁺(E[y]) = M⁺(M(ρ)) = ρ ,     (3.89)
i.e., the linear inversion estimator ρ̂_LS = M⁺(y) is unbiased, which implies that
E[Y] = ρ. Together, we have an error term
ρ̂_LS − ρ = Σ_{k=1}^{n_ρ} (1/n_ρ) (Y_k − E[Y_k]) .     (3.90)
Now we wish to apply the matrix Bernstein inequality from Theorem 3.24 with
X_k = (1/n_ρ)(Y_k − E[Y_k]). So we calculate
(1/n_ρ) ‖Y − E[Y]‖_op ≤ (1/n_ρ) max_{i∈[m]} ‖(d + 1)|ψ_i⟩⟨ψ_i| − 1 − ρ‖_op
                      ≤ (1/n_ρ) max_{i∈[m]} ‖d |ψ_i⟩⟨ψ_i| − ((1 − |ψ_i⟩⟨ψ_i|) + ρ)‖_op     (3.91)
                      ≤ d/n_ρ =: a ,
where we have used that 0 ⪯ d|ψ_i⟩⟨ψ_i| ⪯ d·1 and 0 ⪯ (1 − |ψ_i⟩⟨ψ_i|) + ρ ⪯ 2·1.
We note that (d + 1)² − 2(d + 1) = d² − 1, remember the frame operator (3.44), and
use Lemma 3.19 to obtain
E[(Y − E[Y])²] = E[Y²] − E[Y]² = Σ_{i=1}^m P[i] ((d + 1)|ψ_i⟩⟨ψ_i| − 1)² − ρ²
              = (d² − 1) Σ_{i=1}^m (d/m) ⟨ψ_i|ρ|ψ_i⟩ |ψ_i⟩⟨ψ_i| + 1 − ρ²
              = (d² − 1) (m/d) S_M(ρ) + 1 − ρ²
              = (d² − 1) (1/(d + 1)) (ρ + 1) + 1 − ρ²
              = (d − 1)ρ + d·1 − ρ² .     (3.93)
This leads to
σ² := ‖Σ_{k=1}^{n_ρ} E[(1/n_ρ)² (Y_k − E[Y_k])²]‖_op
    = (1/n_ρ) ‖E[Y²] − E[Y]²‖_op     (3.94)
    = (1/n_ρ) ‖(d − 1)ρ + d·1 − ρ²‖_op
    ≤ (2d − 1)/n_ρ .
We note that for τ ∈ [0, 2] we have
(τ²/2)/(σ² + aτ/3) ≥ (n_ρ τ²/2)/(2d − 1 + 2d/3) ≥ 3τ² n_ρ/(16d) ,     (3.95)
so that Theorem 3.24 yields the claimed bound (3.84).
Lemma 3.23 tells us that a number of copies of
n_ρ ≥ (16/3) d ln(d/δ)/τ²     (3.96)
is sufficient to reconstruct any quantum state with error bounded by τ in spectral
norm with probability at least 1 − δ. However, the distinguishability of quantum
states is given by the trace norm, see Proposition 4.8.
The bound (2.7) implies that the reconstruction error is bounded in trace norm
as
‖ρ̂_LS − ρ‖₁ ≤ rank(ρ̂_LS − ρ) τ =: ε .     (3.97)
The estimate ρ̂LS carries a reconstruction error due to the statistical estimation error.
It can be expected that the reconstruction error is quite isotropically distributed in
Herm(Cd ). Hence, one would expect that rank(ρ̂LS − ρ) = d with high probability,
since low rank matrices are a zero set in Herm(Cd ). This only leads to a sample
complexity of nρ ∈ Õ(d3 /ε2 ).
Now let us assume that ρ is of low rank r. If the reconstruction ρ̂LS is close to ρ
then ρ̂_LS is approximately of low rank. For any operator X ∈ L(C^d) we denote by X_r
the best rank-r approximation of X in trace norm (and, in fact, in all Schatten p-norms for
p ∈ [1, ∞)). The error of this approximation is
σ_r(X) := min_{rank(Z)≤r} ‖X − Z‖₁ = ‖X − X_r‖₁     (3.98)
and the minimizer is the approximation X_r. Note that σ_r(X) and X_r can be calculated
using a singular value decomposition.
The low rank of ρ can be exploited as follows.
Lemma 3.25 (Approximate rank [37, Appendix, Section VIII]):
Let ρ ∈ S(C^d) and ρ̂ ∈ Herm(C^d) with Tr[ρ̂] = 1 such that ‖ρ̂ − ρ‖_op ≤ τ for
some τ ≥ 0. Then the projection ρ̂_P of ρ̂ onto the density matrices satisfies
‖ρ̂_P − ρ̂‖₁ ≤ 4rτ + 2 min{σ_r(ρ), σ_r(ρ̂)}     (3.100)
for all r ∈ Z₊.
Proof sketch. The proof works in two steps: (i) the threshold value x₀ in
the density matrix projection (3.80) satisfies x₀ ∈ [0, τ]; (ii) for quantum states
ρ₁, ρ₂ ∈ S(C^d) the bound (3.101)
holds for all r ∈ Z₊. These two statements are relatively straightforward to prove,
see [37, Appendix, Section VIII].
This leads to the following refined version of the PLS guarantee (Theorem 3.21) for
2-design based POVMs.
Theorem 3.26 (PLS for 2-design based POVMs [37]):
Let ρ ∈ S(Cd ) be a state, fix a total number of 2-design based POVM mea-
surements (3.41) of ρ to be nρ .
Then, for any r ∈ Z₊ and ε ∈ [0, 1], the PLS estimator (3.76) satisfies
P[‖ρ̂_PLS − ρ‖₁ ≥ ε + 2 σ_r(ρ)] ≤ d exp(−3 n_ρ ε²/(256 d r²)) .     (3.102)
yields
kρ̂PLS − ρk1 ≤ ε + 2 min{σr (ρ̂PLS ), σr (ρ)} . (3.104)
We note that Theorem 3.26 exploits an approximate rank of the measured state.
This means that for an increasing number of measurements n_ρ the PLS estimate
correctly captures an increasing number of eigenvectors of the measured state ρ.
where the infimum is taken over ℓ admissible measurements M^{(i)} (the POVM might be
different in every trial) and all estimators ρ̂_ℓ taking the measurement outputs y ∈ Rⁿ,
where each y_i is an iid. sample taken from the measurement outcome probabilities
(Tr[M_j^{(i)} ρ])_j [55]. This minimax risk is the failure probability of the best tomographic
procedure.
Theorem 3.27 (Pauli observable measurements [55, Theorem 6]):
The theorem tells us that the scaling in r and d of the number of measurements in
Theorem 3.21(ii) is optimal up to ln(d)-factors.
With similar ideas one can derive lower bounds for the case of a single large POVM
acting on ℓ copies of the state, i.e., on ρ^{⊗ℓ}. Note that this setting includes parallel and
interactive measurements. One can prove a lower bound on the number of measure-
ments ℓ required for tomography scaling as [19]
ℓ ∈ Ω( d r (1 − ε)² / (ε² ln[d/(rε)]) ) ,     (3.107)
where the implicit constant depends on a confidence parameter δ. This bound implies
that also the scaling of the number of measurements in Theorem 3.21(i) is optimal up to
ln(d)-factors. In particular, the global measurements on ρ^{⊗ℓ} do not yield an improved
scaling compared to sequential measurements.
Σ_{i=1}^m (f_i/p_i(ρ)) M_i A† = λ A† .     (3.110)
Hint: Use ln(1 + x) = x + O(x2 ) for small x.
2. We define the operator R_ρ := Σ_{i=1}^m (f_i/p_i(ρ)) M_i. Use Eq. (3.110)
to show that λ = 1 and hence R_ρ ρ = ρ, as well as R_ρ ρ R_ρ = ρ.
The last result Rρ ρRρ = ρ can be used to iteratively find the maximum via the
update rule
ρ_{k+1} = (1/N) R_{ρ_k} ρ_k R_{ρ_k} ,     (3.111)
where N is a normalization constant ensuring Tr[ρ_{k+1}] = 1.
3. Implement MLE numerically using the above iteration rule for the target
state |ψ i and the Pauli basis measurements from Section 3.5.2.1. Plot
the reconstruction error kρ̂ − |ψ ihψ |k1 and the log likelihood function for
m ∈ [100].
4. Repeat MLE for measurements with added Gaussian noise and plot the
reconstruction error over the noise strength η ∈ [0, 0.1] for m = 100.
5. Repeat this exercise for linear inversion (Section 3.2), compressed sens-
ing based reconstructions (Section 3.6), and projected least squares (Sec-
tion 3.7). Compare the results.
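The RρR update rule can be sketched for a single qubit with the six-outcome 2-design POVM from Section 3.4 (the target state, data, and iteration count below are illustrative; with exact outcome probabilities as data, the iteration converges to the measured state):

```python
import numpy as np

s = 1 / np.sqrt(2)
kets = np.array([[1, 0], [0, 1], [s, s], [s, -s], [s, 1j * s], [s, -1j * s]])
povm = [np.outer(k, k.conj()) / 3 for k in kets]   # M_i = |psi_i><psi_i| d/m

sigma = np.array([[0.7, 0.2], [0.2, 0.3]])          # "true" prepared state
f = np.array([np.trace(M @ sigma).real for M in povm])  # ideal frequencies

rho = np.eye(2, dtype=complex) / 2                  # maximally mixed start
for _ in range(1000):
    p = np.array([np.trace(M @ rho).real for M in povm])
    R = sum(fi / pi * M for fi, pi, M in zip(f, p, povm))  # R_rho operator
    rho = R @ rho @ R                               # update rule (3.111)
    rho /= np.trace(rho).real                       # normalization N
```

With noiseless frequencies the fixed point of the iteration is the measured state σ.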
This exercise shows that the numerical implementation of MLE can have a relatively
slow convergence. However, there are much faster implementations [67] based on more
elaborate gradient ascent methods.
For an experimental comparison of MLE, least squares estimation (a different ver-
sion than considered here), and linear inversion see the work by Schwemmer et al.
[68]. They also show the following negative result.
Proposition 3.28 (Biases in QST [68]):
Hence, quantum state tomography methods using the density matrix structure
allow for trace norm error guarantees with better scalings in the dimension (see Sec-
tion 3.7) but come at the cost of being biased. Projected least squares estimation [37]
provides a way to obtain an unbiased estimate of a quantum state so that its nearest
density operator has a close to optimal trace norm error bound.
State certification is the task of making sure that a quantum state σ prepared in an
experiment is a sufficiently good approximation of a target state ρ. More precisely,
we make the following definitions.
Definition 4.1 (Quantum state certification):
In hypothesis testing one has a null hypothesis H0 (usually the one one hopes to
disprove) and an alternative hypothesis H1 and one needs to figure out which is true
based on statistical data. In this setting, there are two types of error,
conditions
σ = ρ ⇒ P[“accept”] ≥ 1 − δ ,     (4.5)
dist(ρ, σ) > ε ⇒ P[“reject”] ≥ 1 − δ ,     (4.6)
for all σ ∈ S(C^d), where δ = e^{−cN} and c > 0 is an absolute constant. The
parameter 1 − δ is also called the confidence of the test.
A certification test is only required to accept the target state. However, in practice,
such a test will accept states from some region around the target state with large prob-
ability. Such a property of a certification test is called robustness (against deviations
from the target state). One way such robustness can be guaranteed is by
estimating the distance between the targeted state ρ and the prepared state σ, as we will
see in Section 4.1 on fidelity estimation, which bounds the distance. In this way,
one obtains more information (a distance) than just certification (just “accept” or
“reject”).
Clearly, one can also certify through full quantum state tomography. However, the
number of single sequential measurements in general required for tomography of a
state σ ∈ S(C^d) scales as Ω(d rank(ρ)) and as Ω(d² rank(ρ)²) in the case of two-outcome
Pauli string measurements [55]. So, for the relevant case of pure n-qubit states this
number scales at least as 2ⁿ. This measurement effort becomes infeasible already for
relatively moderate n.
We will see that fidelity estimation can work with dramatically fewer measurements
than full tomography, when the target state has additional structure. In many situ-
ations, certification can work with even fewer measurements than fidelity estimation
due to an improved -dependence in the sample complexity.
Note that
‖√ρ √σ‖₁ = Tr[(√ρ σ √ρ)^{1/2}] .     (4.8)
The closeness measured by the fidelity is equivalent to the trace norm distance (see
Exercise 2.2.4) as captured by the Fuchs–van de Graaf inequalities [73, Theorem 1],
1 − √F(ρ, σ) ≤ ½ ‖ρ − σ‖₁ ≤ √(1 − F(ρ, σ)) .     (4.9)
Moreover, the fidelity is symmetric, i.e., F(ρ, σ) = F(σ, ρ) for all ρ, σ.
When at least one of the states ρ and σ is pure, say ρ = |ψ⟩⟨ψ|, then
F(ρ, σ) = ⟨ψ|σ|ψ⟩ = Tr[ρσ] ,     (4.10)
which can easily be proven using (4.8). We will mostly encounter that case, i.e., where
one of the states is pure.
Indeed, in direct fidelity estimation (DFE) [63, 74] one has a target state ρ ∈ S(C^d)
and assumes to be given iid. state preparations of some state σ ∈ S(C^d). The goal
is to estimate the fidelity¹ Tr[σρ] for the case where ρ is a pure state, i.e., of the
form ρ = |ψ⟩⟨ψ|. This estimation is then solved using Monte Carlo methods; see
Section 2.4 for the relevant tools.
¹ Some authors define the fidelity just as ‖√ρ √σ‖₁ (without the square).
For general direct fidelity estimation we fix a finite tight frame {Mλ }λ∈Λ ⊂ Herm(Cd )
with frame constant A; see Section 3.3 for an introduction to frame theory. By
{M̃λ }λ∈Λ ⊂ Herm(Cd ) we denote its dual frame, which has 1/A as frame constant
(Proposition 3.9). We define the maximum norm of the frame as C := maxλ∈Λ kMλ kop
and observe that, due to Hölder’s inequality (2.8),
Traditionally [63, 74], the frame (and also the dual frame) is taken to be the
normalized n-qubit Pauli basis {2^{−n/2} σ_{s₁} ⊗ · · · ⊗ σ_{sₙ}}_{s∈{0,1,2,3}ⁿ}, which is an or-
thonormal basis. But it has proven useful to consider more general frames
for quantum information tasks [75] and we will follow this trend. In general, Λ can
be a continuous set but we assume it to be finite here.
Given any operator σ ∈ Herm(C^d) we define its W-function (sometimes called
discrete Wigner function or quasi-probability distribution) and W̃-function W_σ, W̃_σ :
Λ → R by
W_σ(λ) := Tr[M_λ σ] ,  W̃_σ(λ) := Tr[M̃_λ σ] .     (4.12)
This allows us to write (see the frame expansion (3.21))
Tr[ρσ] = Tr[ρ Σ_{λ∈Λ} W_σ(λ) M̃_λ] = Σ_{λ∈Λ} Tr[ρ M̃_λ] W_σ(λ)
       = Σ_{λ∈Λ} W̃_ρ(λ) W_σ(λ)     (4.13)
for ρ, σ ∈ Herm(C^d).
Now, we will use importance sampling from Monte Carlo integration (see Sec-
tion 2.4) to estimate the sum (4.13) for the case where the target state ρ ∈ S(Cd ) is
a pure state and the prepared states σ ∈ S(Cd ) are arbitrary. For this purpose we
rewrite the overlap (4.13) as
    Tr[ρσ] = ∑_{λ∈Λ} A W̃ρ(λ)² · Wσ(λ)/(A W̃ρ(λ))                 (4.14)
and define
    qλ := A W̃ρ(λ)²                                                (4.15)
as importance sampling distribution on the sampling space Λ, where 1/A is the frame
constant of {M̃λ }, which we now argue to be the right normalization constant. The
tight frame condition for ρ can be written as
    ∑_{λ∈Λ} W̃ρ(λ)² = ∑_{λ∈Λ} |hM̃λ , ρi|² = hρ, ρi/A .            (4.16)
For ρ being a pure state, i.e., hρ, ρi = Tr[ρ2 ] = 1 (see Exercise 2.3), we indeed obtain
    ∑_{λ∈Λ} qλ = 1 .                                               (4.17)
Furthermore, we define the random variable

    Xλ := Wσ(λ)/(A W̃ρ(λ)) .                                       (4.18)
We will exploit that Xλ with λ ∼ q is an unbiased estimator of the fidelity:
    E_{λ∼q}[Xλ] = ∑_{λ∈Λ} qλ · Wσ(λ)/(A W̃ρ(λ)) = ∑_{λ∈Λ} W̃ρ(λ) Wσ(λ) = Tr[ρσ] ,   (4.19)
where the last identity is again Eq. (4.13). Next, we take the empirical estimate of X
from ℓ samples,

    Y := (1/ℓ) ∑_{i=1}^{ℓ} X_{λi} ,                               (4.20)

where the λi are iid. samples drawn from q and X_{λi} is given by (4.18). This is also
an unbiased estimator of Tr[ρσ], the
precision of which we can control by increasing ℓ. In order to bound the confidence
that we have an estimation error |Y − Tr[ρσ]| ≤ ε for some desired ε > 0 we need to
find a maximum failure probability δ so that the tail bound

    P[ |Y − Tr[ρσ]| ≥ ε ] ≤ δ                                      (4.21)

is satisfied for some ε, δ > 0 controlled by ℓ; see Figure 2.1 for the idea of tail bounds.
Then we have an ε-good estimation of Tr[ρσ] with confidence 1 − δ.
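To make the estimator concrete, here is a minimal numerical sketch of this importance sampling scheme for a single qubit, using the normalized Pauli basis as the (self-dual) tight frame; the target state, noise level, and sample count are arbitrary example choices.

```python
import numpy as np

# Normalized single-qubit Pauli frame M_s = sigma_s / sqrt(2): an orthonormal
# basis, hence a tight frame with frame constant A = 1 and dual frame itself.
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y_op = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
frame = [P / np.sqrt(2) for P in (I, X, Y_op, Z)]
A = 1.0

def W(rho):
    # W-function: overlaps with the frame elements, Eq. (4.12)
    return np.array([np.trace(M @ rho).real for M in frame])

psi = np.array([1.0, 1.0]) / np.sqrt(2)            # target |+>
rho = np.outer(psi, psi.conj())
sigma = 0.9 * rho + 0.1 * np.eye(2) / 2            # noisy preparation

Wrho, Wsig = W(rho), W(sigma)
q = A * Wrho**2                                    # importance distribution (4.15)
support = q > 1e-12

# exact unbiasedness check: E_q[X_lambda] = Tr[rho sigma], Eq. (4.19)
Xlam = Wsig[support] / (A * Wrho[support])
fidelity_exact = np.trace(rho @ sigma).real
assert np.isclose((q[support] * Xlam).sum(), fidelity_exact)

# Monte Carlo estimate Y from many samples, Eq. (4.20)
rng = np.random.default_rng(0)
idx = rng.choice(np.where(support)[0], size=20000, p=q[support] / q[support].sum())
Y = (Wsig[idx] / (A * Wrho[idx])).mean()
```

Here the dual frame coincides with the frame because the Pauli basis is orthonormal; for a general tight frame one would use the dual-frame function W̃ρ in the denominator.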
However, we also need to take into account the error of estimating Xλ from finitely
many single measurements. For this purpose we will use an estimator Ŷ of the
estimator Y , which uses finitely many measurements, and derive a tail bound of the
form

    P[ |Ŷ − Y| ≥ ε ] ≤ δ .                                         (4.22)
Let {Mλ }λ∈Λ be a finite tight frame for Herm(Cd ) satisfying kMλ kop ≤ C for
all λ ∈ Λ and some constant C. Denote by A the frame constant of {Mλ } and
by {M̃λ }λ∈Λ the canonical dual frame.
Let ρ ∈ S(Cd ) be a target state with respect to which we wish to estimate the
fidelity from measurements of the observables {Mλ } and let ε > 0 and δ > 0
be the parameters for the desired estimation accuracy and maximum failure
probability.
The protocol consists of the following steps applied to nσ iid. copies of a pre-
pared state σ ∈ S(Cd ):
(i) Take iid. samples λ1 , . . . , λℓ ∼ q from the importance sampling distribu-
    tion (4.15), where ℓ := ⌈1/(ε²δ)⌉ (or as (4.44) for well-conditioned states).
(ii) Measure each Mλi a number of mi times for i ∈ [ℓ], with mi chosen as

    mi := ⌈ 2C² ln(2/δ) / (ℓ A² W̃ρ(λi)² ε²) ⌉ .                   (4.23)
Theorem 4.3 (Guarantee for DFE, frame version of [63]):

    E[nσ] = E[ ∑_{i=1}^{ℓ} mi ] ≤ 1 + 1/(ε²δ) + (2C²|Λ|)/(A ε²) · ln(2/δ) .   (4.25)
= Tr[σ 2 ] − Tr[ρσ]2 ,
where we have used the frame condition (A = B in (3.16)) in the last step. Hence,
with {Pλ,α } being the projectors onto the eigenspaces and {aλ,α } ⊂ [−‖Mλ‖op , ‖Mλ‖op ]
the eigenvalues of Mλ . We note that the expected measurement outcome is
Denote by aλi ,αj the measurement outcome of measurement j ∈ [mi ] and consider
the following corresponding empirical estimate of Xλi (see (4.18)):

    X̂λi := (1/(mi A W̃ρ(λi))) ∑_{j=1}^{mi} aλi,αj .                (4.33)
Then the estimation error of the empirical estimator Ŷ of Y from (4.20) becomes

    Ŷ − Y = (1/ℓ) ∑_{i=1}^{ℓ} (X̂λi − Xλi)
          = (1/ℓ) ∑_{i=1}^{ℓ} ∑_{j=1}^{mi} (1/(mi A W̃ρ(λi))) (aλi,αj − Wσ(λi)) .   (4.34)
Using the bound (4.11) and Hoeffding's inequality (2.47) with t = ℓε and

    b_{i,j} = −a_{i,j} = C/(mi A W̃ρ(λi))                           (4.35)
we find that (w.l.o.g. we assume that there is no i with W̃ρ(λi) = 0)

    P[ |Ŷ − Y| ≥ ε ] ≤ 2 exp( −ε² / [ (1/ℓ) ∑_{i=1}^{ℓ} 2C²/(ℓ mi A² W̃ρ(λi)²) ] ) .   (4.36)
We wish that the tail bound (4.22) holds. Therefore, we impose the RHS of (4.36) to
be bounded by δ, which is equivalent to
    ln(2/δ) ≤ ε² / [ (1/ℓ) ∑_{i=1}^{ℓ} 2C²/(ℓ mi A² W̃ρ(λi)²) ] .  (4.37)
The choice of mi as in (4.23) guarantees that this bound is always satisfied, i.e., that
the desired tail bound (4.22) holds. Then the combination of the tail bounds (4.21)
and (4.22) with the union bound (2.12) proves the confidence statement (4.24).
In order to calculate the final sample complexity (4.25) note that mi is a random
variable itself, since W̃ρ (λi ) was randomly chosen. By the definition of the sampling
(4.15), we have

    E[mi] = ∑_{λi∈Λ} mi qλi ≤ 1 + (2C²|Λ|)/(ℓ A ε²) · ln(2/δ) ,    (4.38)
where the +1 comes from the ceiling in (4.23). Using the bound (4.30) on ℓ, the
expected total number of measurements, i.e. the expected sample complexity, is

    E[ ∑_{i=1}^{ℓ} mi ] ≤ 1 + 1/(ε²δ) + (2C²|Λ|)/(A ε²) · ln(2/δ) .   (4.39)
Example 4.4 (Pauli measurements):
    E[nσ] ≤ 1 + 1/(ε²δ) + (2d/ε²) · ln(2/δ) ,                      (4.42)
which is consistent with its original version [63].
Note that the sample complexity scales linearly in the Hilbert space dimension
for the case of Pauli measurements. In contrast, the number of Pauli measurements
required for state tomography scales as Ω̃(d2 rank(σ)2 ) [55].
The main contribution to the number of measurements in the derivation of the sam-
ple complexity above can be traced back to the application of Chebyshev's inequality
in (4.29). This step can, however, be improved for the following class of states.
Definition 4.5 (Well-conditioned states):
For example, if the frame {M̃λ } is the dual frame of the Pauli strings, {M̃s } =
{2^(−n) σs1 ⊗ · · · ⊗ σsn }, then each stabilizer state ρ (3.33) with stabilizer group S on
n qubits is well-conditioned with parameter α̃ = 2^(−n) ≡ 1/d, as

    Tr[M̃s ρ] = 2^(−n) ∑_{S∈S} Tr[M̃s S] = 2^(−n) δ_{σs∈S} .        (4.43)
Proof. With probability one we have W̃ρ(λi) ≥ α̃ for all i ∈ [ℓ]. Moreover, |Ŵσ(λi)| ≤
C. The estimator from Step (iv) of Protocol 4.2 is bounded as

    |Xλi | ≤ C/(A α̃)                                               (4.46)
with probability 1.
Hence, the estimator Ŷ is also bounded as |Ŷ | ≤ C/(A α̃) almost surely. Hoeffding's
inequality (2.47) with t = ℓε yields

    P[ |Ŷ − Tr[ρσ]| ≥ ε ] ≤ 2 exp( −ℓ A²α̃²ε²/(2C²) ) .            (4.47)
Imposing

    2 exp( −ℓ A²α̃²ε²/(2C²) ) ≤ δ                                   (4.48)

and solving for ℓ yields (4.44).
This theorem tells us that direct fidelity estimation has a smaller sampling complexity
for well-conditioned states. For instance, well-conditioning in the Pauli basis leads to
a constant sample complexity:
Example 4.7 (Pauli measurements and well-conditioned states):
Consider the Pauli observable measurements from Example 4.4 and a pure
state ρ that is well-conditioned with parameter α̃ = α/d, where α > 0 is some
constant. This well-conditioning is equivalent to
    Tr[σs1 ⊗ · · · ⊗ σsn ρ] ≥ α  or  = 0 ,  for all s ∈ {0, 1, 2, 3}^n .   (4.49)

Then the total number of samples satisfies

    nσ ≤ 1 + 2 ln(2/δ)/(α²ε²) .                                    (4.50)
Examples of well-conditioned states are stabilizer states, which are easily seen to
be well-conditioned with α = 1 using (3.33).
Fix parameters ε̃, ε, δ > 0 with ε̃ ≤ ½ε². Let Ŷ be the direct fidelity estimator of
the fidelity F(ρ, σ) so that |Ŷ − F(ρ, σ)| ≤ ε̃ with confidence 1 − δ. We consider
the protocol that accepts if Ŷ ≥ 1 − ε̃ and rejects otherwise. As distance we
choose the trace distance distTr defined by distTr(ρ, σ) := ½ ‖ρ − σ‖₁.
• Show that this protocol is an ε-certification test w.r.t. the trace distance
in the sense of Exercise 4.1, i.e., that the completeness and soundness
conditions are satisfied with confidence 1 − δ.
• What is the resulting sampling complexity of DFE in the well-conditioned
setting from Example 4.7?
• Let ε′ < ε. Turn this protocol into a robust (ε, ε′)-certification test, i.e.,
into an ε-certification test that is guaranteed to accept all states within
an ε′-trace norm ball around ρ with confidence 1 − δ.
Proposition 4.8 (Operational interpretation of the trace distance):

    distTr(ρ, σ) = sup{ Tr[P(ρ − σ)] : 0 ≤ P ≤ 1 }                 (4.51)

and the supremum is attained for the projector P⁺ := 1_{R+}(ρ − σ) onto the
positive part of ρ − σ.
Proof. First we show that the supremum is attained for P + . The self-adjoint operator
difference
ρ − σ = X+ − X− (4.52)
has a positive part X + ∈ Pos(Cd ) and a negative part X − ∈ Pos(Cd ). We note that
kX ± kop ≤ 1 and, since Tr[X + − X − ] = Tr[ρ − σ] = 0, we have Tr[X + ] = Tr[X − ].
Moreover, kρ − σk1 = Tr[X + ] + Tr[X − ]. The last two statements together yield that
the trace distance between the two states is
    ½ ‖ρ − σ‖₁ = Tr[X⁺] = Tr[P⁺(ρ − σ)] ,                          (4.53)
where P + is the orthogonal projector onto the support of X + and can be obtained
as P + = 1R+(ρ − σ) using spectral calculus (2.3), where 1R+ denotes the indicator
function of R+ , see (2.44).
In order to show that the supremum cannot become larger than the trace distance,
we consider some operator P with 0 ≤ P ≤ 1. Then, indeed,

    Tr[P(ρ − σ)] = Tr[P X⁺] − Tr[P X⁻] ≤ Tr[X⁺] = distTr(ρ, σ) .   (4.54)
This proposition means that the trace distance of two states is given by the max-
imum distinguishability by binary POVM measurements {P, 1 − P }. This distin-
guishability can be amplified by measuring iid. copies of a quantum state σ with
{P, 1 − P }. Next, we turn this basic insight into an ε-certification test of a pure state

    ρ = |ψ ihψ | ,                                                 (4.55)
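The attainment statement of Proposition 4.8 is easy to test numerically; the following sketch uses arbitrary randomly generated density matrices and checks that the projector onto the positive part of ρ − σ attains the trace distance while no other effect 0 ≤ P ≤ 1 exceeds it.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_state(d):
    # generic full-rank density matrix built from a Ginibre matrix
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

d = 4
rho, sigma = random_state(d), random_state(d)

diff = rho - sigma
evals, evecs = np.linalg.eigh(diff)
trace_dist = 0.5 * np.abs(evals).sum()         # dist_Tr = (1/2)||rho - sigma||_1

# projector P+ onto the positive part of rho - sigma
Vp = evecs[:, evals > 0]
P_plus = Vp @ Vp.conj().T
attained = np.trace(P_plus @ diff).real
assert np.isclose(attained, trace_dist)

# no other effect 0 <= P <= 1 does better
for _ in range(100):
    U = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))[0]
    P = U @ np.diag(rng.uniform(0, 1, size=d)) @ U.conj().T
    assert np.trace(P @ diff).real <= trace_dist + 1e-9
```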
Protocol 4.9 (Naive direct quantum state certification):
Let ρ ∈ S(Cd ) be a pure target state and Ω ∈ Pos(Cd ) with kΩkop ≤ 1. Denote
by {Ω, 1 − Ω} the binary POVM given by Ω, call the outcome corresponding
to Ω “pass” and the one of 1 − Ω “fail”.
For state preparations σ1 , . . . , σnσ ∈ S(Cd ) the protocol consists of the follow-
ing steps.
1: for i = 1, . . . , nσ do
2: measure σi with {Ω, 1 − Ω}
3: if the outcome is “fail” then:
4: output “reject”
5: end protocol (break)
6: end if
7: end for
8: output “accept”
Let ρ ∈ S(Cd ) be a pure target state, let ε, δ > 0, and consider the distance
measure on quantum states given by the infidelity 1 − F(ρ, σ). The
test from Protocol 4.9 with Ω = ρ is an ε-certification test w.r.t. the infidelity
from nσ independent samples for

    nσ ≥ ln(1/δ)/ε                                                 (4.57)

with confidence at least 1 − δ according to Definition 4.1. Moreover, the protocol
accepts the target state ρ with probability one.
Clearly, if σi = ρ for all i ∈ [nσ ] then the protocol accepts almost surely. Now let
us consider the case that the fidelity is small, i.e., Tr[ρσi ] ≤ 1 − ε for all i. Then each
measurement passes with probability at most 1 − ε, so the protocol accepts with
probability at most (1 − ε)^nσ. Hence, soundness requires

    (1 − ε)^nσ ≤ δ .                                               (4.62)
We note that for ε ∈ [0, a] ⊂ [0, 1) the following bounds hold:

    ε ≤ ln(1/(1 − ε)) ≤ (ε/a) · ln(1/(1 − a)) ,                    (4.64)

which can be seen by using the fact that ε ↦ ln(1/(1 − ε)) is smooth, has value 0 at 0,
its first derivative is lower bounded by 1, and its second derivative is positive. So, the
minimum nσ is

    nσ ≈ ln(1/δ)/ε                                                 (4.65)

for small ε > 0. Moreover, for any nσ ≥ ln(1/δ)/ε the required bound (4.62) is
satisfied.
Perhaps surprisingly, the sample complexity of this protocol does not depend on
the physical system size at all. It has a zero type I error and one can control the
type II error via the parameter δ.
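The accept/reject behaviour and the bound (1 − ε)^nσ ≤ δ can be illustrated with a small simulation; ε, δ, and the number of trial runs below are arbitrary example values.

```python
import numpy as np

rng = np.random.default_rng(2)

eps, delta = 0.1, 0.05
n_sigma = int(np.ceil(np.log(1 / delta) / eps))   # sample complexity (4.57)

# analytic type II error of the naive protocol for a state with infidelity eps
type2 = (1 - eps) ** n_sigma
assert type2 <= delta

# simulate Protocol 4.9: each copy passes w.p. 1 - eps; accept iff all pass
runs = 20000
accepted = sum(bool(np.all(rng.uniform(size=n_sigma) < 1 - eps))
               for _ in range(runs))
rate = accepted / runs
# rate should be close to (1 - eps)^n_sigma, which is at most delta
```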
However, it is generically not practical to implement the POVM {ρ, 1 − ρ}. So, we
follow Pallister et al. [76] and allow for more complicated strategies. Say, we have
access to a set of POVM elements

    M ⊂ {P : 0 ≤ P ≤ 1 , Tr[Pρ] = 1} ,
where ρ ∈ S(Cd ) is the target state. As one can only make finitely many mea-
surements, we assume that |M| < ∞ in order to avoid technicalities. Then we pick
POVM elements Pj ∈ M with some probability and consider the corresponding binary
POVMs Mj := {Pj , 1 − Pj }, where all Pj have output “pass” and 1 − Pj have output
“fail”. Now we modify Protocol 4.9 by including this probabilistic measurement
strategy.
Protocol 4.11 (Direct quantum state certification):
Let ρ ∈ S(Cd ) be a pure target state and (µj , Pj ) a measurement strategy, i.e.,
µ a probability vector and 0 ≤ Pj ≤ 1. For each POVM {Pj , 1 − Pj }, call the
outcome corresponding to Pj “pass” and the one of 1 − Pj “fail”.
For state preparations σ1 , . . . , σnσ ∈ S(Cd ) the protocol consists of the follow-
ing steps.
1: for i = 1, . . . , nσ do
2: Draw j ∼ µ and measure σi with {Pj , 1 − Pj }
3: if the outcome is “fail” then:
4: output “reject”
5: end protocol (break)
6: end if
7: end for
8: output “accept”
with

    Ω := ∑_j µj Pj .                                               (4.68)

As before, we demand perfect completeness, i.e.,

    Tr[Ωρ] = 1 ,                                                   (4.69)
i.e., that there is no false reject of the target state ρ, with probability one. In
particular, this means that Tr[Pj ρ] = 1 for each measurement setup j. This constraint
still allows for optimal measurement strategies:
Proposition 4.12 ([76, Proposition 8]):
where ν(Ω) := 1 − λ2 (Ω) is the spectral gap between the maximum eigenvalue
1 (corresponding to ρ) and the second largest eigenvalue λ2 (Ω) (among all d
eigenvalues).
Proof. We note that Tr[ρΩ] = 1 means that a state vector |ψ i with ρ = |ψ ihψ | is an
eigenvalue-1 eigenvector of Ω. Moreover, let us write Ω in spectral decomposition,
    Ω = ∑_{j=1}^{d} λj Pj                                          (4.72)
and observe that ε′ ≥ ε, where we write σ = (1 − ε′)ρ + ε′ρ⊥ with ρ⊥ being a state
supported on the orthogonal complement of ρ. Then

    Tr[Ωσ] = Tr[ρσ] + ∑_{j=2}^{d} λj Tr[Pj σ]
           ≤ Tr[ρσ] + λ2 ∑_{j=2}^{d} Tr[Pj σ]
           = 1 − ε′ + λ2 ε′ ∑_{j=2}^{d} Tr[Pj ρ⊥]
           = 1 − ε′ + λ2 ε′ Tr[ρ⊥]                                 (4.75)
           = 1 − ε′ + λ2 ε′ = 1 − (1 − λ2)ε′
           ≤ 1 − (1 − λ2)ε .
    nσ ≥ ln(1/δ) / ((1 − λ2(Ω)) ε)                                 (4.76)
with confidence at least 1 − δ. Moreover, the protocol accepts the target state
ρ with probability one.
    nσ ≥ ln(1/δ) / ln( 1/(1 − (1 − λ2(Ω))ε) )                      (4.79)
This proposition tells us that as long as Ω has a constant gap between its largest
and second largest eigenvalue, the sample complexity of the certification protocol has
the same scaling as the one where Ω is the target state itself. It now depends on the
physical situation which measurement strategies Ω are feasible. Given a set M of
feasible measurements we can single out an optimal strategy as follows.
Definition 4.15 (Minimax optimization):
Let ρ be a pure state and ε > 0. Moreover, let us assume that we have access to
a compact set of binary measurements given by the operators M ⊂ {P : 0 ≤
P ≤ 1 , Tr[P ρ] = 1}.
Then the best strategy Ω for the worst case state preparation σ is
This quantity is called minimax value and a strategy Ω where the minimum is
attained is called minimax optimal.
Such minimax optimizations are common in game theory and risk analysis.
If there are no restrictions on the measurements of a pure target state ρ, i.e.,
M = {P : 0 ≤ P ≤ 1 , Tr[P ρ] = 1}, then Ω = ρ is minimax optimal.
For a number of settings with physically motivated measurement restrictions the
minimax strategy, or at least one that is close to it, has been obtained. Popular
instances include the following settings:
• Stabilizer states and two qubit states with single qubit measurements [76]
• Ground states of locally interacting Hamiltonians [78]
Here, we only outline the example of stabilizer states in more detail. Remember the
definition of stabilizer states from the box on STABs in Section 3.4.1.
Theorem 4.16 (Minimax optimal Pauli measurements for STABs [76]):
Then the minimax optimal measurement strategy with Pauli observables
Pn as accessible measurements (see Definition 4.15) is given by measuring Si
with probability 1/(2^n − 1). The resulting effective measurement operator
Ω = (1/(2^n − 1)) ∑_{i=1}^{2^n−1} Pi satisfies Ω |ψ i = |ψ i and has the second
largest eigenvalue

    λ2(Ω) = (2^(n−1) − 1)/(2^n − 1) .                              (4.81)
where

    X := {Ω ∈ conv(Pn ) : Ω |ψ i = |ψ i} = conv(S) .               (4.83)
We argue that the minimization over conv(S) can be replaced by a minimization over
conv(S′) with S′ := S \ {1}. To see this, observe that if Ω = (1 − α)Ω′ + α1 for
α ∈ [0, 1] then ν(Ω) ≤ ν(Ω′). Then minimax optimal measurement strategies are of
the form

    Ω = ∑_{i=1}^{2^n−1} µi Pi .                                    (4.84)
    Ω = 1 ⊕ Ω̃                                                     (4.86)

and, hence,

    λ2(Ω) = ‖Ω̃‖op .                                               (4.87)
Moreover, Tr[Ω̃] = 2^(n−1) − 1. The operator Ω̃ with minimal norm ‖Ω̃‖op under this
constraint is of the form Ω̃ = a1 for a > 0. Taking the trace of that equality, solving
for a, and denoting the projector onto the orthogonal complement of |ψ ihψ | by
|ψ ihψ |⊥ := 1 − |ψ ihψ | yields

    Ω = |ψ ihψ | + (2^(n−1) − 1)/(2^n − 1) · |ψ ihψ |⊥             (4.88)
with

    λ2(Ω) = (2^(n−1) − 1)/(2^n − 1) .                              (4.89)
In order to finish the proof we show that Ω ∈ conv(S), i.e., that this choice of Ω is
indeed compatible with (4.84).
We write the stabilizer state |ψ ihψ | as a combination of the stabilizer elements (see
(3.33)) and use that Sj = 2Pj − 1:

    |ψ ihψ | = (1/2^n) ( 1 + ∑_{j=1}^{2^n−1} Sj )
             = (1/2^n) ( 1 + 2 ∑_{j=1}^{2^n−1} Pj − (2^n − 1) 1 )  (4.90)
             = (2^(1−n) − 1) 1 + 2^(1−n) ∑_{j=1}^{2^n−1} Pj .
Solving for the sum of the projectors yields

    ∑_{j=1}^{2^n−1} Pj = (2^n − 1) |ψ ihψ | + (2^(n−1) − 1) |ψ ihψ |⊥   (4.91)
and, hence,

    (1/(2^n − 1)) ∑_{j=1}^{2^n−1} Pj = |ψ ihψ | + (2^(n−1) − 1)/(2^n − 1) · |ψ ihψ |⊥ ,   (4.92)
which is the Ω from (4.88) and also the measurement strategy from the theorem
statement.
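For n = 2 the resulting strategy can be verified directly; the sketch below uses the stabilizer state |00⟩ with nontrivial stabilizer elements Z⊗1, 1⊗Z, and Z⊗Z, and confirms Ω|ψ⟩ = |ψ⟩ and λ2(Ω) = (2^(n−1) − 1)/(2^n − 1) = 1/3.

```python
import numpy as np

I = np.eye(2)
Z = np.diag([1.0, -1.0])

# stabilizer state |00> with nontrivial stabilizers Z⊗1, 1⊗Z, Z⊗Z (n = 2)
stabs = [np.kron(Z, I), np.kron(I, Z), np.kron(Z, Z)]
projs = [(np.eye(4) + S) / 2 for S in stabs]   # "pass" projector for each S_j

Omega = sum(projs) / len(projs)                # uniform strategy mu_j = 1/3
psi = np.zeros(4)
psi[0] = 1.0

assert np.allclose(Omega @ psi, psi)           # eigenvalue-1 eigenvector

evals = np.sort(np.linalg.eigvalsh(Omega))[::-1]
lam2 = evals[1]
n = 2
assert np.isclose(lam2, (2 ** (n - 1) - 1) / (2 ** n - 1))  # = 1/3, Eq. (4.81)
```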
Corollary 4.17 (Sampling complexity [76]):
Let us call the outcome corresponding to Pi "pass" and the one corresponding
to 1 − Pi "fail". Then Protocol 4.11 is an ε-certification test of ρ w.r.t. the
infidelity from nσ independent samples for

    nσ ≥ 2 ln(1/δ)/ε                                               (4.93)

with confidence 1 − δ. Moreover, ρ is accepted with probability 1.
According to (4.76),

    nσ ≥ ln(1/δ) / (ν(Ω) ε)                                        (4.94)

is sufficient, where

    ν(Ω) = 1 − λ2(Ω) = 1 − (2^(n−1) − 1)/(2^n − 1) = 2^(n−1)/(2^n − 1) .   (4.95)

This results in

    nσ ≥ (2^n − 1)/2^(n−1) · ln(1/δ)/ε .                           (4.96)
Part II
Quantum dynamics
Quantum states can be fully reconstructed tomographically with an essentially opti-
mal number of measurements [37, 55], see Chapter 3. In particular, the reconstruction
error can be bounded in the operationally relevant norm, the trace norm. Similar re-
sults hold for quantum state certification, where one has implemented a targeted state
and is tasked to certify the correctness of the implementation up to some trace norm
error, see Chapter 4.
In Part II of these notes, we aim to find similar results for quantum processes. In
principle, one can use the Choi-Jamiołkowski isomorphism (see Section 5.3) to map a
quantum process to a quantum state and apply the characterization and verification
methods from Part I. However, this approach has drawbacks: (i) it might result in
measurements that are practically infeasible and typically require maximally entan-
gled states as available resource and (ii) the error is controlled in the “wrong” norm.
In a similar way as the trace norm is an operationally motivated norm for quantum
states (Proposition 4.8) the so-called diamond norm is an operationally motivated
norm for quantum processes, see Section 5.5. There is even a third potential problem:
To (partially) characterize a quantum process one needs to prepare quantum states,
to evolve them under the process, and to measure the final states. In this task so-
called state preparation and measurement (SPAM) errors can be a serious obstacle
for reliable characterization. Therefore, the development of quantum characterization
and verification methods for quantum processes requires a significant amount of extra
work.
As we will see in Chapter 6, one is able to estimate a weaker error measure than
the diamond norm efficiently using randomized benchmarking. This is typically done
in a way that is robust against SPAM errors. In Chapter 7 we will discuss state-of-
the-art methods for quantum process tomography. In particular, we will see that one
can reconstruct the most relevant part of a quantum process (the unital part) using
similar measurements as in randomized benchmarking. However, the error measure
is again a weaker one than the diamond norm. In Chapter 8 we will discuss gate
set tomography. Here, one reconstructs a full gate set from measurement data and is
able to estimate gate errors in the diamond norm. A disadvantage of this method is
that it comes at the expense of a large overhead in the measurement effort and the
amount of classical post-processing. Further improving those methods and finding
lower bounds on the required measurement and computational effort is still subject
to ongoing research, which makes this field particularly exciting.
5. Preliminaries II
In this chapter we introduce some more preliminaries required to discuss the character-
ization and validation of quantum processes. Let us start with a proper introduction
to quantum processes.
5.1. Quantum processes
A quantum process is given by a linear map taking density operators to density op-
erators and satisfying certain properties. Therefore, we start by introducing some
notation related to linear maps between operator spaces.
Let H, K be Hilbert spaces.
• The vector space of linear maps from L(H) to L(K) is denoted by L(H, K) :=
L(L(H), L(K)). We set L(H) := L(H, H) and denote the identity by idH :=
1L(H) ∈ L(H). Often we just write id when it is clear from the context what H
is.
• A map Φ ∈ L(H, K) is called Hermiticity-preserving if

    Φ(Herm(H)) ⊂ Herm(K) ,                                         (5.1)

positive if
Φ(Pos(H)) ⊂ Pos(K) , (5.2)
and trace-preserving if
Tr[Φ(X)] = Tr[X] (5.3)
for all X ∈ L(H). Note that positive maps are also Hermiticity-preserving.
The map Φ is called completely positive (CP) if Φ ⊗ 1L(H0 ) is positive for all
Hilbert spaces H0 with identity map 1L(H0 ) ∈ L(H0 ). The set of CP maps is
denoted by CP(H, K) ⊂ L(H, K) and forms a convex cone. We set CP(H) :=
CP(H, H).
• A completely positive and trace preserving (CPT) map is also called a quantum
channel or just channel. The subset of CPT maps is denoted by CPT(H, K) ⊂
CP(H, K) and forms a convex set. Again, we set CPT(H) := CPT(H, H).
• A map Φ ∈ L(H, K) is called unital if Φ(1H ) = 1K . Note that Φ is trace-
preserving iff Φ† is unital.
So, essentially, quantum channels are maps that take density matrices to density
matrices even when applied to a part of a larger system. Usual unitary dynamics is
of this form:
Example 5.1 (Unitary channels):
For U ∈ U(d) define

    U(X) := U XU † .                                               (5.4)
These maps are quantum channels and are called unitary (quantum) channels.
Unitary channels are invertible and the inverses are again unitary channels.
Figure 5.1.: Basic examples for tensor network diagrams: A vector |ψ i in some vector space V ,
vectorization of an operator A, a map X ∈ L(V ) applied to that vectorization, the non-vectorized
version of the flip operator F, and a tensor product of two maps on operators X and Y.
L(V ) = V ⊗ V ∗ , (5.5)
Figure 5.2.: The Choi-Jamiołkowski isomorphism and partial trace in terms of tensor network
diagrams (explained in Figure 5.1).
Left: Order-4 tensor X ∈ L(H, K) as a map from L(H) ≅ H ⊗ H∗ to L(K) ≅ K ⊗ K∗ .
Middle: Its Choi-matrix C(X ) as an operator on K ⊗ H.
Right: Partial trace Tr1 [C(X )] of the Choi matrix C(X ). This operator corresponds to the
functional ρ 7→ Tr[X (ρ)].
where the natural isomorphism (5.5) is denoted by "=", the isomorphism of changing
the order of the vector spaces by "≅", and the last one refers to the conjugate linear
Hilbert space isomorphism H ≅ H∗ composed with the complex conjugation isomorphism;
see Figure 5.2 for a tensor network representation of C. The Choi-Jamiołkowski iso-
morphism can be written explicitly. In terms of the unnormalized maximally entan-
gled state
    |1 i = ∑_{i=1}^{dim(H)} |i, i i ∈ H ⊗ H                        (5.8)
(v) X is a CP map iff there are operators K1 , . . . , Kr ∈ L(H, K), where
    r = rank(C(X )), so that

        X (A) = ∑_{i=1}^{r} Ki A Ki†                               (5.11)

    for all A ∈ L(H). Moreover, show that X is a CPT map iff (5.11) holds
    with ∑_{i=1}^{r} Ki† Ki = 1.
(vi) X is a CPT map iff it has a Stinespring dilation, i.e., there is a unitary
U ∈ U((K ⊗ H)⊗2 ) so that X (ρ) = Tr2,3 (U ρ ⊗ |00 ih00 | U † ), where Tr2,3
is the partial trace over the space the |00 i state is from.
The theorem tells us that X is a quantum channel iff J(X ) is a density matrix with
the reduction to H (obtained by tracing over K) being a maximally mixed state. The
so-called Choi state of a channel X is

    J(X ) = (X ⊗ id)(φ+ ) ,                                        (5.13)

where

    φ+ := (1/dim(H)) |1 ih1 | ∈ S(H ⊗ H)                           (5.14)
is a maximally entangled state, i.e., has the strongest bipartite quantum correlations
possible in a precise sense. In particular, the Choi state can be prepared by applying
the channel to this state.
Also note that not every bipartite state corresponds to a channel. Indeed, the
Choi-Jamiołkowski isomorphism is an isomorphism of convex cones, C : CP(H, K) →
Pos(K ⊗ H) but CPT(H, K) is mapped to a proper subset of S(K ⊗ H). The reason
is that the trace-preservation constraint of channels corresponds to dim(H)2 many
equalities whereas the trace constraint of states is just one equality.
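As an illustration of these characterizations, the following sketch builds the Choi matrix of a single-qubit amplitude-damping channel (an example channel chosen here for concreteness), checks the CPT conditions, and recovers Kraus operators from the eigendecomposition as in item (v) above.

```python
import numpy as np

# Amplitude-damping channel on one qubit, given by its Kraus operators
g = 0.3
K0 = np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex)
K1 = np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)

def channel(A):
    return K0 @ A @ K0.conj().T + K1 @ A @ K1.conj().T

d = 2
E = lambda i, j: np.outer(np.eye(d)[i], np.eye(d)[j]).astype(complex)

# Choi matrix C(X) = (X ⊗ id)(|1><1|) with |1> = sum_i |i,i>, Eq. (5.8)
C = sum(np.kron(channel(E(i, j)), E(i, j)) for i in range(d) for j in range(d))

# complete positivity  <=>  C(X) >= 0
assert np.linalg.eigvalsh(C).min() > -1e-12

# trace preservation  <=>  partial trace over the output factor K gives 1_H
TrK = np.einsum('kikj->ij', C.reshape(d, d, d, d))
assert np.allclose(TrK, np.eye(d))

# Kraus operators from the eigendecomposition of C(X), as in item (v)
w, V = np.linalg.eigh(C)
Ks = [np.sqrt(wi) * V[:, a].reshape(d, d) for a, wi in enumerate(w) if wi > 1e-12]
assert np.allclose(sum(K.conj().T @ K for K in Ks), np.eye(d))

rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
assert np.allclose(sum(K @ rho @ K.conj().T for K in Ks), channel(rho))
```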
Exercise 5.3 (Depolarizing channel):
the vector space of linear maps L(H) is also equipped with a canonical inner product
(the Hilbert–Schmidt inner product for superoperators) given by

    hX , Yi = Tr[X † Y]                                            (5.16)
with
xi,j = hEi , X (Ej )i = Tr[Ei† X (Ej )] . (5.21)
For H = K one typically uses bases with E0 ∝ 1. Moreover, it is common to
use a different normalization convention. For qubits, i.e., d = 2^n , it is common to use
Pauli strings 1 = P0 , P1 , . . . , P_{d²−1} with ‖Pi‖op = 1 for all i = 0, . . . , d² − 1. Then
X ∈ L(Cd ) can be written as

    X (ρ) = ∑_{i,j=0}^{d²−1} χ^X_{i,j} Pi ρ Pj                     (5.22)
Complement the list in Theorem 5.2 by expressing the CPT conditions in terms
of the χ process matrix (5.23).
operator since in that case X (A) = ½(X (A) + X (A)†) = X(½(A + A†)). Moreover,
due to convexity, we have for any X ∈ L(H, K)
    ‖X‖_{1→1} = sup{ ‖X ( |ψ ihφ |)‖₁ : ‖ |ψ i ‖_{ℓ2} = ‖ |φ i ‖_{ℓ2} = 1 } ,   (5.25)
In order to distinguish quantum channels one can use ancillary systems. This
motivates the definition of the diamond norm as a so-called CB-completion of the
(1 → 1)-norm, which is justified by Theorem 5.3 below. To begin with, we define
diamond norm by
kX k := kX ⊗ idH k1→1 . (5.27)
Note that the diamond norm inherites the above mentioned properties from the (1 →
1)-norm.
Theorem 5.3 (Complete boundedness and (sub)multiplicativity):
where the supremum is taken over all finite dimensional Hilbert spaces H0 .
Moreover,
Proof. For the proof we refer e.g. to [17, Chapter 3.3] or recommend to prove it as an
exercise.
The theorem tells us that the diamond norm is the maximum distinguishability of
quantum channels in the following sense. Let Φ = X − Y with X , Y ∈ CPT(H, K)
be the difference of two quantum channels. One can prepare copies of a state ρ ∈
S(H⊗H0 ) and apply either X or Y to the parts on H to obtain states on K⊗H0 . Then
Proposition 4.8 tells us that ½‖Φ ⊗ idH′ (ρ)‖₁ is the distinguishability of the output
states. Taking the supremum over all (pure) states ρ yields the distinguishability of X
and Y, which is given by the diamond norm distance ½‖X − Y‖◇ . In particular, the
theorem tells us that optimal distinguishability can be obtained by choosing H0 = H
in a similar sense as it can be detected when a map is not CP just using H0 = H.
Another way to distinguish quantum channels is to prepare their Choi states and dis-
tinguish those, as characterized by Proposition 4.8 via the trace norm. The following
statement provides a relation between the two notions of distinguishability of quantum
channels.
Proposition 5.4 (Diamond norm and trace norm):
Remark: The upper bound can be improved. For a Hermiticity-preserving map X ∈
L(H, K) the improved bound implies [97, Corollary 2]
Proof of Proposition 5.4. We prove the proposition in terms of C(X ) = dim(H) J(X ).
It holds that
    ‖X‖◇ = sup{ ‖(1 ⊗ A) C(X )(1 ⊗ B)‖₁ : ‖A‖F = ‖B‖F = 1 } ,     (5.33)

as can be seen from (5.25) and using tensor network diagrams. Choosing A = B =
(1/√dim(H)) 1 (corresponding to the maximally entangled state (5.14)) establishes the
lower bound. The upper bound follows using Hölder's inequality,
Show that the bounds in Proposition 5.4 are tight, i.e., that there are X , Y ∈
L(H, K) so that ‖J(X )‖₁ = ‖X‖◇ and ‖Y‖◇ = dim(H) ‖J(Y)‖₁ .
These results tell us that distinguishing quantum channels via their Choi states
is in general not optimal.
It is non-obvious how the diamond norm can actually be computed in practice.
Watrous has shown that the diamond norm can be computed efficiently via an SDP
[98]. However, for the relevant case where the map is a difference of two unitary
channels the computation is much simpler:
Proposition 5.5 (Diamond norm distance of unitary channels):
For any U, V ∈ U(d) the diamond norm distance of the corresponding unitary
channels is

    ½ ‖U − V‖◇ = √( 1 − dist(0, conv{λi}_{i∈[d]})² ) ,             (5.35)
2
where λi are the eigenvalues of U † V and dist( · , · ) denotes the Euclidean dis-
tance and conv( · ) the convex hull, both in the complex plane.
This proposition reflects that the diamond norm distance is a worst-case quantity,
where here the worst-case optimization is done over the spectrum of the unitary
"difference" U†V .
Proof of Proposition 5.5. Starting with (5.25) and, e.g. by using tensor network dia-
grams or using the Choi-Jamiołkowski isomorphism, (5.10) and the vectorization rules
for matrix products (2.4), we can write the diamond norm of the channel difference
as
    ‖U − V‖◇ = max{ ‖(1 ⊗ A)( |U ihU | − |V ihV |)(1 ⊗ A)‖₁ : ‖A‖₂ = 1 }
             = max{ ‖ |AU ihAU | − |AV ihAV | ‖₁ : ‖A‖₂ = 1 }      (5.36)
             = max{ ‖ |A ihA | − |AU†V ihAU†V | ‖₁ : ‖A‖₂ = 1 } .
According to Watrous' lecture notes [9, Example 2.3], normalized vectors |ψ i , |φ i ∈
S^(d−1) ⊂ Cd satisfy

    ‖ |ψ ihψ | − |φ ihφ | ‖_p = 2^(1/p) √(1 − |hψ|φi|²) .          (5.37)
This yields

    ½‖U − V‖◇ = max{ √(1 − |hA|A U†V i|²) : ‖A‖₂ = 1 }
              = max{ √(1 − |Tr[A†A U†V ]|²) : ‖A‖₂ = 1 }
              = max{ √(1 − |Tr[ρ U†V ]|²) : ρ ∈ S(Cd ) }
              = √( 1 − min_{ρ∈S(Cd)} |Tr[ρ U†V ]|² )               (5.38)
              = √( 1 − min{ |∑_i pi λi|² : p ∈ [0, 1]^d , ∑_i pi = 1 } )
              = √( 1 − dist(0, conv{λi})² ) .
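Proposition 5.5 yields a simple numerical recipe; the following sketch implements it for single-qubit unitaries, where the convex hull of the two eigenvalues of U†V is just a segment in the complex plane. The phase-gate and Z-gate checks are example cases.

```python
import numpy as np

def diamond_dist_unitary_qubit(U, V):
    """(1/2)||U - V||_diamond for single-qubit unitary channels, via Prop. 5.5."""
    a, b = np.linalg.eigvals(U.conj().T @ V)
    # Euclidean distance from 0 to the segment conv{a, b} in the complex plane
    if np.isclose(abs(b - a), 0.0):
        dist = abs(a)
    else:
        t = np.clip((-a * np.conj(b - a)).real / abs(b - a) ** 2, 0.0, 1.0)
        dist = abs(a + t * (b - a))
    return np.sqrt(max(1.0 - dist ** 2, 0.0))

I2 = np.eye(2)
theta = 0.8
phase = np.diag([1.0, np.exp(1j * theta)])
# phase gate: eigenvalues {1, e^{i theta}}, giving distance sin(theta/2)
assert np.isclose(diamond_dist_unitary_qubit(I2, phase), np.sin(theta / 2))

Z = np.diag([1.0, -1.0])
# eigenvalues {1, -1}: the hull contains 0, so the channels are perfectly
# distinguishable and the halved diamond distance is maximal, i.e. 1
assert np.isclose(diamond_dist_unitary_qubit(I2, Z), 1.0)
```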
[The required further material on representation theory (Section 2.2) was also cov-
ered in Lecture 16, including a characterization of irreps of abelian groups (Corol-
lary 2.4), an important extension of Schur's lemma (Theorem 2.5), some more details
on Schur–Weyl duality (Theorem 2.6), and a characterization of certain invariant op-
erators (Proposition 2.7).]
defined by

    M^(k)_µ (X) := ∫_{U(d)} U^⊗k X U^⊗k† dµ(U ) ,                  (5.39)

and we write M^(k) for the k-th moment operator of the Haar measure. Then
a distribution µ on the unitary group is a unitary k-design if M^(k)_µ = M^(k) .
The n-qubit Clifford group Cln ⊂ U(2^n ) is the normalizer of the Pauli group Pn
(see Section 3.4.1),
The Clifford group is a unitary 3-design but not a unitary 4-design [33, 34, 58].
The k-th moment operator of the Haar measure can be calculated using representa-
tion theory. The following identity can be seen as a generalization of Proposition 2.7
since M^(k)(A) commutes with the representations (2.24) of U(d) and (2.23) of Sym_k
(see e.g. [99, integration formula section]):

    M^(k)(A) = (1/k!) ∑_{σ∈Sym_k} ∑_{λ⊢k} (d_λ/D_λ) Tr[Aσ] σ^(−1) P_λ ,   (5.41)
and we again drop the subscript µ if µ is the Haar measure. For k ≤ d one can show
that

    F^(k) = k! .                                                   (5.43)

Then it holds that [100, Theorem 5.4] F^(k)_µ ≥ F^(k) for any distribution µ on U(d).
As an important example, for k = 2 the only irreps are given by λ = (2) and λ = (1, 1).
It holds that d_(2) = 1 = d_(1,1) . Thanks to P_(2) = ½(1 + F) and P_(1,1) = ½(1 − F)
it turns out that D_(2) = d(d + 1)/2 and D_(1,1) = d(d − 1)/2, where F denotes again
the flip operator.
A straightforward simplification of (5.41) yields
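For k = 2 the formula (5.41) can be cross-checked numerically against a Monte Carlo average over Haar-random unitaries, using d_(2) = d_(1,1) = 1 and D_(2) = d(d+1)/2, D_(1,1) = d(d−1)/2 (a sanity check added here, not part of the original derivation):

```python
import numpy as np

rng = np.random.default_rng(5)
d = 2

def haar(d):
    # Haar-random unitary from the QR decomposition of a Ginibre matrix
    Q, R = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return Q * (np.diag(R) / np.abs(np.diag(R)))

# flip operator F on C^d ⊗ C^d and the symmetric/antisymmetric projectors
F = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        F[i * d + j, j * d + i] = 1.0
P2, P11 = (np.eye(d * d) + F) / 2, (np.eye(d * d) - F) / 2
D2, D11 = d * (d + 1) / 2, d * (d - 1) / 2

A = rng.normal(size=(d * d, d * d)) + 1j * rng.normal(size=(d * d, d * d))
A /= np.linalg.norm(A)

# formula (5.41) for k = 2 (both irreps of Sym_2 are one-dimensional)
exact = 0.5 * (np.trace(A) * (P2 / D2 + P11 / D11)
               + np.trace(A @ F) * (F @ P2 / D2 + F @ P11 / D11))

# Monte Carlo estimate of the second moment operator M^(2)(A)
mc = np.zeros_like(A)
N = 20000
for _ in range(N):
    u = haar(d)
    U2 = np.kron(u, u)
    mc += U2 @ A @ U2.conj().T
mc /= N
assert np.max(np.abs(mc - exact)) < 0.02
```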
6. Randomized benchmarking
Randomized benchmarking (RB) can be used to practically measure the average error
rate of targeted quantum channels. It does not quantify the operationally best moti-
vated error measure –the diamond norm distance– but it can be practically measured
in a comparatively cheap way that is robust against state preparation and measure-
ment (SPAM) errors. The original version of RB [32, 102–105] quantifies the average
error of Clifford gates. With interleaved randomized benchmarking [106, 107] one can
measure in a similar way the average gate fidelity of a single Clifford gate. There are
several extensions [108–116] of these basic setups.
be

    Favg(X , Y) := ∫ dψ hX ( |ψ ihψ |), Y( |ψ ihψ |)i ,            (6.1)
where the integral is taken according to the uniform Haar-invariant probability mea-
sure on state vectors. So, the average gate fidelity Favg (X , Y) is a measure of closeness
of X and Y.
Let us list some properties of the average gate fidelity.
• For any X , Y
Favg (X , Y) = Favg (Y † X , id) . (6.2)
This motivates the definition Favg(X ) := Favg(X , id). The quantity r(X ) :=
1 − Favg(X ) is called the average error rate and inherits those two properties.
The average gate fidelity is related to the diamond norm as follows.
Proposition 6.1 (AGF and diamond norm [109, Proposition 9]):
Proof. The proof follows from Proposition 5.4 and [117, 118]
EU ∼µ [U X (U † ρ U )U † ] = Dp (ρ) , (6.8)
where

    p = (d Favg(X ) − 1)/(d − 1)                                   (6.9)
      = (Tr[X ] − 1)/(d² − 1) .                                    (6.10)
We note that the (0, 0) component of the χ process matrix (5.23) of X is

    χ_{0,0} = Tr[X ]/d²                                            (6.11)

and that Tr[X ] is real if X is Hermiticity-preserving. Often one only considers qubits
and then (6.10) is sometimes stated in terms of χ_{0,0} .
Proof. The map tw : L(Cd ) → L(Cd ) given by
tw(X ) := EU ∼µ [U X (U † ( · ) U )U † ] (6.12)
    Favg(X ) = (Tr[X ] − 1)/(d(d + 1)) + 1/d ,                     (6.18)
and
Tr[X ] = d(d + 1) Favg (X ) − d , (6.19)
where X ∈ L(C ) was assumed to be trace-preserving in Theorem 6.2.
d
This implies that the average gate fidelity can be connected to the canonical inner product on L(Cd) as [117, 118] (see also [119])
norm, meaning that the Frobenius norm is an average-case error measure as well.
Also note that for a unitary channel U ∈ CPT(Cd) with U ∈ U(d),

Favg(U) = (Tr[U] − 1)/(d(d + 1)) + 1/d = (|Tr[U]|² − 1)/(d(d + 1)) + 1/d .  (6.21)

This equality reflects that the average gate fidelity measures how close U is to 𝟙 on average, where here the average is taken over its spectrum.
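The two expressions in (6.21) are easy to verify numerically by comparing the closed form with a Monte Carlo estimate of the Haar average (6.1). A minimal Python/NumPy sketch (the diagonal phase unitary is just an illustrative choice; normalized complex Gaussian vectors are Haar-distributed on the unit sphere):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

# A fixed unitary, here a random diagonal phase unitary (illustrative only).
U = np.diag(np.exp(1j * rng.uniform(0, 2 * np.pi, d)))

# Closed form (6.21): F_avg = (|Tr U|^2 - 1) / (d(d+1)) + 1/d
f_closed = (abs(np.trace(U)) ** 2 - 1) / (d * (d + 1)) + 1 / d

# Monte Carlo estimate of the Haar average (6.1): for a unitary channel,
# <psi| U |psi><psi| U† |psi> = |<psi|U|psi>|^2.
n = 200_000
psi = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))
psi /= np.linalg.norm(psi, axis=1, keepdims=True)
f_mc = np.mean(np.abs(np.einsum('ni,ij,nj->n', psi.conj(), U, psi)) ** 2)

print(f_closed, f_mc)  # agree up to Monte Carlo error ~ 1/sqrt(n)
```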
where G̃ denotes the implementation of the gate G. In the analysis of the following protocol we will see that RB indeed provides a consistent estimate of this quantity in the case of gate-independent noise. For extensions to gate-dependent noise, see the initial work by Magesan et al. [105] for a first perturbative analysis and the work by Wallman [121] for a more recent and more rigorous analysis.
Protocol 6.3 (Standard RB):
• Obtain an estimate F̂m,s of Fm,s by measuring S(s)(ρ) with the measurement M a number of ns times.
• For each m = 1, . . . , mmax repeat this estimation for sequences
s(1) , . . . , s(nm ) ∈ [nG ]m and set F̂m to be the corresponding empirical
estimate of the average sequence fidelity
F̄m := Es∼[nG ]m Fm,s . (6.24)
• Obtain an estimate F̄ˆ of the AGF (6.22) via

p = (d F̄ − 1)/(d − 1) ,  (6.26)

(remember the relation (6.9) between p and Favg).
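The fitting step can be illustrated with simulated data: under gate-independent depolarizing noise the average sequence fidelity follows F̄m = A pᵐ + B, so one can generate noisy survival frequencies, fit the exponential decay, and invert (6.26). A minimal sketch (assuming depolarizing noise and ideal SPAM constants; all parameter values are hypothetical):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(2)
d = 2                      # single qubit
p_true = 0.97              # depolarizing parameter of the assumed noise
A0, B0 = 1 - 1 / d, 1 / d  # ideal SPAM constants for survival of |0><0|
n_shots = 5000

# Simulated average sequence fidelities: F_m = A0 p^m + B0, plus binomial
# shot noise from the finite number of measurements per sequence length.
ms = np.arange(1, 100, 4)
f_hat = rng.binomial(n_shots, A0 * p_true ** ms + B0) / n_shots

# Fit the decay model F_m = A p^m + B (SPAM is absorbed into A and B).
model = lambda m, A, p, B: A * p ** m + B
(A_fit, p_fit, B_fit), _ = curve_fit(model, ms, f_hat, p0=[0.5, 0.9, 0.5])

# Invert (6.26): F_avg = (p (d - 1) + 1) / d
f_avg = (p_fit * (d - 1) + 1) / d
print(p_fit, f_avg)
```

Note that the SPAM-dependent constants A and B are fitted but never used in the final estimate; this is the SPAM robustness of RB.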
Analysis for gate independent noise. We denote the noisy implementations of the ini-
tial state ρ, the gates Gi , and the measurement M by ρ̃, G̃i , and M̃ , respectively.
We restrict our analysis to gate-independent noise, i.e., G̃ = Λ G for all gates G and some channel Λ ∈ CPT(Cd); cp. (6.32). Setting Csj := Gsj · · · Gs1 for j ∈ [m], this assumption allows us to rewrite the implemented gate sequence as
with p given by (6.9), which matches the estimation (6.26). Moreover, the average
sequence fidelity is
Let us again consider gate-independent noise Λ ∈ CPT(H) that acts on every gate G ∈ G, i.e., the implemented gates are

G̃ = Λ G  (6.32)
G̃t = Λt Gt .  (6.33)

The idea of interleaved RB is to insert G̃t after every gate in the gate sequence G(s) from standard RB, so as to interleave the sequence with multiple applications of G̃t. However, the gate-independent noise Λ also needs to be estimated.
In more detail, in interleaved RB one estimates Favg(G̃t, Gt) by applying standard RB twice: (i) to obtain an estimate of Favg(Λ) and (ii) to obtain an estimate of Favg(G̃t Λ, Gt) = Favg(Gt† G̃t Λ); see Protocol 6.4 for an RB method achieving (i). The idea is that one can extract Favg(G̃t, Gt) from Favg(G̃t Λ, Gt) once the noise strength given by Favg(Λ) is known. In order to extract an estimate of Favg(G̃t, Gt) from these two quantities, an approximation of the form (6.34) is used. Then one obtains the desired average gate fidelity by taking estimates corresponding to
Favg(G̃t, Gt) = Favg(G̃t Gt†) = Favg(Gt† G̃t) ≈ Favg(Gt† G̃t Λ) / Favg(Λ) .  (6.35)
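For purely depolarizing noise, the quality of the ratio estimate (6.35) can be checked in closed form, since depolarizing channels compose as D_{q_t} D_q = D_{q_t q} and Favg(D_p) = p + (1 − p)/d. A minimal sketch with illustrative (hypothetical) parameter values:

```python
d = 2
q, q_t = 0.98, 0.95  # depolarizing parameters of Lambda and of the target-gate noise

# F_avg of a depolarizing channel D_p(rho) = p rho + (1 - p) 1/d
F = lambda p: p + (1 - p) / d

exact = F(q_t)               # F_avg(G~_t, G_t) when G~_t = D_{q_t} G_t
approx = F(q_t * q) / F(q)   # the ratio estimate (6.35); uses D_{q_t} D_q = D_{q_t q}
print(exact, approx)         # the two values differ only in the third decimal
```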
Interleaved RB has been improved and simplified by Kimmel et al. [110, Section 6A]. They have found a bound on the approximation error that is tighter than the previous bound [106]. In terms of χ0,0 from (6.11) it reads as

|χ^{XY}_{0,0} − χ^X_{0,0} χ^Y_{0,0}| ≤ 2 √( (1 − χ^X_{0,0}) χ^X_{0,0} (1 − χ^Y_{0,0}) χ^Y_{0,0} ) + (1 − χ^X_{0,0})(1 − χ^Y_{0,0})  (6.36)

for any X, Y ∈ CPT((C²)⊗n). This bound yields bounds on the error in (6.34) via (6.10).
Protocol 6.4 (Modified RB):
For a target gate Gt ∈ G this protocol is obtained from Protocol 6.3 by replacing
the gate sequence G (s) by
Everything is now done w.r.t. this modified gate sequence. For instance, the last gate is G_{s_{m+1}} := (G^{(s)}_{G_t})^{−1}.
Analysis. We assume the noise model given by (6.32) and (6.33) and that G is a
unitary 2-design.
It is not difficult to see that the implemented gate sequence can be written as
S̃(s) = Λ Csm† Φ Csm Csm−1† Φ Csm−1 · · · Cs1† Φ Cs1  (6.38)
with Φ = Gt† G̃t Λ and Csi ∼ G iid. Hence, applying the same arguments as in the
analysis of the standard RB protocol yields
Es∼[nG]m S̃(s) = Λ Dp^m  (6.39)
with the estimated average gate fidelity F̄ˆG being an estimate of
as desired.
7. Process tomography
A quantum state can be reconstructed from measurement data using quantum state
tomography (Section 3). In a similar way, quantum process tomography can be
used to fully reconstruct quantum channels from measurement data. However, more
complicated measurement setups are required for process tomography.
In the simplest version, one can use linear inversion [123] to obtain the channel's
χ process matrix (5.23). But this can be challenging to implement and leads to
a non-optimal sampling complexity (number of invocations of the channel). If the
χ matrix is sparse, then compressed sensing 1.0 can be used [124] to dramatically
reduce the measurement effort, cp. the reconstruction program (3.65). However, in
most situations the χ matrix cannot be expected to be sparse.
Another route to process tomography is to reduce it to state tomography by preparing the channel's Choi state [125]. But this method requires maximally entangled states, see (5.13), which are often difficult to prepare with high fidelity.
Flammia et al. [55] present a process tomography protocol that is based on low-rank matrix reconstruction with random Pauli measurements. These low-rank recovery guarantees can be applied to the channel's Choi state, since the rank of this matrix representation equals the Kraus rank of the original channel. At first sight, such an approach requires the use of an ancilla in order to implement the Choi state physically in a concrete application, as in the work by Altepeter et al. [125]. However, Ref. [55] also provides a more direct implementation of their protocol that does not require any ancillas. Valid for multi-qubit processes, this trick exploits the tensor-product structure of (multi-qubit) Pauli operators. The demerit of this approach is that the number of individual channel measurements required to evaluate a single Pauli expectation value scales with the dimension of the underlying Hilbert space.
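The fact that the rank of the Choi matrix equals the Kraus rank is easy to check numerically; for example, a unitary channel has Kraus rank 1, so its Choi state is pure. A minimal Python/NumPy sketch (the Choi convention below is one common choice):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4

# Haar-ish random unitary via QR of a complex Gaussian matrix.
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
U, _ = np.linalg.qr(A)

# Unnormalized Choi matrix J(X) = sum_{ij} |i><j| ⊗ X(|i><j|)
# for the unitary channel X(rho) = U rho U†.
J = np.zeros((d * d, d * d), dtype=complex)
for i in range(d):
    for j in range(d):
        E = np.zeros((d, d), dtype=complex)
        E[i, j] = 1
        J += np.kron(E, U @ E @ U.conj().T)

# Count significant eigenvalues: this is the Kraus rank of the channel.
evals = np.linalg.eigvalsh(J)
rank = int(np.sum(evals > 1e-8))
print(rank)  # 1 — the Choi state of a unitary channel is pure
```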
The channel version of the PSD-fit (3.75), later called the CPT-fit [126], has been suggested by Baldwin et al. [53] and numerically compared to full tomography and compressed sensing by Rodionov et al. [127]. First recovery guarantees for the CPT-fit and other CS recovery guarantees for quantum process tomography were proven in Ref. [126]. Here, the channel's input states are sampled from an (approximate) projective 4-design and the output states are measured in the eigenbases of observables drawn from an (approximate) unitary 4-design.
Minimizing the diamond norm as a regularizer in compressed sensing has been investigated in Refs. [126, 128]. It is argued that for certain signals, including low Kraus rank quantum channels [128, 129], this recovery performs at least as well in terms of measurement effort as the conventional trace-norm minimization.
Another approach to quantum process tomography is the use of RB methods, pioneered by Kimmel et al. [110] and explained in more detail in the following sections. The advantage of this approach is some (not yet rigorously quantified) robustness against SPAM errors, which is inherited from randomized benchmarking.
deviations of a channel X from being unital are not "seen" by average gate fidelities. But everything else can be reconstructed. More precisely, one can learn the unital part of X, which is given by

Xu(ρ) := X(ρ − Tr[ρ] 𝟙/d) ,  (7.1)

from average gate fidelities [110]. The following result extends and simplifies results by Scott [100] and Kimmel et al. [110].
Theorem 7.1 ([99, Theorem 38]):
with coefficients

cU(X) = α Favg(U, X) − β Tr[X(𝟙)] ,  (7.3)

where α = d(d + 1)(d² − 1) and β = (1/d)(α/d − 1).
The average gate fidelities Favg (U, X ) can be estimated using interleaved RB [110].
Hence, RB methods can be used to tomographically reconstruct the unital part of
any quantum channel.
In the case where G is the Clifford group and X is (close to) a unitary channel, one can use compressed sensing for the reconstruction [99]. It is sufficient to subsample the U's from the Clifford group to recover the unitary (and hence unital) X from a number of average gate fidelities scaling as Õ(d²). This scaling is clearly optimal. However, it is still non-obvious what the sample complexity in terms of the number of invocations of the channel X is when RB is used [99].
with s ∈ [nG]ℓ.
The gauge freedom is given by linear invertible transformations on (M, G, ρ) that preserve the output distributions

p(j|s) := Tr[Mj G(s)(ρ)] .  (8.2)
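This gauge freedom can be illustrated with a toy numerical model: conjugating all gate superoperators by an invertible B, while transforming state and measurement accordingly, leaves every probability (8.2) unchanged. A minimal sketch with generic matrices standing in for a gate set (all names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
dim = 4  # superoperator dimension d^2 for a single qubit

# Toy gate set: vectorized state, two measurement effects, three gate
# superoperators (generic real matrices, purely illustrative).
rho = rng.normal(size=dim)
effects = [rng.normal(size=dim) for _ in range(2)]
gates = [rng.normal(size=(dim, dim)) for _ in range(3)]

def prob(effect, gate_list, seq, state):
    """Outcome value p(j|s) = <<M_j| G_{s_l} ... G_{s_1} |rho>>."""
    v = state
    for g in seq:
        v = gate_list[g] @ v
    return effect @ v

# An invertible gauge transformation B leaves all outputs unchanged:
# rho -> B rho,  M_j -> M_j B^{-1},  G -> B G B^{-1}.
B = rng.normal(size=(dim, dim)) + 2 * np.eye(dim)
B_inv = np.linalg.inv(B)
rho_g = B @ rho
effects_g = [e @ B_inv for e in effects]
gates_g = [B @ G @ B_inv for G in gates]

seq = [0, 2, 1, 1]
p_original = prob(effects[0], gates, seq, rho)
p_gauged = prob(effects_g[0], gates_g, seq, rho_g)
print(p_original, p_gauged)  # identical up to numerical round-off
```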
randomized benchmarking tomography (Section 7.1). Moreover, the involved computations are challenging and do not (yet) come with any theoretical guarantees. On the upside, however, one can estimate diamond norm errors (Section 5.5) of the implementation of G by optimizing over the gauge freedom. So far, this seems to be the only quantum characterization and verification technique that has been used to estimate errors in diamond norm.
Bibliography
[1] S. Flammia, Characterization of quantum devices, QIP tutorial 2017, Seattle
(2017).
[2] J. Preskill, Quantum supremacy now? (2012).
[3] D. Shepherd and M. J. Bremner, Temporally unstructured quantum computa-
tion, Proc. Roy. Soc. A 465, 1413 (2009), arXiv:0809.0847.
[11] R. T. Rockafellar, Convex analysis, 2nd ed. (Princeton University Press, 1970).
[12] B. Simon, Representations of finite and compact groups, 10 (Am. Math. Soc.,
1996).
[13] R. Goodman and N. R. Wallach, Representations and invariants of the classical
groups, Vol. 68 (Cambridge University Press, 2000).
[14] M. Grant and S. Boyd, in Recent advances in learning and control, Lecture
Notes in Control and Information Sciences, edited by V. Blondel, S. Boyd, and
H. Kimura (Springer-Verlag Limited, 2008) pp. 95–110, http://stanford.edu/
~boyd/graph_dcp.html.
[15] M. Grant and S. Boyd, CVX: Matlab software for disciplined convex program-
ming, version 2.1, http://cvxr.com/cvx (2014).
[16] Python package cvxpy, https://www.cvxpy.org/install/index.html, ac-
cessed: 2019-06-10.
[17] J. Watrous, The Theory of Quantum Information (Cambridge University Press,
2018).
[18] H. Häffner, W. Hänsel, C. F. Roos, J. Benhelm, D. Chek-Al-Kar, M. Chwalla,
T. Körber, U. D. Rapol, M. Riebe, P. O. Schmidt, C. Becher, O. Gühne, W. Dür,
and R. Blatt, Scalable multiparticle entanglement of trapped ions, Nature 438,
643 (2005), arXiv:quant-ph/0603217 [quant-ph].
[32] C. Dankert, R. Cleve, J. Emerson, and E. Livine, Exact and approximate uni-
tary 2-designs and their application to fidelity estimation, Phys. Rev. A 80,
012304 (2009), arXiv:quant-ph/0606161 [quant-ph].
[33] H. Zhu, Multiqubit clifford groups are unitary 3-designs, Phys. Rev. A 96, 062336
(2017), arXiv:1510.02619 [quant-ph].
[34] Z. Webb, The clifford group forms a unitary 3-design, Quantum Info. Comput.
16, 1379 (2016), arXiv:1510.02769 [quant-ph].
[35] J. Benhelm, G. Kirchmair, U. Rapol, T. Körber, C. F. Roos, and R. Blatt,
Generation of hyperentangled photon pairs, Phys. Rev. A 75, 032506 (2007).
[36] A. Klappenecker and M. Roetteler, Mutually unbiased bases are complex projec-
tive 2-designs, in Proc. IEEE International Symposium on Information Theory,
ISIT, 2005 (IEEE, 2005) pp. 1740–1744, arXiv:quant-ph/0502031 [quant-ph].
[37] M. Guta, J. Kahn, R. Kueng, and J. A. Tropp, Fast state tomography with
optimal error bounds, arXiv:1809.11162 [quant-ph].
[38] G. M. D’Ariano and P. Perinotti, Optimal data processing for quantum measure-
ments, Phys. Rev. Lett. 98, 020403 (2007), arXiv:quant-ph/0610058 [quant-ph].
[39] E. J. Candes and T. Tao, Near-optimal signal recovery from random projec-
tions: Universal encoding strategies? IEEE T Inform Theory 52, 5406 (2006),
arXiv:math/0410542 [math.CA].
[40] E. J. Candes, J. Romberg, and T. Tao, Robust uncertainty principles: exact
signal reconstruction from highly incomplete frequency information, IEEE Trans.
Inform. Theor. 52, 489 (2006), arXiv:math/0409186 [math.NA].
[41] D. L. Donoho, Compressed sensing, IEEE Trans. Inf. Th. 52, 1289 (2006).
[42] B. K. Natarajan, Sparse approximate solutions to linear systems, SIAM J.
Comp. 24, 227 (1995).
[43] D. Ge, X. Jiang, and Y. Ye, A note on the complexity of ℓp minimization, Mathematical Programming 129, 285 (2011).
[44] V. Chandrasekaran, B. Recht, P. Parrilo, and A. Willsky, The convex ge-
ometry of linear inverse problems, Found. Comput. Math. 12, 805 (2012),
arXiv:1012.0621 [math.OC].
[52] R. Ahlswede and A. Winter, Strong converse for identification via quantum
channels, IEEE Trans. Inform. Theory 48, 569 (2002), arXiv:quant-ph/0012127
[quant-ph].
[56] Y.-K. Liu, Universal low-rank matrix recovery from Pauli measurements, Adv.
Neural Inf. Process. Syst. 24, 1638 (2011), arXiv:1103.2816.
[57] R. Kueng, H. Zhu, and D. Gross, Low rank matrix recovery from Clifford orbits, arXiv:1610.08070 [cs.IT].
[58] H. Zhu, R. Kueng, M. Grassl, and D. Gross, The Clifford group fails gracefully
to be a unitary 4-design, arXiv:1609.08172 [quant-ph].
[59] S. Mendelson, Learning without concentration, J. ACM 62, 21:1 (2015),
arXiv:1401.0304 [cs.LG].
[60] T. Sugiyama, P. S. Turner, and M. Murao, Precision-guaranteed quantum to-
mography, Phys. Rev. Lett. 111, 160406 (2013), arXiv:1306.4191 [quant-ph].
[61] J. A. Smolin, J. M. Gambetta, and G. Smith, Efficient method for computing the
maximum-likelihood quantum state from measurements with additive gaussian
noise, Phys. Rev. Lett. 108, 070502 (2012), arXiv:1106.5458 [quant-ph].
[62] J. A. Tropp, User-friendly tail bounds for sums of random matrices, Found.
Comput. Math. 12, 389 (2012), arXiv:1004.4389 [math.PR].
[63] S. T. Flammia and Y.-K. Liu, Direct fidelity estimation from few Pauli mea-
surements, Phys. Rev. Lett. 106, 230501 (2011), arXiv:1104.4695 [quant-ph].
[64] T. M. Cover and J. A. Thomas, Elements of information theory (John Wiley
and Sons, New York, 2012).
[65] Z. Hradil, J. Řeháček, J. Fiurášek, and M. Ježek, Maximum-likelihood methods in quantum mechanics, in Quantum State Estimation, Lecture Notes in Physics No. 649, edited by M. Paris and J. Řeháček (Springer Berlin Heidelberg, 2004) pp. 59–112.
[70] P. Faist and R. Renner, Practical and reliable error bars in quantum tomography,
Phys. Rev. Lett. 117, 010404 (2016), arXiv:1509.06763 [quant-ph].
[71] C. Granade, C. Ferrie, I. Hincks, S. Casagrande, T. Alexander, J. Gross,
M. Kononenko, and Y. Sanders, QInfer: Statistical inference software for quan-
tum applications, Quantum 1, 5 (2017), arXiv:1610.00336 [quant-ph].
[72] C. Granade, C. Ferrie, and S. T. Flammia, Practical adaptive quantum tomog-
raphy, New J. Phys. 19, 113017 (2017), arXiv:1605.05039 [quant-ph].
[73] C. A. Fuchs and J. van de Graaf, Cryptographic distinguishability measures for
quantum mechanical states, IEEE Trans. Inf. Th. 45, 1216 (1999), arXiv:quant-
ph/9712042 [quant-ph].
[74] M. P. da Silva, O. Landon-Cardinal, and D. Poulin, Practical characterization
of quantum devices without tomography, Phys. Rev. Lett. 107, 210404 (2011),
arXiv:1104.3835 [quant-ph].
[75] H. Pashayan, J. J. Wallman, and S. D. Bartlett, Estimating Outcome Prob-
abilities of Quantum Circuits Using Quasiprobabilities, Phys. Rev. Lett. 115,
070501 (2015), arXiv:1503.07525 [quant-ph].
[76] S. Pallister, N. Linden, and A. Montanaro, Optimal Verification of Entan-
gled States with Local Measurements, Phys. Rev. Lett. 120, 170502 (2018),
arXiv:1709.03353 [quant-ph].
[77] H. Zhu and M. Hayashi, Efficient verification of pure quantum states with ap-
plications to hypergraph states, arXiv:1806.05565 [quant-ph].
[78] Y. Takeuchi and T. Morimae, Verification of many-qubit states, Phys. Rev. X
8, 021060 (2018), arXiv:1709.07575 [quant-ph].
[79] Z. Li, Y.-G. Han, and H. Zhu, Efficient verification of bipartite pure states,
arXiv:1901.09783 [quant-ph].
[80] X.-D. Yu, J. Shang, and O. Gühne, Optimal verification of general bipartite
pure states, arXiv:1901.09856 [quant-ph].
[81] K. Wang and M. Hayashi, Optimal verification of two-qubit pure states,
arXiv:1901.09467 [quant-ph].
[82] M. Cramer, M. B. Plenio, S. T. Flammia, R. Somma, D. Gross, S. D. Bartlett,
O. Landon-Cardinal, D. Poulin, and Y.-K. Liu, Efficient quantum state tomog-
raphy, Nat. Commun. 1, 149 (2010).
[83] D. Hangleiter, M. Kliesch, M. Schwarz, and J. Eisert, Direct certification
of a class of quantum simulations, Quantum Sci. Technol. 2, 015004 (2017),
arXiv:1602.00703 [quant-ph].
[84] L. Aolita, C. Gogolin, M. Kliesch, and J. Eisert, Reliable quantum certification
of photonic state preparations, Nat. Commun. 6, 8498 (2015), arXiv:1407.4817
[quant-ph].
[85] M. Gluza, M. Kliesch, J. Eisert, and L. Aolita, Fidelity witnesses for fermionic
quantum simulations, Phys. Rev. Lett. 120, 190501 (2018), arXiv:1703.03152
[quant-ph].
[86] G. Tóth, W. Wieczorek, D. Gross, R. Krischek, C. Schwemmer, and H. Wein-
furter, Permutationally invariant quantum tomography, Phys. Rev. Lett. 105,
250403 (2010), arXiv:1005.3313 [quant-ph].
[87] T. Moroder, P. Hyllus, G. Tóth, C. Schwemmer, A. Niggebaum, S. Gaile,
O. Gühne, and H. Weinfurter, Permutationally invariant state reconstruction,
New J. Phys. 14, 105001 (2012), arXiv:1205.4941 [quant-ph].
[88] C. Schwemmer, G. Tóth, A. Niggebaum, T. Moroder, D. Gross, O. Gühne,
and H. Weinfurter, Experimental comparison of efficient tomography schemes
for a six-qubit state, Phys. Rev. Lett. 113, 040503 (2014), arXiv:1401.7526
[quant-ph].
[89] A. Kalev, A. Kyrillidis, and N. M. Linke, Validating and certifying stabilizer
states, Phys. Rev. A 99, 042337 (2019), arXiv:1808.10786 [quant-ph].
[90] C. Bădescu, R. O’Donnell, and J. Wright, Quantum state certification,
arXiv:1708.06002 [quant-ph].
[106] E. Magesan, J. M. Gambetta, B. R. Johnson, C. A. Ryan, J. M. Chow, S. T.
Merkel, M. P. da Silva, G. A. Keefe, M. B. Rothwell, T. A. Ohki, M. B. Ketchen,
and M. Steffen, Efficient measurement of quantum gate error by interleaved ran-
domized benchmarking, Phys. Rev. Lett. 109, 080505 (2012), arXiv:1203.4550
[quant-ph].
[107] J. P. Gaebler, A. M. Meier, T. R. Tan, R. Bowler, Y. Lin, D. Hanneke, J. D. Jost,
J. P. Home, E. Knill, and D. Leibfried, Randomized benchmarking of multiqubit
gates, Phys. Rev. Lett. 108, 260503 (2012), arXiv:1203.3733 [quant-ph].
[108] J. Emerson, M. Silva, O. Moussa, C. Ryan, M. Laforest, J. Baugh, D. G. Cory,
and R. Laflamme, Symmetrized characterization of noisy quantum processes,
Science 317, 1893 (2007).
[109] J. J. Wallman and S. T. Flammia, Randomized benchmarking with confidence,
New J. Phys. 16, 103032 (2014), arXiv:1404.6025 [quant-ph].
[110] S. Kimmel, M. P. da Silva, C. A. Ryan, B. R. Johnson, and T. Ohki, Robust
extraction of tomographic information via randomized benchmarking, Phys. Rev.
X 4, 011050 (2014), arXiv:1306.2348 [quant-ph].
[111] J. Wallman, C. Granade, R. Harper, and S. T. Flammia, Estimating the coher-
ence of noise, New J. Phys. 17, 113020 (2015), arXiv:1503.07865 [quant-ph].
[112] J. J. Wallman, M. Barnhill, and J. Emerson, Robust characterization of loss
rates, Phys. Rev. Lett. 115, 060501 (2015), arXiv:1412.4126.
[113] J. J. Wallman, M. Barnhill, and J. Emerson, Robust characterization of leakage
errors, New J. Phys. 18, 043021 (2016), arXiv:1412.4126 [quant-ph].
[114] A. W. Cross, E. Magesan, L. S. Bishop, J. A. Smolin, and J. M. Gambetta,
Scalable randomised benchmarking of non-clifford gates, npj Quant. Inf. 2, 16012
(2016), arXiv:1510.02720 [quant-ph].
[115] A. Carignan-Dugas, J. J. Wallman, and J. Emerson, Characterizing uni-
versal gate sets via dihedral benchmarking, Phys. Rev. A 92, 060302 (2015),
arXiv:1508.06312 [quant-ph].
[116] E. Onorati, A. H. Werner, and J. Eisert, Randomized benchmarking for indi-
vidual quantum gates, arXiv:1811.11775 [quant-ph].
[117] M. Horodecki, P. Horodecki, and R. Horodecki, General teleportation channel,
singlet fraction, and quasidistillation, Phys. Rev. A 60, 1888 (1999).
[118] M. A. Nielsen, A simple formula for the average gate fidelity of a quantum
dynamical operation, Phys. Lett. A 303, 249 (2002), quant-ph/0205035.
[119] R. Kueng, D. M. Long, A. C. Doherty, and S. T. Flammia, Comparing ex-
periments to the fault-tolerance threshold, Phys. Rev. Lett. 117, 170502 (2016),
arXiv:1510.05653 [quant-ph].
[120] E. Magesan, J. M. Gambetta, and J. Emerson, Characterizing quantum gates
via randomized benchmarking, Phys. Rev. A 85, 042311 (2012), arXiv:1109.6887.
[121] J. J. Wallman, Randomized benchmarking with gate-dependent noise, Quantum
2, 47 (2018), arXiv:1703.09835 [quant-ph].
[122] R. Harper, I. Hincks, C. Ferrie, S. T. Flammia, and J. J. Wallman, Statis-
tical analysis of randomized benchmarking, Phys. Rev. A 99, 052350 (2019),
arXiv:1901.00535 [quant-ph].
[123] I. L. Chuang and M. A. Nielsen, Prescription for experimental determination
of the dynamics of a quantum black box, J. Mod. Opt. 44, 2455 (1997), quant-
ph/9610001.
[124] A. Shabani, R. L. Kosut, M. Mohseni, H. Rabitz, M. A. Broome, M. P. Almeida,
A. Fedrizzi, and A. G. White, Efficient measurement of quantum dynamics
via compressive sensing, Phys. Rev. Lett. 106, 100401 (2011), arXiv:0910.5498
[quant-ph].
[125] J. B. Altepeter, D. Branning, E. Jeffrey, T. C. Wei, P. G. Kwiat, R. T. Thew,
J. L. O’Brien, M. A. Nielsen, and A. G. White, Ancilla-assisted quantum process
tomography, Phys. Rev. Lett. 90, 193601 (2003), quant-ph/0303038.
[126] M. Kliesch, R. Kueng, J. Eisert, and D. Gross, Guaranteed recovery of quantum processes from few measurements, Quantum 3, 171 (2019), arXiv:1701.03135 [quant-ph].
[127] A. V. Rodionov, A. Veitia, R. Barends, J. Kelly, D. Sank, J. Wenner, J. M.
Martinis, R. L. Kosut, and A. N. Korotkov, Compressed sensing quantum pro-
cess tomography for superconducting quantum gates, Phys. Rev. B 90, 144504
(2014), arXiv:1407.0761 [quant-ph].