Project Aayush
Aayush Karan
1 Introduction
Quantum state tomography concerns the task of learning an unknown quantum state ρ from a
series of measurements conducted on copies of the state generated by some black box apparatus.
Of primary interest is the sample complexity, or the minimum number of copies required to learn
the unknown state ρ sufficiently well. The choice of setting that makes “sufficiently well” precise
can drastically affect the sample complexity. For example, if we want to learn the density matrix
representation of ρ up to small error in trace-norm distance (see Definition 1.5), then for independently chosen (nonadaptive) measurements, the sample complexity is necessarily at least cubic in
the dimension of ρ [3]. On the other hand, if we want to accurately predict expectation values of
randomly drawn binary measurements on ρ, it turns out that the sample complexity is logarithmic
in the dimension of ρ [2].
For this paper, we are concerned with the online learning setting. In online learning, a learner
receives a sequence of data x1 , x2 , · · · over a period of time; at timestep t, a prediction is made and
environmental feedback is returned. In this setting, as opposed to directly limiting some notion of
sample complexity, we would like to limit the number of mistakes, which occur when the feedback
on a prediction violates some error threshold. An online learning algorithm with a finite (time-independent) bound on the number of mistakes is called a mistake-bounded learning algorithm. For learning quantum
states online, mistakes correspond to high error in estimating expectation values of measurement
predictions. As we will see shortly, we can actually design a mistake-bounded learning algorithm
with mistake bound that is logarithmic in the dimension of the quantum state [1]. Before formalizing
online quantum learning, we introduce some notation and prerequisite mathematical knowledge.
1.1 Preliminaries
1.1.1 Positive Semidefinite Matrices
In this section, let us fix M, N, and P to be arbitrary complex Hermitian matrices of dimension d.
This choice is to avoid description redundancy in what follows.
Definition 1.1. M is said to be positive semidefinite (PSD) if all its (real) eigenvalues are non-
negative. A shorthand notation to designate M as PSD is to write M ⪰ 0. For given M, N , we
write M ⪰ N ⇐⇒ M − N ⪰ 0.
Definition 1.2. The spectrum of M , written as Spec(M ), is the multiset of all (real) eigenvalues
of M .
Lemma 1.3. Suppose we have M, N ⪰ 0. Then Tr(M N ) ≥ 0.
Proof. Since N ⪰ 0, we can write N = SS† for some matrix S (for instance, the Hermitian PSD square root S = N^{1/2}, or the Cholesky factor of N). Notice S†M S ⪰ 0, since for any vector v ∈ Cd, v†S†M Sv = (Sv)†M (Sv) ≥ 0 as M ⪰ 0. Hence, by cyclicity of the trace, Tr(M N) = Tr(M SS†) = Tr(S†M S) ≥ 0, since the trace of a PSD matrix is nonnegative, as desired.
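Lemma 1.3 is easy to sanity-check numerically. The following sketch (not from the paper; the helper random_psd is ours) verifies Tr(M N) ≥ 0 on random PSD pairs:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_psd(d):
    """Illustrative helper: a random d x d PSD matrix of the form A A†."""
    A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return A @ A.conj().T

# Lemma 1.3: Tr(MN) >= 0 whenever M, N are PSD (up to float round-off).
trace_products = [
    np.trace(random_psd(4) @ random_psd(4)).real for _ in range(100)
]
all_nonnegative = all(t >= -1e-9 for t in trace_products)
```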
Corollary 1.4. Suppose P ⪰ 0 and M ⪰ N . Then Tr(M P ) ≥ Tr(N P ).
Proof. Apply Lemma 1.3 to P and M − N .
Definition 1.5. The trace norm of M is given by ||M ||Tr = Tr √(M M†); in other words, the trace norm is the sum of the absolute values of the eigenvalues of M.
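As a concrete illustration (our own sketch, not part of the paper), the trace norm is the sum of the singular values of M, which for Hermitian M coincides with the sum of the absolute values of the eigenvalues:

```python
import numpy as np

def trace_norm(M):
    """||M||_Tr = Tr sqrt(M M†), i.e. the sum of the singular values of M."""
    return np.linalg.svd(M, compute_uv=False).sum()

# For Hermitian M this agrees with the sum of |eigenvalues|.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
M = (A + A.T) / 2  # real symmetric, hence Hermitian
via_svd = trace_norm(M)
via_eigs = np.abs(np.linalg.eigvalsh(M)).sum()
```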
Lemma 1.6. If the trace norm of M satisfies ||M ||Tr ≤ 1, then M ⪯ 1, where 1 denotes the
identity matrix.
Proof. Since the trace norm of M is the sum of the absolute values of its eigenvalues, it follows that every eigenvalue of M has absolute value at most 1. Now 1 and M are simultaneously diagonalizable, so the eigenvalues of 1 − M are of the form 1 − λ for λ ∈ Spec(M). But |λ| ≤ 1 =⇒ 1 − λ ≥ 0 for all such λ, so 1 − M ⪰ 0, i.e., M ⪯ 1, as desired.
Lemma 1.7. Suppose the trace norm of M satisfies ||M ||Tr ≤ 1. Then
exp(−M ) ⪯ 1 − M + M 2 .
Proof. Let D be the diagonalization of M, so that M = U DU† for some unitary matrix U. Notice that exp(−D) ⪯ 1 − D + D² is equivalent to the scalar inequality exp(−λ) ≤ 1 − λ + λ² holding for each eigenvalue λ of M; since ||M ||Tr ≤ 1 implies |λ| ≤ 1, the scalar inequality holds (see Appendix), whence the matrix inequality does as well. Conjugating both sides of the relation on D by U and U† finishes the proof of the lemma, since U exp(−D)U† = exp(−U DU†) = exp(−M) and U D²U† = (U DU†)² = M².
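The lemma can be spot-checked numerically. The sketch below (ours; herm_exp mirrors the eigendecomposition argument in the proof) scales a random Hermitian matrix to unit trace norm and checks that 1 − M + M² − exp(−M) is PSD:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5

def herm_exp(H):
    """exp(H) for Hermitian H via eigendecomposition, as in the proof."""
    lam, U = np.linalg.eigh(H)
    return (U * np.exp(lam)) @ U.conj().T

A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
M = (A + A.conj().T) / 2
M /= np.abs(np.linalg.eigvalsh(M)).sum()  # scale so ||M||_Tr = 1

# Lemma 1.7: 1 - M + M^2 - exp(-M) should be PSD.
gap = np.eye(d) - M + M @ M - herm_exp(-M)
min_gap_eig = np.linalg.eigvalsh(gap).min()
```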
Proposition 1.8 (Golden–Thompson inequality; see [7]). The following inequality on M, N holds: Tr(exp(M + N)) ≤ Tr(exp(M) exp(N)).
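The Golden–Thompson inequality Tr(exp(M + N)) ≤ Tr(exp(M) exp(N)) for Hermitian M, N can be spot-checked numerically; in the sketch below (ours, not from the paper) herm_exp computes the matrix exponential through the eigendecomposition:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4

def herm_exp(H):
    """Matrix exponential of a Hermitian matrix via eigendecomposition."""
    lam, U = np.linalg.eigh(H)
    return (U * np.exp(lam)) @ U.conj().T

def rand_herm(d):
    A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (A + A.conj().T) / 2

M, N = rand_herm(d), rand_herm(d)
lhs = np.trace(herm_exp(M + N)).real            # Tr(exp(M + N))
rhs = np.trace(herm_exp(M) @ herm_exp(N)).real  # Tr(exp(M) exp(N))
```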
1.2 Framework
With the basic preliminaries in place, we can now formalize the quantum online learning setting.
For the remainder of this paper, fix a positive integer d and an arbitrary unknown d-dimensional
quantum state ρ ∈ D(d), where D(d) denotes the set of d-dimensional density matrices (PSD matrices with unit trace). The online learner maintains a sequence of estimates {ωt }t≥1 of ρ over time, with all ωt ∈ D(d), the goal being to predict any measurement expectation over the true
state ρ sufficiently accurately. Fix error parameter ε ∈ (0, 1). For times t = 1, 2, 3, · · · , the learner
receives binary measurement operator Et from the environment, with the prediction being Tr(Et ωt ),
which can be calculated since both Et and ωt are known. The environment responds by outputting
an approximation bt of Tr(Et ρ) such that

|bt − Tr(Et ρ)| ≤ ε/3. (1)
The means by which the approximation bt is obtained is typically relegated to an oracle [1]; however,
one simple methodology is to simply take the empirical average of several copies of ρ.
A prediction is a mistake if |Tr(Et ωt ) − Tr(Et ρ)| > ε, and we would like to bound the number of
mistakes our learning scheme makes so that after some time T , our estimates {ωt }t≥T are guaranteed
to satisfy |Tr(Eωt ) − Tr(Eρ)| ≤ ε for any binary measurement operator E. The key result this
paper focuses on is the following mistake bound, due to [1].
Theorem 1.11. Let ρ be a d-dimensional quantum state. Then for any sequence of binary measurement operators E1 , E2 , · · · in an online learning setting, there exists an algorithm constructing an estimate sequence {ωt }t≥1 making at most O(log(d)/ε²) mistakes, given environmental feedback sequence {bt }t≥1 satisfying |bt − Tr(Et ρ)| ≤ ε/3 for all t.
The subsequent exposition and proof of this theorem roughly follows [1] and [3].
2 The Matrix Multiplicative Weights Algorithm

The MMW algorithm works as follows. Fix update hyperparameter β ∈ (0, 1). Suppose further that for each timestep t = 1, 2, · · · , given the learner prediction and environmental feedback, one can define a loss matrix Lt , where Lt is a d-dimensional Hermitian matrix such that ||Lt ||Tr ≤ 1. Initialize the weight matrix W1 = 1 to the identity matrix. Then at time t, we have the following update rule:

W_{t+1} = exp(−β ∑_{i=1}^{t} Li),   ω_{t+1} = W_{t+1}/Tr(W_{t+1}). (2)
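The update can be sketched in a few lines of code. This is a minimal illustration (helper names are ours), assuming the weight matrix takes the form W_{t+1} = exp(−β ∑_{i≤t} Li) with the estimate normalized as ω = W/Tr(W):

```python
import numpy as np

def herm_exp(H):
    """exp(H) for Hermitian H via eigendecomposition."""
    lam, U = np.linalg.eigh(H)
    return (U * np.exp(lam)) @ U.conj().T

def mmw_estimate(loss_matrices, beta, d):
    """MMW estimate after observing the given Hermitian loss matrices:
    W = exp(-beta * sum of losses), omega = W / Tr(W)."""
    total = np.zeros((d, d), dtype=complex)
    for L in loss_matrices:
        total += L
    W = herm_exp(-beta * total)
    return W / np.trace(W).real

# With no losses observed yet, the estimate is the maximally mixed state 1/d.
omega0 = mmw_estimate([], beta=0.5, d=3)
```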
Lemma 2.1. The weight matrix Wt is positive semidefinite for all times t.
Proof. Fix time t. Observe that ∑_{i=1}^{t−1} Li is Hermitian, as it is the sum of individual Hermitian matrices. If Spec(∑_{i=1}^{t−1} Li) = {λi}_{i=1}^{d}, then Spec(Wt) = {exp(−βλi)}_{i=1}^{d}; in other words, Wt has all eigenvalues nonnegative. It follows that Wt is PSD for all times t, as desired.
Corollary 2.2. For all times t, the estimate ωt is contained in D(d) and is hence a valid d-
dimensional quantum state.
Proof. Fix time t. Since ωt = Wt /Tr(Wt), we can quickly confirm that Tr(ωt) = Tr(Wt)/Tr(Wt) = 1, so ωt has unit trace. From Lemma 2.1, Wt ⪰ 0 and Tr(Wt) > 0, so ωt = Wt /Tr(Wt) ⪰ 0. Hence, ωt is PSD with unit trace =⇒ ωt ∈ D(d) for all times t, as desired.
Theorem 2.3 (See [3]). For any time T, the estimates {ωt }t≤T of the MMW algorithm satisfy

∑_{t=1}^{T} Tr(Lt ωt) ≤ λmin(∑_{t=1}^{T} Lt) + β ∑_{t=1}^{T} Tr(Lt² ωt) + log(d)/β, (3)

where λmin is a function that outputs the minimum eigenvalue of a Hermitian matrix.
Proof. We prove this inequality by providing upper and lower bounds on the quantity Tr(W_{T+1}) = Tr(exp(−β ∑_{i=1}^{T} Li)). For the lower bound, since ∑_{i=1}^{T} Li is Hermitian, let Spec(∑_{i=1}^{T} Li) = {λi}_{i=1}^{d}. Then clearly

Tr(exp(−β ∑_{i=1}^{T} Li)) = ∑_{i=1}^{d} exp(−βλi) ≥ exp(−β λmin(∑_{t=1}^{T} Lt)), (4)

where the former equality is due to the fact that Tr(exp M) = ∑_{λ∈Spec(M)} exp(λ). The latter inequality holds since the sum contains the term exp(−βλmin) and every term exp(−βλi) ≥ 0.
For the upper bound, we bound the growth of Tr(Wt) recursively. By the Golden–Thompson inequality (Proposition 1.8),

Tr(W_{t+1}) = Tr(exp(−β ∑_{i=1}^{t−1} Li − βLt)) ≤ Tr(exp(−βLt) Wt). (5)

Now, since ||βLt ||Tr ≤ ||Lt ||Tr ≤ 1, from Lemma 1.7 we have that

Tr(exp(−βLt)Wt) ≤ Tr(Wt − βLt Wt + β² Lt² Wt) = Tr(Wt)(1 − βTr(Lt ωt) + β² Tr(Lt² ωt)). (6)
By convexity of the exponential function, e^x ≥ 1 + x for all x ∈ R; applying this to the RHS of (6), it follows that

Tr(W_{t+1})/Tr(Wt) ≤ exp(−βTr(Lt ωt) + β² Tr(Lt² ωt)). (7)
Multiplying the recursive relation (7) over all times 1 ≤ t ≤ T and noting that Tr(W1) = Tr(1) = d, we obtain

Tr(W_{T+1}) ≤ d · exp(−β ∑_{t=1}^{T} Tr(Lt ωt) + β² ∑_{t=1}^{T} Tr(Lt² ωt)). (8)
Combining the lower bound on Tr(W_{T+1}) in (4) with the upper bound in (8), we have

exp(−β λmin(∑_{t=1}^{T} Lt)) ≤ d · exp(−β ∑_{t=1}^{T} Tr(Lt ωt) + β² ∑_{t=1}^{T} Tr(Lt² ωt)),

and taking logarithms of both sides, dividing by β, and rearranging yields (3).
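Theorem 2.3 can be sanity-checked by simulation. The sketch below is ours, with an arbitrary ensemble of random Hermitian loss matrices normalized to unit trace norm:

```python
import numpy as np

rng = np.random.default_rng(3)
d, T, beta = 3, 50, 0.1

def herm_exp(H):
    lam, U = np.linalg.eigh(H)
    return (U * np.exp(lam)) @ U.conj().T

def rand_loss(d):
    """Random Hermitian loss matrix scaled so that ||L||_Tr = 1."""
    A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    H = (A + A.conj().T) / 2
    return H / np.abs(np.linalg.eigvalsh(H)).sum()

total_loss = np.zeros((d, d), dtype=complex)
lhs = quad = 0.0
for _ in range(T):
    W = herm_exp(-beta * total_loss)      # W_t from losses seen so far
    omega = W / np.trace(W).real          # omega_t = W_t / Tr(W_t)
    L = rand_loss(d)
    lhs += np.trace(L @ omega).real       # accumulates sum Tr(L_t omega_t)
    quad += np.trace(L @ L @ omega).real  # accumulates sum Tr(L_t^2 omega_t)
    total_loss += L

# RHS of inequality (3): lambda_min + beta * quadratic term + log(d)/beta.
rhs = np.linalg.eigvalsh(total_loss).min() + beta * quad + np.log(d) / beta
```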
3 Logarithmic Mistake Bound
To obtain a mistake bound in the quantum online learning setting, it remains to construct loss
matrices Lt for each time t such that Lt is Hermitian and satisfies ||Lt ||Tr ≤ 1. For all times t, let
us define the convex loss function
ℓt (x) = |x − bt |, (9)
where bt is the feedback satisfying (1) provided by the environment. Then given learner prediction
Tr(Et ωt ), the loss of the prediction at time t is given by ℓt (Tr(Et ωt )).
Proposition 3.1. The loss function ℓt (x) as defined in (9) satisfies ℓ′t (x)(x − z) ≥ ℓt (x) − ℓt (z) for any x, z ∈ R, where ℓ′t (x) denotes a subgradient of ℓt at x (equal to sign(x − bt) for x ≠ bt).

Proof. See Theorem 2.1.3 in [3] for full details. This is the subgradient inequality for the globally convex function ℓt (x): the slope of the secant line through two points on the graph of ℓt (x) is bounded between the slopes of supporting lines at the endpoints.
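For ℓt (x) = |x − bt|, the proposition amounts to sign(x − bt) · (x − z) ≥ |x − bt| − |z − bt|, which is easy to test on random points. A quick sketch (ours; the value b = 0.3 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
b = 0.3  # arbitrary feedback value, chosen for illustration

def ell(x):
    return abs(x - b)

def ell_prime(x):
    return np.sign(x - b)  # subgradient of |x - b| (0 at the kink is valid)

# Check the subgradient inequality on random pairs (x, z) in [0, 1].
subgradient_ok = all(
    ell_prime(x) * (x - z) >= ell(x) - ell(z) - 1e-12
    for x, z in rng.uniform(0.0, 1.0, size=(200, 2))
)
```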
Definition 3.2. For a sequence of learner estimates {ωt }t≤T and environment-provided binary
measurement operators {Et }t≤T , the regret over T > 0 rounds of learning is given by
R(T) = ∑_{t=1}^{T} ℓt (Tr(Et ωt)) − min_{φ∈D(d)} ∑_{t=1}^{T} ℓt (Tr(Et φ)).
Lemma 3.3. Let {ωt }t≥1 be the estimates of ρ chosen according to the MMW algorithm with loss matrices Lt = ℓ′t (Tr(Et ωt))Et for all times t. Then the regret over T rounds satisfies

R(T) ≤ 4 log(d)/ε + Tε/4. (10)
Proof. First, note that from (9), we have ℓ′t (x) = (x − bt)/|x − bt| for x ≠ bt, so |ℓ′t (x)| = 1 (taking, say, ℓ′t (x) = 1 at x = bt). As a result, ||Lt ||Tr = ||Et ||Tr ≤ 1 using Definition 1.10; moreover, Et Hermitian implies Lt is Hermitian, so the defined loss matrices can be applied in the context of the MMW algorithm.

Now, given (3) holds, we can rewrite its RHS in a manner that evokes the regret formula. Notice that for any Hermitian matrix M, λmin(M) = min_{φ∈D(d)} Tr(M φ), with equality holding when φ is the outer product of an eigenvector corresponding to λmin(M) with itself. Then λmin(∑_{t=1}^{T} Lt) = min_{φ∈D(d)} ∑_{t=1}^{T} Tr(Lt φ). Moreover, since ||Lt ||Tr ≤ 1, every eigenvalue of Lt has absolute value at most 1, so Lt² ⪯ 1 and, by Corollary 1.4, Tr(Lt² ωt) ≤ Tr(ωt) = 1 for all t. Substituting into (3), we obtain

∑_{t=1}^{T} ℓ′t (Tr(Et ωt)) Tr(Et ωt) − min_{φ∈D(d)} ∑_{t=1}^{T} ℓ′t (Tr(Et ωt)) Tr(Et φ) ≤ βT + log(d)/β,
and each term on the LHS upper bounds ℓt (Tr(Et ωt)) − ℓt (Tr(Et φ)) by Proposition 3.1 applied with x = Tr(Et ωt) and z = Tr(Et φ), so it follows that

R(T) = ∑_{t=1}^{T} ℓt (Tr(Et ωt)) − min_{φ∈D(d)} ∑_{t=1}^{T} ℓt (Tr(Et φ)) ≤ βT + log(d)/β.

Setting β = ε/4 gives the desired inequality (10).
With this lemma in place, we can finally prove the main theorem of the paper.
Proof of Theorem 1.11. We consider a slightly more sophisticated modification of the MMW algorithm strategy for generating estimates ωt for ρ, inspired by conservative mistake-bounded algorithms [9], which only update predictions when a mistake occurs. Set T = {t | ℓt (Tr(Et ωt)) > 2ε/3}, where T is updated with a new time sequentially whenever the loss exceeds 2ε/3. For t ∉ T, since |bt − Tr(Et ρ)| ≤ ε/3, the triangle inequality implies |Tr(Et ωt) − Tr(Et ρ)| ≤ |Tr(Et ωt) − bt| + |bt − Tr(Et ρ)| ≤ 2ε/3 + ε/3 = ε; in other words, for t ∉ T, we are guaranteed to not have made a mistake.

Hence, our sequence of estimates {ωt }t≥1 is such that when t ∉ T, the running weight matrix satisfies W_{t+1} = Wt , so ω_{t+1} = ωt . When t ∈ T, we update according to (2) but restricted only to T:

W_{t+1} = exp(−β ∑_{i∈T, i≤t} Li),   ω_{t+1} = W_{t+1}/Tr(W_{t+1}).

Indeed, since the loss is received after the estimate-based prediction is revealed to the environment, if the loss exceeds the 2ε/3 threshold, the next estimate is updated. This modification is just a reindexing over T, so our previous results on the MMW algorithm still apply. Now, let us run this learning procedure until some time T, and let T′ = |{1, · · · , T} ∩ T| be the number of times an update is made. Clearly ∑_{t∈T, t≤T} ℓt (Tr(Et ωt)) ≥ 2T′ε/3 by definition of T. At the same time, due to the feedback guarantee (1), ∑_{t∈T, t≤T} ℓt (Tr(Et ρ)) ≤ T′ε/3, so this sustains an upper bound for the minimum sum of losses for predictions on a fixed state: min_{φ∈D(d)} ∑_{t∈T, t≤T} ℓt (Tr(Et φ)) ≤ T′ε/3. Hence,

R(T′) = ∑_{t∈T, t≤T} ℓt (Tr(Et ωt)) − min_{φ∈D(d)} ∑_{t∈T, t≤T} ℓt (Tr(Et φ)) ≥ 2T′ε/3 − T′ε/3 = T′ε/3.

Combining this lower bound with the regret upper bound (10) applied to the T′ rounds indexed by T, we get T′ε/3 ≤ 4 log(d)/ε + T′ε/4, so T′ε/12 ≤ 4 log(d)/ε, i.e., T′ ≤ 48 log(d)/ε². Since every mistake occurs at a time in T, the number of mistakes is at most T′ = O(log(d)/ε²), proving the theorem.
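The conservative learner from this proof can be sketched end to end. This is our own simulation under illustrative assumptions: the random measurement ensemble and the normalization ||Et ||Tr = 1 are our choices, and the feedback bt is sampled within ε/3 of Tr(Et ρ):

```python
import numpy as np

rng = np.random.default_rng(5)
d, eps, rounds = 4, 0.2, 300
beta = eps / 4

def herm_exp(H):
    lam, U = np.linalg.eigh(H)
    return (U * np.exp(lam)) @ U.conj().T

# Hidden state rho: a random density matrix.
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
rho = A @ A.conj().T
rho /= np.trace(rho).real

def rand_measurement(d):
    """Random PSD operator scaled to unit trace, so 0 <= E <= 1 and
    ||E||_Tr = 1 (an illustrative convention, not fixed by the paper)."""
    B = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    E = B @ B.conj().T
    return E / np.trace(E).real

loss_sum = np.zeros((d, d), dtype=complex)
mistakes = updates = 0
for _ in range(rounds):
    W = herm_exp(-beta * loss_sum)             # conservative MMW weights
    omega = W / np.trace(W).real
    E = rand_measurement(d)
    pred = np.trace(E @ omega).real
    true = np.trace(E @ rho).real
    b = true + rng.uniform(-eps / 3, eps / 3)  # feedback within eps/3
    if abs(pred - true) > eps:
        mistakes += 1                          # a genuine mistake
    if abs(pred - b) > 2 * eps / 3:            # loss exceeds 2eps/3: update
        loss_sum += np.sign(pred - b) * E      # L_t = l'_t(pred) * E_t
        updates += 1

mistake_cap = 48 * np.log(d) / eps**2          # bound from the proof
```

Every mistake forces an update (a mistake makes the loss exceed 2ε/3 by the triangle inequality), so the mistake count is bounded by the update count, which the proof bounds by 48 log(d)/ε².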
4 Future Directions
We have constructed a mistake-bounded algorithm for online quantum state learning that makes at most O(log(d)/ε²) mistakes in predicting expectation values of binary measurements. The success of this scheme relies on the feedback bt , which is guaranteed to be within ε/3 of the true expectation value on the unknown state ρ. However, in a practical setting, such reliable feedback bt could be difficult or expensive to obtain. As mentioned earlier, the most direct method of
approximating Tr(Eρ) for some binary measurement E takes the sample average of positive results
of E on several copies of ρ. The issue here is that the sample complexity then scales linearly with the number of rounds of online learning. We are guaranteed that after O(log(d)/ε²) mistakes, our estimate of any measurement expectation on ρ is within ε of the true value; however, there is no guarantee on how quickly all these mistakes will appear during the online learning process. We
could have arbitrarily long sequences of binary measurements that do not require updating our
estimate ωt , but in the process, we are “wasting” copies of ρ on generating feedback for rounds
where mistakes do not occur.
Hence, suppose now that the learner has the option to choose the sequence of binary measurements
E1 , E2 , · · · over time, and maintains a time series of estimates {ωt }t≥1 of the hidden state ρ as
well. The learner can query the environment for feedback bt that approximates Tr(Et ρ) up to error
ε; assume this demands some finite sample complexity per round. Let Tρ (δ) denote the earliest
time such that with probability at least 1 − δ, we have |Tr(EωTρ (δ) ) − Tr(Eρ)| ≤ ε for all binary
measurements E.
Question (Original): Is there some adaptive scheme for choosing E1 , E2 , · · · such that Tρ (δ) = p(log(d), 1/ε, 1/δ) for some polynomial p?
The answer to such a question unites quantum online learning with the notion of sample complexity,
potentially offering a sample-efficient method for learning robust estimates of hidden quantum states.
5 Acknowledgements
I’d like to thank Professor Anshu and Chi-Ning Chou for their valuable suggestions regarding the
project.
References
[1] S. Aaronson et al. Online Learning of Quantum States. Advances in Neural Information
Processing Systems 31, 2018. arXiv:1802.09025.
[2] S. Aaronson. The learnability of quantum states. Proc. Roy. Soc. London, A463(2088):
3089–3114, 2007. arXiv:quant-ph/0608142.
[3] A. Lowe. Learning Quantum States Without Entangled Measurements. UWSpace, 2021.
http://hdl.handle.net/10012/17663.
[4] Y. Freund and R. Schapire. Adaptive game playing using multiplicative weights. Games and
Economic Behavior(29): 79–103, 1999.
[5] Y. Freund and R. Schapire. A short introduction to boosting. Journal of Japanese Society for
Artificial Intelligence(55): 771–780, 1999.
[6] S. Arora and S. Kale. A combinatorial, primal-dual approach to semidefinite programs. Pro-
ceedings of the 39th annual ACM symposium on Theory of computing, 2007.
[7] D. Petz. A survey of trace inequalities. Functional Analysis and Operator Theory, 287–298,
1994.
[8] M. Nielsen and I. Chuang. Quantum Computation and Quantum Information: 10th Anniversary
Edition. Cambridge University Press, 2010.
[9] N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-threshold
Algorithm. Machine Learning(2): 285–318, 1988.
6 Appendix
Lemma 6.1. For all real x such that |x| ≤ 1, the inequality exp (−x) ≤ 1 − x + x2 holds.
Proof. Let f (x) = 1 − x + x² − exp(−x). Notice that f ′(x) = −1 + 2x + exp(−x), with f (−1) = 3 − e > 0. We want to show that f ′(x) < 0 for −1 ≤ x < 0 and f ′(x) > 0 for x > 0. Since f (0) = 0, this will show that f (x) ≥ 0 for |x| ≤ 1.
Let us look at f ′′ (x) = 2 − exp(−x). We have that f ′′ (x) < 0 for x < − log 2, and f ′′ (x) > 0
otherwise. Hence, f ′ (x) is decreasing for x < − log 2 and increasing afterwards. Since f ′ (− log 2) =
1 − log 4 < 0 but limx→±∞ f ′ (x) = ∞, it follows that f ′ (x) intersects the x-axis at 2 points, one
of them being 0. Since f ′ (−1) = e − 3 < 0, by the intermediate value theorem, f ′ (x) intersects
the x-axis at x < −1, so we have that f ′ (x) < 0 for −1 ≤ x < 0. Thus f (x) is decreasing from
x = −1 to x = 0, but f (0) = 0 implies that f (x) ≥ 0 for −1 ≤ x ≤ 0. Now, beyond 0, since
f ′ (x) is increasing for x > − log 2 and f ′ (0) = 0, it follows that f ′ (x) > 0 for x > 0. Hence f (x) is
increasing for x > 0, and since f (0) = 0, this implies f (x) ≥ 0 for x > 0.
We have thus shown that f (x) ≥ 0 for all x ≥ −1, and the desired inequality follows.
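The scalar inequality is also easy to verify numerically on a dense grid (a quick sketch of ours, complementing the analytic proof above):

```python
import numpy as np

# Lemma 6.1: exp(-x) <= 1 - x + x^2 for |x| <= 1; check on a dense grid.
x = np.linspace(-1.0, 1.0, 10001)
gap = 1 - x + x**2 - np.exp(-x)
min_gap = float(gap.min())  # should be ~0, attained at x = 0
```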