Quantum Algorithms To Matrix Multiplication
Changpeng Shao
[email protected]
Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
Abstract. In this paper, we study quantum algorithms for matrix multiplication from the viewpoint of inputting quantum/classical data and outputting quantum/classical data. The main target is to overcome the input and output problems, which are not easy to solve and which many quantum algorithms encounter, so that matrix operations can be carried out on a quantum computer with high efficiency; solving matrix multiplication is the first step. We propose three quantum algorithms for matrix multiplication, based on the swap test, SVE and HHL respectively. From the point of view of making fewer assumptions, the swap test method works better than the other two. We also show that the quantum algorithm for matrix multiplication with classical input and output data based on the swap test achieves complexity $\tilde O(n^2/\epsilon)$ with no assumptions. This is proved by giving an efficient polynomial-time quantum algorithm for the input problem, that is, for preparing the quantum states of the classical data. Other contributions of this paper include: (1) extending the swap test to a more general form that is suitable for dealing with quantum data in parallel, which will have further applications in other matrix operations; (2) generalizing the SVE technique so that it applies directly to any matrix (not just Hermitian ones) using only quantum data; (3) proposing two new efficient quantum algorithms to prepare quantum states of classical data, which solve the input problem more efficiently than other quantum algorithms.
Key words. quantum algorithm, quantum computation, matrix multiplication, quantum state preparation
1 Introduction
In the study of quantum algorithms (for example, see [3], [13], [28]), one usually encounters the "input" and "output" problems. The input problem is the transformation from classical data (such as complex vectors) into quantum data (such as quantum states); the output problem is the converse. These two problems are generally not easy to solve efficiently on a quantum computer, and sometimes even cost more than all the other steps of a quantum algorithm. So mostly one just assumes that the quantum data has already been obtained by some method when studying quantum algorithms. Another important idea is to study quantum algorithms whose input and output data are both quantum, such as quantum machine learning for quantum data [3], principal component analysis [18], quantum simulators [19], [29] and so on.
In this paper, we study matrix multiplication on a quantum computer from two different perspectives with three different techniques. The two perspectives share the same input data, which can be classical or quantum; the difference lies in the output, which is quantum in one case and classical in the other. The three techniques we use are the swap test [4], SVE [14] and HHL [13]. If the input data is quantum, then we can only apply the swap test and SVE; if the input data is classical, then all three techniques can play a role. What we care about most in this work is quantum data to quantum data. Although SVE has wider applications, its performance in matrix multiplication is not as efficient as the swap test. Also, to make the swap test work for matrix multiplication from quantum data to quantum data, a more general version of the swap test (proposition 2) will be proposed in this paper. Note that the swap test proposed in [4] can be viewed as a procedure from quantum data to classical data; the new version achieves quantum data to quantum data.
This study aims at extending classical matrix operations to the quantum case (i.e., quantum input and output), hoping to obtain efficient matrix operations on a quantum computer and thus to solve classical problems more efficiently. For the comparison with classical algorithms, quantum algorithms for matrix multiplication from classical data to classical data are also studied comprehensively in this work, which contains two sub-tasks: the preparation of quantum states (classical data to quantum data) and the reading out from quantum data to classical data.
As for matrix multiplication, the reading out problem is not difficult to solve, mainly based on the swap test. However, the preparation of quantum states is generally not so easy on a quantum computer. Efficient quantum algorithms for certain special cases do exist (for instance, see [8], [12], [17], [26]). One well-known special case is when the classical data is relatively uniformly distributed [1], [8], [17]. Based on this special case, in this paper we propose two new quantum algorithms (theorems 6 and 7) to prepare the quantum states of classical data in the general case. The corresponding complexities are satisfactory and, to our knowledge, better than any other quantum algorithm achieving the same target. When we are given enough information (such as the maximum, the minimum, the norm, the positions of nonzero entries and so on) about the classical data, the input problem can be solved efficiently in polynomial time. Obtaining such information may take some extra time. In the matrix multiplication problem, however, all the required information can be obtained before running the quantum algorithms. If the given matrices are $n\times n$, then obtaining the required information takes at most $O(n^2)$, which is acceptable, since $\Omega(n^2)$ is a lower bound for the matrix multiplication problem. This means the input problem can actually be solved "efficiently" for matrix multiplication.
The complexities of the obtained quantum algorithms for matrix multiplication depend polynomially on the precision. Therefore, when the precision is bounded by $O(1/\mathrm{poly}\log n)$, a quantum computer can solve the matrix multiplication problem efficiently in time $\tilde O(n^2)$ by the swap test (see table 3). The quantum algorithms obtained from SVE and HHL depend on the condition number of the given matrices. If the condition number is bounded by $O(\mathrm{poly}\log n)$, then these two algorithms also achieve the best efficiency. However, if we are only interested in quantum output data, then all three quantum algorithms rely on the condition number. With the same assumptions on precision and condition number as above, if the input is classical data, then this problem can be solved in polynomial time by HHL; if the input is quantum data, then it can be solved in time $\tilde O(\sqrt n)$ (see table 1) by the swap test and SVE.
The structure of this paper is as follows. In sections 2 and 3, we consider quantum algorithms for matrix multiplication with quantum output and classical output respectively, under one assumption about the preparation of quantum states. Section 4 is devoted to efficient quantum algorithms that address this assumption.
Notations. In this paper, $\mathbf i$ refers to the imaginary unit $\sqrt{-1}$. For any $n\times n$ matrix $A = (a_{ij})$, the notation $\|A\|_F := \sqrt{\sum_{i,j}|a_{ij}|^2}$ refers to the Frobenius norm of $A$; for any vector $x = (x_0,\dots,x_{n-1})$, the notation $\|x\|_p := (\sum_i |x_i|^p)^{1/p}$ refers to the $p$-norm of $x$.
2 Matrix multiplication with quantum output data

Assumption: Assume that we have an efficient quantum algorithm to prepare the quantum states of the rows and columns of $A$, for instance by QRAM [10] or by the data structure introduced in [14]. That is, we can efficiently get
$$|A\rangle = \frac{1}{\|A\|_F}\sum_{i,j=0}^{n-1} a_{ij}|i,j\rangle = \frac{1}{\|A\|_F}\sum_{i=0}^{n-1}\|A_{i\bullet}\|_2\,|i\rangle|A_{i\bullet}\rangle = \frac{1}{\|A\|_F}\sum_{j=0}^{n-1}\|A_{\bullet j}\|_2\,|A_{\bullet j}\rangle|j\rangle, \qquad (2.1)$$
where $\|A\|_F = \sqrt{\sum_{i,j}|a_{ij}|^2}$ is the Frobenius norm of $A$.
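To make the amplitudes in this assumption concrete, the following is a minimal classical sketch (numpy, toy dimensions chosen only for illustration) of the amplitude vectors involved in (2.1); it simulates amplitudes only, not an actual quantum circuit or QRAM.

```python
import numpy as np

# A minimal classical illustration of the amplitude encoding (2.1):
# the n^2 amplitudes of |A> are the entries of A divided by ||A||_F.
n = 4
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))

fro = np.linalg.norm(A, 'fro')
amp = A.flatten() / fro                       # amplitude of |i,j> is a_ij / ||A||_F
assert np.isclose(np.linalg.norm(amp), 1.0)   # |A> is a valid quantum state

# The row factorization in (2.1): weight ||A_i.||_2/||A||_F on |i>, times |A_i.>.
row_norms = np.linalg.norm(A, axis=1)
factored = np.concatenate([(row_norms[i] / fro) * (A[i] / row_norms[i])
                           for i in range(n)])
assert np.allclose(factored, amp)

# The weights alone are the amplitudes of the state |A_{F.}> of remark 1 below.
A_Fdot = row_norms / fro
assert np.isclose(np.linalg.norm(A_Fdot), 1.0)
```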
Remark 1. By applying the inverse of the quantum algorithm that prepares the quantum states of the rows and columns of $A$ to $|A\rangle$, we can obtain the following quantum states:
$$|A_{F\bullet}\rangle = \frac{1}{\|A\|_F}\sum_{i=0}^{n-1}\|A_{i\bullet}\|_2\,|i\rangle, \qquad |A_{\bullet F}\rangle = \frac{1}{\|A\|_F}\sum_{j=0}^{n-1}\|A_{\bullet j}\|_2\,|j\rangle. \qquad (2.2)$$
Also note that the density matrix of $|A\rangle$ equals $\frac{1}{\|A\|_F^2}\sum_{i=0}^{n-1}\|A_{i\bullet}\|_2^2\,|i\rangle|A_{i\bullet}\rangle\langle i|\langle A_{i\bullet}|$. By taking the trace over the second register, we get the density matrix of $|A_{F\bullet}\rangle$, which is another way to obtain the states of (2.2) from (2.1). For convenience, we define $A_{F\bullet}$ and $A_{\bullet F}$ as the column vectors with entries $\|A_{i\bullet}\|_2$ and $\|A_{\bullet j}\|_2$ respectively.
Remark 2. The preparation of quantum states works efficiently in practice in some cases [8], [12], [17], [26]. We will discuss this in section 4, where two new efficient quantum algorithms to prepare quantum states will be proposed, which makes the assumption above more reliable.
Given two matrices $A, B$, in the following we first consider the quantum algorithm to prepare the quantum state of $AB$ by the swap test, given the quantum information of $A$ and $B$. Certainly, this problem can also be solved by the singular value estimation technique (SVE) proposed in [14], still with the quantum information of $A$ and $B$. Moreover, if we view $A$ as classical data, then the quantum state of $AB$ can also be obtained by the matrix multiplication algorithm derived from the HHL algorithm [13]. The complexity of the algorithms obtained in the last two ways depends on the condition number of $A$ or $B$. Also, in the HHL algorithm, we need the Hamiltonian simulation of $\tilde A = \begin{pmatrix} 0 & A \\ A^\dagger & 0 \end{pmatrix}$ to be efficient.
We only consider the case of real matrix multiplication; for complex matrix multiplication, we just need to treat the real and imaginary parts separately.
Consider a state of the form
$$|\phi\rangle = \sin\theta\,|0\rangle|u\rangle + \cos\theta\,|1\rangle|v\rangle, \qquad (2.3)$$
and define the Grover-type operator $G = (2|\phi\rangle\langle\phi| - I)(Z\otimes I)$, written in the basis $\{|0\rangle|u\rangle, |1\rangle|v\rangle\}$. The eigenvalues of $G$ are $e^{\pm 2\mathbf i\theta}$ and the corresponding eigenvectors are
$$|w_1\rangle = \frac{1}{\sqrt 2}\big(|0\rangle|u\rangle + \mathbf i|1\rangle|v\rangle\big), \qquad |w_2\rangle = \frac{1}{\sqrt 2}\big(|0\rangle|u\rangle - \mathbf i|1\rangle|v\rangle\big)$$
respectively. Note that $|\phi\rangle = -\frac{\mathbf i}{\sqrt 2}(e^{\mathbf i\theta}|w_1\rangle - e^{-\mathbf i\theta}|w_2\rangle)$. So we perform the quantum phase estimation algorithm on $G$ with initial state $|0\rangle^{\otimes m}|\phi\rangle$ for some $m = O(\log 1/\delta\epsilon)$, and obtain an approximation of the state
$$-\frac{\mathbf i}{\sqrt 2}\big(e^{\mathbf i\theta}|y\rangle|w_1\rangle - e^{-\mathbf i\theta}|-y\rangle|w_2\rangle\big), \qquad (2.4)$$
where $y \in \mathbb Z_{2^m}$ satisfies $|\theta - y\pi/2^m| \le \epsilon$. The time complexity of the above procedure is $O(T_{in}/\epsilon\delta)$. Sometimes $\delta$ will be ignored in the complexity analysis, just for simplicity. Performing a measurement on (2.4), we get an $\epsilon$-approximation of $\theta$.
Furthermore, let $f(y) = g(\theta)$ be functions such that $f(y) = f(-y)$ (i.e., $f$ is an even function); then from (2.4) we can get
$$|g(\theta)\rangle|\phi\rangle, \qquad (2.5)$$
by adding a register to store $g(\theta)$ and undoing the quantum phase estimation. This procedure lets us make further use of the quantum information about $\theta$ instead of measuring it out.
Now let $|x\rangle, |y\rangle$ be two real quantum states (up to a global phase) that can be prepared in time $O(T_{in})$. The above method then provides a quantum algorithm to estimate $\langle x|y\rangle$ to accuracy $\epsilon$ in time $O(T_{in}/\epsilon)$. Actually, we just need to consider the state
$$|\phi\rangle = \frac{1}{\sqrt 2}\big(|+\rangle|x\rangle + |-\rangle|y\rangle\big) = \frac12\big(|0\rangle(|x\rangle + |y\rangle) + |1\rangle(|x\rangle - |y\rangle)\big).$$
The probability of $|0\rangle$ (resp. $|1\rangle$) is $(1 + \langle x|y\rangle)/2$ (resp. $(1 - \langle x|y\rangle)/2$). So we can set $\sin\theta = \sqrt{(1 + \langle x|y\rangle)/2}$ and $\cos\theta = \sqrt{(1 - \langle x|y\rangle)/2}$. The quantum state $|\phi\rangle$ can then be rewritten in the form (2.3), where $|u\rangle, |v\rangle$ are the normalizations of $|x\rangle + |y\rangle$ and $|x\rangle - |y\rangle$. Therefore, the inner product $\langle x|y\rangle$ can be evaluated to accuracy $\epsilon$ in time $O(T_{in}/\epsilon)$. Summarizing, we get the following result.
Proposition 1. Let $|x\rangle, |y\rangle$ be two quantum states that can be prepared in time $O(T_{in})$; then $\langle x|y\rangle$ can be estimated to accuracy $\epsilon$ in time $O(T_{in}/\epsilon)$.
Remark 3. If $|x\rangle, |y\rangle$ are complex quantum states, then the probability of $|0\rangle$ (resp. $|1\rangle$) is $(1 + \mathrm{Re}\langle x|y\rangle)/2$ (resp. $(1 - \mathrm{Re}\langle x|y\rangle)/2$). So we can only get the value of $\mathrm{Re}\langle x|y\rangle$. The imaginary part of $\langle x|y\rangle$ can be computed by considering the inner product of $|x\rangle$ with $\mathbf i|y\rangle$.
The above method to estimate $\langle x|y\rangle$ is usually called the swap test [4]. Note that quantum counting [2] can also be used to estimate $\langle x|y\rangle$; it relies on the same idea. Moreover, from (2.5) we can actually obtain the quantum state
$$\frac{1}{\sqrt 2}\big(|0\rangle|x\rangle + |1\rangle|y\rangle\big)\big|g(\langle x|y\rangle)\big\rangle$$
for any function $g$, since the cosine function is even.
Proposition 2. Let $|x\rangle, |y\rangle$ be two real quantum states (up to a global phase) that can be prepared in time $O(T_{in})$, and let $f$ be any function. Then there is a quantum algorithm running in time $O(T_{in}/\epsilon)$ that achieves
$$\frac{1}{\sqrt 2}\big(|0\rangle|x\rangle + |1\rangle|y\rangle\big) \mapsto \frac{1}{\sqrt 2}\big(|0\rangle|x\rangle + |1\rangle|y\rangle\big)|f(s)\rangle, \qquad (2.6)$$
where $|\langle x|y\rangle - s| \le \epsilon$.
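As an illustration of propositions 1 and 2, the following sketch classically simulates the measurement statistics the swap test is built on. It uses plain sampling, which needs $O(1/\epsilon^2)$ shots; the phase-estimation version described above improves this to $O(1/\epsilon)$. The shot count and seeds are arbitrary choices.

```python
import numpy as np

def swap_test_inner_product(x, y, shots=100_000, rng=None):
    """Estimate <x|y> for real unit vectors x, y by simulating the
    measurement statistics of |phi> = (|+>|x> + |->|y>)/sqrt(2):
    the first qubit reads 0 with probability (1 + <x|y>)/2."""
    rng = rng or np.random.default_rng(1)
    p0 = (1 + float(np.dot(x, y))) / 2
    zeros = rng.binomial(shots, p0)      # simulated measurement outcomes
    return 2 * zeros / shots - 1         # invert p0 = (1 + <x|y>)/2

x = np.random.default_rng(2).standard_normal(8); x /= np.linalg.norm(x)
y = np.random.default_rng(3).standard_normal(8); y /= np.linalg.norm(y)
print(swap_test_inner_product(x, y), np.dot(x, y))  # close for large shot counts
```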
Instead of outputting the estimate of the inner product, we can thus follow the procedure of the HHL algorithm and put this value into a coefficient. At this point, quantum parallelism plays an important role in helping us deal with the inner products in parallel.
Denote $C = AB$; the following quantum algorithm aims at obtaining the quantum information of $C$, that is, $|C\rangle$. Returning the classical information of the matrix product will be studied in the next section. Note that $C_{ij} = A_{i\bullet}^T B_{\bullet j} = \|A_{i\bullet}\|_2\|B_{\bullet j}\|_2\langle A_{i\bullet}|B_{\bullet j}\rangle$. By the swap test introduced above, we can estimate $\langle A_{i\bullet}|B_{\bullet j}\rangle$ efficiently. Together with quantum parallelism, we can get the desired quantum state $|C\rangle$ efficiently in the following five steps.
Step 1, consider the initial state, which equals the tensor product of $|A_{F\bullet}\rangle$ and $|B_{\bullet F}\rangle$:
$$\frac{1}{\|A\|_F\|B\|_F}\sum_{i,j=0}^{n-1}\|A_{i\bullet}\|_2\|B_{\bullet j}\|_2\,|i,j\rangle|0,0\rangle.$$
Step 2, by controlled transformations, prepare $|A_{i\bullet}\rangle$ and $|B_{\bullet j}\rangle$ in the last register:
$$\frac{1}{\|A\|_F\|B\|_F}\sum_{i,j=0}^{n-1}\|A_{i\bullet}\|_2\|B_{\bullet j}\|_2\,|i,j\rangle \otimes \frac{1}{\sqrt 2}\big(|0\rangle|A_{i\bullet}\rangle + |1\rangle|B_{\bullet j}\rangle\big)$$
$$\mapsto \frac{1}{\|A\|_F\|B\|_F}\sum_{i,j=0}^{n-1}\|A_{i\bullet}\|_2\|B_{\bullet j}\|_2\,|i,j\rangle \otimes \frac12\big(|0\rangle(|A_{i\bullet}\rangle + |B_{\bullet j}\rangle) + |1\rangle(|A_{i\bullet}\rangle - |B_{\bullet j}\rangle)\big). \qquad (2.7)$$
Denote $|\phi_{ij}\rangle = \frac12\big(|0\rangle(|A_{i\bullet}\rangle + |B_{\bullet j}\rangle) + |1\rangle(|A_{i\bullet}\rangle - |B_{\bullet j}\rangle)\big) = \sin\theta_{ij}|0\rangle|u_{ij}\rangle + \cos\theta_{ij}|1\rangle|v_{ij}\rangle$, where $\sin^2\theta_{ij}$ (resp. $\cos^2\theta_{ij}$) is the probability of $|0\rangle$ (resp. $|1\rangle$) and $|u_{ij}\rangle$ (resp. $|v_{ij}\rangle$) is the normalization of $|A_{i\bullet}\rangle + |B_{\bullet j}\rangle$ (resp. $|A_{i\bullet}\rangle - |B_{\bullet j}\rangle$). Also denote the eigenvalues of $G_{ij} = (2|\phi_{ij}\rangle\langle\phi_{ij}| - I)(Z\otimes I)$ as $e^{\pm 2\mathbf i\theta_{ij}}$ and the corresponding eigenvectors as $|w_{ij}^\pm\rangle$. Then (2.7) can be written as
$$\frac{1}{\|A\|_F\|B\|_F}\sum_{i,j=0}^{n-1}\|A_{i\bullet}\|_2\|B_{\bullet j}\|_2\,|i,j\rangle\big(\sin\theta_{ij}|0\rangle|u_{ij}\rangle + \cos\theta_{ij}|1\rangle|v_{ij}\rangle\big).$$
Step 3, perform quantum phase estimation on $G_{ij}$ with initial state $(\sin\theta_{ij}|0\rangle|u_{ij}\rangle + \cos\theta_{ij}|1\rangle|v_{ij}\rangle)|0\rangle$. Together with the controlled operation, we get
$$\frac{-\mathbf i}{\sqrt 2\,\|A\|_F\|B\|_F}\sum_{i,j=0}^{n-1}\|A_{i\bullet}\|_2\|B_{\bullet j}\|_2\,|i,j\rangle\big(e^{\mathbf i\theta_{ij}}|w_{ij}^+\rangle|y_{ij}\rangle - e^{-\mathbf i\theta_{ij}}|w_{ij}^-\rangle|-y_{ij}\rangle\big).$$
Step 4, since $\langle A_{i\bullet}|B_{\bullet j}\rangle$ is an even function of the estimated phase, add an ancilla qubit and apply the controlled rotation determined by the phase register (as in (2.5) and proposition 2), which yields
$$\frac{-\mathbf i}{\sqrt 2\,\|A\|_F\|B\|_F}\sum_{i,j=0}^{n-1}\|A_{i\bullet}\|_2\|B_{\bullet j}\|_2\,|i,j\rangle\big(e^{\mathbf i\theta_{ij}}|w_{ij}^+\rangle|y_{ij}\rangle - e^{-\mathbf i\theta_{ij}}|w_{ij}^-\rangle|-y_{ij}\rangle\big)$$
$$\otimes\Big(\langle A_{i\bullet}|B_{\bullet j}\rangle|0\rangle + \sqrt{1 - \langle A_{i\bullet}|B_{\bullet j}\rangle^2}\,|1\rangle\Big).$$
Step 5, undo steps 1-3, which yields the desired state
$$\frac{1}{\|A\|_F\|B\|_F}\sum_{i,j=0}^{n-1}\|A_{i\bullet}\|_2\|B_{\bullet j}\|_2\langle A_{i\bullet}|B_{\bullet j}\rangle\,|i,j\rangle|0\rangle + |0\rangle^{\perp}.$$
What remains is to estimate the error and the final complexity. This is quite simple and based only on the triangle inequality for norms, so we put all the details in appendix A. Note that the above procedure holds for all matrices, not just square ones. The final result can be summarized as follows.
Theorem 1. For any two matrices $A, B$, the quantum state of $AB$ can be obtained in time $\tilde O(\|A\|_F^3\|B\|_F^3/\|AB\|_F^3\epsilon)$ to accuracy $\epsilon$.
For instance:
(1). If $B = |b\rangle$, then the quantum state $|Ab\rangle$ can be obtained in time $\tilde O(\|A\|_F^2/\epsilon\|A|b\rangle\|_2^2) = \tilde O(n\kappa^2/\epsilon)$, where $\kappa$ is the condition number of $A$. Actually, the result in theorem 1 is also bounded by $\tilde O(n\kappa^2/\epsilon)$.
(2). If $A = |a\rangle$ and $B = \langle b|$, then the quantum state of the rank-1 matrix $|a\rangle\langle b|$ can be obtained in time $\tilde O(1/\epsilon)$. This result also holds when $A, B$ are given as classical column and row vectors.
(3). Any two general matrices $A, B$ can be decomposed into columns and rows, that is, $A = (A_{\bullet 0}, \dots, A_{\bullet(n-1)})$ and $B = (B_{0\bullet}^T, \dots, B_{(n-1)\bullet}^T)^T$. Then $AB = \sum_j A_{\bullet j}B_{j\bullet}$. The quantum state of $|A_{\bullet j}\rangle\langle B_{j\bullet}|$ to accuracy $\epsilon$, denoted $|\tilde C_j\rangle$, can be obtained in time $\tilde O(1/\epsilon)$. So we just need to compute the quantum state $|\tilde C\rangle$ proportional to the linear combination $\sum_j \|A_{\bullet j}\|_2\|B_{j\bullet}\|_2|\tilde C_j\rangle$. In [6, Chapter 26], there is a quantum algorithm to get $|\tilde C\rangle$ in time
$$\tilde O\Bigg(\frac{\sum_j \|A_{\bullet j}\|_2\|B_{j\bullet}\|_2}{\epsilon\,\big\|\sum_j \|A_{\bullet j}\|_2\|B_{j\bullet}\|_2|\tilde C_j\rangle\big\|_2}\Bigg) = \tilde O\bigg(\frac{A_{\bullet F}\cdot B_{F\bullet}}{\epsilon\,\|AB\|_F}\bigg),$$
where $A_{\bullet F} = (\|A_{\bullet 0}\|_2, \dots, \|A_{\bullet(n-1)}\|_2)^T$ and $B_{F\bullet} = (\|B_{0\bullet}\|_2, \dots, \|B_{(n-1)\bullet}\|_2)^T$ (see remark 1). By the Cauchy inequality, this result can also be bounded purely in terms of the Frobenius norms as $\tilde O(\|A\|_F\|B\|_F/\epsilon\|AB\|_F)$, which is better than theorem 1.
Theorem 2. Let $A$ be an $l\times m$ matrix and $B$ an $m\times n$ matrix such that $ln \neq 1$; then the quantum state of $AB$ can be obtained in time $\tilde O(\|A\|_F\|B\|_F/\|AB\|_F\epsilon)$ to accuracy $\epsilon$.
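As a sanity check of what the five steps produce in the ideal (error-free) case: the amplitude on $|i,j\rangle|0\rangle$ equals $(AB)_{ij}/\|A\|_F\|B\|_F$, so postselecting $|0\rangle$ yields exactly $|AB\rangle$, with success probability $\|AB\|_F^2/\|A\|_F^2\|B\|_F^2$. The sketch below (numpy, toy sizes) verifies this identity classically.

```python
import numpy as np

# Ideal output of the five steps: the amplitude on |i,j>|0> is
# ||A_i.||_2 ||B_.j||_2 <A_i.|B_.j> / (||A||_F ||B||_F) = (AB)_ij / (||A||_F ||B||_F).
rng = np.random.default_rng(4)
A, B = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))

amps = (A @ B).flatten() / (np.linalg.norm(A, 'fro') * np.linalg.norm(B, 'fro'))
success_prob = np.linalg.norm(amps) ** 2   # = ||AB||_F^2 / (||A||_F^2 ||B||_F^2)
state_AB = amps / np.linalg.norm(amps)     # the normalized state |AB>
assert np.allclose(state_AB, (A @ B).flatten() / np.linalg.norm(A @ B, 'fro'))
print(success_prob)  # explains the ||A||_F ||B||_F / ||AB||_F factor in the complexity
```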
In [14], Kerenidis et al. introduced a data structure, similar to QRAM [10], to store classical matrices on a quantum computer efficiently. Based on this data structure, a fast quantum algorithm for singular value estimation (SVE for short, which is closely related to singular value decomposition) was obtained. More precisely, if $A = \sum_i \sigma_i|u_i\rangle\langle v_i|$ is the singular value decomposition of $A$, then there is an efficient quantum algorithm achieving $\sum_i \alpha_i|v_i\rangle \mapsto \sum_i \alpha_i|v_i\rangle|\sigma_i\rangle$. This algorithm suffices to solve certain problems relating to matrix operations, like multiplication or inversion. Moreover, their SVE algorithm only uses the quantum information of $A$.
In this subsection, we first review the basic ideas of SVE, then generalize their result into a quantum algorithm achieving $\sum_i \alpha_i|v_i\rangle \mapsto \sum_i \alpha_i|u_i\rangle|\sigma_i\rangle$. So even when $A$ is not Hermitian, we can perform matrix multiplication or inversion directly, using only the quantum information of $A$. Although the data structure proposed in [14] lies in a model slightly different from the standard quantum circuit model, their result about SVE only depends on the efficient preparation of certain quantum states. So, with the assumption given at the beginning of this section, their result also works in the standard quantum circuit model.
Let $A = (a_{ij})$ be an $n\times n$ matrix. Based on the assumption given at the beginning of this section, a quantum computer can perform the following mappings efficiently in time $O(\mathrm{poly}\log(n))$:
$$U_M : |i\rangle|0\rangle \mapsto |i\rangle|A_{i\bullet}\rangle = \frac{1}{\|A_{i\bullet}\|_2}\sum_{j=0}^{n-1} a_{ij}|i,j\rangle, \qquad U_N : |0\rangle|j\rangle \mapsto |A_{F\bullet}\rangle|j\rangle = \frac{1}{\|A\|_F}\sum_{i=0}^{n-1}\|A_{i\bullet}\|_2|i,j\rangle.$$
Remark 4. The mappings $U_M$ and $U_N$ may seem too perfect. Generally, the results will contain some other orthogonal parts or some errors. However, by amplitude amplification we can make them very close to the results given in the above formulas. In the following, to keep things simple, we just use these two mappings as [14] did.
Define two degenerate operators $\mathcal M$ and $\mathcal N$ by $\mathcal M : |i\rangle \mapsto |i\rangle|A_{i\bullet}\rangle$ and $\mathcal N : |j\rangle \mapsto |A_{F\bullet}\rangle|j\rangle$. Then $\mathcal M^\dagger\mathcal N = A/\|A\|_F$. It is also easy to check that $\mathcal M^\dagger\mathcal M = \mathcal N^\dagger\mathcal N = I_n$. The reflections $2\mathcal M\mathcal M^\dagger - I_{n^2}$ and $2\mathcal N\mathcal N^\dagger - I_{n^2}$ can be implemented efficiently on a quantum computer. Denote $W = (2\mathcal M\mathcal M^\dagger - I_{n^2})(2\mathcal N\mathcal N^\dagger - I_{n^2})$. Let $A = \sum_{i=0}^{n-1}\sigma_i|u_i\rangle\langle v_i|$ be the singular value decomposition of $A$; then
$$W\mathcal N|v_i\rangle = \frac{2\sigma_i}{\|A\|_F}\mathcal M|u_i\rangle - \mathcal N|v_i\rangle;$$
$$W\mathcal M|u_i\rangle = \Big(\frac{4\sigma_i^2}{\|A\|_F^2} - 1\Big)\mathcal M|u_i\rangle - \frac{2\sigma_i}{\|A\|_F}\mathcal N|v_i\rangle.$$
So the subspace $\mathrm{span}\{\mathcal M|u_i\rangle, \mathcal N|v_i\rangle\}$ is invariant under $W$. The matrix representation of $W$ in this subspace is
$$W_i = \begin{pmatrix} \dfrac{4\sigma_i^2}{\|A\|_F^2} - 1 & \dfrac{2\sigma_i}{\|A\|_F} \\[6pt] -\dfrac{2\sigma_i}{\|A\|_F} & -1 \end{pmatrix}.$$
The eigenvalues of $W_i$ are $\exp(\pm\mathbf i\theta_i)$, where $\theta_i$ satisfies $\cos\theta_i = 2\sigma_i^2/\|A\|_F^2 - 1$; so $\cos(\theta_i/2) = \sigma_i/\|A\|_F$. The corresponding eigenvectors are $w_i^\pm = -\mathcal M|u_i\rangle + e^{\mp\mathbf i\theta_i/2}\mathcal N|v_i\rangle$. It is easy to get the decomposition
$$\mathcal N|v_i\rangle = \frac{1}{2\mathbf i\sin(\theta_i/2)}\big(w_i^+ - w_i^-\big), \qquad \mathcal M|u_i\rangle = \frac{1}{2\mathbf i\sin(\theta_i/2)}\big(e^{\mathbf i\theta_i/2}w_i^+ - e^{-\mathbf i\theta_i/2}w_i^-\big).$$
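The following numerical check (numpy; a random matrix as a stand-in) verifies the claimed eigenstructure of the $2\times 2$ blocks $W_i$: eigenvalues $e^{\pm\mathbf i\theta_i}$ with $\cos(\theta_i/2) = \sigma_i/\|A\|_F$.

```python
import numpy as np

# Check: the 2x2 block W_i has eigenvalues exp(+/- i theta_i)
# with cos(theta_i/2) = sigma_i / ||A||_F.
rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
fro = np.linalg.norm(A, 'fro')

for sigma in np.linalg.svd(A, compute_uv=False):
    c = sigma / fro                       # cos(theta/2)
    Wi = np.array([[4*c**2 - 1, 2*c],
                   [-2*c,       -1 ]])
    theta = 2 * np.arccos(c)
    eig = np.linalg.eigvals(Wi)
    eig = eig[np.argsort(eig.imag)]       # order as (e^{-i theta}, e^{+i theta})
    assert np.allclose(eig, [np.exp(-1j*theta), np.exp(1j*theta)])
```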
With the above notations, we can now prove a more general result than [14].
Proposition 3. Let $A$ be an $n\times n$ matrix with singular value decomposition $A = \sum_{i=0}^{n-1}\sigma_i|u_i\rangle\langle v_i|$. Then there is a quantum algorithm that runs in time $O(\mathrm{poly}\log(n)/\epsilon)$ and achieves $\sum_{i=0}^{n-1}\alpha_i|v_i\rangle|0\rangle \mapsto \sum_{i=0}^{n-1}\alpha_i|u_i\rangle|\tilde\sigma_i\rangle$, where $|\tilde\sigma_i - \sigma_i| \le \epsilon\|A\|_F$ for all $i$, with probability at least $1 - 1/\mathrm{poly}(n)$.
Proof. Denote the norm of $w_i^\pm$ as $m_i^\pm$ and the corresponding normalized quantum states as $|w_i^\pm\rangle$. Since the eigenvalues and eigenvectors of $W$ contain the information of the singular values and singular vectors of $A$, this information can be obtained by performing quantum phase estimation on $W$. The desired procedure consists of the following five steps.
Step 1, choose the initial state as $\sum_{i=0}^{n-1}\alpha_i|v_i\rangle$, then apply $U_N$ to it:
$$\sum_{i=0}^{n-1}\alpha_i\,\mathcal N|v_i\rangle = \sum_{i=0}^{n-1}\frac{\alpha_i}{2\mathbf i\sin(\theta_i/2)}\big(m_i^+|w_i^+\rangle - m_i^-|w_i^-\rangle\big).$$
Step 2, perform the quantum phase estimation algorithm to estimate the eigenvalues and eigenvectors of $W$, which gives the state
$$\sum_{i=0}^{n-1}\frac{\alpha_i}{2\mathbf i\sin(\theta_i/2)}\big(m_i^+|w_i^+\rangle|\theta_i\rangle - m_i^-|w_i^-\rangle|-\theta_i\rangle\big).$$
Step 3, change the phase and store the singular values in another register:
$$\sum_{i=0}^{n-1}\frac{\alpha_i}{2\mathbf i\sin(\theta_i/2)}\big(e^{\mathbf i\theta_i/2}m_i^+|w_i^+\rangle|\theta_i\rangle - e^{-\mathbf i\theta_i/2}m_i^-|w_i^-\rangle|-\theta_i\rangle\big)|\tilde\sigma_i\rangle.$$
Step 4, undo the quantum phase estimation; by the decomposition above, this yields $\sum_{i=0}^{n-1}\alpha_i\,\mathcal M|u_i\rangle|\tilde\sigma_i\rangle$.
Step 5, apply the inverse of $U_M$, which gives the desired state $\sum_{i=0}^{n-1}\alpha_i|u_i\rangle|\tilde\sigma_i\rangle$. ⊓⊔
For any quantum state $|b\rangle = \sum_i \alpha_i|v_i\rangle$, to get the quantum information of $A|b\rangle$ we can, once $\sum_i \alpha_i|u_i\rangle|\tilde\sigma_i\rangle$ is obtained by proposition 3, perform a controlled rotation on the register storing the singular value, which gives
$$\sum_{i=0}^{n-1}\alpha_i|u_i\rangle\Big(\tilde\sigma_i t|0\rangle + \sqrt{1 - t^2\tilde\sigma_i^2}\,|1\rangle\Big), \qquad (2.8)$$
where $t = 1/\max_i\tilde\sigma_i$. By choosing a suitable $\epsilon$, we get a good approximation of $|Ab\rangle$. As for our problem of computing the quantum state of $AB$, we can choose the initial state as $|B\rangle$ and implement the above procedure in parallel on each column. Finally, by a simple analysis of the error and complexity (details are given in appendix B), we get the following result.
Theorem 3. The quantum algorithm based on SVE gets the quantum state of $AB$ to precision $\epsilon$ in time $\tilde O(\|A\|_F\|B\|_F\kappa^2/\epsilon\|AB\|_F)$.
Remark 5. If $B = |b\rangle$, which contains only one column, then the complexity to obtain $|Ab\rangle$ is $\tilde O(\|A\|_F\kappa^2/\epsilon\|A|b\rangle\|_2) = \tilde O(\sqrt n\,\kappa^3/\epsilon)$. However, getting a good approximation of $A|b\rangle$ without normalization need not be so expensive. Actually, from (2.8) we see that the error between $\sum_{i=0}^{n-1}\alpha_i\tilde\sigma_i|u_i\rangle$ and $A|b\rangle$ is bounded by $\epsilon\|A\|_F$. So we just need to choose $\epsilon_1 = \epsilon\|A\|_F$ small. Then the complexity to get $A|b\rangle$ is $\tilde O(\|A\|_F/\epsilon_1)$.
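The following sketch classically simulates the effect of the controlled rotation (2.8) with exact singular values (i.e., ignoring the estimation error): the $|0\rangle$ branch is proportional to $A|b\rangle$, and its squared norm is the success probability.

```python
import numpy as np

# Classical simulation of (2.8): after SVE and the controlled rotation, the
# |0>-branch has amplitudes alpha_i * sigma_i * t, i.e. it is proportional to A|b>.
rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5); b /= np.linalg.norm(b)

U, s, Vt = np.linalg.svd(A)
alpha = Vt @ b                   # b = sum_i alpha_i v_i
t = 1 / s.max()
branch0 = U @ (alpha * s * t)    # sum_i alpha_i sigma_i t |u_i>
assert np.allclose(branch0 / np.linalg.norm(branch0),
                   A @ b / np.linalg.norm(A @ b))
print(np.linalg.norm(branch0)**2)  # success probability, >= 1/kappa^2 in the worst case
```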
The quantum algorithm to get the quantum state of $AB$ by HHL is similar to the algorithm by SVE. We now assume that $A$ is Hermitian; otherwise we can consider $\tilde A = \begin{pmatrix} 0 & A \\ A^\dagger & 0 \end{pmatrix}$. We also assume that the Hamiltonian simulation of $e^{-\mathbf i\tilde At}$ is efficient. For any quantum state $|b\rangle = \sum_j \alpha_j|u_j\rangle$, by the HHL algorithm we can get the state $\sum_j \alpha_j|u_j\rangle|\tilde\sigma_j\rangle$, where $|\tilde\sigma_j - \sigma_j| \le \epsilon\max_i\sigma_i$. Similarly, we can get (2.8). Compared to singular value estimation, in achieving the quantum state $|AB\rangle$ the only change is that $\|A\|_F$ becomes $\max_i\sigma_i$. So by the HHL algorithm we can get the quantum state of $AB$ in time $\tilde O(\kappa^3/\epsilon)$ to accuracy $\epsilon$, since $\|AB\|_F \ge \|B\|_F\min_i\sigma_i$.
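For the Hermitian embedding used above, a quick numerical check (numpy) confirms that the eigenvalues of $\tilde A$ are $\pm$ the singular values of $A$, which is why eigenvalue estimation on $\tilde A$ yields the singular values needed here.

```python
import numpy as np

# The eigenvalues of A_tilde = [[0, A], [A^dag, 0]] are +/- the singular
# values of A; its eigenvectors encode the left/right singular vectors.
rng = np.random.default_rng(7)
A = rng.standard_normal((4, 4))

A_tilde = np.block([[np.zeros((4, 4)), A],
                    [A.conj().T, np.zeros((4, 4))]])
eigs = np.linalg.eigvalsh(A_tilde)
svals = np.linalg.svd(A, compute_uv=False)
assert np.allclose(np.sort(np.abs(eigs)), np.sort(np.repeat(svals, 2)))
```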
Remark 6. Just like the HHL algorithm for solving linear systems, the methods based on SVE and HHL share a problem. More precisely, we implicitly assume that each $|B_{\bullet j}\rangle$ lies in the nonzero components of $A$, i.e., in the space generated by the singular vectors with nonzero singular values. Otherwise, the success probability, and so the final complexity, will be affected. For instance, in (2.8) the success probability is $P = \sum_{i,\tilde\sigma_i\neq 0}|\alpha_i\tilde\sigma_i t|^2$. If we assume that $|b\rangle$ lies in the nonzero components, then $P \ge \min_{i,\tilde\sigma_i\neq 0}|\tilde\sigma_i t|^2\sum_{i,\tilde\sigma_i\neq 0}|\alpha_i|^2 = 1/\kappa^2$, since $\sum_{i,\tilde\sigma_i\neq 0}|\alpha_i|^2 = 1$. If there exists $i$ such that $\tilde\sigma_i = 0$ but $\alpha_i \neq 0$, then $P \ge 1/\kappa^2$ may no longer hold. So in these two algorithms we should make this another assumption. The method based on the swap test, however, does not have such a problem.
The following table summarizes the three quantum algorithms proposed in this section to obtain the quantum data of the product of two matrices.
Table 1. Comparison of different quantum algorithms to achieve matrix multiplication with quantum information, where $A, B$ are the input matrices and $\kappa$ is the condition number of $A$.
By swap test: $\tilde O(\|A\|_F\|B\|_F/\epsilon\|AB\|_F)$ (theorem 2), assumption at the beginning of this section
By SVE: $\tilde O(\|A\|_F\|B\|_F\kappa^2/\epsilon\|AB\|_F)$ (theorem 3), assumption at the beginning of this section
By HHL: $\tilde O(\kappa^3/\epsilon)$, $A$ or $B$ classical and efficient Hamiltonian simulation of $\tilde A$
Note that the swap test and SVE work in all cases, whether $A, B$ are classical or quantum data, while HHL needs $A$ or $B$ to be classical. All the results involve the condition number in the worst case. A simple analysis shows that, under the assumptions on the precision and the condition number stated in the introduction, these algorithms run in time $\tilde O(\sqrt n)$ with quantum input data (see table 1).
3 Matrix multiplication with classical output data

In this section, we consider the problem of getting the classical data of the product of two matrices. First, we focus on the analysis of the method based on the swap test; the quantum algorithms based on SVE or HHL can be analyzed similarly.
Let $x, y$ be two $n$-dimensional vectors, and denote the corresponding quantum states as $|x\rangle, |y\rangle$. Then $x\cdot y = \|x\|_2\|y\|_2\langle x|y\rangle$, where $\|x\|_2, \|y\|_2$ are the norms of $x, y$. By proposition 1, we can get a good approximation of $\langle x|y\rangle$; i.e., in time $O(T_{in}/\epsilon)$ we can get a value $P_{xy}$ such that $|P_{xy} - \langle x|y\rangle| \le \epsilon$. However, a good approximation of $\langle x|y\rangle$ does not imply a good approximation of $x\cdot y$, because $|P_{xy}\|x\|_2\|y\|_2 - x\cdot y| \le \|x\|_2\|y\|_2\epsilon$. In order to make this error small, we set $\|x\|_2\|y\|_2\epsilon = \tilde\epsilon$; then the final complexity of estimating $x\cdot y$ becomes
$$O(T_{in}\|x\|_2\|y\|_2/\tilde\epsilon). \qquad (3.1)$$
Here we have not counted the complexity of evaluating $\|x\|_2, \|y\|_2$. From the above analysis, we conclude that proposition 1 solves the inner product problem of two classical vectors efficiently only if the norms of the vectors are small.
Remark 7. We should remark that the influence of the norms $\|x\|_2, \|y\|_2$ on the complexity (3.1) of estimating the inner product of $x$ and $y$ by the swap test cannot actually be removed. This is because of the optimality of the Grover searching algorithm. Consider the searching problem in $\mathbb Z_n$. Assume there are $r$ marked items, and $f : \mathbb Z_n \to \mathbb Z_2$ is defined by $f(i) = 1$ if and only if $i$ is marked. Now define $g(x) = (-1)^{f(x)}$. Denote $x = (g(0), \dots, g(n-1))$ and $y = (1, \dots, 1)$. Then $\|x\|_2 = \|y\|_2 = \sqrt n$, and the quantum states $|x\rangle, |y\rangle$ can be prepared efficiently. As we can see, $x\cdot y = \sum_{k=0}^{n-1} g(k) = n - 2r$. Suppose the complexity of evaluating the inner product $x\cdot y$ were independent of $\|x\|_2, \|y\|_2$ and could be improved to $\tilde O(1/\tilde\epsilon)$; then we could decide efficiently whether or not there exist marked items in $\mathbb Z_n$, since we could just choose $\tilde\epsilon = O(1)$. Together with the bisection method, we could then efficiently find one marked item if $r > 0$. This would contradict the optimality of the Grover searching algorithm.
Let $A, B$ be two $n\times n$ matrices. Multiplying $A$ and $B$ is equivalent to evaluating $n^2$ inner products of $n$-dimensional vectors. The classical method to evaluate the inner product of two $n$-dimensional vectors takes time $O(n)$, which leads to the complexity $O(n^3)$ for classical matrix multiplication. However, the swap test may reduce the complexity of evaluating inner products and so may reduce the complexity of matrix multiplication. The norms of $A_{i\bullet}, B_{\bullet j}$ ($0 \le i, j \le n-1$) can be evaluated classically, which costs $O(n^2)$. These are classical data, and so can be reused as many times as we want. We assume that the quantum states of $A_{i\bullet}, B_{\bullet j}$ can be prepared efficiently. Then by (3.1), and noting that $A_{F\bullet}, B_{\bullet F}$ are the column vectors storing the 2-norms of $A_{i\bullet}, B_{\bullet j}$ (see remark 1), we have
Theorem 4. There is a quantum algorithm that computes the product of $A$ and $B$ as classical information in time $\tilde O(\|A_{F\bullet}\|_1\|B_{\bullet F}\|_1/\epsilon + n^2)$ to accuracy $\epsilon$.
Proof. From (3.1), we know that the complexity to multiply $A, B$ with classical output data is $\sum_{i,j}\|A_{i\bullet}\|_2\|B_{\bullet j}\|_2/\epsilon = \|A_{F\bullet}\|_1\|B_{\bullet F}\|_1/\epsilon$. Together with the $O(n^2)$ cost of computing the norms of $A_{i\bullet}, B_{\bullet j}$, we get the desired result. ⊓⊔
The accuracy $\epsilon$ in the theorem means that if $C = AB = (c_{ij})$ is the exact result and $\tilde C = (\tilde c_{ij})$ is the result obtained from the quantum algorithm, then $|c_{ij} - \tilde c_{ij}| \le \epsilon$. This is the absolute error. An analysis of the relative error is not easy here, since we have no further information about the value of the inner product $x\cdot y$. However, by choosing the absolute error relatively small compared to $\|x\|_2\|y\|_2$, the absolute error comes closer to the relative error.
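A classical sketch of the entrywise strategy behind theorem 4: compute the row and column norms classically in $O(n^2)$, estimate each normalized inner product, and rescale. Here the swap-test estimate is modeled as the exact value plus bounded noise, an assumption standing in for the quantum subroutine.

```python
import numpy as np

def multiply_by_swap_test(A, B, eps=1e-3, rng=None):
    """Entrywise matrix multiplication: C_ij is recovered as
    ||A_i.||_2 ||B_.j||_2 * (estimate of <A_i.|B_.j>), so an eps-accurate
    inner-product estimate gives absolute error ||A_i.||_2 ||B_.j||_2 * eps."""
    rng = rng or np.random.default_rng(8)
    ra = np.linalg.norm(A, axis=1)       # ||A_i.||_2, classical, O(n^2)
    cb = np.linalg.norm(B, axis=0)       # ||B_.j||_2
    ip = (A / ra[:, None]) @ (B / cb[None, :])        # exact <A_i.|B_.j>
    ip_noisy = ip + rng.uniform(-eps, eps, ip.shape)  # stand-in for swap-test error
    return ra[:, None] * cb[None, :] * ip_noisy

rng = np.random.default_rng(9)
A, B = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
print(np.max(np.abs(multiply_by_swap_test(A, B) - A @ B)))  # bounded by max norms * eps
```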
Next, we consider the algorithms based on SVE and HHL. In order to compute the classical information of $AB$, we actually do not need to perform measurements in the quantum algorithms by SVE or HHL. As discussed in remark 5, if $|B_j\rangle = \sum_k \alpha_{jk}|u_k\rangle$, then (2.8) can be written as
$$\frac{1}{\max_k\sigma_k}A|B_j\rangle|0\rangle + \text{orthogonal part}.$$
This quantum state can be obtained efficiently in time $\tilde O(\|A\|_F/\epsilon)$. By applying the swap test to the above state and $|i,0\rangle$, we get an approximation of the entries of $AB$. More details on the analysis of error and complexity, which are not difficult, are given in appendix C. The final result is
Theorem 5. There is a quantum algorithm that computes the product of $A$ and $B$ as classical information to accuracy $\epsilon$ in time $\tilde O(\sqrt n\,\kappa\|A\|_F^2\|B_{\bullet F}\|_3^3/\epsilon^2 + n^2)$ by SVE and in time $\tilde O(\kappa^2\|A\|_F^2\|B_{\bullet F}\|_3^3/\epsilon^2 + n^2)$ by the HHL algorithm.
The following table summarizes the above results on quantum matrix multiplication with classical data. Unlike the algorithms that produce quantum information, the complexity now is independent of $\|AB\|_F$. This is because no measurements are needed in evaluating the classical data.
Table 2. Comparison of different quantum algorithms to achieve matrix multiplication with classical information, where $A, B$ are the input matrices and $\kappa$ is the condition number of $A$.
By swap test: $\tilde O(\|A_{F\bullet}\|_1\|B_{\bullet F}\|_1/\epsilon + n^2) = \tilde O(n\|A\|_F\|B\|_F/\epsilon + n^2)$
By SVE: $\tilde O(\sqrt n\,\kappa\|A\|_F^2\|B_{\bullet F}\|_3^3/\epsilon^2 + n^2)$
By HHL: $\tilde O(\kappa^2\|A\|_F^2\|B_{\bullet F}\|_3^3/\epsilon^2 + n^2)$
To make the above table easier to understand and to compare with classical algorithms, assume that the singular values of $A, B$ are smaller than 1; then $\|A\|_F, \|B\|_F \le \sqrt n$, and the table simplifies to
Table 3. Comparison of different quantum algorithms to achieve matrix multiplication with classical information, where $A, B$ are the input matrices, $\kappa$ is the condition number of $A$ and the singular values of $A, B$ are smaller than 1.
By swap test: $\tilde O(n^2/\epsilon)$, assumption in table 1
By SVE: $\tilde O(\kappa n^{2.25}/\epsilon^2)$, assumptions in table 1 and the singular values of $A, B$ are smaller than 1
By HHL: $\tilde O(\kappa^2 n^{1.75}/\epsilon^2)$, assumptions in table 1 and the singular values of $A, B$ are smaller than 1
The best classical algorithm for matrix multiplication, with complexity $O(n^{2.3728639})$, is due to Le Gall [16] in 2014. If the precision $\epsilon$ is of size $O(1/\mathrm{poly}\log n)$, then the quantum algorithm for matrix multiplication based on the swap test runs in $\tilde O(n^2)$. Moreover, to make this quantum algorithm better than Le Gall's classical algorithm, it suffices that $1/\epsilon = \tilde O(n^{0.3728639})$. The quantum algorithm based on SVE beats the classical algorithm only if $\kappa/\epsilon^2 = \tilde O(n^{0.1228639})$, and the quantum algorithm based on HHL only if $\kappa^2/\epsilon^2 = \tilde O(n^{0.6228639})$. In [5], Buhrman et al. also proposed a quantum algorithm related to matrix multiplication; however, its complexity depends on the number of nonzero entries of $AB$, so we prefer not to compare with it here.
4 Preparation of quantum states

In this section, we study the input problem, i.e., the preparation of the quantum states of classical data. A general method, based on constructing the corresponding unitary directly from controlled rotations as discussed in [20, Chapter 4], gives the following.
Proposition 4. For any vector $x$, its quantum state can be prepared in time $O(n^2(\log n)^2\log^c(n^2(\log n)^2/\epsilon))$ to precision $\epsilon$.
Although the above method works in all cases, it is generally not efficient. We still hope the input problem can be solved efficiently in some special cases. Under certain conditions, this problem can indeed be solved efficiently [8], [12], [17], [26]. In the following, we focus on the method given in [17]. In that paper, Lloyd et al. provided a quantum algorithm to prepare quantum states which works very well when the given classical data is relatively uniformly distributed. It has been used to solve the supervised classification problem [17] and the least squares support vector machine problem [21]. In the following, we first give a detailed analysis of this technique in order to identify its advantages and disadvantages. Then, based on this algorithm, we propose two new quantum algorithms with better efficiency.
Let $f$ be a map from $\mathbb Z_n$ to $\mathbb R$; denote $\max(f) = \max_k|f(k)|$, $\min(f) = \min_k|f(k)|$ and $\kappa(f) = \max(f)/\min(f)$.
These notations extend similarly to vectors and sequences. We should remark that in [17] the authors' main objective is the supervised classification problem, so their results and methods are confined to that problem. Moreover, they did not analyze the efficiency of their method in much detail. However, their method is more general than preparing quantum states. The following result is obtained by generalizing their method.
Proposition 5. Let $f$ be a map (or an oracle) from $\mathbb Z_n$ to $\mathbb R^* = \mathbb R\setminus\{0\}$. Then for any state $|\psi\rangle = \sum_{k=0}^{n-1} b_k|k\rangle$ with preparation complexity $O(T_{|\psi\rangle})$, we can construct the state $|\psi'\rangle = \frac{1}{\sqrt Z}\sum_{k=0}^{n-1} f(k)b_k|k\rangle$ in time $O(\kappa(f)^{3/2}T_{|\psi\rangle}/\epsilon)$ to accuracy $\epsilon$, where $Z = \sum_{k=0}^{n-1}|f(k)b_k|^2$.
Proof. Let $H = \sum_{k=0}^{n-1} f(k)|k\rangle\langle k|$ be a Hamiltonian; it is a diagonal matrix, so $e^{-\mathbf iHt}$ can be implemented efficiently. Consider the following procedure:
$$\frac{1}{\sqrt 2}(|0\rangle + |1\rangle)\sum_{k=0}^{n-1} b_k|k\rangle \mapsto \frac{1}{\sqrt 2}|0\rangle\sum_{k=0}^{n-1} b_ke^{\mathbf if(k)t}|k\rangle + \frac{1}{\sqrt 2}|1\rangle\sum_{k=0}^{n-1} b_ke^{-\mathbf if(k)t}|k\rangle$$
$$\mapsto \frac{1}{\sqrt 2}|+\rangle\sum_{k=0}^{n-1} b_ke^{\mathbf if(k)t}|k\rangle + \frac{1}{\sqrt 2}|-\rangle\sum_{k=0}^{n-1} b_ke^{-\mathbf if(k)t}|k\rangle \qquad (4.1)$$
$$= |0\rangle\sum_{k=0}^{n-1} b_k\cos(f(k)t)|k\rangle + \mathbf i|1\rangle\sum_{k=0}^{n-1} b_k\sin(f(k)t)|k\rangle,$$
where the first step is the result of (controlled) Hamiltonian simulation and the second step applies a Hadamard transformation to the first qubit.
Choose $t$ small enough that there exist $\epsilon_0, \epsilon_1$ satisfying $\epsilon_0 \le |f(k)t| \le \epsilon_1 \ll 1$ for all $k$; here we choose $\epsilon_1 = O(\kappa(f)^{-1/2}\epsilon)$. Then the state attached to $|1\rangle$ is an approximation of the state $|\psi'\rangle$, and the error between them is bounded by $O(\epsilon)$ (see details in appendix D). Note that $f(k) \neq 0$ for all $k$, so the probability of getting $|1\rangle$ is $P \approx Zt^2 = \sum_{k=0}^{n-1}|b_kf(k)t|^2 \ge \epsilon_0^2$. By the amplitude amplification technique, $O(1/\epsilon_0)$ rounds suffice. Note that $\epsilon_0 \le |f(k)t| \le \epsilon_1$, so $\epsilon_0 \approx \min(f)t$ and $\epsilon_1 \approx \max(f)t$. Hence $\epsilon_0 = \kappa(f)^{-1}\epsilon_1 = O(\kappa(f)^{-3/2}\epsilon)$. The complexity of procedure (4.1) is $O(T_{|\psi\rangle})$, so the complexity of obtaining $|\psi'\rangle$ is $O(T_{|\psi\rangle}/\epsilon_0) = O(\kappa(f)^{3/2}T_{|\psi\rangle}/\epsilon)$. ⊓⊔
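The following numpy sketch simulates procedure (4.1) classically: for small $t$, the amplitudes of the $|1\rangle$ branch are $b_k\sin(f(k)t) \approx b_kf(k)t$, so after normalization the branch approximates $|\psi'\rangle$, and the probability of $|1\rangle$ is about $Zt^2$. The ranges and seeds are arbitrary choices.

```python
import numpy as np

def prepare_weighted_state(b, f, t):
    """Classical simulation of procedure (4.1): simulate e^{+/- iHt} for the
    diagonal Hamiltonian H = diag(f), apply a Hadamard to the control qubit,
    and return the (normalized) |1>-branch together with its probability."""
    branch1 = b * np.sin(f * t)      # amplitudes on |1>|k> (up to a phase i)
    prob = np.sum(branch1 ** 2)      # probability of measuring |1>
    return branch1 / np.sqrt(prob), prob

n = 8
rng = np.random.default_rng(10)
b = np.ones(n) / np.sqrt(n)          # uniform start state
f = rng.uniform(1.0, 3.0, n)         # nonzero targets, kappa(f) <= 3

approx, prob = prepare_weighted_state(b, f, t=1e-2)
target = f * b / np.linalg.norm(f * b)
print(np.linalg.norm(approx - target), prob)  # small error; prob ~ Z t^2
```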
The final complexity of the algorithm in proposition 5 is governed by $\kappa(f)$. A simple case is when the sequence $\{|f(0)|, \dots, |f(n-1)|\}$ is relatively uniform; here relatively uniform means that $\kappa(f) = \max(f)/\min(f)$ is an acceptably small constant. In this case, the complexity of proposition 5 simplifies to $O(T_{|\psi\rangle}/\epsilon)$.
In [8], Clader et al. also proposed a quantum algorithm to prepare quantum states, indirectly inspired by the HHL algorithm [13]. A simple analysis shows that their algorithm suffers from the same problem discussed above, namely the influence of $\kappa(f)$; however, their algorithm is quite simple and introduces no error. Note that the idea of proposition 5 generalizes to the complex field by considering the real and imaginary parts separately, so in the following we only need to focus on the preparation of real vectors. Moreover, if there exists $k$ such that $f(k) = 0$, then the probability estimate $P = \sum_{k=0}^{n-1}|b_kf(k)t|^2 \ge \epsilon_0^2$ may no longer hold. However, if $|\psi\rangle = \frac{1}{\sqrt n}\sum_{k=0}^{n-1}|k\rangle$ and the sequence $\{|f(0)|, \dots, |f(n-1)|\}$ contains $O(z)$ nonzero elements, then the complexity of proposition 5 should be multiplied by $O(\sqrt{n/z})$ because of the amplitude amplification technique. A direct application of proposition 5 is
Proposition 6. Let $x = (x_0, \dots, x_{n-1})$ be a given vector containing $O(z)$ nonzero elements, and denote $\kappa(x) = \max_k|x_k|/\min_{k, x_k\neq 0}|x_k|$. Then $|x\rangle = \frac{1}{\|x\|_2}\sum_{k=0}^{n-1} x_k|k\rangle$ can be prepared in time $O(\kappa(x)^{3/2}\sqrt{n/z}\,(\log n)/\epsilon)$ to accuracy $\epsilon$. Moreover, if $x$ is relatively uniform, then the complexity is $O(\sqrt{n/z}\,(\log n)/\epsilon)$.
Proof. Just choose $|\psi\rangle = \frac{1}{\sqrt n}\sum_{k=0}^{n-1}|k\rangle$ and $f(k) = x_k$ in proposition 5. ⊓⊔
However, if we know the positions of the nonzero entries of $x$, then we can restrict to these nonzero parts and apply the algorithm in proposition 5 to prepare the quantum state of the nonzero components of $x$, which equals the quantum state of $x$ itself. So we have
Proposition 7. For any given vector $x$, there is a quantum algorithm to prepare its quantum state in time $O(\kappa(x)^{3/2}(\log n)/\epsilon)$. Moreover, if $x$ is relatively uniform, then the complexity is $O((\log n)/\epsilon)$.
As we can see from the above result, this quantum state preparation algorithm works efficiently when $x$ is a relatively uniformly distributed vector, and may perform very badly otherwise. One way to exploit the property of relative uniformity is to decompose $x$ into a sum of several relatively uniformly distributed vectors. In the following, we give two such decompositions.
Let $x = (x_0, \dots, x_{n-1})$ be a real vector; assume $0 < x_0 < \dots < x_{n-1}$ for simplicity. Denote $\kappa(x) = x_{n-1}/x_0 \approx 2^q$. Then each interval $I_j = [2^{j-1}x_0, 2^jx_0)$, where $1 \le j \le q$, contains several of the values $x_i$; denote them by $x_{j1}, \dots, x_{jt_j}$ and set $y_j = (0, \dots, 0, x_{j1}, \dots, x_{jt_j}, 0, \dots, 0)$. Then $x = y_1 + \dots + y_q$, and each vector $y_j$ is relatively uniform, so it can be prepared efficiently in time $O((\log n)/\epsilon)$ by proposition 7. Now $|x\rangle = \lambda_1|y_1\rangle + \dots + \lambda_q|y_q\rangle$, where $\lambda_j = \|y_j\|_2/\|x\|_2$. By the method given in [6, Chapter 26], the complexity to achieve such a linear combination to get $|x\rangle$ equals
$$O\Big(C_q(\log n)\sum_{j=1}^q \|y_j\|_2/\|x\|_2\epsilon\Big) = O\big(q^{5/2}(\log q)^2\log^c(q^2(\log q)^2/\epsilon)(\log n)/\epsilon\big),$$
where $C_q$ is the complexity to implement a unitary $U$ such that $U|0\rangle \propto \sum_j \lambda_j|j\rangle$, which is at most $O(q^2(\log q)^2\log^c(q^2(\log q)^2/\epsilon))$ as discussed in [20, Chapter 4]. When the entries of $x$ are not all positive and increasing, we define the intervals $I_j$ based on the vector whose entries are the nonzero absolute values of the entries of $x$. The requirement that all entries be sorted is not really necessary, since the above analysis also holds when $x$ is not sorted; we only need to keep track of the positions of the entries lying in each $I_j$. In that case the notation becomes a little more complicated, but nothing essential changes. Therefore, we have
Theorem 6. Let $x = (x_0, \dots, x_{n-1})$ be a given vector and $\kappa(x) = \max_k|x_k|/\min_{k, x_k\neq 0}|x_k|$. Then its quantum state can be prepared in time $O\big((\log\kappa(x))^{5/2}(\log\log\kappa(x))^2\log^c[(\log\kappa(x))^2(\log\log\kappa(x))^2/\epsilon](\log n)/\epsilon\big)$.
This result is better than proposition 7. If $\kappa(x)$ is too large, we may consider giving up the components that are close to $\min_{k, x_k\neq 0}|x_k|$, provided there are not too many of them. Moreover, even if $\max|x_k| = 2^{1000}$ and $\min_{k, x_k\neq 0}|x_k| = 1$, we have $\log\kappa(x) = 1000$, which is still an acceptable constant. From this point of view, the above result, which depends polynomially on $\log\kappa(x)$, seems to be a pretty good algorithm to prepare quantum states.
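A small sketch of the decomposition underlying theorem 6 (numpy; positive entries assumed, matching the simplification in the text): split $x$ by dyadic magnitude bands, so that each piece is relatively uniform with $\kappa(y_j) < 2$.

```python
import numpy as np

def dyadic_pieces(x):
    """Split x (all entries positive here, for simplicity) into pieces y_j
    supported on dyadic bands [2^j x_min, 2^{j+1} x_min): each piece has
    kappa(y_j) < 2, i.e. it is relatively uniform."""
    x_min = x[x > 0].min()
    bands = np.floor(np.log2(x / x_min)).astype(int)   # band index of each entry
    pieces = []
    for j in range(bands.max() + 1):
        y = np.where(bands == j, x, 0.0)
        if y.any():
            pieces.append(y)
    return pieces

x = np.random.default_rng(11).uniform(1.0, 100.0, 16)
ys = dyadic_pieces(x)
assert np.allclose(sum(ys), x)                          # x = y_1 + ... + y_q
assert all(y[y > 0].max() / y[y > 0].min() < 2 for y in ys)
print(len(ys), "pieces for kappa(x) ~", x.max() / x.min())
```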
Another decomposition is more direct. We assume that all entries of $x$ are nonzero; otherwise we restrict to the nonzero components. Define $y = M(\mathrm{sign}(x_0), \dots, \mathrm{sign}(x_{n-1}))$, where $M \ge \max_i|x_i|$, $\mathrm{sign}(x_i) = 1$ if $x_i \ge 0$ and $\mathrm{sign}(x_i) = -1$ if $x_i < 0$. The quantum state $|y\rangle$ of $y$ can be obtained efficiently in time $O(\log n)$. Also define $z := x + y = (\mathrm{sign}(x_0)M + x_0, \dots, \mathrm{sign}(x_{n-1})M + x_{n-1})$, which is relatively uniformly distributed with $n$ nonzero entries. So by proposition 7, the quantum state $|z\rangle$ of $z$ can be obtained efficiently in time $O((\log n)/\epsilon)$ to precision $\epsilon$. Moreover,
$$|x\rangle = \frac{1}{\|x\|_2}(z - y) = \frac{\|z\|_2}{\|x\|_2}|z\rangle - \frac{\|y\|_2}{\|x\|_2}|y\rangle.$$
What we should do next is compute the linear combination of two efficiently preparable quantum states. This can be done by a procedure similar to the Hadamard test, as follows (for simplicity denote $\lambda = \|z\|_2/\|x\|_2$ and $\mu = \|y\|_2/\|x\|_2$):
$$\frac{1}{\sqrt{\lambda^2 + \mu^2}}\big(\lambda|0\rangle|z\rangle + \mu|1\rangle|y\rangle\big) \mapsto \frac{1}{\sqrt{\lambda^2 + \mu^2}}\big(\lambda|+\rangle|z\rangle + \mu|-\rangle|y\rangle\big)$$
$$= \frac{1}{\sqrt{2(\lambda^2 + \mu^2)}}\Big[|0\rangle\big(\lambda|z\rangle + \mu|y\rangle\big) + |1\rangle\big(\lambda|z\rangle - \mu|y\rangle\big)\Big].$$
The probability of getting $|x\rangle = \lambda|z\rangle - \mu|y\rangle$ is $1/2(\lambda^2 + \mu^2) = \|x\|_2^2/2(\|y\|_2^2 + \|z\|_2^2)$. Then the complexity to get $|x\rangle$ is $O\big(\sqrt{(\|y\|_2^2 + \|z\|_2^2)/\|x\|_2^2}\,(\log n)/\epsilon\big)$. Set $M = d\max_k|x_k|$ for some $d \ge 1$. If $d$ is a small constant, then we can just choose $M = \max_k|x_k|$. Note that $\|x\|_2^2 \ge n\min_k|x_k|^2$, so
$$\frac{\|y\|_2^2 + \|z\|_2^2}{\|x\|_2^2} = \frac{2nM^2 + 2M\sum_i|x_i| + \|x\|_2^2}{\|x\|_2^2} \le 3\kappa(x)^2 + 1.$$
Hence the complexity obtained from this decomposition is $O(\kappa(x)(\log n)/\epsilon)$, which is better than applying proposition 7 directly, but less efficient than theorem 6. In [25], the author proposed a method to achieve the linear combination which is independent of the effect of $\lambda$ and $\mu$, in time $O(\log(n)/\epsilon^2)$. Therefore, we have
Theorem 7. For any vector $x$, its quantum state can be prepared in time $O(\log(n)/\epsilon^2)$ to precision $\epsilon$.
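A sketch of the second decomposition with $M = \max_k|x_k|$ (numpy): $y$ is a sign vector, $z = x + y$ has all magnitudes in $[M, 2M]$ and is hence relatively uniform, and the linear combination $\lambda|z\rangle - \mu|y\rangle$ reproduces $|x\rangle$ with the stated success probability.

```python
import numpy as np

# Second decomposition: y = M*sign(x) is trivially uniform, and z = x + y
# has all entries of magnitude in [M, 2M] when M >= max|x_k|.
rng = np.random.default_rng(12)
x = rng.standard_normal(16)

M = np.abs(x).max()
y = M * np.where(x >= 0, 1.0, -1.0)
z = x + y
assert np.all((np.abs(z) >= M - 1e-12) & (np.abs(z) <= 2 * M))  # relatively uniform

lam = np.linalg.norm(z) / np.linalg.norm(x)
mu = np.linalg.norm(y) / np.linalg.norm(x)
state_x = lam * z / np.linalg.norm(z) - mu * y / np.linalg.norm(y)
assert np.allclose(state_x, x / np.linalg.norm(x))      # |x> = lam|z> - mu|y>
p = 1 / (2 * (lam**2 + mu**2))   # success probability ||x||^2 / 2(||y||^2 + ||z||^2)
print(p)
```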
We summarize all the quantum algorithms proposed above in the following table. Note that they work for all classical data, so no further assumptions are needed. However, they require a lot of information about the input data, such as the nonzero components and the maximal and minimal entries. It may take some extra time to obtain this information (for instance by quantum searching algorithms [9], [11]), which we do not count here.
Table 4. Comparison of different quantum algorithms to prepare the quantum state of $x = (x_0, \dots, x_{n-1})$, where $\kappa(x) = \max_k|x_k|/\min_{k, x_k\neq 0}|x_k|$ and $\epsilon$ is the precision.
Proposition 4: $O(n^2(\log n)^2\log^c(n^2(\log n)^2/\epsilon))$
Proposition 6: $O(\kappa(x)^{3/2}\sqrt{n/z}\,(\log n)/\epsilon)$
Proposition 7: $O(\kappa(x)^{3/2}(\log n)/\epsilon)$
Theorem 6: $O((\log\kappa(x))^{5/2}(\log\log\kappa(x))^2\log^c[(\log\kappa(x))^2(\log\log\kappa(x))^2/\epsilon](\log n)/\epsilon)$
Theorem 7: $O(\log(n)/\epsilon^2)$
In the quantum algorithms for matrix multiplication with classical data, we can first apply searching algorithms to find the information needed to prepare the quantum states, which takes at most $O(n^2)$ steps and does not affect the final complexity of the algorithms. This means that, to get the classical data of the matrix product, the quantum algorithms listed in table 2, except HHL, do not need the assumption listed at the beginning of section 2. Strictly speaking, by the analysis in remark 6, only the quantum algorithm obtained from the swap test needs no assumptions at all. Therefore, we have
Theorem 8. The product of two $n\times n$ matrices can be computed in time $\tilde O(n^2/\epsilon)$ to precision $\epsilon$.
5 Conclusions
Quantum computers outperform classical computers on many problems. However, many quantum algorithms make one or two assumptions; the most common one concerns the input problem, that is, we assume the given data has already been turned into quantum data by some method like QRAM. This problem is generally not easy to solve. As suggested in [3], we can study quantum algorithms with quantum input and output only; with this idea, we do not need to consider the input and output problems. One important task worth studying is extending classical matrix operations to quantum computers with quantum input and output data. In this paper we only considered the problem of matrix multiplication; however, it forms the most elementary step of many other matrix operations, such as QR decomposition and LU decomposition. Until now, for most matrix operations it has not been easy to find suitable quantum techniques that handle them efficiently. QR decomposition is very useful, yet it seems quite difficult to make it efficient on a quantum computer in polynomial time. Fortunately, we already have the SVE technique and its generalized version, which will play important roles in the study of quantum matrix operations. The generalized version of the swap test can be viewed as another applicable technique.
Acknowledgements. This work is supported by the NSFC Project 11671388 and the CAS Frontier Key Project
QYZDJ-SSW-SYS022.
References
1. Aaronson, S., Quantum Machine Learning Algorithms: Read the Fine Print, Nature Physics 11(4): 291-293, 2015.
2. Brassard, G., Høyer, P., Tapp, A., Quantum Counting, 25th Intl. Colloquium on Automata, Languages, and Programming
(ICALP), LNCS 1443, pp. 820-831, 1998.
3. Biamonte, J., Wittek, P., Pancotti, N., Rebentrost, P., Wiebe, N. and Lloyd, S., Quantum machine learning, Nature 549,
195-202, 2017.
4. Buhrman, H., Cleve, R., Watrous, J. and Wolf, R. de, Quantum Fingerprinting, Phys. Rev. Lett. 87, pp. 167902, 2001.
5. Buhrman, H. and Špalek, R., Quantum verification of matrix products. In Proceedings of the 17th ACM-SIAM Symposium
on Discrete Algorithms, pp. 880-889, 2006.
6. Childs, A.M., Lecture Notes on Quantum Algorithms, http://www.cs.umd.edu/~amchilds/qa/, 2017.
7. Childs, A.M., Kothari, R., and Somma, R.D., Quantum algorithm for systems of linear equations with exponentially
improved dependence on precision, SIAM Journal on Computing 46, 1920-1950, 2017.
8. Clader, B.D., Jacobs, B.C. and Sprouse, C.R., Preconditioned quantum linear system algorithm, Phys. Rev. Lett. 110,
pp. 250504, 2013.
9. Dürr, C. and Høyer, P., A quantum algorithm for finding the minimum, arXiv:quant-ph/9607014, 1996.
10. Giovannetti, V., Lloyd, S. and Maccone, L., Quantum random access memory. Phys. Rev. Lett., 100, pp. 160501, 2008.
11. Grover, L.K., A fast quantum mechanical algorithm for database search, Proceedings, 28th Annual ACM Symposium on
the Theory of Computing (STOC), 212-219, 1996.
12. Grover, L. and Rudolph, T., Creating superpositions that correspond to efficiently integrable probability distributions.
arXiv:quant-ph/0208112, 2002.
13. Harrow, A.W., Hassidim, A. and Lloyd, S., Quantum algorithm for solving linear systems of equations, Phys. Rev. Lett. 103, pp. 150502, 2009.
14. Kerenidis, I. and Prakash, A., Quantum Recommendation System, 8th Innovations in Theoretical Computer Science
Conference, 49:1-49:21, 2017.
15. Kerenidis, I. and Prakash, A., Quantum gradient descent for linear systems and least squares, arXiv:1704.04992v3, 2017.
16. Le Gall, F., Powers of tensors and fast matrix multiplication, in Proc. ISSAC 2014, 296-303, 2014.
17. Lloyd, S., Mohseni, M. and Rebentrost, P., Quantum algorithms for supervised and unsupervised machine learning,
arXiv:1307.0411v2, 2013.
18. Lloyd, S., Rebentrost, P. and Mohseni, M. Quantum principal component analysis, Nature Physics 10, 631-633, 2014.
19. Marvian, I. and Lloyd, S. Universal quantum emulator, arXiv:1606.02734, 2016.
20. Nielsen, M.A. and Chuang, I.L., Quantum Computation and Quantum Information, 10th Anniversary Edition, Cambridge
University Press, 2010.
21. Rebentrost, P., Mohseni, M. and Lloyd, S., Quantum support vector machine for big data classification. Phys. Rev. Lett.
113(13), pp. 130503, 2014.
22. Rebentrost, P., Schuld, M., Wossnig, L., Petruccione. F. and Lloyd, S., Quantum gradient descent and Newton’s method
for constrained polynomial optimization, arXiv:1612.01789v2, 2016.
23. Rebentrost, P., Steffens, A. and Lloyd, S., Quantum singular value decomposition of non-sparse low-rank matrices, Phys.
Rev. A 97, pp. 012327, 2018.
24. Schuld, M., Sinayskiy, I. and Petruccione, F., Prediction by linear regression on a quantum computer, Phys. Rev. A. 94,
pp. 022342, 2016.
25. Shao, C.P., From linear combination of quantum states to Grover’s searching algorithm, arXiv:1807.09693, 2018.
26. Soklakov, A.N. and Schack, R., Efficient state preparation for a register of quantum bits, Physical review A 73, pp.
012307, 2006.
27. Wang, G.M., Quantum algorithm for linear regression, Phys. Rev. A. 96, pp. 012335, 2017.
28. Wiebe, N., Braun, D. and Lloyd, S., Quantum Algorithm for Data Fitting, Phys. Rev. Lett. 109, pp. 050505, 2012.
29. Wiebe, N., Granade, C., Ferrie, C. and Cory, D.G., Hamiltonian learning and certification using quantum resources.
Phys. Rev. Lett. 112, pp. 190501, 2014.
30. Wossnig, L., Zhao, Z.K. and Prakash, A., A quantum linear system algorithm for dense matrices, Phys. Rev. Lett. 120,
pp. 050502, 2018.
Appendix A: Error analysis for theorem 1

For simplicity, denote the approximation of $s_{ij} := \langle A_{i\bullet}|B_{\bullet j}\rangle$ obtained from the algorithm as $\tilde s_{ij}$; then $|s_{ij} - \tilde s_{ij}| \le \epsilon$. Denote
$$|C\rangle = \frac{1}{\|C\|_F}\sum_{i,j=0}^{n-1} C_{ij}|i,j\rangle = \frac{1}{\|C\|_F}\sum_{i,j=0}^{n-1}\|A_{i\bullet}\|_2\|B_{\bullet j}\|_2 s_{ij}|i,j\rangle,$$
and
$$|\tilde C\rangle = \frac{1}{\sqrt Z}\sum_{i,j=0}^{n-1}\|A_{i\bullet}\|_2\|B_{\bullet j}\|_2\tilde s_{ij}|i,j\rangle, \qquad Z = \sum_{i,j=0}^{n-1}\|A_{i\bullet}\|_2^2\|B_{\bullet j}\|_2^2\tilde s_{ij}^2.$$
Then
$$\big|\|C\|_F^2 - Z\big| = \Big|\sum_{i,j=0}^{n-1}\|A_{i\bullet}\|_2^2\|B_{\bullet j}\|_2^2(s_{ij}^2 - \tilde s_{ij}^2)\Big| \le 2\epsilon\sum_{i,j=0}^{n-1}\|A_{i\bullet}\|_2^2\|B_{\bullet j}\|_2^2 = 2\epsilon\|A\|_F^2\|B\|_F^2;$$
$$\big|\|C\|_F - \sqrt Z\big| = \frac{\big|\|C\|_F^2 - Z\big|}{\|C\|_F + \sqrt Z} \le \frac{2\epsilon\|A\|_F^2\|B\|_F^2}{\|C\|_F + \sqrt Z}.$$
Finally,
$$\big\||C\rangle - |\tilde C\rangle\big\|_2^2 \le \frac{1}{\|C\|_F^2 Z}\sum_{i,j=0}^{n-1}\|A_{i\bullet}\|_2^2\|B_{\bullet j}\|_2^2\big(s_{ij}\sqrt Z - \tilde s_{ij}\|C\|_F\big)^2$$
$$\le \frac{2}{\|C\|_F^2 Z}\sum_{i,j=0}^{n-1}\|A_{i\bullet}\|_2^2\|B_{\bullet j}\|_2^2\Big(\tilde s_{ij}^2\big(\sqrt Z - \|C\|_F\big)^2 + Z(s_{ij} - \tilde s_{ij})^2\Big).$$
To make the error small, of size $\epsilon_0$, we choose $\epsilon$ such that $\epsilon\|A\|_F^2\|B\|_F^2/\|C\|_F^2 = \epsilon_0$. Since the success probability is $Z/\|A\|_F^2\|B\|_F^2 \approx \|C\|_F^2/\|A\|_F^2\|B\|_F^2$, the final complexity is $\tilde O(\|A\|_F\|B\|_F/\|C\|_F\epsilon) = \tilde O(\|A\|_F^3\|B\|_F^3/\|C\|_F^3\epsilon_0)$.
Appendix B: Error analysis for theorem 3

Write
$$|B\rangle = \frac{1}{\|B\|_F}\sum_{j=0}^{n-1}\|B_{\bullet j}\|_2|B_{\bullet j}\rangle|j\rangle = \frac{1}{\|B\|_F}\sum_{j,k=0}^{n-1}\|B_{\bullet j}\|_2\alpha_{jk}|u_k\rangle|j\rangle.$$
After applying proposition 3 (in parallel on each column) and the controlled rotation as in (2.8), the relevant output state and its ideal counterpart are
$$|\phi\rangle = \frac{1}{\sqrt Z}\sum_{j,k=0}^{n-1}\|B_{\bullet j}\|_2\alpha_{jk}\tilde\sigma_k|u_k\rangle|j\rangle, \qquad |\psi\rangle = \frac{1}{\sqrt W}\sum_{j,k=0}^{n-1}\|B_{\bullet j}\|_2\alpha_{jk}\sigma_k|u_k\rangle|j\rangle,$$
where $Z = \sum_{j,k=0}^{n-1}\|B_{\bullet j}\|_2^2|\alpha_{jk}|^2\tilde\sigma_k^2$ and $W = \sum_{j,k=0}^{n-1}\|B_{\bullet j}\|_2^2|\alpha_{jk}|^2\sigma_k^2$. Since $|\tilde\sigma_k - \sigma_k| \le \epsilon\|A\|_F$ and $\|B\|_F^2 = \sum_{j,k=0}^{n-1}\|B_{\bullet j}\|_2^2|\alpha_{jk}|^2$, we have
$$|Z - W| \le \sum_{j,k=0}^{n-1}\|B_{\bullet j}\|_2^2|\alpha_{jk}|^2|\tilde\sigma_k^2 - \sigma_k^2| \le \epsilon\|A\|_F\|B\|_F^2\max_{0\le k\le n-1}|\tilde\sigma_k + \sigma_k|.$$
Therefore,
$$\big\||\phi\rangle - |\psi\rangle\big\|_2^2 = \frac{1}{ZW}\Big\|\sum_{j,k=0}^{n-1}\|B_{\bullet j}\|_2\alpha_{jk}\big(\sqrt W\tilde\sigma_k - \sqrt Z\sigma_k\big)|u_k\rangle|j\rangle\Big\|_2^2$$
$$\le \frac{2}{ZW}\sum_{j,k=0}^{n-1}\|B_{\bullet j}\|_2^2|\alpha_{jk}|^2\Big(W(\tilde\sigma_k - \sigma_k)^2 + \big(\sqrt Z - \sqrt W\big)^2\sigma_k^2\Big) \qquad (B.1)$$
$$\le \frac{2\epsilon^2\|A\|_F^2\|B\|_F^2}{Z} + \frac{2\epsilon^2\|A\|_F^2\|B\|_F^4\max_k|\tilde\sigma_k + \sigma_k|^2}{Z(\sqrt Z + \sqrt W)^2}.$$
By choosing $\epsilon_1 = \epsilon\|A\|_F$ small, we have $\tilde\sigma_k \approx \sigma_k$, so $|Z - W| \le 2\epsilon_1\|B\|_F^2\max_k\sigma_k$. Now choosing $\epsilon_2 = \epsilon_1\|B\|_F^2\max_k\sigma_k$, we have $Z \approx W \le \|B\|_F^2\max_k\sigma_k^2$, so the upper bound in (B.1) is close to
$$\frac{2\epsilon^2\|A\|_F^2\|B\|_F^2}{W} + \frac{\epsilon^2\|A\|_F^2\|B\|_F^4\max_k\sigma_k^2}{W^2} = \frac{2\epsilon_1^2\|B\|_F^2}{\|AB\|_F^2} + \frac{\epsilon_2^2}{\|AB\|_F^4} = \frac{\epsilon_2^2}{\|AB\|_F^2}\Big(\frac{2}{\|B\|_F^2\max_k\sigma_k^2} + \frac{1}{\|AB\|_F^2}\Big) \le \frac{3\epsilon_2^2}{\|AB\|_F^4}.$$
Finally, we set $\epsilon_2 = \epsilon_3\|AB\|_F^2$, which makes the error between $|\phi\rangle$ and $|\psi\rangle$ smaller than $\epsilon_3$. The probability of getting $|\phi\rangle$ equals $Z/\|B\|_F^2\max_k\tilde\sigma_k^2 \approx \|AB\|_F^2/\|B\|_F^2\max_k\sigma_k^2$. So the complexity to get $|\phi\rangle$ is
$$\tilde O\Big(\frac{1}{\epsilon}\cdot\frac{\|B\|_F\max_k\sigma_k}{\|AB\|_F}\Big) = \tilde O\Big(\frac{\|A\|_F\|B\|_F^3\max_k\sigma_k^2}{\epsilon_3\|AB\|_F^3}\Big) \le \tilde O\Big(\frac{\|A\|_F\|B\|_F\kappa^2}{\epsilon_3\|AB\|_F}\Big),$$
which is the statement of theorem 3.
Appendix C: Error analysis for theorem 5

As discussed in remark 5, for each column $|B_{\bullet j}\rangle = \sum_k \alpha_{jk}|u_k\rangle$ we can obtain the state
$$\frac{1}{\max_k\sigma_k}\sum_k \alpha_{jk}\tilde\sigma_k|u_k\rangle|0\rangle + \text{orthogonal part}$$
in time $\tilde O(\|A\|_F/\epsilon_1)$. Applying the swap test to the state $|i,0\rangle$ and the above state, we get a value $L$ in time $\tilde O(\|A\|_F/\epsilon_1\epsilon_2)$ such that
$$\Big|L - \frac{1}{\max_k\sigma_k}\sum_k \alpha_{jk}\tilde\sigma_k\langle i|u_k\rangle\Big| \le \epsilon_2.$$
Note that $c_{ij} = \sum_k\|B_{\bullet j}\|_2\alpha_{jk}\sigma_k\langle i|u_k\rangle$, so
$$\Big|L\|B_{\bullet j}\|_2\max_k\sigma_k - c_{ij}\Big| \le \Big|L\|B_{\bullet j}\|_2\max_k\sigma_k - \sum_k\|B_{\bullet j}\|_2\alpha_{jk}\tilde\sigma_k\langle i|u_k\rangle\Big|$$
$$+ \Big|\sum_k\|B_{\bullet j}\|_2\alpha_{jk}\sigma_k\langle i|u_k\rangle - \sum_k\|B_{\bullet j}\|_2\alpha_{jk}\tilde\sigma_k\langle i|u_k\rangle\Big|$$
$$\le \epsilon_2\|B_{\bullet j}\|_2\max_k\sigma_k + \epsilon_1\|B_{\bullet j}\|_2\sum_k|\alpha_{jk}\langle i|u_k\rangle|.$$
We choose $\epsilon_1$ and $\epsilon_2$ such that $\epsilon_2\|B_{\bullet j}\|_2\max_k\sigma_k = \epsilon_3$ and $\epsilon_1\|B_{\bullet j}\|_2\sum_k|\alpha_{jk}\langle i|u_k\rangle| = \epsilon_3$. Then the complexity is
$$O\Big(\frac{\|A\|_F}{\epsilon_1\epsilon_2}\Big) = O\bigg(\frac{\|A\|_F\|B_{\bullet j}\|_2^2\max_k\sigma_k\sum_k|\alpha_{jk}\langle i|u_k\rangle|}{\epsilon_3^2}\bigg).$$
This is the complexity to compute $c_{ij}$ to accuracy $\epsilon_3$. Therefore, the total complexity to compute all entries of $C$ equals
$$O\bigg(\frac{\|A\|_F\max_k\sigma_k}{\epsilon_3^2}\sum_{i,j,k}\|B_{\bullet j}\|_2^2|\alpha_{jk}\langle i|u_k\rangle| + n^2\bigg) = O\bigg(\frac{n\|A\|_F\|B_{\bullet F}\|_3^3\max_k\sigma_k}{\epsilon_3^2} + n^2\bigg), \qquad (C.1)$$
which is due to
$$\sum_{i,j,k}\|B_{\bullet j}\|_2^2|\alpha_{jk}||\langle i|u_k\rangle| = \sum_{j,k}\|B_{\bullet j}\|_2^2|\alpha_{jk}|\,\|u_k\|_1 \le \sqrt n\sum_j\|B_{\bullet j}\|_2^2\sum_k|\alpha_{jk}| = \sqrt n\sum_j\|B_{\bullet j}\|_2^2\|B_{\bullet j}\|_1 \le n\sum_j\|B_{\bullet j}\|_2^3.$$
Since $\|A\|_F \ge \sqrt n\min_k\sigma_k$, the above result can be rewritten as
$$O\bigg(\frac{\sqrt n\,\kappa\|A\|_F^2\|B_{\bullet F}\|_3^3}{\epsilon_3^2} + n^2\bigg).$$
If we apply the HHL algorithm instead, then $\|A\|_F$ should be replaced by $\max_k\sigma_k$, so the complexity in (C.1) becomes
$$O\bigg(\frac{n\|B_{\bullet F}\|_3^3\max_k\sigma_k^2}{\epsilon_3^2} + n^2\bigg) = O\bigg(\frac{n\kappa^2\|B_{\bullet F}\|_3^3\min_k\sigma_k^2}{\epsilon_3^2} + n^2\bigg) = O\bigg(\frac{\kappa^2\|B_{\bullet F}\|_3^3\|A\|_F^2}{\epsilon_3^2} + n^2\bigg).$$
Appendix D: Error analysis for proposition 5

We only consider the case where all given data are real. By measuring (4.1), if we get $|1\rangle$, then the post-measurement state is
$$|\psi''\rangle = \frac{1}{\sqrt Y}\sum_{k=0}^{n-1} b_k\sin(f(k)t)|k\rangle,$$
where $Y = \sum_{k=0}^{n-1}|b_k\sin(f(k)t)|^2$. The desired state is
$$|\psi'\rangle = \frac{1}{\sqrt Z}\sum_{k=0}^{n-1} b_kf(k)|k\rangle,$$
where $Z = \sum_{k=0}^{n-1}|b_kf(k)|^2$. Then
$$\langle\psi''|\psi'\rangle = \frac{1}{\sqrt{YZ}}\sum_{k=0}^{n-1} b_k^2 f(k)\sin(f(k)t) \ge \frac{1}{\sqrt{YZ}}\sum_{k=0}^{n-1} b_k^2 f(k)\Big(f(k)t - \frac{\epsilon_1^3}{6}\Big)$$
$$= \frac{1}{\sqrt{YZ}}\Big(Zt - \frac{\epsilon_1^3}{6}\sum_{k=0}^{n-1} b_k^2 f(k)\Big) \ge \frac{1}{\sqrt{YZ}}\Big(Zt - \frac{\epsilon_1^3}{6}\sqrt Z\Big) = \frac{1}{\sqrt Y}\Big(\sqrt Zt - \frac{\epsilon_1^3}{6}\Big)$$
$$\ge \frac{1}{\sqrt Zt}\Big(\sqrt Zt - \frac{\epsilon_1^3}{6}\Big) = 1 - \frac{\epsilon_1^3}{6\sqrt Zt} \ge 1 - \frac{\epsilon_1^2\kappa(f)}{6},$$
where in the second step we use the fact $\sin x \ge x - x^3/6$ and $|f(k)t| \le \epsilon_1$; in the fourth step we apply the Cauchy-Schwarz inequality
$$\sum_{k=0}^{n-1} b_k^2 f(k) \le \sqrt{\sum_{k=0}^{n-1} b_k^2}\sqrt{\sum_{k=0}^{n-1} b_k^2 f(k)^2} = \sqrt Z;$$
in the sixth step we use the fact that $Y \le Zt^2$; and in the final step we apply $|f(k)t| \ge \epsilon_0$ for all $k$ and $\epsilon_0 = \kappa(f)^{-1}\epsilon_1$. Therefore,
$$\big\||\psi''\rangle - |\psi'\rangle\big\|_2 = \sqrt{2(1 - \langle\psi''|\psi'\rangle)} \le \epsilon_1\sqrt{\frac{\kappa(f)}{3}} = O(\epsilon),$$
if we choose $\epsilon_1 = O(\epsilon/\sqrt{\kappa(f)})$.