

Quantum-Inspired Support Vector Machine


Chen Ding, Tian-Yi Bao, and He-Liang Huang∗

arXiv:1906.08902v5 [cs.LG] 16 Mar 2021

This work was supported by the Open Research Fund from the State Key Laboratory of High Performance Computing of China (Grant No. 201901-01), the National Natural Science Foundation of China under Grant No. 11905294, and the China Postdoctoral Science Foundation. (Corresponding author: He-Liang Huang. Email: [email protected])
Chen Ding is with the CAS Centre for Excellence and Synergetic Innovation Centre in Quantum Information and Quantum Physics, University of Science and Technology of China, Hefei, Anhui 230026, China.
Tian-Yi Bao is with the Department of Computer Science, University of Oxford, Wolfson Building, Parks Road, Oxford, OX1 3QD, UK.
He-Liang Huang is with Hefei National Laboratory for Physical Sciences at Microscale and Department of Modern Physics, University of Science and Technology of China, Hefei, Anhui 230026, China, and also with the CAS Centre for Excellence and Synergetic Innovation Centre in Quantum Information and Quantum Physics, University of Science and Technology of China, Hefei, Anhui 230026, China.

Abstract—Support vector machine (SVM) is a particularly powerful and flexible supervised learning model that analyzes data for both classification and regression, whose usual algorithm complexity scales polynomially with the dimension of the data space and the number of data points. To tackle the big data challenge, a quantum SVM algorithm was proposed, which is claimed to achieve exponential speedup for least squares SVM (LS-SVM). Here, inspired by the quantum SVM algorithm, we present a quantum-inspired classical algorithm for LS-SVM. In our approach, an improved fast sampling technique, namely indirect sampling, is proposed for sampling the kernel matrix and classifying. We first consider the LS-SVM with a linear kernel, and then discuss the generalization of our method to non-linear kernels. Theoretical analysis shows that our algorithm can make classifications with arbitrary success probability in runtime logarithmic in both the dimension of the data space and the number of data points, for low-rank, low-condition-number and high-dimensional data matrices, matching the runtime of the quantum SVM.

Index Terms—Quantum-inspired algorithm, machine learning, support vector machine, exponential speedup, matrix sampling.

I. INTRODUCTION

Since the 1980s, quantum computing has attracted wide attention due to its enormous advantages in solving hard computational problems [1], such as integer factorization [2]–[4], database searching [5], [6], machine learning [7]–[11] and so on [12], [13]. In 1997, Daniel R. Simon offered compelling evidence that the quantum model may have significantly more complexity-theoretic power than the probabilistic Turing machine [14]. However, it remains an interesting question where the border between classical computing and quantum computing lies. Although many proposed quantum algorithms have exponential speedups over the existing classical algorithms, is there any way we can accelerate such classical algorithms to the same complexity as the quantum ones?

In 2018, inspired by the quantum recommendation system algorithm proposed by Iordanis Kerenidis and Anupam Prakash [15], Ewin Tang designed a classical recommendation algorithm that achieves an exponential improvement over previous algorithms [16], a breakthrough that shows how to apply the subsampling strategy based on Alan Frieze, Ravi Kannan, and Santosh Vempala's 2004 algorithm [17] to find a low-rank approximation of a matrix. Subsequently, Tang continued to use the same techniques to dequantize two quantum machine learning algorithms, quantum principal component analysis [18] and quantum supervised clustering [19], and showed that classical algorithms can also match the bounds and runtime of the corresponding quantum algorithms, with only polynomial slowdown [20]. Later, András Gilyén et al. [21] and Nai-Hui Chia et al. [22] independently and simultaneously proposed a quantum-inspired matrix inversion algorithm with complexity logarithmic in the matrix size, which eliminates the speedup advantage of the famous Harrow-Hassidim-Lloyd (HHL) algorithm [23] under certain conditions. Recently, Juan Miguel Arrazola et al. studied the actual performance of quantum-inspired algorithms and found that they can perform well in practice under the given conditions; however, the conditions should be further relaxed if we want to apply the algorithms to practical datasets [24]. All of these works promise a bright future for designing quantum-inspired algorithms in the machine learning area, where matrix inversion is used universally.

Support vector machine (SVM) is a data classification algorithm that is commonly used in the machine learning area [25], [26]. Extensive studies have been conducted on SVMs to boost and optimize their performance, such as the sequential minimal optimization algorithm [27], the cascade SVM algorithm [28], and the SVM algorithms based on Markov sampling [29], [30]. These algorithms offer promising speedups either by changing the way a classifier is trained, or by reducing the size of the training set. However, the time complexities of current SVM algorithms are all polynomial in the data size. In 2014, Patrick Rebentrost, Masoud Mohseni and Seth Lloyd proposed the quantum SVM algorithm [31], which can achieve an exponential speedup compared to the classical SVMs: its time complexity is polynomial in the logarithm of the data size. Inspired by the quantum SVM algorithm [31], Tang's methods [16] and András Gilyén et al.'s work [21], we propose a quantum-inspired classical SVM algorithm, which also shows exponential speedup compared to previous classical SVMs for low-rank, low-condition-number and high-dimensional data matrices. Both the quantum SVM algorithm [31] and our quantum-inspired SVM algorithm are least squares SVMs (LS-SVM), which reduce the optimization problem to finding the solution of a set of linear equations.
Our algorithm is a dequantization of the quantum SVM algorithm [31]. In the quantum SVM algorithm, the labeled data vectors ($x_j$ for $j = 1, \dots, m$) are mapped to quantum vectors $|x_j\rangle = \frac{1}{|x_j|}\sum_k (x_j)_k|k\rangle$ via a quantum random access memory (qRAM), and the kernel matrix is prepared using quantum inner product evaluation [19]. Then the solution of the SVM is found by solving a linear equation system, related to the quadratic programming problem of the SVM, with the quantum matrix inversion algorithm [23]. In our quantum-inspired SVM, the labeled vectors are stored in an arborescent data structure that provides random sampling in time logarithmic in the vector lengths. By sampling these labeled vectors, both over their number and over their lengths, to obtain a much smaller dataset, we then find an approximate singular value decomposition of the kernel matrix. Finally, we solve the optimization problem and perform classification based on the solved parameters.

Our methods, particularly the sampling technique, are based on [16], [21]. However, the previous sampling techniques cannot simply be copied to solve the SVM task, since we do not have efficient direct sampling access to the kernel matrix on which we want to perform matrix inversion (see Section II-B for a more detailed explanation). Hence we have developed an indirect sampling technique to solve this problem. In the whole process, we need to avoid direct multiplication on vectors or matrices with the same size as the kernel, lest we lose the exponential speedup. We first consider the LS-SVM with linear kernels, no regularization terms and no bias of the classification hyperplane, which can be regarded as the prototype for quantum-inspired techniques applied to various SVMs. We then show that the regularization terms can easily be included in the algorithm in Section III. Finally, we discuss the generalization of our method to non-linear kernels in Section VII-C and the general case without the constraint on the bias of the classification hyperplane in Section VII-D. Theoretical analysis shows that our quantum-inspired SVM can achieve exponential speedup over existing classical algorithms under several conditions. Experiments are carried out to demonstrate the feasibility of our algorithm. The indirect sampling developed in our work opens up the possibility of a wider application of sampling methods in the field of machine learning.

II. PRELIMINARY

A. Notations

We list the matrix-related notations used in this paper in Table I.

TABLE I
THE NOTATIONS

Symbol        Meaning
$A$             matrix $A$
$y$             vector $y$, or matrix $y$ with only one column
$A^+$           pseudo-inverse of $A$
$A^T$           transpose of $A$
$A^{+T}$        transpose of the pseudo-inverse of $A$
$A_{i,*}$       $i$-th row of $A$
$A_{*,j}$       $j$-th column of $A$
$\|A\|$         2-operator norm of $A$
$\|A\|_F$       Frobenius norm of $A$
$Q(\cdot)$      time complexity for querying an element of $\cdot$
$L(\cdot)$      time complexity for sampling an element of $\cdot$

B. Least squares SVM

Suppose we have $m$ data points $\{(x_j, y_j) : x_j \in \mathbb{R}^n, y_j = \pm 1\}_{j=1,\dots,m}$, where $y_j = \pm 1$ depending on the class to which $x_j$ belongs. Denote $(x_1, \dots, x_m)$ by $X$ and $(y_1, \dots, y_m)^T$ by $y$. An SVM finds a pair of parallel hyperplanes $x\cdot w + b = \pm 1$ that divides the points into two classes according to the given data. For any new input point, it then makes a classification by the point's position relative to the hyperplanes.

We make the following assumption on the dataset so as to simplify the problem: assume the data points are equally distributed on both sides of a hyperplane that passes through the origin and that their labels are divided by this hyperplane. Thus we assume $b = 0$. A generalized method for $b \neq 0$ is discussed in Section VII-D.

According to [26], the optimization problem of the LS-SVM with a linear kernel is
$$\min_{w,b,e}\ \mathcal{L}_1(w,b,e) = \frac{1}{2}w^T w + \frac{\gamma}{2}\sum_{k=1}^{m} e_k^2,\qquad \text{subject to } y_k(w^T x_k + b) = 1 - e_k,\ k = 1,\dots,m.$$
Taking $b = 0$, we get
$$\min_{w,e}\ \mathcal{L}_2(w,e) = \frac{1}{2}w^T w + \frac{\gamma}{2}\sum_{k=1}^{m} e_k^2,\qquad \text{subject to } y_k w^T x_k = 1 - e_k,\ k = 1,\dots,m.$$
One defines the Lagrangian
$$\mathscr{L}(w,e,\mu) = \mathcal{L}_2(w,e) - \sum_{k=1}^{m}\mu_k\big(y_k w^T x_k - 1 + e_k\big).$$
The conditions for optimality,
$$\frac{\partial\mathscr{L}}{\partial w} = 0 \ \rightarrow\ w = \sum_{k=1}^{m}\mu_k y_k x_k,\qquad \frac{\partial\mathscr{L}}{\partial e_k} = 0 \ \rightarrow\ \mu_k = \gamma e_k,\qquad \frac{\partial\mathscr{L}}{\partial \mu_k} = 0 \ \rightarrow\ y_k w^T x_k - 1 + e_k = 0,\quad k = 1,\dots,m,$$
can be written as the solution of the set of linear equations $Z^T Z\mu + \gamma^{-1}\mu = \mathbf{1}$, where $Z = (x_1 y_1, \dots, x_m y_m)$. Letting $\alpha_k = \mu_k y_k$, we have
$$(X^T X + \gamma^{-1} I)\alpha = y. \qquad (1)$$
Once $\alpha$ is solved, the classification hyperplane is $x^T X\alpha = 0$. Given a query point $x$, we evaluate $\mathrm{sgn}(x^T X\alpha)$ to make the classification.

We use our sampling techniques in solving Equation (1) and in evaluating $\mathrm{sgn}(x^T X\alpha)$ to avoid a time complexity overhead of $\mathrm{poly}(m)$ or $\mathrm{poly}(n)$, which would kill the desired exponential speedup. Note that the quantum-inspired algorithms for linear equations [21], [22] can invert a low-rank matrix in logarithmic runtime. However, such an algorithm cannot be invoked directly to solve Equation (1) here, since the complexity of directly computing the matrix $X^T X + \gamma^{-1} I$ is polynomial, which would once again kill the exponential speedup. Thus we need to develop the indirect sampling technique, which efficiently performs the matrix inversion on $X^T X + \gamma^{-1} I$ with only sampling access to $X$.
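To make concrete the object that the quantum-inspired algorithm approximates, the following minimal NumPy sketch solves Equation (1) directly and classifies a query point by $\mathrm{sgn}(x^T X\alpha)$. It illustrates only the classical baseline (cost polynomial in $m$ and $n$), not the sublinear procedure developed below; the data and parameter values are arbitrary choices for demonstration.

```python
import numpy as np

# Baseline LS-SVM with linear kernel and b = 0 (Section II-B):
# solve (X^T X + gamma^{-1} I) alpha = y, then classify sgn(x^T X alpha).
rng = np.random.default_rng(0)

n, m, gamma = 50, 200, 1e3            # dimension, number of points, regularization
w_true = rng.normal(size=n)           # hypothetical separating direction
X = rng.normal(size=(n, m))           # columns are the data points x_j
y = np.sign(w_true @ X)               # labels +/-1 from a hyperplane through the origin

A = X.T @ X + np.eye(m) / gamma       # m x m coefficient matrix of Equation (1)
alpha = np.linalg.solve(A, y)         # polynomial-time direct solve

x_query = rng.normal(size=n)
label = np.sign(x_query @ X @ alpha)  # classification expression sgn(x^T X alpha)
print(label)
```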
C. The sampling technique

We show the definition and the idea of our sampling method for obtaining indices, elements or submatrices, which is the key technique used in our algorithm, as well as in [16], [17], [21].

Definition 1 (Sampling on vectors). Suppose $v \in \mathbb{C}^n$. Define $q^{(v)}$ as the probability distribution such that
$$x \sim q^{(v)}:\quad \mathbb{P}[x = i] = \frac{|v_i|^2}{\|v\|^2}.$$
Picking an index according to the probability distribution $q^{(v)}$ is called a sampling on $v$.

Definition 2 (Sampling the indices from matrices). Suppose $A \in \mathbb{C}^{n\times m}$. Define $q^{(A)}$ as the two-dimensional probability distribution such that
$$(x, y) \sim q^{(A)}:\quad \mathbb{P}[x = i, y = j] = \frac{|A_{ij}|^2}{\|A\|_F^2}.$$
Picking a pair of indices $(i, j)$ according to the probability distribution $q^{(A)}$ is called a sampling on $A$.

Definition 3 (Sampling the submatrices from matrices). Suppose the target is to sample a submatrix $X'' \in \mathbb{C}^{c\times r}$ from $X \in \mathbb{C}^{n\times m}$. First we sample $r$ times on the vector $(\|X_{*,j}\|)_{j=1,\dots,m}$ and obtain column indices $j_1, \dots, j_r$; the columns $X_{*,j_1}, \dots, X_{*,j_r}$ form the submatrix $X'$. Then we sample $c$ times on the $j$-th column of $X$ and obtain row indices $i_1, \dots, i_c$, where each time $j$ is sampled uniformly at random from $j_1, \dots, j_r$; the rows $X'_{i_1,*}, \dots, X'_{i_c,*}$ form the submatrix $X''$. The matrices $X'$ and $X''$ are normalized so that $\mathbb{E}[X'X'^T] = XX^T$ and $\mathbb{E}[X''^T X''] = X'^T X'$.

The process of sampling submatrices from matrices (as described in Def. 3) is shown in Fig. 1. Put simply, it takes several rows and columns out of the matrix by a random choice weighted by the "importance" of the elements, and then normalizes them so that they are unbiased with respect to the original rows and columns.

To achieve fast sampling, we usually store vectors in an arborescent data structure (such as a binary search tree), as suggested in [16], and store matrices as a list of their row trees or column trees. The sampling is an analog of quantum state measurement: it only reveals a low-dimensional projection of the vectors and matrices in each calculation. Rather than computing with the whole vector or matrix, we choose their most representative elements for the calculation with high probability (we choose the elements according to the probability of their squares, which is also similar to the quantum measurement of quantum states). The sampling technique we use has the advantage of representing the original vector without bias while consuming fewer computing resources.

We note that there are other kinds of sampling methods for SVM, such as Markov sampling [29], [30]. Different sampling methods may work well in different scenarios. Our algorithm is designed for low-rank datasets, while the algorithms based on Markov sampling [29], [30] may work well on datasets whose columns form a uniformly ergodic Markov chain. In our algorithm, to achieve exponential speedup, the sampling technique differs from Markov sampling: (i) we sample both the rows and the columns of the matrix, rather than only columns; (ii) we sample each element according to a norm-squared probability distribution; (iii) in each dot product calculation (Alg. 1), we use the sampling technique to avoid operations with high complexity.

D. The preliminary algorithms

We invoke two complexity-saving sampling algorithms from [21]. They are treated as oracles that output certain outcomes with controlled errors in the main algorithm. Lemma 1 and Lemma 2 state their correctness and efficiency. For convenience, some minor changes to the algorithms and lemmas have been made.

1) Trace inner product estimation: Alg. 1 calculates trace inner products in time logarithmic in the sizes of the matrices.

Algorithm 1 Trace Inner Product Estimation.
Input: $A \in \mathbb{C}^{m\times n}$ to which we have sampling access in complexity $L(A)$, and $B \in \mathbb{C}^{n\times m}$ to which we have query access in complexity $Q(B)$. Relative error bound $\xi$ and success probability bound $1 - \eta$.
Goal: Estimate $\mathrm{Tr}[AB]$.
1: Repeat Step 2 $\lceil 6\log_2(\frac{2}{\eta})\rceil$ times and take the median of $Y$, denoted $Z$.
2: Repeat Step 3 $\lceil\frac{9}{\xi^2}\rceil$ times and calculate the mean of $X$, denoted $Y$.
3: Sample $i$ from the row norms of $A$. Sample $j$ from $A_i$. Let $X = \frac{\|A\|_F^2}{A_{ij}}B_{ji}$.
Output: $Z$.

Lemma 1 [21]. Suppose that we have length-square sampling access to $A \in \mathbb{C}^{m\times n}$ and query access to the matrix $B \in \mathbb{C}^{n\times m}$ in complexity $Q(B)$. Then we can estimate $\mathrm{Tr}[AB]$ to precision $\xi\|A\|_F\|B\|_F$ with probability at least $1 - \eta$ in time
$$O\!\left((L(A) + Q(B))\,\frac{\log(1/\eta)}{\xi^2}\right).$$

2) Rejection sampling: Alg. 2 samples from a vector to which we do not have full query access, in time logarithmic in its length.

Algorithm 2 Rejection sampling.
Input: $A \in \mathbb{C}^{m\times n}$ to which we have length-square sampling access, $b \in \mathbb{C}^n$ to which we have norm access, and $y = Ab$ to which we have query access.
Goal: Sample from the length-square distribution of $y = Ab$.
1: Take $D \geq \|b\|^2$.
2: Sample a row index $i$ by the row norm squares of $A$.
3: Query $|y_i|^2 = |A_{i,*}b|^2$ and calculate $\frac{|A_{i,*}b|^2}{D\|A_{i,*}\|^2}$.
4: Sample a real number $x$ uniformly distributed in $[0, 1]$. If $x < \frac{|A_{i,*}b|^2}{D\|A_{i,*}\|^2}$, output $i$; else, go to Step 2.
Output: The row index $i$.
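As an illustration of how these two oracles can be realized classically, the sketch below implements the sampling loops of Alg. 1 and Alg. 2 in NumPy for real-valued matrices, using dense arrays in place of the arborescent sampling structures (so the point is the estimator logic, not the logarithmic access cost); the function names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)

def trace_inner_product(A, B, xi=0.1, eta=0.05):
    """Alg. 1 sketch: estimate Tr[A B] to ~ xi*||A||_F*||B||_F by a median of means."""
    fro2 = np.sum(A ** 2)                              # ||A||_F^2
    row_p = np.sum(A ** 2, axis=1) / fro2              # rows sampled by squared row norms
    n_mean = int(np.ceil(9 / xi ** 2))
    n_med = int(np.ceil(6 * np.log2(2 / eta)))
    medians = []
    for _ in range(n_med):
        samples = []
        for _ in range(n_mean):
            i = rng.choice(A.shape[0], p=row_p)
            j = rng.choice(A.shape[1], p=A[i] ** 2 / np.sum(A[i] ** 2))
            samples.append(fro2 / A[i, j] * B[j, i])   # unbiased single-sample estimate
        medians.append(np.mean(samples))
    return np.median(medians)

def rejection_sample(A, b):
    """Alg. 2 sketch: sample index i with probability |(Ab)_i|^2 / ||Ab||^2."""
    D = np.sum(b ** 2)                                 # D >= ||b||^2
    row_norm2 = np.sum(A ** 2, axis=1)
    row_p = row_norm2 / np.sum(row_norm2)
    while True:
        i = rng.choice(A.shape[0], p=row_p)
        accept = (A[i] @ b) ** 2 / (D * row_norm2[i])  # acceptance probability of Step 4
        if rng.uniform() < accept:
            return i

A = rng.normal(size=(30, 40))
B = rng.normal(size=(40, 30))
print(trace_inner_product(A, B), np.trace(A @ B))
print(rejection_sample(A, rng.normal(size=40)))
```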
Fig. 1. A demonstration of sampling submatrices from matrices (the process described in Def. 3, which is also Step 2 and Step 3 in Alg. 3). We sample columns from $X$ to get $X'$ and sample rows from $X'$ to get $X''$. Note that $X'$ and $X''$ are normalized such that $\mathbb{E}[X'X'^T] = XX^T$ and $\mathbb{E}[X''^T X''] = X'^T X'$.
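A sketch of the two-stage subsampling of Def. 3 (illustrated in Fig. 1) is given below, again with dense NumPy arrays standing in for the tree-based sampling access; the rescaling factors are the ones that make the unbiasedness conditions $\mathbb{E}[X'X'^T] = XX^T$ and $\mathbb{E}[X''^T X''] = X'^T X'$ from the caption hold, and the test matrix is an arbitrary low-rank example.

```python
import numpy as np

rng = np.random.default_rng(2)

def subsample(X, r, c):
    """Def. 3: norm-squared column sampling X -> X' (n x r), then row sampling X' -> X'' (c x r)."""
    n, m = X.shape
    frob = np.linalg.norm(X)                                # ||X||_F
    col_p = np.sum(X ** 2, axis=0) / frob ** 2              # column norm-squared distribution
    cols = rng.choice(m, size=r, p=col_p)
    # X'_{*,s} = (||X||_F / sqrt(r)) * X_{*,j_s} / ||X_{*,j_s}||
    Xp = X[:, cols] / np.linalg.norm(X[:, cols], axis=0) * frob / np.sqrt(r)

    frob_p = np.linalg.norm(Xp)                             # ||X'||_F
    rows = np.empty(c, dtype=int)
    for t in range(c):
        s = rng.integers(r)                                 # pick one sampled column uniformly
        rows[t] = rng.choice(n, p=Xp[:, s] ** 2 / np.sum(Xp[:, s] ** 2))
    # X''_{t,*} = (||X'||_F / sqrt(c)) * X'_{j_t,*} / ||X'_{j_t,*}||
    Xpp = Xp[rows, :] / np.linalg.norm(Xp[rows, :], axis=1, keepdims=True) * frob_p / np.sqrt(c)
    return Xp, Xpp

X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 120))   # rank-5 test matrix
Xp, Xpp = subsample(X, r=40, c=40)
# sanity check of the two approximations used later in the analysis
print(np.linalg.norm(X @ X.T - Xp @ Xp.T), np.linalg.norm(Xp.T @ Xp - Xpp.T @ Xpp))
```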

Lemma 2 [21]. Suppose that we have length-square sampling access to $A \in \mathbb{C}^{m\times n}$ having normalized rows, and we are given $b \in \mathbb{C}^n$. Then we can implement queries to the vector $y := Ab$ with complexity $Q(y) = O(nQ(A))$, and we can length-square sample from $q^{(y)}$ with complexity $L(y)$ such that
$$\mathbb{E}[L(y)] = O\!\left(\frac{n\|b\|^2}{\|y\|^2}\,(L(A) + nQ(A))\right).$$

III. QUANTUM-INSPIRED SVM ALGORITHM

We now present the main algorithm (Alg. 3), which makes classifications as classical SVMs do. Note that an actual computation only happens where the word "calculate" is used in the algorithm; otherwise the exponential-speedup advantage would be lost to operations on large vectors or matrices. $\gamma$ is temporarily taken as $\infty$. Fig. 2 shows the algorithm process.

Algorithm 3 Quantum-inspired SVM Algorithm.
Input: $m$ training data points and their labels $\{(x_j, y_j) : x_j \in \mathbb{R}^n, y_j = \pm 1\}_{j=1,\dots,m}$, where $y_j = \pm 1$ depending on the class to which $x_j$ belongs. Error bound $\epsilon$ and success probability bound $1 - \eta$. $\gamma$ set as $\infty$.
Goal 1: Find $\tilde\alpha$ such that $\|\tilde\alpha - \alpha\| \leq \epsilon\|\alpha\|$ with success probability at least $1 - \eta$, in which $\alpha = (X^T X)^+ y$.
Goal 2: For any given $x \in \mathbb{R}^n$, find its class.
1: Init: Set $r, c$ as described in (6) and (7).
2: Sample columns: Sample $r$ column indices $i_1, i_2, \dots, i_r$ according to the column norm squares $\frac{\|X_{*,i}\|^2}{\|X\|_F^2}$. Define $X'$ to be the matrix whose $s$-th column is $\frac{\|X\|_F}{\sqrt{r}}\frac{X_{*,i_s}}{\|X_{*,i_s}\|}$. Define $A' = X'^T X'$.
3: Sample rows: Sample $s \in [r]$ uniformly, then sample a row index $j$ distributed as $\frac{|X'_{js}|^2}{\|X'_{*,s}\|^2}$. Sample a total of $c$ row indices $j_1, j_2, \dots, j_c$ this way. Define $X''$ whose $t$-th row is $\frac{\|X'\|_F}{\sqrt{c}}\frac{X'_{j_t,*}}{\|X'_{j_t,*}\|}$. Define $A'' = X''^T X''$.
4: Spectral decomposition: Calculate the spectral decomposition of $A''$, denoted $A'' = V''\Sigma^2 V''^T$. Denote the calculated eigenvalues by $\sigma_l^2$, $l = 1, \dots, k$.
5: Approximate eigenvectors: Let $R = X'^T X$. Define $\tilde V_l = \frac{R^T V''_l}{\sigma_l^2}$ for $l = 1, \dots, k$, and $\tilde V = (\tilde V_l)_{l=1,\dots,k}$.
6: Estimate matrix elements: Calculate $\tilde\lambda_l = \tilde V_l^T y$ to precision $\frac{3\epsilon\sigma_l^2}{16\sqrt{k}}\|y\|$ by Alg. 1, each with success probability $1 - \frac{\eta}{4k}$. Let $u = \sum_{l=1}^{k}\frac{\tilde\lambda_l}{\sigma_l^4}V''_l$.
7: Find query access: Find query access to $\tilde\alpha = \tilde R^T u$ by $\tilde\alpha_p = u^T\tilde R_{*,p}$, in which $\tilde R_{ij}$ is calculated to precision $\frac{\epsilon}{4\kappa^2\|X\|_F}$ by Alg. 1, each with success probability $1 - \frac{\eta}{4\lceil 864\log(8/\eta)/\epsilon^2\rceil}$.
8: Find sign: Calculate $x^T X\tilde\alpha$ to precision $\frac{\epsilon}{4}\|\alpha\|\|x\|$ with success probability $1 - \frac{\eta}{4}$ by Alg. 1. Tell its sign.
Output: The answer class depends on the sign: positive corresponds to $1$, negative to $-1$.
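For orientation, the sketch below strings the steps of Alg. 3 together in NumPy on small dense data. To stay short it replaces the trace-estimation and query oracles of Steps 6-8 with exact small-matrix arithmetic and fixes $r, c$ by hand rather than by (6) and (7), so it reproduces the data flow of the algorithm (indirect sampling, spectral decomposition of $A''$, approximate eigenvectors, classification sign) but not its logarithmic running time or its error guarantees.

```python
import numpy as np

rng = np.random.default_rng(3)

def qi_svm_predict(X, y, x_query, r=60, c=60, k=None):
    """Sketch of Alg. 3 with gamma = infinity: approximate alpha = (X^T X)^+ y by indirect sampling."""
    n, m = X.shape
    frob = np.linalg.norm(X)

    # Steps 2-3: indirect sampling X -> X' -> X'' (Def. 3), with unbiased rescaling.
    cols = rng.choice(m, size=r, p=np.sum(X ** 2, axis=0) / frob ** 2)
    Xp = X[:, cols] / np.linalg.norm(X[:, cols], axis=0) * frob / np.sqrt(r)
    frob_p = np.linalg.norm(Xp)
    rows = [rng.choice(n, p=Xp[:, rng.integers(r)] ** 2 / np.sum(Xp[:, s] ** 2))
            for s in [rng.integers(r) for _ in range(c)]]
    rows = [rng.choice(n, p=Xp[:, s] ** 2 / np.sum(Xp[:, s] ** 2)) for s in rows] if False else rows
    rows = []
    for _ in range(c):
        s = rng.integers(r)
        rows.append(rng.choice(n, p=Xp[:, s] ** 2 / np.sum(Xp[:, s] ** 2)))
    Xpp = Xp[rows, :] / np.linalg.norm(Xp[rows, :], axis=1, keepdims=True) * frob_p / np.sqrt(c)

    # Step 4: spectral decomposition of A'' = X''^T X'' (r x r); keep the top-k eigenpairs.
    evals, evecs = np.linalg.eigh(Xpp.T @ Xpp)
    order = np.argsort(evals)[::-1]
    if k is None:
        k = int(np.sum(evals > 1e-8 * evals[order[0]]))
    sig2, Vpp = evals[order[:k]], evecs[:, order[:k]]

    # Steps 5-6: approximate eigenvectors V_l ~ R^T V''_l / sigma_l^2 with R = X'^T X,
    # and lambda_l = V_l^T y (computed exactly here instead of by Alg. 1).
    R = Xp.T @ X                       # r x m
    lam = (R.T @ Vpp / sig2).T @ y     # lambda_l
    u = Vpp @ (lam / sig2 ** 2)        # u = sum_l lambda_l / sigma_l^4 V''_l

    # Steps 7-8: alpha ~ R^T u, classify by the sign of x^T X alpha.
    alpha = R.T @ u
    return np.sign(x_query @ X @ alpha), alpha

# Demo on a synthetic rank-2 dataset with labels from a hyperplane through the origin.
n, m = 300, 400
X = rng.normal(size=(n, 2)) @ rng.normal(size=(2, m))
X /= np.linalg.norm(X, 2)              # operator norm 1, as assumed in Theorem 1
y = np.sign(rng.normal(size=n) @ X)
pred, alpha = qi_svm_predict(X, y, X[:, 0])
print(pred, y[0])
```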
Fig. 2. The quantum-inspired SVM algorithm. (Flow: $X \xrightarrow{\text{Step 2}} X' \xrightarrow{\text{Step 3}} X''$ with $\mathbb{E}[X'X'^T] = XX^T$ and $\mathbb{E}[A''] = A'$, where $A = X^TX$, $A' = X'^TX'$, $A'' = X''^TX''$; Step 4 yields $\sigma_l^2, V''_l$; Step 5 yields $\tilde V_l = R^T V''_l/\sigma_l^2$ with $R = X'^TX$; Steps 6-7 yield $u = \sum_l \lambda_l/\sigma_l^4\, V''_l$ and $\tilde\alpha = R^T u \approx A^{-1}y$; Step 8 evaluates $\mathrm{sgn}(x^TX\tilde\alpha)$.) In the algorithm, the subsampling of $A$ is implemented by subsampling the matrix $X$ (Steps 1-3), which is called the indirect sampling technique. After the indirect sampling, we perform the spectral decomposition (Step 4). Then we estimate the approximation of the eigenvectors ($\tilde V_l$) of $A$ (Step 5). Finally, we estimate the classification expression (Steps 6-8).

The following theorem states the accuracy and time complexity of the quantum-inspired support vector machine algorithm, from which we conclude that the time complexity $T$ depends polylogarithmically on $m, n$ and polynomially on $k, \kappa, \epsilon, \eta$. It is proved in Section IV and Section V.

Theorem 1. Given parameters $\epsilon > 0$, $0 < \eta < 1$, and given the data matrix $X$ with size $m\times n$, rank $k$, norm $1$, and condition number $\kappa$, the quantum-inspired SVM algorithm will find the classification expression $x^T X\alpha$ for any vector $x \in \mathbb{C}^n$ with error less than $\epsilon\kappa^2\sqrt{m}\|x\|$, success probability higher than $1 - \eta$, and time complexity $T(m, n, k, \kappa, \epsilon, \eta)$,
$$T = O\Big(r\log_2 m + cr\log_2 n + r^3 + \frac{\|X\|_F^2 k^2}{\epsilon^2}\log_2\big(\tfrac{8k}{\eta}\big)\big(\log_2(mn) + k\big) + \frac{1}{\epsilon^2}\log_2\big(\tfrac{1}{\eta}\big)\Big(\log_2(mn) + rk\log_2\big(\tfrac{2}{\eta_1}\big)\frac{\|X\|_F^4}{\epsilon_1^2 r}\log_2(mn)\Big)\Big),$$
in which
$$\epsilon_1 = \frac{\epsilon\|x\|}{2\sqrt{r}\,\lceil\frac{36}{\epsilon^2}\rceil\lceil 6\log_2(\frac{16}{\eta})\rceil},\qquad \eta_1 = \frac{\eta}{8r\lceil\frac{36}{\epsilon^2}\rceil\lceil 6\log_2(\frac{16}{\eta})\rceil}.$$

In Alg. 3, $\gamma$ is set as $\infty$, which makes the coefficient matrix $A = X^T X$. Notice that the eigenvectors of $X^T X + \gamma^{-1}I$ and $X^T X$ are the same, and the difference of their eigenvalues is $\gamma^{-1}$. Thus the algorithm can easily be extended to the coefficient matrix $X^T X + \gamma^{-1}I$ with arbitrary $\gamma$, simply by adding $\gamma^{-1}$ to the calculated eigenvalues in Step 4.

IV. ACCURACY

We prove that the error of computing the classification expression $x^T X\tilde\alpha$ in the quantum-inspired SVM algorithm does not exceed $\epsilon\kappa^2\sqrt{m}\|x\|$. We take $\gamma = \infty$ in the analysis because adding $\gamma^{-1}$ to the eigenvalues does not cause error, and thus the analysis is the same in the case $\gamma \neq \infty$. We first show how to break the total error into multiple parts, and then analyze each part in the subsections.

Let $\alpha = (X^T X)^+ y$, $\alpha' = \sum_{l=1}^{k}\frac{\lambda_l}{\sigma_l^2}\tilde V_l = \tilde V\Sigma^{-2}\tilde V^T y$, in which $\lambda_l = \tilde V_l^T y$, and $\alpha'' = \sum_{l=1}^{k}\frac{\tilde\lambda_l}{\sigma_l^2}\tilde V_l$. Then the total error of the classification expression is¹
$$E = \Delta(x^T X\alpha) \leq |x^T X(\alpha - \tilde\alpha)| + \Delta(x^T X\tilde\alpha) \leq \|x\|\big(\|\alpha - \alpha'\| + \|\alpha' - \alpha''\| + \|\alpha'' - \tilde\alpha\|\big) + \Delta(x^T X\tilde\alpha).$$
Denote $E_1 = \|x\|\|\alpha' - \alpha\|$, $E_2 = \|x\|\|\alpha'' - \alpha'\|$, $E_3 = \|x\|\|\tilde\alpha - \alpha''\|$, and $E_4 = \Delta(x^T X\tilde\alpha)$. Our target is to show that each of them is no more than $\frac{\epsilon}{4}\|\alpha\|\|x\|$ with probability no less than $1 - \frac{\eta}{4}$, so that
$$E \leq E_1 + E_2 + E_3 + E_4 \leq \epsilon\kappa^2\sqrt{m}\|x\|,$$
with success probability no less than $1 - \eta$.

$E_1$ represents the error introduced by subsampling and eigenvector approximation (i.e., Steps 1-5 in Alg. 3). The fact that it is less than $\frac{\epsilon}{4}\|\alpha\|\|x\|$ with probability no less than $1 - \frac{\eta}{4}$ is shown in Subsection IV-A.

$E_2$ represents the error introduced by the approximation of $\lambda_l$ (i.e., Step 6 in Alg. 3). The fact that it is less than $\frac{\epsilon}{4}\|\alpha\|\|x\|$ with probability no less than $1 - \frac{\eta}{4}$ is shown in Subsection IV-B.

$E_3$ represents the error introduced in the queries of $R$ and $\tilde\alpha$. The fact that it is less than $\frac{\epsilon}{4}\|\alpha\|\|x\|$ with probability no less than $1 - \frac{\eta}{4}$ is guaranteed by Step 7 of Alg. 3.

¹For any expression $f$, $\Delta(f)$ denotes the difference between the exact value of $f$ and the value calculated by the estimation algorithms Alg. 1 and Alg. 3 (these algorithms cannot obtain the exact values because randomness is introduced).
Fig. 3. The whole procedure of proving $\|\tilde V\Sigma^{-2}\tilde V^T A - I_m\| \leq \frac{\epsilon}{4}$. Thm 2 shows the difference among $A$ and the subsampling outcomes $A'$ and $A''$. Thm 3 shows the relation between $A'$ and $V''_l$. Thm 4 shows the relation between $A$ and $\tilde V_l$. Thm 6 shows the final relation between $A$ and $\tilde V$.

$E_4$ represents the error caused by Alg. 1 in estimating $x^T X\tilde\alpha$, as footnote 1 suggests. The fact that it is less than $\frac{\epsilon}{4}\|\alpha\|\|x\|$ with probability no less than $1 - \frac{\eta}{4}$ is guaranteed by Step 8 of Alg. 3.

For accurate classification, we only need a relative error $\frac{E}{x^T X\alpha}$ less than $1$. Thus, by decreasing $\epsilon$, we can achieve this goal within any given probability range.

A. Proof of $E_1 \leq \frac{\epsilon}{4}\|\alpha\|\|x\|$

Notice that
$$E_1 = \|x\|\|\alpha - \alpha'\| = \|x\|\|\alpha - \tilde V\Sigma^{-2}\tilde V^T A\alpha\| \leq \|\alpha\|\|x\|\,\|\tilde V\Sigma^{-2}\tilde V^T A - I_m\|.$$
Here we use five theorems (Theorems 2 to 6) to prove $\|\tilde V\Sigma^{-2}\tilde V^T A - I_m\| \leq \frac{\epsilon}{4}$, in which Theorems 2 and 5 are invoked from [21]; we give proofs of Theorems 3, 4 and 6 in Appendix A. The purpose of these theorems is to show that $\tilde V\Sigma^{-2}\tilde V^T$ is functionally close to the inverse of the matrix $A$, as $\|\tilde V\Sigma^{-2}\tilde V^T A - I_m\| \leq \frac{\epsilon}{4}$ suggests. Theorem 2 states the norm distance between $A$, $A'$ and $A''$. According to the norm distance, and the fact that the $V''_l$ are the eigenvectors of $A''$, Theorem 3 finds the relation between $A'$ and $V''_l$. We define $\tilde V_l = \frac{1}{\sigma_l^2}R^T V''_l$, and Theorem 6 finally gives the relation between $A$ and $\tilde V$. The procedure is shown in Fig. 3.

Theorem 2 [21]. Let $X' \in \mathbb{C}^{n\times r}$ and let $X'' \in \mathbb{C}^{c\times r}$ be the sampling outcome of $X'$. Suppose $X''$ is normalized such that $\mathbb{E}[X''^T X''] = X'^T X'$. Then for all $\epsilon \in [0, \frac{\|X'\|}{\|X'\|_F}]$ we have
$$\mathbb{P}\big[\|X'^T X' - X''^T X''\| \geq \epsilon\|X'\|\|X'\|_F\big] \leq 2re^{-\frac{\epsilon^2 c}{4}}.$$
Hence, for $c \geq \frac{4\log_2(2r/\eta)}{\epsilon^2}$, with probability at least $1 - \eta$ we have
$$\|X'^T X' - X''^T X''\| \leq \epsilon\|X'\|\|X'\|_F.$$

When a submatrix $X''$ is randomly subsampled from $X'$, it is a matrix of multiple random variables. Theorem 2 is the Chebyshev-type inequality for $X''$: it says that the operator norm distance between $X'^T X'$ and $X''^T X''$ is small with high probability.

Theorem 3. Suppose the columns of the matrix $V''$, denoted $V''_l$, $l = 1, \dots, k$, are orthogonal normalized vectors, while
$$A'' = \sum_{l=1}^{k}\sigma_l^2 V''_l V''^T_l.$$
Suppose $\|A' - A''\| \leq \beta$. Then for all $i, j \in \{1, \dots, r\}$,
$$|V''^T_i A'V''_j - \delta_{ij}\sigma_i^2| \leq \beta.$$

Theorem 3 says that if the matrices $A'$ and $A''$ are close in the operator norm sense, then $A''$'s eigenvectors approximately work as eigenvectors for $A'$ too.

Theorem 4. Suppose the columns of the matrix $V''$, denoted $V''_l$, $l = 1, \dots, k$, are orthogonal normalized vectors, while
$$|V''^T_i A'V''_j - \delta_{ij}\sigma_i^2| \leq \beta,\quad \forall i, j \in \{1, \dots, r\}.$$
Suppose $\|XX^T - X'X'^T\| \leq \epsilon'$, $\|X\| \leq 1$, $\frac{1}{\kappa} \leq \sigma_i^2 \leq 1$, and the condition of Thm 3 holds. Let $\tilde V_l = \frac{R^T V''_l}{\sigma_l^2}$. Then
$$|\tilde V_i^T\tilde V_j - \delta_{ij}| \leq \kappa^2\beta^2 + 2\kappa\beta + \kappa^2\epsilon'\|X\|_F^2,$$
and
$$|\tilde V_i^T A\tilde V_j - \delta_{ij}\sigma_i^2| \leq (2\epsilon' + \beta\|X\|_F^2)\|X\|_F^2\kappa^2,$$
in which $A' = X'^T X'$ and $A = X^T X$.

Theorem 4 says that if $A''$'s eigenvectors approximately work as eigenvectors for $A'$ and $\|XX^T - X'X'^T\| \leq \epsilon'$, then the $\tilde V_l$ approximately work as eigenvectors for $A$.

Theorem 5 [21]. If $\mathrm{rank}(B) \leq k$ and $\tilde V$ has $k$ columns that span the row and column space of $B$, then
$$\|B\| \leq \|(\tilde V^T\tilde V)^+\|\,\|\tilde V^T B\tilde V\|.$$

Under the condition that the $\tilde V_l$ approximately work as eigenvectors for $A$, the following Theorem 6 says that $\tilde V\Sigma^{-2}\tilde V^T$ is functionally close to the inverse of the matrix $A$.

Theorem 6. If for all $i, j \in \{1, \dots, k\}$,
$$|\tilde V_i^T\tilde V_j - \delta_{ij}| \leq \frac{1}{4k}, \qquad (2)$$
$$|\tilde V_i^T A\tilde V_j - \delta_{ij}\sigma_i^2| \leq \zeta,$$
and the condition of Thm 4 holds, then
$$\|\tilde V\Sigma^{-2}\tilde V^T A - I_m\| \leq \frac{5}{3}\kappa k\zeta.$$

To conclude, for $\mathbb{P}[\|\alpha' - \alpha\| > \frac{\epsilon}{4}\|\alpha\|] \leq \frac{\eta}{4}$, we need to pick $\epsilon'$ and $\beta$ such that
$$\kappa^2\beta^2 + 2\kappa\beta + \kappa^2\epsilon'\|X\|_F^2 \leq \frac{1}{4k}, \qquad (3)$$
$$(2\epsilon' + \beta\|X\|_F^2)\|X\|_F^2\kappa^2 \leq \zeta, \qquad (4)$$
$$\frac{5}{3}\kappa k\zeta \leq \frac{\epsilon}{4}, \qquad (5)$$
and decide the sampling parameters as
$$r = \Big\lceil\frac{4\log_2(8n/\eta)}{\epsilon'^2}\Big\rceil, \qquad (6)$$
$$c = \Big\lceil\frac{4\kappa^2\log_2(8r/\eta)}{\beta^2}\Big\rceil. \qquad (7)$$
B. Proof of $E_2 \leq \frac{\epsilon}{4}\|\alpha\|\|x\|$

Notice that $E_2 = \|x\|\|\alpha'' - \alpha'\|$. For $y = X^T X\alpha$ and $\alpha = X^+X^{+T}y$, we have $\|y\| \leq \|\alpha\| \leq \kappa^2\|y\|$. For $\|\alpha'' - \alpha'\|$, let $z$ be the vector with $z_l = \frac{\lambda_l - \tilde\lambda_l}{\sigma_l^2}$. We have
$$\|\alpha'' - \alpha'\| = \Big\|\sum_{l=1}^{k}\frac{\lambda_l - \tilde\lambda_l}{\sigma_l^2}\tilde V_l\Big\| = \|\tilde V z\| \leq \sqrt{\|\tilde V^T\tilde V\|}\,\|z\| \leq \sqrt{\frac{4}{3}}\cdot\sqrt{k}\cdot\frac{3\epsilon\sigma_l^2}{16\sqrt{k}\,\sigma_l^2}\|y\| \leq \frac{1}{4}\epsilon\|\alpha\|,$$
in which $\|\tilde V^T\tilde V\| \leq \frac{4}{3}$, as shown in the proof of Theorem 6.

V. COMPLEXITY

In this section we analyze the time complexity of each step of the main algorithm. We divide the steps into four parts and analyze each part in its own subsection: Steps 1-3 are considered in Subsection V-A, Step 4 in Subsection V-B, Steps 5-6 in Subsection V-C, and Steps 7-8 in Subsection V-D. Note that in the main algorithm the variables $R$, $\tilde V_l$, $\tilde\alpha$ are queried instead of calculated; we include the corresponding query complexity in the analysis of the steps where these variables are queried.

A. Sampling of columns and rows

In Step 1, the values of $r$ and $c$ are determined according to Inequalities (3)-(7). The time for solving these inequalities is a constant. In Step 2 we sample $r$ indices, and each sampling takes no more than $\log_2 m$ time using the arborescent vector data structure described in Section II-C. In Step 3 we sample $c$ indices, and each sampling takes no more than $r\log_2 n$ time using the arborescent matrix data structure described in Section II-C. Thus the overall time complexity of Steps 1-3 is $O(r\log_2 m + cr\log_2 n)$.

B. The spectral decomposition

Step 4 is the spectral decomposition. For the $r\times r$ symmetric matrix $A''$, the fastest classical spectral decomposition is through the classical symmetric QR method, whose complexity is $O(r^3)$.

C. Calculation of $\tilde\lambda_l$

In Steps 5-6 we calculate $\tilde\lambda_l$. By Alg. 1, we have
$$\lambda_l = \frac{1}{\sigma_l^2}V''^T_l Ry = \frac{1}{\sigma_l^2}\mathrm{Tr}[V''^T_l X'^T Xy] = \frac{1}{\sigma_l^2}\mathrm{Tr}[Xy V''^T_l X'^T].$$
Observe that $\|yV''^T_l X'^T\|_F = \|y\|\|V''^T_l X'^T\| \leq \|y\|$, and that we can query the $(i, j)$ matrix element of $yV''^T_l X'^T$ at cost $O(r)$. According to Lemma 1, the complexity of Step 6 is
$$T_6 = O\Big(\frac{\|X\|_F^2 k^2}{\epsilon^2}\log_2\big(\tfrac{8k}{\eta}\big)\big(\log_2(mn) + k\big)\Big).$$

D. Calculation of $x^T X\tilde\alpha$

In Steps 7-8 we calculate $x^T X\tilde\alpha$. This is the last step of the algorithm, and also the most important step for saving time complexity. In Step 8 of Alg. 3, we need to calculate $x^T X\tilde\alpha$, which equals $\mathrm{Tr}[X\tilde\alpha x^T]$, to precision $\frac{\epsilon}{4}\|\alpha\|\|x\|$ with success probability $1 - \frac{\eta}{4}$ using Alg. 1. Let the $A$ and $B$ in Alg. 1 be $X$ and $\tilde\alpha x^T$, respectively. To calculate $\mathrm{Tr}[X\tilde\alpha x^T]$, we first establish query access to $\tilde\alpha x^T$ (we already have sampling access to $X$), and then use Alg. 1 as an oracle. We first analyze the time complexity of querying $R$ and $\tilde\alpha$, and then give the time complexity of calculating $x^T X\tilde\alpha$.

1) Query of $R$: First we find query access to $R = X'^T X$. For any $s = 1, \dots, r$ and $j = 1, \dots, m$, $R_{sj} = e_s^T X'^T Xe_j = \mathrm{Tr}[Xe_j e_s^T X'^T]$; we calculate this trace by Alg. 1 to precision $\epsilon_1$ with success probability $1 - \eta_1$. The time complexity of one query is
$$Q(R) = O\Big(\log_2\big(\tfrac{2}{\eta_1}\big)\frac{\|X\|_F^4}{\epsilon_1^2 r}\log_2(mn)\Big).$$

2) Query of $\tilde\alpha$: For any $j = 1, \dots, m$, we have $\tilde\alpha_j = \sum_{s=1}^{r}R_{sj}u_s$. One query of $\tilde\alpha$ costs time $rkQ(R)$, with error $\epsilon_1\sum_{s=1}^{r}|u_s|$ and success probability more than $1 - r\eta_1$.

3) Calculation of $x^T X\tilde\alpha$: We use Alg. 1 to calculate $x^T X\tilde\alpha = \mathrm{Tr}[X\tilde\alpha x^T]$ to precision $\frac{\epsilon}{2}\|\alpha\|\|x\|$ with success probability $1 - \frac{\eta}{8}$. Notice that the query of $\tilde\alpha$ comes with its own error and success probability. We only need
$$\epsilon_1\sum_{s=1}^{r}|u_s|\Big\lceil\frac{36}{\epsilon^2}\Big\rceil\Big\lceil 6\log_2\big(\frac{16}{\eta}\big)\Big\rceil \leq \frac{\epsilon}{2}\|\alpha\|\|x\|,\qquad r\eta_1\Big\lceil\frac{36}{\epsilon^2}\Big\rceil\Big\lceil 6\log_2\big(\frac{16}{\eta}\big)\Big\rceil \leq \frac{\eta}{8}$$
to fulfill the overall computing task. Notice that $\sum_{s=1}^{r}|u_s| \leq \sqrt{r}\|u\|$ and $\alpha = R^T u$. We set
$$\epsilon_1 = \frac{\epsilon\|x\|}{2\sqrt{r}\lceil\frac{36}{\epsilon^2}\rceil\lceil 6\log_2(\frac{16}{\eta})\rceil},\qquad \eta_1 = \frac{\eta}{8r\lceil\frac{36}{\epsilon^2}\rceil\lceil 6\log_2(\frac{16}{\eta})\rceil}.$$
The overall time complexity for computing $x^T X\tilde\alpha$ is then
$$T_7 = O\Big(\frac{1}{\epsilon^2}\log_2\big(\tfrac{1}{\eta}\big)\big(\log_2(mn) + rkQ(R)\big)\Big) = O\Big(\frac{1}{\epsilon^2}\log_2\big(\tfrac{1}{\eta}\big)\Big(\log_2(mn) + rk\log_2\big(\tfrac{2}{\eta_1}\big)\frac{\|X\|_F^4}{\epsilon_1^2 r}\log_2(mn)\Big)\Big).$$
VI. EXPERIMENTS

In this section, we demonstrate the proposed quantum-inspired SVM algorithm in practice by testing the algorithm on artificial datasets. The feasibility and efficiency of some other quantum-inspired algorithms (for recommendation systems and for linear systems of equations) on large datasets have been benchmarked, and the results indicate that quantum-inspired algorithms can perform well in practice under their specific conditions: low rank, low condition number, and very large dimension of the input matrix [24]. Here we show the feasibility of the quantum-inspired SVM. First, we test the quantum-inspired SVM algorithm on low-rank and low-rank-approximated datasets and compare it to an existing classical SVM implementation. Second, we discuss the characteristics of the algorithm by analyzing its dependence on the parameters and datasets. In our experiment, we use the arborescent data structure instead of arrays for storage and sampling [24], making the experiment closer to a real scenario compared to the previous work [24]. All algorithms are implemented in Julia [32]. The source code and data are available at https://github.com/helloinrm/qisvm.

A. Experiment I: Comparison with LIBSVM

In this experiment, we test the quantum-inspired SVM algorithm on large datasets and compare its performance to the well-known classical SVM implementation LIBSVM [33]. We generate datasets of size 10000×11000, which represent 11000 vectors (6000 for training and 5000 for testing) of length 10000. All the data vectors in the training and testing sets are chosen uniformly at random from the generated data matrix, so that they are statistically independent and identically distributed. We test the quantum-inspired SVM and LIBSVM on two kinds of datasets: low-rank datasets (rank = 1) and high-rank but low-rank-approximated datasets (rank = 10000). Each scenario is repeated 5 times. The construction method for the data matrices is described in Appendix B, and the parameters for the quantum-inspired SVM are chosen as $\epsilon = 5$, $\eta = 0.1$ and $b = 1$ (we explain the parameters and their setting in Experiment II).

The average classification rates are shown in Table II, from which we observe the advantage of the quantum-inspired SVM on such low-rank-approximated datasets (on average about 5% higher). We also find that both the quantum-inspired SVM and LIBSVM perform better on low-rank datasets than on low-rank-approximated datasets.

B. Experiment II: Discussion on algorithm parameters

As analyzed in Section IV and Section V, there are two main parameters for the quantum-inspired algorithm: the relative error $\epsilon$ and the success probability $1 - \eta$. Based on them, we set the subsampling sizes $r, c$ and run the algorithm. However, for datasets that are not large enough, setting $r, c$ by Equations (6) and (7) is rather time-costly. For instance, when the condition number of the data matrix is 1.0, taking $\eta = 0.1$ and $\epsilon = 5.0$, theoretically the $r, c$ for a 10000×10000 dataset should be set as 1656 and 259973 to ensure that the algorithm calculates the classification expression with relative error less than $\epsilon$ and success probability higher than $1 - \eta$. For practical applications on datasets that are not too large, we set $r, c$ as $r = b\lceil 4\log_2(2n/\eta)/\epsilon^2\rceil$ and $c = b\lceil 4\log_2(2r/\eta)/\epsilon^2\rceil$, in which $b$ is the subsampling size control parameter. When $b = 1$, our practical choice of $r, c$ ensures that the relative error of the subsampling (Step 2 and Step 3 in Alg. 3) does not exceed $\epsilon$ (guaranteed by Theorem 2).

In Experiment I we took the practical setting of $r, c$, where we already found an advantage compared to LIBSVM; our choice there was $\epsilon = 5$, $\eta = 0.1$ and $b = 1$. Here we test the algorithm on other choices of $\epsilon, \eta$ and $b$ and check its classification rate. We test each parameter choice 50 times. The variation intervals are $\epsilon$ from 1 to 10, $\eta$ from 0.1 to 1, and $b$ from 1 to 10. The results are shown in Fig. 4. We find that the average classification rates in the different experiments are close. We notice that when using the practical $r, c$, which are much smaller than the theoretical ones, the algorithm maintains its performance (classification rate around 0.90). This phenomenon indicates a gap between our theoretical analysis and the actual performance, as [24] reports: "the performance of these algorithms is better than the theoretical complexity bounds would suggest".

VII. DISCUSSION

In this section we present some discussions on the proposed algorithm. We also discuss the potential applications of our techniques to other types of SVMs, such as non-linear SVMs and the general least squares SVM, though more work on the complexity and errors is required before these extensions can be realized.

A. The cause of exponential speedup

An interesting fact is that we can achieve exponential speedup without using any quantum resources, such as superposition or entanglement. This is a somewhat confusing but reasonable result that can be understood as follows. Firstly, the advantage of quantum algorithms, such as the HHL algorithm, is that high-dimensional vectors can be represented using only a few qubits. By replacing the qRAM with the arborescent data structure for sampling, we can likewise represent a low-rank matrix by its normalized submatrix in a short time. By using the sampling technique, large-size calculations are avoided, and we only need to deal with a problem whose size is logarithmic in the original data. Secondly, the relative error of the matrix subsampling algorithm decays exponentially with the subsample size, which ensures the effectiveness of such logarithmic-complexity algorithms (e.g., Theorem 2 bounds the error of the matrix row subsampling).
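The arborescent structure mentioned here and in Section II-C can be realized as a binary tree over the squared entries, in which every internal node stores the sum of its children; a minimal array-based sketch with $O(\log n)$ sampling and updates is given below. This is one possible realization for illustration, not necessarily the layout used in the authors' Julia implementation.

```python
import numpy as np

class SampleTree:
    """Binary tree over squared magnitudes: O(log n) norm-squared sampling and updates.

    One possible realization of the arborescent structure of Section II-C; leaf i holds
    |v_i|^2 and every internal node holds the sum of its two children.
    """

    def __init__(self, v):
        self.n = len(v)
        self.size = 1
        while self.size < self.n:
            self.size *= 2
        self.tree = np.zeros(2 * self.size)
        self.tree[self.size:self.size + self.n] = np.asarray(v, dtype=float) ** 2
        for node in range(self.size - 1, 0, -1):       # fill internal sums bottom-up
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]

    def update(self, i, value):
        """Set v_i = value and repair the path to the root in O(log n)."""
        node = self.size + i
        self.tree[node] = value ** 2
        node //= 2
        while node >= 1:
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]
            node //= 2

    def sample(self, rng):
        """Draw index i with probability |v_i|^2 / ||v||^2 by descending the tree."""
        node, x = 1, rng.uniform(0.0, self.tree[1])
        while node < self.size:
            left = 2 * node
            if x < self.tree[left]:
                node = left
            else:
                x -= self.tree[left]
                node = left + 1
        return node - self.size

rng = np.random.default_rng(4)
v = rng.normal(size=1000)
tree = SampleTree(v)
counts = np.bincount([tree.sample(rng) for _ in range(20000)], minlength=len(v))
# empirical frequencies should track |v_i|^2 / ||v||^2
```

A matrix is then stored, as described in Section II-C, as a list of such row (or column) trees together with one tree over the squared row (or column) norms.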
TABLE II
THE AVERAGE VALUES AND STANDARD DEVIATIONS OF CLASSIFICATION RATES (%) OF QISVM AND LIBSVM IN FIVE EXPERIMENTS.

                            Testing Set                     Training Set
                         qiSVM         LIBSVM           qiSVM         LIBSVM
Low-rank                 91.45±3.17    86.46±2.00       91.35±3.64    86.45±2.15
Low-rank approximated    89.82±4.38    84.90±3.20       89.92±4.23    84.69±2.87

Fig. 4. The average classification rate of the quantum-inspired SVM algorithm with different parameters on the dataset with rank 1. Each point represents an average classification rate over 50 trials, and the error bar shows the standard deviation of the 50 trials. (a) Algorithm performance when the parameter $\epsilon$ is taken from 1 to 10. (b) Algorithm performance when the parameter $\eta$ is taken from 0.1 to 1. (c) Algorithm performance when the subsampling size parameter $b$ is taken from 1 to 10.
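For reference, the practical subsampling sizes swept in Fig. 4 follow the closed-form choice stated in Experiment II; a few lines suffice to compute them (the function name is ours, and the printed values are merely an example of the formula, not the exact sizes reported in the experiments).

```python
import math

def practical_sizes(n, epsilon, eta, b=1):
    """Practical choice from Experiment II: r = b*ceil(4*log2(2n/eta)/eps^2), then c from r."""
    r = b * math.ceil(4 * math.log2(2 * n / eta) / epsilon ** 2)
    c = b * math.ceil(4 * math.log2(2 * r / eta) / epsilon ** 2)
    return r, c

print(practical_sizes(n=10000, epsilon=5, eta=0.1, b=1))  # e.g. for 10000-dimensional data
```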

B. Improving sampling for dot products

Recall that with Alg. 1 we can estimate the dot product of two vectors. However, it does not work well under all conditions, for example when $\|x\|$ and $\|y\|$ are dominated by a single element. To restore randomness, [34] implies that we can apply a spherically random rotation $R$ to all $x$, which does not change the kernel matrix $K$ but makes all the elements of the dataset matrix follow the same distribution.

C. LS-SVM with non-linear kernels

In Section II we considered the LS-SVM with the linear kernel $K = X^T X$. When the datasets are not linearly separable, non-linear kernels are usually needed. To deal with non-linear kernels using Alg. 3, we only have to show how to establish sampling access to the non-linear kernel matrix $K$ from the sampling access of $X$.

We first show how sampling access to the polynomial kernel $K_p(x_i, x_j) = (x_j^T x_i)^p$ can be established. The corresponding kernel matrix is $K_p = ((x_j^T x_i)^p)_{i=1,\dots,m,\ j=1,\dots,m}$. We take
$$Z = (x_1^{\otimes p}, x_2^{\otimes p}, \dots, x_m^{\otimes p}),$$
in which the $j$-th column $Z_j$ is the $p$-th tensor power of $x_j$. Notice that $Z^T Z = K_p$. Once we have sampling access to $Z$, we can sample $K_p$ as Step 2 and Step 3 of Alg. 3 do. The sampling access to $Z$ can be established by Alg. 4 (the effectiveness of Alg. 4 is shown in Appendix C).

Algorithm 4 Polynomial kernel matrices sampling.
Input: The sampling access of $X$ in logarithmic time of $m$ and $n$.
Goal: Sample a column index $j$ from the column norm vector $(\|x_1\|^p, \|x_2\|^p, \dots, \|x_m\|^p)$ of $Z$, and then sample a row index $i$ from the column $x_j^{\otimes p}$ of $Z$.
1: Sample on the column norm vector $(\|x_1\|, \|x_2\|, \dots, \|x_m\|)$ of $X$ to get an index $j$.
2: Query $\|x_j\|$ from $(\|x_1\|, \|x_2\|, \dots, \|x_m\|)$. Calculate $\|x_j\|^p$.
3: Sample a real number $a$ uniformly distributed in $[0, 1]$. If $a \geq \|x_j\|^p$, go to Step 1. If not, output the index $j$ as the column index and continue.
4: Repeat sampling on $x_j$ for $p$ times. Denote the outcome indices as $i_1, i_2, \dots, i_p$.
Output: Column index $j$ and row index $\sum_{\tau=1}^{p}(i_\tau - 1)n^{p-\tau} + 1$.
For general non-linear kernels, we note that they can always be approximated by a linear combination of polynomial kernels (and thus can be sampled based on the sampling access of polynomial kernels), as long as the corresponding non-linear feature function is continuous. For instance, the popular radial basis function (RBF) kernel
$$K_{\mathrm{RBF}}(x_i, x_j) = \exp\Big(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\Big)$$
can be approximated by
$$\tilde K_{\mathrm{RBF}}(x_i, x_j) = \sum_{p=0}^{N}\frac{1}{p!}\Big(-\frac{x_i^T x_i - 2x_j^T x_i + x_j^T x_j}{2\sigma^2}\Big)^{p} = \sum_{p=0}^{N}\frac{1}{p!}\Big(-\frac{1}{2\sigma^2}\Big)^{p}\sum_{\substack{q,l\geq 0\\ q+l\leq p}}\binom{p}{q}\binom{p-q}{l}\,K_q(x_i, x_i)\,(-2)^{l}K_l(x_i, x_j)\,K_{p-q-l}(x_j, x_j).$$

D. General LS-SVM

In the former sections, we began with an LS-SVM with $b = 0$ and linear kernels (Section II), and we showed how the method can be extended to non-linear kernels in Section VII-C. Finally, we deal with the remaining assumption $b = 0$, and show how a general LS-SVM can be tackled using techniques like those in Alg. 3.

A general LS-SVM equation [26] is
$$\begin{pmatrix}0 & \mathbf{1}^T\\ \mathbf{1} & K + \gamma^{-1}I\end{pmatrix}\begin{pmatrix}b\\ \alpha\end{pmatrix} = \begin{pmatrix}0\\ y\end{pmatrix}, \qquad (8)$$
in which $K$ is the kernel matrix. Equation (8) can be solved as follows:

(i) Firstly, by the methods of Section VII-C, we establish sampling access to the kernel matrix $K$. Suppose a sampling outcome of $K$ is $K''$.

(ii) Secondly, take
$$A = \begin{pmatrix}0 & \mathbf{1}^T\\ \mathbf{1} & K + \gamma^{-1}I\end{pmatrix}\quad\text{and}\quad A'' = \begin{pmatrix}0 & \mathbf{1}^T\\ \mathbf{1} & K'' + \gamma^{-1}I\end{pmatrix}.$$
We establish the eigen-relations between $A$ and $A''$ by theorems similar to Theorem 2 and Theorem 4.

(iii) Once $A \in \mathbb{R}^{m\times m}$ is subsampled to $A'' \in \mathbb{R}^{r\times r}$, we can continue with Steps 3-7 of Alg. 3.

(iv) Once Equation (8) is solved in Step 7 of Alg. 3, we can establish query access to $\alpha$. According to Equation (8), $b = y_j - x_j^T X\alpha - \gamma^{-1}\alpha_j$ for any $j$ such that $\alpha_j \neq 0$. We can then evaluate the classification expression $y_j + (x - x_j)^T X\alpha - \gamma^{-1}\alpha_j$ and make the classification using Alg. 1. There are two ways to find such a $j$: one is executing rejection sampling on $\alpha$ using Alg. 2; the other is checking whether $\alpha_j = 0$ after each sampling of $X$ in Step 3 of Alg. 1.

VIII. CONCLUSION

We have proposed a quantum-inspired SVM algorithm that achieves exponential speedup over the previous classical algorithms, and demonstrated its feasibility by experiments. Our algorithm works well on low-rank datasets or datasets that can be well approximated by low-rank matrices, which is similar to the quantum SVM algorithm [31], "when a low-rank approximation is appropriate". Further investigation of the application of such an algorithm is required to make the quantum-inspired SVM operable in solving problems like face recognition [25] and signal processing [35]. We hope that the techniques developed in our work can lead to the emergence of more efficient classical algorithms, such as applying our method to support vector machines with more complex kernels [26], [36] or to other machine learning algorithms. The technique of indirect sampling can expand the application area of fast sampling techniques, and it will contribute to the further competition between classical algorithms and quantum ones.

Some improvements on our work could be made in the future, such as reducing the conditions on the data matrix, further reducing the complexity, and tightening the error bounds in the theoretical analysis, which can be achieved through a deeper investigation of the algorithm and the error propagation process. The quantum-inspired non-linear SVMs and the general least squares SVM discussed in Section VII also require further theoretical analysis and empirical evaluation.

We note that our work, as well as the previous quantum-inspired algorithms, is not intended to demonstrate that quantum computing is uncompetitive. We want to find out where the boundaries of classical and quantum computing are, and we expect new quantum algorithms to be developed that beat our algorithm.

APPENDIX A
PROOF OF THEOREMS IN IV

A. Proof of Theorem 3

Proof: We break the expression $|V''^T_i A'V''_j - \delta_{ij}\sigma_i^2|$ into two parts,
$$|V''^T_i A'V''_j - \delta_{ij}\sigma_i^2| \leq |V''^T_i(A' - A'')V''_j| + |V''^T_i A''V''_j - \delta_{ij}\sigma_i^2|.$$
For the first item, because of the condition $\|A' - A''\| \leq \beta$ and the fact that $V''_j$ is normalized,
$$|V''^T_i(A' - A'')V''_j| \leq \|V''^T_i\|\cdot\|(A' - A'')V''_j\| \leq \beta.$$
For the second item, because of the condition $A'' = \sum_{l=1}^{k}\sigma_l^2 V''_l V''^T_l$,
$$|V''^T_i A''V''_j - \delta_{ij}\sigma_i^2| = 0.$$
In all,
$$|V''^T_i A'V''_j - \delta_{ij}\sigma_i^2| \leq \beta. \qquad\blacksquare$$

B. Proof of Theorem 4

Proof: Denote $|\tilde V_i^T\tilde V_j - \delta_{ij}|$ as $\Delta_1$ and $|\tilde V_i^T A\tilde V_j - \delta_{ij}\sigma_i^2|$ as $\Delta_2$. By definition, $\tilde V_l = \frac{1}{\sigma_l^2}R^T V''_l$. Thus
$$\Delta_1 = \Big|\frac{V''^T_i RR^T V''_j - \delta_{ij}\sigma_i^4}{\sigma_i^2\sigma_j^2}\Big|.$$
We break it into two parts:
$$\Delta_1 \leq \frac{1}{\sigma_i^2\sigma_j^2}\Big(|V''^T_i A'A'V''_j - \delta_{ij}\sigma_i^4| + |V''^T_i(RR^T - A'A')V''_j|\Big).$$
For the first item, we have
$$\begin{aligned}
|V''^T_i A'A'V''_j - \delta_{ij}\sigma_i^4| &= |V''^T_i(A' - A'')^2V''_j + V''^T_i(A' - A'')A''V''_j + V''^T_i A''(A' - A'')V''_j + V''^T_i A''A''V''_j - \delta_{ij}\sigma_i^4|\\
&\leq |V''^T_i(A' - A'')^2V''_j| + |V''^T_i(A' - A'')A''V''_j| + |V''^T_i A''(A' - A'')V''_j| + |V''^T_i A''A''V''_j - \delta_{ij}\sigma_i^4|\\
&\leq \beta^2 + \sigma_j^2\beta + \sigma_i^2\beta.
\end{aligned}$$
The last step uses the same technique as the proof of Thm 3. For the second item, we have
$$|V''^T_i(RR^T - A'A')V''_j| \leq \|RR^T - A'A'\| = \|X'^T XX^T X' - X'^T X'X'^T X'\| \leq \|X'\|^2\|XX^T - X'X'^T\|.$$
Because $\|X'\| \leq \|X'\|_F = \|X\|_F$, we have
$$|V''^T_i(RR^T - A'A')V''_j| \leq \epsilon'\|X\|_F^2.$$
In all, since $\sigma_i^2 \geq \frac{1}{\kappa}$ for all $i \in \{1, \dots, k\}$,
$$\Delta_1 \leq \frac{1}{\sigma_i^2\sigma_j^2}\big(\beta^2 + \sigma_j^2\beta + \sigma_i^2\beta + \epsilon'\|X\|_F^2\big) \leq \kappa^2\beta^2 + 2\kappa\beta + \kappa^2\epsilon'\|X\|_F^2.$$

By definition, $\tilde V_l = \frac{1}{\sigma_l^2}R^T V''_l$. Thus
$$\Delta_2 = \Big|\frac{V''^T_i RAR^T V''_j - \delta_{ij}\sigma_i^6}{\sigma_i^2\sigma_j^2}\Big|.$$
We break it into two parts:
$$\Delta_2 \leq \frac{1}{\sigma_i^2\sigma_j^2}\Big(|V''^T_i(RAR^T - A'A'A')V''_j| + |V''^T_i A'A'A'V''_j - \delta_{ij}\sigma_i^6|\Big).$$
For the first item, we have
$$\begin{aligned}
|V''^T_i(RAR^T - A'A'A')V''_j| &\leq \|RAR^T - A'A'A'\| \leq \|X'\|^2\|XX^T XX^T - X'X'^T X'X'^T\|\\
&\leq \|X\|_F^2\big(\|XX^T(XX^T - X'X'^T)\| + \|(XX^T - X'X'^T)X'X'^T\|\big)\\
&\leq 2\|X\|_F^2\|X\|^2\|XX^T - X'X'^T\| \leq 2\|X\|_F^2\epsilon'.
\end{aligned}$$
For the second item, we have
$$\begin{aligned}
|V''^T_i A'A'A'V''_j - \delta_{ij}\sigma_i^6| &= |V''^T_i(A' - A'')A'A'V''_j + V''^T_i A''(A' - A'')A'V''_j + V''^T_i A''A''(A' - A'')V''_j + V''^T_i A''A''A''V''_j - \delta_{ij}\sigma_i^6|\\
&\leq |V''^T_i(A' - A'')A'A'V''_j| + |V''^T_i A''(A' - A'')A'V''_j| + |V''^T_i A''A''(A' - A'')V''_j| + |V''^T_i A''A''A''V''_j - \delta_{ij}\sigma_i^6|\\
&\leq \|(A' - A'')A'A'\| + \|A''(A' - A'')A'\| + \|A''A''(A' - A'')\|\\
&\leq \|X'\|^4\|A' - A''\| + \|X''\|^2\|X'\|^2\|A' - A''\| + \|X''\|^4\|A' - A''\| \leq \beta\|X\|_F^4.
\end{aligned}$$
In all,
$$\Delta_2 \leq \frac{1}{\sigma_i^2\sigma_j^2}\big(2\|X\|_F^2\epsilon' + \beta\|X\|_F^4\big) \leq (2\epsilon' + \beta\|X\|_F^2)\|X\|_F^2\kappa^2. \qquad\blacksquare$$

C. Proof of Theorem 6

Proof: Since the $\tilde V_i^T\tilde V_j - \delta_{ij}$ are the elements of $\tilde V^T\tilde V - I$ and $|\tilde V_i^T\tilde V_j - \delta_{ij}| \leq \frac{1}{4k}$,
$$\|\tilde V^T\tilde V - I\| \leq k\max_{i,j}|\tilde V_i^T\tilde V_j - \delta_{ij}| \leq \frac{1}{4}.$$
Thus $\tilde V^T\tilde V$ is invertible and
$$\|(\tilde V^T\tilde V)^{-1}\| \leq \frac{1}{1 - \|\tilde V^T\tilde V - I\|} \leq \frac{4}{3}.$$
Take $B = \tilde V\Sigma^{-2}\tilde V^T A - I_m$. We have
$$|\tilde V_i^T B\tilde V_j| = \Big|\sum_{l=1}^{k}\frac{\tilde V_i^T\tilde V_l\cdot\tilde V_l^T A\tilde V_j}{\sigma_l^2} - \tilde V_i^T\tilde V_j\Big|.$$
We break it into two parts:
$$|\tilde V_i^T B\tilde V_j| \leq \Big|\sum_{l=1}^{k}\frac{\tilde V_i^T\tilde V_l}{\sigma_l^2}(\tilde V_l^T A\tilde V_j - \delta_{lj}\sigma_l^2)\Big| + \Big|\sum_{l=1}^{k}\tilde V_i^T\tilde V_l\delta_{lj} - \tilde V_i^T\tilde V_j\Big|.$$
The second item is zero because
$$\Big|\sum_{l=1}^{k}\tilde V_i^T\tilde V_l\delta_{lj} - \tilde V_i^T\tilde V_j\Big| = |\tilde V_i^T\tilde V_j - \tilde V_i^T\tilde V_j|.$$
For the first item,
$$\Big|\sum_{l=1}^{k}\frac{\tilde V_i^T\tilde V_l}{\sigma_l^2}(\tilde V_l^T A\tilde V_j - \delta_{lj}\sigma_l^2)\Big| \leq \zeta\kappa\sum_{l=1}^{k}|\tilde V_i^T\tilde V_l| \leq \zeta\kappa\Big(\sum_{l\neq i}|\tilde V_i^T\tilde V_l| + |\tilde V_i^T\tilde V_i|\Big) \leq \zeta\kappa\Big((k-1)\frac{1}{4k} + \frac{1}{4k} + 1\Big) \leq \frac{5}{4}\zeta\kappa.$$
Thus $|\tilde V_i^T B\tilde V_j| \leq \frac{5}{4}\zeta\kappa$ and $\|\tilde V^T B\tilde V\| \leq \frac{5}{4}\zeta\kappa k$. By Theorem 5,
$$\|\tilde V\Sigma^{-2}\tilde V^T A - I_m\| = \|B\| \leq \|(\tilde V^T\tilde V)^{-1}\|\,\|\tilde V^T B\tilde V\| \leq \frac{5}{3}\kappa k\zeta. \qquad\blacksquare$$
A PPENDIX B R EFERENCES
T HE CONSTRUCTION METHOD OF DATASETS
[1] H.-L. Huang, D. Wu, D. Fan, and X. Zhu, “Superconducting quantum
In our experiment, we constructed artificial datasets which computing: a review,” Science China Information Sciences, vol. 63, no.
180501, 2020.
are low-rank or can be low-rank approximated. Here we put [2] P. W. Shor, “Algorithms for quantum computation: Discrete logarithms
up our construction mehtod: and factoring,” in Proc. 35th Annual Symposium Foundations Computer
1. Firstly, we multiply a random matrix 𝐴 of size 𝑛 × 𝑘 Sci. Santa Fe, NM, USA: IEEE, Nov. 1994, pp. 124–134. [Online].
Available: https://ieeexplore.ieee.org/document/365700
with another random matrix 𝐵 of size 𝑘 × 𝑚. The elements in [3] C.-Y. Lu, D. E. Browne, T. Yang, and J.-W. Pan, “Demonstration
both of them are evenly distributed in [−0.5, 0.5]. Denote the of a compiled version of shor’s quantum factoring algorithm using
multiplication outcome as 𝑋. Then the rank of 𝑋 is at most photonic qubits,” Physical Review Letters, vol. 99, no. 25, p. 250504,
2007. [Online]. Available: https://journals.aps.org/prl/abstract/10.1103/
𝑘. PhysRevLett.99.250504
2. We add turbulence to the matrix 𝑋 by adding a random [4] H.-L. Huang, Q. Zhao, X. Ma, C. Liu, Z.-E. Su, X.-L. Wang,
number evenly distributed in [−0.1𝑥, 0.1𝑥] to all the elements L. Li, N.-L. Liu, B. C. Sanders, C.-Y. Lu et al., “Experimental
blind quantum computing for a classical client,” Physical review
in 𝑋, in which 𝑥 is the average of all the absolute values of letters, vol. 119, no. 5, p. 050503, 2017. [Online]. Available:
𝑋. After adding turbulence, 𝑋 is no more low-rank but still https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.119.050503
low-rank approximated. [5] L. K. Grover, “A fast quantum mechanical algorithm for database
search,” in Proc. 21th Annual ACM Symposium Theory Computing.
3. We normalize 𝑋 such that 𝑋 has operator norm 1. Philadelphia, Pennsylvania, USA: ACM, May 1996, pp. 212–219.
4. We divide the column vectors of 𝑋 into two classes by a [Online]. Available: http://doi.acm.org/10.1145/237814.237866
[6] T. Li, W.-S. Bao, H.-L. Huang, F.-G. Li, X.-Q. Fu, S. Zhang,
random hyperplane 𝑤𝑇 𝑥 = 0 that passes the origin (By random C. Guo, Y.-T. Du, X. Wang, and J. Lin, “Complementary-
hyperplane we mean the elements in 𝑤 are uniformly sampled multiphase quantum search for all numbers of target items,” Physical
from [0, 1] at random.), while making sure that both classes Review A, vol. 98, no. 6, p. 062308, 2018. [Online]. Available:
https://journals.aps.org/pra/abstract/10.1103/PhysRevA.98.062308
are not empty.
[7] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe,
5. Since now we have 𝑚 linear-separable labeled vectors, and S. Lloyd, “Quantum machine learning,” Nature, vol. 549,
each with length 𝑛. We choose uniformly at random 𝑚 1 of no. 7671, p. 195–202, Sept. 2017. [Online]. Available: https:
//doi.org/10.1038/nature23474
them for training, and let the other 𝑚 2 = 𝑚 − 𝑚 1 for testing,
[8] H.-L. Huang, X.-L. Wang, P. P. Rohde, Y.-H. Luo, Y.-W. Zhao,
while making sure that the training set includes vectors of both C. Liu, L. Li, N.-L. Liu, C.-Y. Lu, and J.-W. Pan, “Demonstration
classes. of topological data analysis on a quantum processor,” Optica,
vol. 5, no. 2, pp. 193–198, 2018. [Online]. Available: https:
//www.osapublishing.org/optica/abstract.cfm?uri=optica-5-2-193
APPENDIX C
THE EFFECTIVENESS OF ALG. 4

The goal of Alg. 4 is to sample a column index and a row index from $Z$. We show that it achieves this goal.

Steps 1–3 are for sampling out the column index. They are essentially Alg. 2 with $A = \mathrm{Diag}(\|x_1\|^{p-1}, \ldots, \|x_m\|^{p-1})$ and $b = (\|x_1\|, \ldots, \|x_m\|)$, which samples from the column norm vector $(\|x_1\|^p, \ldots, \|x_m\|^p)$ of $Z$ to obtain the column index $j$. We note that in practical applications, Steps 1–3 can be adjusted for speedup, for example by the frugal rejection sampling suggested in [37].

Step 4 is for sampling out the row index. Suppose $l = \sum_{\tau=1}^{p} (i_\tau - 1)\, n^{p-\tau} + 1$. According to the definition of the tensor power, the $l$-th element of $x_j^{\otimes p}$ is
$$(x_j^{\otimes p})_l = \prod_{\tau=1}^{p} x_{i_\tau j}.$$
When Step 4 executes $p$ samplings on $x_j$, the probability of obtaining the outcome $i_1, i_2, \ldots, i_p$ is $\left|\prod_{\tau=1}^{p} x_{i_\tau j}\right|^2$, which is exactly the probability of sampling out $(x_j^{\otimes p})_l$ from $x_j^{\otimes p}$. Thus we output the index $l = \sum_{\tau=1}^{p} (i_\tau - 1)\, n^{p-\tau} + 1$.
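To make the two sampling stages concrete, here is a minimal Python/NumPy sketch that draws a column index $j$ and a row index $l$ as described above. The function name, the full in-memory access to $X$, and the explicit normalizations are illustrative assumptions rather than the paper's Alg. 2 and Alg. 4 verbatim, and "sampling from a vector" is taken to mean length-square ($\ell^2$) sampling, consistent with the squared magnitudes in the probability above.

```python
import numpy as np

def sample_index_pair(X, p, rng=None):
    """Illustrative sketch of the two-stage sampling discussed above.

    X is the n x m data matrix with columns x_1, ..., x_m, and Z is the
    (never materialized) matrix whose j-th column is the p-th tensor
    power of x_j. Full access to X is a simplifying assumption; the
    paper's algorithms only need sample-and-query access to the columns.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, m = X.shape

    # Steps 1-3: column index. The column norms of Z are ||x_j||^p, so
    # length-square sampling over them draws j with probability
    # ||x_j||^(2p) / sum_k ||x_k||^(2p).
    col_norms = np.linalg.norm(X, axis=0)
    col_probs = col_norms ** (2 * p)
    col_probs = col_probs / col_probs.sum()
    j = rng.choice(m, p=col_probs)

    # Step 4: draw p i.i.d. indices i_1, ..., i_p from x_j, each with
    # probability |x_{i j}|^2 / ||x_j||^2 (0-based values of i_tau - 1).
    row_probs = X[:, j] ** 2 / col_norms[j] ** 2
    i = rng.choice(n, size=p, p=row_probs)

    # Combine them into l = sum_tau (i_tau - 1) n^(p - tau) + 1 via a
    # mixed-radix (base-n) expansion. The joint probability is
    # prod_tau |x_{i_tau j}|^2 / ||x_j||^(2p), i.e. length-square
    # sampling of the j-th column of Z, as argued above.
    l = 0
    for i_tau in i:
        l = l * n + int(i_tau)
    l += 1  # 1-based index, as in the text

    return l, j
```

As a quick check of the index map, for $p = 2$ and $n = 3$ the outcome $(i_1, i_2) = (2, 3)$ gives $l = (2-1)\cdot 3 + (3-1) + 1 = 6$, which is indeed the position of $x_{2j} x_{3j}$ in $x_j \otimes x_j$.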
ACKNOWLEDGMENT

The authors would like to thank Yi-Fei Lu for helpful discussions.

search,” in Proc. 28th Annual ACM Symposium Theory Computing. Philadelphia, Pennsylvania, USA: ACM, May 1996, pp. 212–219. [Online]. Available: http://doi.acm.org/10.1145/237814.237866
[6] T. Li, W.-S. Bao, H.-L. Huang, F.-G. Li, X.-Q. Fu, S. Zhang, C. Guo, Y.-T. Du, X. Wang, and J. Lin, “Complementary-multiphase quantum search for all numbers of target items,” Physical Review A, vol. 98, no. 6, p. 062308, 2018. [Online]. Available: https://journals.aps.org/pra/abstract/10.1103/PhysRevA.98.062308
[7] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, “Quantum machine learning,” Nature, vol. 549, no. 7671, pp. 195–202, Sept. 2017. [Online]. Available: https://doi.org/10.1038/nature23474
[8] H.-L. Huang, X.-L. Wang, P. P. Rohde, Y.-H. Luo, Y.-W. Zhao, C. Liu, L. Li, N.-L. Liu, C.-Y. Lu, and J.-W. Pan, “Demonstration of topological data analysis on a quantum processor,” Optica, vol. 5, no. 2, pp. 193–198, 2018. [Online]. Available: https://www.osapublishing.org/optica/abstract.cfm?uri=optica-5-2-193
[9] J. Liu, K. H. Lim, K. L. Wood, W. Huang, C. Guo, and H.-L. Huang, “Hybrid quantum-classical convolutional neural networks,” arXiv preprint, 2019. [Online]. Available: https://arxiv.org/abs/1911.02998
[10] H.-L. Huang, Y.-W. Zhao, T. Li, F.-G. Li, Y.-T. Du, X.-Q. Fu, S. Zhang, X. Wang, and W.-S. Bao, “Homomorphic encryption experiments on IBM’s cloud quantum computing platform,” Frontiers of Physics, vol. 12, no. 1, p. 120305, 2017. [Online]. Available: https://link.springer.com/article/10.1007/s11467-016-0643-9
[11] H.-L. Huang, Y. Du, M. Gong, Y. Zhao, Y. Wu, C. Wang, S. Li, F. Liang, J. Lin, Y. Xu et al., “Experimental quantum generative adversarial networks for image generation,” arXiv:2010.06201, 2020.
[12] H.-L. Huang, A. K. Goswami, W.-S. Bao, and P. K. Panigrahi, “Demonstration of essentiality of entanglement in a Deutsch-like quantum algorithm,” SCIENCE CHINA Physics, Mechanics & Astronomy, vol. 61, no. 060311, 2018.
[13] H.-L. Huang, M. Narożniak, F. Liang, Y. Zhao, A. D. Castellano, M. Gong, Y. Wu, S. Wang, J. Lin, Y. Xu et al., “Emulating quantum teleportation of a Majorana zero mode qubit,” Physical Review Letters, vol. 126, no. 9, p. 090502, 2021.
[14] D. R. Simon, “On the power of quantum computation,” SIAM J. Comput., vol. 26, no. 5, pp. 1474–1483, July 1997. [Online]. Available: https://doi.org/10.1137/S0097539796298637
[15] I. Kerenidis and A. Prakash, “Quantum recommendation systems,” in 8th Innovations Theoretical Computer Sci. Conf., ser. Leibniz International Proceedings in Informatics (LIPIcs), vol. 67, Berkeley, CA, USA, Jan. 2017, pp. 49:1–49:21. [Online]. Available: http://drops.dagstuhl.de/opus/volltexte/2017/8154
[16] E. Tang, “A quantum-inspired classical algorithm for recommendation systems,” in Proc. 51st Annual ACM SIGACT Symposium Theory Computing, vol. 25. New York, NY, USA: ACM, June 2019, pp. 217–228. [Online]. Available: https://doi.org/10.1145/3313276.3316310
[17] A. Frieze, R. Kannan, and S. Vempala, “Fast Monte-Carlo algorithms for finding low-rank approximations,” J. Assoc. Comput. Mach., vol. 51, no. 6, pp. 1025–1041, Nov. 2004. [Online]. Available: http://doi.acm.org/10.1145/1039488.1039494
[18] S. Lloyd, M. Mohseni, and P. Rebentrost, “Quantum principal component analysis,” Nat. Phys., vol. 10, no. 9, pp. 631–633, July 2014. [Online]. Available: https://doi.org/10.1038/nphys3029
[19] S. Lloyd, M. Mohseni, and P. Rebentrost, “Quantum algorithms for supervised and unsupervised machine learning,” arXiv preprint, Nov. 2013. [Online]. Available: https://arxiv.org/abs/1307.0411
[20] E. Tang, “Quantum-inspired classical algorithms for principal component analysis and supervised clustering,” arXiv preprint, Oct. 2018. [Online]. Available: http://arxiv.org/abs/1811.00414
[21] A. Gilyén, S. Lloyd, and E. Tang, “Quantum-inspired low-rank stochastic regression with logarithmic dependence on the dimension,” arXiv preprint, Nov. 2018. [Online]. Available: http://arxiv.org/abs/1811.04909
[22] N.-H. Chia, H.-H. Lin, and C. Wang, “Quantum-inspired sublinear classical algorithms for solving low-rank linear systems,” arXiv preprint, Nov. 2018. [Online]. Available: https://arxiv.org/abs/1811.04852
[23] A. W. Harrow, A. Hassidim, and S. Lloyd, “Quantum algorithm for linear systems of equations,” Phys. Rev. Lett., vol. 103, no. 15, p. 150502, Oct. 2009.
[24] J. M. Arrazola, A. Delgado, B. R. Bardhan, and S. Lloyd, “Quantum-inspired algorithms in practice,” Quantum, vol. 4, p. 307, Aug. 2020. [Online]. Available: https://doi.org/10.22331/q-2020-08-13-307
[25] P. J. Phillips, “Support vector machines applied to face recognition,” in Advances Neural Inform. Processing Systems, vol. 48, no. 6241, Gaithersburg, MD, USA, Nov. 1999, pp. 803–809. [Online]. Available: https://doi.org/10.6028/nist.ir.6241
[26] J. A. K. Suykens and J. Vandewalle, “Least squares support vector machine classifiers,” Neural Process. Lett., vol. 9, no. 3, pp. 293–300, June 1999. [Online]. Available: https://doi.org/10.1023/A:1018628609742
[27] J. Platt, “Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines,” Apr. 1998. [Online]. Available: https://www.microsoft.com/en-us/research/publication/sequential-minimal-optimization-a-fast-algorithm-for-training-support-vector-machines/
[28] H. P. Graf, E. Cosatto, L. Bottou, I. Dourdanovic, and V. Vapnik, “Parallel Support Vector Machines: The Cascade SVM,” in Advances in Neural Information Processing Systems 17, L. K. Saul, Y. Weiss, and L. Bottou, Eds. MIT Press, 2005, pp. 521–528. [Online]. Available: http://papers.nips.cc/paper/2608-parallel-support-vector-machines-the-cascade-svm.pdf
[29] J. Xu, Y. Y. Tang, B. Zou, Z. Xu, L. Li, Y. Lu, and B. Zhang, “The generalization ability of SVM classification based on Markov sampling,” IEEE Transactions on Cybernetics, vol. 45, no. 6, pp. 1169–1179, 2014. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/6881630
[30] B. Zou, C. Xu, Y. Lu, Y. Y. Tang, J. Xu, and X. You, “$k$-times Markov sampling for SVMC,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 4, pp. 1328–1341, 2017. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/7993056/
[31] P. Rebentrost, M. Mohseni, and S. Lloyd, “Quantum support vector machine for big data classification,” Phys. Rev. Lett., vol. 113, p. 130503, Sept. 2014. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevLett.113.130503
[32] J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah, “Julia: A fresh approach to numerical computing,” SIAM Review, vol. 59, no. 1, pp. 65–98, 2017. [Online]. Available: https://doi.org/10.1137/141000671
[33] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, pp. 27:1–27:27, 2011, software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[34] D. Achlioptas, F. McSherry, and B. Schölkopf, “Sampling techniques for kernel methods,” in Advances Neural Inform. Processing Systems, T. G. Dietterich, S. Becker, and Z. Ghahramani, Eds. Vancouver, British Columbia, Canada: MIT Press, Dec. 2002, pp. 335–342. [Online]. Available: https://papers.nips.cc/paper/2072-sampling-techniques-for-kernel-methods
[35] L. Wang, Support Vector Machines for Signal Processing, 1st ed. The Netherlands: Springer, Berlin, Heidelberg, 2005, ch. 15, pp. 321–342. [Online]. Available: https://doi.org/10.1007/b95439
[36] L. Wang, Multiple Model Estimation for Nonlinear Classification, 1st ed. The Netherlands: Springer, Berlin, Heidelberg, 2005, ch. 2, pp. 49–76. [Online]. Available: https://doi.org/10.1007/b95439
[37] I. L. Markov, A. Fatima, S. V. Isakov, and S. Boixo, “Quantum Supremacy Is Both Closer and Farther than It Appears,” arXiv preprint, Sep. 2018. [Online]. Available: http://arxiv.org/abs/1807.10749

Chen Ding received the B.S. degree from the University of Science and Technology of China, Hefei, China, in 2019. He is currently a graduate student at the CAS Centre for Excellence and Synergetic Innovation Centre in Quantum Information and Quantum Physics. His current research interests include quantum machine learning, quantum-inspired algorithm design, and variational quantum computing.

Tian-Yi Bao received the B.S. degree from the University of Michigan, Ann Arbor, USA, in 2020. She is currently a graduate student at Oxford University. Her current research interests include machine learning and human-computer interaction.

He-Liang Huang received the Ph.D. degree from the University of Science and Technology of China, Hefei, China, in 2018. He is currently an Assistant Professor at the Henan Key Laboratory of Quantum Information and Cryptography, Zhengzhou, China, and a Postdoctoral Fellow at the University of Science and Technology of China, Hefei, China. He has authored or co-authored over 30 papers in refereed international journals and co-authored 1 book. His current research interests include secure cloud quantum computing, big data quantum computing, and the physical implementation of quantum computing architectures, in particular using linear optical and superconducting systems.