Quantum Algorithms For Supervised and Unsupervised Machine Learning
This paper shows that quantum machine learning can provide exponential speed-ups over classical computers for a variety of learning tasks. The intuition is straightforward. Machine learning is about manipulating and classifying large amounts of data. The data is typically post-processed and ordered in arrays (vectors) and arrays of arrays (tensor products): quantum computers are good at manipulating vectors and tensor products in high-dimensional spaces. In different machine learning settings, the speed-up plays out in different fashions. First, classical data expressed in the form of $N$-dimensional complex vectors can be mapped onto quantum states over $\log_2 N$ qubits: when the data is stored in a quantum random access memory (qRAM), this mapping takes $O(\log_2 N)$ steps [10-16]. Once it is in quantum form, the data can be post-processed by various quantum algorithms (quantum Fourier transforms [17], matrix inversion [18], etc.), which take time $O(\mathrm{poly}(\log N))$. Estimating distances and inner products between post-processed vectors in $N$-dimensional vector spaces then takes time $O(\log N)$ on a quantum computer. By contrast, as noted by Aaronson [19], sampling and estimating distances and inner products between post-processed vectors on a classical computer is apparently exponentially hard. Quantum machine learning thus provides an exponential speed-up over all known classical algorithms for problems that involve evaluating distances and inner products between large vectors.
In this paper, we show that the problem of assigning $N$-dimensional vectors to one of several clusters of $M$ states takes time $O(\log(MN))$ on a quantum computer, compared with time $O(\mathrm{poly}(MN))$ for the best known classical algorithm. That is, quantum machine learning can provide an exponential speed-up for problems involving large numbers of vectors as well (“big quantum data”). We present a quantum version of Lloyd’s algorithm for performing k-means clustering: using a novel version of the quantum adiabatic algorithm, one can classify $M$ vectors into $k$ clusters in time $O(k \log kMN)$.
Finally, we note that in addition to supplying exponential speed-ups in both the number of vectors and their dimension, quantum machine learning allows enhanced privacy: only $O(\log(MN))$ calls to the quantum database are required to perform cluster assignment, while $O(MN)$ calls are required to uncover the actual data. The database user can still obtain information about the desired patterns, while the database owner is assured that the user has only accessed an exponentially small fraction of the database.
We assume that our data sets consist of arrays of numbers (vectors) and arrays of arrays (collections of vectors), originally stored in random access memory (RAM) in the classical case, or in quantum random access memory (qRAM) in the quantum case [10-16]. The key feature of quantum machine learning is that qRAM allows us to access the data in quantum parallel. Begin with state preparation. Consider the $N = 2^n$-dimensional complex vector $\vec v$ with components $\{v_i = |v_i| e^{i\phi_i}\}$. Assume that $\{|v_i|, \phi_i\}$ are stored as floating-point numbers in qRAM. Constructing the $\log_2 N$-qubit quantum state $|v\rangle = |\vec v|^{-1/2}\vec v$ then takes $O(\log_2 N)$ steps as long as the sub-norms $n_\ell = \sum_{i=1}^{\ell} |v_i|^2$ can be estimated efficiently [20-22]. Alternatively, we can assume that these sub-norms are also given in qRAM, in which case any quantum state can be constructed in $O(\log N)$ steps.
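As a concrete point of reference, the following is a minimal classical simulation of this amplitude encoding: it computes the normalized state vector directly rather than modeling qRAM queries or sub-norm estimation, and the function name is ours.

```python
import numpy as np

def amplitude_encode(v):
    """Classically simulate mapping a complex vector v onto the amplitudes
    of a log2(N)-qubit state. Pads v to the next power of two and
    normalizes; the actual qRAM-based construction is not modeled."""
    v = np.asarray(v, dtype=complex)
    n_qubits = max(1, int(np.ceil(np.log2(len(v)))))
    amps = np.zeros(2**n_qubits, dtype=complex)
    amps[:len(v)] = v
    return amps / np.linalg.norm(amps), n_qubits

state, n = amplitude_encode([3 + 4j, 0.0, 1.0, 2.0])
print(n, np.linalg.norm(state))  # -> 2 qubits, unit norm
```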
Once the exponentially compressed quantum versions of the vectors have been created, we can post-process them using quantum Fourier transforms, matrix inversion, etc., to create vectors of the form $\mathrm{QFT}|v\rangle$, $f(A)|v\rangle$, where $A$ is a sparse Hermitian matrix and $f$ is a computable function, e.g., $f(A) = A^{-1}$. The post-processing takes time $O(\mathrm{poly}(\log N))$. As will now be shown, this allows us to evaluate generalized inner products $\langle u|\mathrm{QFT}|v\rangle$ and $\langle u|f(A)|v\rangle$ between the quantum vectors. By contrast, as noted by Aaronson [19], the best known algorithms for evaluating the classical versions of these generalized inner products, $\vec u^\dagger \mathrm{FT}\,\vec v$ and $\vec u^\dagger f(A)\vec v$, via sampling and classical post-processing take time $O(\mathrm{poly}(N))$.
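For contrast, here is a sketch (ours, not the paper's construction) of the classical counterpart $f(A)\vec v$ computed by eigendecomposition, which already costs time polynomial in $N$ before any sampling:

```python
import numpy as np

def apply_matrix_function(A, v, f):
    """Apply f(A) to v for Hermitian A via A = U diag(w) U^dagger.
    The eigendecomposition costs O(N^3), illustrating the classical overhead."""
    w, U = np.linalg.eigh(A)
    return U @ (f(w) * (U.conj().T @ v))

# Generalized inner product u^dagger A^{-1} v (assumes A is invertible):
# z = np.vdot(u, apply_matrix_function(A, v, lambda w: 1.0 / w))
```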
Supervised cluster assignment:
Given a vector $\vec u$ and $M$ representative samples $\vec v_j$ of a cluster, the distance between $\vec u$ and the mean of the cluster can be estimated as follows. Adjoin an ancilla and construct the state $|\psi\rangle = (1/\sqrt{2})\big(|0\rangle|u\rangle + (1/\sqrt{M})\sum_j |j\rangle|v_j\rangle\big)$; then perform a projective measurement of the ancilla alone onto the state $|\phi\rangle = (1/\sqrt{Z})\big(|\vec u|\,|0\rangle - (1/\sqrt{M})\sum_j |\vec v_j|\,|j\rangle\big)$, where $Z = |\vec u|^2 + (1/M)\sum_j |\vec v_j|^2$. It is straightforward to verify that the desired distance, $|\vec u - (1/M)\sum_j \vec v_j|^2$, is equal to $Z$ times the probability of success for this measurement.
The state $|\phi\rangle$ can be generated by using quantum access to the norms together with quantum simulation to apply the unitary transformation $e^{-iHt}$, where $H = \big(|\vec u|\,|0\rangle\langle 0| + \sum_j |\vec v_j|\,|j\rangle\langle j|\big) \otimes \sigma_x$, to the state $(1/\sqrt{2})\big(|0\rangle - (1/\sqrt{M})\sum_j |j\rangle\big) \otimes |0\rangle$. The result is the state
$$ (1/\sqrt{2})\Big(\cos(|\vec u|t)\,|0\rangle - (1/\sqrt{M})\sum_j \cos(|\vec v_j|t)\,|j\rangle\Big) \otimes |0\rangle - (i/\sqrt{2})\Big(\sin(|\vec u|t)\,|0\rangle - (1/\sqrt{M})\sum_j \sin(|\vec v_j|t)\,|j\rangle\Big) \otimes |1\rangle. \qquad (1) $$
Choosing $t$ so that $|\vec u|t, |\vec v_j|t \ll 1$ and measuring the ancilla bit then yields the state $|\phi\rangle$ with probability $(1/2)\big(|\vec u|^2 + (1/M)\sum_j |\vec v_j|^2\big)t^2 = Zt^2/2$. This procedure creates the desired state and, when repeated, also allows the quantity $Z$ to be estimated. A more efficient way to create the state and to estimate $Z$ to accuracy $\epsilon$ is to use Grover's algorithm/quantum counting [17]. Quantum counting takes time $O(\epsilon^{-1} \log M)$, and also allows quantum coherence to be preserved during the state creation.
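The small-$t$ behavior claimed for Eq. (1) is easy to check numerically. The sketch below (our own; variable names are illustrative) applies $e^{-iHt}$ to the initial state, postselects the ancilla on $|1\rangle$, and compares the success probability against $Zt^2/2$.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
M = 4
norm_u = 1.3                               # |u|
norms_v = rng.uniform(0.5, 2.0, M)         # |v_j|, j = 1..M

# H = (|u| |0><0| + sum_j |v_j| |j><j|) (x) sigma_x on an (M+1)x2 space
sigma_x = np.array([[0.0, 1.0], [1.0, 0.0]])
H = np.kron(np.diag(np.concatenate(([norm_u], norms_v))), sigma_x)

# initial state (1/sqrt2)(|0> - (1/sqrtM) sum_j |j>) (x) |0>
init = np.concatenate(([1.0], -np.ones(M) / np.sqrt(M))) / np.sqrt(2)
state0 = np.kron(init, [1.0, 0.0])

t = 1e-3
state_t = expm(-1j * H * t) @ state0
p_anc1 = np.sum(np.abs(state_t.reshape(M + 1, 2)[:, 1])**2)  # ancilla = |1>

Z = norm_u**2 + np.mean(norms_v**2)
print(p_anc1, Z * t**2 / 2)                # these agree up to O(t^4) terms
```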
Unsupervised quantum learning:
The exponential quantum speed-up above holds for supervised learning. A similar speed-up extends to unsupervised learning. Consider the k-means problem of assigning $M$ vectors to $k$ clusters in a way that minimizes the average distance to the centroid of the cluster. The standard method for solving k-means is Lloyd's algorithm [1-2] (no relation to the co-author of this paper): (0) choose the initial centroids at random or by a method such as k-means++; (1) assign each vector to the cluster with the closest mean; (2) recalculate the centroids of the clusters; repeat steps (1)-(2) until a stationary assignment is attained. When classical estimation of the distance to the centroids in the $N$-dimensional space takes time $O(N)$, each step of the classical algorithm takes time $O(M^2 N)$, while the quantum Lloyd's algorithm takes time $O(M \log(MN))$. The additional factor of $M$ in both the classical and quantum algorithms arises because every vector is tested individually for reassignment at each step.
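For reference, here is a minimal classical implementation of steps (0)-(2) above (a sketch with random seeding; k-means++ seeding would replace step (0)):

```python
import numpy as np

def lloyd_kmeans(X, k, max_iter=100, seed=None):
    """Classical Lloyd's algorithm on an (M, N) data array X."""
    rng = np.random.default_rng(seed)
    # (0) choose initial centroids at random from the data
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # (1) assign each vector to the cluster with the closest mean
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # (2) recalculate centroids (keep the old one if a cluster empties)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                        else centroids[c] for c in range(k)])
        if np.allclose(new, centroids):    # stationary assignment reached
            return labels, centroids
        centroids = new
    return labels, centroids
```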
The quantum Lloyd's algorithm can be improved by noting that the k-means problem can be rephrased as a quadratic programming problem, which is amenable to solution by the adiabatic algorithm. As will now be seen, such unsupervised quantum machine learning takes time at most $O(k \log(MN))$ and can take as little as $O(\log(kMN))$. In order to reduce the dependence on the number of vectors from $O(M \log M)$ to $O(\log M)$, the output of the computation can no longer be a list of the $M$ vectors and their cluster assignments. Instead, the output is a quantum state $|\chi\rangle = (1/\sqrt{M})\sum_j |c_j\rangle|j\rangle = (1/\sqrt{M})\sum_{c, j\in c} |c\rangle|j\rangle$ that contains the labels $j$ of the vectors correlated with their cluster assignments $c_j$ in superposition: we can then sample from that state to obtain a statistical picture of the clustering. The procedure for constructing the clustering state $|\chi\rangle$ via the quantum adiabatic algorithm is given in the supplementary material. The algorithm takes time no greater than $O(\epsilon^{-1} k \log kMN)$ to construct this state to accuracy $\epsilon$, and could take time as little as $O(\epsilon^{-1} \log kMN)$ if the clusters are relatively well separated, so that the gap of the adiabatic algorithm is $O(1)$.
Any algorithm that reveals the assignment of all $M$ vectors necessarily takes time $O(M)$ merely to print out the output. Many questions about the k-means clustering can be answered using smaller outputs. As we now show, adiabatic algorithms provide a powerful method for answering clustering questions. First, look at the problem of finding initial seeds for the clusters. As the efficiency of the k-means++ algorithm shows, the performance of Lloyd's algorithm, classical or quantum, depends strongly on a good choice of initial seeds. Initial seed vectors should be spread as far apart from each other as possible. Begin the adiabatic seed-finding algorithm in the state $|\Psi\rangle = |\psi\rangle_1 \otimes \cdots \otimes |\psi\rangle_k$, where $|\psi\rangle = (1/\sqrt{M})\sum_{j=1}^{M} |j\rangle$ is the uniform superposition of vector labels, and with initial Hamiltonian $H_0 = \mathbb{1} - |\Psi\rangle\langle\Psi|$.
The distance-finding algorithm given above allows us to apply any Hamiltonian of the form
$$ H_s = \sum_{j_1 \ldots j_k} f(\{|\vec v_{j_\ell} - \vec v_{j_{\ell'}}|^2\})\, |j_1\rangle\langle j_1| \otimes \cdots \otimes |j_k\rangle\langle j_k|. \qquad (2) $$
To find good seeds for k-means, use a final Hamiltonian for the adiabatic algorithm of the form (2) with $f = -\sum_{\ell,\ell'=1}^{k} |\vec v_{j_\ell} - \vec v_{j_{\ell'}}|^2$. The ground state of this final Hamiltonian is the seed set that maximizes the average squared distance between seeds.
We can also use the adiabatic algorithm to find sets of $r$ vectors that should lie in the same cluster. Here the final Hamiltonian is of the form
$$ H_c = \sum_{j_1 \ldots j_r} f(\{|\vec v_{j_\ell} - \vec v_{j_{\ell'}}|^2\})\, |j_1\rangle\langle j_1| \otimes \cdots \otimes |j_r\rangle\langle j_r|, \qquad (3) $$
where $f = \sum_{\ell,\ell'=1}^{r} |\vec v_{j_\ell} - \vec v_{j_{\ell'}}|^2 + \kappa\,\delta_{j_\ell, j_{\ell'}}$ with $\kappa > 0$. Because of the overall positive sign, the distance term now rewards sets of vectors that are clustered closely, while the $\kappa\,\delta_{j_\ell, j_{\ell'}}$ term ensures that the vectors in the $\ell$ and $\ell'$ positions are different (we already know that a vector lies in the same cluster as itself). Finding such sets of vectors that are expected to lie in the same cluster can take time $O(r \log MN)$, depending on the probability of success of the quantum adiabatic algorithm (see the next paragraph). Combining this 'attractive' Hamiltonian with the 'repulsive' Hamiltonian of (2) allows one to find $kr$ representative groups of $r$ vectors from each of the $k$ clusters.
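As a classical illustration of what the two final Hamiltonians reward, the brute-force sketch below (ours; feasible only for tiny $M$, $k$, $r$) finds the ground-state configurations of (2) and (3) directly. Restricting to index subsets plays the role of the $\kappa\,\delta_{j_\ell,j_{\ell'}}$ penalty, since it forbids repeated indices.

```python
import numpy as np
from itertools import combinations

def pairwise_sq(V, idx):
    """Sum of squared distances over all pairs in the index set idx."""
    return sum(np.sum((V[a] - V[b])**2) for a, b in combinations(idx, 2))

def best_seeds(V, k):
    """Ground state of (2): the k indices with maximal total spread."""
    return max(combinations(range(len(V)), k), key=lambda s: pairwise_sq(V, s))

def tightest_group(V, r):
    """Ground state of (3): r distinct indices most tightly clustered."""
    return min(combinations(range(len(V)), r), key=lambda s: pairwise_sq(V, s))
```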
The success of the quantum adiabatic algorithm in finding the ground state of the final Hamiltonian relies on traversing the minimum-gap point of the quantum phase transition between the initial and final Hamiltonians sufficiently slowly. Finding the optimal seed set of size $k$ classically is a combinatorially hard problem in $k$, and finding the optimal cluster of $r$ vectors is combinatorially hard in $r$. Accordingly, the minimum gap and the time to find the ground state may well scale exponentially in $k$ and $r$. Indeed, optimal k-means is an NP-hard problem, which we do not expect to solve in polynomial time on any computer, classical or quantum. Approximate solutions of these hard problems are well within the grasp of the quantum adiabatic algorithm, however. k-means++ does not require an optimal seed set, but merely a good seed set with well-separated vectors. In addition, in k-means we are interested in finding various sets of highly clustered vectors, not only the optimal set. Even running the algorithm for a time linear in $k \log MN$ is likely to suffice to construct reasonably good seed sets and clusters. We can reasonably hope that an adiabatic quantum computer that traverses the minimum gap in finite time $\tau$ at finite temperature $T$ should be able to find approximate solutions whose energy is within $\max\{O(k_B T), O(\hbar/\tau)\}$ of the minimum energy. The question of how well the adiabatic algorithm performs on average is an open one.
Discussion:
The power of quantum computers to manipulate large numbers of high-dimensional vectors makes them natural systems for performing vector-based machine learning tasks. Operations that involve taking vector dot products, overlaps, norms, etc., in $N$-dimensional vector spaces, which take time $O(N)$ in classical machine learning algorithms, take time $O(\log N)$ in the quantum version. These abilities, combined with the quantum linear systems algorithm [18], represent a powerful suite of tools for manipulating large amounts of data. Once the data has been processed in quantum form, as in the adiabatic quantum algorithm for search engine ranking [23], measurements can be made on the processed data to reveal aspects of the data that can take exponentially longer to reveal by classical algorithms. Here, we presented a quantum algorithm for assigning a vector to clusters of $M$ vectors that takes time $O(\log MN)$, an exponential speed-up in both $M$ (quantum big data) and $N$. We used this algorithm as a subroutine for the standard k-means algorithm to provide an exponential speed-up for unsupervised learning (the quantum Lloyd's algorithm) via the adiabatic algorithm.
Currently, the rate of generation of electronic data is estimated to be on the order of $10^{18}$ bits per year. This entire data set could be represented by a quantum state using about 60 qubits, and the clustering analysis could be performed using a few hundred operations. Even if the number of bits to be analyzed were to expand to the entire information content of the universe within the particle horizon, $O(10^{90} \approx 2^{300})$ bits, the data representation and analysis would in principle be well within the capacity of a relatively small quantum computer.
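These qubit counts follow from taking base-2 logarithms:
$$ \log_2 10^{18} = 18\,\log_2 10 \approx 59.8 \approx 60, \qquad \log_2 10^{90} = 90\,\log_2 10 \approx 299 \approx 300. $$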
The generic nature of the quantum speed-ups for dealing with large numbers of high-dimensional vectors suggests that a wide variety of machine learning algorithms may be susceptible to exponential speed-up on a quantum computer. Quantum machine learning also provides advantages in terms of privacy: the database itself is of size $O(MN)$, but the owner of the database supplies only $O(\log MN)$ quantum bits to the user who is performing the quantum machine learning algorithm. In addition to supplying an exponential speed-up over classical machine learning algorithms, quantum machine learning methods for analyzing large data sets ('big quantum data') supply significant advantages in terms of privacy for the owners of that data.
Acknowledgments: This work was supported by DARPA, Google, NSF, ARO under a
MURI program, Jeffrey Epstein, and FQXi. The authors thank Scott Aaronson for helpful
discussions.
References:
[16] H. Wu, R.E. George, J.H. Wesenberg, K. Mølmer, D.I. Schuster, R.J. Schoelkopf, K.M. Itoh, A. Ardavan, J.J.L. Morton, G.A.D. Briggs, Phys. Rev. Lett. 105, 140503 (2010).
[17] M.A. Nielsen, I.L. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, Cambridge, 2000.
[18] A.W. Harrow, A. Hassidim, S. Lloyd, Phys. Rev. Lett. 103, 150502 (2009); arXiv:0811.3171.
[19] S. Aaronson, ‘BQP and the polynomial hierarchy,’ arXiv:0910.4698.
[20] L. Grover, T. Rudolph, ‘Creating superpositions that correspond to efficiently integrable probability distributions,’ arXiv:quant-ph/0208112.
[21] P. Kaye, M. Mosca, in Proceedings of the International Conference on Quantum Information, Rochester, New York, 2001; arXiv:quant-ph/0407102.
[22] A.N. Soklakov, R. Schack, Phys. Rev. A 73, 012307 (2006).
[23] S. Garnerone, P. Zanardi, D.A. Lidar, Phys. Rev. Lett. 108, 230506 (2012); arXiv:1109.6546.
[24] S. Lloyd, J.-J.E. Slotine, Phys. Rev. A 62, 012307 (2000); arXiv:quant-ph/9905064.
Supplementary material:
Here we present an adiabatic algorithm for constructing a quantum state
$$ |\chi\rangle = (1/\sqrt{M})\sum_j |c_j\rangle|j\rangle = (1/\sqrt{M})\sum_{c, j\in c} |c\rangle|j\rangle \qquad (S1) $$
that contains the output of the unsupervised k-means clustering algorithm in quantum form. This state contains a uniform superposition of all the vectors, each assigned to its appropriate cluster, and can be sampled to provide information about which states are in the same or in different clusters. For the quantum clustering algorithm, proceed as in the original Lloyd's algorithm, but express all means in quantum superposition. At the first step, select $k$ vectors with labels $i_c$ as initial seeds for each of the clusters. These may be chosen at random, or in a way that maximizes the average distance between them, as in k-means++. Then re-cluster. We show by induction that the re-clustering can be performed efficiently by the quantum adiabatic algorithm.
For the first step, begin with the state
$$ \frac{1}{\sqrt{Mk}}\sum_{c'j} |c'\rangle|j\rangle \left(\frac{1}{\sqrt{k}}\sum_c |c\rangle|i_c\rangle\right)^{\otimes d}. \qquad (S2) $$
The multiple copies of the seed state $(1/\sqrt{k})\sum_c |c\rangle|i_c\rangle$, combined with the distance-evaluation techniques given in the paper, allow one to evaluate the distances $|\vec v_j - \vec v_{i_{c'}}|^2$ in the $c'j$ component of the superposition, and to apply the phase $e^{-i\Delta t\,|\vec v_j - \vec v_{i_{c'}}|^2}$. This is equivalent to applying the Hamiltonian
$$ H_1 = \sum_{c'j} |\vec v_j - \vec v_{i_{c'}}|^2\, |c'\rangle\langle c'| \otimes |j\rangle\langle j|. \qquad (S3) $$
Now perform the adiabatic algorithm with the initial Hamiltonian $H_0 = \mathbb{1} - |\phi\rangle\langle\phi|$, where $|\phi\rangle = (1/\sqrt{k})\sum_{c'} |c'\rangle$, adiabatically deforming to the Hamiltonian $H_1$. The time it takes to perform the adiabatic algorithm accurately will be evaluated below. The result is the first-order clustering state
$$ |\psi_1\rangle = \frac{1}{\sqrt{M}}\sum_{c, j\in c} |c\rangle|j\rangle, \qquad (S4) $$
where each $j$ is associated with the $c$ with the closest seed vector $i_c$. By constructing multiple copies of this state, one can also construct the individual cluster states $|\phi_c^1\rangle = (1/\sqrt{M_c})\sum_{j\in c} |j\rangle$ and estimate the number of states $M_c$ in the $c$'th cluster.
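A classical sketch (ours) of this sampling step: measuring the cluster register of copies of $|\psi_1\rangle$ is modeled by drawing vector labels uniformly and reading off their assigned clusters, from which the occupancies $M_c$ are estimated.

```python
import numpy as np

def estimate_cluster_sizes(labels, n_samples, seed=None):
    """Estimate M_c from samples of the clustering state.

    labels: length-M array giving the cluster c assigned to each j;
    measuring |psi_1> returns a uniformly random j with its label c."""
    rng = np.random.default_rng(seed)
    draws = rng.choice(labels, size=n_samples)
    counts = np.bincount(draws, minlength=int(labels.max()) + 1)
    return counts / n_samples * len(labels)   # estimated M_c for each c
```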
Now continue. At the next re-clustering step, assume that $d$ copies of the state $|\psi_1\rangle$ are made available from the previous step. The ability to construct the individual cluster states $|\phi_c^1\rangle$, together with the ability to perform the distance evaluation as in the paper, allows us to evaluate the distance between $\vec v_j$ and the mean of cluster $c$, $|\vec v_j - (1/M_c)\sum_{k\in c} \vec v_k|^2 = |\vec v_j - \bar v_c|^2$. This ability in turn allows us to apply a phase $e^{-i|\vec v_j - \bar v_{c'}|^2 \delta t}$ to each component $|c'\rangle|j\rangle$ of the superposition, which is equivalent to applying the Hamiltonian
$$ H_f = \sum_{c'j} |\vec v_j - \bar v_{c'}|^2\, |c'\rangle\langle c'| \otimes |j\rangle\langle j| \otimes I^{\otimes d}. \qquad (S5) $$
Begin in the state
$$ \frac{1}{\sqrt{Mk}}\sum_{c',j} |c'\rangle|j\rangle\,|\psi_1\rangle^{\otimes d}, \qquad (S6) $$
with initial Hamiltonian $\mathbb{1} - |\phi\rangle\langle\phi|$, where $|\phi\rangle = (1/\sqrt{k})\sum_{c'} |c'\rangle$, and gradually deform to the final Hamiltonian $H_f$, rotating the $|c'\rangle$ to associate each cluster label $c'$ with the set of $j$'s that should be assigned to $c'$. We obtain the final state
$$ \frac{1}{\sqrt{M}}\sum_{c', j\in c'} |c'\rangle|j\rangle\,|\psi_1\rangle^{\otimes d} = |\psi_2\rangle|\psi_1\rangle^{\otimes d}. \qquad (S7) $$
That is, the adiabatic algorithm can be used to assign states to clusters in the next step of the quantum Lloyd's algorithm. Repeating $d$ times to create $d$ copies, one can now iterate this quantum adiabatic algorithm to create a quantum superposition of the cluster assignments at each step. Continue the reassignment until the cluster assignment state is unchanged (which can be verified, e.g., using a swap test). Since Lloyd's algorithm typically converges after a small number of steps, we rapidly arrive at the clustering state $|\chi\rangle = (1/\sqrt{M})\sum_{c, j\in c} |c\rangle|j\rangle$. The resulting k-means clustered quantum state $|\chi\rangle$ contains the final optimized k-means clusters in quantum superposition and can be sampled to obtain information about the contents of the individual clusters.
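The swap-test convergence check can be sketched classically: for states $|a\rangle$ and $|b\rangle$, the test's control qubit reads $|0\rangle$ with probability $(1 + |\langle a|b\rangle|^2)/2$, so successive cluster-assignment states can be compared for approximate equality. A minimal sketch, assuming copies of both states are available:

```python
import numpy as np

def swap_test_p0(a, b):
    """Probability of measuring the swap test's control qubit in |0>:
    1/2 when a and b are orthogonal, 1 when they coincide."""
    a = np.asarray(a, dtype=complex); a = a / np.linalg.norm(a)
    b = np.asarray(b, dtype=complex); b = b / np.linalg.norm(b)
    return 0.5 * (1.0 + abs(np.vdot(a, b))**2)
```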
To calculate the scaling of finding the clustering state, note first that each distance evaluation is essentially a weak measurement [24] that perturbs the clustered state $|\psi_\ell\rangle^{\otimes d}$ at the previous level by an amount $< d\sqrt{d}\,\delta^2$ (measured by fidelity), where $\delta$ is the accuracy of the distance evaluation. Accordingly, as long as the desired accuracy is $\delta > 1/d^{2/3}$, $d$ copies of the next cluster assignment state can be created from the $d$ copies of the previous cluster assignment state.
To evaluate the time that the adiabatic algorithm takes, note that the adiabatic part of the algorithm acts only on the $c'$ cluster labels, and that the squared overlap between the initial state of each step (S6) and the final state (S7) is $O(1/k)$. Accordingly, the time per step that the algorithm requires is no greater than $O(k \log kMN)$ (and could be as small as $O(\log kMN)$ if the minimum gap during the adiabatic stage is $O(1)$). As Lloyd's algorithm typically converges after a relatively small number of steps, our estimate for the overall algorithm to construct the clustering state $|\chi\rangle$ is $O(k \log kMN)$.