
SCIENCE ADVANCES | RESEARCH ARTICLE

PHYSICS

A quantum machine learning algorithm based on generative models

X. Gao1*, Z.-Y. Zhang1,2, L.-M. Duan1,2†

1Center for Quantum Information, IIIS, Tsinghua University, Beijing 100084, PR China. 2Department of Physics, University of Michigan, Ann Arbor, MI 48109, USA.
*Present address: Department of Physics, Harvard University, MA 02138, USA.
†Corresponding author. Email: [email protected]

Copyright © 2018 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC).

Quantum computing and artificial intelligence, combined together, may revolutionize future technologies. A significant school of thought regarding artificial intelligence is based on generative models. Here, we propose a general quantum algorithm for machine learning based on a quantum generative model. We prove that our proposed model is more capable of representing probability distributions compared with classical generative models and has exponential speedup in learning and inference, at least for some instances, if a quantum computer cannot be efficiently simulated classically. Our result opens a new direction for quantum machine learning and offers a remarkable example where a quantum algorithm shows exponential improvement over classical algorithms in an important application field.

INTRODUCTION

A central task in the field of quantum computing is to find applications where a quantum computer could provide exponential speedup over any classical computer (1–3). Machine learning represents an important field with broad applications where a quantum computer may offer substantial speedup (4–14). The candidate algorithms with potential exponential speedup in runtime so far rely on efficient quantum solutions of linear algebraic problems (6–11). These algorithms require quantum random access memory (QRAM) as a critical component in addition to a quantum computer (4, 5, 15). In a QRAM, the number of required quantum routing operations (16) scales up exponentially with the number of qubits in those algorithms (9, 15, 17). This exponential overhead in resource requirement poses a serious challenge for its experimental implementation and is a caveat for fair comparison with the corresponding classical algorithms (5, 18).

A significant school of thought regarding artificial intelligence is based on generative models (19–21), which are widely used for probabilistic reasoning as well as for supervised and unsupervised machine learning (19–21). Generative models try to capture the full underlying probability distributions for the observed data. Compared to discriminative models such as support vector machines and feed-forward neural networks, generative models can express far more complex relations among variables, which makes them broadly applicable but at the same time harder to tackle (19, 21, 22). Typical generative models include probabilistic graphical models such as the Bayesian nets and the Markov random fields (19, 20, 22), and generative neural networks such as the Boltzmann machines, the deep belief nets, and the generative adversarial networks (21). The probability distributions in these classical generative models can be represented by the so-called factor graphs (19, 21, 22). In section S1, we give a brief introduction to generative and discriminative models and their applications in machine learning.

Here, we propose a generative quantum machine learning algorithm that offers potential exponential improvement on three key elements of the generative models, that is, the representational power and the runtimes for learning and inference. In our introduced quantum generative model (QGM), the underlying probability distribution describing correlations in data is generated by measuring a set of observables under a many-body entangled state. In terms of the representational power, we prove that our introduced QGM can efficiently represent any factor graphs, which include almost all the classical generative models as particular cases. Throughout this paper, the word "efficiently" means that the computational or memory cost is bounded by a polynomial function of the size of the problem. Furthermore, we show that the QGM is exponentially more expressive than factor graphs by proving that at least some instances generated by the QGM cannot be efficiently represented by any factor graph with a polynomial number of variables if a widely accepted conjecture in the computational complexity theory holds, that is, the polynomial hierarchy, which is a generalization of the famous P versus NP problem, does not collapse. For learning and inference in our QGM, we propose a general heuristic algorithm using a combination of techniques such as tensor networks, construction of parent Hamiltonians for many-body entangled states, and recursive quantum phase estimation. Although it is unreasonable to expect that the proposed quantum algorithm has polynomial scaling in runtime in all the cases (as this would imply that a quantum computer could efficiently solve any NP problems, an unlikely result), we prove that, at least for some instances, our quantum algorithm has exponential speedup over any classical algorithms, assuming that a quantum computer cannot be efficiently simulated in general by classical computers, a conjecture that is believed to hold. Very recently, a different generative model for quantum machine learning was proposed on the basis of a quantum version of the Boltzmann machine (23). The quantum advantages compared with the classical generative models, however, still remain unknown for that model in terms of the representational power and the runtimes for learning and inference.

The intuition for quantum speedup in our algorithm can be understood as follows: The purpose of generative machine learning is to model any data generation process in nature by finding the underlying probability distribution. As nature is governed by the law of quantum mechanics, it is too restrictive to assume that the real-world data can always be modeled by an underlying probability distribution as in classical generative models. Instead, in our QGM, correlations in data are parameterized by the underlying probability amplitudes of a many-body entangled state. As the interference of quantum probability amplitudes can lead to phenomena much more complex than those from classical probabilistic models, it is possible to achieve marked improvement in our QGM under certain circumstances.
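As a toy numerical illustration of this intuition (our addition, not part of the original analysis), the snippet below contrasts the two ways of combining contributions from hidden configurations: signed quantum amplitudes are summed before squaring and can interfere destructively, whereas a factor-graph marginal sums strictly positive weights and can never cancel.

import numpy as np

# Signed amplitudes from four hidden configurations in a toy quantum model.
amps = np.array([0.5, -0.5, 0.5, -0.5])
p_quantum = abs(amps.sum())**2        # interference: the terms cancel exactly -> 0

# A factor graph marginalizes strictly positive weights, which can only accumulate.
weights = np.abs(amps)
p_classical = weights.sum()           # always > 0 (before normalization)

print(p_quantum, p_classical)         # 0.0  2.0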


RESULTS

Factor graphs and our QGM

We start by defining factor graphs and our QGM. Direct characterization of a probability distribution of n binary variables has an exponential cost of 2^n. A factor graph, which includes many classical generative models as special cases, is a compact way to represent n-particle correlation (21, 22). As shown in Fig. 1A, a factor graph is associated with a bipartite graph where the probability distribution can be expressed as a product of positive correlation functions of a constant number of variables. Here, without loss of generality, we have assumed constant-degree graphs, where the maximum number of edges per vertex is bounded by a constant.

Our QGM is defined on a graph state |G〉 of m qubits. As a powerful tool to represent many-body entangled states, the graph state |G〉 is defined on a graph G (24), where each vertex in G is associated with a qubit. To prepare the graph state |G〉, all the qubits are initialized to the state |+〉 = (|0〉 + |1〉)/√2 at the beginning, where |0〉, |1〉 denote the qubit computational basis vectors, and then a controlled phase flip gate is applied to all the qubit pairs connected by an edge in the graph G. We then introduce the following transformation to the graph state |G〉

|Q〉 ≡ M1 ⊗ ⋯ ⊗ Mm |G〉    (1)

where Mi denotes an invertible (in general, nonunitary) 2 × 2 matrix applied on the Hilbert space of qubit i. Note that for general nonunitary Mi, the state |Q〉 cannot be directly prepared from the state |G〉 efficiently, and in our following learning algorithm, we actually first transform |Q〉 into a tensor network state and then use a combination of techniques to prepare this tensor network state. From m vertices of the graph G, we choose a subset of n qubits as the visible units and measure them in the computational basis {|0〉, |1〉}. The measurement outcomes sample from a probability distribution Q({xi}) of n binary variables {xi, i = 1, 2, ⋯, n} (the other m − n hidden qubits are just traced over to give the reduced density matrix). Given a graph G and the subset of visible vertices, the distribution Q({xi}) defines our QGM, which is parameterized efficiently by the parameters in the matrices Mi. As shown in Fig. 1C, the pure entangled quantum state |Q〉 can be written as a special tensor network state. We define our model in this form for two reasons: First, the probability distribution Q({xi}) needs to be general enough to include all the factor graphs; second, for the specific form of the state |Q〉 introduced in this paper, the parameters in this model can be conveniently trained by a quantum algorithm on any given dataset.

Representational power of our QGM

Representational power is a key property of a generative model. It sets the ultimate limit to which the model might be useful. The more probability distributions a generative model can efficiently represent, the wider applications it can potentially have. The representational power is also closely related to the so-called generalization ability of a probabilistic model (see section S2). In this subsection, we prove that the QGM introduced above is exponentially more expressive in terms of the representation power compared with the classical factor graphs. This result is more accurately described by theorems 1 and 2. First, we show that any factor graph can be viewed as a special case of our QGM by the following theorem.

Theorem 1
The QGM defined above can efficiently represent probability distributions from any constant-degree factor graphs with arbitrary precision. Concretely, the number of parameters in the QGM can be bounded by O(2^k s), where s is the number of function nodes in the factor graph and k is the degree of the graph.

As probabilistic graphical models and generative neural networks can all be reduced to constant-degree factor graphs (22), the above theorem shows that our proposed QGM is general enough to include those probability distributions in widely used classical generative models. In section S3, we show how to reduce typical classical generative models to factor graphs with a bounded degree. Actually, by reducing the degree of the factor graph through equivalence transformation (see fig. S4), we can consider the factor graphs with degree k = 3 without loss of generality.


Fig. 1. Classical and quantum generative models. (A) Illustration of a factor graph, which includes widely used classical generative models as its special cases. A
factor graph is a bipartite graph where one group of the vertices represents variables (denoted by circles) and the other group of vertices represents positive functions
(denoted by squares) acting on the connected variables. The corresponding probability distribution is given by the product of all these functions. For instance, the
probability distribution in (A) is p(x1, x2, x3, x4, x5) = f1(x1, x2, x3, x4)f2(x1, x4, x5)f3(x3, x4)/Z, where Z is a normalization factor. Each variable connects to at most a constant
number of functions that introduce correlations in the probability distribution. (B) Illustration of a tensor network state. Each unshared (shared) edge represents a
physical (hidden) variable, and each vertex represents a complex function of the variables on its connected edges. The wave function of the physical variables is defined
as a product of the functions on all the vertices, after summation (contraction) of the hidden variables. Note that a tensor network state can be regarded as a quantum
version of the factor graph after partial contraction (similar to the marginal probability in the classical case), with positive real functions replaced by complex functions.
(C) Definition of a QGM introduced in this paper. The state |Q〉 represented here is a special kind of tensor network state, with the vertex functions fixed to be three
types as shown on the right side. Without the single-qubit invertible matrices Mi, which contain the model parameters, the wave function connected by Hadamard and
identity matrices just represents a graph state. To get a probability distribution from this model, we measure a subset of n qubits (among total m qubits corresponding
to the physical variables) in the computational basis under this state. The unmeasured m − n qubits are traced over to get the marginal probability distribution P({xi}) of
the measured n qubits. We prove in this paper that P({xi}) is general enough to include probability distributions of all the classical factor graphs and special enough
to allow a convenient quantum algorithm for the parameter training and inference.
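To make the definition in Eq. 1 and Fig. 1C concrete, the following brute-force sketch (our illustration, not code from the paper) builds a four-qubit instance of |Q〉 = M1 ⊗ ⋯ ⊗ Mm|G〉 with numpy, measures two visible qubits in the computational basis, and prints the resulting distribution Q({xi}). The ring topology, the choice of visible qubits, and the random (generically invertible) matrices Mi are assumptions made only for this example.

import itertools
import numpy as np

rng = np.random.default_rng(1)

m = 4                                       # total number of qubits
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]    # graph G: a ring (assumed for the example)
visible = [0, 2]                            # visible units; the rest are hidden

# Graph state |G>: start from |+>^m and apply a controlled phase flip on every edge.
psi = np.ones(2**m, dtype=complex) / np.sqrt(2**m)
for (u, v) in edges:
    for idx in range(2**m):
        if (idx >> (m - 1 - u)) & 1 and (idx >> (m - 1 - v)) & 1:
            psi[idx] *= -1

# |Q> = M1 ⊗ ... ⊗ Mm |G> with random (almost surely invertible) nonunitary 2x2 matrices.
Ms = [rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2)) for _ in range(m)]
M_total = np.eye(1)
for Mi in Ms:
    M_total = np.kron(M_total, Mi)
Q = M_total @ psi

# Q({x_i}): probability of each outcome on the visible qubits, hidden qubits traced over.
probs = {}
for x in itertools.product((0, 1), repeat=len(visible)):
    total = 0.0
    for idx in range(2**m):
        bits = [(idx >> (m - 1 - q)) & 1 for q in range(m)]
        if all(bits[q] == xv for q, xv in zip(visible, x)):
            total += abs(Q[idx])**2
    probs[x] = total
norm = sum(probs.values())
print({x: p / norm for x, p in probs.items()})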


Proof of theorem 1: First, in any factor graph with its degree bounded by a constant k, each of the functions (denoted by square nodes in Fig. 1) has at most k variables and therefore, by the universal approximation theorem (25), it can be approximated arbitrarily well with a restricted Boltzmann machine with k visible variables and 2^k hidden variables. In section S4, we give an accurate description of the universal approximation theorem and an explanation of the restricted Boltzmann machine. As the degree k of the factor graph can be reduced to a very small number (such as k = 3), the number of hidden variables 2^k only represents a moderate constant cost, which does not increase with the system size. In a restricted Boltzmann machine, only the visible and the hidden variables are connected by the two-variable correlator that takes the generic form f(x1, x2) = e^(a x1 x2 + b x1 + c x2), where x1, x2 denote the binary variables and a, b, c are real parameters. The representation precision using the universal approximation theorem does not affect the number of variables but only depends on the range of the parameters a, b, c; e.g., arbitrary precision can be achieved if a, b, c are allowed to vary over the whole real axis. As Q({xi}) has a similar factorization structure as the factor graph after measuring the visible qubits xi under a diagonal matrix Mi (see Fig. 2), it is sufficient to show that each correlator f(x1, x2) can be constructed in the QGM. This construction can be achieved by adding one hidden variable (qubit) j with the invertible matrix Mj between two visible variables x1 and x2. As shown in Fig. 2, Q({x1, x2}) can be calculated by contracting two copies of a diagram from Fig. 1C as 〈Q|x1, x2〉〈x1, x2|Q〉. To make this equal to an arbitrary correlation f(x1, x2) between the two variables x1 and x2, we first take the matrix Mj†Mj = λ1|+〉〈+| + λ2|−〉〈−|, where |±〉 = (|0〉 ± |1〉)/√2, and thus the λ1 and λ2 terms account for positive and negative correlations, respectively. Here, positive (negative) correlation means that two variables tend to be the same (different). The convex hull of these positive and negative correlations generates any symmetric correlation of the form e^(a x1 x2 − a x1/2 − a x2/2) between the two variables x1 and x2. Then, we take additional single-qubit matrices D1 and D2 to account for the remaining difference between −a x1/2 − a x2/2 and b x1 + c x2 in the correlation function f(x1, x2). Specifically, we take D1 and D2 to be diagonal with eigenvalues √(d1(x1)) and √(d2(x2)), respectively. As shown in Fig. 2 (C and D), the correlator between x1 and x2 in the QGM is then given by d1(x1) d2(x2)[λ1 δ_(x1 x2) + λ2(1 − δ_(x1 x2))]/2, and one simple solution with d1(0) = d2(0) = λ1/2 = 1 and d1(1) = e^(b + a/2), d2(1) = e^(c + a/2), λ2 = 2e^(−a/2) makes it equal to the correlation function f(x1, x2) with arbitrary coefficients a, b, c. From the above proof, each function node introduces a number of parameters of the order O(2^k), independent of the representation precision. For a factor graph of s function nodes (note that s is upper bounded by nk, where n is the number of variables), the total number of parameters is of the order O(2^k s). This completes the proof.

Furthermore, we can show that the QGM is exponentially more expressive than factor graphs in representing some probability distributions. This is summarized by the following theorem.

Theorem 2
If the polynomial hierarchy in the computational complexity theory does not collapse, there exist probability distributions that can be efficiently represented by a QGM but cannot be efficiently represented even under approximation by conditional probabilities from any classical generative models that are reducible to factor graphs.

The rigorous proof of this theorem is lengthy and technical, and we include it in section S5. Here, we briefly summarize the major idea of the proof: In the QGM, we construct the underlying distribution by adding up the probability amplitudes (complex numbers), while in the factor graphs, we only add up probabilities (positive real numbers).

Fig. 2. Efficient representation of factor graphs by the QGM. (A) General form of correlation functions of two binary variables in a factor graph, with parameters a, b, c being real. This correlation acts as the building block for general correlations in any factor graphs by use of the universal approximation theorem (25). (B) Notations of some common tensors and their identities: D is a diagonal matrix with diagonal elements √(d(x)) with x = 0, 1; Z is the diagonal Pauli matrix diag(1, −1); and |±〉 = (|0〉 ± |1〉)/√2. (C) Representation of the building block correlator f(x1, x2) in a factor graph [see (A)] by the QGM with one hidden variable (unmeasured) between two visible variables x1, x2 (measured in the computational basis). As illustrated in this figure, f(x1, x2) can be calculated as contraction of two copies of a diagram from Fig. 1C as 〈Q|x1, x2〉〈x1, x2|Q〉, where x1, x2 are measurement results of the two visible variables. We choose the single-bit matrices D1, D2 to be diagonal with D1 = diag(√(d1(0)), √(d1(1))) and D2 = diag(√(d2(0)), √(d2(1))). For simplification of this graph, we have used the identities shown in (B). (D) Further simplification of the graph in (C), where we choose the form of the single-bit matrix M†M acting on the hidden variable to be M†M = λ1|+〉〈+| + λ2|−〉〈−| with positive eigenvalues λ1, λ2. We have used the identities in (B) and the relation HZH = X, where X (H) denotes the Pauli (Hadamard) matrix, respectively. By solving the values of λ1, λ2, d1(x1), d2(x2) in terms of a, b, c (see the proof of theorem 1), this correlator of the QGM exactly reproduces the building block correlator f(x1, x2) of the factor graph.
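The parameter choices in the proof of theorem 1 and in Fig. 2 (C and D) can be checked numerically. The sketch below (our illustration under those stated choices, not code from the paper) builds the three-qubit chain x1-hidden-x2, applies D1, M, and D2 to the path graph state, traces out the hidden qubit, and verifies that the resulting marginal is proportional to the building block correlator f(x1, x2) = e^(a x1 x2 + b x1 + c x2).

import numpy as np

a, b, c = 0.7, -0.3, 1.1                     # arbitrary real parameters of f(x1, x2)
f = lambda x1, x2: np.exp(a*x1*x2 + b*x1 + c*x2)

# Parameters from the proof of theorem 1.
lam1, lam2 = 2.0, 2.0*np.exp(-a/2)
d1 = np.array([1.0, np.exp(b + a/2)])
d2 = np.array([1.0, np.exp(c + a/2)])

plus = np.array([1.0, 1.0])/np.sqrt(2)
minus = np.array([1.0, -1.0])/np.sqrt(2)
M = np.sqrt(lam1)*np.outer(plus, plus) + np.sqrt(lam2)*np.outer(minus, minus)  # so that M†M = λ1|+><+| + λ2|-><-|
D1, D2 = np.diag(np.sqrt(d1)), np.diag(np.sqrt(d2))

# Graph state on the path x1 - hidden - x2 (qubit order: x1, hidden, x2).
psi = np.ones(8)/np.sqrt(8)
for (u, v) in [(0, 1), (1, 2)]:
    for idx in range(8):
        if (idx >> (2 - u)) & 1 and (idx >> (2 - v)) & 1:
            psi[idx] *= -1

Q = np.kron(np.kron(D1, M), D2) @ psi        # |Q> = (D1 ⊗ M ⊗ D2)|G>

# Marginal of the two visible qubits; the hidden qubit is traced over.
prob = np.zeros((2, 2))
for idx in range(8):
    x1, x2 = (idx >> 2) & 1, idx & 1
    prob[x1, x2] += abs(Q[idx])**2

# Up to one overall normalization constant, the marginal reproduces f(x1, x2).
ratio = np.array([[prob[x1, x2]/f(x1, x2) for x2 in (0, 1)] for x1 in (0, 1)])
print(np.allclose(ratio, ratio[0, 0]))       # True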


The complexity of these two addition problems turns out to be different: Adding the positive probabilities up to get the normalization factor in factor graphs is a nondecreasing process, and its complexity is at a lower level of the polynomial hierarchy in the computational complexity theory according to Stockmeyer's theorem (26). In contrast, the summation of complex numbers to get the normalization factor in the QGM involves marked oscillating behaviors, similar to the sign problem in the quantum Monte Carlo method (27), which puts its computational complexity at a much higher level. Because of this difference in the computational complexity, by adding up probability amplitudes in the QGM, we can generate a much wider type of distributions, which are hard to represent by factor graphs through adding up the positive probabilities. To represent some distributions generated by the QGM with factor graphs, the number of parameters in the factor graphs is required to scale up exponentially with the size of the QGM, and therefore, there is an exponential gap between the representational power of these two models if the polynomial hierarchy in the computational complexity theory does not collapse.

Theorems 1 and 2 show that the QGM is much more powerful in representing probability distributions compared with the classical factor graphs. Representational power is important for the success of a machine learning model. It is closely related to the generalization ability. Generalization ability characterizes the ability of a model to make a good prediction for new data by using as few training data as possible. Higher representational power usually implies better generalization ability in practice (21, 28). Similar to the principle of Occam's razor in physics, a good choice for the machine learning model is the one with a minimum number of parameters that can still explain the observed training data well. The representational power characterizes this feature by representing a wide class of distributions using as few parameters as possible. Our QGM can efficiently represent some probability distributions that are out of the reach of the classical factor graphs, as the latter may need an exponentially large number of parameters for the representation. This suggests that our proposed QGM is a good choice for the machine learning model in terms of the representational power.

As any factor graphs can be efficiently represented by our QGM, this suggests that we should not expect that an arbitrary state |Q〉 can be prepared efficiently with a quantum computer. If we could prepare arbitrary |Q〉 efficiently, we could efficiently solve the inference problem for any factor graph, which is known to be an NP-hard problem and unlikely to be fully solved with a quantum computer. The detailed proof of this statement is included in section S6. For applications in generative learning, we have the freedom to choose the topology and the parameter form of |Q〉 and only need to use a subset of states |Q〉 that can be efficiently prepared. Normally, we first construct the parent Hamiltonian for the state |Q〉 (see section S7) and then use the tensor network formalism for its preparation, inference, and learning, as will be explained in the next section.

A quantum algorithm for inference and training of our QGM

For a generative model to be useful for machine learning, apart from its representational power and generalization ability, we also need to have an efficient algorithm for both training and inference. Training is the process of tuning parameters in a model to represent the probability distribution as close as possible to that of the dataset. This usually involves minimization of a cost function, which determines how close these two distributions are. Once we have a trained model, we make inference to extract useful information for analysis or prediction. Most inference tasks can be reduced to computing the conditional probability ∑y p(x, y|z) (22), where x, y, z denote different variable sets. For training, we choose to minimize the Kullback-Leibler (KL) divergence D(q_d||p_θ) = −∑v q_d(v) log(p_θ(v)/q_d(v)) between q_d, the distribution of the given data sample, and p_θ, the distribution of the generative model, with the whole parameter set denoted by θ. Typically, one minimizes D(q_d||p_θ) by optimizing the model parameters θ using the gradient descent method (21). The θ-dependent part D(θ) of D(q_d||p_θ) can be expressed as −∑_(v∈data set) log p_θ(v)/M, where M denotes the total number of data points.

In our QGM, both the conditional probability ∑y p(x, y|z) and the gradient of the KL divergence ∂θD(θ) can be conveniently calculated using the structure of the state |Q〉 defined in Fig. 1. We first define a tensor network state |Q(z)〉 ≡ (I ⊗ 〈z|)|Q〉 by projecting the variable set z to the computational basis. As shown in section S8, the conditional probability can be expressed as

∑y p(x, y|z) = 〈Q(z)|O|Q(z)〉 / 〈Q(z)|Q(z)〉    (2)

which is the expectation value of the operator O = |x〉〈x| under the state |Q(z)〉. For inference problems, we typically only need to know the distribution p(x|z) = ∑y p(x, y|z) in the label x to a constant precision. This can be conveniently achieved by measuring the operator O under the state |Q(z)〉. The measurement results correspond to an importance sampling and automatically give the dominant x labels for which the probability distribution p(x|z) has notable nonzero values. Similarly, we show in section S8 that ∂θD(θ) can be reduced to a combination of terms taking the same form as Eq. 2, with the operator O replaced by O1 = (∂θi Mi)Mi^(−1) + H.c. or O2 = |vi〉〈vi|(∂θi Mi)Mi^(−1) + H.c., where θi denotes a specific parameter in the invertible matrix Mi, vi is the qubit of data v corresponding to the variable xi, and H.c. stands for the Hermitian conjugate term. The variable z in this case takes the value of v (or v excluding vi) expressed in a binary string.

With the above simplification, training or inference in the QGM is reduced to preparation of the tensor network state |Q(z)〉. With an algorithm similar to the one in (29), we use recursive quantum phase estimation to prepare the state |Q(z)〉. For this purpose, we first construct the parent Hamiltonian H(z), which has a unique ground state given by |Q(z)〉 with zero eigenenergy in the generic case, as shown in (30). The procedure is illustrated in Fig. 3, where the variables z = {zi} are grouped as in Fig. 3A. In this case, the corresponding local tensors in the tensor network state |Q(z)〉 are all easy to compute and of a constant degree. The parent Hamiltonian H(z) is constructed through contraction of these local tensors as shown in Fig. 3 (C and D) (30).

By constructing the parent Hamiltonian for the tensor network state, the quantum algorithm for training and inference in the QGM is realized through the following steps:

Step 1: Construct a classical description of a series of tensor network states {|Qt〉} with t = 0, 1, ..., n, as shown in Fig. 3C, by reduction from |Qn〉 = |Q(z)〉. The initial state |Q0〉 is a product state |0〉^⊗n, and |Qt〉 is constructed from |Qt−1〉 by adding one more tensor in |Q(z)〉 that is not contained in |Qt−1〉 and setting the uncontracted virtual indices to 0.

Step 2: Construct a classical description of the parent Hamiltonian Ht for each |Qt〉 with the method illustrated in Fig. 3 (B to D) (30).

Step 3: On a quantum computer, starting from |Q0〉, we prepare |Q1〉, ..., |Qn〉 sequentially. Suppose that we have prepared |Qt−1〉. The following substeps will prepare |Qt〉 based on recursive quantum phase estimation to measure the eigenstates of the parent Hamiltonian Ht.
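As a concrete (and purely classical) illustration of the training objective described above, the sketch below brute-force simulates a tiny QGM, evaluates the θ-dependent part of the KL divergence, D(θ) = −(1/M) ∑v log p_θ(v), and takes one finite-difference gradient step. This is only feasible for a handful of qubits and is a stand-in for the quantum procedure: on a real instance, p_θ(v) and the gradient would be estimated from measurements of O, O1, and O2 on the prepared state |Q(z)〉 as in Eq. 2. The specific parameterization of the matrices Mi is an assumption made only for this example.

import numpy as np

def qgm_distribution(theta, m=3, edges=((0, 1), (1, 2)), visible=(0, 2)):
    """Brute-force Q({x_i}) for a tiny QGM: |Q> = (⊗_i M_i)|G>, visible qubits measured."""
    psi = np.ones(2**m) / np.sqrt(2**m)                  # graph state |G>
    for (u, v) in edges:
        for idx in range(2**m):
            if (idx >> (m - 1 - u)) & 1 and (idx >> (m - 1 - v)) & 1:
                psi[idx] *= -1
    M = np.eye(1)                                        # assumed parameterization of M_i
    for i in range(m):
        Mi = np.array([[1.0, theta[2*i]], [theta[2*i + 1], 1.0]])   # invertible for small theta
        M = np.kron(M, Mi)
    Q = M @ psi
    p = np.zeros(2**len(visible))                        # marginal over the hidden qubits
    for idx in range(2**m):
        key = 0
        for q in visible:
            key = (key << 1) | ((idx >> (m - 1 - q)) & 1)
        p[key] += abs(Q[idx])**2
    return p / p.sum()

def objective(theta, counts):
    """theta-dependent part of KL(q_d || p_theta): -(1/M) * sum_v log p_theta(v)."""
    p = qgm_distribution(theta)
    return -np.sum(counts * np.log(p + 1e-12)) / counts.sum()

counts = np.array([40, 10, 10, 40])      # toy data set: counts of the visible outcomes 00, 01, 10, 11
theta = 0.1 * np.ones(6)

eps, lr = 1e-4, 0.2                      # one finite-difference gradient-descent step
grad = np.array([(objective(theta + eps*np.eye(6)[k], counts)
                  - objective(theta - eps*np.eye(6)[k], counts)) / (2*eps) for k in range(6)])
print(objective(theta, counts), objective(theta - lr*grad, counts))   # loss before and after the step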


Fig. 3. Illustration of our training algorithm for the QGM. (A) Training and inference of the QGM are reduced to measuring certain operators under the state |Q(z)〉. The key step of the quantum algorithm is therefore to prepare the state |Q(z)〉, which is achieved by recursive quantum phase estimation of the constructed parent Hamiltonian. The variables in the set z whose values are specified are called conditioned variables, whereas the other variables that carry the binary physical index are called unconditioned variables. We group the variables in a way such that each group contains only one unconditioned variable and different groups are connected by a small constant number of edges (representing virtual indices or hidden variables). Each group then defines a tensor with one physical index (denoted by p) and a small constant number of virtual indices (denoted by i, j, k in the figure). (B) Tensor network representation of |Q(z)〉, where a local tensor is defined for each group specified in (A). (C) Tensor network representation of |Qt〉, where |Qt〉 are the series of states reduced from |Q(z)〉. In each step of the reduction, one local tensor is moved out. The moved-out local tensors are represented by the unfilled circles, each carrying a physical index set to 0. For the edges between the remaining tensor network and the moved-out tensors, we set the corresponding virtual indices to 0 (represented by the unfilled circles). (D) Construction of the parent Hamiltonian. The figure shows how to construct one term in the parent Hamiltonian, which corresponds to a group of neighboring local tensors such as those in the dashed box in (C). After contraction of all virtual indices among the group, we get a tensor Lpqr,ij, which defines a linear map L from the virtual indices i, j to the physical indices p, q, r. As the indices i, j take all the possible values, the range of this mapping L spans a subspace range(L) in the Hilbert space Hp,q,r of the physical indices p, q, r. This subspace has a complementary orthogonal subspace inside Hp,q,r, denoted by comp(L). The projector to the subspace comp(L) then defines one term in the parent Hamiltonian, and by this definition, |Qt〉 lies in the kernel space of this projector. We construct each local term with a group of neighboring tensors. Each local tensor can be involved in several Hamiltonian terms [as illustrated in (C) by the dashed box and the dotted box]; thus, some neighboring groups have nonempty overlap, and they generate terms that, in general, do not commute. By this method, one can construct the parent Hamiltonian whose ground state generally uniquely defines the state |Qt〉 (30) [technically, this step requires the tensor network to be injective (29), a condition that is generically satisfied for a random choice of the tensor networks (30)]. (E) States involved in the evolution from |Qt−1〉 to |Qt〉 by quantum phase estimation applied on their parent Hamiltonians. |Q⊥t−1〉 and |Q⊥t〉 represent the states orthogonal to |Qt−1〉 and |Qt〉, respectively, inside the two-dimensional (2D) subspace spanned by |Qt−1〉 and |Qt〉. The angle between |Qt〉 and |Qt−1〉 is determined by the overlap ηt = |〈Qt|Qt−1〉|². (F) State evolution under recursive application of the quantum phase estimation algorithm. Starting from the state |Qt−1〉, we always stop at the state |Qt〉, following any branch of this evolution, where ηt and 1 − ηt denote the probabilities of the corresponding outcomes.
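The preparation procedure depicted in Fig. 3 (E and F), and spelled out in the substeps that follow, acts entirely inside the 2D subspace spanned by |Qt−1〉 and |Qt〉, so its statistics can be emulated classically with two-component vectors. The snippet below (our illustrative simulation, with an arbitrary value of the overlap ηt) bounces between the two projective measurements until |Qt〉 is reached and checks the empirical termination probability against Eq. 3 below.

import numpy as np

rng = np.random.default_rng(0)
eta = 0.3                                        # assumed overlap |<Q_t|Q_{t-1}>|^2
Qprev = np.array([1.0, 0.0])                     # |Q_{t-1}> in the 2D invariant subspace
Qt = np.array([np.sqrt(eta), np.sqrt(1 - eta)])  # |Q_t>, with |<Q_t|Q_{t-1}>|^2 = eta

def measure(state, target):
    """Projective measurement {|target><target|, 1 - |target><target|} on a real 2-vector."""
    amp = target @ state
    if rng.random() < amp**2:
        return target.copy(), True               # collapsed onto |target>
    orth = state - amp * target
    return orth / np.linalg.norm(orth), False    # collapsed onto the orthogonal state

def prepare_once(max_rounds=100):
    """Substeps 2-4: alternate the two projective measurements until |Q_t> is obtained."""
    state, ok = measure(Qprev, Qt)               # substep 2
    rounds = 0
    while not ok and rounds < max_rounds:
        state, _ = measure(state, Qprev)         # substep 3
        state, ok = measure(state, Qt)           # substep 4
        rounds += 1
    return ok, rounds

k = 3
runs = [prepare_once() for _ in range(50000)]
empirical = np.mean([ok and r <= k for ok, r in runs])       # success within 2k + 1 measurements
predicted = 1 - (1 - eta)*(eta**2 + (1 - eta)**2)**k         # Eq. 3
print(empirical, predicted)                                  # the two values agree closely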

Substep 1: We use the quantum phase estimation algorithm to measure whether the eigenenergy of the parent Hamiltonian Ht is zero (31), which implements a projective measurement onto the corresponding eigenspaces {|Qt〉〈Qt|, I − |Qt〉〈Qt|} of Ht with zero and nonzero eigenenergies, respectively (see Fig. 3E). Similarly, we can implement the projective measurement {|Qt−1〉〈Qt−1|, I − |Qt−1〉〈Qt−1|} by quantum phase estimation using the parent Hamiltonian Ht−1.

Substep 2: Starting from |Qt−1〉, we perform the projective measurement {|Qt〉〈Qt|, I − |Qt〉〈Qt|}. If the result is |Qt〉, we succeed and skip the following substeps. Otherwise, we get |Q⊥t〉 lying in the plane spanned by |Qt−1〉 and |Qt〉.

Substep 3: We perform the projective measurement {|Qt−1〉〈Qt−1|, I − |Qt−1〉〈Qt−1|} on the state |Q⊥t〉. The result is either |Qt−1〉 or |Q⊥t−1〉.

Substep 4: We perform the projective measurement {|Qt〉〈Qt|, I − |Qt〉〈Qt|} again. We either succeed in getting |Qt〉, with probability ηt = |〈Qt|Qt−1〉|², or have |Q⊥t〉. In the latter case, we go back to substep 3 and continue until success.

Step 4: After successful preparation of the state |Q(z)〉, we measure the operator O (for inference) or O1, O2 (for training), and the expectation value of the measurement gives the required conditional probability (for inference) or the gradient of the KL divergence (for training).

In the following, we analyze the runtime of this algorithm for getting the correct answer. Step 1 computes n tensor network states with runtime O(n) because we only add one tensor with a constant number of variables in constructing each of the n states |Qt〉 with t from 1 to n. Step 2 computes n Hamiltonians with runtime O(n^(c+1)) for some constant c, which denotes the number of nodes involved in constructing a local term of the parent Hamiltonians with the method shown in Fig. 3D [thus, there are O(n^c) local terms]; Ht constructed by this method generically has the unique ground state |Qt〉 (30), and the larger c is, the more likely there is no ground-state degeneracy.

A quantum computer is required for implementation of step 3. When this step gives the correct answer, the correctness of the algorithm is guaranteed. Let Δt denote the energy gap of the parent Hamiltonian Ht. Because we require the precision of the quantum phase estimation to be Δ to distinguish the ground state from the excited states of Ht, the quantum phase estimation algorithm in substep 1 has runtime complexity Õ(n^(2c)/Δ) for the QGM defined on a constant-degree graph (32–34), where Δ = min_t Δt and Õ(⋅) suppresses slowly growing factors such as (n/Δ)^o(1) (6, 30). The key idea of step 3 is that the subspace spanned by {|Qt〉, |Qt−1〉}, as shown in Fig. 3E, is the invariant subspace of all the projective operators involved. It can be seen from the evolution tree of step 3 (shown in Fig. 3F) that the construction of |Qt〉 from |Qt−1〉 can terminate within 2k + 1 iterations with probability

p_term(k) = 1 − (1 − ηt)[ηt² + (1 − ηt)²]^k    (3)


This implies that within s steps of iteration, the probability of failure to get |Qt〉 is smaller than p_fail(s) < 1/(2sηe) for all t, where e is the base of the natural logarithm and η = min_t ηt (30). Denote the probability of failure to get |Qn〉 as ϵ; then we require (1 − p_fail)^n ≥ 1 − ϵ, so s > n/(2ηeϵ) is sufficient. Thus, the runtime of preparing |Qt〉 in step 3 is Õ(n^(2c+1)/(ηΔϵ)). Iterating from 1 to n, we can get |Q(z)〉 with a probability of at least 1 − ϵ. Measuring the operator O over |Q(z)〉 1/δ times, the result will be an approximation of the conditional probability to an additive error δ as shown in Eq. 2.

Therefore, the total runtime of approximating the conditional probability is

T = Õ(n^(2c+2)/(ηΔϵδ))    (4)

The gap Δt and the overlap ηt depend on the topology of the graph G, the variable set z, and the parameters of the matrices Mi. If these two quantities are bounded below by 1/poly(n) for all the steps t from 1 to n, this quantum algorithm will be efficient, with its runtime bounded by poly(n).

The gradient of the KL divergence for each parameter is constituted by several simple terms (see section S8), where each term is similar to the expression of the conditional probability except that the operator O is replaced by O1 or O2. The number of terms is proportional to M, which is the number of training data for a full gradient descent method, or the batch size using the stochastic gradient descent method, which usually requires only O(1) data for calculation of the gradient (22).

Quantum speedup

Although we do not expect the above algorithm to be efficient in the worst case [even for the simplified classical generative model such as the Bayesian network, the worst-case complexity is at least NP hard (35)], we know that the QGM with the above heuristic algorithm will provide exponential speedup over classical generative models for some instances. In section S9, we give a rigorous proof that our inference and learning algorithm has an exponential speedup over any classical algorithm for some instances under a well-accepted conjecture in the computational complexity theory. We arrive at the following theorem.

Theorem 3
There exist instances for computing the conditional probabilities or the gradients of the KL divergence in our QGM to an additive error 1/poly(n) such that (i) our algorithm can achieve those calculations in a polynomial time and (ii) any classical algorithm cannot accomplish them in a polynomial time if universal quantum computing cannot be efficiently simulated by a classical computer.

The proof of this theorem is lengthy and can be found in the Supplementary Materials. Here, we briefly summarize the major idea. We construct a specific |Q(z)〉 that corresponds to the tensor network representation of the history state for universal quantum circuits rearranged into a 2D spatial layout. The history state is a powerful tool in quantum complexity theory (36). It is a superposition state of the entire computing history of an arbitrary universal quantum circuit

|ψ_history〉 = 1/√(T + 1) ∑(t = 0 to T) |t〉 ⊗ Vt ⋯ V1 |0〉^⊗m    (5)

where T is the number of total gates in a quantum circuit, which is assumed to be a polynomial function of the total qubit number m; |t〉 is a quantum state encoding the step counter t; |0〉^⊗m is the input state of the circuit with m qubits; and Vt is the t-th gate. A similar type of history state has been used before to prove the QMA (quantum Merlin-Arthur) hardness for spin systems in a 2D lattice (37). For this specific |Q(z)〉, we prove that both the gap Δt and the overlap ηt scale as 1/poly(n) for all the steps t, by calculating the parent Hamiltonian of |Qt〉 with proper grouping of the local tensors. Our heuristic algorithm for training and inference therefore can be accomplished in a polynomial time if all the conditional probabilities that we need can be generated from those history states. On the other hand, our specific state |Q(z)〉 encodes universal quantum computation through representation of the history state, so it not only should include a large set of useful distributions but also cannot be achieved by any classical algorithm in a polynomial time if universal quantum computation cannot be efficiently simulated by a classical computer.

Here, we focus on the complexity-theoretic proofs to support the quantum advantages of our proposed QGM in terms of the representational power and the runtimes for learning and inference. Before ending the paper, we briefly discuss the possibility of applying the model here to solving practical problems in machine learning. We mention two examples in section S10 and show a schematic diagram there on how to embed our QGM model into a typical machine learning pipeline. The first example is on classification of handwritten digits, which is a typical task for supervised learning. The second example is on completion of incomplete pictures, a typical unsupervised learning task. The diagrams in section S10 illustrate the basic steps for these two examples, which require a combination of both classical and quantum computing. The numerical test for these two examples requires efficient classical simulation of the quantum computing part, which puts restrictions on the topology of the underlying graph states, requires a lot of numerical optimizations, and is still an ongoing project. The efficiency and runtimes for real-world problems need to be tested with a quantum computer or with classical simulation of sufficiently large quantum circuits, which, of course, is not an easy task and remains an interesting question for the future.

Summary

In summary, we have introduced a QGM for machine learning and proven that it offers potential exponential improvement in the representational power over widely used classical generative models. We have also proposed a heuristic quantum algorithm for training and making inference on our model, and proven that this quantum algorithm offers exponential speedup over classical algorithms at least for some instances if quantum computing cannot be efficiently simulated by a classical computer. Our result combines the tools of different areas and generates an intriguing link between quantum many-body physics, quantum computational complexity theory, and the machine learning frontier. This result opens a new route to apply the power of quantum computation to solving the challenging problems in machine learning and artificial intelligence, which, apart from its fundamental interest, may have wide applications in the future.

SUPPLEMENTARY MATERIALS
Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/4/12/eaat9004/DC1
Section S1. A brief introduction to generative models
Section S2. Representational power and generalization ability
Section S3. Reduction of typical generative models to factor graphs
Section S4. Universal approximation theorem
Section S5. Proof of theorem 2


Section S6. NP hardness for preparation of an arbitrary QGM state |Q〉
Section S7. Parent Hamiltonian of the state |Q〉
Section S8. Training and inference in the QGM
Section S9. Proof of theorem 3
Section S10. Applying the QGM to practical examples
Fig. S1. Parameter space of factor graph and QGM.
Fig. S2. Probabilistic graphical models.
Fig. S3. Energy-based neural networks.
Fig. S4. Simulating graphs of unbounded degrees with graphs of constantly bounded degrees.
Fig. S5. Illustration of the universal approximation theorem by a restricted Boltzmann machine.
Fig. S6. #P-hardness for the QGM.
Fig. S7. Contraction between two local tensors using the structure of QGM state and conditioned variables.
Fig. S8. Construction of the history state.
Fig. S9. Flow chart for a machine learning process using the QGM.
References (38–40)

REFERENCES AND NOTES
1. P. W. Shor, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Rev. 41, 303–332 (1999).
2. R. P. Feynman, Simulating physics with computers. Int. J. Theor. Phys. 21, 467–488 (1982).
3. S. Lloyd, Universal quantum simulators. Science 273, 1073–1078 (1996).
4. J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, S. Lloyd, Quantum machine learning. Nature 549, 195–202 (2017).
5. C. Ciliberto, M. Herbster, A. D. Ialongo, M. Pontil, A. Rocchetto, S. Severini, L. Wossnig, Quantum machine learning: A classical perspective. arXiv:1707.08561 [quant-ph] (26 July 2017).
6. A. W. Harrow, A. Hassidim, S. Lloyd, Quantum algorithm for linear systems of equations. Phys. Rev. Lett. 103, 150502 (2009).
7. N. Wiebe, D. Braun, S. Lloyd, Quantum algorithm for data fitting. Phys. Rev. Lett. 109, 050505 (2012).
8. S. Lloyd, M. Mohseni, P. Rebentrost, Quantum algorithms for supervised and unsupervised machine learning. arXiv:1307.0411 [quant-ph] (1 July 2013).
9. S. Lloyd, M. Mohseni, P. Rebentrost, Quantum principal component analysis. Nat. Phys. 10, 631–633 (2014).
10. P. Rebentrost, M. Mohseni, S. Lloyd, Quantum support vector machine for big data classification. Phys. Rev. Lett. 113, 130503 (2014).
11. I. Cong, L. Duan, Quantum discriminant analysis for dimensionality reduction and classification. New J. Phys. 18, 073011 (2016).
12. F. G. S. L. Brandão, K. Svore, Quantum speed-ups for semidefinite programming. arXiv:1609.05537 [quant-ph] (18 September 2016).
13. F. G. S. L. Brandão, A. Kalev, T. Li, C. Y.-Y. Lin, K. M. Svore, X. Wu, Quantum SDP solvers: Large speed-ups, optimality, and applications to quantum learning. arXiv:1710.02581 [quant-ph] (6 October 2017).
14. E. Farhi, J. Goldstone, S. Gutmann, J. Lapan, A. Lundgren, D. Preda, A quantum adiabatic evolution algorithm applied to random instances of an NP-complete problem. Science 292, 472–475 (2001).
15. V. Giovannetti, S. Lloyd, L. Maccone, Quantum random access memory. Phys. Rev. Lett. 100, 160501 (2008).
16. X. X. Yuan, J.-J. Ma, P.-Y. Hou, X.-Y. Chang, C. Zu, L.-M. Duan, Experimental demonstration of a quantum router. Sci. Rep. 5, 12452 (2015).
17. V. Giovannetti, S. Lloyd, L. Maccone, Architectures for a quantum random access memory. Phys. Rev. A 78, 052310 (2008).
18. S. Aaronson, Read the fine print. Nat. Phys. 11, 291–293 (2015).
19. T. Jebara, Machine Learning: Discriminative and Generative (Springer Science & Business Media, 2012), vol. 755.
20. S. Russell, P. Norvig, Artificial Intelligence: A Modern Approach (Prentice Hall Press, ed. 3, 2009).
21. I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (MIT Press, 2016).
22. C. M. Bishop, Pattern Recognition and Machine Learning (Springer, 2006).
23. M. H. Amin, E. Andriyash, J. Rolfe, B. Kulchytskyy, R. Melko, Quantum Boltzmann machine. Phys. Rev. X 8, 021050 (2018).
24. R. Raussendorf, H. J. Briegel, A one-way quantum computer. Phys. Rev. Lett. 86, 5188–5191 (2001).
25. N. Le Roux, Y. Bengio, Representational power of restricted Boltzmann machines and deep belief networks. Neural Comput. 20, 1631–1649 (2008).
26. L. J. Stockmeyer, The polynomial-time hierarchy. Theor. Comput. Sci. 3, 1–22 (1976).
27. E. Y. Loh Jr., J. E. Gubernatis, R. T. Scalettar, S. R. White, D. J. Scalapino, R. L. Sugar, Sign problem in the numerical simulation of many-electron systems. Phys. Rev. B 41, 9301–9307 (1990).
28. S. Shalev-Shwartz, S. Ben-David, Understanding Machine Learning: From Theory to Algorithms (Cambridge Univ. Press, 2014).
29. M. Schwarz, K. Temme, F. Verstraete, Preparing projected entangled pair states on a quantum computer. Phys. Rev. Lett. 108, 110502 (2012).
30. D. Perez-Garcia, F. Verstraete, M. M. Wolf, J. I. Cirac, PEPS as unique ground states of local Hamiltonians. Quantum Inf. Comput. 8, 650–663 (2008).
31. D. S. Abrams, S. Lloyd, Quantum algorithm providing exponential speed increase for finding eigenvalues and eigenvectors. Phys. Rev. Lett. 83, 5162–5165 (1999).
32. D. Nagaj, P. Wocjan, Y. Zhang, Fast amplification of QMA. Quantum Inf. Comput. 9, 1053–1068 (2009).
33. M. A. Nielsen, I. L. Chuang, Quantum Computation and Quantum Information (Cambridge Univ. Press, 2000).
34. D. W. Berry, G. Ahokas, R. Cleve, B. C. Sanders, Efficient quantum algorithms for simulating sparse Hamiltonians. Commun. Math. Phys. 270, 359–371 (2007).
35. P. Dagum, M. Luby, Approximating probabilistic inference in Bayesian belief networks is NP-hard. Artif. Intell. 60, 141–153 (1993).
36. A. Yu. Kitaev, A. H. Shen, M. N. Vyalyi, Classical and Quantum Computation (American Mathematical Society, 2002), vol. 47.
37. R. Oliveira, B. M. Terhal, The complexity of quantum spin systems on a two-dimensional square lattice. Quantum Inf. Comput. 8, 900–924 (2008).
38. X. Gao, S.-T. Wang, L.-M. Duan, Quantum supremacy for simulating a translation-invariant Ising spin model. Phys. Rev. Lett. 118, 040502 (2017).
39. S. Aaronson, A. Cojocaru, A. Gheorghiu, E. Kashefi, On the implausibility of classical client blind quantum computing. arXiv:1704.08482 [quant-ph] (27 April 2017).
40. J. Kempe, A. Kitaev, O. Regev, The complexity of the local Hamiltonian problem. SIAM J. Comput. 35, 1070–1097 (2006).

Acknowledgments: We thank D. Deng, J. Liu, T. Liu, M. Lukin, S. Lu, L. Wang, S. Wang, and Z. Wei for helpful discussions. Funding: This work was supported by the Ministry of Education and the National Key Research and Development Program of China (2016YFA0301902). L.-M.D. and Z.-Y.Z. also acknowledge support from the MURI program. Author contributions: X.G. and Z.-Y.Z. carried out the work under L.-M.D.'s supervision. All the authors made substantial contributions to this work. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors. Correspondence and requests for materials should be addressed to L.-M.D. ([email protected]) or X.G. ([email protected]).

Submitted 16 April 2018
Accepted 7 November 2018
Published 7 December 2018
10.1126/sciadv.aat9004

Citation: X. Gao, Z.-Y. Zhang, L.-M. Duan, A quantum machine learning algorithm based on generative models. Sci. Adv. 4, eaat9004 (2018).
