RESULTS

Factor graphs and our QGM
We start by defining factor graphs and our QGM. Direct characterization of a probability distribution of n binary variables has an exponential cost of 2^n. A factor graph, which includes many classical generative models as special cases, is a compact way to represent n-particle correlations (21, 22). As shown in Fig. 1A, a factor graph is associated with a bipartite graph where the probability distribution can be expressed as a product of positive correlation functions of a constant number of variables. Here, without loss of generality, we have assumed constant-degree graphs, where the maximum number of edges per vertex is bounded by a constant.
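As a concrete illustration of this definition (not taken from the paper, and with arbitrary placeholder functions), the following Python sketch evaluates the distribution of the Fig. 1A factor graph by brute force:

```python
import itertools

# Hypothetical positive factors wired as in Fig. 1A:
# p(x1,...,x5) = f1(x1,x2,x3,x4) * f2(x1,x4,x5) * f3(x3,x4) / Z
f1 = lambda x1, x2, x3, x4: 1.0 + x1 * x2 + x3 * x4
f2 = lambda x1, x4, x5: 2.0 ** (x1 + x4 * x5)
f3 = lambda x3, x4: 1.5 if x3 == x4 else 0.5

def unnormalized(x):
    x1, x2, x3, x4, x5 = x
    return f1(x1, x2, x3, x4) * f2(x1, x4, x5) * f3(x3, x4)

# The normalization Z sums over all 2^n assignments; this brute-force
# cost is exactly the exponential blowup that compact models sidestep.
states = list(itertools.product([0, 1], repeat=5))
Z = sum(unnormalized(x) for x in states)
p = {x: unnormalized(x) / Z for x in states}
assert abs(sum(p.values()) - 1.0) < 1e-12
```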
Our QGM is defined on a graph state |G〉 of m qubits. As a powerful tool to represent many-body entangled states, the graph state |G〉 is defined on a graph G (24), where each vertex in G is associated with a qubit. To prepare the graph state |G〉, all the qubits are initialized to the state |+〉 = (|0〉 + |1〉)/√2 at the beginning, where |0〉, |1〉 denote the qubit computational basis vectors, and then a controlled phase flip gate is applied to all the qubit pairs connected by an edge in the graph G. We then introduce the following transformation to the graph state |G〉: an invertible single-qubit matrix Mi (det Mi ≠ 0), which carries the model parameters, is applied to each qubit (see Fig. 1C), so that the resulting state |Q〉 can be written as a special tensor network state. We define our QGM model in this form for two reasons: First, the probability distribution Q({xi}) needs to be general enough to include all the factor graphs; second, for the specific form of the state |Q〉 introduced in this paper, the parameters in this model can be conveniently trained by a quantum algorithm on any given dataset.
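The construction can be mimicked numerically on a few qubits. The sketch below assumes, following Fig. 1C, that |Q〉 is obtained by applying an invertible single-qubit matrix to each qubit of the graph state; the graph, the random matrices, and the choice of measured qubits are illustrative assumptions only:

```python
import numpy as np

def graph_state(m, edges):
    # |G>: all qubits in |+>, then a controlled phase flip on each edge.
    psi = np.ones(2 ** m) / np.sqrt(2 ** m)
    for a, b in edges:
        for idx in range(2 ** m):
            if (idx >> (m - 1 - a)) & 1 and (idx >> (m - 1 - b)) & 1:
                psi[idx] *= -1
    return psi.astype(complex)

def apply_single_qubit(psi, m, q, M):
    # Apply a (generally nonunitary) invertible 2x2 matrix M to qubit q.
    psi = np.tensordot(M, psi.reshape([2] * m), axes=([1], [q]))
    return np.moveaxis(psi, 0, q).reshape(-1)

m, edges = 3, [(0, 1), (1, 2)]
rng = np.random.default_rng(0)
psi = graph_state(m, edges)
for q in range(m):
    M = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))  # det != 0 generically
    psi = apply_single_qubit(psi, m, q, M)

# Measure qubits 0 and 1 in the computational basis, trace out qubit 2:
probs = np.abs(psi.reshape(2, 2, 2)) ** 2
P = probs.sum(axis=2) / probs.sum()  # marginal P(x0, x1), entries sum to 1
print(P)
```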
Representational power of our QGM
Representational power is a key property of a generative model. It sets the ultimate limit to which the model might be useful. The more probability distributions a generative model can efficiently represent, the wider applications it can potentially have. The representational power is also closely related to the so-called generalization ability of a probabilistic model (see section S2). In this subsection, we prove that the QGM introduced above is exponentially more expressive in terms of the representational power compared with the classical factor graphs. This result is more accurately described by theorems 1 and 2. First, we show that any factor graph can be viewed as a special case of our QGM by the following theorem.

Theorem 1
[Fig. 1, panels A to D: diagrams of the factor graph of panel (A), a tensor network state, and the QGM. The three allowed vertex tensors of the QGM are the copy tensor δij δjk, the Hadamard tensor with entries proportional to (−1)^(ij), and single-qubit matrices M = (Mij) with det M ≠ 0.]
Fig. 1. Classical and quantum generative models. (A) Illustration of a factor graph, which includes widely used classical generative models as its special cases. A factor graph is a bipartite graph where one group of the vertices represents variables (denoted by circles) and the other group of vertices represents positive functions (denoted by squares) acting on the connected variables. The corresponding probability distribution is given by the product of all these functions. For instance, the probability distribution in (A) is p(x1, x2, x3, x4, x5) = f1(x1, x2, x3, x4)f2(x1, x4, x5)f3(x3, x4)/Z, where Z is a normalization factor. Each variable connects to at most a constant number of functions that introduce correlations in the probability distribution. (B) Illustration of a tensor network state. Each unshared (shared) edge represents a physical (hidden) variable, and each vertex represents a complex function of the variables on its connected edges. The wave function of the physical variables is defined as a product of the functions on all the vertices, after summation (contraction) of the hidden variables. Note that a tensor network state can be regarded as a quantum version of the factor graph after partial contraction (similar to the marginal probability in the classical case), with positive real functions replaced by complex functions. (C) Definition of a QGM introduced in this paper. The state |Q〉 represented here is a special kind of tensor network state, with the vertex functions fixed to be the three types shown on the right side. Without the single-qubit invertible matrices Mi, which contain the model parameters, the wave function connected by Hadamard and identity matrices just represents a graph state. To get a probability distribution from this model, we measure a subset of n qubits (among the total m qubits corresponding to the physical variables) in the computational basis under this state. The unmeasured m − n qubits are traced over to get the marginal probability distribution P({xi}) of the measured n qubits. We prove in this paper that P({xi}) is general enough to include the probability distributions of all the classical factor graphs and special enough to allow a convenient quantum algorithm for the parameter training and inference.
Each positive function in such a graph depends on only a constant number k of binary variables; by the universal approximation theorem (25), it can be approximated arbitrarily well with a restricted Boltzmann machine with k visible variables and 2^k hidden variables. In section S4, we give an accurate description of the universal approximation theorem and an explanation of the restricted Boltzmann machine. As the degree k of the factor graph can be reduced to a very small number (such as k = 3), the number of hidden variables 2^k only represents a moderate constant cost, which does not increase with the system size. In a restricted Boltzmann machine, only the visible and the hidden variables are connected by the two-variable correlator that takes the generic form f(x1, x2) = e^(ax1x2 + bx1 + cx2), where x1, x2 denote the binary variables and a, b, c are real parameters. The representation precision using the universal approximation theorem does not affect the number of variables but only depends on the range of the parameters a, b, c; e.g., arbitrary precision can be achieved if a, b, c are allowed to vary over the whole real axis. As Q({xi}) has a similar factorization structure as the factor graph after measuring the visible qubits xi under a diagonal matrix Mi (see Fig. 2), it is sufficient to show that each correlator f(x1, x2) can be constructed in the QGM. This construction can be achieved by adding one hidden variable (qubit) j with the invertible matrix Mj between the two visible variables x1, x2, together with additional single-qubit matrices D1 and D2 to account for the remaining difference between −ax1/2 − ax2/2 and bx1 + cx2 in the correlation function f(x1, x2). Specifically, we take D1 and D2 to be diagonal with eigenvalues √d1(x1) and √d2(x2), respectively. As shown in Fig. 2 (C and D), the correlator between x1 and x2 in the QGM is then given by d1(x1)d2(x2)[λ1δx1x2 + λ2(1 − δx1x2)]/2, and one simple solution with d1(0) = d2(0) = λ1/2 = 1 and d1(1) = e^(b + a/2), d2(1) = e^(c + a/2), λ2 = 2e^(−a/2) makes it equal to the correlation function f(x1, x2) with arbitrary coefficients a, b, c. From the above proof, each function node introduces a number of parameters of the order O(2^k), independent of the representation precision. For a factor graph of s function nodes (note that s is upper bounded by nk, where n is the number of variables), the total number of parameters is of the order O(2^k s). This completes the proof.
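The stated solution can be verified directly; the short consistency check below (ours, not from the paper) confirms that this choice of λ1, λ2, d1, d2 reproduces f(x1, x2) for random a, b, c:

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(100):
    a, b, c = rng.uniform(-2, 2, size=3)
    lam1, lam2 = 2.0, 2.0 * np.exp(-a / 2)   # lambda1/2 = 1 fixes lambda1 = 2
    d1 = {0: 1.0, 1: np.exp(b + a / 2)}
    d2 = {0: 1.0, 1: np.exp(c + a / 2)}
    for x1 in (0, 1):
        for x2 in (0, 1):
            delta = 1.0 if x1 == x2 else 0.0
            qgm = d1[x1] * d2[x2] * (lam1 * delta + lam2 * (1 - delta)) / 2
            target = np.exp(a * x1 * x2 + b * x1 + c * x2)  # f(x1, x2)
            assert np.isclose(qgm, target)
```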
Furthermore, we can show that the QGM is exponentially more expressive than factor graphs in representing some probability distributions. This is summarized by the following theorem.

Theorem 2
If the polynomial hierarchy in the computational complexity theory does not collapse, there exist probability distributions that can be efficiently represented by the QGM but cannot be efficiently represented by any classical factor graph.
[Fig. 2, panels A to D: diagrams of the building block correlator f(x1, x2) = e^(ax1x2 + bx1 + cx2), the tensor notations and identities of panel (B), the QGM representation of the correlator with one hidden variable, and its stepwise simplification, as described in the caption below.]
Fig. 2. Efficient representation of factor graphs by the QGM. (A) General form of correlation functions of two binary variables in a factor graph, with parameters a, b, c being real. This correlation acts as the building block for general correlations in any factor graphs by use of the universal approximation theorem (25). (B) Notations of some common tensors and their identities: D is a diagonal matrix with diagonal elements √d(x) with x = 0, 1; Z is the diagonal Pauli matrix diag(1, −1); and |±〉 = (|0〉 ± |1〉)/√2. (C) Representation of the building block correlator f(x1, x2) in a factor graph [see (A)] by the QGM with one hidden variable (unmeasured) between two visible variables x1, x2 (measured in the computational basis). As illustrated in this figure, f(x1, x2) can be calculated as the contraction of two copies of a diagram from Fig. 1C as 〈Q|x1, x2〉〈x1, x2|Q〉, where x1, x2 are the measurement results of the two visible variables. We choose the single-bit matrices D1, D2 to be diagonal with D1 = diag(√d1(0), √d1(1)) and D2 = diag(√d2(0), √d2(1)). For simplification of this graph, we have used the identities shown in (B). (D) Further simplification of the graph in (C), where we choose the form of the single-bit matrix M†M acting on the hidden variable to be M†M = λ1|+〉〈+| + λ2|−〉〈−| with positive eigenvalues λ1, λ2. We have used the identities in (B) and the relation HZH = X, where X (H) denotes the Pauli (Hadamard) matrix, respectively. By solving the values of λ1, λ2, d1(x1), d2(x2) in terms of a, b, c (see the proof of theorem 1), this correlator of the QGM exactly reproduces the building block correlator f(x1, x2) of the factor graph.
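The caption's simplification can also be checked end to end by building the three-qubit chain state (D1 ⊗ M ⊗ D2)|G〉, projecting the two visible qubits, and tracing out the hidden one. In the sketch below, the chain graph, the choice of M as a positive square root of M†M, and the normalization convention are our illustrative assumptions; the check therefore verifies a constant ratio rather than strict equality:

```python
import numpy as np

def cz(psi, a, b, m):
    # Controlled phase flip between qubits a and b.
    psi = psi.reshape([2] * m).copy()
    idx = [slice(None)] * m
    idx[a] = 1; idx[b] = 1
    psi[tuple(idx)] *= -1
    return psi.reshape(-1)

d1 = {0: 1.0, 1: 0.7}; d2 = {0: 1.0, 1: 2.3}       # arbitrary positive d's
lam1, lam2 = 1.8, 0.4                               # positive eigenvalues of M^dag M
plus = np.array([1.0, 1.0]) / np.sqrt(2)
minus = np.array([1.0, -1.0]) / np.sqrt(2)
M = np.sqrt(lam1) * np.outer(plus, plus) + np.sqrt(lam2) * np.outer(minus, minus)
D1 = np.diag([np.sqrt(d1[0]), np.sqrt(d1[1])])
D2 = np.diag([np.sqrt(d2[0]), np.sqrt(d2[1])])

psi = np.ones(8) / np.sqrt(8)                       # |+++> on the chain x1 - j - x2
psi = cz(psi, 0, 1, 3); psi = cz(psi, 1, 2, 3)      # graph state |G>
Q = np.einsum('ab,cd,ef,bdf->ace', D1, M, D2, psi.reshape(2, 2, 2))

for x1 in (0, 1):
    for x2 in (0, 1):
        prob = np.sum(np.abs(Q[x1, :, x2]) ** 2)    # trace out the hidden qubit j
        delta = 1.0 if x1 == x2 else 0.0
        target = d1[x1] * d2[x2] * (lam1 * delta + lam2 * (1 - delta)) / 2
        print(prob / target)                        # the same constant for all x1, x2
```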
The summation of positive probabilities to get the normalization factor in factor graphs is a nondecreasing process, and its complexity is at a lower level of the polynomial hierarchy in the computational complexity theory according to Stockmeyer's theorem (26). In contrast, the summation of complex numbers to get the normalization factor in the QGM involves marked oscillating behaviors, similar to the sign problem in the quantum Monte Carlo method (27), which puts its computational complexity at a much higher level. Because of this difference in the computational complexity, by adding up probability amplitudes in the QGM, we can generate a much wider class of distributions, which are hard to represent by factor graphs through adding up positive probabilities. To represent some distributions generated by the QGM with factor graphs, the number of parameters in the factor graphs is required to scale up exponentially with the size of the QGM; therefore, there is an exponential gap between the representational power of these two models if the polynomial hierarchy in the computational complexity theory does not collapse.

Theorems 1 and 2 show that the QGM is much more powerful than the classical factor graphs in representing probability distributions. Representational power is important for the success of a machine learning model. It is closely related to the generalization ability.

To learn from a given dataset, we choose to minimize the Kullback-Leibler (KL) divergence D(qd||pθ) = −∑v qd(v)log(pθ(v)/qd(v)) between qd, the distribution of the given data sample, and pθ, the distribution of the generative model, with the whole parameter set denoted by θ. Typically, one minimizes D(qd||pθ) by optimizing the model parameters θ using the gradient descent method (21). The θ-dependent part D(θ) of D(qd||pθ) can be expressed as −∑v∈data set log pθ(v)/M, where M denotes the total number of data points.
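For orientation, the classical objective being discussed fits in a few lines; the sketch below is generic negative log-likelihood minimization with a toy product-of-Bernoullis model pθ (it illustrates D(θ) and gradient descent, not the quantum routine developed next):

```python
import numpy as np

def D_theta(theta, data, log_p):
    # theta-dependent part of the KL divergence: -(1/M) sum_v log p_theta(v)
    return -np.mean([log_p(theta, v) for v in data])

def train(theta, data, log_p, lr=0.1, steps=200, eps=1e-5):
    # Plain gradient descent with finite-difference gradients, for illustration.
    for _ in range(steps):
        grad = np.zeros_like(theta)
        for i in range(len(theta)):
            dt = np.zeros_like(theta); dt[i] = eps
            grad[i] = (D_theta(theta + dt, data, log_p)
                       - D_theta(theta - dt, data, log_p)) / (2 * eps)
        theta = theta - lr * grad
    return theta

def log_p(theta, v):
    # Toy model: independent bits with p(v_i = 1) = sigmoid(theta_i).
    q = 1.0 / (1.0 + np.exp(-theta))
    return np.sum(v * np.log(q) + (1 - v) * np.log(1 - q))

data = np.array([[0, 1, 1], [0, 1, 0], [0, 1, 1], [1, 1, 1]])
theta = train(np.zeros(3), data, log_p)
```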
In our QGM, both the conditional probability ∑y p(x, y|z) and the gradient of the KL divergence ∂θD(θ) can be conveniently calculated using the structure of the state |Q〉 defined in Fig. 1. We first define a tensor network state |Q(z)〉 ≡ (I ⊗ 〈z|)|Q〉 by projecting the variable set z to the computational basis. As shown in section S8, the conditional probability can be expressed as

∑y p(x, y|z) = 〈Q(z)|O|Q(z)〉/〈Q(z)|Q(z)〉    (2)

which is the expectation value of the operator O = |x〉〈x| under the state |Q(z)〉.
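Eq. 2 can be illustrated numerically on a toy state with one qubit in each of the sets x, y, and z; the random tensor below merely stands in for |Q〉, and O is taken to act as |x〉〈x| on the measured qubit and as the identity on the unmeasured one:

```python
import numpy as np

rng = np.random.default_rng(2)
Q = rng.normal(size=(2, 2, 2)) + 1j * rng.normal(size=(2, 2, 2))  # stand-in |Q>

z, x = 1, 0
Qz = Q[:, :, z]                      # |Q(z)> = (I (x) <z|)|Q>, unnormalized

# Left side of Eq. 2: sum_y p(x, y | z) from the Born rule on the z-slice
p_joint = np.abs(Qz) ** 2
lhs = p_joint[x, :].sum() / p_joint.sum()

# Right side of Eq. 2: <Q(z)|O|Q(z)> / <Q(z)|Q(z)> with O = |x><x| (x) I
rhs = (np.abs(Qz[x, :]) ** 2).sum() / (np.abs(Qz) ** 2).sum()
assert np.isclose(lhs, rhs)
```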
[Fig. 3, panels A to F: grouping of conditioned and unconditioned variables into local tensors A^p_ijk (A and B), tensor network representations of |Q(z)〉 and the reduced states |Qt〉 (B and C), construction of the local terms Lpqr,ij of the parent Hamiltonian (D), and the projective measurements and evolution tree of step 3 with success probability ηt per branch (E and F), as described in the caption below.]
Fig. 3. Illustration of our training algorithm for the QGM. (A) Training and inference of the QGM are reduced to measuring certain operators under the state |Q(z)〉. The key step of the quantum algorithm is therefore to prepare the state |Q(z)〉, which is achieved by recursive quantum phase estimation of the constructed parent Hamiltonian. The variables in the set z whose values are specified are called conditioned variables, whereas the other variables that carry the binary physical index are called unconditioned variables. We group the variables in a way such that each group contains only one unconditioned variable and different groups are connected by a small constant number of edges (representing virtual indices or hidden variables). Each group then defines a tensor with one physical index (denoted by p) and a small constant number of virtual indices (denoted by i, j, k in the figure). (B) Tensor network representation of |Q(z)〉, where a local tensor is defined for each group specified in (A). (C) Tensor network representation of |Qt〉, where |Qt〉 are the series of states reduced from |Q(z)〉. In each step of the reduction, one local tensor is moved out. The moved-out local tensors are represented by the unfilled
Substep 1: We use the quantum phase estimation algorithm to measure whether the eigenenergy of the parent Hamiltonian Ht is zero (31), which implements a projective measurement onto the corresponding eigenspaces {|Qt〉〈Qt|, I − |Qt〉〈Qt|} of Ht with zero and nonzero eigenenergies, respectively (see Fig. 3E). Similarly, we can implement the projective measurement {|Qt−1〉〈Qt−1|, I − |Qt−1〉〈Qt−1|} by the quantum phase estimation using the parent Hamiltonian Ht−1.
Substep 2: Starting from |Qt−1〉, we perform the projective measurement {|Qt〉〈Qt|, I − |Qt〉〈Qt|}. If the result is |Qt〉, we succeed and skip the following substeps. Otherwise, we get |Qt⊥〉 lying in the plane spanned by |Qt−1〉 and |Qt〉.
Substep 3: We perform the projective measurement {|Qt−1〉〈Qt−1|, I − |Qt−1〉〈Qt−1|} on the state |Qt⊥〉. The result is either |Qt−1〉 or |Qt−1⊥〉.
Substep 4: We perform the projective measurement {|Qt〉〈Qt|, I − |Qt〉〈Qt|} again. We either succeed in getting |Qt〉, with probability ηt = |〈Qt|Qt−1〉|², or have |Qt⊥〉. In the latter case, we go back to substep 3 and continue until success.
Step 4: After successful preparation of the state |Q(z)〉, we measure the operator O (for inference) or O1, O2 (for training), and the expectation value of the measurement gives the required conditional probability (for inference) or the gradient of the KL divergence (for training).
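Numerically, the ideal outcome of such a phase-estimation step is a projective measurement onto the zero-energy eigenspace. The toy sketch below mimics substep 1 with a random positive semidefinite matrix standing in for the parent Hamiltonian (the real construction is the paper's; everything else here is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
dim = 8

# Stand-in for a parent Hamiltonian H_t: Hermitian, H >= 0, with a unique
# zero-energy ground state |Q_t>.
q_t = rng.normal(size=dim) + 1j * rng.normal(size=dim)
q_t /= np.linalg.norm(q_t)
comp = np.eye(dim) - np.outer(q_t, q_t.conj())   # projector onto span{|Q_t>}^perp
H = comp @ np.diag(rng.uniform(1.0, 2.0, dim)) @ comp

def measure_ground(H, psi, tol=1e-9):
    # Projective measurement {P0, I - P0}, where P0 projects onto ker H.
    vals, vecs = np.linalg.eigh(H)
    zero = vecs[:, np.abs(vals) < tol]
    branch = zero @ (zero.conj().T @ psi)
    p0 = float(np.linalg.norm(branch) ** 2)
    if rng.random() < p0:
        return True, branch / np.linalg.norm(branch)
    rest = psi - branch
    return False, rest / np.linalg.norm(rest)

psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
psi /= np.linalg.norm(psi)
ok, post = measure_ground(H, psi)
print(ok, abs(np.vdot(q_t, post)))  # on success, post equals |Q_t> up to a phase
```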
In the following, we analyze the runtime of this algorithm for getting the correct answer. Step 1 computes n tensor network states with runtime O(n) because we only add one tensor with a constant number of variables in constructing each of n states |Qt〉 with t from 1 to n. Step 2 computes n Hamiltonians with runtime O(n^(c+1)) for some constant c, which denotes the number of nodes involved for constructing a local term of the parent Hamiltonians with the method shown in Fig. 3D [thus, there are O(n^c) local terms]; Ht constructed by this method generically has the unique ground state |Qt〉 (30), and the larger c is, the more likely there is no ground-state degeneracy.
A quantum computer is required for implementation of step 3. When this step gives the correct answer, the correctness of this algorithm is guaranteed. Let Δt denote the energy gap of the parent Hamiltonian Ht. Because we require the precision of the quantum phase estimation to be Δ to distinguish the ground state from the excited states of Ht, the quantum phase estimation algorithm in substep 1 has runtime complexity Õ(n^(2c)/Δ) for the QGM defined on a constant-degree graph (32–34), where Δ = min_t Δt and Õ(·) suppresses slowly growing factors such as (n/Δ)^(o(1)) (6, 30). The key idea of step 3 is that the subspace spanned by {|Qt〉, |Qt−1〉}, as shown in Fig. 3E, is the invariant subspace of all the projective operators involved. It can be seen from the evolution tree of step 3 (shown in Fig. 3F) that the construction of |Qt〉 from |Qt−1〉 can terminate within 2k + 1 iterations with probability

pterm(k) = 1 − (1 − ηt)[ηt² + (1 − ηt)²]^k    (3)

This implies that within s steps of iteration, the probability of failure to get |Qt〉 is smaller than pfail(s) < 1/(2sηe) for all t, where e is the base of the natural logarithm and η = min_t ηt (30).
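Eq. 3 follows from the two-dimensional geometry of the measurement cascade, and it can be reproduced by a direct Monte Carlo simulation; the only assumption in the sketch below is that every projector acts inside span{|Qt〉, |Qt−1〉} with overlap ηt = |〈Qt|Qt−1〉|²:

```python
import numpy as np

rng = np.random.default_rng(3)
eta, k, trials = 0.3, 4, 200_000

def measure(state, axis):
    # Projective measurement onto |axis> versus its orthogonal complement
    # within the 2D subspace; returns (outcome, post-measurement state).
    p = abs(np.vdot(axis, state)) ** 2
    if rng.random() < p:
        return True, axis
    orth = state - np.vdot(axis, state) * axis
    return False, orth / np.linalg.norm(orth)

q_t = np.array([1.0, 0.0])                           # |Q_t>
q_tm1 = np.array([np.sqrt(eta), np.sqrt(1 - eta)])   # |Q_{t-1}>

wins = 0
for _ in range(trials):
    ok, state = measure(q_tm1, q_t)                  # substep 2
    rounds = 0
    while not ok and rounds < k:                     # substeps 3 and 4
        _, state = measure(state, q_tm1)
        ok, state = measure(state, q_t)
        rounds += 1
    wins += ok
predicted = 1 - (1 - eta) * (eta**2 + (1 - eta)**2) ** k
print(wins / trials, predicted)  # the two numbers agree closely
```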
Denote the probability of failure to get |Qn〉 as ε; then we require (1 − pfail)^n ≥ 1 − ε, so s > n/(2ηeε) is sufficient. Thus, the runtime of preparing |Qt〉 in step 3 is Õ(n^(2c+1)/(ηΔε)). Iterating from 1 to n, we can get |Q(z)〉 with a probability of at least 1 − ε. Measuring the operator O over |Q(z)〉 1/δ times, the result will be an approximation of the conditional probability to an additive error δ, as shown in Eq. 2.
Therefore, the total runtime of approximating the conditional probability is

T = Õ(n^(2c+2)/(ηΔεδ))    (4)

The gap Δt and the overlap ηt depend on the topology of the graph G, the variable set z, and the parameters of the matrices Mi. If these two quantities are lower bounded by 1/poly(n) for all the steps t from 1 to n, this quantum algorithm will be efficient, with its runtime bounded by poly(n).
Gradient of the KL divergence for each parameter is constituted by several simple terms (see section S8), where each term is similar to the expectation value in Eq. 2 and can be measured in the same way. For the efficiency analysis, an important example is a state |Q(z)〉 that represents the history state of a quantum circuit,
|ψhistory〉 = (1/√(T + 1)) ∑_{t=0}^{T} |t〉 ⊗ Vt⋯V1|0〉^⊗m    (5)

where T is the number of total gates in a quantum circuit, which is assumed to be a polynomial function of the total qubit number m; |t〉 is a quantum state encoding the step counter t; |0〉^⊗m is the input state of the circuits with m qubits; and Vt is the tth gate.
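To unpack Eq. 5, the following sketch assembles |ψhistory〉 for a toy two-gate circuit on m = 2 qubits (the gates are arbitrary illustrative choices, not ones from the paper):

```python
import numpy as np

# Toy circuit: V1 = Hadamard on qubit 0, V2 = CNOT(0 -> 1).
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
V1 = np.kron(H, np.eye(2))
V2 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
gates = [V1, V2]
T = len(gates)

psi0 = np.zeros(4); psi0[0] = 1.0        # |0>^(x)m
clock = np.eye(T + 1)                    # step-counter states |t>, t = 0..T

history = np.zeros((T + 1) * 4)
state = psi0
for t in range(T + 1):
    history += np.kron(clock[t], state)  # |t> (x) V_t ... V_1 |0>^(x)m
    if t < T:
        state = gates[t] @ state
history /= np.sqrt(T + 1)
assert np.isclose(np.linalg.norm(history), 1.0)
```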
A similar type of the history state has been used before to prove the QMA (quantum Merlin-Arthur) hardness for spin systems in a 2D lattice (37). For this specific |Q(z)〉, we prove that both the gap Δt and the overlap ηt scale as 1/poly(n) for all the steps t, by calculating the parent Hamiltonian of |Qt〉 with proper grouping of the local tensors. Our heuristic algorithm for training and inference therefore can be accomplished in a polynomial time if all the conditional probabilities that we need can be generated from those history states. On the other hand, our specific state |Q(z)〉 encodes universal quantum computation through representation of the history state, so it not only should include a large set of useful distributions but also cannot be achieved by any classical algorithm in a polynomial time if universal quantum computation cannot be efficiently simulated by a classical computer.
Here, we focus on the complexity theoretical proofs to support the quantum advantages of our proposed QGM in terms of the representational power and the runtimes for learning and inference. Before ending the paper, we briefly discuss the possibility of applying the model here to solving practical problems in machine learning. We mention two examples in section S10 and show a schematic diagram there on how to use the QGM in a machine learning process (fig. S9).

SUPPLEMENTARY MATERIALS
Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/4/12/eaat9004/DC1
Section S1. A brief introduction to generative models
Section S2. Representational power and generalization ability
Section S3. Reduction of typical generative models to factor graphs
Section S4. Universal approximation theorem
Section S5. Proof of theorem 2
Section S6. NP hardness for preparation of an arbitrary QGM state |Q〉
Section S7. Parent Hamiltonian of the state |Q〉
Section S8. Training and inference in the QGM
Section S9. Proof of theorem 3
Section S10. Applying the QGM to practical examples
Fig. S1. Parameter space of factor graph and QGM.
Fig. S2. Probabilistic graphical models.
Fig. S3. Energy-based neural networks.
Fig. S4. Simulating graphs of unbounded degrees with graphs of constantly bounded degrees.
Fig. S5. Illustration of the universal approximation theorem by a restricted Boltzmann machine.
Fig. S6. #P-hardness for the QGM.
Fig. S7. Contraction between two local tensors using the structure of QGM state and conditioned variables.
Fig. S8. Construction of the history state.
Fig. S9. Flow chart for a machine learning process using the QGM.
References (38–40)

REFERENCES AND NOTES
1. P. W. Shor, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Rev. 41, 303–332 (1999).
2. R. P. Feynman, Simulating physics with computers. Int. J. Theor. Phys. 21, 467–488 (1982).
3. S. Lloyd, Universal quantum simulators. Science 273, 1073–1078 (1996).
4. J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, S. Lloyd, Quantum machine learning. Nature 549, 195–202 (2017).
21. I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (MIT Press, 2016).
22. C. M. Bishop, Pattern Recognition and Machine Learning (Springer, 2006).
23. M. H. Amin, E. Andriyash, J. Rolfe, B. Kulchytskyy, R. Melko, Quantum Boltzmann machine. Phys. Rev. X 8, 021050 (2018).
24. R. Raussendorf, H. J. Briegel, A one-way quantum computer. Phys. Rev. Lett. 86, 5188–5191 (2001).
25. N. Le Roux, Y. Bengio, Representational power of restricted Boltzmann machines and deep belief networks. Neural Comput. 20, 1631–1649 (2008).
26. L. J. Stockmeyer, The polynomial-time hierarchy. Theor. Comput. Sci. 3, 1–22 (1976).
27. E. Y. Loh Jr., J. E. Gubernatis, R. T. Scalettar, S. R. White, D. J. Scalapino, R. L. Sugar, Sign problem in the numerical simulation of many-electron systems. Phys. Rev. B 41, 9301–9307 (1990).
28. S. Shalev-Shwartz, S. Ben-David, Understanding Machine Learning: From Theory to Algorithms (Cambridge Univ. Press, 2014).
29. M. Schwarz, K. Temme, F. Verstraete, Preparing projected entangled pair states on a quantum computer. Phys. Rev. Lett. 108, 110502 (2012).
30. D. Perez-Garcia, F. Verstraete, M. M. Wolf, J. I. Cirac, PEPS as unique ground states of local Hamiltonians. Quantum Inf. Comput. 8, 650–663 (2008).
31. D. S. Abrams, S. Lloyd, Quantum algorithm providing exponential speed increase for finding eigenvalues and eigenvectors. Phys. Rev. Lett. 83, 5162–5165 (1999).
32. D. Nagaj, P. Wocjan, Y. Zhang, Fast amplification of QMA. Quantum Inf. Comput. 9, 1053–1068 (2009).
33. M. A. Nielsen, I. L. Chuang, Quantum Computation and Quantum Information (Cambridge Univ. Press, 2000).
34. D. W. Berry, G. Ahokas, R. Cleve, B. C. Sanders, Efficient quantum algorithms for simulating sparse Hamiltonians. Commun. Math. Phys. 270, 359–371 (2007).