Wall Et Al - 2021 - Generative Machine Learning With Tensor Networks
Generative machine learning with tensor networks: Benchmarks on near-term quantum computers
(Received 16 October 2020; revised 3 March 2021; accepted 7 March 2021; published 2 April 2021)
Noisy, intermediate-scale quantum (NISQ) computing devices have become an industrial reality in the last few
years, and cloud-based interfaces to these devices are enabling the exploration of near-term quantum computing
on a range of problems. As NISQ devices are too noisy for many of the algorithms with a known quantum
advantage, discovering impactful applications for near-term devices is the subject of intense research interest.
We explore a quantum-assisted machine learning (QAML) workflow using NISQ devices through the perspective
of tensor networks (TNs), which offer a robust platform for designing resource-efficient and expressive machine
learning models to be dispatched on quantum devices. In particular, we lay out a framework for designing and
optimizing TN-based QAML models using classical techniques, and then compiling these models to be run
on quantum hardware, with demonstrations for generative matrix product state (MPS) models. We put forth
a generalized canonical form for MPS models that aids in compilation to quantum devices, and demonstrate
greedy heuristics for compiling with a given topology and gate set that outperform known generic methods in
terms of the number of entangling gates, e.g., CNOTs, in some cases by an order of magnitude. We present an
exactly solvable benchmark problem for assessing the performance of MPS QAML models and also present an
application for the canonical MNIST handwritten digit dataset. The impacts of hardware topology and day-to-day
experimental noise fluctuations on model performance are explored by analyzing both raw experimental counts
and statistical divergences of inferred distributions. We also present parametric studies of depolarization and
readout noise impacts on model performance using hardware simulators.
DOI: 10.1103/PhysRevResearch.3.023010
FIG. 1. Overview of QAML workflow. Classical data in (a) is preprocessed and transformed to quantum states embedded in an exponentially large Hilbert space in (b). A TN model is learned from a collection of quantum training data in (c), which has the interpretation in (d) of a sequential preparation scheme involving a small number of readout qubits coupled to ancillary resources. The isometries of the sequential preparation scheme in (d) are conditioned using inherent freedom in the TN representation in (e), and then converted into native gates for a target hardware architecture, displayed as the IBMQ-X2 processor for concreteness. Running on cloud-based hardware in (f), we obtain measurements defining output predictions, as in (g). For interpretation of graphical representations, see text.
focus on fully generative unsupervised learning tasks, which have been identified as a promising avenue for QAML [13], and focus on the most resource-efficient matrix product state (MPS) TN topology. We present a framework for QAML, outlined in Fig. 1, that includes translation of classical data into quantum states, optimization of an MPS model using classical techniques, the conversion of this classically trained model into a sequence of isometric operations to be performed on quantum resources, and the optimization and compilation of these isometric operations into native operations for a given hardware topology and allowed gate set. In particular, we develop several novel techniques for the compilation stage aimed at TN models for QAML on NISQ devices, such as the permutation of auxiliary quantum degrees of freedom in the TN to optimize mapping to hardware resources and heuristics for the translation of isometries into native operations using as few entangling operations (e.g., CNOTs) as possible.

The tools developed herein enable the robust design and performance assessment of QAML models on NISQ devices in the regime where classical simulations are still possible, and will inform architectures and noise levels for scaling to the classically intractable regime. Even in the classically intractable regime in which the model must be optimized using a quantum device in a hybrid quantum/classical loop [23,24], our techniques provide a means of obtaining an approximate, classically trained "preconditioner" for the quantum models that can help avoid local minima and reduce optimization time, as has been explored using classical simulations for other architectures in Ref. [22]. We present exemplar results for our workflow for synthetic data that can be described by an exactly solvable two-qubit MPS QAML model, as well as for features extracted from the canonical MNIST handwritten digit dataset [25].

The remainder of this work is organized as follows: Sec. II discusses QAML with tensor networks (TNs) broadly, including embedding of classical data into quantum states, classical training of a TN model, and the conversion of TN models into resource-efficient sequential preparation schemes; Sec. III discusses our approach for compiling TN-based QAML models for running on quantum hardware, including the utilization of ambiguity in the TN representation and greedy compilation heuristics for minimizing model gate depth; Sec. IV presents an exactly solvable two-qubit QAML model and assesses the performance of our QAML workflow on quantum hardware; in Sec. V, we give an example application to generative modeling of features extracted from the MNIST dataset and analyze the performance of our models as a function of hardware noise using a quantum hardware simulator; finally, in Sec. VI, we conclude and give an outlook. Details of the MNIST model studied in Sec. V are given in Appendix E.
II. QUANTUM-ASSISTED MACHINE LEARNING WITH TENSOR NETWORKS

Figure 1 broadly outlines the QAML workflow explored in the present work. We begin with a collection of classical data vectors in a training set $\mathcal{T} = \{\mathbf{x}_j\}_{j=1}^{N_T}$, where each element $\mathbf{x}_j$ is an N-length vector. The first step in our QAML workflow is to define a mapping of classical data vectors to vectors in a quantum Hilbert space. Here, the only restriction we will place on the encoding of classical data in quantum states is that each classical data vector is encoded in an unentangled product state. This is useful for several reasons. For one, unentangled states are the simplest to prepare experimentally with high fidelity, and also enable us to use qubit-efficient sequential preparation schemes. From a learning perspective, encoding individual data vectors in product states ensures that any entanglement that results in a quantum model comes from correlations in an ensemble of data and not from a priori assumptions about pre-existing correlations for individual data vectors [26].

The simplest case of a quantum embedding occurs when the data is discrete, and so can be formulated as vectors $\mathbf{x}$ whose individual elements $x_i \in \{0, 1\}$. In this case each element is mapped to a qubit as $|x_i\rangle$, such that the embedding of the full N-dimensional classical data vector into a register of N qubits is [27,28]

$$|\mathbf{x}\rangle = \bigotimes_{i=0}^{N-1} |x_i\rangle. \qquad (1)$$

Embeddings can also be formulated for vectors of continuous variable data, as has been explored in Refs. [26,29–31]. We will not review those cases here, as only the binary embedding Eq. (1) is utilized in this work.

A. Tensor networks and sequential preparation

The next step in our QAML workflow outlined in Fig. 1 is to learn a quantum model for the collection of quantum states $\{|\mathbf{x}_j\rangle\}_{j=1}^{N_T}$ resulting from applying the encoding map from the previous section to the training data. Here, we define a quantum model as a collection of operations applied to quantum resources to produce a state that encodes the properties of the ensemble $\{|\mathbf{x}_j\rangle\}$. In what follows, we specialize to the case of tensor network (TN) models, which provide a convenient parametrization of the structure of quantum operations and resources. Generally speaking, TNs represent the high-rank tensor describing a quantum wave function in a specified basis as a contraction over low-rank tensors, and hence define families of low-rank approximations whose computational power can be expressed in terms of the maximum dimension of any contracted index χ, known as the bond dimension.

A wide variety of TN topologies have been considered which are able to efficiently capture certain classes of quantum states [19–21]; in the present work, we focus on matrix product states (MPSs). MPSs use a one-dimensional TN topology, as shown using the Penrose graphical notation for tensors [19] in Fig. 1(c), and form the basis for the enormously successful density matrix renormalization group (DMRG) algorithm in quantum condensed matter physics [32]. MPSs have several properties that make them attractive for QAML. For one, they are undoubtedly the most well-understood and mature of all tensor networks, which has led to robust optimization strategies that are widely used in the quantum many-body community. In addition, MPSs are highly quantum resource efficient, in the sense that their associated wave functions can be sequentially prepared, and so qubits can be reused in deployment on quantum hardware. In fact, it can be shown that every state that can be sequentially prepared can be written as an MPS [33–35]. This last point importantly means that any QAML model that achieves the optimal O(1) scaling of qubit requirements with data vector length N using sequential preparation can be expressed as an MPS, and the methods and analyses presented in this work apply.

In recent years, TNs have found applications outside of the condensed matter and quantum information domains. The mathematical analysis community has proposed TN methods for data analysis, e.g., large-scale principal component analysis [36,37]. In this community, MPSs are referred to as tensor trains [38]. Using TN methods to design quantum-inspired ML models was first proposed by Stoudenmire and Schwab [26], who put forth a scheme using an MPS network as a linear classifier in a Hilbert space whose dimension is exponentially large in the length of the raw data vector. Since then, many other proposals for quantum-assisted or quantum-inspired TN ML models have appeared in the literature [22,39–56], including generative modeling of binary data using MPSs in Ref. [57]. In the majority of approaches, DMRG-inspired algorithms for optimization have been employed. However, the authors of Ref. [58] recently demonstrated an alternate strategy where a TN was implemented as a neural network using standard deep learning software, and the tensors of the TN were optimized using backpropagation strategies ubiquitous in classical ML. While this strategy has shown good performance, it has also been shown to be suboptimal with respect to the DMRG-like approach [59]. Nonetheless, the use of deep learning "preconditioners" and the intersection of QAML and neural networks remains intriguing [60–62].

The fact that MPSs define a sequential preparation scheme means that MPSs provide highly resource efficient architectures for learning [15] and quantum simulation [63]. In particular, the qubit resource requirements for an MPS model are logarithmic in the bond dimension χ, which encapsulates the expressivity of the model, and are independent of the length of the input data vector N. The details of the procedure connecting MPS models and sequential preparation schemes are reviewed in Appendix A, where it is shown that a single "physical" or "readout" qubit together with a χ-level ancilla (which itself can be made of multiple qubits) suffices to extract information from an MPS model, provided we are not measuring any multiqubit properties of the state. As shown in Appendix A, a sequential preparation procedure defined as an MPS takes the form

$$|\psi\rangle = \sum_{j_0 \ldots j_{N-1}} \mathrm{Tr}\big[L^{[0]\,j_0} \cdots L^{[N-1]\,j_{N-1}}\big] |j_0 \ldots j_{N-1}\rangle, \qquad (2)$$

in which the χ × χ matrices $L^{[i]\,j_i}$ satisfy the isometry condition

$$\sum_{j_i} L^{[i]\,j_i\,\dagger} L^{[i]\,j_i} = I_\chi. \qquad (3)$$
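To make these definitions concrete, here is a minimal NumPy sketch (our illustration, not code from the paper; the function names are our own) that constructs the product-state embedding of Eq. (1) for a binary data vector and evaluates an MPS amplitude of the form appearing in Eq. (2):

```python
import numpy as np

def embed_binary(x):
    """Product-state embedding |x> of a binary vector x, Eq. (1)."""
    ket = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # |0>, |1>
    state = np.array([1.0])
    for xi in x:
        state = np.kron(state, ket[xi])  # unentangled tensor product
    return state  # 2**N amplitudes in the computational basis

def mps_amplitude(L, x):
    """<x|psi> for an MPS with tensors L[i][j], the chi x chi matrices of Eq. (2)."""
    chi = L[0][x[0]].shape[0]
    mat = np.eye(chi)
    for Li, xi in zip(L, x):
        mat = mat @ Li[xi]
    return np.trace(mat)
```

For χ = 1 the amplitudes factorize and the model reduces to independent bits; increasing χ introduces correlations between bits.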
Those familiar with the theory of MPSs will recognize Eq. (3) as defining the left-canonical form of an MPS; standard procedures exist for putting an MPS into this form [19–21].

The condition of not measuring any multiqubit properties of the state holds for our specified use case of generating data vector samples a single element at a time, as shown in Fig. 1(d). Here, the single physical qubit is coupled to three ancilla qubits forming a χ = 8-level resource, and the physical qubit is measured, with the measurement outcome forming the data sample element $x_{N-1}$. The ancilla qubits are left unmeasured, the physical qubit is reinitialized, and the procedure of coupling the physical and ancilla qubits and measuring the former is repeated, resulting in the remaining elements of the data vector $x_{N-2}, \ldots, x_0$. The process of coupling the physical $\{|j_i\rangle\}$ and ancilla $\{|\alpha\rangle\}$ states is defined by the isometric operators

$$\hat{L}^{[i]} = \sum_{\alpha\beta j_i} L^{[i]\,j_i}_{\alpha\beta} |\alpha\, j_i\rangle\langle\beta\, 0|. \qquad (4)$$

The sequential preparation scheme with a single readout qubit requires mid-circuit measurement and reset, which is not universally available in present-day hardware, but has been demonstrated in, e.g., trapped ion platforms [64]. Finally, we stress that the scheme for converting from an MPS model to a sequential preparation procedure is formal in the sense that it produces isometries acting on quantum resources without reference to their actual physical representation or other hardware constraints such as limited coherence time, connectivity, gate sets, etc. The translation of these formal isometries into operations to be dispatched on a given target hardware is detailed in Sec. III.

B. Generative MPS models and classical training procedure

We now further specialize to generative models, in which a collection of quantum data vectors are encoded into a wave function |ψ⟩ such that the probability distribution evaluated at data vector x is

$$P(\mathbf{x}) = \frac{\langle\psi|\mathbf{x}\rangle\langle\mathbf{x}|\psi\rangle}{Z}. \qquad (5)$$

Here, $Z = \langle\psi|\psi\rangle = \sum_{\mathbf{x}} \langle\psi|\mathbf{x}\rangle\langle\mathbf{x}|\psi\rangle$ is a normalization factor, and |x⟩ denotes the encoding of a classical data vector x into a quantum state as in Eq. (1). As this corresponds to Born's rule for measurement outcomes, the resulting structure is referred to as a Born machine [57,65]. The state |ψ⟩, or, equivalently, the process of generating |ψ⟩ from a known fiducial state, is our quantum machine learning model.

In order to discuss data representation using Born machines, we define the average log-likelihood of the data in the training set $\mathcal{T}$ as

$$\mathcal{L}(\mathcal{T}) = \frac{1}{N_T} \sum_{\mathbf{x}\in\mathcal{T}} \ln \frac{\langle\psi|\mathbf{x}\rangle\langle\mathbf{x}|\psi\rangle}{Z}. \qquad (6)$$

The minimization of the negative log-likelihood with respect to the parameters in our Born machine is equivalent to maximizing the probability that the data is generated by the Born machine. Parameterizing the wave function |ψ⟩ as an MPS as

$$|\psi\rangle = \sum_{i_0\ldots i_{N-1}} \mathrm{Tr}\big[A^{i_0\,\dagger} \cdots A^{i_{N-1}\,\dagger}\big] |i_0 \ldots i_{N-1}\rangle, \qquad (7)$$

and using the binary encoding of data, Eq. (1), we find

$$\mathcal{L}(\mathcal{T}) = \frac{1}{N_T} \sum_{\mathbf{x}\in\mathcal{T}} \ln \frac{1}{Z}\big|\mathrm{Tr}\big[A^{x_0\,\dagger} \cdots A^{x_{N-1}\,\dagger}\big]\big|^2,$$

where the normalization factor (partition function) is

$$Z = \sum_{i_0\ldots i_{N-1}} \big|\mathrm{Tr}\big[A^{i_0} \cdots A^{i_{N-1}}\big]\big|^2. \qquad (8)$$

We will optimize the Born machine by a DMRG-style procedure using gradient descent, where the gradient is taken with respect to the tensors of the MPS. Namely, we will consider the gradient with respect to a group of s neighboring tensors $\Lambda = A^{i_l} \cdots A^{i_{l+s}}$, with s typically being one or two, noting that the gradient of an object with respect to a tensor is a tensor whose elements are the partial derivatives with respect to the individual tensor elements. We take the gradient with respect to the conjugates of the tensors $\bar{A}^{i_j}$, formally considering these conjugates independent of the tensors themselves. This gradient may be written as

$$\nabla_{\bar\Lambda}\mathcal{L}(\mathcal{T}) = \frac{1}{N_T}\sum_{\mathbf{x}\in\mathcal{T}} \frac{\nabla_{\bar\Lambda}\langle\psi|\mathbf{x}\rangle\langle\mathbf{x}|\psi\rangle}{\langle\psi|\mathbf{x}\rangle\langle\mathbf{x}|\psi\rangle} - \frac{\nabla_{\bar\Lambda} Z}{Z} = \frac{1}{N_T}\sum_{\mathbf{x}\in\mathcal{T}} \frac{\nabla_{\bar\Lambda}\langle\psi|\mathbf{x}\rangle\langle\mathbf{x}|\psi\rangle}{\langle\psi|\mathbf{x}\rangle\langle\mathbf{x}|\psi\rangle} - \sum_{\mathbf{x}} \frac{\nabla_{\bar\Lambda}\langle\psi|\mathbf{x}\rangle\langle\mathbf{x}|\psi\rangle}{Z}. \qquad (9)$$

With this gradient in hand, we update the local block of tensors as

$$\Lambda \to \Lambda + \eta\, \nabla_{\bar\Lambda}\mathcal{L}(\mathcal{T}), \qquad (10)$$

in which η is a learning rate (note that this is equivalent to minimizing the negative log-likelihood). For the single-site algorithm (s = 1), this update does not change the bond dimension or canonical form of the MPS. For the two-site algorithm (s = 2), we can now split the updated tensor into its component MPS tensors as

$$\Lambda^{ij}_{\alpha\beta} = \sum_{\mu} A^{i}_{\alpha\mu} A^{j}_{\mu\beta}, \qquad (11)$$

using, e.g., the SVD. Hence, the addition of the gradient can increase the bond dimension, and thus the representation power, adaptively based on the data. The bond dimension can also be set implicitly by requiring that the $L^2$-norm of the tensor is represented with a bounded relative error ε. The above update affects only a small group of the tensors with all others held fixed. A complete optimization cycle, or "sweep," occurs when we have updated all tensors twice, moving in a back-and-forth motion over the MPS. The sweeping process is converged once the negative log-likelihood no longer decreases substantially. Example convergence behavior will be given later in Sec. V.
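A compact sketch of the two-site update of Eqs. (10) and (11) is given below (our illustration, not the authors' implementation; the gradient tensor of Eq. (9) is assumed to be supplied by the caller, and tensors are stored with index order (physical, left bond, right bond)):

```python
import numpy as np

def two_site_update(A_l, A_r, grad, eta=1e-2, chi_max=16):
    """One two-site gradient step: merge neighboring MPS tensors,
    step along the gradient (Eq. (10)), then split by SVD (Eq. (11)).
    A_l: (2, chi1, chi2); A_r: (2, chi2, chi3); grad matches the merged block."""
    # Merge: Lambda^{ij}_{ab} = sum_mu A_l^i_{a mu} A_r^j_{mu b}
    Lam = np.einsum('iam,jmb->ijab', A_l, A_r)
    Lam = Lam + eta * grad                      # gradient step on log-likelihood
    i, j, a, b = Lam.shape
    M = Lam.transpose(2, 0, 3, 1).reshape(a * i, b * j)  # group (a,i) x (b,j)
    U, S, Vh = np.linalg.svd(M, full_matrices=False)
    chi = min(chi_max, int(np.sum(S > 1e-12)))  # adaptive bond dimension
    U, S, Vh = U[:, :chi], S[:chi], Vh[:chi, :]
    A_l_new = U.reshape(a, i, chi).transpose(1, 0, 2)        # isometric left tensor
    A_r_new = (np.diag(S) @ Vh).reshape(chi, b, j).transpose(2, 0, 1)
    return A_l_new, A_r_new
```

Because the singular values are truncated only below a threshold, the bond dimension grows or shrinks adaptively with the data, as described above.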
III. COMPILATION OF MPS MODELS FOR QUANTUM HARDWARE

In this section, we address how to take an MPS model resulting from the classical optimization procedure outlined in Sec. II B and convert it into a sequence of operations to be …
When the MPS tensor is the orthogonality center, this condition is equivalent to an $L^2$-norm optimization of the full wave function. Replacing $A^{[i]}$ by the truncated $U$ for a right-moving update or by $V^{\dagger}$ for a left-moving update and contracting the truncated $SV^{\dagger}$ or $US$ into the neighboring tensor completes the local optimization. Sweeping the optimization across all tensors completes the filtering step. Since the optimization only deals with the parameters of a single MPS tensor at a time, it is not guaranteed to be globally optimal, but this simple procedure works well in practice. As a side benefit, ending the optimization by applying the update Eq. (14) and replacing the MPS tensor $A^{[i]}$ with $U$ for each tensor places the MPS in left-canonical form, from which the isometries for sequential preparation can be constructed from the tensor elements [see Eq. (2)].

A. Ancilla permutation and the diagonal gauge

The conversion of an MPS into left-canonical form uses the gauge freedom inherent in MPSs to ensure that each of the …

… which "integrates out" the physical qubit from the isometry used for sequential preparation, and so acts only in the ancilla space. A diagonal $M^{[i]}$ is desired, as this would perfectly preserve the individual ancilla basis states and so reduce the number of quantum operations required. Recalling that we are only changing either the left or right basis of $M^{[i]}$ at a time, one possible option to increase its diagonal dominance through transformation of either the left or right basis is to use the polar decomposition $M^{[i]} \to U^{[i]} P^{[i]}$ or $M^{[i]} \to P^{[i]} U^{[i]}$, with $U^{[i]}$ unitary and $P^{[i]}$ Hermitian and positive semidefinite. Using $(U^{[i]})^{1/2}$ to transform the bases of $\{L^{[i]\,j_i}, j_i = 1, \ldots, d\}$ would transform $M^{[i]}$ into $P^{[i]}$; however, this transformation does not preserve sparsity in the $L^{[i]\,j_i}$, and we have found that it often leads to more complex operators in practice. Instead, we use the values of $U^{[i]}$ from the polar decomposition to define a permutation of the ancilla basis states as, e.g.,

$$\tilde{L}^{[i]\,j}_{\alpha,\,\operatorname{argmax}|U^{[i]}_{:\beta}|} = L^{[i]\,j}_{\alpha\beta}. \qquad (21)$$
FIG. 5. Comparison of greedy gate compilation procedure with methods of Ref. [70]. (a) Target isometry, which can be completed to a permutation operator. (b) Matrix plot of result from greedy compilation procedure (cost function ∼2 × 10⁻¹⁵). (c) Matrix plot of result from the methods of Ref. [70]. (d) Quantum circuit representation of greedy compilation procedure result. (e) Quantum circuit representation of Ref. [70] result.
… of the number of search considerations, we limit the number of gates forming the starting point of the priority queue (i.e., before appending new entangling gates and single-qubit rotations) to a fixed number. This number is used as a convergence parameter, and can vary between optimization cycles; we find that it is useful to allow more gates in early optimization cycles, where the operations involve fewer parameters and so optimization is fast, and then to decrease the number of kept gates as the circuits become deeper. Further details on our implementation of this procedure and some problem-specific implementations are provided in Appendix C.

Several "generic" methods for the compilation of isometries exist, as reviewed in, e.g., Ref. [71], which can be used as a baseline for comparison. These algorithms also underlie the implementation in Qiskit [4]. In the generic approach, the matrix representation of the isometry is decomposed, e.g., a single column at a time or by the cosine-sine decomposition, and the resulting decompositions are expressed in terms of multiqubit controlled operations, which are themselves decomposed into a target gate set using known representations. These approaches are constructive, and so will find decompositions of any isometry in principle, but they are not designed to find the most efficient representation by some metric, e.g., the number of entangling gates. Further, as noted above, the use of such generic algorithms requires an "isometric completion" in the case that the bond dimension χ is not a power of 2, and may expend additional resources in exactly compiling noise in the isometries. Special-purpose methods have also been developed for compiling permutation gates in Ref. [70], which have been shown to outperform the generic algorithms in some cases. This method uses a reversible logic synthesis to map the permutation into a reversible circuit composed of single-target gates, and these single-target gates are then compiled into networks of CNOTs, Hadamard gates, and $\hat{R}_z(\theta) = |0\rangle\langle 0| + e^{i\theta}|1\rangle\langle 1|$ rotations.

In order to compare our methods with the generic, constructive method for compiling isometries, we consider the isometry shown in Fig. 3. As noted above, in order to utilize the generic methods we have to map this isometry into a complete isometry over a set of qubits, which requires us to define the action of the isometry on the state in which the ancilla qubits are all in the state |1⟩, which was left unconstrained by the optimization procedure. For simplicity, we use the "isometric completion" in which the operator takes this state to itself without modifying the state of the physical qubit. Using the iso method of the QuantumCircuit class from Qiskit [4] implementing the generic methods of Ref. [71] on the unconstrained ibmq_qasm_simulator hardware topology produces a gate representation with 122 CNOTs at optimization_level 0, and 120 CNOTs at optimization_level 3. The greedy compilation procedure presented in this work achieves a representation with a cost function error of 5.6 × 10⁻¹⁰ with an order of magnitude fewer entangling gates for this particular isometry. An explicit circuit representation is given in Fig. 36(d) of Appendix E.

As a point of comparison for the specialized methods for permutation gates studied in Ref. [70], we consider the isometry shown in Fig. 5(a). This is indeed a permutation on the space acted upon, and so can be represented by a family of "unitary completions." We take the straightforward choice of unitary completion in which we leave the ancilla qubits unchanged by the permutation, as shown in Fig. 5(c).
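For the generic baseline, the comparison described here can be reproduced in outline with Qiskit's built-in isometry decomposition; the sketch below is ours, and the method name and module layout vary between Qiskit releases (older versions expose iso, newer ones isometry):

```python
import numpy as np
from qiskit import QuantumCircuit, transpile

def generic_cnot_count(V, opt_level=3):
    """Decompose a 2**n x 2**m isometry V (V^dag V = I) with Qiskit's
    generic method [71] and count CNOTs after transpiling to {u3, cx}."""
    n, m = int(np.log2(V.shape[0])), int(np.log2(V.shape[1]))
    qc = QuantumCircuit(n)
    qc.isometry(V, q_input=list(range(m)),
                q_ancillas_for_output=list(range(m, n)))
    compiled = transpile(qc, basis_gates=['u3', 'cx'],
                         optimization_level=opt_level)
    return compiled.count_ops().get('cx', 0)
```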
FIG. 9. Comparison of hand-compiled and auto-compiled circuits for exactly solvable test case. The exactly solvable benchmark with three
physical qubits (0, 1, and 3) and a single ancilla qubit (2) implemented as a quantum circuit using the hand-compiled circuits in Fig. 7 (top) or
the autocompiled gates in Fig. 8 (bottom).
… (8/31, 18/31, 5/31), the center blue bars are the raw experimental measurements without noise calibration applied, and the leftmost orange bars are the experimental measurements with the noise calibration applied. The black lines centered on the tops of the bars indicate the 1σ confidence intervals from the jackknife procedure. As noted above, qubits 0, 1, and 3 map to the probabilities p0, p1, and p2, respectively, and qubit 2 is the ancilla. Clearly, the application of the measurement …
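The 1σ intervals quoted here come from jackknifing over independent daily runs; a generic leave-one-out jackknife for an outcome probability looks like the following sketch (ours, not the authors' analysis code):

```python
import numpy as np

def jackknife_probability(counts_per_run, outcome):
    """Leave-one-out jackknife mean and 1-sigma error for the probability
    of `outcome`, given a list of per-run counts dictionaries."""
    n = len(counts_per_run)
    probs = []
    for leave_out in range(n):
        kept = [c for i, c in enumerate(counts_per_run) if i != leave_out]
        hits = sum(c.get(outcome, 0) for c in kept)
        total = sum(sum(c.values()) for c in kept)
        probs.append(hits / total)
    probs = np.asarray(probs)
    mean = probs.mean()
    err = np.sqrt((n - 1) / n * np.sum((probs - mean) ** 2))  # jackknife variance
    return mean, err
```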
FIG. 10. Comparison of ideal, measured, and noise-corrected measurement outcomes on quantum hardware. The results of the hand-compiled benchmark model shown in Fig. 9(a) are jackknifed over several days of independent experimental runs on the IBMQ-X2 [panel (a)] and IBMQ-Vigo (c) hardware. Similarly, the jackknifed results for the autocompiled benchmark model shown in Fig. 9(b) are shown for the IBMQ-X2 (b) and IBMQ-Vigo (d) hardware. The rightmost green bars are the noiseless expectations, the center blue bars are the raw measurements, and the left orange bars have a measurement filter applied. Black lines indicate 1σ confidence intervals.
[Figure (plots): KL divergence between inferred and ideal distributions vs. day of experimental run, for hand-compiled and autocompiled circuits, uncorrected and corrected; only the legends and axis labels ("KL Divergence" vs. "Day," days 1–5) survive extraction.]

FIG. 12. Example processed MNIST data produced by downsampling through a max filter to 7 × 7 pixels and binarization. Clockwise from top left, the truth labels are 5, 0, 4, 1, 9, 4, 1, 3, 1, 2.

… resulting from the distributions averaged over all days. Clearly, the application of the measurement noise filter improves the estimation of probabilities, as indicated by a lower KL divergence with respect to the ideal results. In addition, the autocompiled circuits (squares) show a lower KL divergence than the hand-compiled circuits, likely due to their shallower circuits. Finally, we find that the Vigo results in …
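The KL divergences reported here compare distributions inferred from measured counts against the ideal one; a minimal sketch (ours) of that estimate:

```python
import numpy as np

def kl_divergence(p_counts, q_probs, eps=1e-12):
    """KL divergence D(P||Q) between an empirical distribution built from
    measured counts and an ideal distribution over the same outcomes."""
    outcomes = sorted(q_probs)
    total = sum(p_counts.get(o, 0) for o in outcomes)
    p = np.array([p_counts.get(o, 0) / total for o in outcomes])
    q = np.array([q_probs[o] for o in outcomes])
    mask = p > 0  # terms with p = 0 contribute zero
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))
```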
V. EXAMPLE USING THE MNIST DATASET
[Figure (plots): KL divergence vs. CNOT error rate for measurement error rates ζ = 0.00, 0.01, 0.03, 0.04, 0.05, together with sampled configurations of probability p ≈ 0.0004–0.0005; only legends, axis labels, and per-sample probabilities survive extraction.]
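A parametric noise study of this kind can be set up with a hardware simulator as sketched below (our illustration using Qiskit Aer; module paths and class names differ across versions, and the specific error channels are assumptions matching the depolarizing and readout errors discussed in the text):

```python
# Sweep a depolarizing error on CNOTs plus a symmetric readout error zeta.
from qiskit_aer import AerSimulator
from qiskit_aer.noise import NoiseModel, depolarizing_error, ReadoutError

def noisy_backend(cnot_error, zeta):
    noise = NoiseModel()
    noise.add_all_qubit_quantum_error(depolarizing_error(cnot_error, 2), ['cx'])
    noise.add_all_qubit_readout_error(ReadoutError([[1 - zeta, zeta],
                                                    [zeta, 1 - zeta]]))
    return AerSimulator(noise_model=noise)
```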
… ξ2 and the number of bits sampled in the bitstring at zero measurement error ζ = 0. Nonmonotonic behavior is due to the significant differences in complexity of the gate sequences to produce individual bits; see Figs. 20–68 of Appendix E. A rise in error is seen as the number of bits increases, as the total gate depth increases, peaking around ten bits. After this point the KL divergence levels off, as additional bits may require shallower gate sequences, bringing the overall agreement between the noisy and true probability distributions closer.

VI. CONCLUSIONS AND OUTLOOK

We have presented a complete workflow for generative quantum-assisted machine learning (QAML) using Born machines with a matrix product state (MPS) tensor network (TN) structure. In our workflow, classical data is encoded into quantum states using an embedding map, the ensemble of quantum states is learned as a TN Born machine using a classical DMRG-like procedure with gradient descent of the negative log-likelihood, and the model is compiled into operations for target quantum hardware to obtain data samples as measurement outcomes. Using MPS-based models enables the use of highly quantum resource-efficient sequential preparation schemes requiring O(1) qubits for a classical data vector length N and O(log₂ χ) qubits for bond dimension χ, which encapsulates the model expressivity. This condition is also sufficient; any model that uses a sequential preparation procedure to generate samples using only O(1) qubits for a classical data vector length N can be expressed as an MPS, and so our methods are applicable. We expect this class of maximally quantum resource-efficient models will be of paramount importance when benchmarking QAML applications on near-term devices.

We presented several optimizations in the compilation stage of our QAML workflow, such as the introduction of the diagonal gauge of the MPS model, which utilizes inherent freedom in the model representation to reduce the complexity of the compiled model, as well as greedy heuristics for finding shallow gate sequences matching a target isometry to a specified tolerance given hardware topology and allowed gate constraints. We presented an exactly solvable benchmark model requiring two qubits, and assessed its performance on currently available quantum hardware. We also presented an example application modeling features extracted from the MNIST dataset parametrically with depolarizing and readout hardware noise using a hardware simulator.

Our results lay the groundwork for utilizing TN models in practical QAML applications, and leave several avenues for future research. First, the QAML demonstrations given in this work consist of overfit models, and so do not constitute "true" machine learning models, which should be able to appropriately generalize from data. This is a result of either using data with very simple structure, as in our exactly solvable model, or using a very small sample size of training data, as in our MNIST application. Small sample sizes were used in the present work to enable detailed analysis of model performance with limited quantum resources. In future work, the generalization power of TN-based QAML models on NISQ hardware will be explored moving towards the large-data regime. We also note that other studies have indicated that TN models with current training strategies generally have a tendency towards overfitting [77]. Second, we have focused on the applications of MPSs to generative modeling, but other TN structures, such as tree tensor networks [39,78], may also be useful for QAML applications, as well as other tasks such as feature extraction and classification. The procedures outlined in this paper can be readily adapted to compiling the isometries appearing in models for other TNs and other applications. Finally, the procedure outlined in this paper, wherein a model is trained classically before being compiled to a quantum device, cannot by itself yield a quantum advantage, as it requires the model to be both classically and quantumly simulable. However, our procedures will be useful in designing and analyzing TN-inspired model structures for scaling towards the classically intractable regime, and can also serve as "preconditioners" where a model trained using optimal classical strategies is augmented with additional quantum resources and then trained directly on the quantum device or in a hybrid quantum/classical optimization loop, potentially avoiding local minima and speeding up optimization times.

ACKNOWLEDGMENTS

We would like to thank Dave Clader, Giuseppe D'Aguanno, and Colin Trout for useful discussions and would like to acknowledge funding from the Internal Research and Development program of the Johns Hopkins University Applied Physics Laboratory.

APPENDIX A: PROCEDURE FOR CONVERTING AN MPS MODEL INTO A SEQUENTIAL PREPARATION SCHEME

The purpose of this section is to review the procedure for converting between matrix product state models and a sequential preparation procedure with a χ-level ancilla. To begin, we first consider that we have a register of N qubits with states $|j_i\rangle$, $i = 0, \ldots, N-1$, $j_i = 0, 1$, in which we want to encode data, and a χ-level ancilla $|\alpha\rangle$, $\alpha = 0, \ldots, \chi-1$, that can be used to entangle the qubits. Starting at the "right" end of the system, we can initialize the (N − 1)st qubit using an operator $\hat{L}^{[N-1]}$ defined as

$$\hat{L}^{[N-1]} = \sum_{\alpha, j_{N-1}} L^{[N-1]\,j_{N-1}}_{\alpha} |j_{N-1}\,\alpha\rangle\langle 0\,0|, \qquad \mathrm{(A1)}$$

in which the coefficients $L^{[N-1]}$ satisfy the isometry condition

$$\sum_{\alpha, j_{N-1}} \big|L^{[N-1]\,j_{N-1}}_{\alpha}\big|^{2} = 1. \qquad \mathrm{(A2)}$$

Clearly, if we start our qubit and ancilla system in the state |00⟩, this operation transforms it into the (entangled) state $\sum_{\alpha, j_{N-1}} L^{[N-1]\,j_{N-1}}_{\alpha} |j_{N-1}\,\alpha\rangle$, and the isometry condition ensures that this state is normalized.
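The conversion reviewed in this appendix, from generic MPS tensors to isometries satisfying the left-orthogonality condition [Eq. (A4) below], reduces to a sweep of QR decompositions; a minimal NumPy sketch (ours, not the authors' implementation):

```python
import numpy as np

def left_canonicalize(tensors):
    """Bring an MPS into left-canonical form by sweeping QR decompositions,
    so each output tensor satisfies sum_j L^{[i]j dag} L^{[i]j} = I (Eq. (A4)).
    tensors[i] has shape (d, chi_left, chi_right)."""
    out, carry = [], np.eye(tensors[0].shape[1])
    for A in tensors:
        A = np.einsum('ab,jbc->jac', carry, A)           # absorb previous R
        d, chi_l, chi_r = A.shape
        Q, R = np.linalg.qr(A.transpose(1, 0, 2).reshape(chi_l * d, chi_r))
        k = Q.shape[1]
        out.append(Q.reshape(chi_l, d, k).transpose(1, 0, 2))  # isometric L^{[i]j}
        carry = R                                        # pushed into next tensor
    return out, carry  # trailing R is a scalar norm/phase for a normalized MPS
```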
Moving to the next qubit, we now entangle it with the ancilla using the operator

$$\hat{L}^{[N-2]} = \sum_{\alpha, j_{N-2}, \beta} L^{[N-2]\,j_{N-2}}_{\beta\alpha} |j_{N-2}\,\beta\rangle\langle 0\,\alpha|, \qquad \mathrm{(A3)}$$

in which the coefficients $L^{[N-2]\,j_{N-2}}_{\beta\alpha}$, expressed as a matrix $L^{[N-2]\,j_{N-2}}$, are subject to the isometry condition

$$\sum_{j} L^{[N-2]\,j\,\dagger} L^{[N-2]\,j} = I_\chi, \qquad \mathrm{(A4)}$$

with $I_\chi$ the χ × χ identity matrix. This operation now puts the system in the state

$$\hat{L}^{[N-2]} \hat{L}^{[N-1]} |0_{N-2}\, 0_{N-1}\, 0_{\mathrm{ancilla}}\rangle = \sum_{j_{N-2}, j_{N-1}, \alpha} \big[L^{[N-2]\,j_{N-2}} L^{[N-1]\,j_{N-1}}\big]_{\alpha} |j_{N-2}\, j_{N-1}\, \alpha\rangle. \qquad \mathrm{(A5)}$$

We follow this same logic for all subsequent qubits, defining isometric operators that entangle them to the rest of the system using the ancilla, until we reach the final qubit, which is attached using the isometric operator

$$\hat{L}^{[0]} = \sum_{i_0, \beta} L^{[0]\,i_0}_{\beta}\, |i_0\, 0\rangle\langle 0\,\beta|. \qquad \mathrm{(A6)}$$

This operator puts the full system into the state

$$\hat{L}^{[0]} \cdots \hat{L}^{[N-1]} |0_0 \ldots 0_{N-1}\rangle|0_{\mathrm{ancilla}}\rangle = \sum_{j_0 \ldots j_{N-1}} L^{[0]\,j_0} \cdots L^{[N-1]\,j_{N-1}} |j_0 \ldots j_{N-1}\rangle |0_{\mathrm{ancilla}}\rangle. \qquad \mathrm{(A7)}$$

Hence, in the last step, the qubit states decouple from the ancilla. The qubit state takes the form of an MPS with the additional constraint that each of the MPS matrices L satisfies the left-orthogonal condition Eq. (A4). The above procedure can readily be read in reverse; given a general MPS QAML model with bond dimension χ,

$$\sum_{j_0 \ldots j_{N-1}} A^{[0]\,j_0} \cdots A^{[N-1]\,j_{N-1}} |j_0 \ldots j_{N-1}\rangle, \qquad \mathrm{(A8)}$$

we can convert it into a sequential qubit preparation scheme with a χ-dimensional ancilla by putting the MPS in left-canonical form. This transformation to left-canonical form can be done without loss of generality using a well-known procedure involving an orthogonal decomposition, e.g., the singular value or QR decomposition [19]. Thus the tensors appearing in an MPS, which could result from a classical training optimization, can be formally (i.e., modulo compilation into native quantum operations for a given hardware architecture) translated into operations for deployment on quantum resources.

The above prescription assumed the presence of a register of N qubits, but due to the sequential nature of the preparation this is unnecessary, and a single "physical" qubit together with the χ-level ancilla suffices, provided we are not measuring any multiqubit properties of the state. As an example, we will consider drawing a sample from an MPS wave-function generative model with the binary map Eq. (1). In this application, we first couple the qubit and ancilla as in Eq. (A1), starting from both in the fiducial state |0⟩. We then measure the qubit in the computational basis, record its outcome as $x_{N-1}$, and then return it to the fiducial |0⟩ state while leaving the ancilla unmeasured. We note that the ability to re-initialize a single qubit independent of the others is not universally available in present-day hardware, but has been demonstrated in, e.g., trapped ion platforms [64]. We then re-entangle the ancilla and qubit using the operator $\hat{L}^{[N-2]}$ defined in Eq. (A3), measure the qubit and record the outcome as $x_{N-2}$, and again return the qubit to the |0⟩ state. This procedure is repeated with the other operations $\hat{L}^{[j]}$ until a complete set of N measurements x is made, which constitutes a data sample. This procedure is denoted graphically in Fig. 1(d). Clearly, this only requires a single "physical" or "data" qubit (i.e., the one that is sampled) independent of the input data size N, and the construction of the χ-level ancilla requires only log₂ χ qubits. We stress that the scheme above is formal in the sense that it produces isometries acting on quantum resources without reference to their actual physical representation or other hardware constraints such as limited coherence time, connectivity, gate sets, etc. The translation of these formal isometries into operations to be dispatched on a given target hardware is detailed in Sec. III.

APPENDIX B: RESOLVING AMBIGUITIES IN THE TRANSFORMATION TO THE DIAGONAL GAUGE

In this section we provide additional detail on the resolution of two ambiguities that arise when converting MPS isometries into the diagonal gauge, as described in Sec. III A. The permutation operation Eq. (21) is ambiguous whenever the matrix $U^{[i]}$ derived from the polar decomposition of $M^{[i]}$ [see Eq. (20)] has multiple elements in a column with the same absolute value. Recalling that our sequential MPS preparation scheme requires that the ancilla start and end in the vacuum state, we see that this occurs for tensors near the extremal values of the representation, when an ancilla qubit is first utilized or an ancilla qubit is decoupled from the remaining qubits. In such cases, we use the following alternate procedure to decide between permutations. First, we enumerate all basis permutations resulting from these ambiguities for a given tensor $L^{[i]\,j_i}$ and construct their associated isometries $\hat{\tilde{L}}^{(\zeta)}$, in which ζ indexes permutations. To decide between these permutations, we again would like to make this operator as "diagonal" as possible, in the sense of minimizing the number of qubit operations being applied. We construct a simple cost function as follows: for each state indexed by the ancilla state α and the physical qubit q as above, we convert the state index into its binary representation b, which effectively maps the ancilla state onto a collection of log₂ χ qubits. As an example, the states of a four-dimensional ancilla and a single physical qubit give the representations

$$\mathrm{index}(|0, 0\rangle) = 0 \to (0, 0, 0), \qquad \mathrm{(B1)}$$
$$\mathrm{index}(|0, 1\rangle) = 1 \to (0, 0, 1), \qquad \mathrm{(B2)}$$
$$\mathrm{index}(|1, 0\rangle) = 2 \to (0, 1, 0), \qquad \mathrm{(B3)}$$
$$\mathrm{index}(|1, 1\rangle) = 3 \to (0, 1, 1), \qquad \mathrm{(B4)}$$
$$\vdots \qquad \mathrm{(B5)}$$
$$\mathrm{index}(|3, 1\rangle) = 7 \to (1, 1, 1). \qquad \mathrm{(B6)}$$

We now calculate a distance between two basis states (α, j) and (α′, j′) with respective binary representations b and b′ as $D[(\alpha, j), (\alpha', j')] = \big(\sum_\mu |b_\mu - b'_\mu|\big)^2$. The term in parentheses counts the number of individual qubit "flips" required to convert one of the states into the other, and the square strongly penalizes multiqubit coordinated flips.
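The binary representations of Eqs. (B1)–(B6) and the distance D used in the cost function below can be tabulated directly, as in this sketch (ours):

```python
import numpy as np

def binary_rep(alpha, q, n_bits):
    """index(|alpha, q>) -> binary tuple, as in Eqs. (B1)-(B6)."""
    idx = 2 * alpha + q  # ancilla state alpha, physical qubit q
    return [(idx >> k) & 1 for k in reversed(range(n_bits))]

def distance_matrix(chi, n_bits):
    """D[(a,j),(a',j')] = (sum_mu |b_mu - b'_mu|)^2: squared Hamming distance."""
    states = [binary_rep(a, q, n_bits) for a in range(chi) for q in (0, 1)]
    n = len(states)
    D = np.zeros((n, n))
    for r in range(n):
        for c in range(n):
            D[r, c] = sum(abs(x - y) for x, y in zip(states[r], states[c])) ** 2
    return D
```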
We then use the cost function

$$C_\zeta = \mathrm{Tr}\big(\big|L^{(\zeta)}\big|\, D\big), \qquad \mathrm{(B7)}$$

in which D is the matrix with D[•, •] as elements and $|L^{(\zeta)}|$ is the matrix of absolute values of $L^{(\zeta)}$, to choose between the $L^{(\zeta)}$.

In addition to this permutation ambiguity, there is also a sign ambiguity on each of the bond states of the isometry. We again use diagonal dominance to fix this sign ambiguity in the following fashion: when our diagonal gauge update sweep is proceeding towards the tensor to the right, we are updating the right bond basis of the current tensor. Hence, we identify the nonzero element closest to the diagonal in each column, and reverse the sign of this element if it is negative. In order that we do not change the overall state, we multiply the elements in each corresponding row of the tensor to the right in the MPS description. Similarly, in a left-moving sweep, we identify the nonzero elements closest to the diagonal in each row, flip signs if this element is negative, and flip signs in the corresponding columns of the tensor to the left in the MPS representation. Following transformation to diagonal gauge, we fix the signs of all elements of the diagonality center (chosen, as above, to be a permutation operator) to be positive. Any sign flips that occur in this transformation are propagated through neighboring MPS tensors until a row or column with elements of mixed sign is identified. This procedure discourages the appearance of permutation operators whose elements are both positive and negative.

APPENDIX C: IMPLEMENTATION AND OPTIMIZATION DETAILS OF GREEDY COMPILATION HEURISTICS

In this section, we briefly note details of our implementation of the greedy compilation heuristics of Sec. III B, along with some problem-specific optimizations. Our subroutine for the cost function takes as input a vector of parameters θ, and constructs a matrix representation of the parameterized gate sequence

$$\hat{U}(\boldsymbol{\theta}) = \hat{M}_{N_G}(\boldsymbol{\theta}_{N_G}) \cdots \hat{M}_1(\boldsymbol{\theta}_1), \qquad \mathrm{(C1)}$$

in which $\boldsymbol{\theta}_i$ is the vector of parameters used by gate i, and then evaluates the cost function Eq. (23). This enables us to obtain analytic gradients of the cost function also as elements of products of matrices. We optimize the cost function using the BFGS method, and allow for multiple batches of input parameters with random variations added to avoid local minima. Additionally, as noted above, all of the isometries that result from the use of a real-valued quantum embedding map will be real, and so we can restrict our attention to real-valued gates. Hence, in our implementation, we parametrize single-qubit gates as y rotations

$$\hat{R}_y(\theta) \equiv \begin{pmatrix} \cos\frac{\theta}{2} & -\sin\frac{\theta}{2} \\ \sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{pmatrix}, \qquad \mathrm{(C2)}$$

which relate to the gates in Eq. (22) as $\hat{R}_y(\theta) = \hat{U}_3(\theta, 0, 0)$, and CNOTs for the entangling gates. While we have made the above gate choices for use in this paper, we stress that our methods apply to any other choice of single-qubit and entangling gates. While there is no guarantee that there are not operations with fewer entangling gates that could be found using complex-valued gates, we find that the reduction in the number of parameters when using real gates significantly improves the optimization time.

We utilize sparse representations of the individual parameterized gate elements $M_a(\boldsymbol{\theta}_a)$, but not for the full unitary $\hat{U}(\boldsymbol{\theta})$, as the latter is not guaranteed to be sparse. The matrix product in Eq. (C1) and the related expression for gradients dominates the computational scaling of the algorithm, which is O(χ³) in the bond dimension χ. This is the same asymptotic scaling with χ appearing in the DMRG-type optimization of the MPS network itself, described in Sec. IV A.

The final optimization we have included is to introduce longer gate sequence "motifs" into the optimization alongside the native entangling gates. In particular, the two motifs we have utilized in our work are a two-qubit rotation gate

$$\hat{S}(\theta, \theta') = \begin{pmatrix} \cos\frac{\theta-\theta'}{2} & 0 & 0 & \sin\frac{\theta-\theta'}{2} \\ 0 & \cos\frac{\theta+\theta'}{2} & \sin\frac{\theta+\theta'}{2} & 0 \\ 0 & -\sin\frac{\theta+\theta'}{2} & \cos\frac{\theta+\theta'}{2} & 0 \\ -\sin\frac{\theta-\theta'}{2} & 0 & 0 & \cos\frac{\theta-\theta'}{2} \end{pmatrix}, \qquad \mathrm{(C3)}$$

which is allowed between any two qubits that have CNOT connectivity, and a version of the $\hat{S}$ gate we call $\hat{F}$ that is controlled on a third qubit. We find that the former gate can be compiled using two CNOTs using the ansatz sequence shown in Eq. (C4),

Ry(φ0) -•- Ry(φ1) -•- Ry(φ2)
Ry(φ0′) -⊕- Ry(φ1′) -⊕- Ry(φ2′)    (C4)

and the latter gate, with control on qubit c and the operation $\hat{S}$ applied to qubits q1 and q2, can be constructed using

$$\hat{F}_{c;q_1 q_2}(\theta, \theta') = \mathrm{CNOT}(c, q_2)\,\mathrm{CNOT}(c, q_1)\,\hat{S}_{q_1 q_2}\!\left(-\frac{\theta}{2}, -\frac{\theta'}{2}\right) \times \mathrm{CNOT}(c, q_2)\,\mathrm{CNOT}(c, q_1)\,\hat{S}_{q_1 q_2}\!\left(\frac{\theta}{2}, \frac{\theta'}{2}\right). \qquad \mathrm{(C5)}$$

Hence, $\hat{S}$ gates require 2 CNOTs for compilation and $\hat{F}$ gates require 8 CNOTs for compilation. Both gates were identified from experiments with the greedy optimization procedure outlined above using only CNOTs, and their direct inclusion into the optimization enables more rapid convergence. As these gates require multiple entangling gates, it is useful to introduce a heuristic penalty function h into the cost function for ordering the next priority queue to ensure that they are not chosen over shorter gates with a similar cost function. Such a penalty function was advocated in Ref. [69], and could also be used to account for, e.g., hardware-dependent noise [79]. The choice of this penalty function will be problem-specific, and finding ways of optimizing it in a data-driven fashion for problems of interest is an intriguing area for further research.
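As a consistency check on these motifs, the sketch below (ours; the qubit ordering |c q1 q2⟩ is our convention) constructs Ŝ(θ, θ′) from Eq. (C3) and the sequence of Eq. (C5) as explicit matrices, and verifies that the latter acts as the identity for control |0⟩ and as Ŝ(θ, θ′) for control |1⟩:

```python
import numpy as np

def S(t, tp):
    """Two-qubit rotation gate of Eq. (C3) in the basis |00>,|01>,|10>,|11>."""
    cm, sm = np.cos((t - tp) / 2), np.sin((t - tp) / 2)
    cp, sp = np.cos((t + tp) / 2), np.sin((t + tp) / 2)
    return np.array([[cm, 0, 0, sm],
                     [0, cp, sp, 0],
                     [0, -sp, cp, 0],
                     [-sm, 0, 0, cm]])

# Basis order |c q1 q2>; CNOTs controlled on c, targeting q1 or q2.
I2, X = np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
CNOT_c_q1 = np.kron(P0, np.eye(4)) + np.kron(P1, np.kron(X, I2))
CNOT_c_q2 = np.kron(P0, np.eye(4)) + np.kron(P1, np.kron(I2, X))

def F(t, tp):
    """Controlled-S motif assembled per Eq. (C5)."""
    Sm = np.kron(I2, S(-t / 2, -tp / 2))
    Sp = np.kron(I2, S(t / 2, tp / 2))
    return CNOT_c_q2 @ CNOT_c_q1 @ Sm @ CNOT_c_q2 @ CNOT_c_q1 @ Sp

t, tp = 0.7, -1.3
Fm = F(t, tp)
assert np.allclose(Fm[:4, :4], np.eye(4))  # control |0>: identity
assert np.allclose(Fm[4:, 4:], S(t, tp))   # control |1>: S(theta, theta')
```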
FIG. 18. Example of greedy compilation procedure. Example gates represented as circuits and matrix plots resulting from applying the greedy compilation procedure to the isometry shown in the upper left (same isometry as in the right panel of Fig. 3). The starting ansatz is a single-qubit rotation on each qubit, given in the top center of the figure. The next row down shows the gates resulting from adding a single entangling gate to this ansatz, ordered left to right by their cost functions C. A constant penalty 0.6 is added to the cost function for use of an F̂ gate in ordering the priority queue, resulting in the given ordering. The gates indicated by green lines denote those passed to the next level of optimization. This procedure terminates in the gate shown at the bottom of the figure with the given cost function tolerance of 5 × 10⁻⁴.
We also note that the use of multiqubit controlled gates is penalized through the choice of the cost function Eq. (B7) for choosing the permutation to diagonal gauge; the choice of a cost function of 4 or 8 for a gate requiring two and three bit flips is in rough accordance with the number of CNOTs required for Ŝ and F̂, respectively.

An example application of this procedure to the isometry shown in the right panel of Fig. 3 is given in Fig. 18. Here, we give cost function penalties of 0.6 and 0.2 for F̂ and Ŝ gates, respectively, use a cost function tolerance of 5 × 10⁻⁴, and keep the four lowest-cost gates to generate the priority queue from the first optimization and the two lowest-cost gates on subsequent optimizations. The successive rows show the optimized gates resulting from adding a single entangling gate to the ansatz resulting from the last round of optimization, starting with a single-qubit rotation on each qubit (top center). The green lines show the gates which are kept to form the new priority queue. Here and throughout, the quantum circuits are ordered with the physical (i.e., readout) qubit on the top line and the ancilla qubits in increasing order on lower lines. Following an optimization in which Ŝ or F̂ gates may be used, the "raw" circuit containing these parameterized gates is then compiled into CNOTs using Eqs. (C4) and (C5), products of single-qubit rotations are collected together, and then optimization passes are run to determine if single-qubit gates with rotations smaller than a certain threshold can be removed without affecting the cost function. We note that no cost function penalty is applied when an Ŝ or F̂ gate brings the cost function below its desired tolerance, as in the last step of the optimization shown in Fig. 18; the penalty is only used for ordering the priority queue when no gates meet the cost function tolerance.

APPENDIX D: EXPLICIT CONSTRUCTION OF EXACTLY SOLVABLE MPS MODEL

In this section, we detail the transformation of the MPS representation of the exactly solvable model, Eqs. (24)–(26), to left-canonical form, defining a sequential preparation scheme. We then further detail a "by hand" compilation of a particular unitary completion of the resulting isometries as a benchmark for our automated procedure. For simplicity of exposition, we will take all phases φ_j = 0, though we will relax this condition shortly. Performing the QR decomposition on the first tensor, we find

$$A^{[0]} = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{p_0} \end{pmatrix} \qquad \mathrm{(D1)}$$

$$\to_{\mathrm{QR}} = \begin{pmatrix} 1 & 0 \\ 0 & \frac{p_0}{|p_0|} \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{p_0} \end{pmatrix}, \qquad \mathrm{(D2)}$$

$$\Rightarrow L^{[0]} = \begin{pmatrix} 1 & 0 \\ 0 & \frac{p_0}{|p_0|} \end{pmatrix}, \qquad \mathrm{(D3)}$$

$$A^{[1]0}_{00} = 1, \quad A^{[1]1}_{01} = \sqrt{p_1}, \quad A^{[1]0}_{11} = \sqrt{p_0}. \qquad \mathrm{(D4)}$$
Reshaping the second tensor and decomposing, we find

$$A^{[1]}_{(\alpha i)\beta} = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{p_1} \\ 0 & \sqrt{p_0} \\ 0 & 0 \end{pmatrix} \qquad \mathrm{(D5)}$$

$$\to_{\mathrm{QR}} = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{\frac{p_1}{p_0+p_1}} \\ 0 & \sqrt{\frac{p_0}{p_0+p_1}} \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{p_0+p_1} \end{pmatrix}, \qquad \mathrm{(D6)}$$

$$\Rightarrow L^{[1]0}_{00} = 1, \quad L^{[1]1}_{01} = \sqrt{\frac{p_1}{p_0+p_1}}, \quad L^{[1]0}_{11} = \sqrt{\frac{p_0}{p_0+p_1}}. \qquad \mathrm{(D7)}$$

This generalizes to

$$L^{[j]0}_{00} = 1, \quad L^{[j]1}_{01} = \sqrt{\frac{p_j}{\sum_{i\le j} p_i}}, \quad L^{[j]0}_{11} = \sqrt{\frac{\sum_{i<j} p_i}{\sum_{i\le j} p_i}}. \qquad \mathrm{(D8)}$$

FIG. 19. Circuit decompositions for $\hat{U}^{[j]}$ in Eq. (D14) (top) and $U^{[N-1]}$ in Eq. (D16) (bottom) based on a gate set of single-qubit rotations and CNOTs. In both diagrams, the upper line is the physical (sampled) qubit and the lower line is the ancilla.

We note that there is a "natural" unitary completion of the operators in Eq. (D13) given by

$$\hat{U}^{[j]} = |0_a 0_q\rangle\langle 0_a 0_q| + |1_a 1_q\rangle\langle 1_a 1_q| + \left(e^{i\phi_j}\sqrt{\tfrac{p_j}{\sum_{i\le j} p_i}}\,|0_a 1_q\rangle + \sqrt{\tfrac{\sum_{i<j} p_i}{\sum_{i\le j} p_i}}\,|1_a 0_q\rangle\right)\langle 1_a 0_q| + \cdots$$
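The generalized tensors of Eq. (D8) satisfy the left-orthogonality condition Eq. (A4) by construction, which can be checked directly with hypothetical probabilities, as in this sketch (ours):

```python
import numpy as np

p = np.array([0.3, 0.5, 0.2])  # hypothetical outcome probabilities, sum to 1

def L_tensors(j):
    """L^{[j]0} and L^{[j]1} from Eq. (D8), with P_j = sum_{i<=j} p_i."""
    Pj, Pjm1 = p[:j + 1].sum(), p[:j].sum()
    L0 = np.array([[1.0, 0.0], [0.0, np.sqrt(Pjm1 / Pj)]])
    L1 = np.array([[0.0, np.sqrt(p[j] / Pj)], [0.0, 0.0]])
    return L0, L1

for j in range(len(p)):
    L0, L1 = L_tensors(j)
    assert np.allclose(L0.T @ L0 + L1.T @ L1, np.eye(2))  # Eq. (A4)
```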
$$\hat{U}^{[j]} = \exp\big(-i\hat{H}^{[j]}\big), \qquad \mathrm{(D18)}$$

when φ_j = π/2. This gate is readily achieved in trapped …
FIG. 27. Optimization for site 7. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.
FIG. 30. Optimization for site 10. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 31. Optimization for site 11. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 32. Optimization for site 12. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 34. Optimization for site 14. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.
FIG. 35. Optimization for site 15. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 36. Optimization for site 16. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 37. Optimization for site 17. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 38. Optimization for site 18. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 39. Optimization for site 19. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 40. Optimization for site 20. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.
FIG. 41. Optimization for site 21. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 42. Optimization for site 22. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 43. Optimization for site 23. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 44. Optimization for site 24. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 45. Optimization for site 25. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 46. Optimization for site 26. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 47. Optimization for site 27. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.
FIG. 48. Optimization for site 28. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.
FIG. 49. Optimization for site 29. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 50. Optimization for site 30. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 51. Optimization for site 31. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 52. Optimization for site 32. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 53. Optimization for site 33. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 54. Optimization for site 34. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 55. Optimization for site 35. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.
FIG. 56. Optimization for site 36. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 57. Optimization for site 37. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 58. Optimization for site 38. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 59. Optimization for site 39. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 60. Optimization for site 40. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.
FIG. 61. Optimization for site 41. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.
FIG. 62. Optimization for site 42. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.
FIG. 63. Optimization for site 43. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 64. Optimization for site 44. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 65. Optimization for site 45. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 66. Optimization for site 46. (a) Isometry. (b) Optimized gate. (c) Raw circuit from optimization. (d) Expanded and cleaned circuit from optimization.
FIG. 67. Optimization for site 47. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.
FIG. 68. Optimization for site 48. (a) Isometry. (b) Optimized gate. (c) Circuit from optimization.
[1] R. S. Smith, M. J. Curtis, and W. J. Zeng, A practical quantum [15] W. Huggins, P. Patil, B. Mitchell, K. B. Whaley, and E. M.
instruction set architecture, arXiv:1608.03355 [quant-ph]. Stoudenmire, Towards quantum machine learning with tensor
[2] D. S. Steiger, T. Häner, and M. Troyer, Projectq: An open networks, Quantum Sci. Technol. 4, 024001 (2019).
source software framework for quantum computing, Quantum [16] I. Glasser, R. Sweke, N. Pancotti, J. Eisert, and J. I. Cirac,
2, 49 (2018). Expressive power of tensor-network factorizations for proba-
[3] T. Häner, D. S. Steiger, K. Svore, and M. Troyer, A software bilistic modeling, with applications from hidden markov models
methodology for compiling quantum programs, Quantum Sci. to quantum machine learning, Advances in Neural Information
Technol. 3, 020501 (2018). Processing Systems 32, Proceedings of the NeurIPS 2019 Con-
[4] G. Aleksandrowicz, T. Alexander, P. Barkoutsos, L. Bello, Y. ference (2019).
Ben-Haim, D. Bucher, F. J. Cabrera-Hernández, J. Carballo- [17] M. Benedetti, E. Lloyd, S. Sack, and M. Fiorentini, Parame-
Franquis, A. Chen, C.-F. Chen, J. M. Chow, A. D. Córcoles- terized quantum circuits as machine learning models, Quantum
Gonzales, A. J. Cross, A. Cross, J. Cruz-Benito, C. Culver, Sci. Technol. 4, 043001 (2019).
S. D. L. P. González, E. D. L. Torre, D. Ding, E. Dumitrescu, [18] V. Havlíček, A. D. Córcoles, K. Temme, A. W. Harrow, A.
I. Duran, P. Eendebak, M. Everitt, I. F. Sertage, A. Frisch, Kandala, J. M. Chow, and J. M. Gambetta, Supervised learning
A. Fuhrer, J. Gambetta, B. G. Gago, J. Gomez-Mosquera, D. with quantum-enhanced feature spaces, Nature (London) 567,
Greenberg, I. Hamamura, V. Havlicek, J. Hellmers, Ł. Herok, 209 (2019).
H. Horii, S. Hu, T. Imamichi, T. Itoko, A. Javadi-Abhari, N. [19] U. Schollwöck, The density-matrix renormalization group in
Kanazawa, A. Karazeev, K. Krsulich, P. Liu, Y. Luh, Y. Maeng, the age of matrix product states, Ann. Phys. 326, 96 (2011).
M. Marques, F. J. Martín-Fernández, D. T. McClure, D. McKay, [20] R. Orús, A practical introduction to tensor networks: Matrix
S. Meesala, A. Mezzacapo, N. Moll, D. M. Rodríguez, G. product states and projected entangled pair states, Ann. Phys.
Nannicini, P. Nation, P. Ollitrault, L. J. O’Riordan, H. Paik, 349, 117 (2014).
J. Pérez, A. Phan, M. Pistoia, V. Prutyanov, M. Reuter, J. [21] R. Orús, Tensor networks for complex quantum systems, Nat.
Rice, A. R. Davila, R. H. P. Rudy, M. Ryu, N. Sathaye, C. Rev. Phys. 1, 538 (2019).
Schnabel, E. Schoute, K. Setia, Y. Shi, A. Silva, Y. Siraichi, S. [22] E. Grant, M. Benedetti, S. Cao, A. Hallam, J. Lockhart, V.
Sivarajah, J. A. Smolin, M. Soeken, H. Takahashi, I. Tavernelli, Stojevic, A. G. Green, and S. Severini, Hierarchical quantum
C. Taylor, P. Taylour, K. Trabing, M. Treinish, W. Turner, D. classifiers, npj Quantum Inf. 4, 1 (2018).
Vogt-Lee, C. Vuillot, J. A. Wildstrom, J. Wilson, E. Winston, [23] Z.-L. Xiang, S. Ashhab, J. Q. You, and F. Nori, Hybrid quantum
C. Wood, S. Wood, S. Wörner, I. Y. Akhalwaya, and C. Zoufal, circuits: Superconducting circuits interacting with other quan-
Qiskit, https://qiskit.org (2020), [Online; accessed 17-February- tum systems, Rev. Mod. Phys. 85, 623 (2013).
2020]. [24] M. Schuld, A. Bocharov, K. M. Svore, and N. Wiebe, Circuit-
[5] R. LaRose, Overview and comparison of gate level quantum centric quantum classifiers, Phys. Rev. A 101, 032308 (2020).
software platforms, Quantum 3, 130 (2019). [25] Y. LeCun, C. Cortes, and C. J. Burges, Mnist handwritten digit
[6] Quantiki, List of QC Simulators, https://www.quantiki.org/ database, ATT Labs [Online]. Available: http://yann.lecun.com/
wiki/list-qc-simulators (2020), [Online; accessed 17-February- exdb/mnist 2 (2010).
2020]. [26] E. Stoudenmire and D. J. Schwab, Supervised learning with
[7] E. T. Campbell, B. M. Terhal, and C. Vuillot, Roads towards tensor networks, in Advances in Neural Information Processing
fault-tolerant universal quantum computation, Nature (London) Systems (Curran Associates, Inc., Red Hook, NY, 2016), pp.
549, 172 (2017). 4799–4807.
[8] P. W. Shor, Polynomial-time algorithms for prime factorization [27] E. Farhi and H. Neven, Classification with quantum neural
and discrete logarithms on a quantum computer, SIAM Rev. 41, networks on near term processors, arXiv:1802.06002.
303 (1999). [28] M. Schuld and N. Killoran, Quantum Machine Learning in
[9] C. Gidney and M. Ekerå, How to factor 2048 bit rsa integers in Feature Hilbert Spaces, Phys. Rev. Lett. 122, 040504 (2019).
8 hours using 20 million noisy qubits, arXiv:1905.09749. [29] S. Lloyd, M. Schuld, A. Ijaz, J. Izaac, and N. Killoran, Quantum
[10] A. D. Córcoles, A. Kandala, A. Javadi-Abhari, D. T. McClure, embeddings for machine learning, arXiv:2001.03622.
A. W. Cross, K. Temme, P. D. Nation, M. Steffen, and J. M. [30] M. Schuld, R. Sweke, and J. J. Meyer, Effect of data encoding
Gambetta, Challenges and opportunities of near-term quantum on the expressive power of variational quantum-machine-
computing systems, Proc. IEEE 108, 1338 (2020). learning models, Phys. Rev. A 103, 032430 (2021).
[11] J. Preskill, Quantum computing in the nisq era and beyond, [31] R. LaRose and B. Coyle, Robust data encodings for quantum
Quantum 2, 79 (2018). classifiers, Phys. Rev. A 102, 032420 (2020).
[12] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, [32] S. R. White, Density Matrix Formulation for Quantum Renor-
and S. Lloyd, Quantum machine learning, Nature (London) malization Groups, Phys. Rev. Lett. 69, 2863 (1992).
549, 195 (2017). [33] C. Schön, E. Solano, F. Verstraete, J. I. Cirac, and M. M. Wolf,
[13] A. Perdomo-Ortiz, M. Benedetti, J. Realpe-Gómez, and R. Sequential Generation of Entangled Multiqubit States, Phys.
Biswas, Opportunities and challenges for quantum-assisted ma- Rev. Lett. 95, 110503 (2005).
chine learning in near-term quantum computers, Quantum Sci. [34] C Schön, K Hammerer, M. M. Wolf, J. I. Cirac, and E Solano,
Technol. 3, 030502 (2018). Sequential generation of matrix-product states in cavity qed,
[14] C. Ciliberto, M. Herbster, A. D. Ialongo, M. Pontil, A. Phys. Rev. A 75, 032311 (2007).
Rocchetto, S. Severini, and L. Wossnig, Quantum machine [35] D Perez-Garcia, F. Verstraete, M. M. Wolf, and J. I. Cirac,
learning: A classical perspective, Proc. R. Soc. A 474, Matrix product state representations, Quantum Inf. Comput. 7,
20170551 (2018). 401 (2007).
023010-27
WALL, ABERNATHY, AND QUIROZ PHYSICAL REVIEW RESEARCH 3, 023010 (2021)
[36] A. Cichocki, Tensor networks for big data analytics and large- [59] Z.-Z. Sun, S.-J. Ran, and G. Su, Tangent-space gradient opti-
scale optimization problems, arXiv:1407.3124. mization of tensor network for machine learning, Phys. Rev. E
[37] A. Cichocki, A.-H. Phan, Q. Zhao, N. Lee, I. V. Oseledets, M. 102, 012152 (2020).
Sugiyama, and D. Mandic, Tensor networks for dimensionality [60] Y. Levine, O. Sharir, N. Cohen, and A. Shashua, Bridging
reduction and large-scale optimizations. part 2 applications and Many-Body Quantum Physics and Deep Learning Via Tensor
future perspectives, Found. Trends Mach. Learn. 9, 431 (2017). Networks, Phys. Rev. Lett. 122, 065301 (2019).
[38] I. V. Oseledets, Tensor-train decomposition, SIAM J. Sci. [61] J. Chen, S. Cheng, H. Xie, L. Wang, and T. Xiang, Equivalence
Comput. 33, 2295 (2011). of restricted boltzmann machines and tensor network states,
[39] E. M. Stoudenmire, Learning relevant features of data with Phys. Rev. B 97, 085104 (2018).
multi-scale tensor networks, Quantum Sci. Technol. 3, 034003 [62] I. Glasser, N. Pancotti, M. August, I. D. Rodriguez, and J. I.
(2018). Cirac, Neural-Network Quantum States, String-Bond States,
[40] C. Guo, Z. Jie, W. Lu, and D. Poletti, Matrix product operators and Chiral Topological States, Phys. Rev. X 8, 011006 (2018).
for sequence-to-sequence learning, Phys. Rev. E 98, 042114 [63] M. Foss-Feig, D. Hayes, J. M. Dreiling, C. Figgatt, J. P.
(2018). Gaebler, S. A. Moses, J. M. Pino, and A. C. Potter, Holographic
[41] J. Carrasquilla, G. Torlai, R. G. Melko, and L. Aolita, Recon- quantum algorithms for simulating correlated spin systems,
structing quantum states with generative models, Nat. Mach. arXiv:2005.03023.
Intell. 1, 155 (2019). [64] J. M. Pino, J. M. Dreiling, C Figgatt, J. P. Gaebler, S. A.
[42] G. Evenbly, Number-state preserving tensor networks as classi- Moses, C. H. Baldwin, M Foss-Feig, D Hayes, K Mayer, C
fiers for supervised learning, arXiv:1905.06352. Ryan-Anderson et al., Demonstration of the qccd trapped-ion
[43] S. Klus and P. Gelß, Tensor-based algorithms for image classi- quantum computer architecture, arXiv:2003.01293.
fication, Algorithms 12, 240 (2019). [65] B. Coyle, D. Mills, V. Danos, and E. Kashefi, The born
[44] S. Cheng, L. Wang, T. Xiang, and P. Zhang, Tree tensor net- supremacy: Quantum advantage and training of an ising born
works for generative modeling, Phys. Rev. B 99, 155131 (2019). machine, npj Quantum Inf. 6, 1 (2020).
[45] D. Liu, S.-J. Ran, P. Wittek, C. Peng, R. B. García, G. Su, and [66] A. JavadiAbhari, S. Patil, D. Kudrow, J. Heckey, A. Lvov,
M. Lewenstein, Machine learning by unitary tensor network of F. T. Chong, and M. Martonosi, Scaffcc: A framework for
hierarchical tree structure, New J. Phys. 21, 073059 (2019). compilation and analysis of quantum computing programs, in
[46] I. Glasser, N. Pancotti, and J. I. Cirac, From probabilistic Proceedings of the 11th ACM Conference on Computing Fron-
graphical models to generalized tensor networks for supervised tiers (Association for Computing Machinery, New York, NY,
learning, IEEE Access 8, 68169 (2020). 2014), pp. 1–10.
[47] M. Trenti, L. Sestini, A. Gianelle, D. Zuliani, T. Felser, [67] M. Amy and V. Gheorghiu, staq-a full-stack quantum process-
D. Lucchesi, and S. Montangero, Quantum-inspired machine ing toolkit, Quantum Sci. Technol. 5, 034016 (2020).
learning on high-energy physics data, arXiv:2004.13747. [68] A. JavadiAbhari, S. Patil, D. Kudrow, J. Heckey, A. Lvov, F. T.
[48] T.-D. Bradley, E. M. Stoudenmire, and J. Terilla, Modeling Chong, and M. Martonosi, Scaffcc: Scalable compilation and
sequences with quantum states: A look under the hood, Mach. analysis of quantum programs, Parallel Comput. 45, 2 (2015).
Learn.: Sci. Technol. 1, 035008 (2020). [69] M. G. Davis, E. Smith, A. Tudor, K. Sen, I. Siddiqi, and C.
[49] E. Gillman, D. C. Rose, and J. P. Garrahan, A tensor network ap- Iancu, Heuristics for quantum compiling with a continuous gate
proach to finite markov decision processes, arXiv:2002.05185. set, Presented at the 3rd International Workshop on Quantum
[50] J. Miller, G. Rabusseau, and J. Terilla, Tensor networks for Compilation as part of the International Conference On Com-
language modeling, arXiv:2003.01039. puter Aided Design 2019 (2019).
[51] R. Selvan and E. B. Dam, Tensor networks for medical image [70] M. Soeken, F. Mozafari, B. Schmitt, and G. De Micheli, Com-
classification, in Proceedings of the Third Conference on Medi- piling permutations for superconducting qpus, in 2019 Design,
cal Imaging with Deep Learning (2020). Automation & Test in Europe Conference & Exhibition (DATE)
[52] J. Wang, C. Roberts, G. Vidal, and S. Leichenauer, Anomaly (IEEE, New York, NY, 2019) pp. 1349–1354.
detection with tensor networks, arXiv:2006.02516. [71] R. Iten, R. Colbeck, I. Kukuljan, J. Home, and M. Christandl,
[53] J. Reyes and M. Stoudenmire, A multi-scale tensor network ar- Quantum circuits for isometries, Phys. Rev. A 93, 032318
chitecture for classification and regression, arXiv:2001.08286. (2016).
[54] M. Lubasch, J. Joo, P. Moinier, M. Kiffner, and D. Jaksch, [72] F. Vatan and C. Williams, Optimal quantum circuits for general
Variational quantum algorithms for nonlinear problems, Phys. two-qubit gates, Phys. Rev. A 69, 032315 (2004).
Rev. A 101, 010301(R) (2020). [73] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T.
[55] A. S. Kardashin, A. V. Uvarov, and J. D. Biamonte, Quantum Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser,
machine learning tensor network states, Front. Phys. 8, 644 J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. Jarrod
(2021). Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern,
[56] A. V. Uvarov, A. S. Kardashin, and J. D. Biamonte, Machine E. Larson, C. J. Carey, İ. Polat, Yu. Feng, E. W. Moore, J.
learning phase transitions with a quantum processor, Phys. Rev. Vand erPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen,
A 102, 012415 (2020). E. A. Quintero, C. R. Harris, A. M. Archibald, A. H. Ribeiro,
[57] Z.-Y. Han, J. Wang, H. Fan, L. Wang, and P. Zhang, Unsuper- F. Pedregosa, P. van Mulbregt, and SciPy 1. 0 Contributors,
vised Generative Modeling Using Matrix Product States, Phys. SciPy 1.0: Fundamental Algorithms for Scientific Computing
Rev. X 8, 031012 (2018). in Python, Nat. Methods 17, 261 (2020).
[58] S. Efthymiou, J. Hidary, and S. Leichenauer, Tensornetwork for [74] B. Efron, The Jackknife, the Bootstrap and other Resampling
machine learning, arXiv:1906.06329. Plans (SIAM, Philadelphia, PA, 1982).
023010-28
GENERATIVE MACHINE LEARNING WITH TENSOR … PHYSICAL REVIEW RESEARCH 3, 023010 (2021)
[75] S. Bravyi, S. Sheldon, A. Kandala, D. C. Mckay, and J. M. [81] B. Foxen, C. Neill, A. Dunsworth, P. Roushan, B. Chiaro, A.
Gambetta, Mitigating measurement errors in multi-qubit exper- Megrant, J. Kelly, Z. Chen, K. Satzinger, R. Barends et al.,
iments, arXiv:2006.14044. Demonstrating a Continuous Set of Two-Qubit Gates for Near-
[76] B. Nachman, M. Urbanek, W. A. de Jong, and C. W. Bauer, Term Quantum Algorithms, Phys. Rev. Lett. 125, 120504
Unfolding quantum computer readout noise, npj Quantum Inf. (2020).
6, 84 (2020). [82] N. Schuch and J. Siewert, Natural two-qubit gate for quantum
[77] J. Martyn, G. Vidal, C. Roberts, and S. Leichenauer, Entangle- computation using the xy interaction, Phys. Rev. A 67, 032301
ment and tensor networks for supervised image classification, (2003).
arXiv:2007.06082. [83] M. Kjaergaard, M. E. Schwartz, J. Braumüller, P.
[78] Y.-Y. Shi, L.-M. Duan, and G. Vidal, Classical simulation of Krantz, J. I.-J. Wang, S. Gustavsson, and W. D.
quantum many-body systems with a tree tensor network, Phys. Oliver, Superconducting qubits: Current state of
Rev. A 74, 022320 (2006). play, Annu. Rev. Condens. Matter Phys. 11, 369
[79] L. Cincio, K. Rudinger, M. Sarovar, and P. J. Coles, Ma- (2020).
chine learning of noise-resilient quantum circuits, Phys. Rev. [84] A. Sørensen and K. Mølmer, Entanglement and quantum com-
X Quantum 2, 010324 (2021). putation with ions in thermal motion, Phys. Rev. A 62, 022311
[80] I. D. Kivlichan, J. McClean, N. Wiebe, C. Gidney, A. Aspuru- (2000).
Guzik, G. K.-L. Chan, and R. Babbush, Quantum Simulation of [85] T. Tanamoto, Y.-x. Liu, X. Hu, and F. Nori, Efficient Quantum
Electronic Structure with Linear Depth and Connectivity, Phys. Circuits for One-Way Quantum Computing, Phys. Rev. Lett.
Rev. Lett. 120, 110501 (2018). 102, 100501 (2009).
023010-29